A Trigger in Octoparse is a set of conditions that lets you decide quickly whether a line of data is kept or abandoned during extraction. It helps you filter for the data you want directly, so you don't need to scrape the whole dataset and delete unwanted lines after exporting the data to Excel or CSV files.
When to use the Trigger?
Use Case 1
If you are scraping products from an e-commerce website and only want products priced under $100, you can use a Trigger to dump the "useless" data lines, that is, any product with a price equal to or over $100, and keep only the ones you need.
To achieve this, you can create a trigger like this: if the data field "price" is equal to or greater than "100", do "dump the line of data". This way, Octoparse will "judge" whether the data meets the defined criteria before having it extracted. In the end, the dataset will only have the data desired.
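Triggers are configured in the Octoparse interface rather than in code, but the logic of this rule can be sketched in Python. The sample rows and the function name `should_dump` below are hypothetical and exist only to illustrate the condition.

```python
# Illustrative only: Octoparse applies this rule through its UI, not Python.
# The rows and the helper name are made up for the example.
rows = [
    {"name": "Desk lamp", "price": 45.0},
    {"name": "Office chair", "price": 100.0},
    {"name": "Monitor", "price": 180.0},
]

def should_dump(row):
    """Trigger condition: "price" is equal to or greater than 100."""
    return row["price"] >= 100

# Rows that trigger the condition are dumped; the rest are kept.
kept = [row for row in rows if not should_dump(row)]
print(kept)
```

Note that a product priced at exactly 100 is dumped, because the condition is "equal to or greater than".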
Use Case 2
Another useful application is when you need to extract data associated with a specific date, say, all news articles published today (e.g., 2020-01-01). To achieve this, you can create a trigger: If the data field "date" is not "2020-01-01", do "dump the line of data". As a result, you will only fetch articles for 2020-01-01.
Multiple conditions can be used together. For example, if you need to extract news articles for 2020-01-01 only when the article title contains the word "CPI", you can use the following two conditions:
Condition 1: If the data field "date" is not "2020-01-01", do "dump the line of data"
Condition 2: If the data field "title" does not contain "CPI", do "dump the line of data"
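As a rough sketch of how the two conditions interact, the Python below models each one as a separate dump rule. The article data and function names are hypothetical; Octoparse itself evaluates these rules through its interface.

```python
# Illustrative only: sample articles and helper names are assumptions.
articles = [
    {"date": "2020-01-01", "title": "CPI rises in December"},
    {"date": "2020-01-01", "title": "Local sports roundup"},
    {"date": "2019-12-31", "title": "CPI forecast for 2020"},
]

def dump_by_date(row):
    """Condition 1: "date" is not "2020-01-01"."""
    return row["date"] != "2020-01-01"

def dump_by_title(row):
    """Condition 2: "title" does not contain "CPI"."""
    return "CPI" not in row["title"]

# A line is dumped as soon as either condition fires,
# so the kept lines satisfy both requirements.
kept_articles = [
    a for a in articles
    if not (dump_by_date(a) or dump_by_title(a))
]
print(kept_articles)
```

Because each condition dumps on a mismatch, only lines that pass both checks remain in the dataset.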
How to use a Trigger?
STEP 1. Create a new Trigger
- Go to Extract Data action
- Click "Add a Trigger" in the Options tab to create a new trigger
STEP 2. Name your Trigger
- Name the Trigger by entering a name directly in the Trigger Name box
STEP 3. Choose the target field and set up the condition
- Select one target field from the dropdown menu
- Set the conditions for the selected data field. You can set conditions based on "text", "numerals" or "time"
These three condition types cover most needs, from text to numbers to times and dates.
a. For text
There are five options for text (is, is not, contains, does not contain, is not blank).
For example, if you select "contains" and type the word "SKIRT" in the text box, the whole condition will be: If the data field "Title" contains the word "SKIRT".
b. For numbers
There are four options available for numbers (greater than, less than, greater than or equal to, less than or equal to).
For example, if you select the data field "Price", "greater than", and fill in the value "50", the condition will be: If the data field "Price" is greater than "50".
TIP: Make sure the field contains only a numeric value. If it also contains text, you can use the Clean Data feature to refine it. For example, if the price is "$100", remove the currency symbol "$" before setting up the Trigger.
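To see why the cleanup matters, here is a minimal Python sketch of turning a price string into a number before comparing it. The regular expression and the function name `to_number` are assumptions for illustration and do not describe how the Clean Data feature works internally.

```python
import re

def to_number(raw):
    """Drop everything except digits and the decimal point, then convert.

    Assumes the input contains at least one digit; a string such as
    "N/A" would raise ValueError.
    """
    cleaned = re.sub(r"[^0-9.]", "", raw)
    return float(cleaned)

# "$100" cannot be compared with 50 as text, but the cleaned value can.
price = to_number("$100")
print(price > 50)
```

The same idea applies to thousands separators: `to_number("$1,299.50")` strips both the symbol and the comma before converting.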
c. For time and date
There are four options available for time and date (after, before, on or after, on or before).
For example, for the data field "Time", if you select "after" and "12 am of the extraction day" and set the action to "dump this line of data", the condition will be: if the time is after 12 am of the extraction day, then dump the line of data. As a result, only threads published at or before 12 am (0:00) of the extraction day will be fetched.
You can also customize the time or date range.
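The "after 12 am of the extraction day" condition can be sketched in Python as follows. This assumes that "12 am" means 00:00 at the start of the extraction day; the thread data and helper names are hypothetical.

```python
from datetime import datetime

def start_of_day(moment):
    """Return 00:00 on the same calendar day as `moment`."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)

def should_dump(published, extraction_time):
    """Trigger condition: published after 12 am of the extraction day."""
    return published > start_of_day(extraction_time)

# Hypothetical extraction run on 2020-01-02 at 09:30.
extraction_time = datetime(2020, 1, 2, 9, 30)
threads = [
    {"title": "Older thread", "time": datetime(2020, 1, 1, 22, 15)},
    {"title": "Newer thread", "time": datetime(2020, 1, 2, 8, 0)},
]

kept_threads = [
    t for t in threads
    if not should_dump(t["time"], extraction_time)
]
print([t["title"] for t in kept_threads])
```

A custom time or date range would work the same way, with a fixed `datetime` in place of `start_of_day(extraction_time)`.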
STEP 4. Add more conditions by using [AND] or [OR]
Multiple conditions can be added to the same trigger. Use condition [AND] or condition [OR] to define the relationships between the various conditions.
If you click "Add [AND] condition" and add a condition, the action will be executed if the data field meets both conditions.
If you click "Add [OR] condition" and add a condition, the action will be executed if the data field meets either one of the two conditions.
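The difference between [AND] and [OR] can be sketched with two boolean checks in Python. The row and the two conditions reuse the "Title" and "Price" examples from Step 3; the variable names are hypothetical.

```python
# Illustrative only: one sample row and two conditions from the examples above.
row = {"Title": "Pleated SKIRT", "Price": 40.0}

title_matches = "SKIRT" in row["Title"]   # "Title" contains "SKIRT"
price_matches = row["Price"] > 50         # "Price" is greater than 50

# [AND]: the action runs only when both conditions are met.
and_triggered = title_matches and price_matches

# [OR]: the action runs when at least one condition is met.
or_triggered = title_matches or price_matches

print(and_triggered, or_triggered)
```

For this row the title matches but the price does not, so an [AND] trigger would not fire while an [OR] trigger would.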
STEP 5. Choose an action from "Do" and click "Confirm" to save
Octoparse will execute one of the following steps when the conditions are triggered.
a. Dump this line of data
If "Dump this line of data" is selected, Octoparse will discard the whole data line from the extraction step, no matter which condition triggered it.
b. End the loop
If "End the loop" is selected, you'll need to choose a Loop Item to end.
c. Stop the entire extraction
If "Stop the entire extraction" is selected, the extraction will be terminated once the corresponding condition is satisfied.
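The three actions can be compared with a small Python model of a nested extraction loop. This is a simplified sketch, not Octoparse's implementation: it assumes "End the loop" behaves like leaving the chosen loop and continuing with the rest of the workflow, and the function name, action labels, and sample prices are all hypothetical.

```python
def run_extraction(pages, action, condition):
    """Model the three trigger actions on a two-level loop."""
    extracted = []
    for page in pages:                  # outer loop, e.g. pagination
        for row in page:                # loop item chosen for the trigger
            if condition(row):
                if action == "dump":
                    continue            # skip this line only
                if action == "end_loop":
                    break               # leave this loop, move to next page
                if action == "stop":
                    return extracted    # terminate the whole extraction
            extracted.append(row)
    return extracted

# Hypothetical prices on two pages; the trigger fires at 100 or more.
pages = [[1, 200, 3], [4, 5]]

def too_expensive(price):
    return price >= 100

dumped = run_extraction(pages, "dump", too_expensive)
ended = run_extraction(pages, "end_loop", too_expensive)
stopped = run_extraction(pages, "stop", too_expensive)
print(dumped, ended, stopped)
```

In this model, "dump" loses only the matching line, "end_loop" also loses the rest of the current page, and "stop" loses everything after the first match.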
TIP: You can edit, copy, delete, or disable the existing trigger after saving the changes.