In Octoparse, a Trigger is a set of conditions that lets you quickly decide whether to keep or discard each line of data. It filters the data during extraction, so you don't need to scrape the whole dataset and then delete the unwanted lines after exporting it to Excel or CSV files.
When to use the Trigger?
Suppose you are scraping products from an e-commerce website and only want products priced under $100. You can use a Trigger to dump the lines you don't need (any product priced at $100 or more) and keep only the ones you do.
To do this, create a trigger like this: if the data field "price" is greater than or equal to "100", do "dump the line of data". Octoparse then checks whether each line meets the defined criteria before actually extracting it, so the final dataset contains only the data you want.
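The price trigger can be modelled in a few lines of plain Python. This is only a sketch of the filtering logic, not Octoparse code: Triggers are configured in the Octoparse interface, and the names `rows` and `should_dump` are hypothetical. It assumes the price field has already been converted to a number.

```python
# Hypothetical plain-Python model of the trigger:
# if "price" is greater than or equal to 100, dump the line of data.
rows = [
    {"name": "Mouse", "price": 25.0},
    {"name": "Monitor", "price": 180.0},
    {"name": "Keyboard", "price": 100.0},
    {"name": "Headset", "price": 99.99},
]

def should_dump(row):
    # True means the line is discarded instead of extracted.
    return row["price"] >= 100

kept = [row for row in rows if not should_dump(row)]
print([row["name"] for row in kept])  # ['Mouse', 'Headset']
```

Note that a product priced at exactly 100 is dumped, because the condition uses "greater than or equal to".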
Another useful application is extracting data associated with a specific date, for example, all news articles published today (say, 2020-01-01). To do this, create a trigger: if the data field "date" is not "2020-01-01", do "dump the line of data". As a result, only the articles dated 2020-01-01 are fetched.
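A similar sketch for the date trigger, again a hypothetical plain-Python model rather than anything Octoparse exposes. It assumes the "date" field is compared as exact text, so the scraped value must use the same format as the date typed into the condition.

```python
TARGET_DATE = "2020-01-01"  # the value typed into the condition (compared as text)

articles = [
    {"title": "Markets open higher", "date": "2020-01-01"},
    {"title": "Year in review", "date": "2019-12-31"},
    {"title": "CPI data due next week", "date": "2020-01-01"},
]

def should_dump(article):
    # Trigger: if "date" is not "2020-01-01", dump the line of data.
    return article["date"] != TARGET_DATE

kept = [a["title"] for a in articles if not should_dump(a)]
print(kept)  # ['Markets open higher', 'CPI data due next week']
```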
Multiple conditions can be used together. For example, to extract only the news articles dated 2020-01-01 whose titles contain the word "CPI", use the following two conditions:
Condition 1: If the data field "date" is not "2020-01-01", do "dump the line of data"
Condition 2: If the data field "title" does not contain "CPI", do "dump the line of data"
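The combined effect of the two conditions can be sketched as follows. Because each condition dumps the line on its own, a line is dumped if either one holds, which leaves only articles dated 2020-01-01 whose titles contain "CPI". The sample data and the `conditions` list are hypothetical.

```python
articles = [
    {"title": "CPI rises 0.3% in December", "date": "2020-01-01"},
    {"title": "Markets open higher", "date": "2020-01-01"},
    {"title": "CPI outlook for 2020", "date": "2019-12-31"},
]

# Each condition independently dumps a line.
conditions = [
    lambda a: a["date"] != "2020-01-01",  # Condition 1
    lambda a: "CPI" not in a["title"],    # Condition 2
]

def should_dump(article):
    # Dump the line as soon as any one of the conditions holds.
    return any(cond(article) for cond in conditions)

kept = [a["title"] for a in articles if not should_dump(a)]
print(kept)  # ['CPI rises 0.3% in December']
```

Equivalently, a line survives only when the date is 2020-01-01 and the title contains "CPI".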
How to use a Trigger?
1. Create a new Trigger
- Go to the settings of the Extract Data action
- Click "Add a Trigger" to create a new trigger
2. Name your Trigger
- Name the Trigger by entering the name directly in the Trigger Name box
3. Choose the target field and set up the condition
- Select one target field from the dropdown menu
- Set the condition for the selected data field. You can set conditions based on "text", "numerals" or "time"
These three condition types cover most needs, from text to numbers to times and dates.
a. For text
There are five options for text: is, is not, contains, does not contain, and is not blank.
For example, if you select "contains" and type the word "Apple" in the text box, the whole condition will be: if the data field "Title" contains the word "Apple".
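The five text options map naturally onto simple string predicates. The mapping below is an illustrative assumption, not Octoparse's implementation: in particular, it matches case-sensitively and treats a value made only of whitespace as blank, and Octoparse's own behaviour on those points may differ. The names `TEXT_OPERATORS` and `text_condition` are hypothetical.

```python
# Hypothetical mapping of the five text options to Python predicates.
TEXT_OPERATORS = {
    "is": lambda value, target: value == target,
    "is not": lambda value, target: value != target,
    "contains": lambda value, target: target in value,
    "does not contain": lambda value, target: target not in value,
    # "is not blank" ignores the target; whitespace-only counts as blank here.
    "is not blank": lambda value, target: value.strip() != "",
}

def text_condition(value, option, target=""):
    """Evaluate one text condition against a scraped field value."""
    return TEXT_OPERATORS[option](value, target)

print(text_condition("Apple iPhone 11 case", "contains", "Apple"))  # True
print(text_condition("Samsung Galaxy case", "contains", "Apple"))   # False
print(text_condition("   ", "is not blank"))                         # False
```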
b. For numbers
There are four options available for numbers: greater than, less than, greater than or equal to, and less than or equal to.
For example, if you select data field "Price", "greater than", and fill in the value "500", the condition will be: If the data field "Price" is greater than "500".
Make sure the field contains only a numeric value. If it also contains text (such as a currency symbol), use the Clean Data feature to refine it first.
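The sketch below models the number options together with the kind of cleanup the Clean Data step is meant to do. The helper `to_number` is hypothetical: it simply strips everything except digits, the decimal point and the minus sign, which works for values like "$1,299.00" but not for formats such as "1.299,00".

```python
import operator
import re

# Hypothetical mapping of the number options to comparison functions.
NUMBER_OPERATORS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "greater than or equal to": operator.ge,
    "less than or equal to": operator.le,
}

def to_number(raw):
    """Hypothetical cleanup: keep only digits, '.' and '-', e.g. '$1,299.00' -> 1299.0."""
    cleaned = re.sub(r"[^0-9.\-]", "", raw)
    if not cleaned:
        raise ValueError(f"no numeric value in {raw!r}")
    return float(cleaned)

def number_condition(raw, option, target):
    """Clean the scraped text, then compare it with the target value."""
    return NUMBER_OPERATORS[option](to_number(raw), target)

print(number_condition("$1,299.00", "greater than", 500))  # True
print(number_condition("USD 499", "greater than", 500))    # False
```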
c. For time and date
There are four options available for time and date (after, before, on or after, on or before).
For example, for the data field "Time", if you select "after" and "12 am of the extraction day" and choose "dump this line of data", the condition will be: if the time is after 12 am of the extraction day, dump the line of data. As a result, only threads published at or before 12 am (0:00) of the extraction day will be fetched.
You can also use the current time or customize the time or date range.
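The "after 12 am of the extraction day" example can be modelled with Python's `datetime` module. The extraction time, the thread data and the timestamp format are all hypothetical, and the sketch ignores time zones.

```python
from datetime import datetime, time

FORMAT = "%Y-%m-%d %H:%M"  # hypothetical format of the scraped "Time" field

def start_of_day(moment):
    """Return 12 am (00:00) of the same day as `moment`."""
    return datetime.combine(moment.date(), time.min)

extraction_time = datetime(2020, 1, 2, 9, 30)  # hypothetical run time
cutoff = start_of_day(extraction_time)          # 2020-01-02 00:00

threads = [
    ("Posted yesterday evening", "2020-01-01 22:15"),
    ("Posted at midnight", "2020-01-02 00:00"),
    ("Posted this morning", "2020-01-02 08:05"),
]

# Trigger: if the time is after the cutoff, dump the line of data,
# so only lines at or before the cutoff are kept.
kept = [
    title
    for title, raw in threads
    if datetime.strptime(raw, FORMAT) <= cutoff
]
print(kept)  # ['Posted yesterday evening', 'Posted at midnight']
```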
4. Add more conditions by using [AND] or [OR]
Multiple conditions can be added to the same trigger. Use condition [AND] or condition [OR] to define the relationships between the various conditions.
If you click "Add [AND] condition" and add a condition, the action will be executed only if the data meets both conditions.
If you click "Add [OR] condition" and add a condition, the action will be executed if the data meets either of the two conditions.
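[AND] and [OR] behave like Python's `all()` and `any()` over the individual conditions. The helper names `all_of` and `any_of` are hypothetical, and the sketch does not model how Octoparse groups a mix of [AND] and [OR] conditions in one trigger.

```python
def price_over_500(row):
    return row["price"] > 500

def title_contains_apple(row):
    return "Apple" in row["title"]

def all_of(*conds):
    """Conditions joined with [AND]: every condition must hold."""
    return lambda row: all(c(row) for c in conds)

def any_of(*conds):
    """Conditions joined with [OR]: at least one condition must hold."""
    return lambda row: any(c(row) for c in conds)

row = {"title": "Apple Watch band", "price": 49.0}
print(all_of(price_over_500, title_contains_apple)(row))  # False
print(any_of(price_over_500, title_contains_apple)(row))  # True
```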
5. Choose an action from "Do"
Once the conditions are defined, choose what Octoparse should do when they are met. The options are:
a. Dump this line of data
If "Dump this line of data" is selected, Octoparse discards the entire line of data from the extraction step, regardless of the step at which the trigger fired.
b. End the loop
If "End the loop" is selected, you'll need to choose a Loop Item to end.
c. Stop the entire extraction
If "Stop the entire extraction" is selected, the extraction will be terminated once the corresponding condition is satisfied.
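The difference between the three actions can be sketched as a nested extraction loop in plain Python. This is a hypothetical model, not Octoparse internals: it assumes the inner loop is the Loop Item chosen for "End the loop", and that lines extracted before "Stop the entire extraction" fires are kept.

```python
from enum import Enum

class Action(Enum):
    DUMP_LINE = "Dump this line of data"
    END_LOOP = "End the loop"
    STOP_EXTRACTION = "Stop the entire extraction"

class StopExtraction(Exception):
    """Raised to model 'Stop the entire extraction'."""

def run(pages, condition, action):
    results = []
    try:
        for page in pages:          # outer loop, e.g. pagination
            for item in page:       # inner loop: assumed to be the chosen Loop Item
                if condition(item):
                    if action is Action.DUMP_LINE:
                        continue    # skip this line, keep looping
                    if action is Action.END_LOOP:
                        break       # leave the inner loop only
                    raise StopExtraction()
                results.append(item)
    except StopExtraction:
        pass                        # assumption: data extracted so far is kept
    return results

def over_100(price):
    return price > 100

pages = [[10, 250, 30], [40, 50]]
print(run(pages, over_100, Action.DUMP_LINE))        # [10, 30, 40, 50]
print(run(pages, over_100, Action.END_LOOP))         # [10, 40, 50]
print(run(pages, over_100, Action.STOP_EXTRACTION))  # [10]
```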