Not all pages are created equal. When web pages vary in layout or content, you can use "Branch Conditions" to scrape data conditionally. Here is how it works:
When should you consider using "Branch Conditions"?
- Hover your mouse over the spot in the workflow where you want to add the branch condition
- Click to add a "Branch Conditions" action inside the loop
- Click the branch on the left-hand side and select "Execute if the current page contains specific element"
- Enter the XPath for the element, "//div[@class='pricing-price__savings']", in the text box below
If writing the XPath is too difficult, you can select the element directly on the web page, and Octoparse will generate an XPath automatically.
- Click "OK"
- Click the branch on the right-hand side and select "Always execute the branch"
- Click "OK"
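To make the left-branch condition concrete, here is a minimal Python sketch of an "element exists" check using the same XPath. It is an illustration only, not how Octoparse works internally: the page fragments and the helper name `page_contains_element` are hypothetical, and the standard-library parser used here accepts only well-formed markup and a limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragments (real pages are usually messier HTML).
PAGE_WITH_SAVINGS = """
<html><body>
  <div class="pricing-price">
    <span class="price">$19.99</span>
    <div class="pricing-price__savings">Save 20%</div>
  </div>
</body></html>
""".strip()

PAGE_WITHOUT_SAVINGS = """
<html><body>
  <div class="pricing-price">
    <span class="price">$24.99</span>
  </div>
</body></html>
""".strip()


def page_contains_element(page_source: str, xpath: str) -> bool:
    """Return True if at least one element on the page matches the XPath."""
    root = ET.fromstring(page_source)
    # ElementTree only accepts relative paths, so "//div" is rewritten to ".//div".
    relative = "." + xpath if xpath.startswith("//") else xpath
    return root.find(relative) is not None


XPATH = "//div[@class='pricing-price__savings']"
print(page_contains_element(PAGE_WITH_SAVINGS, XPATH))
print(page_contains_element(PAGE_WITHOUT_SAVINGS, XPATH))
```

The first page would take the left branch because the savings element is present; the second would fall through to the right branch.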
In Octoparse, you can set the condition to one of the following:
1. Always execute the branch
When this option is selected, Octoparse does not evaluate any condition and executes the actions within the branch immediately. Only select this option for the branch on the right side.
2. Execute if the page contains specific text
When selected, Octoparse will look for the designated text string within the current page.
3. Execute if the current page contains specific element
When selected, Octoparse will look for the designated element (according to the XPath filled in) within the current page.
4. Execute if the current loop contains specific text
When selected, Octoparse will look for the designated text string within the current loop item.
5. Execute if the current loop contains specific element
When selected, Octoparse will look for the designated element (according to the Relative XPath filled in) within the current loop item. Use this option only when you need to judge between items of a loop.
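The five options differ mainly in scope: the whole page versus the current loop item. The sketch below models each option as a small Python predicate to show that difference. All function names and the sample page are assumptions made for illustration; they are not part of Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing page with two loop items (the <li> elements).
PAGE = """
<html><body>
  <ul>
    <li><h2>Widget A</h2><div class="pricing-price__savings">Save 20%</div></li>
    <li><h2>Widget B</h2></li>
  </ul>
</body></html>
""".strip()


def always_execute(page, item):
    # Option 1: no test at all.
    return True


def page_contains_text(page, item, text):
    # Option 2: search the text of the whole page.
    return text in "".join(page.itertext())


def page_contains_element(page, item, xpath):
    # Option 3: search the whole page with an absolute-style XPath ("//...").
    return page.find("." + xpath) is not None


def loop_contains_text(page, item, text):
    # Option 4: search only the text of the current loop item.
    return text in "".join(item.itertext())


def loop_contains_element(page, item, relative_xpath):
    # Option 5: search only inside the current loop item with a relative XPath.
    return item.find(relative_xpath) is not None


page = ET.fromstring(PAGE)
items = page.findall(".//li")

for item in items:
    print(
        item.findtext("h2"),
        page_contains_element(page, item, "//div[@class='pricing-price__savings']"),
        loop_contains_element(page, item, "div[@class='pricing-price__savings']"),
    )
```

In this sample the page-level check is true for both items, because the element exists somewhere on the page, while the loop-level check is true only for Widget A. That is why the loop-scoped options are the ones to use when the decision must be made per item.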
Click on any desired data fields to capture them. Rename the fields if needed.
4) Drag the "Extract Data" action into the left branch
Octoparse is now configured to look for the element on the page. If the element is found, it captures the desired data; otherwise, it skips the product.
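The resulting workflow (test for the element, extract if found, otherwise skip) can be sketched in Python as follows. This is a simplified model under stated assumptions: the product pages, the field names `name` and `savings`, and the `scrape` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

SAVINGS_XPATH = ".//div[@class='pricing-price__savings']"

# Hypothetical product pages; only the first and third show a savings element.
PRODUCT_PAGES = [
    "<html><body><h1>Widget A</h1>"
    "<div class='pricing-price__savings'>Save 20%</div></body></html>",
    "<html><body><h1>Widget B</h1></body></html>",
    "<html><body><h1>Widget C</h1>"
    "<div class='pricing-price__savings'>Save 5%</div></body></html>",
]


def scrape(pages):
    """Collect one row per page that contains the savings element."""
    rows = []
    for source in pages:
        page = ET.fromstring(source)
        if page.find(SAVINGS_XPATH) is not None:
            # Left branch: the condition is met, so extract the data fields.
            rows.append(
                {
                    "name": page.findtext(".//h1"),
                    "savings": page.findtext(SAVINGS_XPATH),
                }
            )
        # Right branch: "Always execute the branch", left empty, so the
        # product is simply skipped.
    return rows


for row in scrape(PRODUCT_PAGES):
    print(row)
```

Widget B produces no row because its page lacks the savings element, which mirrors the "skip the product" behavior described above.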
1. If a condition tests whether an element is found, the designated element must match uniquely on the page, or the judgment may fail.
2. Octoparse goes through the branches from left to right by default. Always keep the condition you want to test in the left branch; if the left branch is set to "Always execute the branch", Octoparse will never reach the branch on the right, because "Always execute the branch" always evaluates to true.
3. You can leave a branch blank if no data extraction action is needed when the condition is not met.
4. When a data extraction action is added to both branches, the number and the names of the data fields must be the same in both.
5. You can use nested branch conditions to further refine the test.
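The left-to-right rule in note 2 behaves like a first-match search. The following sketch models it in Python to show why "Always execute the branch" belongs on the right. The names `pick_branch`, `contains_text`, and `always` are hypothetical, and the model is an assumption based on the behavior described in these notes, not Octoparse's actual implementation.

```python
from typing import Callable, List, Optional, Tuple

Condition = Callable[[str], bool]


def always(_page_text: str) -> bool:
    """Models "Always execute the branch": true for every page."""
    return True


def contains_text(text: str) -> Condition:
    """Models "Execute if the page contains specific text"."""
    return lambda page_text: text in page_text


def pick_branch(branches: List[Tuple[str, Condition]], page_text: str) -> Optional[str]:
    """Return the name of the first branch, left to right, whose condition holds."""
    for name, condition in branches:
        if condition(page_text):
            return name
    return None


# Recommended order: the real test on the left, the catch-all on the right.
correct_order = [("on sale", contains_text("Save")), ("fallback", always)]

# Problematic order: the catch-all on the left shadows the real test.
wrong_order = [("fallback", always), ("on sale", contains_text("Save"))]

page_text = "Widget A - Save 20% today"
print(pick_branch(correct_order, page_text))
print(pick_branch(wrong_order, page_text))
```

With the recommended order the sale page reaches the "on sale" branch; with the order reversed, the catch-all wins every time and the real test is never evaluated.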
If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you soon.