Retry is an Octoparse feature for handling page-loading errors. You can choose from a number of conditions that tell Octoparse when to reload the current web page. In web scraping, the page must load correctly before Octoparse can go on to extract the information you need.


Why set up Retry?

When a web page does not load properly, Octoparse may fail to fetch the target data or even to proceed to the next action in the workflow. For this reason, it is useful to set up "Retry" conditions that specify when the page should be reloaded before the data is extracted.


How to set up Retry?

The Retry option is only available for two page-loading-related actions in the workflow: Go to Webpage and Click Item/Click to Paginate.

  • Click on the action to access its settings. Then click Retry to expand it and reveal the options.

  • Tick the box for Retry the action when

  • Click Add conditions to set up the conditions under which the page should be reloaded. In other words, you are telling Octoparse to reload the page when one or more of these conditions are met.


Now, set up your retry conditions using the options provided.


Usually, when a page fails to load properly, it shows an error message such as "errors", "500 Internal Server Error" or "Too many requests". Suppose we want the page reloaded whenever "500 Internal Server Error" appears on it. In this case, the condition should be: if the current page Text contains "500 Internal Server Error", then reload the page. Octoparse will then retry loading the page whenever that string is found on the current page.
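
The logic behind a "Text contains" condition can be sketched in a few lines of Python. This is only an illustration of the idea, not Octoparse's actual implementation; Octoparse is configured through its interface, and the function name `should_retry` and the marker list are hypothetical.

```python
# Illustrative sketch only: Octoparse is set up through its UI, not through this code.
# The marker strings are the example error messages mentioned above.
ERROR_MARKERS = ["500 Internal Server Error", "Too many requests"]


def should_retry(page_text, markers=ERROR_MARKERS):
    """Return True when any error marker appears in the page text."""
    return any(marker in page_text for marker in markers)


# A page showing a server error would trigger a reload; a normal page would not.
failed_page = "Oops! 500 Internal Server Error. Please try again later."
normal_page = "Product list: item A, item B, item C"
reload_failed = should_retry(failed_page)
reload_normal = should_retry(normal_page)
```

Matching on a specific string such as "500 Internal Server Error" is usually safer than matching on a generic word, which could also appear in the content of a correctly loaded page.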


You can also enter the XPath of an element that is present only when the page has loaded correctly. In this case, select Does not contain. If the designated element is not found on the page, Octoparse reloads the page.
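
As a rough sketch of the "Does not contain" idea, the Python snippet below checks whether an element matching an XPath is missing from a page. It is an illustration under stated assumptions, not how Octoparse works internally: the function name `element_missing`, the sample markup, and the XPath `.//div[@id='results']` are hypothetical, and the standard-library parser used here supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET


def element_missing(page_markup, xpath):
    """Return True when no element matching the XPath exists in the page.

    Assumes well-formed markup; real-world HTML often needs a more lenient parser.
    """
    root = ET.fromstring(page_markup)
    return root.find(xpath) is None


# Hypothetical pages: the results container exists only after a successful load.
loaded_page = "<html><body><div id='results'>Item A</div></body></html>"
failed_page = "<html><body><p>Loading...</p></body></html>"

target_xpath = ".//div[@id='results']"
reload_loaded = element_missing(loaded_page, target_xpath)
reload_failed = element_missing(failed_page, target_xpath)
```

This approach is handy when a failed page shows no recognizable error text: instead of looking for a message, you look for something that should always be there.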


Keep clicking Add conditions to add as many conditions as your project requires. To remove a condition you do not need, click the delete button.

  • Set up "Retry for" and "Wait time"

After setting up the retry conditions, decide how many times Octoparse should retry loading the web page: once, twice, or more. Setting a maximum number of retries is critical so that Octoparse does not reload the web page endlessly. When Octoparse reaches the maximum number of retries, it stops retrying and proceeds to the next step.
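
The behavior described above can be pictured as a capped retry loop. The Python sketch below is a hypothetical illustration, not Octoparse's code: `load_with_retry`, `load_page`, and `needs_retry` are made-up names, and it assumes that the wait time is a pause applied before each reload.

```python
import time


def load_with_retry(load_page, needs_retry, max_retries=3, wait_seconds=0.0):
    """Load a page, reloading it while a retry condition holds.

    Stops after max_retries reloads and returns the last page either way,
    so the caller always moves on to the next step.
    """
    page = load_page()
    retries = 0
    while needs_retry(page) and retries < max_retries:
        time.sleep(wait_seconds)  # assumed meaning of "Wait time": pause before reloading
        page = load_page()
        retries += 1
    return page


# Simulated page loads: two failures followed by a successful load.
responses = iter(["500 Internal Server Error", "500 Internal Server Error", "Item A"])
final_page = load_with_retry(
    load_page=lambda: next(responses),
    needs_retry=lambda text: "500 Internal Server Error" in text,
    max_retries=3,
    wait_seconds=0.0,
)
```

Without the `max_retries` cap, a page that never recovers would keep the loop running forever, which is why a maximum number of retries matters.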
