The Retry action is an Octoparse feature for dealing with page-loading errors. You can choose from a number of conditions under which Octoparse reloads the current web page. For web scraping, it is essential that the page loads correctly so Octoparse can go on to extract the information you need.
1. Why set up Retry?
When a web page does not load properly, Octoparse may fail to fetch the target data, or even fail to proceed to the next action in the workflow. For this reason, it is useful to set up "Retry" conditions that specify when the web page should be reloaded before the data is extracted.
2. How to set up Retry?
The Retry option is available only for the two page-loading actions in the workflow: 1) Go to Web Page and 2) Click Item/Click to Paginate.
1) Double-click the action to access its settings, then click Retry to reveal the options.
2) Tick the box for "Retry the action when", then click to set up the conditions for when the page should be reloaded. You are telling Octoparse to reload the page if one or more of these conditions are met.
Now, set up your retry conditions using the options provided.
Usually, when a page fails to load properly, it shows an error message such as "errors", "500 Internal Server Error" or "Too many requests". Let's say that we want the page reloaded whenever "500 Internal Server Error" appears on it. In this case, the condition should be: if the current page Text contains "500 Internal Server Error", then reload the page. As a result, Octoparse retries loading the page whenever that string is found on the current page.
You can also enter the XPath of an element that is present only when the page has loaded correctly. In this case, select Does not contain. As a result, if the designated element is not found on the page, Octoparse reloads the page.
Keep adding conditions until you have as many as your project requires.
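To make the logic of these conditions concrete, here is a minimal Python sketch of how "Text contains" and "Does not contain" checks might be evaluated. It is an illustration only, not Octoparse's implementation: the function names `text_contains`, `element_missing` and `should_retry` are hypothetical, the sample pages are made up, and the standard-library `xml.etree.ElementTree` used here supports only a limited XPath subset and requires well-formed markup.

```python
import xml.etree.ElementTree as ET


def text_contains(page_source: str, needle: str) -> bool:
    """Condition: the current page text contains the given string."""
    return needle in page_source


def element_missing(page_source: str, xpath: str) -> bool:
    """Condition: the page does NOT contain an element matching the XPath.

    Assumption: a page that cannot be parsed is treated as missing the element.
    """
    try:
        root = ET.fromstring(page_source)
    except ET.ParseError:
        return True
    return root.find(xpath) is None


def should_retry(page_source: str, error_strings: list, required_xpath: str) -> bool:
    """Reload the page if one or more conditions are met."""
    if any(text_contains(page_source, s) for s in error_strings):
        return True
    return element_missing(page_source, required_xpath)


# Hypothetical sample pages for illustration.
ERROR_PAGE = "<html><body><h1>500 Internal Server Error</h1></body></html>"
GOOD_PAGE = "<html><body><div id='results'><p>Item 1</p></div></body></html>"
ERROR_STRINGS = ["500 Internal Server Error", "Too many requests"]
REQUIRED_XPATH = ".//div[@id='results']"

retry_on_error_page = should_retry(ERROR_PAGE, ERROR_STRINGS, REQUIRED_XPATH)
retry_on_good_page = should_retry(GOOD_PAGE, ERROR_STRINGS, REQUIRED_XPATH)
```

In this sketch the error page triggers a retry because it contains one of the error strings, while the good page does not, because no error string is present and the required element is found.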
3) Set up "Retry for" and "Wait time"
After setting up the retry conditions, decide whether Octoparse should retry loading the web page once, twice or more. Setting a maximum number of retries is critical so that Octoparse does not reload the web page endlessly. When Octoparse reaches the maximum number of retries, it stops retrying and proceeds to the next step.
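The effect of a retry limit can be sketched as a bounded loop in Python. This is a simplified model under stated assumptions, not Octoparse's actual code: `load_with_retry`, `load_page` and `needs_retry` are hypothetical names, and `wait_seconds` is assumed here to be a pause before each reload.

```python
import time


def load_with_retry(load_page, needs_retry, max_retries=3, wait_seconds=0.0,
                    sleep=time.sleep):
    """Load a page and reload it while the retry condition holds.

    Returns (page, retries_used). Once max_retries reloads have been made,
    the last page is returned as-is so the caller can move on to the next step.
    """
    page = load_page()
    retries = 0
    while needs_retry(page) and retries < max_retries:
        sleep(wait_seconds)  # assumed meaning of the wait time: pause before reloading
        page = load_page()
        retries += 1
    return page, retries


def is_error(page: str) -> bool:
    """Retry condition: the page text contains the error string."""
    return "500 Internal Server Error" in page


# Simulated loader: fails twice, then succeeds.
responses = iter([
    "500 Internal Server Error",
    "500 Internal Server Error",
    "<html>ok</html>",
])
final_page, retries_used = load_with_retry(lambda: next(responses), is_error,
                                           max_retries=3)

# Simulated loader that never recovers: the loop stops at the limit.
stuck_page, stuck_retries = load_with_retry(lambda: "500 Internal Server Error",
                                            is_error, max_retries=2)
```

Without the `retries < max_retries` check, the second call would never return, which is the endless reloading that the maximum number of retries is meant to prevent.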
If you have questions, you are welcome to submit a request here. Our support team will get back to you within 24 hours.