Why does the task wait for a long time before scraping the second page?
If Octoparse takes a long time to move on to the next action in the workflow, or if it gets stuck after clicking a "Next Page" button, the likely cause is that the button uses AJAX (short for Asynchronous JavaScript and XML). In this tutorial, I will explain how to work around the issue so you can fetch data faster and more efficiently.
Why "AJAX Load" slows down the process
Before Octoparse executes an action such as Click Item or Click to Paginate, it needs to confirm that the page has fully loaded. To do this, Octoparse treats a page reload as the signal that the web page is ready for the next action in the workflow. A web page that loads with AJAX, however, usually updates its content without reloading, so Octoparse never gets the signal to proceed. As a result, the task may stall, or you may get zero or far less data than expected.
To work around this issue, we can set up an AJAX Load timeout for the Click Item or Click to Paginate action. When the timeout is reached, Octoparse proceeds to the next action regardless of whether a page reload is detected.
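The waiting behavior described above can be pictured with a small Python sketch. This is only a conceptual model, not Octoparse's actual implementation: the function name `wait_for_page_ready`, the `reload_signal` event, and the `ajax_timeout` parameter are all hypothetical names chosen for illustration.

```python
import threading
from typing import Optional


def wait_for_page_ready(reload_signal: threading.Event,
                        ajax_timeout: Optional[float]) -> str:
    """Block until a page reload is signaled or the timeout elapses.

    reload_signal -- hypothetical stand-in for "the page reloaded"
    ajax_timeout  -- seconds to wait; None models having no AJAX timeout,
                     in which case the wait never ends without a reload
    """
    # Event.wait returns True if the event was set, False if it timed out.
    if reload_signal.wait(timeout=ajax_timeout):
        return "reload detected"
    return "timeout reached"


# A full page reload sets the signal, so the wait returns right away.
full_reload = threading.Event()
full_reload.set()
print(wait_for_page_ready(full_reload, ajax_timeout=2.0))   # reload detected

# An AJAX update never sets the signal; only the timeout lets the
# workflow continue to the next action.
ajax_update = threading.Event()
print(wait_for_page_ready(ajax_update, ajax_timeout=0.1))   # timeout reached
```

In this model, a timeout that is too long wastes time on every click, while one that is too short may move on before the new content has arrived, which is why the timeout should match how fast your page actually loads.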
Where to set up the AJAX Load
- Click the Click Item or Click to Paginate action
- Tick Load with AJAX in the Options tab at the bottom of the workflow
- Select the AJAX timeout according to how fast your page loads and click Apply to save
Author: Joy
Editor: Yina