Why does Octoparse skip pages during the scrape? (Version 8)
Many users have found that Octoparse skips some pages when scraping a website. For example, after successfully scraping the first two pages, it jumps directly to page 5, then perhaps to page 10, instead of visiting the pages in sequence.
This happens because the auto-generated XPath of the pagination loop does not always locate the "Next" button on every page.
Have a look at the following example: Example URL
On the first page, the pagination loop XPath correctly locates the "Next" button.
However, on the second page, the same XPath locates the link to page 10 instead.
As a result, after scraping the second page, Octoparse goes directly to page 10 and misses all the data on pages 3 through 9.
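To see why an index-based XPath can break like this, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The pager markup and the positional XPath `.//div[@class='pager']/a[5]` are hypothetical stand-ins, since the real auto-generated XPath is not shown here. On page 1 the fifth link is "Next", but on page 2 a "Prev" link and a link back to page 1 shift every position by one, so the same XPath lands on the page-10 link. This only illustrates the XPath logic; it is not how Octoparse evaluates XPath internally.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical pager markup; the real example site may differ.
# Page 1 has no "Prev" link, so "Next" is the 5th <a> in the pager.
PAGE_1 = (
    "<body><div class='pager'>"
    "<span>1</span>"
    "<a href='/p/2'>2</a><a href='/p/3'>3</a><a href='/p/4'>4</a>"
    "<a href='/p/10'>10</a><a href='/p/2' title='Next'>Next</a>"
    "</div></body>"
)

# Page 2 adds "Prev" and a link back to page 1, shifting every position by one.
PAGE_2 = (
    "<body><div class='pager'>"
    "<a href='/p/1' title='Previous'>Prev</a><a href='/p/1'>1</a>"
    "<span>2</span>"
    "<a href='/p/3'>3</a><a href='/p/4'>4</a>"
    "<a href='/p/10'>10</a><a href='/p/3' title='Next'>Next</a>"
    "</div></body>"
)

# Hypothetical index-based XPath; ElementTree needs the leading "." (relative path).
POSITIONAL_XPATH = ".//div[@class='pager']/a[5]"


def located_text(html: str, xpath: str) -> Optional[str]:
    """Return the text of the first element matched by xpath, or None."""
    element = ET.fromstring(html).find(xpath)
    return None if element is None else element.text


for name, page in (("page 1", PAGE_1), ("page 2", PAGE_2)):
    print(name, "->", located_text(page, POSITIONAL_XPATH))
```

Running it prints `page 1 -> Next` and then `page 2 -> 10`: the position that held "Next" on the first page holds the page-10 link on the second.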
How do you fix the page-skipping issue?
The fix is to modify the XPath so that it always locates the "Next" button.
- Inspect the "Next" button in a regular browser to check its source code
The A tag has a title attribute, which we can use to write the XPath: //a[@title='Next'] (Check out how to write an XPath here)
- Enter the XPath into Octoparse and check that it locates the "Next" button on every page
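The same check can be sketched offline in Python with the standard-library `xml.etree.ElementTree`. The `make_pager` helper is hypothetical: it generates simplified pager markup for pages 1 to 10, where a "Prev" link appears from page 2 onward and "Next" disappears on the last page. The XPath is the attribute-based one from the steps above, written as `.//a[@title='Next']` because ElementTree only accepts relative paths on an element. Because it matches the button by its title rather than its position, it finds the correct "Next" link on every page no matter how many links come before it.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Relative form of //a[@title='Next']; ElementTree rejects absolute paths on elements.
NEXT_XPATH = ".//a[@title='Next']"


def make_pager(current: int, last: int = 10) -> str:
    """Build hypothetical pager markup for one results page."""
    links = []
    if current > 1:
        links.append(f"<a href='/p/{current - 1}' title='Previous'>Prev</a>")
    for n in sorted({1, 2, 3, 4, current, last}):
        if n == current:
            links.append(f"<span>{n}</span>")  # current page is not a link
        else:
            links.append(f"<a href='/p/{n}'>{n}</a>")
    if current < last:
        links.append(f"<a href='/p/{current + 1}' title='Next'>Next</a>")
    return "<body><div class='pager'>" + "".join(links) + "</div></body>"


def next_target(html: str) -> Optional[str]:
    """Return the href of the "Next" button, or None on the last page."""
    element = ET.fromstring(html).find(NEXT_XPATH)
    return None if element is None else element.get("href")


for page in range(1, 11):
    print(page, "->", next_target(make_pager(page)))
```

For pages 1 to 9 the printed target is always the following page (`/p/2` through `/p/10`), and on page 10 it is `None`, which is the point where a pagination loop should stop.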
Tips! Please refer to the following for more details about how to use XPath in Octoparse.
Author: Lesley
Editor: Yina