You are browsing a tutorial guide for Octoparse's latest version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Many users have run into cases where Octoparse skips some pages while scraping a website and, as a result, returns less data than expected. For example, after successfully scraping the first two pages, it jumps directly to page 5, then perhaps to page 10, instead of going through the pages in sequence.

This happens because the auto-generated XPath of the pagination loop does not always locate the next page button on every page.

Have a look at the following example: https://www.kijiji.ca/b-apartments-condos/canada/house/c37l0a29276001?ad=offering&unit-type=house

On the first page, you can see the pagination loop XPath locates the next button perfectly.

66996.png

However, on the second page, the same XPath locates the link to page 10 instead.

vvrr.png

So after finishing scraping the second page, Octoparse will go directly to page 10, missing a lot of data on the pages in between.
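
The behavior above can be reproduced outside Octoparse with a small sketch. The markup below is a simplified, hypothetical pagination bar (not Kijiji's real HTML), and `./a[4]` is only an assumed stand-in for a position-based auto-generated XPath; the actual XPath Octoparse generates may look different. The sketch uses Python's built-in ElementTree, which supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup (not the real page source).
PAGE_1 = """
<div class="pagination">
  <span>1</span>
  <a href="/page-2">2</a>
  <a href="/page-3">3</a>
  <a href="/page-10">10</a>
  <a href="/page-2" title="Next">Next</a>
</div>
"""

# On page 2 a "Previous" link appears and page 1 becomes a link,
# so every later <a> element shifts one position to the right.
PAGE_2 = """
<div class="pagination">
  <a href="/page-1" title="Previous">Previous</a>
  <a href="/page-1">1</a>
  <span>2</span>
  <a href="/page-3">3</a>
  <a href="/page-10">10</a>
  <a href="/page-3" title="Next">Next</a>
</div>
"""

# Assumed stand-in for an auto-generated XPath: "the 4th link in the bar".
POSITIONAL_XPATH = "./a[4]"


def matched_text(page_markup, xpath):
    """Return the text of the first element the XPath matches, or None."""
    root = ET.fromstring(page_markup.strip())
    match = root.find(xpath)
    return None if match is None else match.text


positional_results = [matched_text(p, POSITIONAL_XPATH) for p in (PAGE_1, PAGE_2)]
print(positional_results)  # ['Next', '10']
```

In this sketch the same XPath points at "Next" on the first page and at "10" on the second, which is the kind of drift that makes the pagination loop skip pages.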

How to solve the page-skipping issue?

The fix is simple: modify the XPath so that it always locates the "Next" button.

  • Inspect the next button in a regular browser to check the source code

1.png

There is a title attribute in the <a> tag. We can use this attribute to write the XPath: //a[@title='Next'] (Check out how to write an XPath here)

  • Enter the XPath into Octoparse to check if it can always locate the next button

77777777777777.gif
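
As a rough illustration of why the attribute-based XPath is more stable, the sketch below follows a "Next" link through a hypothetical five-page site. The helper `build_pagination`, the `/page-N` URLs, and the assumption that the last page has no "Next" link are all invented for this example; only the XPath itself comes from the steps above. It uses Python's built-in ElementTree, so the XPath is written as `.//a[@title='Next']`.

```python
import xml.etree.ElementTree as ET

# ElementTree form of the tutorial's XPath //a[@title='Next'].
NEXT_XPATH = ".//a[@title='Next']"


def build_pagination(current, last):
    """Build hypothetical pagination markup for page `current` of `last`."""
    div = ET.Element("div", {"class": "pagination"})
    if current > 1:
        prev = ET.SubElement(
            div, "a", {"href": f"/page-{current - 1}", "title": "Previous"}
        )
        prev.text = "Previous"
    for n in range(1, last + 1):
        if n == current:
            # The current page is shown as plain text, not as a link.
            ET.SubElement(div, "span").text = str(n)
        else:
            ET.SubElement(div, "a", {"href": f"/page-{n}"}).text = str(n)
    if current < last:
        # Assumption: the last page has no "Next" link at all.
        nxt = ET.SubElement(
            div, "a", {"href": f"/page-{current + 1}", "title": "Next"}
        )
        nxt.text = "Next"
    return div


def next_href(pagination):
    """Return the href of the "Next" link, or None if there is none."""
    match = pagination.find(NEXT_XPATH)
    return None if match is None else match.get("href")


# Follow the "Next" link from page 1 until it disappears.
visited = []
page, last = 1, 5
while page is not None:
    visited.append(page)
    href = next_href(build_pagination(page, last))
    page = None if href is None else int(href.rsplit("-", 1)[1])

print(visited)  # [1, 2, 3, 4, 5]
```

Because the XPath selects by the title attribute rather than by position, the number of page links before the button does not affect the match, and every page is visited in sequence.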

Tip: After creating a pagination loop in a task, it is a good idea to manually click the Pagination and Click to paginate actions to step through several pages, as this tutorial shows, and check whether the auto-generated XPath locates the next button precisely.
