How to handle pagination with page numbers?
The updated tutorial for the latest version 8.1 is available here.
"Next" button for pagination is not always available. In some cases, you may have to click through the page numbers to view all the available content.
(Check out the tutorial on extracting multiple pages by clicking “Next”.)
In this case, to extract multiple pages of data, you need to modify the XPath of the "Click to pagination" step so that it always locates the page number that comes right after the current one.
(For example, if you are on page 1 you want to click page 2, if you are on page 2 you want to click page 3, and so on.)
(Check out the complete tutorial on extracting multiple pages by page number links.)
There are two general steps:
Step 1 - Click on page 1, then select "Loop click single element".
Step 2 - Modify the auto-generated XPath to one that locates the next page in the line.
The XPath axis "following-sibling" is used most often in this case. It selects the siblings that come after the current node (learn more about locating elements with XPath).
Here’s an example:
(example URL: http://www.enzolifesciences.com/product-listing/?product_type=Antibodies&application=&text= )
Using the Firefox XPath tool (only available for Firefox 54 or earlier versions), you can find the XPath of the first-page element (check Locate elements with XPath for detailed instructions).
For this example, the XPath for the first-page element "1" is:
//DIV[@id='content_contentempty']/DIV[1]/DIV[1]/DIV[3]/DIV[1]/DIV[4]/DIV[1]/DIV[2]/TABLE[1]/TBODY[1]/TR[1]/TD[3]/B[1]
Then, looking at the HTML code, you can spot where "page 2" is located, usually somewhere close to the first-page element.
Using the XPath axis "following-sibling", which selects the sibling nodes that come after the current node, you can modify the auto-generated XPath for the page-1 element so that it locates the page after it (page 2 in this case). The XPath that always locates the page following the current page is:
//DIV[@id='content_contentempty']/DIV[1]/DIV[1]/DIV[3]/DIV[1]/DIV[4]/DIV[1]/DIV[2]/TABLE[1]/TBODY[1]/TR[1]/TD[3]/B[1]/following-sibling::a[1]
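Appending "/following-sibling::a[1]" to the end of the auto-generated XPath makes it select the first link (the first "a" element, a[1]) among the siblings that follow the first-page element. In this example the current page number is the B (bold) element rather than a link, so the same XPath points to the next page number whichever page is loaded.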
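To see what "following-sibling::a[1]" selects, here is a minimal Python sketch. It rests on assumptions: the pagination markup is hypothetical and much simpler than the example site's HTML, and the helper name first_following_sibling is made up for illustration. Python's built-in ElementTree does not support the following-sibling axis, so the helper emulates it in plain Python rather than evaluating the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup: the current page (2) is bold, the others are links.
SAMPLE = "<td><a href='?p=1'>1</a><b>2</b><a href='?p=3'>3</a><a href='?p=4'>4</a></td>"


def first_following_sibling(parent, current, tag):
    """Emulate XPath `following-sibling::tag[1]` for `current` inside `parent`."""
    children = list(parent)
    start = children.index(current) + 1  # siblings that come after `current`
    for sibling in children[start:]:
        if sibling.tag == tag:
            return sibling  # first match in document order
    return None  # no such sibling, e.g. on the last page


cell = ET.fromstring(SAMPLE)
current_page = cell.find("b")  # plays the role of .../TD[3]/B[1]
next_link = first_following_sibling(cell, current_page, "a")
next_text = next_link.text if next_link is not None else None
print(next_text)  # prints 3
```

The link for page 1 is skipped because it comes before the bold element, so only pages after the current one can be selected.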
When the new XPath is ready and tested, replace the auto-generated XPath for the pagination loop with it. Click "OK" to save the setting.
Now, check that the new XPath works by clicking through the pagination steps in the workflow. If the next page loads correctly every time you click "Click to pagination", the new XPath is the right one.
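The same check can be rehearsed offline with a small simulation. This is only a sketch under assumptions: render_pager builds a hypothetical pager (current page in bold, every other page as a link), next_page emulates the "B[1]/following-sibling::a[1]" lookup in plain Python, and none of these names come from Octoparse.

```python
import xml.etree.ElementTree as ET


def render_pager(current, total):
    """Build hypothetical pager markup: current page in <b>, other pages as links."""
    cells = []
    for page in range(1, total + 1):
        if page == current:
            cells.append(f"<b>{page}</b>")
        else:
            cells.append(f"<a href='?page={page}'>{page}</a>")
    return ET.fromstring("<td>" + "".join(cells) + "</td>")


def next_page(pager):
    """Emulate `.../B[1]/following-sibling::a[1]`; return the page number or None."""
    children = list(pager)
    bold = pager.find("b")
    for sibling in children[children.index(bold) + 1:]:
        if sibling.tag == "a":
            return int(sibling.text)
    return None


def walk(total):
    """Follow the next-page locator from page 1 until it matches nothing."""
    visited = [1]
    while True:
        target = next_page(render_pager(visited[-1], total))
        if target is None:  # last page: no link follows the bold element
            break
        visited.append(target)
    return visited


print(walk(5))  # prints [1, 2, 3, 4, 5]
```

Each page is visited exactly once, and the walk stops on the last page because no link follows the bold element there. A real pager may also contain links such as "Next" or "Last" after the page numbers, which this simplified markup leaves out.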
[Video version of this tutorial is available here]
From: https://www.octoparse.com/tutorial-7/pagination-with-page-numbers