Why does Octoparse scrape less data when there should be more? (Version 8)
After you set up a task and test-run it on your local device, you may sometimes run into this problem:
The number of rows extracted doesn't match the number of results on the target website.
If so, check the possible causes and solutions below to see whether any of them applies to your case.
1. Pagination Issue
If the target website has multiple pages, first check that the pagination step reliably moves to the next page every time.
How to check it?
- Go to the Workflow side.
- Click the outside Pagination block, and then click the step "Click to Paginate".
- Repeat the actions above to see whether the page advances to the next page correctly each time.
If the pagination works, you can skip this part and move on to the next possible cause.
If the check shows that the pagination skips some pages or jumps directly to the last page, you need to correct the XPath of the Pagination step.
How to revise XPath for pagination?
Check the following tutorials or contact our Support Team for help.
- Customize element XPath
- What is XPath and how to use it in Octoparse
- Why does Octoparse skip pages during the scrape?
- Dealing with pagination (No "Next" button)
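To illustrate why a pagination XPath can jump straight to the last page, here is a minimal sketch in Python. The pager markup, class names, and URLs are hypothetical, and the standard library's ElementTree supports only a subset of XPath, so treat this as a model of the idea rather than the exact expression you would enter in Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical pager markup; a real website will look different.
PAGER = """
<div class="pager">
  <a href="?page=1">1</a>
  <a href="?page=2">2</a>
  <a href="?page=3">3</a>
  <a class="next" href="?page=2">Next</a>
  <a class="last" href="?page=9">Last</a>
</div>
"""

root = ET.fromstring(PAGER)

# Position-based: selects whichever link happens to come last, here "Last".
by_position = root.find("./a[last()]")

# Attribute-based: selects the link marked as "next", wherever it sits.
by_class = root.find("./a[@class='next']")

# Text-based: selects the link whose visible text is exactly "Next".
by_text = root.find("./a[.='Next']")

print("position:", by_position.get("href"))
print("class:   ", by_class.get("href"))
print("text:    ", by_text.get("href"))
```

In this sketch the position-based expression lands on the "Last" link, which is the kind of behavior that makes a task jump to the final page. An expression tied to the button's own attribute or text is usually more stable.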
Note: If the web page uses infinite scrolling to load its content and you find data missing, check these FAQs for details:
- How to deal with missing items when creating a list?
- Infinitive Scroll has setup but no new elements added to the list?
2. Page Load
When you test-run the task on your local device, keep an eye on the upper part of the progress window, which shows the browser navigating to the next page or opening a new page.
If the browser moves on to another page before the current page has finished loading, try the following methods to give the page more time to load:
a) Set a longer wait time for some steps (e.g. "Extract Data")
- Check details here: Wait before action
b) Increase the timeout for some steps (e.g. "Go to Web Page", "Click")
- Timeout for "Go to Web Page"
- AJAX timeout for "Click Item"
c) Page Scroll (e.g. "Go to Web Page", "Click")
- Check details here: Page scroll-down
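The effect of extracting too early can be sketched with a small simulation. Everything here is an assumption made for illustration: the page is modeled as rendering a fixed number of items per second, and the function names are hypothetical, not part of Octoparse.

```python
def items_loaded(elapsed_seconds, total_items=20, items_per_second=5):
    """Hypothetical page that renders 5 more items each second, up to 20."""
    return min(total_items, int(elapsed_seconds * items_per_second))


def extract_after_wait(wait_seconds):
    """Return how many items are on the page when extraction starts."""
    return items_loaded(wait_seconds)


# Compare the number of rows captured for different wait times.
for wait in (0, 1, 2, 4, 6):
    print(f"wait {wait}s -> {extract_after_wait(wait)} rows")
```

Under these assumptions, a short wait captures only part of the list, while any wait at or beyond the load time captures everything. This is why a longer wait time or timeout can recover missing rows.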
3. Loop Mode
After checking the pagination, you should normally check the Loop Item, which loops through each item on the page. Pay particular attention to the Loop Mode, especially if it is "Fixed List".
A fixed list locates elements by their fixed positions. If the page structure varies, for example when some pages have more or fewer items or the items sit in different locations, you may miss items or receive this kind of error message:
"Cannot find any element matching this XPath expression"
To solve it, you may need to switch to "Variable List" first and then write a new XPath.
You can check this example for details: Infinitive Scroll has setup but no new elements added to the list?
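The difference between the two loop modes can be modeled with a short Python sketch. This is a simplified illustration, not Octoparse's implementation: the page snippets are hypothetical, and the "fixed list" is represented as one position-based XPath per item, while the "variable list" is a single XPath that matches every item.

```python
import xml.etree.ElementTree as ET

# Hypothetical result pages with different numbers of items.
PAGE_BUILT_ON = "<ul><li>A</li><li>B</li><li>C</li></ul>"
PAGE_MORE = "<ul><li>D</li><li>E</li><li>F</li><li>G</li><li>H</li></ul>"
PAGE_FEWER = "<ul><li>I</li><li>J</li></ul>"

# Fixed list: one XPath per item, recorded from the page the task was built on.
FIXED_LIST = ["./li[1]", "./li[2]", "./li[3]"]

# Variable list: a single XPath that matches however many items exist.
VARIABLE_LIST = "./li"


def extract_fixed(page):
    """Look up each recorded position; None marks an element that is not found."""
    root = ET.fromstring(page)
    rows = []
    for xpath in FIXED_LIST:
        node = root.find(xpath)
        rows.append(None if node is None else node.text)
    return rows


def extract_variable(page):
    """Collect every element that matches the general XPath."""
    root = ET.fromstring(page)
    return [node.text for node in root.findall(VARIABLE_LIST)]


for name, page in [("more", PAGE_MORE), ("fewer", PAGE_FEWER)]:
    print(name, "fixed:", extract_fixed(page), "variable:", extract_variable(page))
```

In this model, the fixed list silently drops the extra items on the longer page and fails to find the third element on the shorter page, while the variable list adapts to both.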
If you still cannot solve the problem after trying the methods above, please send us the task with details so that we can help. Feel free to contact us via email (support@octoparse.com) or submit a ticket here.
Author: Vanny
Editor: Yina