Scraping data from a part of a web page that has to be scrolled has long been a problem for Octoparse, especially on web pages with multiple scroll bars.
We have finally found a way to deal with it and have added this feature to Octoparse 8.4!
In this tutorial, we will show you how to scroll within a designated area of a web page in Octoparse 8.4.
What is "Scroll within a designated area"?
In most cases, you scroll the whole page with the default scrollbar, which is usually on the right side of the web page. For this kind of page, the default scrolling method works well.
However, some pages, such as review or post pages, are designed differently.
Take the Google Maps reviews page as an example: https://www.google.com/maps/place/The+Schoolhousefirstname.lastname@example.org,-1.0878661,9.7z/data=!4m10!1m2!2m1!1srestaurants+in+london!3m6!1s0x4876058fd98fc091:0xbf1c07755166b551!8m2!3d51.4604646!4d-0.1757991!9m1!1b1
The reviews are displayed in the left part of the web page, and this part has its own scroll bar. When you drag this bar down, the review section scrolls and loads more reviews. However, you will notice that the other parts of the page do not scroll.
For a page like this, you need to set up a partial scroll in the Octoparse task settings.
Another example is the comment page of a TikTok video. As you can see, the comment section has a scrollbar that is separate from the main video page.
How to scroll within a designated area in Octoparse?
There are two ways to set it up:
1. Set up the scroll on the "Go to web page" or "Click Item" action (scroll to finish loading first and then extract data)
- Click on the step "Go to web page" or "Click Item"
- Click "Options" and tick "Scroll down the page after it is loaded"
- Select "Partial" in the scroll area
2. Set up scroll with Loop Item (scroll and extract at the same time)
- Add a "Loop Item" step to the workflow
- Click on "Loop Item" and choose "Scroll page" under Loop Mode
- Select "Partial" in the scroll area
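The difference between the two methods can be sketched with a small simulation. This is only a conceptual model, not how Octoparse is implemented: `FakePanel`, `scroll_then_extract`, and `scroll_and_extract` are hypothetical names, and the batch-loading behavior is an assumption made for illustration.

```python
class FakePanel:
    """Hypothetical stand-in for a scroll area that loads items in batches."""

    def __init__(self, total_items, batch_size):
        self.total_items = total_items
        self.batch_size = batch_size
        # Items already visible before any scrolling happens.
        self.loaded = min(batch_size, total_items)

    def scroll(self):
        """Scroll once and load the next batch, if any items remain."""
        self.loaded = min(self.loaded + self.batch_size, self.total_items)

    def visible_items(self):
        return [f"item-{i}" for i in range(self.loaded)]


def scroll_then_extract(panel, scroll_times):
    """Method 1: finish all scrolling first, then extract once."""
    for _ in range(scroll_times):
        panel.scroll()
    return panel.visible_items()


def scroll_and_extract(panel, scroll_times):
    """Method 2: extract after every scroll, keeping only unseen items."""
    collected = list(panel.visible_items())
    for _ in range(scroll_times):
        panel.scroll()
        for item in panel.visible_items():
            if item not in collected:
                collected.append(item)
    return collected
```

On a simple page both approaches end with the same items. The second one collects data step by step, so partial results exist after each scroll rather than only at the end.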
Enter XPath of the scroll area
After that, tell Octoparse where to scroll by entering the XPath of the scroll area.
You can write the XPath yourself if you know how. For details about XPath, see this tutorial: What is XPath and how to use it in Octoparse
If you don't know how to write an XPath yourself, click the icon and select the scroll area manually on the web page. Octoparse will automatically generate an XPath.
Tip: Neither the auto-generated XPath nor one you write yourself is guaranteed to work every time. You may need to try several times to make sure that the selected area is scrollable.
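To get a feel for what an XPath for a scroll area looks like, here is a minimal sketch that tests one against a tiny made-up page using Python's standard library. The page markup, the class name `comment-list`, and the helper `find_scroll_area` are all assumptions for illustration; a real page will have different markup, and the XPath you enter in Octoparse would normally start with `//` rather than `.//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: a main area plus a separate comment panel.
PAGE = """
<html>
  <body>
    <div class="video">main video area</div>
    <div class="comment-list">
      <p>first comment</p>
      <p>second comment</p>
    </div>
  </body>
</html>
"""

# ElementTree needs a relative path, so this starts with ".//".
# The equivalent absolute form would be //div[@class='comment-list'].
SCROLL_AREA_XPATH = ".//div[@class='comment-list']"


def find_scroll_area(page, xpath):
    """Return every element in the page that the XPath matches."""
    root = ET.fromstring(page.strip())
    return root.findall(xpath)


matches = find_scroll_area(PAGE, SCROLL_AREA_XPATH)
print(len(matches), "element(s) matched")
```

An XPath for a scroll area should match exactly one element: the container that owns the scrollbar, not the individual items inside it.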
- Choose the scroll method: "scroll for one screen" or "scroll to the bottom"
- Set the scroll times (how many times you want to scroll) and the wait time (the interval between scrolls)
- Click "Apply" to save the settings
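When picking scroll times and wait time, a rough estimate can help. The sketch below assumes that each scroll loads a fixed number of new items, which is a simplification; the numbers and the helper names `scrolls_needed` and `total_wait_seconds` are hypothetical, so check the actual page to see how many items one scroll loads.

```python
import math


def scrolls_needed(target_items, items_per_scroll, initially_loaded=0):
    """Estimate how many scrolls are needed to load target_items."""
    remaining = max(target_items - initially_loaded, 0)
    return math.ceil(remaining / items_per_scroll)


def total_wait_seconds(scroll_times, wait_seconds):
    """Total time spent waiting between scrolls."""
    return scroll_times * wait_seconds


# Assumed example: 200 reviews wanted, 10 shown at first, 10 more per scroll.
times = scrolls_needed(target_items=200, items_per_scroll=10, initially_loaded=10)
print(times, "scrolls,", total_wait_seconds(times, wait_seconds=2), "seconds of waiting")
```

A longer wait time gives slow pages more time to load new items after each scroll, at the cost of a longer run.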