Scraping data from a part of a web page that has to be scrolled has long been a challenge in Octoparse, especially on pages with multiple scroll bars.

What is "Scroll within a designated area"?

In most cases, the whole page scrolls with the default scrollbar, which normally sits on the right side of the page. For this kind of page, the default scrolling method, which uses that scrollbar, works well.

However, there are pages, like reviews or posts pages, that are designed differently.

Take the Google Maps reviews page as an example: https://www.google.com/maps/place/The+Schoolhouse/@51.4374112,-1.0878661,9.7z/data=!4m10!1m2!2m1!1srestaurants+in+london!3m6!1s0x4876058fd98fc091:0xbf1c07755166b551!8m2!3d51.4604646!4d-0.1757991!9m1!1b1

The reviews are displayed on the left part of the web page, and this part has a scroll bar. When you drag this bar down, the review part will scroll down and load more reviews. However, you will notice that other parts of the page won't be scrolled.

For a page like this, you need to set up a partial scroll in the Octoparse task settings.

scroll.png

Here is another example: a TikTok video's comment page. As you can see, the comment section has its own scrollbar, separate from the main video page.

tiktok.png

How to scroll within a designated area in Octoparse?

There are two ways to set it up:

1. Set up scrolling in the "Go to web page" or "Click Item" action (scroll to finish loading first, then extract data)

  • Click on Go to web page or Click Item

  • Click Options and tick "Scroll down the page after it is loaded"

  • Select Partial from the scroll area

scroll_setting.png

2. Set up scroll with Loop Item (scroll and extract at the same time)

  • Add a Loop Item step to the workflow

  • Click on Loop Item and choose Scroll page under Loop Mode

add_loop_item.gif
  • Select Partial in the scroll area
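
The difference between the two ways can be sketched in a few lines of Python. This is only an illustration of the order of operations, not how Octoparse works internally. FakePane, scroll_then_extract, and scroll_and_extract are hypothetical names, and the pane is assumed to reveal two more items on every scroll.

```python
class FakePane:
    """Hypothetical scrollable pane that reveals `batch` more items per scroll."""

    def __init__(self, items, batch=2):
        self._items = list(items)
        self._batch = batch
        self._visible = min(batch, len(self._items))

    def scroll(self):
        # Each scroll makes up to `batch` additional items visible.
        self._visible = min(self._visible + self._batch, len(self._items))

    def visible_items(self):
        return self._items[:self._visible]


def scroll_then_extract(pane, scroll_times):
    """Way 1: finish all scrolling first, then extract once."""
    for _ in range(scroll_times):
        pane.scroll()
    return pane.visible_items()


def scroll_and_extract(pane, scroll_times):
    """Way 2: extract after every scroll, skipping items already collected."""
    collected = list(pane.visible_items())
    for _ in range(scroll_times):
        pane.scroll()
        for item in pane.visible_items():
            if item not in collected:
                collected.append(item)
    return collected


reviews = ["r1", "r2", "r3", "r4", "r5", "r6"]
way_1 = scroll_then_extract(FakePane(reviews), scroll_times=2)
way_2 = scroll_and_extract(FakePane(reviews), scroll_times=2)
print(way_1 == way_2)
```

In this simplified model both ways end up with the same items. They differ only in when the extraction happens.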


Enter XPath of the scroll area

After that, tell Octoparse where to scroll by entering the XPath of the scroll area.

XPath.png

If you know XPath, you can write one yourself. For details about XPath, see this tutorial: What is XPath and how to use it in Octoparse
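
As a rough idea of what such an XPath can look like, here is a minimal Python sketch that tests one against a made-up page. The markup, the class names (review-pane, map-canvas), and the XPath itself are assumptions for illustration only. Real pages such as Google Maps use different, often changing, class names. The sketch uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: a map plus a separately scrollable review pane.
PAGE = """
<html>
  <body>
    <div class="map-canvas">map tiles</div>
    <div class="review-pane">
      <div class="review">Great food</div>
      <div class="review">Friendly staff</div>
    </div>
  </body>
</html>
"""

# Assumed example of an XPath that points at the scroll area.
SCROLL_AREA_XPATH = "//div[@class='review-pane']"


def find_scroll_areas(page_source, xpath):
    """Return every element in the page that the XPath matches."""
    root = ET.fromstring(page_source.strip())
    # ElementTree expects a relative path, so "//div" is written as ".//div".
    return root.findall("." + xpath)


matches = find_scroll_areas(PAGE, SCROLL_AREA_XPATH)
reviews = matches[0].findall("./div[@class='review']")
print(len(matches), "scroll area,", len(reviews), "reviews inside it")
```

An XPath for the scroll area should match the container that actually scrolls (here, the review pane), not the individual items inside it.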

If you don't know how to write an XPath yourself, please click the icon next to the Matching XPath input box and select the scroll area manually from the web page. Octoparse will automatically generate an XPath.

Tip: Neither the auto-generated XPath nor one you write yourself is guaranteed to work every time. You may need to try several times to make sure that the selected area is scrollable.

scroll_XPath.gif
  • Choose the scroll method: scroll for one screen or scroll to the bottom

  • Set the scroll times (how many times to scroll) and the wait time (the interval between scrolls)

  • Click Apply to save the settings

scroll_settings.png
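
To make the scroll method, scroll times, and wait time settings more concrete, here is a small Python simulation. It is a sketch under assumptions, not Octoparse's implementation. ScrollArea and run_scroll are hypothetical names, heights are in pixels, and the content height is fixed (on a real page, more content usually loads as you scroll).

```python
import time
from dataclasses import dataclass


@dataclass
class ScrollArea:
    """Stand-in for a scrollable element (hypothetical, not an Octoparse class)."""
    content_height: int   # total height of the content
    viewport_height: int  # visible height of the scroll area
    position: int = 0     # current scroll offset from the top

    @property
    def max_position(self):
        return max(0, self.content_height - self.viewport_height)

    def scroll_one_screen(self):
        # Move down by one visible screen, but never past the bottom.
        self.position = min(self.position + self.viewport_height, self.max_position)

    def scroll_to_bottom(self):
        self.position = self.max_position


def run_scroll(area, method, scroll_times, wait_seconds):
    """Scroll `scroll_times` times, pausing `wait_seconds` after each scroll."""
    positions = []
    for _ in range(scroll_times):
        if method == "one_screen":
            area.scroll_one_screen()
        elif method == "to_bottom":
            area.scroll_to_bottom()
        else:
            raise ValueError(f"unknown scroll method: {method}")
        positions.append(area.position)
        time.sleep(wait_seconds)  # the wait time between scrolls
    return positions


print(run_scroll(ScrollArea(2000, 600), "one_screen", scroll_times=3, wait_seconds=0.1))
# prints [600, 1200, 1400]
print(run_scroll(ScrollArea(2000, 600), "to_bottom", scroll_times=2, wait_seconds=0.1))
# prints [1400, 1400]
```

In this model, scrolling one screen at a time needs several scrolls to reach the end, while scrolling to the bottom gets there in one. On pages that load more content after each scroll, the wait time gives that content a chance to appear before the next scroll.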