Many websites use a "Load More" or "Show More" button to load additional content onto the current page instead of splitting it across numbered pages. It is a common way to keep browsing smooth for visitors.
Unlike pagination with a "Next" button, a "Load More" button keeps appending content to a single web page, which makes the page trickier to scrape. In this article, I will show you how to handle the "Load More" button in Octoparse.
1. Use Auto-detect to deal with the "Load More" button
If you are building a new task with Webpage Auto-detect, Octoparse automatically scans the web page for any "Load More" buttons. Let's use this webpage (https://www.capterra.com/search/category?search=CRM%20Software) for demonstration.
- Start the Auto-detect process. The Tips Panel will then offer the option Click on a "Load More" button.
- Click "Check" to see if Octoparse has selected the right button. If you don't think it has selected the right button, you can click "Edit" to select the right button manually and input the desired number of clicks.
- Click "Create workflow" to generate the settings.
- If, for any reason, Octoparse fails to detect the "Load More" button during the Auto-detect process, you can still create the workflow first and then choose the option Click on a "Load More" button. Follow the tips to select the "Load More" button on the web page and input the desired number of clicks.
Notice that the generated workflow contains two Loop Items. Octoparse first clicks the "Load More" button the specified number of times, and only then starts scraping the list of items.
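To make the order of the two loops concrete, here is a minimal Python sketch of the same idea: click first, extract afterwards. It is not Octoparse's implementation. `FakePage`, `scrape_after_clicks`, and the item names are hypothetical stand-ins invented for illustration, and stopping early when the button disappears is an assumption of this sketch.

```python
class FakePage:
    """Hypothetical stand-in for a web page with a "Load More" button."""

    def __init__(self, total_items, batch_size):
        self.total_items = total_items
        self.batch_size = batch_size
        # The first batch is visible as soon as the page opens.
        self.visible = min(batch_size, total_items)

    def has_load_more(self):
        # The button is only present while hidden items remain.
        return self.visible < self.total_items

    def click_load_more(self):
        # Each click reveals one more batch on the same page.
        self.visible = min(self.visible + self.batch_size, self.total_items)

    def items(self):
        return [f"item-{i}" for i in range(1, self.visible + 1)]


def scrape_after_clicks(page, max_clicks):
    # Loop 1: click "Load More" up to max_clicks times
    # (assumption: stop early if the button is gone).
    clicks = 0
    while clicks < max_clicks and page.has_load_more():
        page.click_load_more()
        clicks += 1
    # Loop 2: extract every item that is now visible on the page.
    rows = [item for item in page.items()]
    return clicks, rows


page = FakePage(total_items=25, batch_size=10)
clicks, rows = scrape_after_clicks(page, max_clicks=5)
print(clicks, len(rows))  # prints "2 25"
```

With 25 items in batches of 10, the button disappears after two clicks, so the sketch stops there even though five clicks were allowed.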
2. Create a pagination action manually
If Auto-detect fails, or if you are building a scraping task without it, you can still deal with the "Load More" button by creating a pagination action manually.
- Select the "Load More" button on the web page and choose "Loop click single button".
- Set up a proper AJAX timeout. A "Load More" button typically fetches new content via AJAX, without reloading the page, so the timeout controls how long Octoparse waits for that content after each click.
If you only want to click the "Load More" button X times, click the settings icon of the Pagination box, open "Exit loop", and set Repeats to X.
- Once you've finished building the pagination loop for the "Load More" button, you can then go on to build a list of page elements to loop through.
If the extraction loop was built inside the pagination loop, drag it out manually, because the pagination loop should finish before the extraction loop starts.
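One plausible reason for keeping the two loops separate can be shown with a toy model. The Python sketch below is an illustration under stated assumptions, not a description of Octoparse's internals: `visible_items`, `extract_inside_pagination`, and `extract_after_pagination` are hypothetical names, and the page is assumed to keep all earlier items visible after each click.

```python
def visible_items(count):
    """Hypothetical page state: the first `count` items are on the page."""
    return [f"item-{i}" for i in range(1, count + 1)]


def extract_inside_pagination(batch_size, clicks):
    # Extraction nested in the click loop: the whole list is re-read
    # after every click, so earlier items are collected again.
    rows = []
    visible = batch_size
    for _ in range(clicks):
        visible += batch_size  # one "Load More" click
        rows.extend(visible_items(visible))
    return rows


def extract_after_pagination(batch_size, clicks):
    # Extraction after the click loop: finish clicking, then read once.
    visible = batch_size
    for _ in range(clicks):
        visible += batch_size  # one "Load More" click
    return visible_items(visible)


nested = extract_inside_pagination(batch_size=10, clicks=3)
sequential = extract_after_pagination(batch_size=10, clicks=3)
print(len(nested), len(set(nested)), len(sequential))  # prints "90 40 40"
```

In this model the nested version returns 90 rows for only 40 distinct items, while the sequential version returns each of the 40 items exactly once.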
If you have any questions, you are welcome to submit a request here. Our support team will get back to you within 24 hours.