This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust.
Clutch is a leading ratings and reviews platform for B2B service providers, featuring companies in over 100 countries and 500 industries. Clutch categorizes companies by geographic location, field of expertise, and focus on proven skills. Based on the data it gathers, Clutch formulates a rating for each firm.
This tutorial will show you how to scrape a company listing page for company details from clutch.co with Octoparse.
The sample URL we will use in this tutorial is:
1. Create a Go to Web Page - to open the target web page
Enter the page URL on the home screen and click Start to create a new task
2. Set up Pagination Loop - to scrape data from multiple listing pages
To have Octoparse extract data from every listing page, set up pagination first. Start by scrolling to the bottom of the page to find the pagination controls.
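As a rough mental model only (this is not how Octoparse is implemented), a pagination loop keeps following the "next page" link until there is none. The sketch below uses a hypothetical in-memory `PAGES` dictionary and made-up company names in place of real clutch.co pages.

```python
from typing import Iterator, Optional

# Hypothetical stand-in for the site: each page lists companies and names the next page.
PAGES = {
    "page-1": {"companies": ["Alpha Agency", "Beta Studio"], "next": "page-2"},
    "page-2": {"companies": ["Gamma Labs"], "next": None},
}


def iterate_pages(first: str) -> Iterator[dict]:
    """Yield each page in turn, following the 'next' link until it runs out."""
    current: Optional[str] = first
    while current is not None:
        page = PAGES[current]
        yield page
        current = page["next"]


# Collect company names across every page, as the pagination loop would.
all_companies = [name for page in iterate_pages("page-1") for name in page["companies"]]
print(all_companies)  # ['Alpha Agency', 'Beta Studio', 'Gamma Labs']
```

In the Octoparse workflow, the Pagination Loop plays the role of `iterate_pages`, and the steps nested inside it run once per page.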
3. Create Loop Item - to go through all the companies
Click on any of the company names, and all similar titles are highlighted in red.
Click Select all similar elements on the Tips panel.
Select Text on the Tips panel
You'll see a Loop Item being generated in your workflow for all 50 companies on one page.
Note: If the loop contains more than 50 items, it probably includes sponsored results or ads from the page as well.
In that case, change the Loop Item XPath to the following to exclude the sponsored results: //li[@data-type="Directory"]
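To see why this XPath filters out sponsored entries, here is a minimal sketch using Python's standard library. The markup is a simplified, hypothetical stand-in for the listing page (the real page structure, and the `data-type="Sponsor"` value used here for the ad, are assumptions); only the `data-type="Directory"` predicate comes from the tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup; the real clutch.co page differs.
SAMPLE = """
<ul>
  <li data-type="Sponsor"><h3>Sponsored Co</h3></li>
  <li data-type="Directory"><h3>Alpha Agency</h3></li>
  <li data-type="Directory"><h3>Beta Studio</h3></li>
</ul>
"""


def directory_names(markup: str) -> list[str]:
    """Return the names of items whose data-type attribute is 'Directory'."""
    root = ET.fromstring(markup.strip())
    # ElementTree requires a leading "." for a search over the whole tree.
    items = root.findall(".//li[@data-type='Directory']")
    return [li.findtext("h3", default="").strip() for li in items]


print(directory_names(SAMPLE))  # ['Alpha Agency', 'Beta Studio']
```

The attribute predicate keeps only the organic directory entries, so the sponsored item never enters the loop.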
4. Extract more data - to extract other information about the companies
To extract information other than the company name:
Click on your desired data (Location in this case)
Select Text on the Tips panel
You'll find a data field that has been added to the Data Preview section:
5. Set Wait before Action - to make sure data is fully loaded
Wait before Action is a setting that can be applied to any action in the workflow. It makes the task pause for a set time before the action is executed.
In this case, it is best to add a Wait before Action to the Loop Item in the workflow.
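Conceptually, the setting behaves like a fixed pause placed in front of an action. The sketch below illustrates the idea and is not Octoparse's implementation; `run_with_wait` and the 0.2-second delay are hypothetical choices for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_wait(action: Callable[[], T], wait_seconds: float) -> T:
    """Pause for wait_seconds, then execute the action and return its result."""
    time.sleep(wait_seconds)  # gives the page time to finish loading
    return action()


start = time.monotonic()
result = run_with_wait(lambda: "loop item started", wait_seconds=0.2)
elapsed = time.monotonic() - start
print(result)
```

A longer wait makes the run slower but lowers the chance of extracting from a page that has not finished loading.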
6. Run the task - to get your desired data
Click Save on the upper right side to save your task
Click Run next to it and wait for the Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is a sample output from a local run: