GoodFirms is a research and review platform that helps software buyers and service seekers choose the best software or service provider. It also helps IT companies and software vendors boost user acquisition, market share, and brand awareness.
In four steps, this tutorial shows you how to scrape company information, such as company name, location, and website, from GoodFirms.
To follow along, you can use the URL below:
Here are the main steps of this tutorial:
1. Create a Go to Webpage step - to open the target website
Enter the page URL on the home screen and click Start to create a new task
2. Auto-detect the webpage - to create a workflow
Choose Auto-detect webpage data and wait for the detection to complete
Check the data fields in Data preview, and delete unwanted fields or rename them if needed (double-click a field name to rename it)
Uncheck Add a page scroll
Click Create workflow
3. Modify the Pagination settings - to locate the pagination button accurately
Click on the Pagination box
Replace the auto-generated Matching XPath with: //li[@class='next']/a[@title='Next page']
Click Apply to save the change
Note: To learn more about XPath in Octoparse, see What is XPath and how to use it in Octoparse?
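To see what the replacement XPath actually selects, here is a minimal sketch using Python's standard xml.etree.ElementTree, which supports a small XPath subset that includes attribute predicates. The pagination markup is hypothetical, shaped only to match the XPath in this step rather than copied from GoodFirms, and the helper name find_next_href is illustrative. Real pages are rarely well-formed XML, so in practice you would inspect the live page in your browser instead. ElementTree also needs a leading "." (".//") where Octoparse accepts "//".

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, shaped to match the XPath in this tutorial.
# The real GoodFirms HTML may differ; check it in your browser's inspector.
SAMPLE_PAGINATION = """
<ul class="pagination">
  <li class="prev"><a title="Previous page" href="/page/1">Prev</a></li>
  <li class="active"><a title="Page 2" href="/page/2">2</a></li>
  <li class="next"><a title="Next page" href="/page/3">Next</a></li>
</ul>
"""

# Same path as the Octoparse XPath, prefixed with "." because ElementTree
# only allows relative paths when searching from an element.
NEXT_LINK_PATH = ".//li[@class='next']/a[@title='Next page']"


def find_next_href(markup: str):
    """Return the href of the "Next page" link, or None if there is none."""
    root = ET.fromstring(markup)
    link = root.find(NEXT_LINK_PATH)
    return link.get("href") if link is not None else None


if __name__ == "__main__":
    print(find_next_href(SAMPLE_PAGINATION))  # /page/3
```

Because the path requires both class='next' on the list item and title='Next page' on the link, it cannot match the "Previous page" link in this fragment, and it matches nothing on a page without a next link, which is where pagination should stop.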
Click the Click to Paginate step in the workflow
Select the Option panel
Tick Load with AJAX and set the AJAX timeout (7-10 s recommended)
Note: To learn why an AJAX timeout is needed, see Handling AJAX.
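An AJAX timeout bounds how long the scraper waits for content that loads in the background after the "Next page" click. The sketch below shows one common way to implement such a bounded wait, by polling a condition until a deadline. It is only an illustration of the idea, not Octoparse's implementation, which may simply wait the full configured time. The helper wait_for and the simulated page are hypothetical.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_s: float, poll_s: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout_s` seconds elapse.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Hypothetical stand-in for a page whose new results arrive about 0.3 s
    # after the click; a real scraper would inspect the page DOM instead.
    loaded_at = time.monotonic() + 0.3

    def page_ready() -> bool:
        return time.monotonic() >= loaded_at

    print(wait_for(page_ready, timeout_s=1.0))      # True: loaded in time
    print(wait_for(lambda: False, timeout_s=0.2))   # False: gave up after ~0.2 s
```

A timeout that is too short risks moving on before the next page's results have loaded; one that is too long adds idle time to every page.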
4. Run the task - to get your desired data
Click Save on the upper right to save your task
Click Run next to it and wait for the Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is sample output from a local run: