You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade, because the latest version is faster, easier to use and more robust. Download and upgrade here if you haven't already done so!
Yelp is one of the largest business directory websites on the Internet. In this tutorial, we are going to show you how to scrape customer reviews from Yelp.
For Yelp scraping, you can use our ready-to-use Task Template available on the home page or follow this tutorial to build the task from scratch.
To demonstrate, we will use this URL as an example: https://www.yelp.com/biz/pike-place-chowder-seattle
Here are the main steps in this tutorial: [Download demo task file here]
1. Go to Web Page - to open the target web page
Paste the URL on the home screen and click Start
2. Create Pagination - to scrape from multiple pages
Scroll down to the review section and click on its paging button (>)
Select Loop click next page in the Tips panel
Set the AJAX timeout to 10s
3. Extract review information
Click on Pagination in the workflow
Click on any 2 review blocks, then choose Select all sub-elements, then Extract data
You will see a Loop Item created inside the Pagination.
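As a rough mental model, the Pagination loop with the Loop Item inside it behaves like two nested loops: the outer one moves from page to page, and the inner one visits each review block on the current page and extracts its fields. The sketch below is plain Python and is not how Octoparse is implemented. The names PAGES, next_page and scrape_reviews are hypothetical, and the review values are placeholders rather than real Yelp data.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for the paginated review section: each inner list is
# one page of review blocks. All values are placeholders, not real Yelp data.
PAGES: List[List[Dict[str, str]]] = [
    [
        {"author": "reviewer-1", "text": "placeholder review A"},
        {"author": "reviewer-2", "text": "placeholder review B"},
    ],
    [
        {"author": "reviewer-3", "text": "placeholder review C"},
    ],
]


def next_page(index: int) -> Optional[int]:
    """Mimic clicking the next-page button; None means there is no next page."""
    return index + 1 if index + 1 < len(PAGES) else None


def scrape_reviews() -> Iterator[Dict[str, str]]:
    """Yield one row per review block, page by page."""
    page: Optional[int] = 0
    while page is not None:            # outer loop: Pagination
        for block in PAGES[page]:      # inner loop: Loop Item
            # Extract data: copy the sub-elements of the block into a row
            yield {"author": block["author"], "text": block["text"]}
        page = next_page(page)         # click the paging button (>)


rows = list(scrape_reviews())
print(len(rows))
```

Because the Loop Item sits inside the Pagination loop, every review block on every page is visited before the task moves on to the next page.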
4. Check Data Preview and workflow
Go to Data Preview and double-click a field header to rename the field
Click ... on a field to delete it if you do not need it
Below is what the final workflow looks like. Once everything is in place, you can go ahead and run the task.
5. Run Task - to get the data
Start the task from the top right corner: select Run task on your device to run it on your local device, or Run task in the cloud to run it on the Cloud (premium users only)
Here is the sample output -