Online reviews help buyers choose the right product and help sellers improve their products and services. In this tutorial, we will show you how to scrape product reviews from Amazon.com.
Before you start, prepare a product URL like this one: https://www.amazon.com/PlayStation-Portable-3000-System-Sony-PSP/dp/B001KMRN0M/ref=lp_11076481_1_1?s=videogames&ie=UTF8&qid=1601797632&sr=1-1
To collect Amazon product URLs, you can use the easy-to-use "Task Template" on the home screen. Just type in a few parameters (keywords) and the task is ready to go. For further details, see: Task Templates
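If you collect many product URLs, it can help to normalize them before feeding them into a task. The sketch below is not part of Octoparse; it is a small standalone Python example that pulls the 10-character product ID (ASIN) out of the example URL used earlier in this tutorial and drops the trailing `ref=` segment and query string. The function names `extract_asin` and `strip_tracking` are hypothetical, and the assumption that the `ref=` segment and query parameters are not needed to reach the product page is ours, so check the cleaned URL in a browser before relying on it.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches the 10-character product ID that follows "/dp/" in the URL path.
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url: str) -> str:
    """Return the ID found after '/dp/' in a product URL (hypothetical helper)."""
    match = ASIN_PATTERN.search(urlsplit(url).path)
    if match is None:
        raise ValueError(f"no /dp/<ASIN> segment found in {url!r}")
    return match.group(1)


def strip_tracking(url: str) -> str:
    """Drop a trailing '/ref=...' path segment, the query string, and the fragment.

    Assumption: these parts are not required to open the product page.
    """
    parts = urlsplit(url)
    path = re.sub(r"/ref=[^/]*$", "", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


url = (
    "https://www.amazon.com/PlayStation-Portable-3000-System-Sony-PSP/"
    "dp/B001KMRN0M/ref=lp_11076481_1_1"
    "?s=videogames&ie=UTF8&qid=1601797632&sr=1-1"
)
print(extract_asin(url))
print(strip_tracking(url))
```

Deduplicating a list of URLs by the extracted ID is one way to avoid scraping the same product twice when the same item appears under different `ref=` values.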
We will use Octoparse to scrape the customer name, rating, title, time, and review content for each review of the product.
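To make the target data concrete, here is a minimal sketch of what one scraped review could look like as a record. The class name `Review` and the field names are assumptions chosen to mirror the five fields listed above; Octoparse assigns its own field names, which you can rename in the data preview. The values are made-up placeholders, not real review data.

```python
from dataclasses import dataclass, asdict


@dataclass
class Review:
    """One scraped review; field names are hypothetical, not Octoparse's."""
    customer_name: str
    rating: str    # kept as raw page text; the exact format is an assumption
    title: str
    time: str      # review date exactly as shown on the page, unparsed
    content: str


# Made-up placeholder values, only to show the shape of one row.
example = Review(
    customer_name="Reviewer A",
    rating="5.0 out of 5 stars",
    title="Placeholder title",
    time="Placeholder date",
    content="Placeholder review text.",
)
print(list(asdict(example).keys()))
```

Keeping the rating and time as raw text at this stage is a deliberate choice in this sketch: parsing can happen later, once you have seen what the exported values actually look like.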
Here are the main steps in this tutorial: [Download task file here]
- Open the target web page
- Click the "See all reviews" button
- Auto-detect the web page to generate the workflow
- Set up AJAX timeout for "Click to Paginate"
- Run extraction - run your task and get data
1) Open the target web page
- Enter the URL on the home page and click Start
2) Click the "See all reviews" button
- Scroll down the page to find the "See all reviews" button
- Click it and choose "Click URL" in the Action Tips
3) Auto-detect the web page to generate the workflow
- Click "Auto-detect web page data" and wait for the detection to complete
Sometimes you may run into a robot check that asks you to enter a security code. In this case, reload the page first. If that still does not work, switch to browser mode using the button at the top right of the built-in browser, type in the code to pass the check, and then switch back to select mode.
- Go to "Data preview" to see if you're okay with the current data output
- You can delete unnecessary data fields directly by clicking the delete icon
- You can also rename data fields here directly by clicking the edit icon
- Click "Create workflow"
4) Set up AJAX timeout for "Click to Paginate"
- Open the Action Settings of "Click to Paginate"
- Tick "Load with AJAX" and select 10s as the AJAX timeout
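The AJAX timeout exists because clicking the pagination button updates the reviews in place instead of loading a new page, so the tool needs an upper bound on how long to wait for the new content. The Python sketch below illustrates the general idea of a timeout-bounded wait. It is not Octoparse's implementation; whether Octoparse polls like this or simply waits out the full interval is not something this tutorial states. The function `wait_for` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout: float = 10.0,
             poll_interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    return False


# Simulate content that becomes available about 0.3 seconds after a click.
start = time.monotonic()
loaded = wait_for(lambda: time.monotonic() - start > 0.3, timeout=2.0)
print(loaded)

# Simulate content that never arrives: the wait gives up after the timeout.
never_loaded = wait_for(lambda: False, timeout=0.5)
print(never_loaded)
```

The trade-off is the same one you face when picking the 10s value: a timeout that is too short risks moving on before the next page of reviews has appeared, while one that is too long slows down every pagination step.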
5) Run extraction - run your task and get data
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer, or select "Run task in the cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.
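Once the data is exported, a few lines of code are enough to summarize it. The sketch below assumes a CSV export with a `rating` column whose values start with a number (for example "4.0 out of 5 stars"). Both the column names and the rating format are assumptions, so adjust them to match your actual export. The rows are made-up placeholders embedded in the script so that it runs on its own; in practice you would open your exported file instead.

```python
import csv
import io
import re
from statistics import mean

# Placeholder rows standing in for an exported file; column names are assumed.
SAMPLE_CSV = """name,rating,title,time,content
Reviewer A,5.0 out of 5 stars,Placeholder title,placeholder date,Placeholder text
Reviewer B,3.0 out of 5 stars,Placeholder title,placeholder date,Placeholder text
Reviewer C,4.0 out of 5 stars,Placeholder title,placeholder date,Placeholder text
"""


def parse_rating(text: str) -> float:
    """Read the leading number from a rating string such as '4.0 out of 5 stars'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"no leading number in rating {text!r}")
    return float(match.group(1))


def average_rating(csv_file) -> float:
    """Average the 'rating' column of an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return mean(parse_rating(row["rating"]) for row in reader)


print(average_rating(io.StringIO(SAMPLE_CSV)))
```

To run this against a real export, replace `io.StringIO(SAMPLE_CSV)` with a file opened via `open("your_export.csv", newline="", encoding="utf-8")`, where the file name is whatever you chose when exporting.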