In this tutorial, we will show you how to scrape the product reviews from Amazon.com.
To follow along, you may want to use this URL in the tutorial:
We will scrape data such as the customer's name, rating, title, date, and review content from the product details page with Octoparse.
Here are the main steps in this tutorial: [Download task file here]
1) "Go To Web Page" - to open the target web page
- Click "+ Task" to start a new task in Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape websites with complex structures, like Amazon.com, we strongly recommend starting the data extraction project in Advanced Mode.
- Paste the URL into the "Input URL" box
- Click "Save URL" to move on
2) Create a pagination loop - to scrape all the reviews from multiple pages
If you are using the latest version of Octoparse, "Workflow Mode" is turned on automatically. If not, you can turn it on with the "Workflow" button in the top-right corner of Octoparse.
- Scroll down the page and click "see all reviews"
To scrape all the reviews, we need this step so that all the review information is laid out on the page.
- Scroll down the page and click the next page button ">"
- Click "Loop click the selected link" on the "Action Tips"
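For readers who think in code, the pagination loop built in this step behaves roughly like the Python sketch below. It only illustrates the idea and is not how Octoparse is implemented: the `PAGES` dictionary, its `next` key, and the `scrape_all_pages` function are hypothetical stand-ins for real web pages and the ">" button.

```python
# Hypothetical in-memory "site": each page holds some reviews and the key of
# the next page (None on the last page). Real pages would be loaded instead.
PAGES = {
    "page-1": {"reviews": ["review A", "review B"], "next": "page-2"},
    "page-2": {"reviews": ["review C"], "next": "page-3"},
    "page-3": {"reviews": ["review D", "review E"], "next": None},
}


def scrape_all_pages(start, pages, max_pages=100):
    """Collect reviews page by page, following "next" until it runs out."""
    collected = []
    current = start
    visited = 0
    while current is not None and visited < max_pages:
        page = pages[current]
        collected.extend(page["reviews"])  # what the "Loop Item" would extract
        current = page["next"]             # the click on the ">" button
        visited += 1
    return collected


all_reviews = scrape_all_pages("page-1", PAGES)
print(len(all_reviews))
```

The `max_pages` cap is a design choice for the sketch: it keeps the loop from running forever if a "next" link ever points back to an earlier page.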
3) Create a "Loop Item" - to scrape all the reviews on one page
The reviews are organized on the page as a list. We need to build a "Loop Item" to extract each review one by one.
- Select the first review item in the built-in browser
We need to make sure the whole block of the first review is selected. That is, the whole review block should be highlighted in green, with all its sub-elements (title, customer name, date, content, and so on) highlighted in red, as the following image shows:
- Click "Select all sub-elements" on the "Action Tips"
Now Octoparse will automatically recognize all the similar sections on this page and highlight them in red.
- Click "Select all"
- Click "Extract data in the loop"
By default, all the data is automatically extracted from the selected items. We can delete the unwanted fields in the "Customize Action" area.
To learn more about capturing a list of items, here is a tutorial you might need:
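Conceptually, a "Loop Item" visits each review block in turn and reads the same sub-elements from every block. The Python sketch below shows that idea with the standard library's `html.parser`. The markup in `SAMPLE_HTML`, its class names, and the `ReviewParser` class are made up for illustration; they are not Amazon's real page structure or Octoparse's internals.

```python
from html.parser import HTMLParser

# Hypothetical markup with placeholder values, for illustration only.
SAMPLE_HTML = """
<div class="review"><span class="name">Customer 1</span><span class="title">Title 1</span><span class="content">Text 1</span></div>
<div class="review"><span class="name">Customer 2</span><span class="title">Title 2</span><span class="content">Text 2</span></div>
"""


class ReviewParser(HTMLParser):
    """Build one dict per review block, one key per sub-element."""

    def __init__(self):
        super().__init__()
        self.reviews = []    # finished review records
        self._field = None   # sub-element currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "review":
            self.reviews.append({})      # a new item in the loop
        elif tag == "span" and self.reviews:
            self._field = css_class      # a sub-element of that item

    def handle_data(self, data):
        if self._field and data.strip():
            self.reviews[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
for review in parser.reviews:
    print(review)
```

Each dictionary in `parser.reviews` corresponds to one row of extracted data, and each key to one data field.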
4) Extract data - to select the data and remove the unwanted information
- Delete the unwanted or useless data fields
Press "Shift" or "Ctrl" to batch delete the unwanted data fields
- Select any information that Octoparse failed to capture automatically. Click on the information, and select "extract data" in the "Action Tips".
- Rename the fields by selecting from the pre-defined list or entering your own names
- Click "OK" to save the result.
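The clean-up in this step amounts to dropping some fields and renaming the rest. A minimal Python sketch of the same operation on one extracted record follows. The auto-generated names (`Field1`, `Field2`, ...), the target names in `RENAME`, and the placeholder values are all assumptions for illustration, not actual Octoparse output.

```python
# Hypothetical mapping from auto-generated field names to the names we want.
# Any field not listed here is treated as unwanted and dropped.
RENAME = {
    "Field1": "customer_name",
    "Field2": "rating",
    "Field4": "review_content",
}


def clean_record(record, rename):
    """Keep only the fields listed in `rename`, under their new names."""
    return {new: record[old] for old, new in rename.items() if old in record}


# Placeholder record; "Field3" stands for an unwanted field.
raw = {
    "Field1": "Customer 1",
    "Field2": "Rating 1",
    "Field3": "Unwanted text",
    "Field4": "Text 1",
}
cleaned = clean_record(raw, RENAME)
print(cleaned)
```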
5) Run extraction - to run your task and get data
- Click "Start Extraction"
- Select "Local Extraction" to run the task on your computer
Below is the output sample.
Contact us any time if you need our help!