Amazon is one of the most popular e-commerce websites around the world. Many users try to scrape it to collect product information. In this tutorial, we are going to show you how to scrape product details from Amazon.
You can also go to "Task Templates" on the main screen of the Octoparse scraping tool and start with the ready-to-use Amazon templates to save time. Octoparse provides several Amazon templates designed for different countries, such as Germany, France, the US, Spain, and India. With a template, there is no need to configure the scraping task yourself. For further details, see: Task Templates
If you would like to know how to build the task from scratch, continue reading the tutorial below or watch the video below.
To follow along, you may want to use this URL in the tutorial:
Here are the main steps in this tutorial: [Download task file here]
1. Go to Web Page - Open the targeted web page
Enter the URL on the home page and click Start
2. Auto-detect the web page - create the workflow
Click Auto-detect web page data and wait for the detection to complete
Delete unwanted fields or rename fields if needed in the Data preview
Uncheck Add a page scroll
Click Create workflow
A Pagination loop and a Loop Item are generated automatically in the workflow.
If all the data you need can be scraped from the listing page, you can stop here and jump to step 5, Set up AJAX timeout for "Click to Paginate". If you want to open each product detail page to get more information, follow the steps below.
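To make the generated workflow easier to picture, here is a minimal Python sketch of the same nesting: an outer pagination loop that wraps an inner loop over the items on each page. It is only an illustration of the idea, not how Octoparse is implemented. The names `run_workflow`, `PAGES`, and `fake_next_page` are hypothetical, and the in-memory pages stand in for real listing pages.

```python
from typing import Callable, Iterator, Optional


def run_workflow(
    first_page: dict,
    next_page: Callable[[dict], Optional[dict]],
) -> Iterator[dict]:
    """Yield one row per listing item, page by page."""
    page = first_page
    while page is not None:            # Pagination: repeat until there is no next page
        for item in page["items"]:     # Loop Item: visit every product on the current page
            yield {"title": item["title"], "price": item["price"]}
        page = next_page(page)         # counterpart of "Click to Paginate"


# Hypothetical in-memory pages standing in for real listing pages.
PAGES = [
    {"number": 1, "items": [{"title": "A", "price": "$1"}, {"title": "B", "price": "$2"}]},
    {"number": 2, "items": [{"title": "C", "price": "$3"}]},
]


def fake_next_page(page: dict) -> Optional[dict]:
    """Return the page after `page`, or None on the last page."""
    index = page["number"]             # page numbers are 1-based, the list is 0-based
    return PAGES[index] if index < len(PAGES) else None


rows = list(run_workflow(PAGES[0], fake_next_page))
```

Each row corresponds to one line in the Data preview, and the loop ends when there is no further page to click through to.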
3. Click into each product link to scrape more information
Choose Click on link(s) to scrape the linked page(s) on the Tips panel
Select Click on an extracted data field and select the field you want to click on from the drop-down menu (you can confirm that it is the correct link in the Data Preview)
Octoparse will automatically go to the first product page.
4. Extract Data - extract data on the detail pages
Select information on the web page
Choose Extract text of the selected element
Repeat the above steps to extract all the data you need
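For readers curious about what "Extract text of the selected element" amounts to, the sketch below does the equivalent by hand with Python's standard `html.parser` module: it finds one element and returns the text inside it. This is an assumption-laden illustration, not Octoparse's own method. The `ElementText` class, the `extract_text` helper, and the sample markup with its `id` values are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementText(HTMLParser):
    """Collect the text inside the first element that has a given id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # greater than 0 while inside the target element
        self.done = False       # True once the target element has been closed
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1     # a nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1      # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html: str, target_id: str) -> str:
    """Return the whitespace-normalized text of the element with `target_id`."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hypothetical detail-page fragment; real product pages use different markup.
SAMPLE = '<div><span id="title"> Example  Widget <b>Pro</b></span><span id="price">$9.99</span></div>'

title = extract_text(SAMPLE, "title")
price = extract_text(SAMPLE, "price")
```

Repeating the call with a different `id` mirrors repeating the selection step for each field you want in the output.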
5. Set up AJAX timeout for "Click to Paginate"
Open the Action Settings of Click to Paginate
Tick Load with AJAX and select 10s as the AJAX timeout
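The idea behind an AJAX timeout can be sketched in a few lines of Python: after the click, keep checking whether the new content has arrived, and give up once the time limit has passed. This is a generic illustration under that assumption, not a description of Octoparse's internals. The function `wait_for_ajax` and its `is_loaded` and `poll` parameters are hypothetical names.

```python
import time
from typing import Callable


def wait_for_ajax(
    is_loaded: Callable[[], bool],
    timeout: float = 10.0,   # seconds, matching the 10s chosen in the step above
    poll: float = 0.1,       # seconds between checks (assumed value)
) -> bool:
    """Poll `is_loaded` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_loaded():
            return True      # content arrived before the timeout
        time.sleep(poll)
    return is_loaded()       # one final check at the deadline


# Usage with a stand-in condition that becomes true on the third check.
checks = {"count": 0}


def ready() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


loaded = wait_for_ajax(ready, timeout=2.0, poll=0.01)
timed_out = wait_for_ajax(lambda: False, timeout=0.2, poll=0.05)
```

A longer timeout tolerates slow page loads at the cost of a longer wait when the content never arrives.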
6. Run extraction - run your task and get data
Click Run on the upper left side
Select "Run on your device" to run the task on your computer, or select "Run task in the Cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.