In this tutorial, we are going to show you how to scrape product data from Walmart.com.
You can also open "Task Templates" on the main screen of the Octoparse scraping tool and start from the ready-to-use Walmart template. With a template, there is no need to configure a scraping task yourself. For further details, see: Task Templates
If you would like to build the task from scratch, continue with the tutorial below.
Suppose we want to scrape information about headphones. We can start from the search results page (https://www.walmart.com/search/?query=headphones) to create our crawler, then use Octoparse to scrape data such as the product title, price, product ID, and reviews from each product details page.
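If you want to start from a different search term, the start URL follows the same pattern as the one above. The snippet below is a minimal sketch of building such a URL; it assumes that only the `query` parameter changes, and the helper name `build_search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base of the search results URL used in this tutorial.
WALMART_SEARCH_BASE = "https://www.walmart.com/search/"


def build_search_url(keyword: str) -> str:
    """Return a search results URL for the given keyword (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the keyword.
    return f"{WALMART_SEARCH_BASE}?{urlencode({'query': keyword})}"


print(build_search_url("headphones"))
print(build_search_url("wireless headphones"))
```

The resulting URL can be pasted into Octoparse in step 1.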
Here are the main steps in this tutorial: [Download demo task file here]
- Open the target web page
- Auto-detect the web page to generate the workflow
- Click into each product link to scrape more information
- Extract data from the detail page
- Extend the AJAX timeout for "Click to Paginate"
- Run extraction - run your task and get data
1) Open the target web page
- Enter the URL on the home page and click Start
2) Auto-detect the web page to generate the workflow
- Click "Auto-detect web page data" and wait for the detection to complete
- Go to "Data preview" to see if you're okay with the current data output
- You can delete unnecessary data fields directly by clicking the icon
- You can also modify the data field names here directly by clicking the icon
- Click "Create workflow"
3) Click into each product link to scrape more information
- Choose to “Click on link(s) to scrape the linked page(s)”
- Select "Click on an extracted data field" and pick the field you want to click on from the drop-down menu. You can confirm that it is the correct link in the data preview section
- Click "Confirm"
4) Extract data from the detail page
- Select information on the web page
- Choose "Extract text of the selected element"
- Repeat the above steps to extract all the data you need
- Click to modify the field names if needed
5) Extend the AJAX timeout for "Click to Paginate"
- Open the Action Settings of "Click to Paginate"
- Set the AJAX timeout to 10s
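To illustrate what a timeout of this kind does in general, here is a minimal sketch of waiting for a page to change after a click, giving up once the timeout elapses. It is not Octoparse's implementation; `wait_for_change`, `read_state`, and the simulated page are assumptions made purely for illustration.

```python
import time
from typing import Callable


def wait_for_change(
    read_state: Callable[[], str],
    previous: str,
    timeout_s: float = 10.0,
    poll_s: float = 0.1,
) -> bool:
    """Poll read_state() until it differs from previous or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_state() != previous:
            return True  # new content arrived within the timeout
        time.sleep(poll_s)
    return False  # timed out; the content never changed


# Simulated page whose content updates about 0.3 seconds after the "click".
clicked_at = time.monotonic()


def fake_page() -> str:
    return "page 2" if time.monotonic() - clicked_at > 0.3 else "page 1"


print(wait_for_change(fake_page, previous="page 1", timeout_s=10.0))
```

With a timeout that is too short, a slow page would be treated as unchanged, which is why a more generous value such as 10s can help on pages that load results slowly.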
6) Run extraction - run your task and get data
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer, or select "Run task in the cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.
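After the run, you may want to tidy the extracted data before using it. The following is a minimal sketch that assumes the data was exported as CSV with the columns `title`, `price`, `product_id`, and `reviews`; these column names and the two rows are placeholders invented for this example, not real Walmart data or real Octoparse output.

```python
import csv
import io
from decimal import Decimal

# Placeholder export; column names and values are made up for illustration.
SAMPLE_EXPORT = """title,price,product_id,reviews
Example Headphones A,$19.99,000000001,12
Example Headphones B,"$1,049.00",000000002,3
"""


def parse_price(text: str) -> Decimal:
    """Convert a price string such as "$1,049.00" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def load_rows(handle):
    """Read exported rows and convert price and review count to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["price"] = parse_price(row["price"])
        row["reviews"] = int(row["reviews"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_EXPORT))
for row in rows:
    print(row["product_id"], row["title"], row["price"], row["reviews"])
```

To use it on a real export, replace `io.StringIO(SAMPLE_EXPORT)` with an opened file and adjust the column names to match the field names you set in the task.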
Contact us anytime if you need our help!