This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading: the latest version is faster, easier to use, and more robust.

When you scrape e-commerce data, especially product data, you will often run into products that come in several options, where you want to collect the price, SKU, and other details of each variant.

Take a hair dye product as an example: you may need to scrape its price for each color.

In this tutorial, we will show you how to scrape information about different product variants with Octoparse, using the following web page as an example:

https://www.walmart.com/ip/SoftSheen-Carson-Dark-and-Lovely-Fade-Resist-Rich-Conditioning-Color/10314047

For this product, the color, price, images, page URL, and product ID all change when you switch options.


Here are the main steps in this tutorial:

  1. Create a Go to Webpage - to open the target website

  2. Create a Loop Item - to loop through each color option

  3. Extract Data - to extract all product-related data

  4. Run the task - to get your desired data
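The four steps above can be read as a simple loop: open the page, click each option, wait for the page to update, and extract one row per variant. The following Python sketch is only a conceptual model of that workflow, not Octoparse's internals. `FakeProductPage`, `Variant`, `wait_for_ajax`, and `scrape_variants` are hypothetical names, and the variant data is placeholder data rather than real product information.

```python
import time
from dataclasses import dataclass, asdict


@dataclass
class Variant:
    color: str
    price: str
    product_id: str


class FakeProductPage:
    """Hypothetical stand-in for a page whose details update via AJAX after a click."""

    def __init__(self, variants, load_delay=0.01):
        self._variants = list(variants)
        self._load_delay = load_delay
        self._pending = None   # variant requested by the last click
        self._shown = None     # variant currently displayed on the page
        self._ready_at = 0.0

    def option_count(self):
        return len(self._variants)

    def click_option(self, index):
        # The new details only become visible after a short simulated delay.
        self._pending = self._variants[index]
        self._ready_at = time.monotonic() + self._load_delay

    def is_loaded(self):
        return self._pending is not None and time.monotonic() >= self._ready_at

    def shown(self):
        if self.is_loaded():
            self._shown = self._pending
        return self._shown


def wait_for_ajax(page, timeout=1.0, poll=0.005):
    """Wait until the page has updated or the timeout has elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if page.is_loaded():
            return True
        time.sleep(poll)
    return page.is_loaded()


def scrape_variants(page, ajax_timeout=1.0):
    rows = []
    for index in range(page.option_count()):   # step 2: loop through each option
        page.click_option(index)               # click the option in the same tab
        wait_for_ajax(page, ajax_timeout)      # give the AJAX update time to finish
        variant = page.shown()                 # step 3: extract what is displayed
        if variant is not None:
            rows.append(asdict(variant))
    return rows


if __name__ == "__main__":
    demo = FakeProductPage([Variant("Color A", "0.00", "ID-1"),
                            Variant("Color B", "0.00", "ID-2")])
    for row in scrape_variants(demo):
        print(row)
```

In this model, a timeout that is too short would extract the previously displayed variant again, which is why the wait time should match how quickly the page loads on your connection.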


1. Create a Go to Webpage - to open the target website

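Create a new task with the example URL above. Octoparse adds a Go to Webpage step that opens the page in its built-in browser.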

2. Create a Loop Item - to loop through each color option

  • Click the first color option in the list, then choose Select all on the Tips panel

  • Choose Loop click each element

  • Octoparse detects that this page loads its content with AJAX. Adjust the AJAX timeout to suit your internet speed so that the page content has enough time to load (learn more in Handling AJAX)

  • Click the Click Item step inside the Loop Item and uncheck Open in a new tab

  • Click Apply to save

  • (Optional) Click Loop Item and change the "Loop Mode" from Fixed List to Variable List. Then enter the Element XPath: //DIV[@class="variants__list"]/LABEL/DIV[2]. This matters when you scrape several products with different numbers of colors, because the XPath is matched against each page instead of relying on a fixed set of elements.

Tip: The XPath above only works for the example web page used in this tutorial. For your own target websites, you will need to write your own XPath. To learn how, see the tutorial What is XPath and how to use it in Octoparse.
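To illustrate why a Variable List with an XPath adapts to products with different numbers of colors, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The markup is a simplified, hypothetical stand-in for the real page, the color names are placeholders, and `variant_names` is an illustrative helper. ElementTree supports only a case-sensitive subset of XPath, so the tag names are written in lowercase here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two product pages (not the site's real HTML).
PAGE_THREE_COLORS = """
<div class="variants__list">
  <label><div>swatch</div><div>Color A</div></label>
  <label><div>swatch</div><div>Color B</div></label>
  <label><div>swatch</div><div>Color C</div></label>
</div>
"""

PAGE_TWO_COLORS = """
<div class="variants__list">
  <label><div>swatch</div><div>Color D</div></label>
  <label><div>swatch</div><div>Color E</div></label>
</div>
"""

# Same shape as the tutorial's XPath: the second div inside each label of the variants list.
VARIANT_XPATH = ".//div[@class='variants__list']/label/div[2]"


def variant_names(page_markup):
    """Return the text of every element matched by VARIANT_XPATH on one page."""
    root = ET.fromstring(f"<body>{page_markup}</body>")
    return [node.text for node in root.findall(VARIANT_XPATH)]


if __name__ == "__main__":
    # The same expression yields as many items as each page actually has.
    print(variant_names(PAGE_THREE_COLORS))
    print(variant_names(PAGE_TWO_COLORS))
```

Because the expression is evaluated against each page, the loop covers three options on the first page and two on the second without any change to the task.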


3. Extract Data - to extract all product-related data

Click the elements on the page that hold the data you need, and rename the data fields if necessary.


4. Run the task - to get your desired data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is the data output sample.

[Screenshot: data output sample]