If you scrape e-commerce data, especially product data, you may have encountered a situation like this:
For products that come in several options, you want to collect each variant's price, SKU, and so on. Taking this hair dye product as an example, you may need to scrape its price for each color.
In this tutorial, we will show you how to scrape information for each product variant with Octoparse. We will take this web page URL as an example:
For this product, the color, price, images, page URL, and product ID all change when you switch options.
1. Enter product URL(s) to start a new task
2. Create a loop item to loop through each color option
- Click the 1st color option on the list, and then choose Select all on the Tips panel
- Choose Loop click each element
- AJAX is detected for this web page, so you can adjust the AJAX timeout to match your Internet speed and give the page content enough time to load (Learn more about Handling AJAX)
- Click the Click Item action inside the Loop Item, then uncheck Open in a new tab.
- Click Apply to save
- (Optional) Click Loop Item to change the "Loop Mode" from Fixed List to Variable List. Then, enter the Element XPath: //DIV[@class="variants__list"]/LABEL/DIV. This is important when you have different products with different numbers of colors to scrape.
Learn more about XPath here: What is XPath and how to use it in Octoparse.
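To see why a single XPath copes with a varying number of colors, here is a minimal Python sketch that runs an equivalent expression outside Octoparse. The markup in `SAMPLE` is hypothetical and only mirrors the structure the XPath implies; the real page may differ. The sketch uses the standard library's XML parser, which is case-sensitive and supports only a subset of XPath, so the tag names are written in lowercase and the path starts with `.//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an assumption that only mirrors the structure
# implied by the XPath above, not the real product page.
SAMPLE = """
<html><body>
  <div class="variants__list">
    <label><div>Black</div></label>
    <label><div>Brown</div></label>
    <label><div>Blonde</div></label>
  </div>
</body></html>
"""

# Lowercase equivalent of //DIV[@class="variants__list"]/LABEL/DIV.
VARIANT_XPATH = ".//div[@class='variants__list']/label/div"


def list_variants(markup):
    """Return the text of every element matched by VARIANT_XPATH."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(VARIANT_XPATH)]


print(list_variants(SAMPLE))
```

The expression matches every option under the variants list, however many there are, so the same path can be reused on a product with two colors or with ten.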
3. Extract the data you need on the page
You can click on the elements on the page to extract the data you need and rename the data fields if needed.
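The workflow built in the steps above (loop over each option, click it, extract the fields) can be pictured as the following Python sketch. It is only a conceptual model, not how Octoparse works internally: `FAKE_PAGE`, `VariantRow`, and every value in them are made up for illustration, and the "click" is simulated by a dictionary lookup rather than a real browser action.

```python
from dataclasses import dataclass


@dataclass
class VariantRow:
    """One output row per variant; field names are illustrative."""
    color: str
    price: str
    image_url: str
    page_url: str
    product_id: str


# Made-up stand-in for the live page: for each color option, the values
# the page would display after that option is clicked.
FAKE_PAGE = {
    "Black": {
        "price": "9.99",
        "image_url": "https://example.com/img/black.jpg",
        "page_url": "https://example.com/product?variant=1",
        "product_id": "1",
    },
    "Brown": {
        "price": "10.49",
        "image_url": "https://example.com/img/brown.jpg",
        "page_url": "https://example.com/product?variant=2",
        "product_id": "2",
    },
}


def scrape_variants(page):
    rows = []
    for color in page:            # Loop Item: one pass per color option
        shown = page[color]       # Click Item: content refreshes in place
        rows.append(VariantRow(color=color, **shown))  # Extract Data
    return rows


for row in scrape_variants(FAKE_PAGE):
    print(row)
```

Each pass through the loop yields one row, which is why the exported data contains one line per color rather than one line per product.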
Here is the data output sample.
If you're still having trouble with this issue, submit a ticket to our support team! We're here to help.