You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!
SHEIN is an online fast fashion retailer that has had a major impact on the fast fashion industry and is hugely popular on TikTok. It mainly sells women's wear at low prices.
To follow along, you may want to use the URL in the tutorial:
We will scrape data such as the Product Name, Price, Image URL, SKU, Number of reviews, and Scores.
Here are the main steps in this tutorial: [Download task file here]
1. "Go to Web Page" - to open the targeted web page
Enter the URL on the home page and click Start
2. Auto-detect web page - to create a workflow
Choose Auto-detect web page data
Wait for the detection to complete
Check the data fields in the Data Preview, and delete or rename any fields if needed
Untick Add a page scroll
Click the Create workflow button on the Tips panel
3. Click on links to scrape the linked page - to extract detailed product information
Choose Click on links to scrape the linked page on the Tips panel
Select "Title_URL" from the drop-down menu (you can confirm that it is the correct link in the Data Preview)
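Conceptually, this step collects every product link on the listing page and then opens each one in turn. The link-collection part can be sketched in Python as follows. This is only an illustration, not how Octoparse works internally: the sample markup, the `product-title` class name, and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProductLinkParser(HTMLParser):
    """Collect href values from <a> tags that carry a given CSS class."""

    def __init__(self, link_class: str):
        super().__init__()
        self.link_class = link_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.link_class in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def product_urls(listing_html: str, base_url: str, link_class: str) -> list:
    """Return absolute URLs for every matching link in the listing HTML."""
    parser = ProductLinkParser(link_class)
    parser.feed(listing_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Hypothetical listing markup; the real page structure will differ.
sample_listing = """
<ul>
  <li><a class="product-title" href="/item-one.html">Item one</a></li>
  <li><a class="product-title" href="/item-two.html">Item two</a></li>
  <li><a class="nav" href="/help">Help</a></li>
</ul>
"""

urls = product_urls(sample_listing, "https://example.com/list", "product-title")
for url in urls:
    # A real scraper would open each detail page at this point.
    print(url)
```

The loop over `urls` plays the role of the "click on each link" action: each iteration corresponds to one detail page from which the fields in the next step are extracted.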
4. Extract data - to select the data for extraction
Click on the data you want to extract on the page
Select Extract the text of the selected element on the Tips panel
Repeat these steps until you have selected all the data you need
Edit the name of data fields if needed
Scraping the product rating is a little tricky here because the page shows no rating text that can be extracted directly. We need to get the value from the source code.
Select the stars
Choose Extract outer HTML of the selected element
Click on the More button and choose Clean Data
Click Add Step and choose Match with Regular Expression
Select Not sure about RegEx? Try the RegEx tool!
Tick Start with and End with
Input "Average Rating " (with a space at the end) in the Start with box
Input a space in the End with box
Click on Generate and Apply
Confirm and Apply to save
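The Start with / End with settings above amount to "take the text between `Average Rating ` and the next space". A rough Python equivalent is sketched below. It assumes the generated expression behaves like a lookbehind/lookahead pair, which may not be the exact pattern Octoparse produces. The sample outer HTML and the helper name `build_between_pattern` are hypothetical.

```python
import re


def build_between_pattern(start: str, end: str) -> str:
    """Build a regex that matches the shortest text between start and end."""
    return f"(?<={re.escape(start)}).*?(?={re.escape(end)})"


# Hypothetical outer HTML of the star element; the real markup may differ.
sample_outer_html = (
    '<div class="rate-star" aria-label="Average Rating 4.85 Reviews 1000+"></div>'
)

# Start with "Average Rating " (trailing space), end with a single space.
pattern = build_between_pattern("Average Rating ", " ")
match = re.search(pattern, sample_outer_html)

# Convert the captured text to a number, or None if nothing matched.
rating = float(match.group(0)) if match else None
print(rating)
```

The trailing space in the start text matters: without it, the match would begin at the space before the number, and the lazy `.*?` would stop immediately at that same space, yielding an empty result.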
5. Run task - to get the data you want
Click Save, and click Run on the upper right side
Select Run on your device to run the task on your computer
Here is the sample output -