SHEIN is an online fast fashion retailer that has had a major impact on the fast fashion industry and is hugely popular on TikTok. It focuses mainly on women's wear, sold at much lower prices than most competitors.
To follow along, you may want to use the URL in the tutorial:
We will scrape data such as the Product Name, Price, Image URL, SKU ID, Number of Reviews, and Rating Score.
Here are the main steps in this tutorial.
- "Go to Web Page" - to open the targeted web page
- Auto-detect web page - to create a workflow
- Click on links to scrape the linked page - to extract detailed product information
- Extract data - to select the data for extraction
- Run task - to run the task and get the data
1. "Go to Web Page" - to open the targeted web page
- Enter the URL on the home page and click Start
2. Auto-detect web page - to create a workflow
- Choose Auto-detect web page data
- Wait for the detection to complete
- Check the data fields in the Data Preview, and delete or rename any fields if needed
- Untick Add a page scroll
- Click the Create workflow button on the Tips panel
3. Click on links to scrape the linked page - to extract detailed product information
- Choose Click on links to scrape the linked page on the Tips panel
- Select "Title_URL" from the drop-down menu (you can confirm that it is the correct link in the Data Preview)
- Click Confirm
4. Extract data - to select the data for extraction
- Click on the data you want to extract on the page
- Select Extract the text of the selected element on the Tips panel
- Repeat these steps until you have selected all the data you need
- Edit the names of the data fields if needed
Scraping the product rating is a little tricky in this case, because the rating is shown as stars and there is no visible text to extract directly. Instead, we need to pull the value out of the page's HTML source.
- Select the stars
- Choose Extract outer HTML of the selected element
- Click the More button and choose Clean Data
- Click Add Step and choose Match with Regular Expression
- Select Not sure about RegEx? Try the RegEx tool!
- Tick both Start with and End with
- Enter "Average Rating " (with a trailing space) in the Start with box
- Enter a single space in the End with box
- Click on Generate and Apply
- Confirm and Apply to save
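To make the Start with / End with rule above more concrete, here is a minimal Python sketch of the same idea: take whatever sits between "Average Rating " and the next space. The sample HTML, the class name, and the function name extract_rating are assumptions made up for illustration; the real markup on the product page and the exact expression the RegEx tool generates may differ.

```python
import re
from typing import Optional

# Text after "Average Rating " (Start with) and before the next space (End with).
RATING_PATTERN = re.compile(r"(?<=Average Rating ).+?(?= )")


def extract_rating(outer_html: str) -> Optional[str]:
    """Return the rating text found in the element's outer HTML, or None."""
    match = RATING_PATTERN.search(outer_html)
    return match.group(0) if match else None


# Hypothetical outer HTML for the star element (not copied from the real page).
sample_html = '<div class="rate-star" aria-label="Average Rating 4.85 out of 5"></div>'

print(extract_rating(sample_html))  # prints 4.85
```

If the rating is followed by something other than a space in the real HTML, the End with value needs to be changed to match that character.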
5. Run task - get the data you want
- Click Save, then click Run in the upper right corner
- Select Run on your device to run the task on your computer
Here is the sample output.