In this video, I am going to show you how to scrape product details from Aliexpress.com step by step. Before we get started, open https://www.aliexpress.com/ in your own browser and type in the keyword you need. In this case, we are looking for product information for “laptop”, so type that in. After the page finishes loading, copy the URL. This is the URL we will use in this demonstration.
Note: To learn more about AJAX, see https://youtu.be/MuOC1yCKai0
To learn more about XPath, see https://youtu.be/kZwD6szlvas
Step One: Enter the URLs of the websites you would like to scrape
- Build a new task by clicking “Advanced Mode”, and enter the URL.
- Then click “Save URL” in the left corner. This will bring you to the product listing page in Octoparse’s built-in browser.
Step Two: Create a pagination loop
- Scroll down to the bottom of the page and find the pagination bar. Then click the “Next Page” button. A command panel called “Action Tip” shows up whenever you click an element on the website. It shows you what you can do with the selected element.
- In this case, choose "Loop click the selected link"
- As I’ve said in other tutorials, we need to go back to the settings area and make a few adjustments. You can ignore the other options, but you need to pay attention to “Ajax Load” and “Auto Retry”. Sometimes Octoparse selects these two options by default, since many users don’t know how to apply them when a website uses Ajax. Octoparse does the job for you, but you still need to check that it did the job correctly and ticked the right boxes.
- Click “OK” to save the steps.
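For viewers who think better in code, here is a minimal sketch of what a pagination loop does: keep extracting from the current page and following the “Next Page” link until there is none. This is plain Python, not Octoparse's own implementation. The in-memory `FAKE_SITE`, the `fetch_page` helper, and the `max_pages` safety limit are all assumptions made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory "site": page id -> (items on that page, id of the next page).
# A next-page value of None stands for "there is no Next Page button".
FAKE_SITE: Dict[str, Tuple[List[str], Optional[str]]] = {
    "page-1": (["laptop A", "laptop B"], "page-2"),
    "page-2": (["laptop C", "laptop D"], "page-3"),
    "page-3": (["laptop E"], None),
}


def fetch_page(page_id: str) -> Tuple[List[str], Optional[str]]:
    """Hypothetical stand-in for loading a page in a browser."""
    return FAKE_SITE[page_id]


def crawl_all_pages(start: str, max_pages: int = 100) -> List[str]:
    """Collect items page by page, following the next-page link each time."""
    collected: List[str] = []
    current: Optional[str] = start
    visited = 0
    # Stop when there is no next page or the safety limit is reached.
    while current is not None and visited < max_pages:
        items, next_page = fetch_page(current)
        collected.extend(items)
        current = next_page
        visited += 1
    return collected


if __name__ == "__main__":
    print(crawl_all_pages("page-1"))
```

The `max_pages` limit is only a guard against a pagination bar that never ends; a real task would stop when the “Next Page” button disappears.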
Step Three: Create a “Loop item”
- To create a loop item, select the element, in this case a product from the listing. Click the first product. The selected item is highlighted in green. Octoparse should be able to find other similar items and highlight them in red. In this case, Octoparse can’t find other similar items in the same list. This is due to an incorrect XPath.
- Before we fix the problem, let’s create the loop item first. Select the second product name and then choose “loop click selected link” from the “Action Tip”.
- Since we have created a loop list, we can edit the XPath of the listings from its settings panel. I have prepared the XPath for this demonstration. Copy and paste the XPath expression into the Variable List field.
- If you are wondering why I chose Variable List rather than Fixed List or one of the other options, there is a tutorial you may be interested in. I have also attached the blog link for your reference.
- Now all 45 listing results are highlighted in green, which means they have been successfully selected. We still need to go back to the settings area and adjust the settings.
- Again, uncheck “Auto Retry”.
- This time we don’t need to check “Ajax Load”, since the detail page does not use Ajax.
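To see why an incorrect XPath selects only one product while a corrected one selects the whole list, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. The markup, the class names, and both XPath expressions are invented for illustration. They are not AliExpress's real page structure, nor the XPath used in the video.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (not the real AliExpress page).
SAMPLE = """
<ul id="list">
  <li class="item"><a class="name">Laptop A</a><span class="price">$300</span></li>
  <li class="item"><a class="name">Laptop B</a><span class="price">$450</span></li>
  <li class="item"><a class="name">Laptop C</a><span class="price">$520</span></li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)  # root is the <ul> element

# Pinned to a position: matches the first product only.
TOO_SPECIFIC = "./li[1]/a"
# Keyed on a shared attribute: matches every product in the list.
GENERAL = "./li[@class='item']/a"


def match_names(xpath: str) -> list:
    """Return the text of every element matched by the given XPath."""
    return [element.text for element in root.findall(xpath)]


if __name__ == "__main__":
    print(match_names(TOO_SPECIFIC))  # one product
    print(match_names(GENERAL))       # all products
```

The general pattern is the same one applied in the loop settings: replace the position-based path with an expression based on something all listing items share.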
Step Four: Data extraction.
- To extract the data, click the element, in this case the product name, and choose “extract the text of the selected element” from the Action Tip.
- You can preview the extraction from the data fields and edit accordingly.
- That completes one extraction field. If you want to extract more elements, just repeat the steps.
- For example, click “Price” and choose “extract the text of the selected element” from the Action Tip.
- Click “Save” to save the steps.
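The extraction step can be pictured as turning each product into one row with a “name” field and a “price” field. Here is a rough sketch of that idea in Python with the standard `xml.etree.ElementTree` module. The sample markup, the class names, and the `extract_rows` function are assumptions for illustration only and do not reflect how Octoparse stores its fields internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (not the real AliExpress page).
SAMPLE = """
<ul id="list">
  <li class="item"><a class="name"> Laptop A </a><span class="price">$300</span></li>
  <li class="item"><a class="name">Laptop B</a><span class="price">$450</span></li>
  <li class="item"><a class="name">Laptop C</a></li>
</ul>
""".strip()


def extract_rows(markup: str) -> list:
    """Build one dict per product, with the text of each selected element."""
    root = ET.fromstring(markup)
    rows = []
    for item in root.findall("./li[@class='item']"):
        rows.append({
            # findtext returns the default when the element is missing.
            "name": item.findtext("a[@class='name']", default="").strip(),
            "price": item.findtext("span[@class='price']", default="").strip(),
        })
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

A missing element produces an empty field rather than an error, which is roughly what an empty cell in the preview table corresponds to.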
Step Five: Run the task and get data
- After setting up the rules, we can run the task by clicking “start extraction”.
- Then select “Local extraction” to run the task. You can switch views to check the scraping status on the website and the data that has been extracted into the table.