In this tutorial, I will show you how to scrape product information from eBay, step by step.
Before we get started, open https://www.ebay.com/ in your browser and type the product category into the search bar. After the page finishes loading, copy its URL. This is the URL we are going to need in this demonstration.
Note: To learn more about AJAX, see https://youtu.be/MuOC1yCKai0
To learn more about XPath, see https://youtu.be/kZwD6szlvas
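If you prefer to build the search URL rather than copy it from the browser, the idea can be sketched in a few lines of Python. This is only an illustration: it assumes eBay search URLs have the form https://www.ebay.com/sch/i.html?_nkw=<keywords>, and the helper names `build_search_url` and `keywords_from_url` are made up for this example. Always check the result against the URL your own browser shows.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: eBay search result URLs look like
# https://www.ebay.com/sch/i.html?_nkw=<keywords>.
# Verify this against the URL you actually copied from your browser.
EBAY_SEARCH_BASE = "https://www.ebay.com/sch/i.html"


def build_search_url(keywords: str) -> str:
    """Return a search URL for the given keywords (hypothetical helper)."""
    return f"{EBAY_SEARCH_BASE}?{urlencode({'_nkw': keywords})}"


def keywords_from_url(url: str) -> str:
    """Read the search keywords back out of a copied URL."""
    query = parse_qs(urlparse(url).query)
    return query.get("_nkw", [""])[0]


url = build_search_url("gaming laptop")
print(url)
print(keywords_from_url(url))
```

Either way, the URL you end up with is what goes into Octoparse in Step One.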
Step One: Enter the URL of the website you would like to scrape
- Build a new task by clicking “Advanced Mode”, and enter the URL of the website.
- Click “Save URL” in the left corner. This will bring you to the product category on eBay in Octoparse’s built-in browser.
Step Two: Create a pagination loop
- Before creating the pagination loop, uncheck “Auto Retry”, since we don’t need this function in this tutorial.
- Scroll down to the bottom of the page and find the pagination bar, then click the “next page” button. A command panel called “Action Tips” shows up whenever you click an element on the page. It shows what you can do with the selected element.
- Then select “Loop click selected link.”
- Now go to the settings area, where we need to make a few adjustments: uncheck both “Auto Retry” and “Ajax Load”.
- Click “Save” to save the process.
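For readers curious about what a pagination loop does under the hood, here is a minimal sketch in Python. It is not how Octoparse is implemented. It only illustrates the general idea of clicking “next” until there are no more pages. The page shape (a dict with `items` and `next`), the `paginate` function, and the `FAKE_SITE` data are all hypothetical stand-ins so the example runs without a network connection.

```python
from typing import Callable, Iterator, Optional

# A page is modeled as a dict holding its items and the URL of the next
# page (None on the last page). This shape is an assumption for the sketch.
Page = dict


def paginate(start_url: str,
             fetch: Callable[[str], Page],
             max_pages: int = 50) -> Iterator[Page]:
    """Yield pages by following the "next" link until it runs out."""
    url: Optional[str] = start_url
    seen = set()
    # Stop on the last page, on a repeated URL, or at the page limit.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        yield page
        url = page.get("next")


# Stand-in for a real site: three result pages linked by "next".
FAKE_SITE = {
    "/page1": {"items": ["A", "B"], "next": "/page2"},
    "/page2": {"items": ["C"], "next": "/page3"},
    "/page3": {"items": ["D"], "next": None},
}

all_items = [item
             for page in paginate("/page1", FAKE_SITE.__getitem__)
             for item in page["items"]]
print(all_items)
```

The `seen` set and `max_pages` limit guard against a “next” link that points back to a page already visited, which would otherwise loop forever.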
Step Three: Create a “Loop Item”
- Select a product name from the list. The selected item is highlighted in green, and other similar items are highlighted in red. Pay attention here: you may notice that not all laptop names are found and highlighted in red. This is due to an incorrect XPath, which we will come back and fix once we have created the loop item. For now, click the “Select All” command in “Action Tips”.
- Follow the tips and select “Loop click each element”. This tells Octoparse to go through each detail page in the loop item list.
- We still need to go back to the settings area and adjust the settings. Uncheck both “Auto Retry” and “Ajax Load”.
- Now we have a loop item for the detail pages, and we need to fix the XPath so that Octoparse can locate every element and avoid incomplete extraction. To do this, go back to the Loop Item in the “Workflow” and open the settings area. I have already prepared the XPath for this demonstration. Copy it and paste the expression into Variable List. I have also attached a tutorial on how to write an XPath down below. Not all webpages are written with the exact same structure, and the robot will skip an element if the scraper can’t locate it. You can ignore this step if the webpage is well organized.
- Click “Save” to save the step.
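To see why an XPath can miss some items, consider the small Python sketch below. The markup is invented for illustration and is far simpler than a real eBay page, and the two XPath expressions are examples only, not the XPath used in this demonstration. It uses Python’s built-in `xml.etree.ElementTree`, which supports only a subset of XPath and expects well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup: the second item wraps its title
# in an extra <div>, so the titles do not all sit at the same depth.
LISTING = """
<ul id="results">
  <li class="item"><h3 class="title">Laptop A</h3></li>
  <li class="item"><div class="promo"><h3 class="title">Laptop B</h3></div></li>
  <li class="item"><h3 class="title">Laptop C</h3></li>
</ul>
"""

root = ET.fromstring(LISTING)

# Too strict: only matches titles that are direct children of <li>.
strict = [h.text for h in root.findall("./li/h3")]

# More tolerant: matches a title at any depth, identified by its class.
tolerant = [h.text for h in root.findall(".//h3[@class='title']")]

print(strict)
print(tolerant)
```

The strict path skips “Laptop B” because of the extra wrapper, which is the same kind of incomplete selection described above. A path that relies on a stable attribute instead of an exact position tends to hold up better when the page structure varies.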
Step Four: Data Extraction
- To extract the data, click an element. In this case, we select the name of the product.
- Click “Extract text of the selected element” in “Action Tips”. Then click the condition and click “Extract text of the selected element” again. Do the same for the seller: click it, then click “Extract text of the selected element” in “Action Tips”.
- You can preview the extraction in the data fields and edit them. In this case, we edit the name.
- Click “Save” to save the step.
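The extraction step boils down to mapping each data field to a location on the page and reading its text. A rough Python sketch of that idea follows. The markup, the class names, and the XPath expressions in `FIELDS` are all assumptions made for this example and do not match real eBay pages. Note how a field that cannot be located (here, the seller) is left empty rather than stopping the run, similar to how the scraper skips an element it can’t find.

```python
import xml.etree.ElementTree as ET

# Hypothetical detail-page markup; real eBay pages are structured
# differently and use other class names.
DETAIL = """
<div id="item">
  <h1 class="name">  Example Laptop 15-inch  </h1>
  <span class="condition">Used</span>
</div>
"""

# Field name -> XPath (ElementTree subset). All paths are assumptions.
FIELDS = {
    "name": ".//h1[@class='name']",
    "condition": ".//span[@class='condition']",
    "seller": ".//a[@class='seller']",  # deliberately absent from DETAIL
}


def extract_fields(xml_text: str, fields: dict) -> dict:
    """Return one row of data; fields that cannot be located become None."""
    root = ET.fromstring(xml_text)
    row = {}
    for field, path in fields.items():
        node = root.find(path)
        has_text = node is not None and node.text
        row[field] = node.text.strip() if has_text else None
    return row


row = extract_fields(DETAIL, FIELDS)
print(row)
```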
Step Five: Run the task and get data
- After setting up the rules, we can run the task by clicking “Start Extraction”.
- Then select “Local Extraction” to run the task. You can switch views to check the scraping status on the website and the data that has been extracted into the table.
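Octoparse collects the extracted rows into a table for you. If you ever need to do something similar by hand, a minimal Python sketch might look like the one below. The rows are made-up sample data, and `rows_to_csv` is a hypothetical helper, not part of Octoparse.

```python
import csv
import io

# Made-up sample rows standing in for extracted data.
rows = [
    {"name": "Laptop A", "condition": "New", "seller": "seller_one"},
    {"name": "Laptop B", "condition": "Used", "seller": None},
]


def rows_to_csv(rows: list, columns: list) -> str:
    """Render rows as CSV text, writing missing values as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col) or "" for col in columns})
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["name", "condition", "seller"])
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained. To save a file instead, open one with `open(path, "w", newline="")` and pass it to `csv.DictWriter` in place of the buffer.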