Scraping online shops such as eBay or Amazon has become an important data source. It lets you conveniently compare best-selling products across different stores by price, features, and product descriptions.
In this tutorial, you will learn how to scrape product data from eBay.
To save time, you can go to "Task Template" on the main screen of the Octoparse scraping tool and start with the ready-to-use eBay templates directly. With this feature, there is no need to configure a scraping task. For further details, see: Task Templates
If you would like to know how to build the task from scratch, continue with the tutorial below.
We will scrape data such as the name, condition, price, and more info from the product details page with Octoparse.
To follow through, you may want to use this URL in the tutorial:
- "Go To Web Page" - open the target web page
- Auto-detect web page data - create the workflow
- Select the link to scrape data on the detail page
- Extract data on the product detail page
- Modify the XPath of the data fields
- Start extraction - run the task and get the data
1. "Go To Web Page" - open the target web page
- Enter the example URL and click "Start"
2. Auto-detect web page data - create the workflow
- Click "Auto-detect web page data" and wait for the detection to complete
- Delete unwanted fields or modify field names on the Data Preview
- Choose "Create workflow" on the Tips panel
Now, you will get a workflow as below.
If all the data you need can be scraped from the listing page, you can stop here and jump to "Start extraction - run the task and get the data". If you want to go to each product detail page to get more info, follow the steps below.
3. Select the link to scrape data on the detail page
- Choose "Click on link(s) to scrape the linked page(s)"
- Choose "Title_URL" from the drop-down option
- Choose "Confirm"
Octoparse will automatically open the first product detail page.
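The idea behind this step, looping over the listing items and following each item's link to its detail page, can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Octoparse is implemented: the `LISTING` and `DETAIL_PAGES` data, the example.com URLs, and the `open_detail_page` and `scrape` helpers are all hypothetical, and no network access is involved. Only the `Title_URL` field name comes from the steps above.

```python
# Hypothetical in-memory stand-ins for the listing page and the detail pages.
LISTING = [
    {"Title": "Example item A", "Title_URL": "https://example.com/itm/1"},
    {"Title": "Example item B", "Title_URL": "https://example.com/itm/2"},
]
DETAIL_PAGES = {
    "https://example.com/itm/1": {"Condition": "New", "Price": "10.00"},
    "https://example.com/itm/2": {"Condition": "Used", "Price": "7.50"},
}


def open_detail_page(url):
    """Stand-in for clicking a link and loading the linked page."""
    return DETAIL_PAGES[url]


def scrape(listing):
    """Visit every listing item and merge in the fields from its detail page."""
    rows = []
    for item in listing:
        detail = open_detail_page(item["Title_URL"])  # follow the Title_URL link
        rows.append({**item, **detail})  # one output row per product
    return rows


rows = scrape(LISTING)
for row in rows:
    print(row)
```

Each output row combines the listing fields with the detail-page fields, which mirrors the one-row-per-product result you get from the workflow.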
4. Extract data on the product detail page
- Choose "Auto-detect web page data"
- If Octoparse detects data you need, choose "Create workflow"
- If not, then choose "Cancel"
- Click the element(s) you want and choose "Extract the text of the selected element"
- Click to modify the field name if needed
You can modify the step of "Extract Data" by double-clicking it in the workflow.
Check the following tutorials for details:
5. Modify the XPath of the data fields
You may need to modify the XPath of data fields that do not appear on every product page, or whose position varies from page to page, such as MPN or UPC. Modifying the XPath makes the data scraping more precise. We have prepared some frequently used XPaths for you, so you can simply use the element XPaths provided below.
- Double-click "Extract Data1"
- Click the "Modify XPath" option of a field
- Replace the XPath with the revised one. Choose from the list below according to your scraping needs; each XPath matches elements that can be found on the web page.
- MPN: //td[contains(text(),'MPN')]/following-sibling::td
- EAN: //td[contains(text(),'EAN')]/following-sibling::td
- UPC: //td[contains(text(),'UPC')]/following-sibling::td
- Item Weight: //td[contains(text(),'Item Weight')]/following-sibling::td
- Click "OK" to save
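To see why these XPaths are robust to fields that move or go missing, it may help to look at what they select: the `td` cell that follows a `td` whose text contains the label. The sketch below mimics that logic with Python's standard library only (its `ElementTree` module does not support `contains()` or `following-sibling`, so the match is written out by hand). The sample table and its values are hypothetical and much simpler than real eBay markup, and `value_after_label` is an illustrative helper, not part of Octoparse. Unlike the XPath, which selects every following `td`, the helper returns only the first one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified item-specifics table; real eBay markup differs.
SAMPLE = """
<div>
  <table>
    <tr><td>Brand</td><td>ExampleBrand</td></tr>
    <tr><td>MPN</td><td>ABC-123</td></tr>
    <tr><td>UPC</td><td>000000000000</td></tr>
  </table>
</div>
"""


def value_after_label(root, label):
    """Return the text of the first <td> that follows a <td> containing label."""
    for parent in root.iter():
        cells = list(parent)
        for i, cell in enumerate(cells):
            # Rough equivalent of td[contains(text(), label)]
            if cell.tag == "td" and label in (cell.text or ""):
                # Rough equivalent of /following-sibling::td (first match only)
                for sibling in cells[i + 1:]:
                    if sibling.tag == "td":
                        return (sibling.text or "").strip()
    return None  # the label is not on this page


root = ET.fromstring(SAMPLE)
mpn = value_after_label(root, "MPN")
ean = value_after_label(root, "EAN")
print("MPN:", mpn)
print("EAN:", ean)
```

Because the match is anchored on the label text rather than on the row position, the MPN value is found wherever its row sits in the table, and a field that is absent from the page (EAN in this sample) simply yields no value.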
You can check the XPath tutorials below to write XPaths for other fields if needed:
6. Start extraction - run the task and get the data
- Click "Save"
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer, or select "Run task in the Cloud" to run it on our Cloud servers (for premium users only)
Here is the sample output.
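Once the data is exported, a short script can handle the price comparison mentioned at the start of this tutorial. The sketch below assumes a CSV export with `Title`, `Condition`, and `Price` columns and prices written like `US $12.99`; the column names, the sample rows, and the `parse_price` helper are all hypothetical, so adjust them to match your own field names and price format.

```python
import csv
import io
import re

# Hypothetical export; in practice you would open the CSV file you exported.
SAMPLE_CSV = """Title,Condition,Price
Example item A,New,US $12.99
Example item B,Pre-owned,US $8.50
Example item C,New,US $10.00
"""


def parse_price(text):
    """Pull the first number out of a price string, ignoring thousands commas."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    row["PriceValue"] = parse_price(row["Price"])

# Skip rows whose price could not be parsed, then pick the lowest price.
priced = [row for row in rows if row["PriceValue"] is not None]
cheapest = min(priced, key=lambda row: row["PriceValue"])
print(cheapest["Title"], cheapest["PriceValue"])
```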
Tutorial in Spanish: Scrapear información de producto de eBay
You can also read more web scraping articles on the official website
Contact us any time if you need our help!