Scraping online shops such as eBay or Amazon has become an important source of data: it lets you conveniently compare best-selling products by price, features, and product description.
This tutorial shows how to retrieve product data from eBay with Octoparse 7.X, a user-friendly web-scraping tool for mining data from websites.
We will scrape the name, condition, price, and other information from each product details page with Octoparse.
To follow along, you may want to use this URL in the tutorial:
1) "Go To Web Page" - to open the target webpage
· Select "Advanced Mode" and create a task. Advanced Mode supports extracting data from a wide variety of websites and offers more flexible configuration.
· Enter the URL and click "Save URL".
· Turn on "Workflow" mode to check and edit your workflow conveniently.
2) Create a pagination loop - to scrape all data from multiple pages
· Click the next-page button, then select "Loop click the selected link" in the "Action Tips" panel.
3) Create a "Loop Item" - to scrape all the items on each page
· Click the title of the first listed product. Octoparse 7.X will automatically identify the similar URLs on the page.
· Click "Select all" in the "Action Tips" panel
· Click "Loop click each URL"
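Conceptually, steps 2 and 3 build two nested loops: an outer loop that keeps clicking the next-page button, and an inner loop that opens each product link on the current page. The Python sketch below only illustrates that structure. It is not how Octoparse is implemented, and the `Page` class, the `crawl` function, and the example.com URLs are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one page of search results."""
    item_urls: List[str]                # product links found on this page
    next_page: Optional["Page"] = None  # None when there is no next-page button


def crawl(first_page: Page) -> Iterator[str]:
    """Yield every product URL, page by page."""
    page = first_page
    while page is not None:          # pagination loop (step 2)
        for url in page.item_urls:   # loop item (step 3)
            yield url
        page = page.next_page        # "click" the next-page button


# Two made-up result pages linked by a next-page button.
page2 = Page(item_urls=["https://example.com/item/3"])
page1 = Page(
    item_urls=["https://example.com/item/1", "https://example.com/item/2"],
    next_page=page2,
)

visited = list(crawl(page1))
print(visited)
```

The outer loop stops when a page has no next-page link, which mirrors how a pagination loop ends on the last page of results.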
4) Extract data - to select the data extracted from the web page
· Click the data you require, and select "Extract text of the selected element" in the "Action Tips" panel
· Edit the field name
1. If the item you selected doesn't contain all the information you need, you can select another item in the "Loop Item" to fill in the data field. In this case, eBay products show their price in different ways: some as "Current Bid", others as "Price". So we select the third option in the "Loop Item" to fill in the extracted data field.
2. Because prices on online shops change from time to time, you may want to record when the data was extracted. Click "Add predefine fields" at the bottom of the data field, and you will see the "Add Current Time" option.
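The two notes above amount to a fallback rule for the price field plus a timestamp on every record. A minimal Python sketch of that idea follows, assuming each scraped item arrives as a dictionary of label/value pairs. The `pick_price` and `build_record` helpers, the field names, and the sample values are hypothetical and only serve as illustration. They are not part of Octoparse.

```python
from datetime import datetime
from typing import Optional


def pick_price(fields: dict) -> Optional[str]:
    """Return the first price label that is present and non-empty."""
    for label in ("Price", "Current Bid"):
        value = fields.get(label)
        if value:
            return value
    return None  # neither label was found on the page


def build_record(name: str, fields: dict) -> dict:
    """Combine the name, the resolved price, and the extraction time."""
    return {
        "name": name,
        "price": pick_price(fields),
        # Same idea as the "Add Current Time" predefined field.
        "extracted_at": datetime.now().isoformat(timespec="seconds"),
    }


# Made-up sample items: one fixed-price listing and one auction.
fixed_price = build_record("Sample item A", {"Price": "US $19.99"})
auction = build_record("Sample item B", {"Current Bid": "US $5.50"})
print(fixed_price)
print(auction)
```

Storing the extraction time next to each price makes it possible to compare runs later, since the same listing may show a different price on another day.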
5) Customize data field - to clean the data by deleting extra strings (Optional)
Now you may notice that the title of every product begins with "Details about ", which you may want to delete to keep the data tidy. The steps are as follows:
· Select the data
· Click "Customize data field"
· Choose "Refine extracted data"
· Click "Add step", and choose "Replace"
· Enter "Details about " in the "Replace" field, leave the "With" field empty, then click "evaluate"
· Click "OK"
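If you prefer to clean the titles after export instead, the same replacement can be sketched in a few lines of Python. This is an assumption-laden illustration rather than what Octoparse runs internally: `clean_title` is a hypothetical helper, the sample titles are made up, and this version removes the text only when it appears at the start of the title.

```python
PREFIX = "Details about "


def clean_title(title: str) -> str:
    """Remove the leading "Details about " text, if present."""
    if title.startswith(PREFIX):
        title = title[len(PREFIX):]
    return title.strip()  # drop any leftover surrounding whitespace


raw_titles = ["Details about Vintage Camera", "Vintage Camera"]
cleaned = [clean_title(t) for t in raw_titles]
print(cleaned)
```

Titles that do not start with the prefix pass through unchanged, so the function is safe to apply to every row.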
6) Save and start extraction - to run your task and get data
· Click "Save"
· Click "Start Extraction"