In this tutorial, we will show you how to scrape image URLs from eBay.
To follow along, you may want to use the URL in this tutorial:
Here are the main steps in this tutorial: [Download task file here]
1) "Go To Web Page" - to open the targeted web page
- Click "+Task" to start a new task with Advanced Mode
- Paste the following URL into the "Extracting URL" box and click "Save URL" to move on
2) Create a pagination loop - to scrape data from multiple listing pages
- Scroll down and click the ">" button
- Click "Loop click selected link" on "Action Tips"
3) Create a "Loop Item" - to loop click into each product page on every listing page
We are now on the second page. A "Loop Item" should always start with the first item on the first page, so we need to go back to the first page.
- Click "Go To Web Page" in the workflow.
- Select the pagination loop in the workflow
By doing this, we can help Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.
- Click the title of the first item
Octoparse will automatically identify other product links on the current page. The selected links will be highlighted in green while others will be highlighted in red.
- Click "Select All" on "Action Tips"
- Click "Loop click each element" to create a "Loop Item"
Octoparse will click through each link captured in the "Loop Item", and open the product detail page.
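The execution order that steps 2 and 3 set up can be pictured as two nested loops. The sketch below is only an illustration in plain Python, not Octoparse code: it assumes the "Loop Item" ends up nested inside the pagination loop, and `LISTING_PAGES` and `run_workflow` are hypothetical names standing in for the site and the workflow.

```python
# Hypothetical stand-in for the site: each inner list holds the
# product links found on one listing page.
LISTING_PAGES = [
    ["product-1", "product-2", "product-3"],  # listing page 1
    ["product-4", "product-5"],               # listing page 2
]


def run_workflow(listing_pages):
    """Return (page number, product link) pairs in the order they are visited."""
    visited = []
    # Outer loop: the pagination loop, one pass per listing page.
    for page_number, product_links in enumerate(listing_pages, start=1):
        # Inner loop: the "Loop Item", one pass per product link on that page.
        for link in product_links:
            # This is where the product detail page would be opened
            # and the data extracted.
            visited.append((page_number, link))
    return visited


for page_number, link in run_workflow(LISTING_PAGES):
    print(f"page {page_number}: open {link}")
```

Every product on a listing page is visited before the workflow moves on to the next page, which is why the "Loop Item" has to start from the first item on the first page.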
4) Extract data - to extract image URLs
- Click on one of the images
When you select an image, Octoparse shows the selected tag as "IMG".
- Click "Extract the URL of the selected image"
The XPath that Octoparse generates automatically is not accurate, so we need to revise it.
- Click on the first field
- Click "Customize data field"
- Revise the Matching XPath to: //ul[@class="lst icon"]//li//div/img
Then we can repeat the above steps to get the other image URLs.
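To see what the revised XPath selects, here is a minimal sketch that runs it with Python's standard library. The markup in `SAMPLE` is a simplified, hypothetical fragment and the URLs are placeholders; the real eBay page is more complex and may differ. ElementTree supports only a subset of XPath and needs a relative path, so the expression gets a leading ".".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Only the images inside the
# <ul class="lst icon"> list should be matched.
SAMPLE = """
<div>
  <ul class="lst icon">
    <li><div><img src="https://example.com/images/a/s-l64.jpg"/></div></li>
    <li><div><img src="https://example.com/images/b/s-l64.jpg"/></div></li>
  </ul>
  <ul class="other">
    <li><div><img src="https://example.com/images/ad/s-l64.jpg"/></div></li>
  </ul>
</div>
"""


def image_urls(markup):
    """Return the src of every image matched by the revised XPath."""
    root = ET.fromstring(markup)
    # Same expression as in the tutorial, made relative for ElementTree.
    xpath = ".//ul[@class='lst icon']//li//div/img"
    return [img.get("src") for img in root.findall(xpath)]


for url in image_urls(SAMPLE):
    print(url)
```

The image in the second list is skipped because its `ul` does not carry the class "lst icon", which is the kind of narrowing the revised XPath is meant to provide.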
If we need all of the image URLs extracted into a single cell, we can use the RegExp Tool to pick up all the image URLs from the HTML. See the following tutorial for details:
Pick up image URL with RegExp Tool.
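The idea behind picking up every image URL from a block of HTML can be sketched with a regular expression in Python. This is an illustration only, not the expression the RegExp Tool generates: the pattern `IMG_SRC`, the helper `all_image_urls` and the sample HTML are assumptions made for this example.

```python
import re

# Capture the value of the src attribute of each <img> tag.
# "\s" before "src" keeps attributes such as data-src from matching.
IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def all_image_urls(html):
    """Return every image URL found in the HTML, in document order."""
    return IMG_SRC.findall(html)


# Hypothetical HTML of one image list, with placeholder URLs.
html = (
    '<ul><li><img class="thumb" src="https://example.com/1.jpg"></li>'
    "<li><img src='https://example.com/2.jpg' alt=\"second\"></li></ul>"
)

# Join the matches so that all URLs fit into a single cell.
cell = "; ".join(all_image_urls(html))
print(cell)
```

A regular expression is enough for a short, predictable snippet like this one; for arbitrary pages an HTML parser is usually the more robust choice.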
5) Customize the data field - to get a normal size image (Optional)
The image URL we just extracted points to a thumbnail. To get a normal size image, we need to reformat the URL with the Octoparse Regular Expression Tool.
- Click "Customize data field"
- Hold Shift or Ctrl to select multiple fields and edit them in a batch
- Select "Refine extracted data", click "Add step" and then select "Replace"
- Replace 64 with 1600
- Click "OK" to save the result
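The effect of the "Replace" step can be checked outside Octoparse with a few lines of Python. The sketch below assumes that the size number sits directly before the file extension, as in the placeholder URL used here; `to_full_size` is a hypothetical helper, not part of Octoparse. A plain replacement of every "64" mirrors the step above, while the anchored pattern only touches the size token.

```python
import re


def to_full_size(url, old="64", new="1600"):
    """Swap the size number that sits right before the file extension."""
    # (?<=\D) : the number must follow a non-digit, so 164 is left alone
    # (?=\.\w+$) : the number must be followed by the extension at the end
    pattern = re.compile(r"(?<=\D)" + re.escape(old) + r"(?=\.\w+$)")
    return pattern.sub(new, url, count=1)


# Placeholder URL; "64" also appears earlier in the path on purpose.
thumbnail = "https://example.com/images/g/abc64def/s-l64.jpg"

print(thumbnail.replace("64", "1600"))  # plain replace changes both occurrences
print(to_full_size(thumbnail))          # anchored replace changes only the size
```

If the URLs never contain "64" anywhere else, the plain replacement from the steps above gives the same result; URLs that do not end in a file extension are returned unchanged by `to_full_size`.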
If you want to learn the Octoparse RegExp Tool in detail, please refer to the following tutorials:
6) Save and start extraction - to get all the URLs of the desired images
- Click "Save"
- Click "Start Extraction"
- Select "Local Extraction" to start execution.
The steps above extract only the image URLs.
To download the images themselves from the extracted URLs, refer to "How to download images from a list of URLs?"
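For readers who prefer a script, downloading from a list of URLs can also be sketched with Python's standard library. This is an alternative to the tutorial mentioned above, not a description of it; `filename_for`, `download_all` and the `images` folder are hypothetical names chosen for this example.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_for(url, index):
    """Build a local file name from the URL path, with a counter to avoid clashes."""
    name = os.path.basename(urlparse(url).path) or "image"
    return f"{index:04d}_{name}"


def download_all(urls, dest_dir="images", timeout=30):
    """Download each URL into dest_dir and return the paths that were saved."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = os.path.join(dest_dir, filename_for(url, index))
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                data = response.read()
            with open(target, "wb") as out:
                out.write(data)
            saved.append(target)
        except (OSError, ValueError) as exc:
            # Network and HTTP errors are OSError subclasses;
            # ValueError covers malformed URLs.
            print(f"skipped {url}: {exc}")
    return saved


# Usage, with the URLs exported from the task (one per line in urls.txt):
#     with open("urls.txt") as handle:
#         download_all(line.strip() for line in handle if line.strip())
```

The counter prefix matters here because many thumbnail and full-size URLs end in the same file name, so saving by base name alone would overwrite earlier downloads.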
Here is the sample output: