Clicking each link in a list and scraping data from the page it opens is a common scenario in web scraping. This tutorial shows you how to click through from a listing page to each detail page and extract the data you need. This is especially useful when extracting from e-commerce sites (Amazon, eBay, etc.) and business directories (Yelp, Yellow Pages, etc.).

web scraping with octoparse - extract from item page

To follow along, you can use this example link:

https://www.ebay.com/b/Car-Audio-Amplifiers/18795/bn_887008

  1. Use "Auto-detect" to set up the workflow

  2. Set up the workflow manually


1. Use "Auto-detect" to set up the workflow

autodetect.gif
  • In the Tips panel, select "Click on link(s) to scrape the linked page(s)" and choose an option from the dropdown menu. In this example, choose Title_URL.

1.png

Octoparse will now take you to the detail page of the first product.

  • Auto-detect the web data again, or click on target data fields such as title, condition, price, etc. to scrape them.

1111.png
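
For readers who think in code, the list-then-detail pattern that this workflow follows can be sketched roughly as below. This is only an illustration, not how Octoparse works internally: the in-memory PAGES dictionary, the class names, and the scrape_listing function are all hypothetical stand-ins, and the sample markup is kept well-formed so that Python's standard-library XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "site": one listing page and two detail pages.
# Real pages would be downloaded and are rarely well-formed XML.
PAGES = {
    "/list": "<div><a href='/item/1'>Amp A</a><a href='/item/2'>Amp B</a></div>",
    "/item/1": "<div><h1 class='title'>Amp A</h1><span class='price'>$50</span></div>",
    "/item/2": "<div><h1 class='title'>Amp B</h1><span class='price'>$75</span></div>",
}

def scrape_listing(list_url, pages):
    """Follow every link on the listing page and extract fields from each detail page."""
    listing = ET.fromstring(pages[list_url])
    rows = []
    for anchor in listing.iter("a"):          # one loop iteration per link
        href = anchor.get("href")
        detail = ET.fromstring(pages[href])   # "click" through to the detail page
        rows.append({
            "url": href,
            "title": detail.findtext(".//h1[@class='title']"),
            "price": detail.findtext(".//span[@class='price']"),
        })
    return rows

rows = scrape_listing("/list", PAGES)
for row in rows:
    print(row)
```

Each dictionary in rows corresponds to one line of extracted data, with one entry per field selected on the detail page.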

2. Set up the workflow manually

  • Click on the first product title that contains the product page URL. The selected title will be highlighted in green while all the other similar product titles will be highlighted in red.

1.png
  • Click Select all in the Tips panel

11.png

TIP: If there is no Select all option in the Tips panel after you select the first URL, please continue to select the second URL.

  • Select Loop click each element or Loop click each URL from the Tips panel. Notice that a Loop click step is automatically generated and added to the workflow.

22.png

TIP: To loop-click through all the links in the list, it is important that you select the anchor element. Octoparse automatically identifies the tags of selected items, so when you select an item with a URL, the selected tag should be "A", which stands for an anchor, the element that usually links one page to another.

If Octoparse does not locate the A tag, you can click "A" in the Tips panel.

A.png
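
To see why the anchor element matters, here is a minimal Python sketch of what "selecting the A tag" amounts to: only anchors carry an href, so only they yield URLs to click through. The sample markup, the example.com base URL, and the AnchorCollector and collect_links names are assumptions made for illustration; this is not Octoparse's own code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":                 # ignore headings, spans, images, etc.
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))

# Hypothetical listing markup: two items with links, one without.
SAMPLE_LISTING = """
<ul>
  <li><a href="/itm/101"><h3>Amplifier A</h3></a><span>$50</span></li>
  <li><a href="/itm/102"><h3>Amplifier B</h3></a><span>$75</span></li>
  <li><span>No link here</span></li>
</ul>
"""

def collect_links(html, base_url):
    parser = AnchorCollector(base_url)
    parser.feed(html)
    return parser.links

links = collect_links(SAMPLE_LISTING, "https://www.example.com")
print(links)
```

Note that the title text sits in an h3 inside the anchor. Selecting the h3 alone would give you the text but no URL, which is why the tag one level up is the one to pick.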
  • Click on target data fields such as title, review, price, etc. to scrape them

1111.png

TIP: Setting a wait time in Options for steps like "Click Item" or "Extract Data" helps prevent skipped data and makes the crawling process more human-like (2-5 seconds usually works well). Then click Apply to confirm.

1112.png
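
The idea behind the wait time can be sketched in a few lines of Python: pause for a short, slightly varied interval before each step so that pages have time to load and requests are not fired back to back. The visit_with_wait function and its parameters are hypothetical, and the demo swaps in stand-ins for the download and the pause so that it runs instantly and offline.

```python
import random
import time

def visit_with_wait(urls, fetch, min_wait=2.0, max_wait=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval before each one."""
    results = []
    for url in urls:
        # Wait somewhere between min_wait and max_wait seconds (2-5 by default).
        sleep(random.uniform(min_wait, max_wait))
        results.append(fetch(url))
    return results

# Demo with stand-ins: fetch returns a fake page, and "sleeping" only
# records the requested wait instead of actually pausing.
recorded_waits = []
pages = visit_with_wait(
    ["/itm/101", "/itm/102", "/itm/103"],
    fetch=lambda url: f"<html>{url}</html>",
    sleep=recorded_waits.append,
)
print(len(pages), "pages fetched;", len(recorded_waits), "waits")
```

In real use you would leave sleep at its default and pass a fetch function that actually downloads the page.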