Click each link in a list and scrape data from new pages
1. Use "Auto-detect" to set up the workflow
2. Set up the workflow manually
1. Use "Auto-detect" to set up the workflow
1) Once you've created a new task using the example URL, select "Auto-detect web page data". Octoparse will detect the data on the page. Then click "Create workflow" to generate the workflow.
Tips! If Octoparse does not select the correct data, you can switch between the detected results to locate the elements you want. If none of the detected results work for you, please refer to "2. Set up the workflow manually".
3) Select "Click on link(s) to scrape the linked page(s)", choose "Click on an extracted data fields" and select a data field (here we select Title_URL) from the dropdown menu.
Alternatively, choose "Click a web link on the web page" and select a link on the page manually.
Note that you can only select a link from the detected sections.
4) Auto-detect the web data again, or click on the target data fields such as title, review, price, etc. to scrape them.
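For readers who want to see the idea behind "Click on an extracted data field" outside the Octoparse interface, here is a minimal Python sketch. It is not how Octoparse works internally. It only illustrates the pattern: each row scraped from the list page carries a URL field (Title_URL in this tutorial), and the data scraped from the linked page is merged back into that row. The function names `scrape_linked_pages` and `fake_scrape_detail`, the sample rows, and the example.com URLs are all hypothetical.

```python
from typing import Callable, Dict, List


def scrape_linked_pages(
    rows: List[Dict[str, str]],
    url_field: str,
    scrape_detail: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Follow the URL stored in url_field for each row and merge the detail data."""
    results = []
    for row in rows:
        url = row.get(url_field)
        if not url:
            # No link was extracted for this row, so keep the list data as-is.
            results.append(dict(row))
            continue
        detail = scrape_detail(url)
        # Detail-page fields are added next to the list-page fields.
        results.append({**row, **detail})
    return results


# Hypothetical stand-in for "open the linked page and extract its fields".
def fake_scrape_detail(url: str) -> Dict[str, str]:
    return {"price": "n/a", "source": url}


rows = [
    {"Title": "Item A", "Title_URL": "https://example.com/a"},
    {"Title": "Item B", "Title_URL": ""},
]
merged = scrape_linked_pages(rows, "Title_URL", fake_scrape_detail)
```

In a real script, `fake_scrape_detail` would be replaced by code that downloads and parses the page. Passing it in as a parameter keeps the loop itself easy to test without network access.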
2. Set up the workflow manually
1) Click on the first product title that contains the product page URL.
The selected title will be highlighted in green while all the other similar product titles will be highlighted in red.
2) Click "Select all" on the Tips panel.
Tips! If there is no "Select all" option on the Tips panel after you select the first URL, continue by selecting the second URL.
3) Select "Loop click each URL" from the Tips panel. Notice that a Loop-click step is automatically generated and added to the workflow.
Tips! To loop click through all the links on the list, it is important that you select the anchor element. Octoparse automatically identifies the tags of selected items, so when you select an item with a URL, the selected tag should be "A", which stands for an anchor that usually links one page to another. If Octoparse does not locate the A tag, you can click "A" on the Tips panel.
4) Click on the target data fields such as title, review, price, etc. to scrape them.
Tips! Setting a wait time in "Advanced Options" for steps like "Click Item" or "Extract Data" can help prevent skipped data and make the crawling process more human-like. (Usually, 2-5 seconds works well.)
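The two tips above (select the anchor element, and wait between steps) can be illustrated with a short Python sketch that uses only the standard library. This is an illustration of the concept, not Octoparse's implementation. The class `AnchorCollector`, the helper `polite_pause`, the sample HTML, and the example.com base URL are all hypothetical.

```python
import random
import time
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the link to the next page.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Wait a random time (2-5 seconds by default) between page visits."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Sample list page: the link sits on the <a> tag, not on the inner <span>.
list_html = """
<ul>
  <li><a href="/item/1"><span>First product</span></a></li>
  <li><a href="/item/2">Second product</a></li>
  <li><span>No link here</span></li>
</ul>
"""
collector = AnchorCollector("https://example.com/list")
collector.feed(list_html)
links = collector.links
```

The first list item shows why the A tag matters: selecting only the inner span would give the title text but no URL to follow. In a loop over `links`, calling `polite_pause()` before loading each page would mirror the wait time set in "Advanced Options".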
If you have any questions, you're welcome to submit a ticket to our Support team.
Article in Spanish: Hacer clic en cada enlace de la lista y extraer datos de páginas nuevas
You can also read web scraping articles on the official website.
Author: Fergus
Editor: Yina