“Loop Item” is one of the most frequently used steps when you build a scraping task in Octoparse, so it is important to set it up correctly.
If your task extracts only the first item and keeps producing duplicates, you may need to revise the “Loop Item” you created in the task.
There are two main reasons why this happens:
1) The data to be extracted is not in the selected area. (For example, you select only the title to create a loop, but then click data outside the title area to extract it.)
This mistake usually happens when you extract data from a list page.
In this case, delete the entire “Loop Item” and build a new one. When you create the loop, select the entire area of an item, because data can only be extracted from inside the selected area. If you cannot select the entire area directly, expand the selection by clicking the icon on "Action Tips" until it includes all the data you need.
2) Data extraction does not start from the first item. When you finish creating a loop, Octoparse marks the first item in red, as shown in the screenshots below, to remind you to start extracting data from the first item.
If you ignore this hint and start extracting data from the second item or any other item, Octoparse may scrape that item’s data on every iteration and produce duplicates. In that case, delete the “Extract Data” step, drag a new “Extract Data” step into your loop, and select the data from the first item, following Octoparse’s hints.
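Both causes come down to the same idea: the data must be located relative to the current loop item, not at a fixed position on the page. The Python sketch below is only an analogy built on the standard library; it does not show how Octoparse works internally, and the sample markup and variable names (PAGE, fixed_rows, relative_rows) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical list page with three items (illustrative markup only).
PAGE = """
<ul>
  <li><h3>Item A</h3><span>10</span></li>
  <li><h3>Item B</h3><span>20</span></li>
  <li><h3>Item C</h3><span>30</span></li>
</ul>
"""

root = ET.fromstring(PAGE)
items = root.findall("li")  # the "loop": one entry per list item

# Problem case: the lookup starts from the page root, so every
# iteration finds the first matching element and the rows repeat.
fixed_rows = [
    (root.find("li/h3").text, root.find("li/span").text)
    for _ in items
]

# Intended case: the lookup starts from the current loop item,
# so each iteration reads that item's own title and price.
relative_rows = [
    (item.find("h3").text, item.find("span").text)
    for item in items
]

print(fixed_rows)
print(relative_rows)
```

In the first list every row repeats the first item, which is the symptom described above; in the second list each row comes from its own item.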
You can follow these two steps to check the “Loop Item” manually:
- Click the first item in your “Loop Item” and check the extracted data, as shown in the screenshot below.
- Click the second item in the “Loop Item” and check the data again. If the extracted data stays the same after you select the second item, follow the solutions above to revise your task.
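If you have already run the task and exported the results, the same check can be done on the data itself. The following is a minimal sketch, assuming a CSV export; the helper all_rows_identical and the sample data are hypothetical and not part of Octoparse.

```python
import csv
import io


def all_rows_identical(rows):
    """Return True when there are at least two rows and all of them are equal."""
    rows = list(rows)
    return len(rows) > 1 and all(row == rows[0] for row in rows[1:])


# Hypothetical export in which every row repeats the first item.
sample_export = "title,price\nItem A,10\nItem A,10\nItem A,10\n"

reader = csv.DictReader(io.StringIO(sample_export))
duplicated = all_rows_identical(reader)
print(duplicated)
```

A result of True suggests the loop is extracting the same item on every iteration, so the “Loop Item” or the “Extract Data” step should be rebuilt as described above.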
Article in Spanish: Why does Octoparse only extract the first item and produce duplicates?
You can also read web scraping articles on the official website.