Sometimes a task extracts only the first item and then stops looping through the remaining items. In most cases this means Octoparse cannot return to the list page after scraping data from the detail page. There are two common causes.
1. The detail page is not set to open in a new tab.
Click the “Click Item” step and you will find an advanced option named “New Tab”. Select “New Tab”, then re-create the steps that follow it.
Remember to re-create every step after “Click Item”: Octoparse now has to identify a new page to extract from, so the previous steps will no longer work.
2. The website uses AJAX to update its content, or is not compatible with Octoparse.
If Octoparse still fails after you select “New Tab”, the website either uses AJAX or is incompatible with Octoparse. A page loaded with AJAX replaces the previous one, so Octoparse cannot get to the next item to scrape. Compatibility issues concern how well Octoparse works with the particular website you want to scrape. In either case, divide your task into two steps: first extract the detail page URLs with Octoparse, then scrape the data you want from that URL list. If you are new to URL list extraction, the video tutorial on that topic covers it in more detail.
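Octoparse builds both steps in its visual workflow, so no code is required. Purely to illustrate the idea of the two-step split, here is a minimal Python sketch. The sample HTML, the `item-link` class name, the `example.com` URLs, and the `fetch` callable are all hypothetical and are not part of Octoparse.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags that carry a given CSS class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and self.link_class in classes:
            self.hrefs.append(href)


def collect_detail_urls(list_html, base_url, link_class="item-link"):
    """Step 1: read the list page once and return absolute detail-page URLs."""
    parser = LinkCollector(link_class)
    parser.feed(list_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def scrape_details(urls, fetch):
    """Step 2: visit each URL on its own; no return to the list page is needed."""
    return [{"url": url, "content": fetch(url)} for url in urls]


# Hypothetical list page, used only for illustration.
SAMPLE_LIST_HTML = """
<ul>
  <li><a class="item-link" href="/items/1">Item 1</a></li>
  <li><a class="item-link" href="/items/2">Item 2</a></li>
  <li><a class="nav" href="/about">About</a></li>
</ul>
"""

detail_urls = collect_detail_urls(SAMPLE_LIST_HTML, "https://example.com/list")
```

Because the second step works from a fixed list of URLs, it never depends on navigating back to the list page, which is the step that fails on AJAX-driven or incompatible sites.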
To check manually whether Octoparse can return to the list page after it enters the detail page, follow these steps.
1. Click “Go To Web Page” to open the website.
2. Click the “Loop Item” box in your workflow.
3. Click “Click Item” to open the detail page.
4. Click “Loop Item” again and see whether Octoparse returns to the list page. If it does not, revise your workflow using the fixes described above.
Spanish version of this article: ¿Por qué Octoparse solo extrae el primer elemento y produzca duplicados?