If your workflow contains a loop click action, Octoparse loads each link in a new tab by default. As it clicks through the links in the list, the tab containing the listing page stays open. In some cases, however, the new page can only be loaded in the current tab. Octoparse then cannot load the second item in the list because it can no longer access the original listing. The result is a loop click action that clicks only the first item and never proceeds to the other items in the list.
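To make the failure easier to picture, here is a minimal sketch in Python. It is a toy model only: `ToyBrowser` and `run_loop` are hypothetical names made up for illustration and do not reflect how Octoparse is implemented internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBrowser:
    """Hypothetical stand-in for a browser; not Octoparse's real internals."""
    tabs: list = field(default_factory=lambda: ["listing"])

    def click(self, item: str, new_tab: bool) -> None:
        if new_tab:
            self.tabs.append(item)   # the listing tab stays open
        else:
            self.tabs[0] = item      # the detail page overwrites the listing

    def close_detail_tab(self) -> None:
        if len(self.tabs) > 1:
            self.tabs.pop()


def run_loop(items, new_tab):
    """Click every item, stopping as soon as the listing is unreachable."""
    browser = ToyBrowser()
    clicked = []
    for item in items:
        if "listing" not in browser.tabs:
            break                    # listing is gone, so the loop cannot continue
        browser.click(item, new_tab)
        clicked.append(item)
        browser.close_detail_tab()
    return clicked


print(run_loop(["a", "b", "c"], new_tab=True))   # all three items are clicked
print(run_loop(["a", "b", "c"], new_tab=False))  # only the first item is clicked
```

In this model, losing the listing page after the first click is what stops the loop, which matches the symptom described above.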
How do I know if my Loop is working or not?
You can check to see if the Loop Item is working correctly by manually clicking through the actions in the workflow.
- Click the "Loop Item" after the listing page loads
- Click "Click Item" and wait for the new page to load
- Click the "Loop Item" again to see if Octoparse shows the listing page
If you can see the listing page, that means Octoparse is able to return to the listing page and the Loop should be working. Otherwise, you will need to modify the workflow to make it work. Below are a few quick fixes you can try for yourself.
1. Open the detail page in a new tab
For websites that support opening links in a new tab, go to the settings of the Click Item action (the one that opens the new page) and check whether "Open in a new tab" is ticked. If it is not, ticking the option should resolve the issue.
2. Add a Back button
If the Loop Item is still not working correctly even with "Open in a new tab" selected, chances are the new pages cannot be loaded in a new tab.
In this case, the new page overwrites the listing page as it loads, so Octoparse cannot switch back to the listing page.
To resolve this, look for any buttons that will take you back to the listing page. In the example below, it would be the "Insurance" button. If there's a button like this, set up a Click action using the "Insurance" button. This way, when Octoparse finishes scraping data on the new page, it will click the "Insurance" button to return to the original list and continue to loop through other items in the list.
- Click on the "Insurance" button
- Choose "Click element" or "Click button" on the Tips panel
- Adjust the AJAX timeout to make it long enough for the page to render
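The workflow should then have the Click action on the "Insurance" button as the last step inside the Loop Item, after the data extraction step.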
3. Add a "Go to Web Page"
If "Open in a new tab" does not help and no "Back" button is available, there is still another trick you can try: add a "Go to Web Page" action manually to help Octoparse go back to the original listing page.
Hover over the workflow and add a "Go to Web Page" step as the last step inside the "Loop Item". This action reopens the listing page every time Octoparse finishes scraping the current item page. However, this trick may not work well if you are scraping a list that spans multiple pages, since reopening the listing URL brings Octoparse back to the page that URL points to, usually the first page of the list.
- Add a "Go to Web Page" action to the workflow
- Copy and paste the URL of the listing page
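The effect of the extra "Go to Web Page" step, and its limit with multi-page lists, can be sketched in Python. This is a toy model under stated assumptions: `ToyTab`, `loop_with_reopen` and `LISTING_URL` are hypothetical names for illustration, and the model assumes the listing URL always opens the first page of the list.

```python
LISTING_URL = "https://example.com/list"  # hypothetical listing URL


class ToyTab:
    """Hypothetical single-tab browser; a model, not Octoparse's real internals."""

    def __init__(self):
        self.url = None
        self.page = None  # current page number of the list, if on the listing

    def go_to(self, url):
        self.url = url
        # Assumption: opening the listing URL always lands on page 1.
        self.page = 1 if url == LISTING_URL else None

    def next_page(self):
        self.page += 1

    def click_item(self, item):
        # The detail page overwrites the listing in the same tab.
        self.url = f"{LISTING_URL}/{item}"
        self.page = None


def loop_with_reopen(tab, items):
    """Visit each item, reopening the listing as the last step of each pass."""
    visited = []
    for item in items:
        tab.click_item(item)
        visited.append(item)
        tab.go_to(LISTING_URL)  # the added "Go to Web Page" step
    return visited


tab = ToyTab()
tab.go_to(LISTING_URL)
print(loop_with_reopen(tab, ["a", "b"]))  # every item on page 1 is visited

tab.next_page()                # move on to page 2 of the list
loop_with_reopen(tab, ["c"])   # scrape one item found on page 2
print(tab.page)                # the tab is back on page 1, not page 2
```

The last two lines show why the trick struggles with pagination: after the first item on page 2, the reopened listing starts again from page 1.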
4. Split the task into two tasks
Last but not least, regardless of how long your list is or why the page is not loading in a new tab, you can always try splitting the task into two: one task that fetches the URLs embedded in the list items, and another that extracts specific information from each of those URLs. This trick works well because it is reliable and it makes the scraping process much more efficient, as Octoparse no longer has to switch back and forth between tabs. Check Scraping product information from Target.com to see how it is done step-by-step.
To further automate the scraping process, you can even link the tasks so they run together using the "import URLs from another task" feature.
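The two-task idea can be sketched outside Octoparse as well. The Python below is only an illustration of the pattern, not how Octoparse implements it: `collect_urls`, `scrape_details`, the sample HTML and the `example.com` URLs are all hypothetical, and the page fetch is faked with a dictionary so the sketch runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_urls(listing_html, base_url):
    """Task 1: pull every item URL out of the listing page."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def scrape_details(urls, fetch, extract):
    """Task 2: visit each URL directly; the listing page is never needed."""
    return [extract(fetch(url)) for url in urls]


# Hypothetical listing page and detail pages, used instead of real requests.
listing = (
    '<ul><li><a href="/item/1">One</a></li>'
    '<li><a href="/item/2">Two</a></li></ul>'
)
urls = collect_urls(listing, "https://example.com/list")
fake_pages = {url: f"<h1>Detail for {url}</h1>" for url in urls}
rows = scrape_details(urls, fetch=fake_pages.get, extract=str.upper)
print(urls)
print(rows)
```

Because the second step works from a plain list of URLs, it does not matter whether the site can open detail pages in a new tab or whether the list spans several pages.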