When you create a loop click action in the workflow, Octoparse by default loads each individual link in a new tab. This way, while it clicks open each link in the list, it never loses the tab containing the listing page. However, in some cases the new page can only be loaded in the current tab. Octoparse is then unable to load the second item in the list because it can no longer access the original listing. As a result, you get a loop click action that only clicks the first item and never proceeds to the other items in the list.
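To make the failure mode concrete, here is a toy Python model. It is not Octoparse's implementation; `run_loop`, `LISTING` and the `detail:` page names are hypothetical. It treats the browser as a list of tabs and shows why a same-tab load stops the loop after the first item.

```python
# Toy model (hypothetical, not Octoparse internals) of a loop click action.
LISTING = "listing"


def run_loop(items, open_in_new_tab):
    """Click each item in turn and return the detail pages that got scraped.

    tabs[-1] is the active tab. The next item can only be clicked while
    the listing page is the active tab.
    """
    tabs = [LISTING]
    scraped = []
    for item in items:
        if tabs[-1] != LISTING:
            break                 # listing is gone, so the next item can't be found
        detail = f"detail:{item}"
        if open_in_new_tab:
            tabs.append(detail)   # detail page opens in a new tab
            scraped.append(detail)
            tabs.pop()            # close it; the listing tab is active again
        else:
            tabs[-1] = detail     # detail page overwrites the listing tab
            scraped.append(detail)
    return scraped
```

For example, `run_loop(["a", "b", "c"], True)` returns all three detail pages, while `run_loop(["a", "b", "c"], False)` returns only `["detail:a"]`, the same symptom described above.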
How do I know if my Loop is working or not?
You can check to see if the loop is working correctly by manually clicking through the actions in the workflow.
- Click the "Loop Item" after the listing page loads
- Click "Click Item" and wait for the new page to load
- Click the "Loop Item" again to see if Octoparse shows the listing page
If you can see the listing page, Octoparse is able to return to it and the loop should be working. Otherwise, you'll need to modify the workflow to make it work. Below are a few quick fixes you can try yourself.
1. Open the detail page in a new tab
For websites that support loading links in a new tab, open the settings of the "Click Item" action (the step that clicks open the new page) and check whether "Open in a new tab" is ticked. If it isn't, ticking the option should resolve the issue.
2. Add a Back button
If the loop still doesn't work correctly even with "Open in a new tab" selected, chances are the new pages are loaded with AJAX. In that case, the new page overwrites the listing page as it loads, so Octoparse cannot switch back to the listing page.
To solve this, look for a button that takes you back to the listing page. In the example below, it is the "Back to Search Results" button. If there's a button like this, set up a Click action on it. This way, when Octoparse finishes scraping data on the new page, it clicks the "Back" button to return to the original list and continues looping through the remaining items.
- Click on the "Back" button
- Choose "Click the element" or "Click the button" on the Tips panel
- Adjust the AJAX timeout to make it long enough for the page to render
The workflow should be like this:
3. Add a "Go to Web Page" action
Even when "Open in a new tab" is selected and no "Back" button is available, there's still another trick you can try: manually add a "Go to Web Page" action to help Octoparse return to the original listing page.
Hover over the workflow and add a "Go to Web Page" step as the last step inside the "Loop Item". This action reopens the listing page every time Octoparse finishes scraping the current item page. However, this trick may not work well if the list you are scraping spans multiple pages.
- Add a "Go to Web Page" action to the workflow
- Copy and paste the URL of the listing page
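One plausible reason the trick struggles with multi-page lists is that the pasted URL is fixed. The toy Python model below assumes, hypothetically, that the pasted URL always points to the first listing page. `scrape_with_reload` and the page and item names are made up for illustration and do not reflect how Octoparse works internally.

```python
def scrape_with_reload(pages, listing_url):
    """Toy model: after each detail page, navigate to the fixed listing_url.

    pages maps each listing-page URL to the items shown on it, in
    pagination order. Returns the items that were actually scraped.
    """
    scraped = []
    for page_url, items in pages.items():
        current = page_url                  # pagination lands on this page
        for item in items:
            if item not in pages[current]:
                continue                    # item isn't on the page in view: missed
            scraped.append(item)            # click the item and scrape its detail page
            current = listing_url           # "Go to Web Page" reopens the pasted URL
    return scraped
```

With `pages = {"p1": ["a", "b"], "p2": ["c", "d"]}` and `listing_url = "p1"`, the model scrapes `a`, `b` and `c` but misses `d`: after `c`, the tab is back on page 1, where `d` does not appear. With a single listing page, every item is scraped.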
4. Split the task into two tasks
Last but not least, no matter how long your list is or why the page won't load in a new tab, you can always split the task into two: one task that fetches the URLs embedded in the list items, and another that extracts specific information from each of those URLs. This approach works especially well: not only is it reliable, it also makes scraping more efficient, since Octoparse no longer has to switch back and forth between tabs. See "Scraping product information from Target.com" for a step-by-step walkthrough.
To further automate the scraping process, you can link the two tasks so they run together using the "import URLs from another task" feature.
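The two-task idea can be sketched outside Octoparse as well. The Python below is a minimal sketch using only the standard library, not an Octoparse feature or API. It assumes the item links are `<a>` tags sharing a CSS class (`"item"` in the usage example, a hypothetical value). `collect_urls` plays the role of the first task and `scrape_details` the second. The `fetch` callable is a placeholder you would supply, so the sketch runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute hrefs of <a> tags that carry the given class."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if tag == "a" and self.link_class in classes and href:
            # Resolve relative links against the listing page URL.
            self.urls.append(urljoin(self.base_url, href))


def collect_urls(listing_html, base_url, link_class):
    """Task 1 (sketch): pull the detail-page URLs out of a listing page."""
    parser = LinkCollector(base_url, link_class)
    parser.feed(listing_html)
    parser.close()
    return parser.urls


def scrape_details(urls, fetch):
    """Task 2 (sketch): visit each URL directly; no trip back to the listing."""
    return [fetch(url) for url in urls]
```

For example, feeding `'<a class="item" href="/p/1">One</a><a class="item" href="/p/2">Two</a><a href="/help">Help</a>'` with base URL `https://example.com/search` and class `"item"` yields `https://example.com/p/1` and `https://example.com/p/2`. Because the second task never depends on the listing page staying open, the tab problem described at the start of this article cannot occur.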