Deep dives into websites
I just wanted to share a tip for a configuration I was struggling with. My aim was to scrape information using a list of keywords (which the tutorials showed how to do well). My issue was that reaching the data required three clicks: first on the search bar, then on a link within the results, then on a second link within those results.
In all my attempts, the scraper would extract the data for the first item but would never return to the original page where input was required. I tried with and without a pagination loop (it seems all the video examples on YouTube use essentially that configuration), as well as other loop configurations.
Here is the configuration that worked for me (simpler than some of my other attempts). I assume even deeper dives into a website would still work with the same configuration:
- Go to Web Page
- Loop Item:
  - Go to Web Page [the same page as above]
  - Enter text [my keyword list]
  - Click Item [link within results]
  - Click Item [link within results]
  - Click Item [link within results]
  - Extract Data
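For anyone who thinks better in code, here is a rough Python sketch of why the second Go to Web Page inside the loop matters. It is only a toy model: `FakeBrowser`, `scrape`, the `reopen_each_time` flag, the page names, and the example.com URL are all made up for illustration and have nothing to do with how Octoparse is implemented internally.

```python
class FakeBrowser:
    """Toy stand-in for a browser session (hypothetical, not Octoparse's API)."""

    SEARCH_URL = "https://example.com/search"  # hypothetical start page

    def __init__(self):
        self.url = None
        self.page = None      # name of the page currently shown
        self.keyword = None

    def go_to(self, url):
        # Loading the start page always puts us back on the search form.
        self.url = url
        self.page = "search"

    def enter_text(self, keyword):
        # The search box only exists on the search page.
        if self.page != "search":
            raise RuntimeError("no search box on page: %s" % self.page)
        self.keyword = keyword

    def click(self, target):
        # Each click moves one level deeper into the site.
        self.page = target

    def extract(self):
        return {"keyword": self.keyword, "page": self.page}


def scrape(keywords, reopen_each_time):
    browser = FakeBrowser()
    browser.go_to(FakeBrowser.SEARCH_URL)          # Go to Web Page
    rows = []
    for keyword in keywords:                       # Loop Item
        if reopen_each_time:
            browser.go_to(FakeBrowser.SEARCH_URL)  # Go to Web Page [same page]
        try:
            browser.enter_text(keyword)            # Enter text
        except RuntimeError:
            break  # still on the detail page, so the loop cannot continue
        browser.click("results")                   # Click Item
        browser.click("listing")                   # Click Item
        browser.click("detail")                    # Click Item
        rows.append(browser.extract())             # Extract Data
    return rows


with_reopen = scrape(["alpha", "beta", "gamma"], reopen_each_time=True)
without_reopen = scrape(["alpha", "beta", "gamma"], reopen_each_time=False)
print(len(with_reopen), len(without_reopen))
```

In this model, reopening the start page at the top of every iteration yields one row per keyword, while skipping it yields only the first row, because the browser is still sitting on the deepest page when the next keyword needs a search box.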
Official comment
When Octoparse cannot return to the original page, you need to either find a button on the web page that takes you back or add another Go to Web Page step. Here is a related tutorial: https://helpcenter.octoparse.com/hc/en-us/articles/900004440263-Why-does-Octopasre-only-click-the-first-item-in-a-Loop-and-stop-Version-8-