Are there any rules you could share about using the IP rotation option when extracting Bing and Google search results?


1 comment

  • Kara

    Hi there,

    Thank you for reaching out.

    When a task runs with cloud extraction, the IP address is rotated automatically. For more information, please refer to: How does IP rotation work?

    When running with local extraction, you can add proxies to avoid getting blocked (proxies can only be used with local extraction at the moment). To learn how to set this up, please refer to: Set up proxies

    Scraping a large amount of data from a website in a very short period of time is very likely to trigger a robot test (CAPTCHA/reCAPTCHA) and get the task blocked. With CAPTCHA/reCAPTCHA, it is easier to avoid triggering the test than to solve it. For most people, the easiest way is to slow down or randomize the extraction process. Adjusting the delay time or using the anti-blocking settings can effectively reduce the probability of triggering the test. Here are some related tutorials for your reference:
    Set up wait time
    Octoparse Anti-Blocking Settings
    How to scrape websites without being blocked?
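
    To illustrate the "slow down or randomize" idea outside of the Octoparse settings, here is a minimal Python sketch. It is not Octoparse code: the function name random_delay and the 2 to 6 second range are assumptions chosen for illustration, and a suitable range depends on the target site.

```python
import random
import time


def random_delay(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Pause for a random duration and return the number of seconds waited.

    The 2-6 second default range is an illustrative assumption, not a
    recommended value. The `sleep` parameter exists only so the pause can
    be swapped out (for example in tests).
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    # A different wait each time makes the request timing less regular
    # than a fixed delay would.
    wait = random.uniform(min_seconds, max_seconds)
    sleep(wait)
    return wait
```

    Calling random_delay() between two page loads would space the requests by a varying interval instead of a fixed one.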

    If the CAPTCHA does pop up, you can solve it manually in the local extraction window. To learn how, please check:
    Is Octoparse able to handle CAPTCHA/reCAPTCHA?

    For this project, the list of keywords is too long. We suggest using only 100-200 keywords per run. Please try dividing the list into smaller batches first.
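
    If the keyword list lives in a file or spreadsheet, one way to divide it is a small script like the sketch below. The helper name split_keywords and the default batch size of 150 are assumptions for illustration; any size within the 100-200 range works the same way.

```python
def split_keywords(keywords, batch_size=150):
    """Split a list of keywords into consecutive batches.

    Every batch holds `batch_size` keywords except possibly the last one,
    which holds whatever remains. The default of 150 is an illustrative
    choice within the suggested 100-200 range.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [
        keywords[start:start + batch_size]
        for start in range(0, len(keywords), batch_size)
    ]
```

    Each resulting batch can then be used as the keyword input for a separate run of the task.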

    Best regards,
