Web scraping, if not done responsibly, can put a heavy load on the target website, so some websites take measures to discourage it. If the website you are going to scrape uses anti-scraping measures such as IP blocking, Octoparse offers three options that can substantially reduce the chance of being blocked:

  1. Use IP proxies

  2. Auto-rotate web browser (User-agent)

  3. Auto clear cookies

TIP: All anti-blocking options can be found in Task Setting.


Use IP proxies

You can set up proxies manually in Octoparse if you would like to access the website with external proxies (e.g. from a specific country) or if you prefer to use your own proxies to protect your local IP. For more information about how to set up proxies, please refer to Set up proxies.

  • Check the Use IP proxies box and click Settings

  • Enter the proxies and set the switch interval in seconds

  • Confirm the changes


Octoparse will automatically switch proxies according to your setup when running specific tasks.
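
Octoparse handles the switching for you, so no code is required. For readers who want to see the idea behind time-based proxy rotation, here is a minimal Python sketch. It is not Octoparse's implementation: the ProxyRotator class, its parameters, and the proxy addresses (taken from a reserved documentation range) are all hypothetical.

```python
import itertools
import time


class ProxyRotator:
    """Cycle through a proxy list, switching after a fixed number of seconds."""

    def __init__(self, proxies, interval_seconds, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = itertools.cycle(proxies)
        self._interval = interval_seconds
        self._clock = clock  # injectable so the timing can be tested
        self._current = next(self._pool)
        self._switched_at = clock()

    def current(self):
        """Return the active proxy, moving to the next one once the interval has elapsed."""
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._pool)
            self._switched_at = now
        return self._current


# Placeholder addresses only (203.0.113.0/24 is reserved for documentation).
rotator = ProxyRotator(["203.0.113.10:8080", "203.0.113.11:8080"], interval_seconds=30)
print(rotator.current())
```

Each request would ask the rotator for the current proxy, so the switch happens on the first request after the interval has passed.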


Auto-rotate web browser (User-agent)

Your browser sends a user agent (UA) with every request for a web page. This is a string that tells the target website which browser and device you are using. If you scrape a website repeatedly with the same user agent, the traffic is easy to identify as coming from a scraping bot. Rotating the user agent reduces the chance of being blocked.

To set up the auto-rotate browser:

  • Check the Auto-rotate web browsers box

  • Click Settings to select a user agent

  • Confirm the setting


Not every UA works for every website, so you might need to experiment a bit. If you want Octoparse to visit the website as a PC, check the "Select all" box and then uncheck every mobile user agent, such as "Firefox for mobile". If you want Octoparse to visit the website as a mobile device, check only the mobile user agents.

  • Set how often you'd like to rotate the user agents or select Switch IPs concurrently

Octoparse will automatically switch the user agent every X minutes when the task is running locally or in the Cloud.
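
To illustrate the two choices above (PC versus mobile agents, and how often to rotate), here is a small Python sketch. It is only a conceptual model, not Octoparse's code: the UserAgent class, the helper functions, and the agent names and header strings are invented for the example and are not Octoparse's built-in list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAgent:
    name: str
    mobile: bool
    header: str  # value sent in the User-Agent request header


# Illustrative entries only; the header strings are made up.
AGENTS = [
    UserAgent("Desktop A", False, "ExampleDesktopBrowser/1.0"),
    UserAgent("Desktop B", False, "ExampleDesktopBrowser/2.0"),
    UserAgent("Mobile A", True, "ExampleMobileBrowser/1.0"),
]


def select_agents(agents, device):
    """Keep only the agents matching device ('pc' or 'mobile')."""
    want_mobile = device == "mobile"
    return [a for a in agents if a.mobile == want_mobile]


def agent_at(agents, elapsed_minutes, rotate_every_minutes):
    """Return the agent in use after elapsed_minutes of running time."""
    slot = int(elapsed_minutes // rotate_every_minutes)
    return agents[slot % len(agents)]


pc_agents = select_agents(AGENTS, "pc")
for minute in (0, 4, 5, 10):
    print(minute, agent_at(pc_agents, minute, rotate_every_minutes=5).name)
```

Mixing PC and mobile agents in one pool can make a site serve different page layouts between rotations, which is one reason to keep the selection to a single device type.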


Auto clear cookies

If you scrape a website repeatedly with the same cookies, the traffic is easy to identify as scraping bot activity. With this feature, Octoparse clears the cookies at a set interval, so each subsequent visit looks like a first-time visit.

  • Check the Auto clear cookies box

  • Set how often you'd like to clear the cookies or select Clear cookies when IPs rotate


Octoparse will automatically clear cookies every X seconds when the task is running locally or in the Cloud.
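
As a rough picture of what periodic cookie clearing means, the following Python sketch wraps the standard library's CookieJar and empties it once an interval has passed. It is an assumption-laden illustration, not how Octoparse works internally: ExpiringCookieJar, make_cookie, and the cookie values are hypothetical names made up for this example.

```python
import time
from http.cookiejar import Cookie, CookieJar


class ExpiringCookieJar:
    """Wrap a CookieJar and empty it once a fixed interval has passed."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.jar = CookieJar()
        self._interval = interval_seconds
        self._clock = clock  # injectable so the timing can be tested
        self._cleared_at = clock()

    def clear_if_due(self):
        """Clear all cookies if the interval has elapsed; return True when cleared."""
        now = self._clock()
        if now - self._cleared_at >= self._interval:
            self.jar.clear()
            self._cleared_at = now
            return True
        return False


def make_cookie(name, value, domain):
    """Build a simple session cookie for demonstration purposes."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cookies = ExpiringCookieJar(interval_seconds=60)
cookies.jar.set_cookie(make_cookie("session", "abc123", "example.com"))
print(len(cookies.jar), cookies.clear_if_due())
```

A scraper built this way would call clear_if_due() before each request, so the site sees a visitor without stored cookies once per interval.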

  • Remember to Save your setup!


NOTE: The anti-blocking settings cannot guarantee that a website's blocking mechanisms are bypassed. The most reliable approach is to treat the website considerately and keep the request rate low.
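
To make "keep the request rate low" concrete, here is a minimal Python sketch of pausing between requests with a little random variation. It is a general illustration rather than an Octoparse feature: polite_delay, fetch_all, the stub fetcher, and the delay values are all assumptions chosen for the example.

```python
import random
import time


def polite_delay(base_seconds, jitter_seconds, rng=random):
    """Return a wait time of base_seconds plus a random extra of up to jitter_seconds."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def fetch_all(urls, fetch, base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:  # no need to wait before the very first request
            sleep(polite_delay(base_seconds, jitter_seconds))
        results.append(fetch(url))
    return results


# Demo with a stub fetcher and tiny delays so it finishes quickly.
pages = fetch_all(
    ["https://example.com/page/1", "https://example.com/page/2"],
    fetch=lambda url: f"fetched {url}",
    base_seconds=0.01,
    jitter_seconds=0.01,
)
print(pages)
```

A suitable delay depends on the site; a few seconds between pages is a common starting point, but that figure is a rule of thumb, not a guarantee against blocking.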
