Some websites are very sensitive to web scraping and take anti-scraping measures, such as IP blocking, to prevent scraping activity.
In this tutorial, we will show you how to configure the Octoparse Anti-Blocking options in a task's "Settings" to reduce the chance of being blocked.
Use IP proxies (for local extraction only)
You can set up proxies manually in Octoparse if you want to access the website through external proxies (e.g. from a specific country) or prefer to use your own proxies to protect your local IP. For more information about how to set up proxies, please refer to Set up proxies.
- Check the box for "Use IP proxies" and click "Settings".
- Enter the proxies and the interval, in seconds, at which to switch proxies.
- Click "OK" to save the change.
Octoparse will automatically switch proxies according to your settings when the task is running locally.
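To make the idea of timed proxy switching concrete, here is a minimal Python sketch of the general technique. It is not Octoparse's implementation: the `ProxyRotator` class, its parameters, and the proxy addresses (taken from a documentation-only IP range) are all hypothetical and used for illustration only.

```python
import time


class ProxyRotator:
    """Cycle through a list of proxies, moving on every `interval` seconds.

    Hypothetical helper for illustration; not part of Octoparse.
    """

    def __init__(self, proxies, interval, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._interval = interval
        self._clock = clock  # injectable so the rotation can be tested
        self._index = 0
        self._last_switch = clock()

    def current(self):
        """Return the proxy to use now, switching if the interval has elapsed."""
        now = self._clock()
        if now - self._last_switch >= self._interval:
            self._index = (self._index + 1) % len(self._proxies)
            self._last_switch = now
        return self._proxies[self._index]


# Placeholder addresses from the 203.0.113.0/24 documentation range.
rotator = ProxyRotator(
    ["http://203.0.113.10:8080", "http://203.0.113.11:8080"],
    interval=30,
)
proxy_for_next_request = rotator.current()
```

Each request asks the rotator for the current proxy, so the switch happens on the first request made after the interval has passed.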
Auto-rotate browser (User-agent)
Your browser sends a string known as a user agent to every web page you visit. This string tells the target website what kind of browser and device you are using to access the page. When a website is scraped repeatedly with the same user agent, the traffic is easy to identify as bot activity. Rotating the user agent with this feature reduces the chance of being blocked.
To set up the auto-rotate browser:
- Check the box for "Auto-rotate browser (User-agent)".
- Click "Settings" to select the user agents.
Not all user agents work for every website, so you might need to do some testing. If you want Octoparse to visit the website "via PC", check the box for "Select all" and then uncheck all the mobile user agents, such as "Firefox for mobile". If you want Octoparse to visit the website "via mobile", check only the boxes for the mobile user agents.
- Click "OK" to save the change.
- Either select how often to switch user agents, or check the box for "Switch IPs concurrently" when the task runs with IP proxies.
Octoparse will automatically switch the user agent according to your settings when the task is running locally or in the Cloud.
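As a rough illustration of what rotating user agents means at the HTTP level, the following Python sketch cycles the `User-Agent` request header through a small pool, split into desktop and mobile groups like the "via PC" and "via mobile" choices above. The pool contents are sample strings and the function names are hypothetical; this is not how Octoparse stores or selects its user agents.

```python
import itertools

# Sample user-agent strings for illustration only.
USER_AGENTS = {
    "desktop": [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
        "Gecko/20100101 Firefox/121.0",
    ],
    "mobile": [
        "Mozilla/5.0 (Android 13; Mobile; rv:121.0) Gecko/121.0 Firefox/121.0",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
        "Mobile/15E148 Safari/604.1",
    ],
}


def user_agent_cycle(device="desktop"):
    """Return an endless round-robin iterator over one device group."""
    return itertools.cycle(USER_AGENTS[device])


def build_headers(agents):
    """Build request headers using the next user agent in the rotation."""
    return {"User-Agent": next(agents)}


agents = user_agent_cycle("desktop")
headers_for_first_request = build_headers(agents)
headers_for_second_request = build_headers(agents)
```

Keeping desktop and mobile strings in separate groups matters because many websites serve a different page layout to mobile user agents, which can break extraction rules built against the desktop layout.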
Auto clear cookies
When a website is scraped repeatedly with the same cookies, the traffic is easy to identify as bot activity. With this feature, Octoparse clears the cookies from time to time so that each visit looks like a first-time visit to the web page.
- Check the box for "Auto clear cookies".
- Either select how often to clear cookies or check the box for "Clear cookies when IPs switch".
Octoparse will automatically clear cookies according to your settings when the task is running locally or in the Cloud.
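The effect of clearing cookies periodically can be sketched in Python with the standard library's `http.cookiejar.CookieJar`. This example assumes a count-based schedule (clear after every N requests) purely for simplicity; the `AutoClearingCookies` class and its `clear_every` parameter are hypothetical and do not reflect Octoparse's internals.

```python
from http.cookiejar import CookieJar


class AutoClearingCookies:
    """Wrap a CookieJar and wipe it after every `clear_every` requests.

    Hypothetical helper for illustration; not part of Octoparse.
    """

    def __init__(self, clear_every):
        if clear_every < 1:
            raise ValueError("clear_every must be at least 1")
        self.jar = CookieJar()
        self._clear_every = clear_every
        self._requests = 0

    def before_request(self):
        """Call once before each request; clears cookies when the quota is used up."""
        if self._requests and self._requests % self._clear_every == 0:
            self.jar.clear()  # the next request looks like a first visit
        self._requests += 1


cookies = AutoClearingCookies(clear_every=2)
cookies.before_request()  # request 1: any cookies set are kept
cookies.before_request()  # request 2: cookies still kept
cookies.before_request()  # request 3: jar is cleared first
```

A jar like this could be attached to `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies.jar))` so that responses populate it automatically.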
After setting up Octoparse Anti-Blocking, you can click "Save" to save the settings.
*Note that the anti-blocking settings cannot guarantee that a task will get past a website's blocking mechanisms. The best approach is to treat the website politely and limit the access speed.
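Limiting the access speed usually means enforcing a minimum gap between consecutive requests. The Python sketch below shows one simple way to do that. The `Throttle` class and the two-second interval are illustrative assumptions, not an Octoparse setting or API.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait().

    Hypothetical helper for illustration; not part of Octoparse.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._last = None

    def wait(self):
        """Block until the minimum interval since the previous call has passed."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=2.0)
throttle.wait()  # first call returns immediately; later calls may pause
```

Calling `throttle.wait()` before every request caps the request rate regardless of how quickly each page is processed.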
Should you have any questions, feel free to leave your message.