This tutorial covers the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. If you have not upgraded yet, download the latest version first.
Speeding up a slow task can be a challenge, particularly when the task is complicated. This article helps you troubleshoot a task that runs very slowly. It goes through the likely causes so that you can determine whether the problem comes from the local environment, the website structure, or simply the settings of your task.
Situation 1: Too many steps in the workflow
Resolution 1: Simplify your task
A workflow sometimes takes too many steps to reach the target page. Simplify it by deleting unnecessary steps, such as click actions used only for navigation. Start the task from the URL of the page closest to the data you want, so the workflow is shorter and more straightforward.
For example, suppose you want to extract 3D glasses from Amazon. Avoid clicking through the categories layer by layer to reach the 3D glasses product page.
Instead, use the URL of the 3D glasses product page directly to start your task.
Resolution 2: Split your task
When your task needs to click through a list of elements to get the data, try splitting it into two tasks.
- Task 1: Extract the URL of each entry from the listing page.
- Task 2: Use the list of URLs extracted by Task 1 to set up a new task that extracts data from the detail pages.
For an example case, check out: Scraping property data from Realtor.com
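The two-task split above can be illustrated outside Octoparse with a minimal Python sketch. This is not how Octoparse implements it. The page contents, the example.com URLs, and the helper names (collect_detail_urls, scrape_details) are all hypothetical, and an in-memory dictionary stands in for real page downloads.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class TitleCollector(HTMLParser):
    """Collects the text inside <h1> tags."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data


def collect_detail_urls(listing_html, base_url):
    """Task 1: turn the listing page into a list of absolute detail URLs."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return [urljoin(base_url, href) for href in parser.links]


def scrape_details(urls, fetch):
    """Task 2: visit each detail URL directly, with no clicking involved."""
    rows = []
    for url in urls:
        parser = TitleCollector()
        parser.feed(fetch(url))
        rows.append({"url": url, "title": parser.title.strip()})
    return rows


# Hypothetical pages standing in for a real website.
FAKE_SITE = {
    "https://example.com/listing": '<a href="/item/1">One</a><a href="/item/2">Two</a>',
    "https://example.com/item/1": "<h1>First item</h1>",
    "https://example.com/item/2": "<h1>Second item</h1>",
}

listing_url = "https://example.com/listing"
urls = collect_detail_urls(FAKE_SITE[listing_url], listing_url)
rows = scrape_details(urls, FAKE_SITE.__getitem__)
```

Because the second step only needs a list of URLs, it no longer depends on the listing page staying open, which is the reason the split tends to run faster than clicking each entry in turn.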
Situation 2: The website applies AJAX but you have not set it up
Resolution: Set a proper AJAX time
Many websites use AJAX to update part of a page without reloading the entire page. When a page loads its content with AJAX but no AJAX time is set for the step, the task may get stuck and run very slowly. An appropriate AJAX time lets the extraction proceed smoothly.
TIP: For more details about AJAX time settings, check out Handling AJAX
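The effect of a time limit on AJAX-loaded content can be sketched in plain Python. This is a generic illustration of a bounded wait, not Octoparse's internal logic. The names wait_for and make_delayed_content are hypothetical, and the 0.2-second delay simply simulates content that arrives late.

```python
import time


def wait_for(condition, timeout, poll_interval=0.01):
    """Poll condition() until it is true or the timeout (seconds) runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    # One last check once the deadline has passed.
    return condition()


def make_delayed_content(delay):
    """Simulate AJAX content that becomes available after `delay` seconds."""
    ready_at = time.monotonic() + delay
    return lambda: time.monotonic() >= ready_at


# A limit shorter than the load time gives up before the content arrives.
too_short = wait_for(make_delayed_content(0.2), timeout=0.05)

# A limit longer than the load time returns as soon as the content is ready.
long_enough = wait_for(make_delayed_content(0.2), timeout=1.0)
```

In this sketch the short limit gives up before the content appears, while the longer one returns as soon as the content is ready rather than waiting out the full second. The practical takeaway is that the limit should be somewhat longer than the page's typical load time.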
Situation 3: The local environment is not good (local runs)
Resolution 1: Improve the local environment
If a local extraction runs slowly, the cause is likely the local environment: the operating system, hardware capacity, IP address, network bandwidth, CPU performance, and so on. You will need to check each of these factors manually.
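As a rough aid for that manual check, the following Python sketch records a few local facts and times a single page download. It is only an assumption about what is useful to measure and is unrelated to Octoparse itself. The function names local_snapshot and time_fetch are hypothetical, and the fetch parameter exists so the timing logic can be tried without a network connection.

```python
import os
import platform
import time
import urllib.request


def local_snapshot():
    """Return basic facts about the machine running the task."""
    return {
        "os": platform.system(),
        "os_version": platform.release(),
        "cpu_cores": os.cpu_count(),
    }


def time_fetch(url, fetch=None, timeout=10):
    """Return (seconds elapsed, bytes received) for one download of url."""
    if fetch is None:
        # Default: a real HTTP request using the standard library.
        def fetch(target):
            with urllib.request.urlopen(target, timeout=timeout) as response:
                return response.read()

    start = time.perf_counter()
    body = fetch(url)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)


snapshot = local_snapshot()
```

Calling time_fetch with the URL of the site you are scraping, a few times and at different times of day, gives a rough sense of whether the network is the bottleneck. A consistently slow download points to bandwidth or the website rather than the task settings.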
Resolution 2: Run tasks in the Cloud (premium users only)
Problems with the local environment, such as limited hardware or bandwidth, are often hard to fix. In that case, running tasks in the Cloud is usually a more practical way to speed up data extraction with Octoparse.
Check out how to speed up the tasks in this tutorial: How can I scrape data faster in Cloud? (Version 8)
Situation 4: The website content might take a longer time to be fully loaded
When a website contains many heavy elements such as images or videos, its pages load more slowly. This is another common reason why certain tasks run slowly.
Resolution: Disable image loading
You can choose not to load the images on the web pages, which shortens the page loading time.
Go to task settings
Tick Disable image loading and click Save
NOTE: Disabling image loading can sometimes cause pages to fail to load. If the task does not work properly after you select this option, clear it.