Octoparse users may encounter situations where subtasks stop unexpectedly, or where fewer subtasks than expected are running. These issues generally relate to cloud resource prioritization or task-specific configurations.
Here’s what you need to know and how to address these common issues.
1. Why Are Subtasks Stopped?
Subtasks may be stopped due to the following reasons:
No data for 15 minutes: Subtasks that scrape no data within 15 minutes will be stopped automatically.
IP blocking: When the target website blocks a subtask’s IP address, the subtask may scrape no data and then be stopped.
Cloud node crashes: When subtasks run for too long, cloud nodes may crash and stop the subtasks.
Solutions and Recommendations
Step 1. Check the Cloud log for details
Open the Cloud Run panel and click Subtask Status.
Find the stopped subtask and click Details.
Review the logs and check the screenshots.
Step 2. Try the following solutions
Restart the stopped subtasks to see if they can resume scraping data.
Switch the IP pool to match the task’s target location.
Example: If the task targets U.S. websites, changing the IP pool to the United States may help.
Enable Octoparse IP proxies to resolve anti-scraping issues.
Adjust the task workflow or split the task into two smaller tasks.
Use a URL list instead of pagination to create more subtasks.
Scrape subpage (detail page) URLs first, then configure another task to extract data from those URLs.
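As a rough illustration of the URL-list approach, the sketch below builds one URL per results page from a page-number pattern and splits the list into smaller batches that could be loaded into separate tasks. The URL pattern, page range, batch size, and function names are all hypothetical. This is ordinary Python run outside Octoparse, not part of any Octoparse API.

```python
def build_page_urls(pattern, first_page, last_page):
    """Return one URL per page by filling the page number into the pattern."""
    return [pattern.format(page=n) for n in range(first_page, last_page + 1)]


def split_into_batches(urls, batch_size):
    """Split a URL list into consecutive batches of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical pattern: replace it with the real pagination URL of the target site.
PATTERN = "https://example.com/products?page={page}"

urls = build_page_urls(PATTERN, 1, 10)
batches = split_into_batches(urls, 5)

# Print each batch so it can be copied into its own task as a URL list.
for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {len(batch)} URLs")
    for url in batch:
        print(url)
```

This only works when the page number appears in the URL. If the site loads the next page without changing the URL, scraping the detail-page URLs first is the more practical route.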
2. Why Are Fewer Than Expected Subtasks Running?
Cloud Resource Allocation and Prioritization
Octoparse allocates cloud resources based on the user’s subscription plan.
For instance, the Professional plan typically allows up to 20 subtasks to run concurrently. When cloud resources are temporarily limited, however, priority is given to users on higher subscription plans. As a result, fewer subtasks may run, such as 12 to 16 instead of 20, until cloud resources are expanded. This prioritization is intended to keep allocation balanced across all users.
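To get a feel for what reduced concurrency means for run time, the sketch below estimates how many rounds of subtasks a task needs at different concurrency levels. It assumes, purely for simplicity, that every subtask takes about the same time and that subtasks run in full rounds. The total of 100 subtasks is a made-up figure, and the function name is hypothetical.

```python
import math


def rounds_needed(total_subtasks, concurrent_slots):
    """Estimate how many rounds are needed to run all subtasks."""
    if concurrent_slots < 1:
        raise ValueError("concurrent_slots must be at least 1")
    return math.ceil(total_subtasks / concurrent_slots)


# Hypothetical task size; 20, 16 and 12 are the concurrency levels mentioned above.
TOTAL_SUBTASKS = 100

for slots in (20, 16, 12):
    rounds = rounds_needed(TOTAL_SUBTASKS, slots)
    print(f"{slots} concurrent subtasks -> {rounds} rounds")
```

Under these assumptions, a task that finishes in 5 rounds at 20 concurrent subtasks needs 7 rounds at 16 and 9 rounds at 12, so a slower run during busy periods does not by itself mean the task is failing.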