Octoparse offers a Cloud platform that lets premium users (Standard and Professional plans) run tasks 24/7.
When a task is run with Cloud Extraction, it runs on multiple servers using Octoparse's IPs. You can close the app or even shut down your computer while the task is running, so there is no need to worry about hardware limitations. Extracted data is saved in the cloud and can be accessed at any time.
Octoparse Cloud extraction also supports task scheduling. To retrieve the most up-to-date information, you can schedule your task to run as frequently as you need.
Features covered in this tutorial:
1. Run your task with cloud extraction:
When you finish configuring your task, click "Run" and select "Run task in the Cloud" to execute a run in the cloud.
Once a task is set to run in the cloud, its status will change to "Running" on the dashboard.
2. Batch run tasks with cloud extraction:
Select the tasks you want to run and click "Run (Cloud)"; the selected tasks will run together in the Cloud.
3. Settings of cloud extraction:
Octoparse cloud extraction allows for executing multiple tasks simultaneously.
On the Standard Plan, you can run 6 concurrent tasks in the cloud (6 cloud servers available), and on the Professional Plan, you can run 20 concurrent tasks (20 cloud servers available). To set the maximum number of tasks running in parallel, click and select a desired number from the drop-down options:
Extracting data in the Cloud can be much faster than running the task locally, provided the task is splittable (Learn about when a task is splittable).
A splittable task can be broken down into multiple subtasks that run on multiple servers simultaneously, which makes the extraction faster.
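As a rough illustration of why splitting helps, the sketch below divides a list of URLs into near-equal chunks, one per server. How Octoparse actually splits a task is not described in this tutorial; the function `split_into_subtasks` and the example URLs are hypothetical and only show the general idea.

```python
def split_into_subtasks(urls, servers):
    """Divide urls into at most `servers` near-equal, non-empty chunks."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    count = min(servers, len(urls))
    if count == 0:
        return []
    # base = minimum chunk size; the first `extra` chunks get one more item.
    base, extra = divmod(len(urls), count)
    chunks, start = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        chunks.append(urls[start:start + size])
        start += size
    return chunks


# Hypothetical input: 10 pages spread over the 6 servers of a Standard Plan.
urls = [f"https://example.com/page/{n}" for n in range(1, 11)]
subtasks = split_into_subtasks(urls, servers=6)
print([len(chunk) for chunk in subtasks])  # [2, 2, 2, 2, 1, 1]
```

Each chunk can then be processed independently, so the longest subtask handles 2 pages instead of one server handling all 10.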
You can also start more tasks than your plan's concurrent limit, but the extra tasks will be queued until cloud servers become available upon completion of the earlier tasks.
4. Schedule a run in the cloud:
4.1 For a single task
When you finish configuring your task, click Run and select Schedule (Cloud).
Select the frequency and customize the time and date according to your requirements. Click Save and Run and the task will be run as scheduled.
Timing for the next run can be found on the dashboard in the Next Run column.
To cancel a scheduled run, click More and select Schedule OFF under Cloud runs.
FAQ: What's the default time zone for the Octoparse Cloud platform?
The next execution time shown on the dashboard is in your local time zone (according to your operating system) by default. However, if you've built the task to extract "current date & time" in the Cloud, the extracted time & date will be in UTC±00:00 regardless of your actual location.
Currently, Octoparse does not support changing the time zone.
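Because the time zone cannot be changed on the platform, timestamps extracted in the Cloud can be converted after export. The following is a minimal sketch using only Python's standard library; the timestamp format, the function name `cloud_time_to_local`, and the UTC+08:00 offset are assumptions for illustration, so adjust them to match your exported data.

```python
from datetime import datetime, timedelta, timezone


def cloud_time_to_local(extracted, local_tz, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret an extracted timestamp string as UTC and convert it to local_tz."""
    utc_time = datetime.strptime(extracted, fmt).replace(tzinfo=timezone.utc)
    return utc_time.astimezone(local_tz)


# Hypothetical example: a reader located in a UTC+08:00 time zone.
local_tz = timezone(timedelta(hours=8))
print(cloud_time_to_local("2024-03-01 14:30:00", local_tz).isoformat())
# 2024-03-01T22:30:00+08:00
```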
4.2 For a group of tasks
Go to your dashboard, switch to Task Group view, select your target task group, and click on the clock icon to set a schedule for the task group.
5. Frequently asked questions
5.1. What are concurrent Cloud extractions?
The concurrent Cloud extraction limit is the maximum number of tasks you can run at the same time. If you are on the Standard Plan, you can run at most 6 concurrent extractions in the Cloud because you have 6 Cloud servers (each task needs at least one server to run).
Note that you may sometimes be unable to run 6 tasks at once in the Cloud, because a single splittable task may take up several or all of the servers in your account. Once one task takes up all the servers, the other tasks must wait until Cloud servers become free. Read this tutorial for more details about task splitting: How can I scrape data faster in Cloud?
5.2. What affects the number of concurrent extractions?
The main factors influencing your concurrent extractions are 1) the number of Cloud servers you have and 2) the number of servers your running tasks take up.
For example, suppose you are on the Standard Plan, which means you have 6 Cloud servers. If you have 6 tasks and each takes up only 1 server when running, you will see 6 tasks running at the same time.
If one of the tasks takes up 2 servers (it is split into 2 or more sub-tasks), only 4 servers remain, so you will see at most 4 other tasks running alongside it. If one task takes up all 6 servers, you will see only that task running.
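The server arithmetic above can be sketched as a small simulation. It assumes a simplified model in which tasks start in order and a task starts only when all the servers it needs are free; Octoparse's real scheduler may behave differently, and `allocate_servers` is a hypothetical helper, not part of any Octoparse API.

```python
def allocate_servers(total_servers, demands):
    """Start tasks in order; a task starts only if all servers it needs are free.

    Returns (running, queued) as lists of task indexes.
    """
    free = total_servers
    running, queued = [], []
    for index, needed in enumerate(demands):
        if needed <= free:
            free -= needed
            running.append(index)
        else:
            queued.append(index)
    return running, queued


STANDARD_PLAN_SERVERS = 6  # taken from the plan description above

# Six tasks that each need one server: all six run together.
print(len(allocate_servers(STANDARD_PLAN_SERVERS, [1, 1, 1, 1, 1, 1])[0]))  # 6

# The first task is split across three servers: four tasks run, two are queued.
print(len(allocate_servers(STANDARD_PLAN_SERVERS, [3, 1, 1, 1, 1, 1])[0]))  # 4

# The first task takes all six servers: only that task runs.
print(len(allocate_servers(STANDARD_PLAN_SERVERS, [6, 1, 1, 1, 1, 1])[0]))  # 1
```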