What is Cloud Extraction? (V8.1)
Octoparse offers a powerful Cloud platform that lets premium users (Standard and Professional plans) run tasks 24/7.
When a task is run with "Cloud Extraction", it takes advantage of multiple servers using Octoparse's IPs. You can shut down the app or even your computer while the task is running, so there is no need to worry about hardware limitations. Extracted data is saved in the cloud and can be accessed at any time.
Task scheduling is also supported by Octoparse Cloud extraction. To retrieve the most updated information, you can schedule your task to run as frequently as you need.
Features covered in this tutorial:
· Run your task with Cloud extraction
· Batch run tasks with Cloud extraction
· Settings of Cloud extraction
To run your task with cloud extraction:
When you finish configuring your task, click "Run" and select "Run task in the Cloud" to execute a run in the cloud.
Once a task is set to run in the cloud, its status will change to "Running" on the dashboard.
To batch run tasks with cloud extraction:
Select any tasks that need to be run, click on 'Run (Cloud)' and the tasks will run together in the Cloud.
Settings of cloud extraction:
Octoparse cloud extraction allows for executing multiple tasks simultaneously.
On the Standard Plan, you can run 6 concurrent tasks in the cloud (6 cloud servers available), and on the Professional Plan, you can run 20 concurrent tasks (20 cloud servers available). To set the maximum number of tasks running in parallel, click and select the desired number from the drop-down options:
Tips!
1. How does Cloud extraction perform? Extracting data in the Cloud can be much faster than running the task locally, provided the task is splittable (learn about when a task is splittable). A splittable task can be broken down into multiple subtasks that run on multiple servers simultaneously, which makes the extraction faster.
2. Can I run more tasks than the maximum number allows? Yes, you can. However, some of the tasks will be queued until cloud servers become available as earlier tasks complete.
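The queueing behaviour described in the second tip can be illustrated with a small simulation. This is only a sketch of a first-come, first-served queue, not Octoparse's actual scheduler: the function name `simulate_queue`, the task names and the durations are all hypothetical, and each task is assumed to take up exactly one server.

```python
import heapq

def simulate_queue(durations, max_parallel):
    """Return the start time (in minutes) of each task under a FIFO queue.

    durations: dict of task name -> run time in minutes (hypothetical values).
    max_parallel: how many tasks may run at once (assumes one server per task).
    """
    running = []      # min-heap of finish times for the tasks currently running
    start_times = {}
    for name, minutes in durations.items():
        if len(running) < max_parallel:
            start = 0                       # a server is free right away
        else:
            start = heapq.heappop(running)  # wait until the earliest task finishes
        start_times[name] = start
        heapq.heappush(running, start + minutes)
    return start_times

tasks = {"task_a": 30, "task_b": 10, "task_c": 20, "task_d": 5}
print(simulate_queue(tasks, max_parallel=2))
# {'task_a': 0, 'task_b': 0, 'task_c': 10, 'task_d': 30}
```

With a limit of 2, the first two tasks start immediately, while the other two wait until a running task finishes and frees a server.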
To schedule a run in the cloud:
When you finish configuring your task, click "Run" and select "Schedule(Local)".
Select how frequently you want to run it (Once, Weekly, Monthly or Repeat), then customize the time and date according to your data requirements. Click "Save and Run" and the task will run as scheduled.
The time of the next execution is shown in the "Next Run" column on the dashboard.
If you wish to cancel a scheduled task, click "More" and select "Schedule OFF" under "Cloud runs".
Tips!
What's the default time zone for the Octoparse Cloud platform? By default, the next execution time shown on the dashboard is in your local time zone (according to your operating system). However, if you've built the task to extract the "current date & time" in the Cloud, the extracted date and time will be in UTC±00:00 regardless of your actual location. Currently, Octoparse does not support changing the time zone.
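Because date and time values extracted in the Cloud are in UTC, you may want to convert them to your local time after exporting the data. The snippet below is a minimal sketch using only the Python standard library; the helper name `utc_to_local` and the timestamp format are assumptions, so adjust `fmt` to match the format your task actually extracts.

```python
from datetime import datetime, timezone

def utc_to_local(extracted, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a UTC timestamp string and convert it to this machine's local time zone.

    The default format is an assumption, not a documented Octoparse format.
    """
    utc_time = datetime.strptime(extracted, fmt).replace(tzinfo=timezone.utc)
    return utc_time.astimezone()  # no argument: use the operating system's time zone

# Hypothetical extracted value; the printed result depends on your local time zone.
print(utc_to_local("2021-03-01 14:30:00"))
```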
To set a schedule for a group of tasks, switch to 'Task Group' mode, select a task group, then choose "Set a schedule for the task group".
What are concurrent Cloud extractions?
The number of concurrent Cloud extractions is the maximum number of tasks you can run at the same time. If you are on the Standard Plan, you can run at most 6 concurrent extractions in the Cloud because you have 6 Cloud servers (one task needs at least one server to run).
Please note that sometimes you may not be able to run 6 tasks in the Cloud at once, because one splittable task may take up several or all of the servers in your account. Once one task takes up all the servers, the other tasks have to wait until Cloud resources become available. Read this tutorial for more details about task split: What is "task split" on Cloud Extraction? (Speed up Cloud Extraction)
What affects the number of my concurrent extractions?
The main factors influencing your concurrent extractions are 1) the number of Cloud servers you have and 2) the number of servers your running tasks take up.
For example, suppose you are on the Standard Plan, which means you have 6 Cloud servers. If you have 6 tasks, and each task takes up only 1 server when running, you will see 6 tasks running at the same time.
If one of the tasks takes up 2 servers (it is split into 2 or more sub-tasks), only 4 servers are left, so at most 4 of the other tasks can run alongside it. If that task takes up all 6 servers, you will see only one task running.
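The arithmetic in this example can be sketched in a few lines of Python. This is a simplified model, not Octoparse's real allocation logic: the function name `tasks_running` is hypothetical, and it assumes tasks start strictly in the order given and that a task which does not fit blocks the ones behind it.

```python
def tasks_running(servers_needed, total_servers):
    """Count how many tasks can run at once, starting them in the order given.

    servers_needed: list with the number of servers each task takes up.
    total_servers: Cloud servers in the account (6 on the Standard Plan).
    """
    free = total_servers
    running = 0
    for needed in servers_needed:
        if needed > free:
            break          # this task and the ones behind it wait in the queue
        free -= needed
        running += 1
    return running

print(tasks_running([1, 1, 1, 1, 1, 1], total_servers=6))  # 6
print(tasks_running([6, 1, 1, 1, 1, 1], total_servers=6))  # 1
```

The first call models six single-server tasks on the Standard Plan; the second models one split task that takes up all six servers, leaving the rest queued.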
Other advantages of Cloud Extraction:
Article in Japanese: Running tasks in the Cloud / Schedule settings
You can also read articles about web scraping on the official website.
Article in Spanish: Run/Schedule tasks in the Cloud
You can also read web scraping articles on the official website.