When you run a task with "Cloud Run," the task is split into multiple sub-tasks, which run at the same time on multiple cloud servers. The data scraped by the sub-tasks is returned at the same time, so rows from different sub-tasks are mixed together rather than arriving in sequence. That is why the data is out of order.
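To make the effect concrete, here is a minimal simulation of sub-tasks finishing at different times. It is only a sketch: the function name `simulate_cloud_run`, the page names, and the durations are made up for illustration, and it does not reflect how Octoparse actually schedules sub-tasks.

```python
import heapq

def simulate_cloud_run(sub_tasks, workers):
    """Return rows in the order they would arrive (hypothetical model).

    sub_tasks: list of (duration, rows) in the task's original order.
    workers: how many sub-tasks may run at the same time.
    """
    # Each heap entry is (time the worker becomes free, worker id).
    free_at = [(0, w) for w in range(workers)]
    heapq.heapify(free_at)
    finished = []
    for index, (duration, rows) in enumerate(sub_tasks):
        start, worker = heapq.heappop(free_at)
        end = start + duration
        heapq.heappush(free_at, (end, worker))
        finished.append((end, index, rows))
    # Rows arrive as each sub-task finishes, not in the order it was split.
    finished.sort()
    return [row for _, _, rows in finished for row in rows]

# Invented durations: the first page is the slowest to scrape.
sub_tasks = [
    (5, ["page 1"]),
    (1, ["page 2"]),
    (2, ["page 3"]),
]

print(simulate_cloud_run(sub_tasks, workers=3))  # ['page 2', 'page 3', 'page 1']
print(simulate_cloud_run(sub_tasks, workers=1))  # ['page 1', 'page 2', 'page 3']
```

With three workers, the slow first page comes back last. With one worker, the rows keep their original order.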

Knowing how cloud extraction works, we can solve the problem. There are two ways to keep the data in order, and both ensure that only one sub-task runs at a time:

  1. Disable task split - to run the task as one sub-task

  2. Set Cloud resources as 1 - to run one sub-task at a time


1. Disable task split - to run all tasks under one IP

  • Click the setting button on the upper right side of the Octoparse interface

  • Tick Disable task split

  • Click Save


Once this option is selected, the task is no longer split into sub-tasks, so the data comes back in the same order as in a local run. If the order still differs, try adding some wait time to the Extract Data action.


2. Set Cloud resources as 1 - to use one IP to run all the tasks

  • Go to Dashboard

  • Click the More button to open more settings of the task

  • Click Cloud Runs

  • Choose Cloud Resources

  • Input 1 in the box

  • Click Save


With this option, the task can still be split into sub-tasks, but Octoparse will only run one sub-task at a time; therefore, the results returned are in order.

Note: Both solutions slow down scraping, because Cloud speed depends on how many sub-tasks are running at the same time.
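The speed trade-off can be estimated with a rough sketch like the one below. The function `total_run_time` and the durations are hypothetical and chosen only to illustrate the point. Real run times depend on the site being scraped and on how Octoparse schedules sub-tasks.

```python
import heapq

def total_run_time(durations, workers):
    """Estimate when the last sub-task finishes (hypothetical model)."""
    # The heap holds the time at which each worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    finish = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # next worker to become free
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

# Invented durations for four sub-tasks, in arbitrary time units.
durations = [5, 1, 2, 4]

print(total_run_time(durations, workers=1))  # 12
print(total_run_time(durations, workers=3))  # 5
```

In this made-up case, running one sub-task at a time takes 12 units, while three at a time takes 5.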
