When a task runs with "Cloud Run", it is split into multiple sub-tasks. Multiple sub-tasks run in the cloud on multiple servers at the same time. The data scraped by these sub-tasks is transferred back at the same time, which is why the data is out of order.
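To make the effect concrete, here is a small simulation of sub-tasks being handed to several servers in parallel. It is only an illustrative sketch, not how Octoparse is implemented: the function name `arrival_order`, the durations, and the scheduling rule (each sub-task goes to the next free server) are all assumptions made for the example.

```python
import heapq

def arrival_order(durations, servers):
    """Simulate sub-tasks handed out in order to `servers` parallel servers.

    durations[i] is how long sub-task i takes (hypothetical numbers).
    Returns the sub-task indices in the order their data would arrive.
    """
    # Each heap entry is (time at which the server becomes free, server id).
    free_at = [(0.0, s) for s in range(servers)]
    heapq.heapify(free_at)
    finished = []
    for index, duration in enumerate(durations):
        start, server = heapq.heappop(free_at)  # earliest available server
        end = start + duration
        finished.append((end, index))
        heapq.heappush(free_at, (end, server))
    # Data arrives in order of finish time, not in order of sub-task number.
    return [index for _, index in sorted(finished)]

durations = [5.0, 1.0, 3.0, 2.0]  # assumed run time of each sub-task
print(arrival_order(durations, servers=3))  # [1, 2, 3, 0]
```

Sub-task 0 starts first but takes the longest, so its data arrives last even though it covers the first part of the task.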
Knowing how cloud extraction works, we can solve this. There are two ways to keep the data in order, and both make the task run as only one sub-task at a time:
- Disable task split - to run the task as one sub-task
- Set Cloud resources as 1 - to run one sub-task at a time
1. Disable task split - to run all tasks under one IP
- Click on the upper right side
- Tick Disable task split
- Click Save
After this option is selected, the task will not be split into sub-tasks, so the data will be in the same order as in a local run. If you still find the order different, you can try setting some wait time for the Extract Data action.
2. Set Cloud resources as 1 - to use one IP to run all the tasks
- Go to Dashboard
- Click to open more settings of the task
- Click Cloud Runs
- Choose Cloud Resources
- Input 1 in the box
- Click Save
With this option, the task can still be split into sub-tasks, but Octoparse will only run one sub-task at a time, so the results are returned in order.
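The same idea can be tried locally with a thread pool limited to one worker. This is a rough analogy under stated assumptions, not Octoparse's actual scheduler: `run_subtasks`, the delays, and the use of `time.sleep` as a stand-in for scraping are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_subtasks(delays, max_parallel):
    """Run one stand-in sub-task per delay; return indices in completion order."""
    received = []

    def subtask(index, delay):
        time.sleep(delay)        # stand-in for scraping a batch of pages
        received.append(index)   # stand-in for sending the data back

    # Leaving the with-block waits for every submitted sub-task to finish.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for index, delay in enumerate(delays):
            pool.submit(subtask, index, delay)
    return received

delays = [0.15, 0.05, 0.10]  # assumed durations in seconds
print(run_subtasks(delays, max_parallel=1))  # one at a time: [0, 1, 2]
```

With `max_parallel=1` the queued sub-tasks are picked up one after another in submission order, so the completion order matches the original order regardless of how long each one takes. Raising the limit lets shorter sub-tasks finish ahead of longer ones.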
If you have further questions about the settings, you are welcome to submit a request here. Our support team will get back to you ASAP.