
Why can't I get data scraped in order during cloud runs?

Updated this week

When you run a task with Cloud Extraction, the task is split into multiple sub-tasks, and different cloud servers process these sub-tasks simultaneously. Each sub-task sends its results back as soon as it finishes. The results therefore arrive concurrently rather than in sequence, which is why the data may appear out of order.
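The following is a minimal local simulation of this behavior. It is not Octoparse code. It assumes each sub-task takes a random amount of time and returns its rows when it finishes. The names `scrape_subtask` and `run_split` are hypothetical and stand in for cloud servers. Python's `as_completed` yields sub-tasks in the order they finish, so the combined rows can come back in a different order even though no data is lost.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_subtask(subtask_id, rows):
    """Hypothetical stand-in for one cloud server processing a sub-task."""
    time.sleep(random.uniform(0.01, 0.05))  # simulated page-load / network latency
    return subtask_id, rows


def run_split(pages, n_subtasks):
    """Split pages into contiguous chunks and process them concurrently."""
    size = -(-len(pages) // n_subtasks)  # ceiling division
    chunks = [pages[i:i + size] for i in range(0, len(pages), size)]
    results = []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(scrape_subtask, i, chunk) for i, chunk in enumerate(chunks)]
        # as_completed yields futures as they finish, not in submission order
        for fut in as_completed(futures):
            _, rows = fut.result()
            results.extend(rows)
    return results


if __name__ == "__main__":
    pages = list(range(1, 21))
    data = run_split(pages, n_subtasks=4)
    print(data)  # chunk order depends on which sub-task finished first
    print(sorted(data) == pages)  # nothing is lost; only the order can change
```

With `n_subtasks=1` there is only one chunk, so the rows come back in their original order.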

Now that we understand the cause, here is how to resolve the issue.


Disable task split to run the whole task under one IP

  • Click the Task Settings button on the upper right side of the Octoparse interface

  • Go to Run Settings

  • Check Disable task split

  • Click Save

With this option enabled, the task is not split into sub-tasks, so the data should come back in the same order as in a local run. If the order still differs, try setting a wait time for the Extract Data action.
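For comparison, here is a minimal sketch of what disabling task split amounts to: a single sub-task that processes every page in sequence. It is not Octoparse code. `scrape_page`, `run_without_split`, and the `wait_seconds` parameter are hypothetical. `wait_seconds` only loosely mirrors the idea of a wait time before Extract Data.

```python
import time


def scrape_page(page):
    """Hypothetical stand-in for extracting data from one page."""
    return {"page": page}


def run_without_split(pages, wait_seconds=0.0):
    """Process all pages in one sequential sub-task, like a local run."""
    rows = []
    for page in pages:
        if wait_seconds:
            # hypothetical analogue of a wait time before the Extract Data action
            time.sleep(wait_seconds)
        rows.append(scrape_page(page))
    return rows


if __name__ == "__main__":
    print(run_without_split([3, 1, 2]))  # rows follow the input order
```

Because pages are handled one after another, the total run time is roughly the sum of all per-page times. A split run takes roughly as long as its slowest sub-task. That is the speed trade-off of disabling task split.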

[Image: Extract Data action settings]

Note: This option slows down scraping, because Cloud Extraction speed depends on how many sub-tasks run at the same time.
