What are parent and child tasks?
In Octoparse, one task can provide URLs for another. By passing URLs through the extracted data, you can link between 2 and 100 tasks together. The task that provides the URLs is called the parent task, and the task that uses them is called the child task.
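Here is a minimal conceptual sketch in Python of the parent/child relationship. It models a task's extracted data as a list of rows and treats the child task's URL list as the values of one selected field in the parent's data. The function name `urls_for_child` and the field name `Product_URL` are hypothetical. This is not Octoparse's API or its internal implementation, only an illustration of the data flow.

```python
from typing import Dict, List

# Hypothetical model: each extracted row maps field names to values.
# This is an illustration only, NOT an Octoparse API.
Row = Dict[str, str]


def urls_for_child(parent_rows: List[Row], field: str) -> List[str]:
    """Collect the URLs a child task would loop over from one parent field."""
    urls = []
    for row in parent_rows:
        value = row.get(field, "").strip()
        # Assumption: rows with an empty value in the selected field are skipped.
        if value:
            urls.append(value)
    return urls


# Example parent output with a hypothetical "Product_URL" field.
parent_rows = [
    {"Title": "Item A", "Product_URL": "https://example.com/a"},
    {"Title": "Item B", "Product_URL": "https://example.com/b"},
]

print(urls_for_child(parent_rows, "Product_URL"))
# ['https://example.com/a', 'https://example.com/b']

# A parent task that has not run yet has no rows, so the child gets no URLs.
print(urls_for_child([], "Product_URL"))
# []
```

The empty case illustrates why the parent task needs to run and extract data before the child task can use its URLs.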
How do I set up a child task?
There are two ways to set up a child task, depending on whether the child task has already been created.
1. If the child task has already been created:
Open the child task and click Loop Item in the workflow
Click the Edit button next to the Manual Input box
On the following page, choose Import from task
Select the right Task Group
Select the right parent task
Select the correct data field in the parent task (which contains URLs)
Click Save to save the settings
The child task will now use the URLs extracted by the parent task.
Notes:
Importing URLs from another task is supported only in Octoparse Cloud Extraction.
If the parent task has not extracted any data yet, you'll need to paste at least one URL manually to start configuring the child task.
See Batch URL input to learn more about how to input URLs.
2. If the child task has not been created yet:
Click the +New button in the sidebar of the Octoparse homepage
Select Custom Task
On the settings page, repeat the steps from method 1:
Choose Import from task
Select the right Task Group
Select the right parent task
Select the correct data field in the parent task (which contains URLs)
Click Save to save the settings
Pay attention to the notification above the Save button: if the parent task has not yet run and scraped any data, the Select Field and URL Preview will be blank, and the child task may fail to run. Therefore, make sure you have run the parent task before setting up the child task.
How do I schedule parent and child tasks?
Scheduling a parent task works the same way as scheduling a normal task. See Schedule tasks to run for details.
Child tasks can be scheduled to run based on the status of their parent task:
Click Run -> Parent task settings
Select Task Settings
Choose whether to run the child task according to the status of the parent task, or to run it manually.