Q: What is a parent task and a child task?
A: In Octoparse, you can link two or more tasks (fewer than 100) together through a "URL" field in the extracted data, so that one task provides the URLs for the next. The task that provides the URLs is called the "parent task", and the task that uses the URLs from a parent task is called the "child task".
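The relationship can be pictured with a small Python sketch. This is only a conceptual model, not Octoparse's API: the row layout, the `urls_from_parent` helper, and the example.com addresses are all assumptions made for illustration.

```python
# Conceptual model only: Octoparse tasks are configured in its UI, not in Python.
# The rows, field names, and URLs below are hypothetical.

def urls_from_parent(parent_rows, url_field="URL"):
    """Collect the non-empty values of the chosen URL field from the parent's rows."""
    return [row[url_field] for row in parent_rows if row.get(url_field)]

# Data the parent task might have extracted (one dict per extracted row).
parent_rows = [
    {"Title": "Item A", "URL": "https://example.com/a"},
    {"Title": "Item B", "URL": "https://example.com/b"},
    {"Title": "Item C", "URL": ""},  # rows without a URL contribute nothing
]

# The child task would loop over these URLs.
child_input = urls_from_parent(parent_rows)
print(child_input)
```

The parent only needs one extracted field that holds URLs; the child treats that field as its list of pages to visit.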
Q: How do I set up the parent task?
1) If the child task has already been built, edit its "Loop Item". This loop item doesn't have to be the first step; it also works in later steps.
On the page that opens, select "Input from task". Then choose the correct parent task and URL field from the drop-down lists.
The URLs from the parent task are now passed into the child task.
2) If the child task has not been built yet, create your crawler in Advanced Mode.
You will arrive at the same page. Choose the correct parent task and URL field from the drop-down lists, then click "Save URL".
If the parent task doesn't have any result yet, a popup will tell you to copy some URLs to save the setting.
At this point, the two tasks are linked. Octoparse provides four execution options (shown in the image below) for running the tasks. For example, if you select "Run task as soon as its parent task starts", then as soon as Octoparse reads a URL extracted by the parent task, it automatically passes that URL to the child task and sets the child task to run.
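To make the "Run task as soon as its parent task starts" behavior concrete, here is a rough Python sketch in which the child handles each URL the moment the parent produces it. The function names and the event log are hypothetical. The second runner, which waits for the parent to finish, is included only for contrast and is not claimed to be one of Octoparse's four options.

```python
# Illustrative only: models the ordering of work, not Octoparse internals.

def parent_task(pages, log):
    """Yield URLs one at a time, as a parent task would extract them."""
    for url in pages:
        log.append(f"parent extracted {url}")
        yield url

def child_task(url, log):
    """Stand-in for the child task processing one URL."""
    log.append(f"child processed {url}")

def run_streaming(pages):
    """Child starts on each URL as soon as the parent emits it."""
    log = []
    for url in parent_task(pages, log):
        child_task(url, log)
    return log

def run_after_completion(pages):
    """Contrast case: child waits until the parent has emitted every URL."""
    log = []
    urls = list(parent_task(pages, log))
    for url in urls:
        child_task(url, log)
    return log

for line in run_streaming(["page-1", "page-2"]):
    print(line)
```

In the streaming runner the log alternates between parent and child entries, which is the interleaving the option describes.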
1. Inputting URLs from another task is supported only in Octoparse Cloud Extraction.
2. If the parent task has not extracted any data yet, you'll need to paste in at least one URL manually before you can start configuring the child task.
3. Check Batch URL input if you'd like to know more about the ways to input URLs.
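The second note above, about pasting in a URL when the parent has no data, amounts to a simple fallback rule. The sketch below expresses it in Python under assumed names (`child_start_urls`, `manual_urls`); it is not how Octoparse implements the check.

```python
# Hypothetical helper: picks the URLs a child task could be configured with.

def child_start_urls(parent_rows, url_field="URL", manual_urls=()):
    """Prefer URLs extracted by the parent; otherwise fall back to pasted URLs."""
    urls = [row[url_field] for row in parent_rows if row.get(url_field)]
    if urls:
        return urls
    if not manual_urls:
        # Mirrors the prompt to supply at least one URL by hand.
        raise ValueError("parent task has no results yet; paste at least one URL")
    return list(manual_urls)

# Parent has not run yet, so a manually pasted (made-up) URL is used instead.
print(child_start_urls([], manual_urls=["https://example.com/sample"]))
```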