By now, you've finished building your first scraping task and know how to run it to get the data you need. Let's take it a step further and see how to make your daily scraping routines more effective and efficient with task scheduling, auto-data export, and the API.
Task scheduling
If you plan to extract data on any regular basis, task scheduling is exactly what you need and can save you a lot of time. You can schedule a task to run once, on a recurring schedule, or repeatedly at a short interval, such as every 1, 5, 10, or 30 minutes.
1. Find your task on the Dashboard, click the show more icon, then choose "Cloud runs" and select "Set schedule".
2. Choose how often you would like to run the task.
3. For recurring crawls, select the day of the week or day of the month, and the time of day to run your task.
For repeating crawls, select the desired time interval.
4. You can also save the setting for later use. Give the setting a name and click "Save". This way, you can always select the saved schedule and apply it directly to any other task.
5. Once everything is set, click "Save and Run" to start running the task on schedule right away. If you want to save the schedule but do not wish to run the task on it yet, click "Save" instead.
6. Once you have the schedule set up, you can easily turn it ON and OFF by clicking the show more icon on the Dashboard, selecting "Cloud runs", and then choosing "Schedule ON" or "Schedule OFF".
7. When a task is scheduled, you'll see its next run time on the Dashboard. Click the + sign on the Dashboard, then select "Next Run". This gives you a clear picture of which tasks are scheduled and when the next run is expected.
Auto-data export (for Cloud data)
Data export to a database can also be automated and scheduled. If you need to export data to your database on a regular basis, export scheduling can save you a lot of work.
1. Load the cloud data for your task.
2. Click "Export Data".
3. Click "Auto-export to database", then select the type of database you use.
4. Complete the information to connect to your database. Click "Test connection" to verify that the database connects successfully, then click "Next" to proceed.
5. The next step is to map the data fields and choose the desired time interval for the export.
6. Lastly, click "Next" to finish the process.
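Conceptually, the field-mapping step above pairs each extracted field with a database column, and the export then inserts rows at the chosen interval. The sketch below illustrates that idea locally with SQLite; the field names, column names, and table name are made up for illustration only, since Octoparse handles the actual mapping in its UI.

```python
# Local sketch of what the field-mapping step does conceptually:
# each scraped field name is paired with a database column, and rows
# are inserted. Names here are hypothetical, for illustration only.
import sqlite3

FIELD_MAP = {"Product_Title": "title", "Product_Price": "price"}  # scraped field -> column


def export_rows(conn, rows):
    """Insert scraped rows into the products table using FIELD_MAP."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (title TEXT, price TEXT)")
    for row in rows:
        mapped = {col: row.get(field) for field, col in FIELD_MAP.items()}
        conn.execute(
            "INSERT INTO products (title, price) VALUES (:title, :price)", mapped
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
export_rows(conn, [{"Product_Title": "Widget", "Product_Price": "$9.99"}])
print(conn.execute("SELECT title, price FROM products").fetchall())
# → [('Widget', '$9.99')]
```

In the real feature, this insert step simply repeats at the time interval you picked in step 5.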
Octoparse API
With the Octoparse API, you can run scraping tasks, retrieve the extracted data, and even edit your tasks programmatically from your own application.
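The steps earlier in this lesson are all point-and-click, but the same actions can be driven from code. The sketch below is a minimal, hypothetical Python client: the base URL, the endpoint paths (`/token`, `/api/task/startTask`), and the parameter names are assumptions for illustration only; consult the official Octoparse API documentation for the real values.

```python
# Hypothetical sketch of calling the Octoparse API from Python.
# BASE_URL and the endpoint paths below are ASSUMPTIONS for illustration;
# check the official Octoparse API docs for the actual values.
import json
import urllib.request

BASE_URL = "https://openapi.octoparse.com"  # assumed base URL


def build_token_payload(username, password):
    """Build a password-grant payload for requesting an access token."""
    return {"username": username, "password": password, "grant_type": "password"}


def post_json(path, payload, token=None):
    """POST a JSON body to BASE_URL + path and return the decoded response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if token:
        req.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # 1. Authenticate (endpoint path is an assumption).
    auth = post_json("/token", build_token_payload("user", "pass"))
    # 2. Start a cloud run for a task (path and "taskId" are assumptions).
    post_json("/api/task/startTask", {"taskId": "YOUR_TASK_ID"}, auth["access_token"])
```

With a client like this, a cron job or your own scheduler can trigger runs and pull results without opening the app.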
Article in Spanish: Lección 6: Programar regulares runs
You can also read web scraping articles on the official website.