How can I extract data with a list of URLs?
1. Understand Octoparse List of URLs loop mode
When the data you want spans multiple pages that share the same page structure, you can input the URLs of these pages into Octoparse to set up a loop. Octoparse will load the URLs one by one and scrape the data from each page.
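Conceptually, the loop behaves like the short Python sketch below. It only illustrates the idea and is not Octoparse's internal implementation; the names `scrape_all`, `fetch`, and `extract`, and the example URLs and pages, are hypothetical.

```python
from typing import Callable, Dict, List


def scrape_all(
    urls: List[str],
    fetch: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Visit each URL in order and apply the same extraction to every page."""
    rows = []
    for url in urls:
        page = fetch(url)        # load one page at a time
        row = dict(extract(page))  # same extraction rules for every page
        row["url"] = url         # remember which page the row came from
        rows.append(row)
    return rows


# Stand-in "website" so the sketch runs without network access (assumption).
fake_pages = {
    "https://example.com/item/1": "First item",
    "https://example.com/item/2": "Second item",
}

rows = scrape_all(
    list(fake_pages),
    fetch=fake_pages.__getitem__,
    extract=lambda page: {"title": page},
)
for row in rows:
    print(row)
```

The key point is that one set of extraction rules is reused for every URL, which is why the pages need to share the same structure.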
2. Maximum number of URLs allowed
We suggest adding no more than 10,000 URLs to a single task. The exact limit varies slightly with the length of the URLs.
Octoparse shows an error message when you exceed the limit.
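If your list is longer than the suggested limit, one option is to split it into several smaller lists and create one task per list. The sketch below shows one way to do the splitting in Python before pasting the URLs into Octoparse. The helper name `split_into_batches` and the generated example URLs are hypothetical, and the 10,000 figure is the suggestion from this article, not an exact cap.

```python
from typing import Iterator, List

# Suggested per-task limit from this article; the real cap depends on URL length.
MAX_URLS_PER_TASK = 10_000


def split_into_batches(
    urls: List[str], batch_size: int = MAX_URLS_PER_TASK
) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


# Hypothetical example: 25,000 generated URLs.
all_urls = [f"https://example.com/item/{n}" for n in range(25_000)]
batches = list(split_into_batches(all_urls))
print([len(batch) for batch in batches])  # [10000, 10000, 5000]
```

Each batch can then be pasted into its own task. If your URLs are unusually long, a smaller `batch_size` leaves some headroom.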
3. Start a new task with a list of URLs
- Enter your list of URLs
When you add more than one URL (one per line) to the Extraction URL box, Octoparse enters the List of URLs loop mode by default and creates a Loop Item automatically.
- Set Wait before execution
To make sure each page loads completely, you can set a wait time before the action is executed (2 seconds is usually enough).
Advanced Options > Wait before execution
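Because the Extraction URL box takes one URL per line, it can help to prepare the list in a script first. The following Python sketch builds a list of paginated URLs, drops blank lines and duplicates, and prints text that is ready to paste. The URL template and the helper name `clean_url_lines` are assumptions for illustration only.

```python
from typing import Iterable, List


def clean_url_lines(raw_lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blank lines, and drop duplicates (keeping order)."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


# Hypothetical template for pages that share the same structure.
template = "https://example.com/products?page={page}"
raw = [template.format(page=n) for n in range(1, 4)]
raw += ["", "  https://example.com/products?page=2  "]  # a blank line and a duplicate

text_to_paste = "\n".join(clean_url_lines(raw))
print(text_to_paste)
```

The printed text has exactly one URL per line, so it can be copied straight into the Extraction URL box.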
4. Edit the list of URLs you entered
After you have entered the list of URLs, you can still modify it.
Advanced Options > List of URLs
From: https://www.octoparse.com/tutorial-7/extract-data-with-a-list-of-urls