With the "List of URLs" loop mode, Octoparse does not need to perform steps such as "Click to paginate" or "Click Item" to enter each item page. As a result, extraction is faster, especially with Cloud Extraction. When a task built with "List of URLs" is set to run in the Cloud, it is split into sub-tasks that run on multiple cloud servers simultaneously.
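To make the idea of splitting one URL list into parallel sub-tasks concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not Octoparse's actual implementation: the chunk size, the number of workers, the example.com URLs, and the names `split_into_subtasks` and `run_subtask` are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtasks(urls, size):
    """Split a list of URLs into consecutive chunks of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def run_subtask(sub_urls):
    """Stand-in for real extraction: returns one placeholder row per URL."""
    return [f"scraped {url}" for url in sub_urls]


# Hypothetical URL list; a real task would use the URLs you want to scrape.
urls = [f"https://example.com/item/{n}" for n in range(1, 11)]
subtasks = split_into_subtasks(urls, 4)

# Each sub-task runs on its own worker; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = [row for rows in pool.map(run_subtask, subtasks) for row in rows]

print(len(subtasks), "sub-tasks,", len(results), "rows")
```

Because every URL in the list can be opened independently, no sub-task has to wait for another one to click through earlier pages, which is why this loop mode parallelizes well.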
1. Speed up pagination by using a URL list
If your scraping task needs to extract data from thousands of pages, you can scrape from a URL list rather than using "Click to paginate" to move through the pages one by one. This helps your task run more efficiently.
Let's take the URLs below as an example:
This website has a total of 849 pages. If you compare the URLs of the pages, you can see that they share the same structure. In this case, you can use "Batch Generate" to auto-generate the URL for each page.
Here are the steps you can follow:
- Click New+ on the left sidebar of Octoparse
- Choose Advanced Mode
- Click Batch Generate
- Enter the URL of page one into the URL Format bar
- Click Add Parameter
- Choose Number as Parameter Type
- Initial value >> 1
- Every time >> +1
- Repeat >> 849
- Click Confirm
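The parameter settings above (initial value 1, +1 every time, repeated 849 times) can be sketched in Python as follows. This is only an illustration of what such a generator produces. The URL format and the `{page}` placeholder are hypothetical, since the real URL depends on the website you are scraping, and `batch_generate` is a name made up for this example.

```python
def batch_generate(url_format, initial=1, step=1, repeat=849, placeholder="{page}"):
    """Return `repeat` URLs, replacing the placeholder with initial, initial + step, ..."""
    if placeholder not in url_format:
        raise ValueError(f"URL format must contain {placeholder}")
    return [
        url_format.replace(placeholder, str(initial + i * step))
        for i in range(repeat)
    ]


# Hypothetical URL format: the page number is replaced by a placeholder.
URL_FORMAT = "https://example.com/products?page={page}"

urls = batch_generate(URL_FORMAT)
print(len(urls))   # number of generated URLs
print(urls[0])     # first page
print(urls[-1])    # last page
```

Note that the page number itself is not part of the format string. The placeholder stands where the number used to be, so each generated value fills that position.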
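Remember to remove the number "1" after the page parameter in the URL format, so that the generated number takes its place. The output is a list of 849 URLs, one for each page.
Besides Batch Generate, there are other ways to input a list of URLs: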
- Batch import URLs from local files
- Batch import URLs from another task
- Manually Enter
Please check the tutorial "Batch URL input" for more details.
2. Speed up scraping detail pages by using a URL list
When you need to click through the items on the list and scrape their corresponding detail pages, it takes some time to click all the items one by one. In this case, it is wise to scrape the URLs of all the listed items first. After you get all the URLs of detail pages, you can start a new task by inputting all the scraped URLs from the previous task.
Here is a case tutorial on how to scrape the URLs of items: Scrape product info from Sam's Club
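The first half of this two-task approach, collecting the detail-page URLs from a list page, can be sketched in Python with the standard library. This is a rough illustration and not how Octoparse works internally. The sample HTML, the `item-link` class name, and the example.com addresses are assumptions, and a real page needs its own selector.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ItemLinkParser(HTMLParser):
    """Collect absolute URLs from <a> tags that carry a given CSS class."""

    def __init__(self, base_url, css_class):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        href = attr.get("href")
        if href and self.css_class in classes:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.urls:            # skip duplicates, keep order
                self.urls.append(url)


def extract_item_urls(listing_html, base_url, css_class="item-link"):
    parser = ItemLinkParser(base_url, css_class)
    parser.feed(listing_html)
    return parser.urls


# Hypothetical list page with three item links (one repeated) and one nav link.
SAMPLE = """
<ul>
  <li><a class="item-link" href="/p/alpha">Alpha</a></li>
  <li><a class="item-link" href="/p/beta">Beta</a></li>
  <li><a class="nav" href="/help">Help</a></li>
  <li><a class="item-link" href="/p/alpha">Alpha again</a></li>
</ul>
"""

detail_urls = extract_item_urls(SAMPLE, "https://example.com/list?page=1")
for url in detail_urls:
    print(url)
```

The resulting list plays the role of the scraped URLs from the first task: it is what you would input into the second task that scrapes the detail pages.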
Article in Spanish: Acelere el scraping utilizando la lista de URL
You can also read web scraping articles on the official website