Batch URL input
What is Batch URL input?
The Batch URL input feature imports a large number of URLs into Octoparse at once. Octoparse supports batch (bulk) URL import from local files (text or spreadsheet) and from another task, and it can also generate URLs from a pre-defined pattern.
How to batch input URLs?
Click "+New" to create a new task, select "Advanced Mode", and you will see the URL import panel.
There are three ways to batch import URLs into a single task/crawler (up to one million URLs):
1) Import URLs from a file
2) Import URLs from another task
3) Batch generate URLs based on a pre-defined pattern
Tips! Once the number of imported/generated URLs reaches the limit of 1 million, Octoparse stops importing/generating immediately.
1) Import URLs from a file
You can import URLs from any of the following file formats:
- CSV
- TXT
- Excel (.xlsx & .xls)
1. Select "Import from file".
2. Click "Select", choose the file containing the URLs, and then select the sheet and column that contain the URLs.
3. Click "Save" to complete the import process.
Note: only the first 100 URLs will be shown for preview purposes.
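If the URLs come from your own script or database, a file suitable for import can be prepared ahead of time. The snippet below is a minimal sketch of one way to do that in Python: it builds a one-column CSV with a header row, dropping blanks and duplicates. The function names and the "url" column header are illustrative assumptions, not anything Octoparse requires.

```python
import csv
import io


def urls_to_csv_text(urls, column="url"):
    """Return CSV text with one header row and one unique URL per row."""
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column])  # header so the URL column is easy to pick
    for url in urls:
        url = url.strip()
        if url and url not in seen:  # skip blank lines and duplicates
            seen.add(url)
            writer.writerow([url])
    return buffer.getvalue()


def save_url_csv(urls, path, column="url"):
    """Write the CSV text to disk (hypothetical helper for convenience)."""
    # newline="" keeps the csv module's own line endings untouched
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(urls_to_csv_text(urls, column))
```

Calling save_url_csv(my_urls, "urls.csv") would produce a file that can then be chosen in step 2, selecting the "url" column.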
2) Import URLs from another task
This feature links two tasks seamlessly when the URLs need to be extracted separately by another task, so you no longer need to export and re-import the URLs manually.
1. Select "Import from task".
2. Select the task containing the target URLs, then specify the data field that holds the URLs.
3. Click "Save" to complete the import process.
Note: The selected task (the one that contains the URLs needed for further crawling) is referred to as the parent task, and the new task configured with those URLs becomes the child task. The two tasks are associated automatically and can be executed in association with one another.
Tips!
1. You can set the child task to run according to the status of the parent task in the Cloud. If you set up an associated run by selecting an option from Parent task settings, both tasks will be executed in the cloud via Octoparse Cloud Service.
2. When an associated run is set up, task scheduling
3) Batch generate URLs based on a pre-defined pattern
With the "Batch generate" feature, you can easily generate a large number of URLs following specific patterns by modifying various parameters of one given URL.
1. Select "Batch generate".
2. Enter one URL as the base for batch generation.
3. Highlight the part of the URL you want to vary, and click "Add parameter".
4. Select one of the four Parameter Type options to define the pattern you need, and click "Save URL" to save the list.
Four Parameter Type options
- Type 1: Numbers
- Type 2: Letters
- Type 3: Time
- Type 4: Custom list
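To make the four types more concrete, here is a rough Python sketch of the kind of value list each one could stand for, and how such a list expands a base URL. The article does not describe the exact options Octoparse offers for each type, so the ranges, step, date format, and all function names below are assumptions for illustration only.

```python
from datetime import date, timedelta
from string import ascii_lowercase


def number_values(start, end, step=1):
    """Numbers: an inclusive numeric range, e.g. page numbers."""
    return [str(n) for n in range(start, end + 1, step)]


def letter_values(first="a", last="z"):
    """Letters: an inclusive run of lowercase letters."""
    i = ascii_lowercase.index(first)
    j = ascii_lowercase.index(last)
    return list(ascii_lowercase[i:j + 1])


def date_values(start, end, fmt="%Y-%m-%d"):
    """Time: one value per day between two dates (the format is assumed)."""
    days = (end - start).days
    return [(start + timedelta(days=k)).strftime(fmt) for k in range(days + 1)]


def expand(base_url, placeholder, values):
    """Substitute each value for the placeholder in the base URL."""
    return [base_url.replace(placeholder, v) for v in values]


# A custom list is simply whatever values you supply yourself.
custom_values = ["books", "music", "games"]
```

For example, expand("www.XXX.com/page/[n]", "[n]", number_values(1, 3)) yields one URL per page number from 1 to 3.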
Tips! You can set up multiple parameters to generate URLs. For example, if the base URL is
www.XXX.com/[parameter1]/[parameter2]
Parameter1={A, B}, Parameter2={1, 2}
The final URL list would be like:
www.XXX.com/A/1
www.XXX.com/B/1
www.XXX.com/A/2
www.XXX.com/B/2
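Combining several parameters amounts to taking every combination of their values (a Cartesian product). The sketch below reproduces the example from the tip in Python, capped at the 1 million URL limit mentioned earlier. It is not Octoparse's implementation; the function name, the bracketed placeholder syntax, and the choice to vary the first parameter fastest (to match the order listed in the tip) are assumptions.

```python
from itertools import islice, product

URL_LIMIT = 1_000_000  # limit stated in this article


def generate_urls(base_url, parameters, limit=URL_LIMIT):
    """Build every combination of parameter values into a URL.

    parameters maps each placeholder (as it appears in base_url)
    to its list of values, in order of appearance.
    """
    names = list(parameters)
    # product() varies its LAST argument fastest, so pass the lists in
    # reverse to make the first parameter change fastest.
    reversed_names = list(reversed(names))
    value_lists = [parameters[name] for name in reversed_names]

    def combinations():
        for combo in product(*value_lists):
            url = base_url
            for name, value in zip(reversed_names, combo):
                url = url.replace(name, str(value))
            yield url

    # Stop as soon as the limit is reached.
    return list(islice(combinations(), limit))


urls = generate_urls(
    "www.XXX.com/[parameter1]/[parameter2]",
    {"[parameter1]": ["A", "B"], "[parameter2]": [1, 2]},
)
```

Here urls holds the four addresses from the tip, in the same order. The total count is the product of the list lengths, so two parameters with 1,000 values each would already reach the 1 million limit.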
If you have questions, you are welcome to submit a request here. Our support team will get back to you later.
Spanish version of this article: Ingresar URLs por lotes
You can also read web scraping articles on the official website.
Author: Fergus
Editor: Tina