In this tutorial, we will show you how to scrape company information from Crunchbase using a list of URLs.
For Crunchbase, you can also use our easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. Just enter a few parameters and the task is ready to run. For further details, see: Task Templates
To follow along, you may want to use these URLs in the tutorial:
We will scrape data such as the Company name, Location, and Introduction from the company detail pages with Octoparse.
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - to open the target web page
- Extract data - to select the data for extraction
- Save and start extraction - to run the task and get data
1. "Go To Web Page" - to open the target web page
- Click "+ Task" to start a new task with Advanced Mode
- Paste the URLs into the "Extraction URL" box
- Click "Save URL" to move on
For this step, you need to prepare, in advance, a list of URLs for the companies you would like to scrape from Crunchbase.
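If you keep your target companies as a list of names or slugs, a short script can turn them into a clean, one-per-line URL list to paste into the "Extraction URL" box. The following is only a minimal sketch in Python, not part of Octoparse itself. It assumes that Crunchbase company pages follow the pattern `https://www.crunchbase.com/organization/<slug>`, and the slugs and the helper names `build_urls` and `is_company_url` are hypothetical, so check the pattern against a real company page before relying on it.

```python
from urllib.parse import quote, urlsplit

# Assumed URL pattern for Crunchbase company pages (verify against a real page).
BASE = "https://www.crunchbase.com/organization/"


def build_urls(slugs):
    """Return de-duplicated company URLs, preserving the input order."""
    seen = set()
    urls = []
    for slug in slugs:
        slug = slug.strip().lower()
        # Skip blank entries and slugs that were already added.
        if not slug or slug in seen:
            continue
        seen.add(slug)
        urls.append(BASE + quote(slug, safe="-"))
    return urls


def is_company_url(url):
    """Check that a URL matches the assumed company-page pattern."""
    parts = urlsplit(url)
    return (
        parts.scheme == "https"
        and parts.netloc == "www.crunchbase.com"
        and parts.path.startswith("/organization/")
        and len(parts.path) > len("/organization/")
    )


if __name__ == "__main__":
    # Hypothetical slugs, used only for illustration.
    slugs = ["example-company-a", "Example-Company-B", "example-company-a", ""]
    for url in build_urls(slugs):
        print(url)
```

Each printed line is one URL, so the output can be copied straight into the URL box. Removing duplicates first avoids scraping the same company page twice.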
2. Extract data - to select the data for extraction
- Uncheck the box for "Retry when page fails to load"
- Click "OK" and "Save"
- Click on the data you need on the page
- Select "Extract text of the selected element" from the "Action Tips"
- Rename the fields by selecting a name from the pre-defined list or typing your own
3. Save and start extraction - to run the task and get data
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample output:
Happy data hunting!