Scrape company information from Crunchbase
In this tutorial, we will show you how to scrape company information from Crunchbase using a list of URLs.
Octoparse also offers an easy-to-use Crunchbase "Task Template" on the main screen of the scraping tool. You only need to enter a few parameters and the task is ready to go. For further details, see: Task Templates
To follow along, you can use these example URLs:
https://www.crunchbase.com/organization/paypal
https://www.crunchbase.com/organization/apple
https://www.crunchbase.com/organization/twitter
We will scrape data such as the Company name, Location, and Introduction from the company detail pages with Octoparse.
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - open the target web page
- Extract data - select the data for extraction
- Save and start extraction - run the task and get data
1. "Go To Web Page" - open the target web page
- Click "+ Task" to start a new task with Advanced Mode
- Paste the URLs into the "Website" box
- Click "Save URL" to move on
For this step, prepare in advance a list of the Crunchbase URLs of the companies you would like to scrape.
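If you have many companies, typing each URL by hand gets tedious. Below is a minimal Python sketch for building the list, assuming every company page follows the same https://www.crunchbase.com/organization/ pattern as the three example URLs above. The function name build_company_urls and the lowercase clean-up are our own illustrative choices, not part of Octoparse or Crunchbase.

```python
from urllib.parse import quote

# URL pattern taken from the example URLs in this tutorial.
BASE_URL = "https://www.crunchbase.com/organization/"


def build_company_urls(slugs):
    """Return one Crunchbase URL per slug, skipping blanks and duplicates.

    Assumption: slugs are the lowercase last path segment of the company
    page, such as "paypal". This helper is hypothetical and only
    concatenates strings; it does not check that the page exists.
    """
    seen = set()
    urls = []
    for slug in slugs:
        cleaned = slug.strip().lower()
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        # quote() percent-encodes any character that is unsafe in a URL path.
        urls.append(BASE_URL + quote(cleaned))
    return urls


if __name__ == "__main__":
    # Print one URL per line, ready to paste into the "Website" box.
    for url in build_company_urls(["paypal", "apple", "twitter"]):
        print(url)
```

Copy the printed lines and paste them into the "Website" box as described above.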
2. Extract data - select the data for extraction
- Click the title
- Select "Extract text of the selected element" on the "Action Tips"
- Click the description
- Select "Extract text of the selected element" on the "Action Tips"
- Click the location
- Click "expand select area"
- Select "Extract text of the selected element" on the "Action Tips"
- Rename the fields by selecting from the predefined list or inputting on your own
3. Save and start extraction - run the task and get data
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample output:
Happy data hunting!
Author: Kara
Editor: Fergus