In this tutorial, we will show you how to scrape company information from Crunchbase.
To follow along, you may want to use these URLs in the tutorial:
We will scrape data such as the Company name, Location, and Introduction from the company details page with Octoparse.
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - to open the targeted web page
- Extract data - to select the data for extraction
- Save and start extraction - to run the task and get data
1. "Go To Web Page" - to open the targeted web page
- Click "+ Task" to start a new task in Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape websites with complex structures, like amazon.com, we strongly recommend starting the data extraction project in Advanced Mode.
- Paste the URLs into the "Extraction URL" box and click "Save URL" to move on
In this case, you need to prepare a list of URLs of the companies you would like to scrape from Crunchbase.
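The URL list can be prepared by hand, or generated with a short script. Below is a minimal sketch in Python, assuming company detail pages follow the pattern `https://www.crunchbase.com/organization/<slug>`; check this against a real company page before relying on it. The function name `build_urls` and the slugs shown are placeholders for illustration, not real task data.

```python
from urllib.parse import quote

# Assumption: company detail pages use this URL pattern.
BASE = "https://www.crunchbase.com/organization/"


def build_urls(slugs):
    """Turn company slugs into a de-duplicated list of page URLs."""
    seen = set()
    urls = []
    for slug in slugs:
        slug = slug.strip().lower()
        # Skip blanks and slugs that were already added.
        if not slug or slug in seen:
            continue
        seen.add(slug)
        urls.append(BASE + quote(slug, safe="-"))
    return urls


# Placeholder slugs; replace them with the companies you want to scrape.
slugs = ["example-company", " Another-Example ", "example-company"]
url_list = build_urls(slugs)

# One URL per line, ready to paste into the "Extraction URL" box.
print("\n".join(url_list))
```

Removing duplicates up front keeps the task from visiting the same company page twice.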
2. Extract data - to select the data for extraction
- Uncheck the box for "Retry when page fails to load"
- Click "OK" and "Save"
- Click on the data you need on the page
- Select "Extract text of the selected element" in the "Action Tips" panel
- Rename the fields by selecting a name from the pre-defined list or entering your own
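The field names chosen here are what you will work with once the data is exported. As a rough sketch of how they might be used downstream, the Python snippet below assumes the data has been exported as CSV with the columns "Company name", "Location", and "Introduction", and collapses stray whitespace in each field. The helper `clean_rows` and the sample row are hypothetical placeholders, not real Crunchbase output.

```python
import csv
import io

# Field names as set in the task (assumed to be the CSV column headers).
FIELDS = ["Company name", "Location", "Introduction"]


def clean_rows(csv_text):
    """Parse exported CSV text and collapse extra whitespace in each field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        cleaned.append(
            {name: " ".join((row.get(name) or "").split()) for name in FIELDS}
        )
    return cleaned


# Placeholder data for illustration only.
sample = (
    "Company name,Location,Introduction\n"
    '  Example Co ,"Springfield,  USA","An  example\ncompany."\n'
)
rows = clean_rows(sample)
print(rows[0]["Company name"])
```

Keeping the field names in one list makes it easy to update the script if the fields are renamed in the task.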
3. Save and start extraction - to run the task and get data
- Click "Start Extraction" in the upper left corner
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample output: