Google Maps is not just a map website for finding locations; it is also a rich database of business information. Many people scrape Google Maps data to build their own business directory or a database of business leads.
This tutorial shows you how to extract business information from Google Maps.
First, let me introduce the easiest way: the Task Templates for Google Maps.
With the template(s), you just need to enter a keyword (e.g. Accounting, NY) or a web page URL (e.g. https://www.google.com/maps/search/insurance+West+University+Place,+TX) and then wait for the data to be extracted.
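If you start from a list of keywords rather than URLs, you can build search URLs in the same shape as the example above. This is a minimal Python sketch: it assumes the `https://www.google.com/maps/search/<query>` pattern seen in the example URL keeps working (Google does not guarantee this), and `maps_search_url` is a hypothetical helper, not part of Octoparse.

```python
from urllib.parse import quote_plus

# Assumption: this path pattern is taken from the example URL; it is not a
# documented Google API and may change at any time.
SEARCH_BASE = "https://www.google.com/maps/search/"


def maps_search_url(keyword: str) -> str:
    """Build a Google Maps search URL from a keyword such as "Accounting, NY"."""
    # quote_plus turns spaces into "+"; safe="," keeps commas, matching the example URL.
    return SEARCH_BASE + quote_plus(keyword, safe=",")


if __name__ == "__main__":
    for keyword in ["Accounting, NY", "insurance West University Place, TX"]:
        print(maps_search_url(keyword))
```

Each printed URL can then be used as a start URL for the task.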
Here is a sample of the template data for your reference. To try out the template, you can start with a 14-day premium trial: Try Octoparse 14-day free premium trial!
If you want to learn how to set up the crawler yourself, continue with this tutorial.
We will scrape the following data fields: Title, Number of reviews, Review rating, Category, Address, Phone, Website, and Open time.
Here are the main steps in this tutorial: [Download task file here]
- Go To Web Page - to open the target web page
- Create a pagination loop - to scrape all the results from multiple pages
- Create a "Loop Item" - to loop through all the items on the current page
- Extract data - to select the data for extraction
- Click Item - to click "Back to results"
- Start extraction - to run the task and get data
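To see how these steps nest, here is a minimal Python simulation of the workflow's control flow. `FakeMapsTab` and `run_workflow` are hypothetical names, nothing is fetched from the web, and this is not how Octoparse works internally; it only illustrates that the item loop runs inside the pagination loop, and why a "Back to results" click is needed when each item opens in the same tab.

```python
from typing import Dict, List, Optional


class FakeMapsTab:
    """Hypothetical single browser tab showing either the results list or one detail page.

    Nothing is fetched; the result pages are supplied up front so the control flow can run.
    """

    def __init__(self, pages: List[List[Dict[str, str]]]):
        self.pages = pages
        self.page_no = 0
        self.detail: Optional[Dict[str, str]] = None  # None means the results list is showing

    def _require_results(self) -> None:
        if self.detail is not None:
            raise RuntimeError("still on a detail page; go back to results first")

    def result_count(self) -> int:
        self._require_results()
        return len(self.pages[self.page_no])

    def click_item(self, i: int) -> None:
        self._require_results()
        self.detail = self.pages[self.page_no][i]  # the item opens in the same tab

    def extract(self) -> Dict[str, str]:
        if self.detail is None:
            raise RuntimeError("not on a detail page")
        return dict(self.detail)

    def back_to_results(self) -> None:
        self.detail = None

    def click_next(self) -> bool:
        """Go to the next results page; return False if ">" is disabled (last page)."""
        self._require_results()
        if self.page_no + 1 >= len(self.pages):
            return False
        self.page_no += 1
        return True


def run_workflow(tab: FakeMapsTab) -> List[Dict[str, str]]:
    rows: List[Dict[str, str]] = []
    while True:                               # 2) pagination loop
        for i in range(tab.result_count()):   # 3) loop item on the current page
            tab.click_item(i)                 #    click the item (same tab)
            rows.append(tab.extract())        # 4) extract data
            tab.back_to_results()             # 5) click "Back to results"
        if not tab.click_next():              #    stop once ">" is disabled
            break
    return rows


if __name__ == "__main__":
    # Creating the tab stands in for 1) Go To Web Page.
    demo = FakeMapsTab([
        [{"Title": "Alpha Accounting"}, {"Title": "Beta CPA"}],
        [{"Title": "Gamma Tax Services"}],
    ])
    print([row["Title"] for row in run_workflow(demo)])
    # ['Alpha Accounting', 'Beta CPA', 'Gamma Tax Services']
```

If you remove the `back_to_results()` call, the second `click_item` raises an error, which mirrors why step 5 exists.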
1) Go To Web Page - to open the target web page
- Enter the example URL into the search bar and click "Start"
If you have many URLs to scrape, you can enter several URLs into the bar at once. To learn more about the "Go To Web Page" step, check this guide: Go to Web Page
- Double-click "Go To Web Page" and set a longer timeout, such as "120s", to give the page enough time to load
2) Create a pagination loop - to scrape all the results from multiple pages
- Click the next page button ">"
- Click "Loop click single element" on the Tips panel
After these actions, a "Pagination" step is created in the workflow. To test it, click the "Pagination" box and then "Click to Paginate" to check that it moves to the next page.
The default XPath for the "Pagination" step works in most cases, but it has trouble scraping the data on the last page. To fix this, revise the XPath for the "Pagination" step:
- Double-click the "Pagination" step to open its settings
- Copy and paste the revised XPath into the text box: //button[contains(@jsaction,"pane.paginationSection.nextPage")][not(contains(@class,"button-disabled"))]
- Double-click "Click to Paginate"
- Set the AJAX timeout to "7s" or longer
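Our reading of the revised XPath: the first condition picks the next-page button by its `jsaction` value, and the second rejects that button once its class contains `button-disabled`, which the XPath implies is how the button is marked on the last page. On the last page nothing matches, so the loop has nothing left to click and can finish. The Python sketch below checks the same two conditions on hypothetical, simplified markup (real Google Maps HTML differs). It uses the standard library's ElementTree, whose XPath subset lacks `contains()` and `not()`, so both predicates are written in Python.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified markup for the ">" button; real Google Maps HTML differs.
MIDDLE_PAGE = '<div><button jsaction="pane.paginationSection.nextPage" class="next"/></div>'
LAST_PAGE = ('<div><button jsaction="pane.paginationSection.nextPage" '
             'class="next button-disabled"/></div>')


def enabled_next_buttons(html: str) -> List[ET.Element]:
    """Emulate the revised Pagination XPath:

    //button[contains(@jsaction,"pane.paginationSection.nextPage")]
            [not(contains(@class,"button-disabled"))]

    ElementTree's XPath subset has no contains() or not(), so both predicates
    are checked in Python.
    """
    root = ET.fromstring(html)
    return [
        button for button in root.iter("button")
        if "pane.paginationSection.nextPage" in button.get("jsaction", "")
        and "button-disabled" not in button.get("class", "")
    ]


if __name__ == "__main__":
    print(len(enabled_next_buttons(MIDDLE_PAGE)))  # 1: keep paginating
    print(len(enabled_next_buttons(LAST_PAGE)))    # 0: nothing to click, the loop ends
```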
If you want to learn more about AJAX, check this guide: Handling AJAX
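Conceptually, an AJAX timeout is a bounded wait: after a click that loads new content without reloading the whole page, the task allows up to N seconds for that content to arrive. The sketch below shows this general idea with a generic polling helper; `wait_for` is a hypothetical function, not an Octoparse API, and Octoparse's own waiting logic may differ.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_s: float, poll_s: float = 0.05) -> bool:
    """Poll condition() until it is True or timeout_s seconds have passed.

    Returns True if the condition was met in time, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulate content that an AJAX request "delivers" about 1 second from now.
    ready_at = time.monotonic() + 1.0

    def loaded() -> bool:
        return time.monotonic() >= ready_at

    print(wait_for(loaded, timeout_s=0.1))  # the 0.1 s timeout expires before the content appears
    print(wait_for(loaded, timeout_s=3.0))  # the content appears within the 3 s timeout
```

On a slow network the content takes longer to appear, which is why a longer timeout is the safer setting.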
3) Create a "Loop Item" - to loop through the items on the list
- Click the first and second titles on the list so that Octoparse detects all the other similar items
- Click "Loop click each element" on the Tips panel
After these actions, a "Loop Item" is generated in the workflow, and the first item's page opens.
Next, we need to adjust the "Loop Item" settings.
- Double-click "Loop Item"
- Switch the Loop Mode from "Fixed List" to "Variable List"
- Enter the Element XPath: //h3
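As we understand "Variable List" mode, a single XPath is evaluated on the current page and every element it matches becomes one loop iteration, so `//h3` picks up every result title no matter how many results the page shows. The Python sketch below evaluates the equivalent path with the standard library's ElementTree on a hypothetical, simplified results list; real Google Maps markup is far more complex.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified results list; real Google Maps markup is far more complex.
RESULTS_HTML = """
<div>
  <div class="result"><h3>Alpha Accounting</h3><span>4.6</span></div>
  <div class="result"><h3>Beta CPA</h3><span>4.2</span></div>
  <div class="result"><h3>Gamma Tax Services</h3><span>4.9</span></div>
</div>
"""


def result_titles(html: str) -> List[str]:
    """Return the text of every <h3>, as the XPath //h3 would match."""
    root = ET.fromstring(html.strip())
    # ".//h3" is ElementTree's form of //h3: any h3 at any depth below the root.
    return [h3.text or "" for h3 in root.findall(".//h3")]


if __name__ == "__main__":
    print(result_titles(RESULTS_HTML))
    # ['Alpha Accounting', 'Beta CPA', 'Gamma Tax Services']
```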
We also need to modify the settings of the "Click Item" step.
- Double-click "Click Item"
- Uncheck the option "Open in a new tab"
- Adjust the AJAX timeout to "10s" (if you run the task on your own device, you can set this based on your network conditions)
4) Extract data - to select the data for extraction
Now, you're on the business detail page.
- Click the information you need on the page, such as the title and address
- Select "Extract the text of the selected element" on the "Tips" panel
- Keep repeating until you get all the data fields you need
- Double-click the "Extract Data" step in the workflow
- Click the field names to rename the fields if needed
Google is strict about data scraping, and its page source code is very hard to read, so the auto-generated XPaths are often unreliable. We need to revise the element XPath for each data field.
No worries! We have prepared the XPaths for you; just use the ones provided below.
- Click the icon to modify the XPath one by one
- Replace the default XPath with the revised one (choose the fields based on your scraping needs; each XPath locates the matching element on the web page)
- Title: //h1
- Number of reviews: //button[@jsaction="pane.rating.moreReviews"]
- Review rating: //span[@class="section-star-display"]
- Category: //button[@jsaction="pane.rating.category"]
- Address: //button[@data-item-id="address"]
- Website: //button[@data-item-id="authority"]
- Phone number: //button[contains(@data-item-id,"phone")]
- Open time: //div[contains(@class,"open-hours")]
- Click "OK" to save
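To see what each of the XPaths above targets, the sketch below runs them against a hypothetical, heavily simplified detail page. The attribute values come from the tutorial's XPaths, but the surrounding markup is invented, and Google changes its real markup often, so these paths may stop matching. Python's ElementTree supports only a subset of XPath: the exact-attribute paths are used as-is (with `.//` for `//`), and the two `contains()` paths (Phone number, Open time) are emulated in Python. `extract_fields` and `first_text` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical, simplified detail-page markup using the attributes the XPaths target.
DETAIL_HTML = """
<div>
  <h1>Alpha Accounting</h1>
  <span class="section-star-display">4.6</span>
  <button jsaction="pane.rating.moreReviews">128 reviews</button>
  <button jsaction="pane.rating.category">Accountant</button>
  <button data-item-id="address">123 Main St, New York, NY</button>
  <button data-item-id="authority">alpha-accounting.example</button>
  <button data-item-id="phone:tel:+12125550100">(212) 555-0100</button>
  <div class="section-open-hours">Open until 5 PM</div>
</div>
"""

# Tutorial XPaths that ElementTree's subset can express directly.
EXACT_PATHS = {
    "Title": ".//h1",
    "Number of reviews": ".//button[@jsaction='pane.rating.moreReviews']",
    "Review rating": ".//span[@class='section-star-display']",
    "Category": ".//button[@jsaction='pane.rating.category']",
    "Address": ".//button[@data-item-id='address']",
    "Website": ".//button[@data-item-id='authority']",
}


def first_text(root: ET.Element, tag: str, attr: str, needle: str) -> Optional[str]:
    """Emulate //tag[contains(@attr, needle)] and return the first match's text."""
    for el in root.iter(tag):
        if needle in el.get(attr, ""):
            return el.text
    return None


def extract_fields(html: str) -> Dict[str, Optional[str]]:
    root = ET.fromstring(html.strip())
    row: Dict[str, Optional[str]] = {}
    for name, path in EXACT_PATHS.items():
        el = root.find(path)
        row[name] = el.text if el is not None else None  # a missing field becomes None
    row["Phone number"] = first_text(root, "button", "data-item-id", "phone")
    row["Open time"] = first_text(root, "div", "class", "open-hours")
    return row


if __name__ == "__main__":
    for field, value in extract_fields(DETAIL_HTML).items():
        print(f"{field}: {value}")
```

Returning `None` for a missing field keeps every row the same shape, which is handy because not every business lists a website or opening hours.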
If you want to learn more about XPath, please check the following tutorial:
5) Click Item - to click "Back to results"
Normally, this step is not needed, but Google Maps is a special case. Because each item now opens in the same tab, this click takes the task back to the results list so it can continue with the next item.
- Click the "Back to results" button on the web page
- Choose "Click the button" on the Action Tips
- Extend the AJAX timeout to 7-10s, depending on your network conditions
6) Start extraction - to run the task and get data
- Click "Save" to save the task
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer
- Local runs are normally used for testing. If you want the crawler to run faster, select "Run task in the Cloud" (available to premium users only)
- Try Octoparse 14-day free premium trial!
Here is the sample output.
Spanish version of this tutorial: Scraping business information from Google Maps
You can also read more web scraping articles on the official website.