Google Maps is not just a map website that helps you find locations; it is also a rich database from which you can gain plenty of business insights. Many people scrape Google Maps data to build their own business directories or business lead databases.
This tutorial shows you how to extract business information from Google Maps.
First, let me introduce the easiest way - Task Templates for Google Maps.
With the template(s), you only need to enter a keyword (e.g. Accounting, NY) or a web page URL (e.g. https://www.google.com/maps/search/insurance+West+University+Place,+TX) and then wait for the data to be extracted.
Here is a sample of the template data for your reference. To try out the template, you can start with a 14-day free premium trial: Try Octoparse 14-day free premium trial!
If you want to learn how to set up the crawler yourself, continue with this tutorial.
We will scrape the following data fields: Title, Number of reviews, Review rating, Category, Address, Phone, Website, and Open time.
Here are the main steps in this tutorial: [Download task file here]
- Go To Web Page - to open the targeted web page
- Create a pagination loop - to scrape all the results from multiple pages
- Create a "Loop Item" - to loop through all the items on the current page
- Extract data - to select the data for extraction
- Extract from page-level data - to extract GPS coordinates (optional)
- Click Item - Click "Back to results"
- Start extraction - to run the task and get data
1) Go To Web Page - to open the targeted web page
- Enter the example URL into the search bar and click "Start"
If you have many URLs to scrape, you can enter several URLs into the bar at once. To learn more about this action, check out this guide: Go to Web Page
- Click "Go To Web Page", set a longer timeout such as "120s" at the bottom, then click "Apply".
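If you start from keywords rather than ready-made URLs, you could generate the search URLs first and then paste them into the bar. Below is a minimal Python sketch, assuming the `https://www.google.com/maps/search/<keyword>` format seen in the example URL; `build_search_urls` is a hypothetical helper, not an Octoparse feature.

```python
from urllib.parse import quote_plus

# Base path taken from the example search URL format used in this tutorial.
SEARCH_BASE = "https://www.google.com/maps/search/"


def build_search_urls(keywords):
    """Turn plain keywords such as "Accounting, NY" into Google Maps search URLs.

    Spaces become "+" and commas are left as-is, matching the style of the
    example URL. Blank entries are skipped.
    """
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword:
            urls.append(SEARCH_BASE + quote_plus(keyword, safe=","))
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["Accounting, NY", "insurance West University Place, TX"]):
        print(url)
```

The resulting list can be pasted into the URL bar one per line.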
2) Create a pagination loop - to scrape all the results from multiple pages
- Click the next page button ">"
- Click "Loop click single element" on the Tips panel
After the above actions, a "Pagination" step is created in the workflow. You can click the "Pagination" box and then "Click to Paginate" to test whether it moves to the next page correctly.
The default XPath for the "Pagination" works in most cases, but it causes problems when scraping the last page. To fix this, revise the XPath for the "Pagination" so that it only matches the next page button while that button is not disabled.
- Click the "Pagination" step on the workflow
- Copy and paste the revised XPath into the "Matching Xpath" text box: //button[contains(@jsaction,"pane.paginationSection.nextPage")][not(contains(@class,"button-disabled"))]
- Click "Click to Paginate" on the workflow
- Click "Options"
- Adjust AJAX Timeout to "7s" or longer
- Click "Apply"
If you want to learn more about AJAX, check this guide: Handling AJAX
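To see why the revised XPath helps on the last page: there the ">" button is likely still present but marked with a "button-disabled" class, so an XPath without the `not(...)` filter keeps matching it. The Python sketch below reproduces the two predicates of the revised XPath with the standard library's `html.parser`. The HTML snippets and class names are invented for illustration, and this is not how Octoparse evaluates XPath internally.

```python
from html.parser import HTMLParser


class ButtonCollector(HTMLParser):
    """Collect the attributes of every <button> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            # attrs is a list of (name, value) pairs; value is None for bare attributes.
            self.buttons.append({name: value or "" for name, value in attrs})


def enabled_next_buttons(html):
    """Python mirror of the revised pagination XPath:

    //button[contains(@jsaction,"pane.paginationSection.nextPage")]
            [not(contains(@class,"button-disabled"))]
    """
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()
    return [
        b for b in parser.buttons
        if "pane.paginationSection.nextPage" in b.get("jsaction", "")
        and "button-disabled" not in b.get("class", "")
    ]


# Hypothetical markup for a middle results page and for the last one.
MIDDLE_PAGE = '<button jsaction="pane.paginationSection.nextPage" class="nav-button">&gt;</button>'
LAST_PAGE = '<button jsaction="pane.paginationSection.nextPage" class="nav-button button-disabled">&gt;</button>'

if __name__ == "__main__":
    print(len(enabled_next_buttons(MIDDLE_PAGE)))  # one match: keep paginating
    print(len(enabled_next_buttons(LAST_PAGE)))    # no match: the loop can stop
```

An empty match on the last page is what lets the pagination loop end instead of clicking a disabled button.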
3) Create a "Loop Item" - to loop click items on the list
- Click the first company block on the list
- Select "Click URL" on the Tips panel
- Set the AJAX timeout to 5-10s
- Then go to the bottom of the workflow and click "Options"
- Uncheck "Open in a new tab" and click "Apply"
- Click the name on the first block of the list shown on the page
- Click "Select all"
- Click "Loop click each element"
- Set the AJAX timeout to 5-10s
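Conceptually, an AJAX timeout is a bounded wait: after a click, the task waits for the new content to load but gives up after the number of seconds you set. The Python sketch below illustrates that idea with a hypothetical `wait_until` helper; it is an analogy for the setting, not Octoparse's actual implementation.

```python
import time


def wait_until(condition, timeout_s, poll_interval_s=0.25):
    """Poll `condition` until it returns True or `timeout_s` seconds have passed.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))


if __name__ == "__main__":
    start = time.monotonic()

    def panel_loaded():
        # Hypothetical details panel that finishes loading after about 0.5 s.
        return time.monotonic() - start >= 0.5

    print(wait_until(panel_loaded, timeout_s=5))   # loads within the timeout
    print(wait_until(lambda: False, timeout_s=1))  # never loads: times out
```

A longer timeout gives slow pages more time to load, at the cost of a longer wait whenever content never appears.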
4) Extract data - to select the data for extraction
- Select the information you want on the web page
- Select "Extract the text of the element"
Please note that Google is quite strict about data scraping and its source code is very hard to read, so we need to revise the element XPath for each data field to make sure the data is scraped precisely.
No worries! We have prepared everything you need. You can simply use the element XPaths provided below.
- Go to the data preview and click "More"
- Click "Customize Xpath"
- Replace the default XPath with the revised one.
Choose the fields based on your scraping needs. An XPath is an expression that locates elements on the web page.
- Title: //h1
- Number of reviews: //button[@jsaction="pane.rating.moreReviews"]
- Review rating: //span[@class="section-star-display"]
- Category: //button[@jsaction="pane.rating.category"]
- Address: //button[@data-item-id="address"]
- Website: //button[@data-item-id="authority"]
- Phone number: //button[contains(@data-item-id,"phone")]
- Open time: //div[contains(@class,"open-hours")]
- Click "Apply" to save
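If it helps to see what each of these XPaths selects, the Python sketch below mirrors their predicates (`=` for an exact attribute match, `contains()` for a substring match) using the standard library's `html.parser`, and returns each field's text, similar to "Extract the text of the element". The HTML sample is invented for illustration; Google Maps' real markup is more complex and may change, so treat this as an explanation of the XPaths rather than a standalone scraper.

```python
from html.parser import HTMLParser

# Each rule mirrors one XPath above: (tag, attribute, test, value).
# "any" = bare tag (//h1), "eq" = @attr="value", "contains" = contains(@attr,"value").
FIELD_RULES = {
    "Title": ("h1", None, "any", None),
    "Number of reviews": ("button", "jsaction", "eq", "pane.rating.moreReviews"),
    "Review rating": ("span", "class", "eq", "section-star-display"),
    "Category": ("button", "jsaction", "eq", "pane.rating.category"),
    "Address": ("button", "data-item-id", "eq", "address"),
    "Website": ("button", "data-item-id", "eq", "authority"),
    "Phone number": ("button", "data-item-id", "contains", "phone"),
    "Open time": ("div", "class", "contains", "open-hours"),
}

# Elements that never get an end tag, so they never stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Record each element's tag, attributes and full text (including children)."""

    def __init__(self):
        super().__init__()
        self.elements = []  # every element, in document order
        self._open = []     # elements whose end tag has not been seen yet

    def handle_starttag(self, tag, attrs):
        element = {"tag": tag, "attrs": {k: v or "" for k, v in attrs}, "text": []}
        self.elements.append(element)
        if tag not in VOID_TAGS:
            self._open.append(element)

    def handle_endtag(self, tag):
        # Close the most recent open element with this tag (and any unclosed children).
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i]["tag"] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text counts towards every open ancestor, like "the text of the element".
        for element in self._open:
            element["text"].append(data)


def matches(element, rule):
    tag, attr, test, value = rule
    if element["tag"] != tag:
        return False
    if test == "any":
        return True
    actual = element["attrs"].get(attr)
    if actual is None:  # a missing attribute never matches
        return False
    return actual == value if test == "eq" else value in actual


def extract_fields(html):
    """Return {field: text} for the first element matching each rule."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    result = {}
    for field, rule in FIELD_RULES.items():
        for element in parser.elements:
            if matches(element, rule):
                result[field] = " ".join("".join(element["text"]).split())
                break
    return result


# Invented markup for illustration only; real Google Maps HTML differs.
SAMPLE = """
<div>
  <h1>Example Insurance Agency</h1>
  <span class="section-star-display">4.8</span>
  <button jsaction="pane.rating.moreReviews">25 reviews</button>
  <button jsaction="pane.rating.category">Insurance agency</button>
  <button data-item-id="address">123 Main St, Houston, TX</button>
  <button data-item-id="phone:tel:5550100">(555) 010-0100</button>
  <div class="section-open-hours">Monday <b>9AM-5PM</b></div>
</div>
"""

if __name__ == "__main__":
    for name, text in extract_fields(SAMPLE).items():
        print(f"{name}: {text}")
```

Fields with no matching element (here, Website) are simply left out, which is similar to a blank cell in the extracted data.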
If you want to learn more about XPath, please check the following tutorial:
5) Extract from page-level data - to extract GPS coordinates (optional)
As many of you have requested, this step will teach you how to extract GPS coordinates data from Google Maps.
The coordinates are actually hidden in the page URL, so we first need to extract the page URL in the loop.
- Click the "Add Custom Field" icon in the data preview section
- Select Page-level data and then Page URL
Next, we need to extract the coordinates from the page URL with the RegEx tool.
- Click the three-dot icon of the Page URL data field and select "Clean data"
- Click "+ Add Step" and then "Match with Regular Expression"
Try the RegEx tool if you don't want to write the regular expression yourself.
- Enter the following parameters and tick "Match all"
- Check the "Results" box to see if the data is in our desired format
- Click "Apply" to save the settings
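If you prefer to write the regular expression yourself, the Python sketch below shows one way to pull coordinates out of a page URL. It assumes the `!3d<lat>!4d<lng>` segment that Google Maps place URLs commonly contain, and falls back to the `@<lat>,<lng>` map-centre part; Google does not guarantee this URL format, the example URL is made up, and `extract_coordinates` is a hypothetical helper rather than the tutorial's own RegEx.

```python
import re

# Assumed pin coordinates segment in place URLs: "!3d<lat>!4d<lng>".
PIN_PATTERN = re.compile(r"!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)")
# Fallback: the map view centre, "@<lat>,<lng>,<zoom>z".
VIEW_PATTERN = re.compile(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")


def extract_coordinates(page_url):
    """Return (latitude, longitude) as floats, or None if nothing matches."""
    for pattern in (PIN_PATTERN, VIEW_PATTERN):
        match = pattern.search(page_url)
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


if __name__ == "__main__":
    # Invented place URL for illustration.
    url = ("https://www.google.com/maps/place/Example+Insurance/"
           "@29.7180000,-95.4300000,17z/data=!3m1!4b1!4m5!3m4!1s0x0:0x0!8m2"
           "!3d29.7181234!4d-95.4312345")
    print(extract_coordinates(url))
```

Preferring the `!3d`/`!4d` pair over the `@` part matters because the `@` values describe where the map is centred, which is not necessarily the business's exact location.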
6) Click Item - Click "Back to results"
Normally, we don't need to add this step, but Google Maps is a special case. This step helps the task go back to the results list so it can continue scraping the remaining items and pages.
- Click the "Arrow" icon on the web page
- Choose "Click" on the Tips panel
- Extend the AJAX timeout to 7-10s depending on your network conditions
- Drag the action into the right spot in the workflow (inside the "Loop Item", after the data extraction step)
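To see why this step matters, the Python sketch below models the whole workflow as nested loops over a hypothetical in-memory results page. `FakeResultsPage` and `run_workflow` are illustrative names, not Octoparse or Google APIs. Because items open in the same tab ("Open in a new tab" is unchecked), the details view replaces the results list, so the loop must go back before it can click the next item.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeResultsPage:
    """Hypothetical in-memory stand-in for the Google Maps results panel."""
    pages: List[List[str]]        # business names on each results page
    page_index: int = 0
    opened: Optional[str] = None  # business whose details replace the list

    def visible_items(self):
        return list(self.pages[self.page_index])

    def click_item(self, name):
        # Details open in the same tab, so the list must be visible to click.
        if self.opened is not None:
            raise RuntimeError("results list is hidden; go back to results first")
        self.opened = name

    def extract(self):
        return {"Title": self.opened, "Page": self.page_index + 1}

    def back_to_results(self):
        self.opened = None

    def has_enabled_next(self):
        # Stands in for the revised pagination XPath finding an enabled ">" button.
        return self.page_index < len(self.pages) - 1

    def click_next(self):
        self.page_index += 1


def run_workflow(maps, go_back=True):
    records = []
    while True:                            # 2) Pagination loop
        for name in maps.visible_items():  # 3) Loop Item
            maps.click_item(name)
            records.append(maps.extract())  # 4)-5) Extract data
            if go_back:
                maps.back_to_results()     # 6) Click "Back to results"
        if not maps.has_enabled_next():
            break
        maps.click_next()
    return records


if __name__ == "__main__":
    print(run_workflow(FakeResultsPage(pages=[["A", "B"], ["C"]])))
```

In this model, calling `run_workflow(..., go_back=False)` raises a `RuntimeError` on the second item, which mirrors why the "Back to results" click is placed inside the loop.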
7) Start extraction - to run the task and get data
- Click "Save" to save the task
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer
- Local runs are normally for testing. If you want the crawler to run faster, you can select "Run task in the Cloud" (for premium users only)
- Try Octoparse 14-day free premium trial!
Here is the sample output.