Google Maps is more than a website for finding locations; it is also a rich database of business information. Many people scrape Google Maps data to build business directories or lead lists.

This tutorial shows how to extract business information from Google Maps.

For Google Maps scraping, you can use our ready-to-use Task Template available on the home page or follow this tutorial to build the task from scratch.


With the template, you only need to enter a keyword (e.g., Accounting, NY) or a web page URL (e.g., https://www.google.com/maps/search/insurance+West+University+Place,+TX/@29.716598,-95.4987615,10z/data=!3m1!4b1) and wait for the results.


Here is the template data sample for your reference. To try out the template, you can apply for the Octoparse 14-day free premium trial.

[Image: template data sample]

If you want to learn how to set up the crawler on your own, you may continue with this tutorial.

Example URL: https://www.google.com/maps/search/insurance+West+University+Place,+TX/@29.716598,-95.4987615,10z/data=!3m1!4b1
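
If you have many keywords, the search URLs can be generated rather than typed by hand. The sketch below is a minimal illustration in Python, not part of Octoparse: `build_search_url` is a hypothetical helper, and it reproduces only the keyword portion of the example URL (the `/@...` viewport suffix is omitted, on the assumption that the shorter form still opens a search).

```python
from urllib.parse import quote_plus

# Assumption: the search path observed in the example URL above.
BASE = "https://www.google.com/maps/search/"

def build_search_url(keyword: str) -> str:
    """Turn a free-text keyword into a Google Maps search URL (hypothetical helper)."""
    # quote_plus turns spaces into '+'; commas are kept literal to mirror the example URL.
    return BASE + quote_plus(keyword.strip(), safe=",")

keywords = ["insurance West University Place, TX", "Accounting, NY"]
urls = [build_search_url(k) for k in keywords]
for url in urls:
    print(url)
```

The resulting list can be pasted into the URL box in step 1, one URL per line.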

We will scrape the following data fields: title, number of reviews, review rating, address, phone, website, and open time.

Here are the main steps in this tutorial: [Download task file here]

  1. Go To Web Page - to open the targeted web page

  2. Create a pagination loop - to scrape all the results from multiple pages

  3. Create a "Loop Item" - to loop through all the items on the current page

  4. Extract data - to select the data for extraction

  5. Extract from page-level data - to extract GPS coordinates (optional)

  6. Click Item - to click "Back to results" and return to the results list

  7. Start extraction - to run the task and get data
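
The steps above nest as two loops: an outer loop over result pages and an inner loop over the items on each page. The sketch below is not Octoparse code; it simulates that order of operations in Python on hypothetical in-memory data (`sample_pages` and `run_workflow` are illustrative names) to show where each step sits.

```python
def run_workflow(pages):
    """Simulate the workflow order on in-memory data.

    `pages` is a hypothetical list of result pages, each a list of dicts.
    """
    rows = []
    for page in pages:                       # step 2: pagination loop
        for item in page:                    # step 3: loop item (click each result)
            row = {                          # step 4: extract data from the detail view
                "Title": item["title"],
                "Address": item["address"],
            }
            rows.append(row)
            # step 6: "Back to results" would return to the list here
    return rows                              # step 7: the collected output

sample_pages = [
    [{"title": "A Insurance", "address": "1 Main St"},
     {"title": "B Insurance", "address": "2 Main St"}],
    [{"title": "C Insurance", "address": "3 Main St"}],
]
rows = run_workflow(sample_pages)
print(len(rows), "rows extracted")
```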


1. Go To Web Page - to open the targeted web page

  • Enter the example URL into the search bar and click "Start"

You can enter several URLs into the bar if you have many URLs to scrape.

  • Click "Go To Web Page", set a longer timeout at the bottom (such as "120s"), then click "Apply"


2. Create a pagination loop - to scrape all the results from multiple pages

  • Click the next page button ">"

  • Click "Loop click single element" on the Tips panel

These actions create a "Pagination" step in the workflow. Click the "Pagination" box and then "Click to Paginate" to test whether it advances to the next page.

The default XPath for the pagination works well in most cases, but it can cause problems when scraping the last page. To avoid this, revise the XPath for the "Pagination" step so that it no longer matches the next-page button once that button is disabled.

  • Click the "Pagination" step on the workflow

  • Copy and paste the revised XPath into the "Matching Xpath" text box: //button[contains(@jsaction,"pane.paginationSection.nextPage")][not(contains(@class,"button-disabled"))]
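
The revised XPath matches a button only when its `jsaction` contains `pane.paginationSection.nextPage` and its `class` does not contain `button-disabled`, so the loop ends on the last page. The sketch below expresses the same predicate in plain Python over two hypothetical HTML snippets; the markup is made up for illustration and is not Google's actual page source.

```python
import xml.etree.ElementTree as ET

NEXT_ACTION = "pane.paginationSection.nextPage"

def find_next_button(markup):
    """Return the enabled next-page button, or None when there is none."""
    root = ET.fromstring(markup)
    for button in root.iter("button"):
        jsaction = button.get("jsaction", "")
        css_class = button.get("class", "")
        # Same two conditions as the revised XPath predicate.
        if NEXT_ACTION in jsaction and "button-disabled" not in css_class:
            return button
    return None

# Hypothetical snippets: a middle page and the last page.
MIDDLE_PAGE = '<div><button jsaction="pane.paginationSection.nextPage" class="nav">next</button></div>'
LAST_PAGE = '<div><button jsaction="pane.paginationSection.nextPage" class="nav button-disabled">next</button></div>'

print(find_next_button(MIDDLE_PAGE) is not None)  # a clickable button is found
print(find_next_button(LAST_PAGE) is None)        # nothing matches, pagination stops
```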


Additional action - AJAX setting for "Click to Paginate"

Sometimes, the web page may take longer to load. You can modify the AJAX timeout based on the network conditions.

  • Click "Click to Paginate" on the workflow

  • Click "Options"

  • Adjust AJAX Timeout to "7s" or longer

  • Click "Apply"


3. Create a "Loop Item" - to loop through and click the items on the list

  • Click the first company block on the list

  • Select "Click URL" on the Tips panel

  • Set the AJAX timeout to 5-10s

  • Then go to the bottom of the workflow and click "Options"

  • Uncheck "Open in a new tab" and click "Apply"

  • Click the name on the first block of the list shown on the page

  • Click "Select all"

  • Click "Loop click each element"

  • Set the AJAX timeout to 5-10s


4. Extract data - to select the data for extraction

  • Select the information you want on the web page

  • Select "Extract the text of the element"


Please note that Google is quite strict about data scraping and its page source is hard to read, so we need to revise the element XPath for each data field to ensure precise extraction.

No worries! We have prepared everything you need. You can use the element XPaths provided below.

  • Go to the data preview and click "More"

  • Click "Customize Xpath"

  • Replace the default XPath with the revised one.

Choose the fields that fit your scraping needs. Each XPath matches the corresponding element on the web page.

    • Title: //h1

    • Number of reviews: //button[@jsaction="pane.rating.moreReviews"]

    • Review rating: //span[@class="section-star-display"]

    • Category: //button[@jsaction="pane.rating.category"]

    • Address: //button[@data-item-id="address"]

    • Website: //button[@data-item-id="authority"]

    • Phone number: //button[contains(@data-item-id,"phone")]

    • Open time: //div[contains(@class,"open-hours")]

  • Click "Apply" to save
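
To see how such XPaths pick out fields, the sketch below applies three of them to a hypothetical detail-page snippet using Python's standard library. The markup and business details are invented for illustration. Two assumptions to note: `xml.etree.ElementTree` only accepts relative paths, so `//h1` is written as `.//h1`, and it does not support `contains()`, so only the exact-match XPaths are shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real Google Maps page is far more complex.
DETAIL = """
<div>
  <h1>Example Insurance Agency</h1>
  <button data-item-id="address">123 Example St, Houston, TX</button>
  <button data-item-id="authority">example.com</button>
</div>
"""

# Field name -> XPath, adapted from the list above ('.' prefix added for ElementTree).
FIELD_XPATHS = {
    "Title": ".//h1",
    "Address": ".//button[@data-item-id='address']",
    "Website": ".//button[@data-item-id='authority']",
}

def extract_fields(markup):
    """Return one row of field values; missing elements yield an empty string."""
    root = ET.fromstring(markup)
    row = {}
    for name, xpath in FIELD_XPATHS.items():
        node = root.find(xpath)
        row[name] = node.text.strip() if node is not None and node.text else ""
    return row

row = extract_fields(DETAIL)
print(row)
```

A listing without a website, for example, simply produces an empty "Website" value rather than an error.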


TIP: check out more on XPath: What is XPath and how to use it in Octoparse


5. Extract from page-level data - to extract GPS coordinates (optional)

As many of you have requested, this step shows how to extract GPS coordinates from Google Maps.

The coordinates are hidden in the page URL. So first, we need to extract the page URL in the loop.

  • Click the "Add Custom Field" icon in the data preview section

  • Select Page-level data and then Page URL


Next, we need to match the coordinates in the page URL using the RegEx tool.

  • Click the More button of the Page URL data field and select "Clean data"

  • Click "+ Add Step" and then "Match with Regular Expression"


Try the RegEx tool if you don't want to write regular expressions yourself.

  • Input the following parameters and tick "Match all"

  • Check the "Results" box to see if the data is in our desired format

  • Click "Apply" to save the settings

[Image: RegEx tool parameters]
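
For readers who want to check the idea outside Octoparse, here is a minimal Python sketch of matching coordinates in a page URL. The pattern is an assumption based on the `@latitude,longitude` segment visible in the example URL; it is not necessarily the expression used in the RegEx tool settings above.

```python
import re

# Assumed pattern: '@', a signed decimal latitude, a comma, a signed decimal longitude.
COORD_PATTERN = re.compile(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")

def extract_coordinates(page_url):
    """Return (latitude, longitude) as floats, or None if the URL has no '@lat,lng' part."""
    match = COORD_PATTERN.search(page_url)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

url = ("https://www.google.com/maps/search/insurance+West+University+Place,+TX/"
       "@29.716598,-95.4987615,10z/data=!3m1!4b1")
print(extract_coordinates(url))  # (29.716598, -95.4987615)
```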

6. Click Item - to click "Back to results" and return to the results list

Normally this step is not needed, but Google Maps is a special case. It returns the task to the previous results page so that it can continue scraping and move on to the next page.

  • Click the "Arrow" icon on the web page

  • Choose the "Click" button on the Tips panel

  • Extend the AJAX timeout to 7-10s, depending on network conditions

  • Drag the action in the workflow to the right spot


7. Start extraction - to run the task and get data

  • Click "Save" to save the task

  • Click "Run" on the upper left side

  • Select "Run task on your device" to run the task on your computer

    • Local runs are normally for testing. If you want the crawler to run faster, select "Run task in the Cloud" instead (for premium users only)

Here is the sample output.

[Image: sample output]