Google Maps is not just a mapping website that helps you find locations; it is also a rich database from which you can gain lots of business insights. Many people scrape Google Maps data to build business directories or generate business leads.
This tutorial will guide you on how to get business information from Google Maps.
For Google Maps scraping, you can use our ready-to-use Task Template available on the home page or follow this tutorial to build the task from scratch.
With the template, you just need to enter a keyword (e.g., Accounting, NY) or a web page URL (e.g., https://www.google.com/maps/search/insurance+West+University+Place,+TX/@<latitude>,-95.4987615,10z/data=!3m1!4b1, where <latitude> stands for the latitude value) and then wait for the data to come out.
Here is the template data sample for your reference. To try out the template, you can apply for a 14-day premium trial to get started: Try Octoparse 14-day free premium trial!
If you want to learn how to set up the crawler on your own, you may continue with this tutorial.
We will scrape the following data fields: Title, Review number, Review rating, Address, Phone, Website, and Open time.
Here are the main steps in this tutorial: [Download task file here]
1. Go To Web Page - to open the targeted web page
Enter the example URL into the search bar and click "Start"
You can enter several URLs into the bar if you have many URLs to scrape.
Click "Go To Web Page" and set a longer timeout at the bottom such as "120s", then click "Apply."
2. Create a pagination loop - to scrape all the results from multiple pages
Click the next page button ">"
Click "Loop click single element" on the Tips panel
After the above actions, a "Pagination" step is created in the workflow. You can click the "Pagination" box and then "Click to Paginate" to test whether it paginates to the next page correctly.
The default XPath for the Pagination works well in most cases, but there can be a problem scraping the data on the last page. In that case, you need to revise the XPath for the "Pagination" step.
Click the "Pagination" step on the workflow
Copy and paste the revised XPath into the "Matching Xpath" text box: //button[contains(@jsaction,"pane.paginationSection.nextPage")][not(contains(@class,"button-disabled"))]
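Why this revision works: the first predicate matches the next-page button, and the second excludes it once Google marks it disabled on the last page, so the loop ends cleanly. A minimal sketch mirroring the two predicates in Python (the markup below is a simplified, hypothetical stand-in for Google Maps' real HTML):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-ins for the next-page button markup
enabled = '<button jsaction="pane.paginationSection.nextPage" class="next"/>'
disabled = '<button jsaction="pane.paginationSection.nextPage" class="next button-disabled"/>'

def next_page_clickable(markup):
    """Mirror the revised XPath's two predicates in plain Python."""
    btn = ET.fromstring(markup)
    # contains(@jsaction, "pane.paginationSection.nextPage")
    is_next = "pane.paginationSection.nextPage" in btn.get("jsaction", "")
    # not(contains(@class, "button-disabled"))
    not_disabled = "button-disabled" not in btn.get("class", "")
    return is_next and not_disabled

print(next_page_clickable(enabled))   # True  -> loop continues
print(next_page_clickable(disabled))  # False -> loop stops on the last page
```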
Additional action - AJAX setting for "Click to Paginate"
Sometimes, the web page may take longer to load. You can modify the AJAX timeout based on the network conditions.
Click "Click to Paginate" on the workflow
Adjust AJAX Timeout to "7s" or longer
3. Create a "Loop Item" - to loop click items on the list
Click the first company block on the list
Select "Click URL" on the Tips panel
Set AJAX timeout for 5-10s
Then go to the bottom of the workflow and click "Options"
Uncheck "Open in a new tab" and click "Apply"
Click the name on the first block of the list shown on the page
Click "Select all"
Click "Loop click each element"
Set the AJAX timeout to 5s-10s
4. Extract data - to select the data for extraction
Select the information you want on the web page
Select "Extract the text of the element"
Please note that Google is quite strict about data scraping, and its source code is hard to read, so we need to revise the element XPath for each data field to ensure precise scraping.
No worries! We have prepared everything you need. You can just use the element XPaths provided below.
Go to the data preview and click "More"
Click "Customize Xpath"
Replace the default XPath with the revised one.
Choose from the XPaths below based on your scraping needs; each XPath matches the corresponding element on the web page.
Number of reviews: //button[@jsaction="pane.rating.moreReviews"]
Review rating: //span[@class="section-star-display"]
Phone number: //button[contains(@data-item-id,"phone")]
Open time: //div[contains(@class,"open-hours")]
Click "Apply" to save
TIP: check out more on XPath: What is XPath and how to use it in Octoparse
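If you want to sanity-check an XPath outside Octoparse, exact-attribute matches like the two review XPaths above can be tried with Python's standard library. The HTML fragment below is a simplified, hypothetical stand-in for a Google Maps place panel; note that the contains()-based XPaths (phone, open time) would need a full XPath engine such as lxml, which the standard library's limited subset lacks:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment of a Google Maps place panel
panel = ET.fromstring("""
<div>
  <span class="section-star-display">4.6</span>
  <button jsaction="pane.rating.moreReviews">1,208 reviews</button>
</div>
""")

# Exact-attribute predicates are supported by ElementTree's XPath subset
rating = panel.find('.//span[@class="section-star-display"]').text
reviews = panel.find('.//button[@jsaction="pane.rating.moreReviews"]').text
print(rating, reviews)  # 4.6 1,208 reviews
```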
5. Extract from page-level data - to extract GPS coordinates (optional)
As many of you have requested, this step will teach you how to extract GPS coordinates data from Google Maps.
The coordinates are hidden in the page URL. So first, we need to extract the page URL in the loop.
Click the "Add Custom Field" icon in the data preview section
Select Page-level data and then Page URL
Next, we need to extract the coordinates from the page URL with the RegEx tool.
Click the More button of the Page URL data field and select "Clean data"
Click "+ Add Step" and then "Match with Regular Expression"
Try the RegEx tool if you don't want to write regular expressions yourself.
Input the following parameters and tick "Match all"
Check the "Results" box to see if the data is in our desired format
Click "Apply" to save the settings
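For reference, the same match can be sketched in Python. The URL below is a made-up example, and the pattern is one plausible way to capture the two comma-separated decimals after the "@" in a Maps URL; the RegEx tool may generate a different but equivalent expression:

```python
import re

# Hypothetical Maps URL; the coordinates follow "@" as lat,lng,zoom
url = "https://www.google.com/maps/place/Example+Cafe/@29.7174,-95.4018,15z/data=!3m1!4b1"

# Two signed decimals right after "@", separated by a comma
match = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", url)
if match:
    lat, lng = match.groups()
    print(lat, lng)  # 29.7174 -95.4018
```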
6. Click Item - Click "Back to results"
Normally, we don't need to add this step, but Google Maps is a special case. This step helps the task go back to the previous results page and continue to scrape the next page.
Click the "Arrow" icon on the web page
Choose the "Click" button on the Tips
Extend the AJAX timeout to 7s-10s according to your network conditions
Drag the action in the workflow to the right spot
7. Start extraction - to run the task and get data
Click "Save" to save the task
Click "Run" on the upper left side
Select "Run task on your device" to run the task on your computer
Local runs are normally used for testing. If you want the crawler to run at a higher speed, you can select "Run task in the Cloud" to run the task in the Cloud (premium users only)
Here is the sample output.