You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

As the king of navigation apps, Google Maps started out just offering an easy way to get directions from one place to another but has slowly evolved into an interactive global database overflowing with some of the most valuable business information available on the internet.

However, if you are a business owner wanting to extract reviews for businesses or places from Google Maps, you'll soon find that the official route, the Google Places API, returns at most 5 reviews per place, which is barely enough even for the simplest task. With Octoparse 8.4, you can build your own crawler and scrape an unlimited number of reviews for businesses or places directly from Google Maps within minutes.

In this tutorial, we will guide you through the steps to design your own task workflow for Google Maps reviews.

NOTE: This tutorial is for version 8.4 only, as the task requires the newly-added Partial Scroll feature. If you are on an older version of Octoparse, we strongly recommend that you update to the latest version to enjoy this powerful new feature as well as a newly designed task edit interface.

For demonstration purposes, we will scrape Google Maps reviews for Tesla's Gigafactory 1. See the sample URL below:

https://www.google.com/maps/place/Tesla+Gigafactory/@39.5375591,-119.4412284,17z/data=!3m1!4b1!4m5!3m4!1s0x80991fc240ba30b9:0x7e66b0fa4fe55cd8!8m2!3d39.537555!4d-119.4390397?hl=en

Here are the major steps that will be mentioned in this tutorial: [Download task file here]

  1. Create a Go to Web Page - to open the target web page

  2. Create a Click Item - to go to the “All reviews” page

  3. Create a Loop item with Partial Scroll - to scroll down the review column

  4. Extract Data in the Loop - to select the data for extraction

  5. Clean the data fields - to refine the data

  6. Run the task - to get your target data

TIP: You can download the demo task file at the bottom of the article. Import it into Octoparse and compare it with your own to see if you have done anything wrong.


1. Create a Go to Web Page - to open the target web page

Every workflow in Octoparse starts by telling Octoparse which web page to open first.

  • Enter the sample URL into the search bar at the top of the home screen and click "Start".

You can also enter the URL by creating the task in advanced mode.

  • Find the + New button on the sidebar, click it and then select Advanced Mode.

  • Manually input the sample URL into the website box and click Save to start

Either way, check if a Go to Web Page action has been generated in your workflow. If you have more than one URL, check this article to see how Octoparse handles a list of URLs.


2. Create a Click Item - to go to the "All reviews" page

  • Click on "600 reviews", which leads to the "All reviews" page, and select Click button to generate a Click Item action in your workflow

  • Set AJAX timeout to 15s or longer

Now we have reached the page that hosts reviews.
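To see why a generous AJAX timeout matters, here is a minimal Python sketch of the general idea behind a timeout: keep checking whether the content has arrived and give up once the limit is reached. This is not how Octoparse is implemented internally. The `wait_for` helper and the simulated `page_ready` check are hypothetical, and the durations are scaled down from the 15 seconds used in this step so that the example finishes quickly.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_s: float, poll_s: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout_s` seconds pass."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)  # pause briefly before checking again
    return condition()  # one last check at the deadline


# Simulated page: the reviews "arrive" 0.3 seconds after the click.
start = time.monotonic()
page_ready = lambda: time.monotonic() - start >= 0.3

loaded = wait_for(page_ready, timeout_s=1.0)        # limit is long enough
gave_up = not wait_for(lambda: False, timeout_s=0.2)  # content never arrives
print(loaded, gave_up)
```

In the first call the content arrives before the limit, so the wait succeeds. In the second it never arrives, so the wait gives up. A timeout that is too short for a slow page behaves like the second case, which is why 15s or longer is suggested here.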


3. Create a Loop item with Partial Scroll - to scroll down the review column

You will find that the new page has multiple scroll bars and that the reviews you want are inside a scrollable column on the left. The page won't load more reviews unless you scroll inside that left column. Therefore, we need to set up a Loop Item with a partial scroll so that the workflow scrolls and extracts at the same time.

  • Add a Loop Item step to your workflow

  • Click on Loop Item, set loop mode to Scroll Page and change the scroll area from Default to Partial

  • Enter scroll area XPath to tell Octoparse where to scroll

Input the XPath directly if you know how to write an XPath. Check out this article to embark on your journey to become an XPath master.

Don't know how to write an XPath yourself? Don't worry, you're not alone. Thanks to the latest update, you can now simply click the selection icon next to the XPath box and select the entire scroll area manually from the web page. Octoparse will automatically generate an XPath for you.

TIP: Adjust the selected area carefully to make sure you have selected the entire scrollable area (including the scroll bar). Neither the auto-generated XPath nor one you write yourself is guaranteed to work every time, so expect some trial and error.
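If you are new to XPath, the following Python sketch shows what an XPath for a scroll area does: it points at one container element among several on the page. The markup and the class names (`map-canvas`, `side-panel`, `review-scroll-area`) are invented for illustration and do not match Google Maps' real page structure, so treat the XPath as a pattern rather than something to paste into Octoparse.

```python
import xml.etree.ElementTree as ET

# Toy markup standing in for a page with a map and a scrollable review column.
# All class names are hypothetical.
PAGE = """
<html>
  <body>
    <div class="map-canvas">map tiles</div>
    <div class="side-panel">
      <div class="review-scroll-area" role="region">
        <div class="review">Great tour</div>
        <div class="review">Huge building</div>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Any <div> at any depth whose class attribute equals 'review-scroll-area'.
# In a browser-style XPath this is usually written //div[@class='review-scroll-area'];
# ElementTree needs the leading dot.
scroll_area_xpath = ".//div[@class='review-scroll-area']"
scroll_area = root.find(scroll_area_xpath)

# The reviews are the direct children of the scroll area.
reviews = scroll_area.findall("./div[@class='review']")
print(scroll_area.get("role"), len(reviews))
```

The point to take away is that the XPath should match the container that actually scrolls (here the inner `div`), not the individual reviews and not the whole side panel.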

  • Choose between scrolling "to the bottom of the page" or "for one screen"

  • Set scroll repeats (how many times you want to scroll)

  • Set a wait time (interval time between each scroll)

  • Click "Apply" to save your settings

Now we have successfully set up a partial scroll loop.
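To get a feel for how scroll repeats and wait time interact, here is a small Python simulation. It is only a sketch under stated assumptions: `FakeReviewColumn` is a hypothetical stand-in for the review column, and the batch size of 10 reviews per scroll is made up, since the real number loaded per scroll may differ.

```python
import time
from typing import List


class FakeReviewColumn:
    """Hypothetical stand-in for a lazily loading review column."""

    def __init__(self, total: int, batch: int) -> None:
        self.total = total
        self.batch = batch
        self.loaded = min(batch, total)  # the first batch is visible on page load

    def scroll(self) -> None:
        # Each scroll inside the column loads one more batch, up to the total.
        self.loaded = min(self.loaded + self.batch, self.total)

    def visible_reviews(self) -> List[str]:
        return [f"review {i + 1}" for i in range(self.loaded)]


def scroll_and_collect(column: FakeReviewColumn, repeats: int, wait_s: float) -> List[str]:
    """Scroll `repeats` times, pausing `wait_s` seconds after each scroll."""
    for _ in range(repeats):
        column.scroll()
        time.sleep(wait_s)  # give the (simulated) column time to load
    return column.visible_reviews()


column = FakeReviewColumn(total=600, batch=10)
reviews = scroll_and_collect(column, repeats=5, wait_s=0.01)
print(len(reviews))  # 60: the initial 10 plus 5 scrolls of 10 each
```

Under these assumptions, 5 repeats surface only 60 of the 600 reviews. If you want every review, set the scroll repeats high enough to reach the end of the list, and keep the wait time long enough for each batch to load.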


4. Extract Data in the Loop - to select the data for extraction

This step is quick and easy with Octoparse's innovative auto-detect function.

  • Click Auto-detect web page data in the Tips box and wait for it to complete

Note: If Auto-detect fails to find the review list, you can instead select several similar elements on the web page yourself to show Octoparse the pattern to follow. Check out this article to see how to set up a list extraction manually.

  • Rename the data fields you want and remove the ones you don't

In this case, we want to extract the reviewer name, review date, review count, review content and the number of likes each review gets.

  • Confirm settings inside the Tips box and click Create workflow

  • Make sure the newly created loop item (named Loop Item 1 by default) is placed inside the partial scroll loop item created in step 3.
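Conceptually, the inner loop visits each review element and pulls one value per data field. The Python sketch below mimics that on toy markup. The element structure, class names and field names are all hypothetical and are not what Auto-detect generates for Google Maps.

```python
import xml.etree.ElementTree as ET

# Toy markup for two reviews inside the scroll area (hypothetical structure).
SNIPPET = """
<div class="review-scroll-area">
  <div class="review">
    <span class="name">Alex</span>
    <span class="date">2 months ago</span>
    <span class="count">23 reviews</span>
    <span class="text">Impressive facility.</span>
    <span class="likes">4</span>
  </div>
  <div class="review">
    <span class="name">Sam</span>
    <span class="date">a year ago</span>
    <span class="count">5 reviews</span>
    <span class="text">Huge building.</span>
    <span class="likes">1</span>
  </div>
</div>
"""

# One relative XPath per data field, evaluated against each review element.
FIELDS = {
    "reviewer_name": "./span[@class='name']",
    "review_date": "./span[@class='date']",
    "review_count": "./span[@class='count']",
    "review_content": "./span[@class='text']",
    "likes": "./span[@class='likes']",
}

root = ET.fromstring(SNIPPET)

rows = []
for review in root.findall("./div[@class='review']"):  # the inner loop
    row = {name: (review.findtext(path) or "").strip() for name, path in FIELDS.items()}
    rows.append(row)

for row in rows:
    print(row)
```

Each pass through the loop produces one row of output, which is why the extraction loop has to sit inside the scroll loop: newly loaded reviews only become rows if they are visited after each scroll.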


5. Clean the data fields - to refine the data

You may notice that some values in the review count column have a stray dot in front of them. Use Clean data to remove these dots.

  • Click on the three dots for more options for data fields

  • Click on Clean data

  • Click + Add Step and select the Replace option

  • Enter a dot in the Replace bar and leave the “With” bar blank, so that each dot is replaced with nothing

  • Click Evaluate to see if we have got the desired result

  • Click Confirm to apply the change
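The Replace step above is equivalent to a plain find-and-replace on each value. A minimal Python sketch of the same operation follows. It assumes the stray character is a middle dot (`·`); if your data shows a different character, such as a regular period, pass that one instead. The `remove_dots` helper is hypothetical and only illustrates the idea.

```python
def remove_dots(value: str, dot: str = "·") -> str:
    """Replace every occurrence of `dot` with nothing, then trim spaces."""
    return value.replace(dot, "").strip()


samples = ["· 23 reviews", "5 reviews"]
cleaned = [remove_dots(s) for s in samples]
print(cleaned)  # ['23 reviews', '5 reviews']
```

Values without a dot pass through unchanged, so the step is safe to apply to the whole column.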


6. Run the task - to get your target data

  • Click Save on the upper right to save your task

  • Click Run and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is the sample output from a local run.

17.png
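If you export the results to CSV, a quick sanity check can confirm that the run captured what you expected. The sketch below reads CSV text with Python's standard `csv` module. The column names and sample rows are hypothetical (yours depend on how you renamed the fields in step 4), and the data is inlined as a string so that the example is self-contained.

```python
import csv
import io

# Hypothetical export content; in practice you would open your exported file.
EXPORT = """reviewer_name,review_date,review_count,review_content,likes
Alex,2 months ago,23 reviews,Impressive facility.,4
Sam,a year ago,5 reviews,Huge building.,
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))

# Treat an empty likes cell as zero likes.
total_likes = sum(int(row["likes"] or 0) for row in rows)
missing_content = [row["reviewer_name"] for row in rows if not row["review_content"]]

print(len(rows), total_likes, missing_content)
```

Comparing the row count with the number of reviews shown on the page is a simple way to tell whether the scroll repeats setting was high enough.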