As the king of navigation apps, Google Maps started out just offering an easy way to get directions from one place to another but has slowly evolved into an interactive global database overflowing with some of the most valuable business information available on the internet.
However, if you are a business owner wanting to extract reviews for businesses or places from Google Maps, you'll soon find out that the official way of getting reviews through Google Places API is limited to 5 reviews, which is barely enough even for the simplest task. But with Octoparse 8.4, you can build your own crawler and scrape an unlimited number of reviews for businesses or places directly from Google Maps within minutes.
In this tutorial, we will guide you through the steps to design your own task workflow for Google Maps reviews.
For demonstration purposes, we will scrape Google Maps reviews for Tesla's Gigafactory 1. See the sample URL below:
Here are the major steps covered in this tutorial:
- Create a Go to Web Page - to open the target web page
- Create a Click Item - to go to the “All reviews” page
- Create a Loop item with Partial Scroll - to scroll down the review column
- Extract Data in the Loop - to select the data for extraction
- Clean the data fields - to refine data
- Run the task - to get your desired data
1. Create a Go to Web Page - to open the target web page
Every workflow in Octoparse starts by telling Octoparse a web page to start from.
- Enter the sample URL into the search bar at the top of the home screen and click Start.
You can also enter the URL by creating the task in advanced mode.
- Find the + New button on the sidebar, click it and then select Advanced Mode.
- Manually enter the sample URL into the website box and click Save to start.
2. Create a Click Item - to go to the "All reviews" page
- Click "600 reviews" to open the "All reviews" page, then select Click button to generate a Click Item action in your workflow
- Set AJAX timeout to 15s or longer
Now we have reached the page that hosts reviews.
3. Create a Loop item with Partial Scroll - to scroll down the review column
The new page has multiple scroll bars, and the reviews you want sit inside a scrollable column on the left. The page won't load more reviews unless you scroll inside that left column, so we need to set up a Loop Item with a partial scroll that lets the workflow scroll and extract at the same time.
- Add a Loop Item step to your workflow
- Click on Loop Item, set loop mode to Scroll Page and change the scroll area from Default to Partial
- Enter the scroll area XPath to tell Octoparse where to scroll
Input the XPath directly if you know how to write an XPath. Check out this article to embark on your journey to become an XPath master.
Don't know how to write an XPath yourself? Don't worry, you're not alone. Thanks to the latest update, you can now simply click the icon and select the entire scroll area manually from the web page. Octoparse will automatically generate an XPath for you.
- Choose between scrolling "to the bottom of the page" or "for one screen"
- Set scroll repeats (how many times you want to scroll)
- Set a wait time (interval time between each scroll)
- Click "Apply" to save your settings
Now we have successfully set up a partial scroll loop.
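If you are new to XPath, the scroll-area XPath used above is simply an expression that points at one element on the page, here the scrollable column. The sketch below illustrates the idea with Python's standard library. The markup and the class names (`review-column`, `review`) are hypothetical and heavily simplified; they are not Google Maps' real page structure, so treat the XPath as an illustration rather than one to paste into Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (NOT the real Google Maps DOM).
PAGE = """
<body>
  <div class="map-canvas">map tiles</div>
  <div class="side-panel">
    <div class="review-column">
      <div class="review">First review</div>
      <div class="review">Second review</div>
    </div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# XPath for the scroll area: any <div> whose class is "review-column".
# ElementTree needs the leading "." ; in a browser-based tool the same
# idea is usually written as //div[@class='review-column'].
SCROLL_AREA_XPATH = ".//div[@class='review-column']"

scroll_area = root.find(SCROLL_AREA_XPATH)

# The reviews live inside the scroll area, not in the map canvas.
reviews = [div.text for div in scroll_area.findall("div[@class='review']")]
print(scroll_area.get("class"), reviews)
```

The point of the example is that the XPath selects the column itself, not the individual reviews; the scrolling is then applied to that one element.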
4. Extract Data in the Loop - to select the data for extraction
This step is quick and easy with Octoparse's innovative auto-detect function.
- Click Auto-detect web page data in the Tips box and wait for it to complete
- Rename the data fields you want and remove the ones you don't
In this case, we want to extract the reviewer name, review date, review count, review content and the number of likes each review gets.
- Confirm settings inside the Tips box and click Create workflow
- Make sure the newly created loop item (named Loop Item 1 by default) sits inside the scroll loop item created in step 3
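To see why the extraction loop has to sit inside the scroll loop, it can help to picture the workflow as two nested loops. The following is a minimal simulation of that idea in Python, not a description of how Octoparse works internally. `FakeReviewColumn`, `run_workflow` and the batch size of 3 are hypothetical names and values invented for this sketch, and the review records are placeholders.

```python
import time
from dataclasses import dataclass


@dataclass
class Review:
    # Field names mirror the data fields kept in this tutorial.
    reviewer_name: str
    review_date: str
    review_count: str
    review_content: str
    likes: int


class FakeReviewColumn:
    """Hypothetical stand-in for the left column, which loads reviews lazily."""

    def __init__(self, all_reviews, batch_size=3):
        self._all = list(all_reviews)
        self._batch = batch_size
        self._loaded = min(batch_size, len(self._all))

    def scroll(self):
        # One scroll inside the column loads one more batch.
        self._loaded = min(self._loaded + self._batch, len(self._all))

    def visible_reviews(self):
        return self._all[: self._loaded]


def run_workflow(column, scroll_repeats, wait_seconds):
    collected = []
    # Outer loop: scroll inside the column a fixed number of times.
    for _ in range(scroll_repeats):
        column.scroll()
        time.sleep(wait_seconds)  # give newly requested reviews time to load
        # Inner loop: extract every review that has not been collected yet.
        for review in column.visible_reviews()[len(collected):]:
            collected.append(review)
    return collected


# Placeholder records, not real reviews.
SAMPLE = [
    Review(f"Reviewer {i}", "placeholder date", "placeholder count", f"Text {i}", 0)
    for i in range(1, 9)
]

few = run_workflow(FakeReviewColumn(SAMPLE), scroll_repeats=1, wait_seconds=0)
enough = run_workflow(FakeReviewColumn(SAMPLE), scroll_repeats=2, wait_seconds=0)
print(len(few), len(enough))  # prints: 6 8
```

With a single scroll the simulated run stops at 6 of the 8 placeholder reviews, while two scrolls reach all of them. The same trade-off applies to the scroll repeats setting: too few repeats end the run before every review has loaded. The wait is set to 0 here only so the example finishes instantly.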
5. Clean the data fields - to refine data
You may notice that some values in the review count column have a stray dot in front of them. Use Clean data to remove these dots.
- Click on the three dots for more options for data fields
- Click on Clean data
- Click + Add Step and select the Replace option
- Enter a dot in the Replace bar and leave the “With” bar blank, so that the dot is replaced with nothing
- Click Evaluate to check that the result is what you expect
- Click Confirm to apply the change
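For readers who prefer to see the cleaning rule spelled out, the Replace step above amounts to a plain string replacement. Here is a rough Python equivalent. The function name `replace_with_blank`, the sample value and the choice of the middle dot character are assumptions made for illustration (the dot in your data may be a different character), and the final whitespace trim is an extra convenience added in this sketch.

```python
def replace_with_blank(value: str, marker: str = "\u00b7") -> str:
    """Replace every occurrence of `marker` with nothing, then trim spaces.

    "\u00b7" is the middle dot; pass marker="." if your data uses a period.
    """
    return value.replace(marker, "").strip()


# Hypothetical raw value with a stray leading dot.
raw = "\u00b7 12 reviews"
print(replace_with_blank(raw))
```

Values without the marker pass through unchanged, so the rule is safe to apply to the whole column.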
6. Run the task - to get your desired data
- Click Save on the upper right to save your task
- Click Run and wait for a Run Task window to pop up
- Select Run on your device to run the task on your local device
- Wait for the task to complete
Here is the sample output from a local run.
If you have further issues with the task or have a suggestion that would make this a better resource for you, we’d love to hear about it. Submit a request here.