Google Play is a good source of mobile app reviews. Reviews help users decide which app to use and push developers to improve their apps.

In this tutorial, we will scrape app reviews from Google Play.

You can also use our easy-to-use Task Template on the Octoparse home screen. Just enter a few parameters and the task is ready to go. For further details, see: Task Templates

mceclip0.png

To follow along, you can use this URL:

https://play.google.com/store/apps/details?id=com.target.ui&hl=en&showAllReviews=true

We will use Octoparse to scrape data such as reviewer name, post time, and review content from the app's details page.

The website combines infinite scrolling with a "Show More" button to load more reviews. After you scroll to the bottom of the page about 4 times, a "Show More" button appears. To keep loading reviews, you have to click the button and then scroll to the bottom 4 more times.

Here are the main steps in this tutorial: [Download task file here]

  1. Go to Web Page - to open the target web page

  2. Auto-detect the web page data - to create the workflow

  3. Modify the "Go to Web Page" action by adding "Scroll down the page after it is loaded"

  4. Modify the XPath of the Loop Item - to locate the "Show More" button precisely

  5. Add a Branch - to scrape full reviews

  6. Run extraction - to run your task and get data

1. Go to Web Page - to open the target web page

  • Enter the page URL on the home screen and click Start

gp1.png

2. Auto-detect the web page data - to create the workflow

  • Choose Auto-detect the web page data

  • Wait for the detection to complete

mceclip3.gif
  • Uncheck Add a page scroll

  • Click Create workflow in the Tips window

mceclip16.png
  • Check the data fields in the Data Preview section. You can also delete unwanted fields or rename fields if needed

mceclip0.gif

3. Modify the "Go to Web Page" action by adding "Scroll down the page after it is loaded"

  • Click Go to Web Page in the workflow

  • Go to Options

  • Tick Scroll down the page after it is loaded, then set Scroll Repeats to 4 and the wait time to 4s (the "Show More" button appears only after the page has been scrolled to the bottom 4 times)

  • Click Apply

scroll_down.jpg
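Why Scroll Repeats must be at least 4 can be illustrated with a small Python model of the loading behavior described at the start of this tutorial. This is a hypothetical simulation, not Octoparse's implementation or Google Play's code: the `FakeReviewPage` class, the batch size of 10, and the `load_reviews` helper are invented for illustration, the wait time is ignored, and the model assumes each scroll to the bottom loads one batch until the button appears after the 4th scroll.

```python
class FakeReviewPage:
    """Hypothetical model of the page: scrolling loads batches until a
    "Show More" button appears; clicking it re-enables scroll loading."""

    def __init__(self, scrolls_before_button=4, reviews_per_batch=10):
        self.scrolls_before_button = scrolls_before_button
        self.reviews_per_batch = reviews_per_batch
        self.reviews_loaded = reviews_per_batch  # assumed first batch on load
        self.scrolls_since_click = 0
        self.show_more_visible = False

    def scroll_to_bottom(self):
        # Assumption: while the button is visible, scrolling loads nothing.
        if self.show_more_visible:
            return
        self.reviews_loaded += self.reviews_per_batch
        self.scrolls_since_click += 1
        if self.scrolls_since_click >= self.scrolls_before_button:
            self.show_more_visible = True

    def click_show_more(self):
        # Returns False when there is no button to click.
        if not self.show_more_visible:
            return False
        self.show_more_visible = False
        self.scrolls_since_click = 0
        return True


def load_reviews(page, scroll_repeats, max_clicks):
    """Scroll `scroll_repeats` times, click "Show More", and repeat.
    Returns the number of successful clicks."""
    clicks = 0
    while clicks < max_clicks:
        for _ in range(scroll_repeats):
            page.scroll_to_bottom()
        if not page.click_show_more():
            break  # the button never appeared, so loading stalls
        clicks += 1
    return clicks


enough = FakeReviewPage()
clicks_ok = load_reviews(enough, scroll_repeats=4, max_clicks=3)
too_few = FakeReviewPage()
clicks_stuck = load_reviews(too_few, scroll_repeats=3, max_clicks=3)
print(clicks_ok, enough.reviews_loaded)      # 3 130
print(clicks_stuck, too_few.reviews_loaded)  # 0 40
```

With 4 scrolls per cycle the model reaches the button every time; with 3 the button never appears, no click happens, and loading stops early.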

4. Modify the XPath of the Loop Item - to locate the "Show More" button precisely

  • Click on Loop Item

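  • Enter the XPath //span[text()='Show More']/.. (it finds the span whose text is Show More, then selects its parent element, which wraps the text and acts as the button)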

modify_Xpath.jpg
  • Click the "Click on a Load More button" action

  • Set AJAX Load to 5s

AJAX.jpg
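To see what //span[text()='Show More']/.. selects, you can try an equivalent query with Python's built-in `xml.etree.ElementTree` on a small snippet. The markup below is a hypothetical simplification, not Google Play's actual HTML. ElementTree supports only a subset of XPath, so the path must start with `.`, and `[.='Show More']` stands in for `[text()='Show More']` (equivalent here because the span contains only text). The trailing `/..` still steps from the matching span up to its parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure differs.
SNIPPET = """
<div id="reviews">
  <div class="review"><span>Review text</span></div>
  <div role="button" class="load-more">
    <span>Show More</span>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)

# Find the <span> whose text is exactly "Show More", then step up to its
# parent with "..". ElementTree needs a relative path (leading ".") and
# uses [.='...'] instead of [text()='...'].
buttons = root.findall(".//span[.='Show More']/..")

for button in buttons:
    print(button.tag, button.get("role"), button.get("class"))  # div button load-more
```

In Octoparse itself you enter the original XPath expression; the ElementTree form is only a convenient way to experiment.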

The workflow should now look like this:

mceclip10.png

Tip: If you want to learn more about XPath, please check the following tutorial:

What is XPath and how to use it in Octoparse

5. Add a Branch - to scrape full reviews

You may have noticed that long reviews are truncated on the web page. You have to click the Full Review button to display the complete review text.

Full_reviews.jpg

Because some reviews show their complete text and others do not, we can handle both cases with a Branch. A branch is used when one workflow has to scrape pages or elements in different formats. (If incomplete reviews are acceptable, you can skip to Step 6, Run extraction.)

  • Hover the cursor inside Loop Item1

  • Click on the plus button

  • Choose Branch Conditions

Add.jpg
  • Click on the left branch

  • Choose Execute if the current loop contains specific text

  • Input the text Full Review

  • Click Apply to save

branch_condition.jpg
  • Copy the Extract Data action

copy_action.jpg
  • Click on the left branch and paste it

paste.jpg

Now we need to modify the XPath of the review content field so that it scrapes the full review.

  • Click the copied Extract Data action

  • Go to Data Preview

  • Click the More button of the review content field and choose Customize XPath

Customize_XPath.jpg
  • Enter the XPath //span[@style="display: none;"] (the complete review text is kept in a hidden span on the page)

input_Xpath.jpg
  • Click Apply to save

Note: You only need to modify the XPath of the field in the Extract Data action in the left branch.

  • Drag the original Extract Data action into the right branch

The whole workflow looks like this:

workflow.jpg
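The logic of the two branches can be sketched in Python with `xml.etree.ElementTree`. The markup is hypothetical and simplified, not Google Play's real HTML, and the class names `name` and `content` are made up for illustration. For each review, the sketch checks whether the item's text contains "Full Review", like the left branch's condition, and if so reads the hidden span selected by //span[@style="display: none;"] (written with a leading `.` for ElementTree); otherwise it keeps the visible text, like the right branch. This only loosely mirrors the workflow; Octoparse evaluates the condition and XPaths itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two reviews: the second one is
# truncated and keeps its complete text in a hidden span.
PAGE = """
<div id="reviews">
  <div class="review">
    <span class="name">Reviewer A</span>
    <span class="content">A short review that fits on the page.</span>
  </div>
  <div class="review">
    <span class="name">Reviewer B</span>
    <span class="content">Long review that is cut off...</span>
    <span style="display: none;">Long review that is cut off on the page but complete here.</span>
    <button>Full Review</button>
  </div>
</div>
"""

HIDDEN_FULL_TEXT = ".//span[@style='display: none;']"
VISIBLE_TEXT = ".//span[@class='content']"


def extract_review(item):
    """Apply the branch condition to one loop item and extract its fields."""
    item_text = "".join(item.itertext())
    if "Full Review" in item_text:
        # Left branch: read the complete text from the hidden span.
        content = item.findtext(HIDDEN_FULL_TEXT)
    else:
        # Right branch: the visible text is already complete.
        content = item.findtext(VISIBLE_TEXT)
    return {"name": item.findtext(".//span[@class='name']"), "content": content}


root = ET.fromstring(PAGE)
rows = [extract_review(item) for item in root.findall(".//div[@class='review']")]
for row in rows:
    print(row)
```

Reviewer A has no Full Review button, so it goes through the right branch; Reviewer B goes through the left branch and gets the untruncated text.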

6. Run extraction - to run your task and get data

  • Click Save

  • Click Run in the upper-right corner

  • Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (for premium users only)

mceclip11.png

Here is the sample output.

mceclip12.png