Scrape reviews from Google Play
Google Play is a good source of reviews for mobile applications. Reviews help users choose which app to use and push developers to improve their apps.
In this tutorial, we are going to scrape the reviews of applications from Google Play.
You can also use the easy-to-use "Task Template" on the Octoparse home screen. All you need to do is type in a few parameters and the task is ready to go. For further details, please check it out here: Task Templates
To follow through, you may want to use this URL in the tutorial:
https://play.google.com/store/apps/details?id=com.target.ui&hl=en&showAllReviews=true
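If you want to try the tutorial with a different app, the URL above can be rebuilt from its three query parameters. The following is only a sketch: the helper name `reviews_url` is hypothetical, and it assumes other apps accept the same `id`, `hl`, and `showAllReviews` parameters as the example URL.

```python
from urllib.parse import urlencode

BASE = "https://play.google.com/store/apps/details"


def reviews_url(app_id: str, lang: str = "en") -> str:
    """Build a details-page URL with the same query parameters as the tutorial URL."""
    query = urlencode({"id": app_id, "hl": lang, "showAllReviews": "true"})
    return f"{BASE}?{query}"


# Rebuilds the URL used in this tutorial.
url = reviews_url("com.target.ui")
print(url)
```

Paste the printed URL into the Octoparse home screen in Step 1.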
We will scrape data such as reviewer name, post time, and review content from each app details page with Octoparse.
The website combines infinite scroll with a "Show More" button to load more reviews. After we scroll to the bottom of the page about 4 times, a "Show More" button appears. To keep loading reviews, we have to click the button and then scroll to the bottom of the page 4 more times.
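The loading pattern described above can be sketched as a simple loop. This is not how Octoparse is implemented; `FakePage` is a hypothetical stand-in that only records actions, so the sketch runs without a browser and shows the order of scrolls and clicks the workflow needs to reproduce.

```python
class FakePage:
    """Hypothetical stand-in for a browser page; it only records actions."""

    def __init__(self, clicks_available):
        # How many times a "Show More" button will still appear.
        self.clicks_available = clicks_available
        self.log = []

    def scroll_to_bottom(self):
        self.log.append("scroll")

    def click_show_more(self):
        """Click the button if it exists; return False when there is none."""
        if self.clicks_available == 0:
            return False
        self.clicks_available -= 1
        self.log.append("click")
        return True


def load_all_reviews(page, scrolls_per_round=4):
    """Scroll to the bottom several times, click "Show More", and repeat."""
    while True:
        for _ in range(scrolls_per_round):
            page.scroll_to_bottom()
        if not page.click_show_more():
            break  # no button left, so every review is loaded


page = FakePage(clicks_available=2)
load_all_reviews(page)
print(page.log)
```

In the Octoparse workflow, the inner scrolling corresponds to the scroll settings and the outer loop corresponds to the Loop Item that clicks "Show More".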
Here are the main steps in this tutorial. You can download the demo task here.
- Go To Web Page - to open the target web page
- Auto-detect the web page data - to create the workflow
- Modify the "Go to Web Page" by adding "Scroll down the page after it is loaded"
- Modify the XPath of the Loop Item - to locate the "Show More" button precisely
- Add a Branch - to scrape full reviews
- Run extraction - to run your task and get data
1. Go To Web Page - to open the target web page
- Enter the page URL on the home screen and click Start
2. Auto-detect the web page data - to create the workflow
- Choose Auto-detect the web page data
- Wait for the detection to complete
- Uncheck Add a page scroll
- Click Create workflow in the Tips window
- Check the data fields in the Data Preview section; you can delete unwanted fields or rename fields if needed
3. Modify the "Go to Web Page" by adding "Scroll down the page after it is loaded"
- Click Go to Web Page in the workflow
- Go to Options
- Tick Scroll down the page after it is loaded, then set Scroll Repeats to 4 and the wait time to 4s (the "Show More" button appears after we scroll to the bottom 4 times)
- Click Apply
4. Modify the XPath of the Loop Item - to locate the "Show More" button precisely
- Click on Loop Item
- Enter the XPath //span[text()='Show More']/..
- Click on Click on a Load More button
- Set up AJAX Load as 5s
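To see what the XPath in this step selects, here is a minimal sketch using Python's standard library. The sample markup is invented for illustration and is not the real Google Play page. Note also that `xml.etree.ElementTree` supports only a subset of XPath, so the sketch writes the text test as `[.='Show More']` instead of `[text()='Show More']`; the trailing `/..` works the same way and moves from the matched span up to its parent, which is the clickable element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure will differ.
SAMPLE = """
<div>
  <div class="review"><span>Great app</span></div>
  <div role="button"><span>Show More</span></div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Find every span whose text is exactly "Show More", then step up to its parent.
buttons = root.findall(".//span[.='Show More']/..")

for button in buttons:
    print(button.tag, button.attrib)
```

Matching on the exact text keeps the Loop Item from picking up other spans on the page.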
The workflow should now look like this:
5. Add a Branch - to scrape full reviews
You may have noticed that some reviews are not displayed completely on the web page because the text is too long. You need to click the Full Review button to display the complete review text.
Since some reviews show the complete text while others do not, we can use a Branch to handle both cases. A Branch is used when one workflow has to scrape different page formats or element formats. (If incomplete reviews are acceptable, you can jump to Step 6, Run extraction.)
- Move the cursor inside Loop Item1
- Click on the plus button
- Choose Branch Conditions
- Click on the left branch
- Choose Execute if the current loop contains specific text
- Input the text Full Review
- Click Apply to save
- Copy the Extract Data action
- Click on the left branch and paste it
Now we need to modify the XPath of the review content to make it scrape the full review.
- Click on the copied Extract Data action
- Go to Data Preview
- Click the More button of the review content field and choose Customize XPath
- Input the XPath //span[@style="display: none;"]
- Click Apply to save
Note: You only need to modify the XPath of the field in the Extract Data action in the left branch.
- Drag the other Extract Data to the right branch
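The Branch logic in this step can be sketched in Python as follows. The markup and the function name `review_text` are hypothetical; the sketch only assumes, as the XPath above suggests, that a truncated review carries a "Full Review" label and keeps its complete text in a span styled `display: none;`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure will differ.
SAMPLE = """
<div>
  <div class="review">
    <span>Short review.</span>
    <span style="display: none;"></span>
  </div>
  <div class="review">
    <span>Long review that is cut off...<button>Full Review</button></span>
    <span style="display: none;">Long review that is cut off, now complete.</span>
  </div>
</div>
"""


def review_text(item):
    """Return the review text, choosing the source the way the Branch does."""
    all_text = "".join(item.itertext())
    if "Full Review" in all_text:
        # Left branch: the item contains "Full Review", so read the hidden span.
        hidden = item.find(".//span[@style='display: none;']")
        return (hidden.text or "").strip()
    # Right branch: the visible text is already complete.
    visible = item.find("span")
    return (visible.text or "").strip()


root = ET.fromstring(SAMPLE)
texts = [review_text(item) for item in root.findall("div")]
print(texts)
```

Each review goes through exactly one branch, which is why only the copied Extract Data action in the left branch needs the customized XPath.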
The whole workflow looks like this:
6. Run extraction - to run your task and get data
- Click Save
- Click Run on the upper right side
- Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (for premium users only)
Here is the sample output.
If you still have questions, you are welcome to submit a request here. Our support team will get back to you ASAP.
Author: Kisad
Editor: Yina