Google Play is a rich source of reviews for mobile applications. These reviews help users choose which app to use and push developers to improve their apps.
In this tutorial, we are going to scrape the reviews of applications from Google Play.
You can also use our easy-to-use "Task Template" on the Octoparse home screen. Just enter a few parameters and the task is ready to go. For further details, please check it out here: Task Templates
To follow along, you may want to use this URL in the tutorial:
We will scrape data such as username, review time, and review content from each app details page with Octoparse.
The website combines infinite scroll with a "Show More" button to load more reviews. After we scroll to the bottom of the page about 4 times, a "Show More" button appears. To keep loading reviews, we have to click the button and then scroll to the bottom of the page about 4 times again.
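To make this loading pattern concrete, here is a minimal Python sketch that simulates it. `FakeReviewPage`, `load_reviews`, and the batch size of 40 reviews per scroll are hypothetical names and numbers chosen for illustration. They are not part of Octoparse or Google Play, and the real page may load a different number of reviews per scroll.

```python
from dataclasses import dataclass


@dataclass
class FakeReviewPage:
    """Toy stand-in for the review page (assumed behavior, not the real site)."""
    total_reviews: int            # reviews available on the server (assumed)
    batch_size: int = 40          # reviews loaded per scroll (assumed)
    scrolls_per_button: int = 4   # scrolls before "Show More" appears
    loaded: int = 0
    scrolls_since_click: int = 0

    @property
    def show_more_visible(self) -> bool:
        # The button shows after enough scrolls, if more reviews remain.
        return (self.scrolls_since_click >= self.scrolls_per_button
                and self.loaded < self.total_reviews)

    def scroll_to_bottom(self) -> None:
        # Scrolling loads nothing while the button is waiting to be clicked.
        if self.show_more_visible:
            return
        self.loaded = min(self.total_reviews, self.loaded + self.batch_size)
        self.scrolls_since_click += 1

    def click_show_more(self) -> None:
        # Clicking re-enables infinite scroll for another round.
        if self.show_more_visible:
            self.scrolls_since_click = 0


def load_reviews(page: FakeReviewPage, max_clicks: int) -> int:
    """Scroll until the button appears, click it, and repeat up to max_clicks times."""
    clicks = 0
    while page.loaded < page.total_reviews:
        for _ in range(page.scrolls_per_button):
            page.scroll_to_bottom()
        if not page.show_more_visible or clicks >= max_clicks:
            break
        page.click_show_more()
        clicks += 1
    return page.loaded


if __name__ == "__main__":
    print(load_reviews(FakeReviewPage(total_reviews=1000), max_clicks=2))
```

With these assumed numbers, allowing 2 clicks gives three rounds of four scrolls, so the simulation stops at 480 loaded reviews.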
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - open the target web page
- Auto-detect the web page data - create the workflow
- Loop click "Show More" button - load more reviews
- Modify the XPath of the Loop Item1 - locate the "Show More" button precisely
- Run extraction - run your task and get data
1. Go To Web Page - open the target web page
- Enter the page URL on the home screen and click Start
2. Auto-detect the web page data - create the workflow
- Choose "Auto-detect the web page data"
- Wait for the detection to complete
- Check the data fields in the Data Preview; delete unwanted fields or rename fields if needed
- Click "Create workflow" on the Tips panel
- Modify the "Go to Web Page1" by adding "Scroll down the page after it is loaded"
3. Loop click "Show More" button - load more reviews
- Choose "Click on a 'Load More' button" on the Tips panel
- Select the "SHOW MORE" button on the web page
We need to select the larger block that contains the "Show More" button first. If we select the button directly, Octoparse may click it instead of selecting it.
Then we click the arrow to the right of the last DIV tag and choose the DIV on the pop-up.
- Set the number of clicks according to how many reviews you need
- Click "Confirm"
- Open the settings of "Click on a Load More button"
- Set AJAX Load to 4s
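The number of clicks can be estimated with a little arithmetic. The sketch below assumes that each scroll to the bottom loads a fixed number of reviews and that four scrolls happen before each click. `clicks_needed` and `reviews_per_scroll` are hypothetical names, and the real batch size should be measured on the page before relying on the result.

```python
import math


def clicks_needed(target_reviews: int,
                  reviews_per_scroll: int,
                  scrolls_per_round: int = 4) -> int:
    """Estimate how many "Show More" clicks are needed to reach target_reviews.

    Assumes the first round of scrolls needs no click and every later
    round needs exactly one (an assumption, not measured behavior).
    """
    if reviews_per_scroll <= 0 or scrolls_per_round <= 0:
        raise ValueError("batch sizes must be positive")
    if target_reviews <= 0:
        return 0
    per_round = reviews_per_scroll * scrolls_per_round
    rounds = math.ceil(target_reviews / per_round)
    return max(0, rounds - 1)


if __name__ == "__main__":
    # 40 reviews per scroll is an assumed figure for illustration only.
    print(clicks_needed(target_reviews=500, reviews_per_scroll=40))
```

Rounding up errs on the side of loading slightly more reviews than requested, which is usually the safer choice when setting the click count.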
4. Modify the XPath of the Loop Item1 - locate the "Show More" button precisely
- Open the settings of "Loop Item1"
- Enter the XPath //span[text()='Show More']/..
- The final workflow should look like this:
If you want to learn more about XPath, please check the following tutorial:
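The XPath from step 4 can also be tried outside Octoparse. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and has no `text()` function, so the expression is adapted to `.//span[.='Show More']/..`. The `SAMPLE` markup and the `find_show_more_parents` helper are made up for illustration and do not reflect Google Play's real page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far more complex.
SAMPLE = """
<div id="reviews">
  <div class="review"><span>Great app</span></div>
  <div id="more" role="button"><span>Show More</span></div>
</div>
"""


def find_show_more_parents(markup: str) -> list:
    """Return the parent elements of every <span> whose text is exactly 'Show More'."""
    root = ET.fromstring(markup)
    # [.='Show More'] matches on the element's full text content;
    # the trailing /.. steps up to the parent, as in the Octoparse XPath.
    return root.findall(".//span[.='Show More']/..")


if __name__ == "__main__":
    matches = find_show_more_parents(SAMPLE)
    print([el.get("id") for el in matches])  # prints ['more']
```

The comparison is exact and case-sensitive, so a span containing "Show more" or "SHOW MORE" would not match this expression.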
5. Run extraction - run your task and get data
- Click "Run" on the upper left side
- Select "Run on your device" to run the task on your computer, or select "Run in the Cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.