Google Play is a rich source of reviews for mobile applications. These reviews help users choose which app to use and push developers to improve their apps.
In this tutorial, we are going to scrape the reviews of applications from Google Play.
You can also use our easy-to-use "Task Template" on the Octoparse home screen. Just enter a few parameters and the task is ready to go. For further details, please check it out here: Task Templates
To follow along, you may want to use this URL in the tutorial:
We will scrape data such as username, review time, and review content from each app details page with Octoparse.
The website combines infinite scroll with a "Show More" button to load more reviews. After we scroll to the bottom of the page about 4 times, a "Show More" button appears. To keep loading reviews, we have to click the button and then scroll to the bottom of the page another 4 times.
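The loading pattern described above can be sketched as a simple loop. This is only an illustration of the logic, not how Octoparse works internally: `FakeReviewPage`, its methods, and the batch size of 40 reviews per scroll are all hypothetical stand-ins for the real page.

```python
from dataclasses import dataclass


@dataclass
class FakeReviewPage:
    """Hypothetical stand-in for the reviews page (not a real browser or Octoparse API)."""
    total_reviews: int
    batch_size: int = 40          # assumed number of reviews loaded per scroll
    scrolls_per_button: int = 4   # "Show More" appears after about 4 scrolls
    loaded: int = 0
    scrolls_since_click: int = 0

    def show_more_visible(self) -> bool:
        # The button shows up after enough scrolls, if reviews remain.
        return (self.scrolls_since_click >= self.scrolls_per_button
                and self.loaded < self.total_reviews)

    def scroll_to_bottom(self) -> None:
        # Scrolling stops loading new reviews once the button is due.
        if self.scrolls_since_click < self.scrolls_per_button:
            self.loaded = min(self.total_reviews, self.loaded + self.batch_size)
            self.scrolls_since_click += 1

    def click_show_more(self) -> None:
        # Clicking re-enables scrolling for another round.
        self.scrolls_since_click = 0


def load_reviews(page: FakeReviewPage, max_clicks: int) -> int:
    """Scroll 4 times, click "Show More", and repeat up to max_clicks times."""
    clicks = 0
    while True:
        for _ in range(page.scrolls_per_button):
            page.scroll_to_bottom()
        if clicks >= max_clicks or not page.show_more_visible():
            break
        page.click_show_more()
        clicks += 1
    return clicks


page = FakeReviewPage(total_reviews=1000)
clicks_used = load_reviews(page, max_clicks=2)
print(clicks_used, page.loaded)
```

Under these assumed numbers, two clicks give three rounds of four scrolls. The workflow built in the steps below follows the same scroll-then-click cycle.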
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - open the target web page
- Auto-detect the web page data - create the workflow
- Loop click "Show More" button - load more reviews
- Modify the XPath of the Loop Item1 - locate the "Show More" button precisely
- Run extraction - run your task and get data
1. Go To Web Page - open the target web page
- Enter the page URL on the home screen and click Start
2. Auto-detect the web page data - create the workflow
- Choose "Auto-detect the web page data"
- Wait for the detection to complete
- Check the data fields in the Data Preview; delete unwanted fields or rename fields if needed
- Click "Edit" under the "Add page scroll" option on the Tips panel
- Set the wait time to 4-5 seconds (make sure the time is long enough for the page to load new reviews)
- Click "Create workflow" on the Tips panel
3. Loop click "Show More" button - load more reviews
- Choose "Click on a 'Load More' button" on the Tips panel
- Select the "SHOW MORE" button on the web page
We need to select the large block that contains the "Show More" button first. If we select the button directly, Octoparse may click it instead of selecting it.
Then we click the arrow to the right of the last DIV tag and choose the DIV in the pop-up.
- Set the number of clicks according to how many reviews you need
- Click "Confirm"
- Open the settings of "Click on a Load More button"
- Set AJAX Load to 4s
- Enable "scroll to the bottom of the page", set it to repeat 4 times, and wait 4s between scrolls
- Click "OK" to confirm
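When choosing the number of clicks, it can help to estimate how long the configured waits add up to. The sketch below is rough arithmetic based on the settings above (4s AJAX Load, 4 scrolls, 4s between scrolls). It assumes the waits simply add up per click, and it ignores page rendering and extraction time, so treat the result as a lower bound. The function name `estimate_load_seconds` is made up for illustration.

```python
def estimate_load_seconds(clicks: int, ajax_wait: int = 4,
                          scrolls: int = 4, scroll_wait: int = 4) -> int:
    """Rough lower bound on waiting time, assuming the configured waits add up."""
    # Each click waits for AJAX, then performs the scrolls with a wait after each.
    per_click = ajax_wait + scrolls * scroll_wait
    return clicks * per_click


for clicks in (5, 10, 50):
    print(clicks, "clicks: at least", estimate_load_seconds(clicks), "seconds")
```

With the default values, each click costs about 20 seconds of waiting, so 10 clicks take at least 200 seconds.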
4. Modify the XPath of the Loop Item1 - locate the "Show More" button precisely
- Open the settings of "Loop Item1"
- Enter the XPath //span[text()='Show More']/..
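To see what this XPath selects, here is a small sketch using Python's standard library. The HTML snippet is made up for illustration and is not the real Google Play markup. Python's `xml.etree.ElementTree` supports only a subset of XPath, so the sketch writes the condition `text()='Show More'` as `.='Show More'`; the trailing `/..` means the same thing in both: step up from the matching span to its parent element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure will differ.
SNIPPET = """
<div id="reviews">
  <div class="review"><span>Great app</span></div>
  <div role="button" id="show-more"><span>Show More</span></div>
</div>
"""

root = ET.fromstring(SNIPPET)

# Find every span whose text is exactly "Show More", then select its parent.
matches = root.findall(".//span[.='Show More']/..")

for element in matches:
    print(element.tag, element.attrib)
```

Only the parent of the "Show More" span is returned, which is why this XPath targets the clickable block rather than every span on the page.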
If you want to learn more about XPath, please check the following tutorial:
5. Run extraction - run your task and get data
- Click "Run" on the upper left side
- Select "Run on your device" to run the task on your computer, or select "Run in the Cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.
Contact us any time if you need our help!