This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, since the latest version is faster, easier to use, and more robust.

In this tutorial, we will show you how to scrape customer reviews from Trustpilot.com, a consumer review website that hosts reviews of businesses worldwide.

We will use the link below to scrape consumers' reviews of Bank of America:

https://www.trustpilot.com/review/www.bankofamerica.com

In this example, we will scrape the following fields for each review: username, the total number of reviews the user has posted, location, rating, date posted, title, and review content.

The main steps are listed below, and a sample task file is available for download.

1. "Go to Web Page" - to open the target webpage

Paste the link into the URL input bar and click Start

2. Set up the pagination loop - to scrape data from multiple pages

Scroll down to the end of the page and click the Next page button
Select Loop click in the Tips panel

Set the AJAX timeout to 5s (optional; the right value depends on your local network speed, and 5-10s is recommended)

Click Apply to save settings

3. Modify the XPath of Pagination

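The auto-generated XPath may not reliably locate the Next page button on every page, which can stop the loop before it reaches the last page. To make sure all pages are scraped, we can replace it with a custom XPath for the Pagination step.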

Click on Pagination
Replace the XPath with //a[@name="pagination-button-next"]
Click Apply to save
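To see what the replacement XPath matches, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports simple attribute predicates such as [@name='...']. The HTML snippet, its page URLs, and the next_page_url helper are hypothetical; the sketch only assumes that the Next page button is an <a> element with name="pagination-button-next". Real Trustpilot pages are not guaranteed to be well-formed XML, so in practice an HTML parser (for example lxml) would be needed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified markup of the pagination area on a results page.
PAGE = """
<html><body>
  <nav>
    <a name="pagination-button-prev" href="/review/www.bankofamerica.com?page=1">Previous</a>
    <a name="pagination-button-next" href="/review/www.bankofamerica.com?page=3">Next page</a>
  </nav>
</body></html>
"""

def next_page_url(html: str) -> Optional[str]:
    """Return the href of the Next page link, or None if no such link exists."""
    root = ET.fromstring(html)
    # Same predicate as //a[@name="pagination-button-next"];
    # ElementTree paths must start with "." to search descendants.
    link = root.find(".//a[@name='pagination-button-next']")
    return None if link is None else link.get("href")

print(next_page_url(PAGE))  # /review/www.bankofamerica.com?page=3
```

Matching on the name attribute instead of the link's position keeps the selector stable when other links are added or removed around it. If a page has no matching link, the helper returns None, which is a natural point for a pagination loop to stop.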

4. Set up a loop item - to extract reviews in a loop

Select the first review block
Make sure the entire review block is selected: the whole block should be highlighted in green, with its sub-elements (title, username, date, etc.) highlighted in red. This ensures that the fields are located precisely in the following steps.
Select the second review block
Click Text after the entire review section is selected
After the Loop item is created, drag it into the Pagination loop so that the reviews on every page are extracted.
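Conceptually, the workflow is a loop item nested inside the pagination loop: on every page, the task visits each review block before clicking Next page. The Python sketch below mimics only that control flow, using a hypothetical in-memory site (the PAGES dictionary, its page keys, and scrape_all are made up for illustration); it is not how Octoparse works internally.

```python
from typing import Dict, List, Optional

# Hypothetical site: each page maps to its review blocks and its Next page target.
PAGES: Dict[str, dict] = {
    "page1": {"reviews": ["review A", "review B"], "next": "page2"},
    "page2": {"reviews": ["review C"], "next": None},  # last page: no Next page link
}

def scrape_all(start: str, max_pages: int = 100) -> List[str]:
    """Outer pagination loop wrapping an inner loop item over review blocks."""
    results: List[str] = []
    url: Optional[str] = start
    visited = 0
    # max_pages is a safety cap so a misbehaving Next link cannot loop forever.
    while url is not None and visited < max_pages:  # pagination loop
        page = PAGES[url]
        for block in page["reviews"]:  # loop item: one iteration per review block
            results.append(block)
        url = page["next"]  # equivalent of clicking Next page
        visited += 1
    return results

print(scrape_all("page1"))  # ['review A', 'review B', 'review C']
```

If the loop item sat outside the pagination loop, it would not run once per page, which is why it has to be dragged inside.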

5. "Extract Data" - Select the data needed

Click the data needed (e.g., username) in the first review block
Choose Text in the Tips panel

Repeat these steps for the other fields, such as review content and review title
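Each field you click is extracted relative to the current review block, so the same relative path is reused for every block in the loop. The sketch below illustrates this idea with ElementTree; the markup, the class="review" blocks, and the data-field attribute names are hypothetical placeholders rather than Trustpilot's actual HTML, which you would need to inspect yourself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical markup: two review blocks with the same sub-elements.
HTML = """
<div>
  <article class="review">
    <span data-field="username">Alice</span>
    <h2 data-field="title">Quick service</h2>
    <p data-field="content">Opened an account in ten minutes.</p>
  </article>
  <article class="review">
    <span data-field="username">Bob</span>
    <h2 data-field="title">Long wait</h2>
    <p data-field="content">Spent an hour on hold.</p>
  </article>
</div>
"""

FIELDS = ["username", "title", "content"]

def extract_reviews(html: str) -> List[Dict[str, str]]:
    root = ET.fromstring(html)
    rows: List[Dict[str, str]] = []
    for block in root.findall(".//article[@class='review']"):  # loop item
        row: Dict[str, str] = {}
        for field in FIELDS:
            # Relative path: search only inside the current review block.
            el = block.find(f".//*[@data-field='{field}']")
            row[field] = "" if el is None else (el.text or "").strip()
        rows.append(row)
    return rows

for row in extract_reviews(HTML):
    print(row)
```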

6. Modify data fields - to rename, delete and refine data

Delete unwanted fields by clicking on the More button and choosing Delete field

Rename fields by double-clicking the header
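Deleting and renaming fields only changes the columns of the output table. As a rough post-processing equivalent in Python, the sketch below drops and renames keys in each extracted row; the field names and the RENAME and DROP mappings are hypothetical examples, not Octoparse's actual default names.

```python
from typing import Dict, Iterable, List

# Hypothetical auto-generated field names mapped to friendlier headers.
RENAME = {"Text": "Username", "Text1": "Title", "Text2": "Content"}
DROP = {"Link"}  # fields we do not need in the output

def tidy(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop unwanted keys and rename the rest, keeping the original key order."""
    return [
        {RENAME.get(key, key): value for key, value in row.items() if key not in DROP}
        for row in rows
    ]

print(tidy([{"Text": "Alice", "Text1": "Quick service", "Link": "/users/1"}]))
# [{'Username': 'Alice', 'Title': 'Quick service'}]
```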

Scraping the rating takes a few extra steps because the rating is displayed as a star image rather than as text, but you can follow the steps below.

Click on the rating info and choose OuterHtml

After extracting the rating's HTML code, we need to change its XPath, because the auto-generated XPath does not locate the rating image properly.

Click More on the rating data field and choose Customize XPath

Click Relative XPath to the loop item and paste //img[contains(@alt,"Rated")]

Click Apply

Click Customize field - Select other attributes - alt, to extract the alt attribute from the HTML code
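The alt attribute holds the rating as text, which is why it is extracted instead of the image itself. The sketch below approximates the relative XPath //img[contains(@alt,"Rated")] in Python: ElementTree does not support the XPath contains() function, so the code filters <img> elements by hand and then pulls the number out with a regular expression. The markup and the alt wording "Rated 4 out of 5 stars" are assumptions for illustration; check the real alt text on the page before relying on the pattern.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical review block with a profile picture and a star-rating image.
BLOCK = """
<article class="review">
  <img src="avatar.png" alt="Profile picture"/>
  <img src="stars-4.svg" alt="Rated 4 out of 5 stars"/>
</article>
"""

def rating_alt(block_html: str) -> Optional[str]:
    """Approximates .//img[contains(@alt, "Rated")] relative to one review block."""
    block = ET.fromstring(block_html)
    for img in block.iter("img"):
        alt = img.get("alt", "")
        if "Rated" in alt:
            return alt
    return None

def rating_value(alt: Optional[str]) -> Optional[int]:
    """Extract the number from an alt text such as 'Rated 4 out of 5 stars' (assumed format)."""
    if alt is None:
        return None
    match = re.search(r"Rated (\d+)", alt)
    return int(match.group(1)) if match else None

alt = rating_alt(BLOCK)
print(alt)                # Rated 4 out of 5 stars
print(rating_value(alt))  # 4
```

Filtering on the alt text rather than on an image's position keeps the selector from picking up other images in the block, such as the hypothetical profile picture above.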

You may notice that the Date Posted field is shown as "X days ago", which does not tell us the exact date. We want the date in "year/month/day" format instead, so we need to use Customize field and Clean data to modify the extracted content.

Click Customize field - Extract attribute - datetime, to extract the datetime attribute from the HTML code

Click Clean data - Add step - Reformat extracted date/time to convert the value to year/month/day format
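The datetime attribute carries a machine-readable timestamp, while "X days ago" is only a relative display label, so reformatting the attribute yields an exact date. The sketch below assumes the attribute holds an ISO 8601 value such as 2023-05-14T10:21:33.000Z (an illustrative value, not taken from the page) and converts it to year/month/day with Python's standard datetime module; to_ymd is a hypothetical helper name.

```python
from datetime import datetime

def to_ymd(datetime_attr: str) -> str:
    """Convert an ISO 8601 timestamp (assumed format) to 'YYYY/MM/DD'."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
    # so replace it with an explicit UTC offset for older versions.
    value = datetime_attr.replace("Z", "+00:00")
    return datetime.fromisoformat(value).strftime("%Y/%m/%d")

print(to_ymd("2023-05-14T10:21:33.000Z"))  # 2023/05/14
```

The date is taken as written in the timestamp (here UTC) without converting it to a local time zone.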

7. "Run extraction" - Run your task and collect data

Click Save and then Run in the top right corner
Select Run on your device to run the task locally, or Run in the Cloud to run it in the cloud (premium users only).

Here is a sample of the output:
