You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!
Some websites (like Trustpilot) store ratings in HTML attributes rather than plain text. Here’s how to extract them:
Example Page:
📌 Trustpilot Review - Airforce Gift Shop: https://www.trustpilot.com/review/airforcegiftshop.co.uk
There are two ways to fetch the star rating info.
Method 1: Extract Attributes from HTML
✅ Best for: Simple rating extraction from alt, src, or other attributes.
Steps:
Select the star rating element on the page.
In the Tips panel, click Extract Data → "..." (More Options) → Customize Field.
Select "Other Attributes" → Pick alt or src.
Preview & confirm the extracted value (e.g., alt="5 stars").
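To see what attribute extraction does outside of Octoparse, here is a minimal Python sketch of the same idea. The HTML snippet is made up for illustration and does not reproduce Trustpilot's actual markup. AttrCollector and extract_attr are hypothetical helper names, not part of Octoparse.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real pages will differ.
SAMPLE_HTML = '<div class="rating"><img alt="5 stars" src="/img/stars-5.svg"></div>'


class AttrCollector(HTMLParser):
    """Collect the values of one attribute from every tag with a given name."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attr and value is not None:
                    self.values.append(value)


def extract_attr(html, tag, attr):
    """Return all values of `attr` found on `tag` elements in `html`."""
    parser = AttrCollector(tag, attr)
    parser.feed(html)
    parser.close()
    return parser.values


alt_values = extract_attr(SAMPLE_HTML, "img", "alt")
src_values = extract_attr(SAMPLE_HTML, "img", "src")
print(alt_values)
print(src_values)
```

The rating lives in the attribute value, not in the visible text, which is why a plain text extraction of the same element would come back empty.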
Method 2: Extract & Clean HTML with RegEx
✅ Best for: Complex cases where ratings are buried in HTML.
Steps:
Select the rating element, then choose OuterHtml.
Click Extract Data → "..." → Clean Data.
Add Step → Match with Regular Expression (RegEx).
For Octoparse Version 8.8.0 and later
Click "Need help with RegEx? Try our RegEx tools!"
For each test string, manually highlight only the text you want to match.
Click Generate. The AI will analyze your examples and propose a RegEx pattern.
Click Test to verify the pattern works against all your samples.
Click Apply & Save, give your pattern a name, and confirm.
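The idea of testing one pattern against several samples can also be tried outside the tool. The sketch below is a rough Python illustration: the sample strings, the pattern, and the extract_rating helper are all assumptions made for this example, and Octoparse's RegEx step may treat capture groups or syntax differently from Python's re module.

```python
import re

# Hypothetical outer-HTML samples; real markup will differ.
SAMPLES = [
    '<img alt="5 stars" src="/img/stars-5.svg">',
    '<img alt="1 star" src="/img/stars-1.svg">',
    '<img alt="4.5 stars" src="/img/stars-4.5.svg">',
]

# Assumed pattern: capture the number that precedes "star" or "stars"
# inside the alt attribute.
RATING_PATTERN = re.compile(r'alt="(\d+(?:\.\d+)?) stars?"')


def extract_rating(outer_html):
    """Return the rating as a float, or None if the pattern does not match."""
    match = RATING_PATTERN.search(outer_html)
    return float(match.group(1)) if match else None


# Check the pattern against every sample, as the Test step does.
ratings = [extract_rating(sample) for sample in SAMPLES]
print(ratings)
```

Including samples that differ slightly (singular "star", a decimal rating) helps confirm the pattern is not fitted to a single example.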
Before version 8.8.0