
How to scrape a GTIN number from a web page's source code

Updated over a year ago

GTIN (Global Trade Item Number) is a unique identifier for products in retail and e-commerce. On many sites, the GTIN number does not appear in the visible page content and can only be found in the web page's source code. This article shows how to scrape a GTIN number from a web page's source code.

Step 1. Find the GTIN number by inspecting the web page in your regular browser

In most browsers, you can access the developer tools by right-clicking on the web page and selecting "Inspect" or "Inspect Element."

Once the developer tools are open, use the search function to look for "gtin" or for the GTIN number itself. In Chrome, for example, open the Elements panel and press Ctrl+F (Cmd+F on macOS).

Take this page as an example: https://www.yeppon.it/p-hotpoint-fi7-871-sh-1239165. Searching the source shows that the GTIN number is inside a <script> tag.
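If you prefer to check the source outside the browser, the same search can be sketched in Python. This is a minimal sketch using only the standard library; `fetch_source` and `find_keyword` are hypothetical helper names, and the browser-like User-Agent is an assumption. Note that `urllib` downloads the raw HTML without running JavaScript, while the Elements panel shows the page after scripts have run. A GTIN injected by JavaScript may therefore appear in the browser but not in the downloaded source, and some sites may reject scripted requests altogether.

```python
from __future__ import annotations

import re
import urllib.request


def fetch_source(url: str) -> str:
    """Download the raw HTML of a page. JavaScript is NOT executed."""
    # Assumption: a browser-like User-Agent; some sites block the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def find_keyword(source: str, keyword: str = "gtin", context: int = 40) -> list[str]:
    """Return the text around every case-insensitive occurrence of keyword."""
    snippets = []
    # re.escape makes the keyword match literally; IGNORECASE also finds "GTIN".
    for match in re.finditer(re.escape(keyword), source, re.IGNORECASE):
        start = max(0, match.start() - context)
        end = match.end() + context
        snippets.append(source[start:end])
    return snippets
```

For example, `find_keyword(fetch_source("https://www.yeppon.it/p-hotpoint-fi7-871-sh-1239165"))` should return the text around each "gtin" occurrence, provided the site serves the page to the script.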


Step 2. Write an XPath to locate the tag that contains the GTIN number

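Because the GTIN number sits in the text of the <script> tag, we can combine text() with contains() to write the XPath: //script[contains(text(),'gtin')]. Note that contains() is case-sensitive, so type the keyword exactly as it appears in the source code.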
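To see what this XPath selects, here is a rough standard-library equivalent in Python. It is a sketch, not how the scraping tool evaluates XPath internally: `html.parser.HTMLParser` collects the text of each <script> element, and the hypothetical helper `scripts_containing` keeps only the scripts whose text contains the keyword. A third-party library such as lxml can evaluate the XPath directly with `tree.xpath(...)`, but that requires an extra install.

```python
from __future__ import annotations

from html.parser import HTMLParser


class ScriptTextFinder(HTMLParser):
    """Collect the text of every <script> element that contains a keyword.

    Roughly mirrors //script[contains(text(),'gtin')]; like XPath's
    contains(), the comparison is case-sensitive.
    """

    def __init__(self, keyword: str = "gtin") -> None:
        super().__init__()
        self.keyword = keyword
        self.matches: list[str] = []
        self._in_script = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so "SCRIPT" is also caught here.
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents are delivered as raw text, not parsed as tags.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            text = "".join(self._buffer)
            if self.keyword in text:
                self.matches.append(text)
            self._in_script = False


def scripts_containing(html: str, keyword: str = "gtin") -> list[str]:
    """Return the full text of each <script> whose text contains keyword."""
    finder = ScriptTextFinder(keyword)
    finder.feed(html)
    finder.close()
    return finder.matches
```

Each returned string is the whole text of a matching tag, which corresponds to what the field in Step 3 captures before any cleaning.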


Step 3. Create a field to capture the whole text of the tag

  • Click Add Custom Field and choose Capture data on the page

  • Enter the XPath from Step 2 and click Apply


Step 4. Clean the data to extract the GTIN number

Click More and select Clean data.

  • Add a Match with Regular Expression step

  • Click Try Regular Expression Tool

  • Find the strings immediately before and after the GTIN number, then enter them to generate an expression

  • Apply the expression and confirm the settings; the field now contains only the GTIN number
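The expression generated from the "before" and "after" strings works roughly like the Python sketch below. The exact expression the tool produces may differ; `build_pattern` and `extract_between` are hypothetical helpers, and the sample text and number in the usage note are made up for illustration.

```python
from __future__ import annotations

import re


def build_pattern(before: str, after: str) -> re.Pattern[str]:
    """Build a regex capturing the shortest text between two literal strings."""
    # re.escape matches quotes, colons, braces, etc. literally.
    # The lazy (.*?) stops at the FIRST occurrence of `after`.
    # re.DOTALL lets the match span line breaks inside a script.
    return re.compile(re.escape(before) + r"(.*?)" + re.escape(after), re.DOTALL)


def extract_between(text: str, before: str, after: str) -> str | None:
    """Return the first text found between `before` and `after`, or None."""
    match = build_pattern(before, after).search(text)
    return match.group(1) if match else None
```

For example, `extract_between('{"gtin13": "1234567890128", "sku": "x"}', '"gtin13": "', '"')` returns `"1234567890128"`. The lazy quantifier matters here: a greedy `(.*)` would run on to the last quotation mark in the script instead of stopping right after the number.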
