A GTIN (Global Trade Item Number) is a unique product identifier used in retail and e-commerce. In many cases, the GTIN is not shown in the visible page content and can only be found in the source code of the web page. In this article, we will show how to scrape a GTIN from a web page's source code.
Step 1. Find the GTIN by inspecting the web page in your regular browser
In most browsers, you can access the developer tools by right-clicking on the web page and selecting "Inspect" or "Inspect Element."
Once the developer tools are open, use their search function to look for the GTIN.
Take this page as an example: https://www.yeppon.it/p-hotpoint-fi7-871-sh-1239165. Here the GTIN is inside a <script> tag.
Step 2. Write an XPath to locate the tag that contains the GTIN
Because the GTIN sits in the text of the <script> tag, we can use text() to select every script whose text contains the string 'gtin': //script[contains(text(),'gtin')]
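For readers who want to check the idea outside the tool, here is a minimal Python sketch of the same selection. It is an illustration under assumptions, not the tool's own mechanism: Python's standard html.parser has no XPath support, so the sketch collects the text of each <script> tag and keeps the ones containing 'gtin', which mirrors what //script[contains(text(),'gtin')] selects. The sample HTML, the "gtin13" field name, the digits, and the names ScriptCollector and scripts_containing are all invented for the example.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self._buffer = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buffer))
            self._in_script = False


def scripts_containing(html, needle="gtin"):
    """Return the text of each <script> whose content contains `needle`."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return [text for text in parser.scripts if needle in text]


# Hypothetical page: the field name and the digits are placeholders.
SAMPLE_HTML = """
<html><head>
<script>var tracking = {"page": "product"};</script>
<script type="application/ld+json">{"@type": "Product", "gtin13": "1234567890128"}</script>
</head><body><h1>Example product</h1></body></html>
"""

matches = scripts_containing(SAMPLE_HTML)
print(matches)
```

Only the second script is kept, because the first one never mentions 'gtin'. Like the XPath, the match is case-sensitive, so a page that writes "GTIN" in capitals would need a different search string.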
Step 3. Create a field to capture the full text of the tag
Click Add Custom Field and choose "Capture data on the page"
Enter the XPath from Step 2 and click Apply
Step 4. Clean the data to extract the GTIN
Click on More and select Clean data
Add a "Match with Regular Expression" step
Try Regular Expression Tool
Find the strings that appear immediately before and after the GTIN, and enter them in the tool to generate an expression
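To show what such an expression can look like, here is a small Python sketch of the before-and-after idea. It assumes the script text is JSON-like and stores the value under a key such as "gtin13"; the sample text, the digits, and the names GTIN_PATTERN and extract_gtin are invented for illustration, and the expression the tool generates for a real page may differ.

```python
import re

# Hypothetical script text; the key name and the digits are placeholders.
SCRIPT_TEXT = (
    '{"@type": "Product", "name": "Example oven", '
    '"gtin13": "1234567890128", "sku": "ABC-1"}'
)

# Text before the value: "gtin" plus an optional length suffix, then ":" and
# an opening quote. Text after the value: the closing quote.
# The value itself is captured as a run of 8 to 14 digits.
GTIN_PATTERN = re.compile(r'"gtin(?:8|12|13|14)?"\s*:\s*"(\d{8,14})"')


def extract_gtin(text):
    """Return the first GTIN-like value found in `text`, or None."""
    match = GTIN_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_gtin(SCRIPT_TEXT))
```

Restricting the capture group to digits keeps the match from running past the closing quote, which is a common problem when the "after" string appears more than once in the script.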