Octoparse locates data with XPath, but the position of that data within a web page can change. To handle this, we will show you how to extract data more reliably by anchoring the XPath to nearby text.
First, let’s look at an example of when this technique can be useful.
- Now, open the page in Chrome, then right-click the target data and choose Inspect.
- Notice that the words "Item Weight" are found within a <th> tag, while the associated value is found within the <td> tag right after it.
- Once we see the pattern, we can write an XPath that finds the value of "Item Weight" relative to where the words themselves appear: "//th[contains(text(),'Item Weight')]/following-sibling::td" - This XPath expression tells the program to look for the <th> tag containing the text "Item Weight", then select the <td> tag that follows it at the same level. If several <td> tags followed the <th>, all of them would match; appending [1] to the expression keeps only the first. Here there is only one, so this gives exactly what we want: the value associated with "Item Weight".
- Enter the new XPath in the "Matching XPath" text box, then click "OK" to save the settings.
The following-sibling axis is commonly used to find an element located next to another, known element.
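To see what the expression selects outside of Octoparse, here is a minimal Python sketch. It is an illustration under assumptions, not how Octoparse evaluates XPath: the sample table and the helper name `following_sibling_td` are hypothetical, and because Python's built-in ElementTree does not support the following-sibling axis, the helper mimics it by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical product-details table; the markup of a real page may differ.
SAMPLE = """
<table>
  <tr><th>Product Dimensions</th><td>10 x 5 x 2 inches</td></tr>
  <tr><th>Item Weight</th><td>1.2 pounds</td></tr>
  <tr><th>Manufacturer</th><td>Example Co.</td></tr>
</table>
"""


def following_sibling_td(root, label):
    """Mimic //th[contains(text(),'<label>')]/following-sibling::td."""
    matches = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            # Find a <th> whose leading text contains the label.
            if child.tag == "th" and label in (child.text or ""):
                # Collect every <td> after that <th> under the same parent,
                # skipping any element that was already collected.
                for sibling in children[i + 1:]:
                    if sibling.tag == "td" and sibling not in matches:
                        matches.append(sibling)
    return matches


root = ET.fromstring(SAMPLE)
values = [td.text.strip() for td in following_sibling_td(root, "Item Weight")]
print(values)
```

Because the lookup starts from the label text rather than from a fixed row position, the same call should still find the value if the "Item Weight" row moves elsewhere in the table.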
Learn more about XPath here!
If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you soon.