XPath is a language that allows you to locate specific elements on a page. An XPath you write yourself in Octoparse is often more flexible and accurate than the XPath auto-generated when you click elements during task configuration.
Octoparse allows you to customize an element's XPath so that you can precisely locate the data you want to scrape. This helps resolve issues such as missing data, skipped pages, blank fields, and duplicates.
In this tutorial, we are going to show you how to customize element XPath.
Where can I modify XPath in Octoparse?
- Click on the Action Settings icon on the Extract Data button
- Click on Customize XPath of the field you want to modify
- Enter the new XPath in the Matching XPath textbox
If the "Extract Data" step is inside a "Loop Item" and scrapes information from the elements in the loop list, make sure Relative XPath is ticked before you enter the XPath, so that the XPath is evaluated relative to each loop item.
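To see why the relative setting matters inside a loop, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The page markup, tag names, and class names are invented for illustration, and this is not how Octoparse evaluates XPath internally; it only shows the general idea of matching a list of items first and then locating each field from the current item.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing; tags and class names are made up for this example.
PAGE = """
<div>
  <div class="item"><h2>Red mug</h2><span class="price">$8</span></div>
  <div class="item"><h2>Blue mug</h2><span class="price">$9</span></div>
</div>
"""

root = ET.fromstring(PAGE)

# Plays the role of the loop XPath: it matches every element in the loop list.
items = root.findall(".//div[@class='item']")

rows = []
for item in items:
    # Relative XPath: evaluated from the current loop item, not from the page root,
    # so each row gets the name and price that belong to that item.
    name = item.find("./h2").text
    price = item.find("./span[@class='price']").text
    rows.append((name, price))

print(rows)
```

If the field XPath were evaluated from the page root instead, the same expression would match the first item on every iteration, which is one common way duplicates appear.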
For steps like "Loop Item" or "Pagination", you can easily find the XPath textbox under "Action Settings" too. Enter the new XPath and click "OK" to save your changes.
How do I write XPath?
If you are new to XPath, you may need to learn some HTML basics first. XPath locates elements based on their tags and attributes, so before you write your own XPath, inspect the HTML structure of the page. (More tutorials about HTML)
Then you can check this tutorial to learn more about XPath: What is XPath and how to use it in Octoparse
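As a small illustration of locating elements by tag and attribute, the sketch below again uses Python's standard `xml.etree.ElementTree`. The markup, the `id` and `class` values, and the URLs are assumptions made up for this example. ElementTree supports only a subset of XPath and needs a leading `.` for searches from the root element; in full XPath the same expressions would typically be written as `//h1[@id='title']` and `//a[@class='next']`.

```python
import xml.etree.ElementTree as ET

# Hypothetical page; ids, classes, and links are invented for illustration.
PAGE = """
<html><body>
  <h1 id="title">Mugs</h1>
  <a class="next" href="/page/2">Next</a>
  <a class="prev" href="/page/0">Previous</a>
</body></html>
"""

root = ET.fromstring(PAGE)

# Tag plus attribute: the <h1> whose id is "title".
title = root.find(".//h1[@id='title']").text

# Tag plus attribute: the <a> whose class is "next", e.g. a pagination link.
next_href = root.find(".//a[@class='next']").get("href")

# Tag only: every <a> element on the page, in document order.
all_links = [a.get("href") for a in root.findall(".//a")]

print(title, next_href, all_links)
```

The more specific the tag and attribute combination, the less likely the XPath is to match an unintended element.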
Tutorial in Spanish: Personalizar elemento XPath
You can also read more web scraping tutorials on the official website
If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you within 24 hours.