XPath Cheatsheet for Web Scraping with Octoparse

Updated over 3 months ago

XPath (XML Path Language) is a powerful tool for navigating and selecting elements in HTML/XML documents. This cheatsheet provides quick-reference syntax and examples to help you write precise XPath queries for accurate data extraction in Octoparse.

1. Basic XPath Syntax

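| Syntax | Example | Description |
| --- | --- | --- |
| //tag | //div | Selects all <div> elements |
| //tag[@attr] | //a[@href] | Selects all <a> tags with an href attribute |
| //tag[@attr="value"] | //input[@type="text"] | Selects <input> elements where type="text" |
| / | /html/body/div | Absolute path from the document root (unambiguous, but breaks when the page layout changes) |
| . | ./span | Selects <span> children of the current node |
| .. | ../div | Selects <div> children of the current node's parent |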
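
To try the basic syntax outside Octoparse, here is a minimal sketch using Python's standard-library ElementTree, which supports a limited subset of XPath. The sample markup and its ids are hypothetical. ElementTree searches relative to an element, so the cheatsheet's //tag is written as .//tag in this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, written as well-formed XML so the parser accepts it.
SAMPLE = """
<html><body>
  <div id="main">
    <a href="/home">Home</a>
    <a>No link</a>
    <input type="text" name="q"/>
    <input type="submit"/>
    <span>inside main</span>
  </div>
  <div id="footer"><span>inside footer</span></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

all_divs = root.findall(".//div")                      # //div
links_with_href = root.findall(".//a[@href]")          # //a[@href]
text_inputs = root.findall(".//input[@type='text']")   # //input[@type="text"]

main = root.find(".//div[@id='main']")
spans_in_main = main.findall("./span")                 # ./span from the current node

# ../div steps up to the parent (<body>) and then selects its <div> children,
# so the result includes the starting <div> and its sibling.
parent_divs = root.findall(".//div[@id='main']/../div")

print(len(all_divs), [d.get("id") for d in parent_divs])
```

The last query shows why ../div is better read as "the <div> children of the parent" than as "the parent node" itself.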


2. Advanced XPath Techniques

A. Dynamic Element Selection

| Use Case | XPath Example |
| --- | --- |
| Contains text | //p[contains(text(), "price")] |
| Starts-with | //a[starts-with(@href, "https")] |
| Logical OR | //div[@class="A" or @class="B"] |
| Position-based selection | (//ul/li)[1] (first <li> in the document) |
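
The sketch below spells out what each of these predicates tests. Python's standard-library ElementTree does not implement XPath functions such as contains() or starts-with(), so those rows are rewritten as plain Python filters rather than evaluated as XPath; the sample markup and the example.com URLs are hypothetical. It also contrasts (//ul/li)[1] with //ul/li[1], which ElementTree can evaluate directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; each <p> and <div> holds a single text node.
SAMPLE = """
<body>
  <p>Sale price: $10</p>
  <p>Free shipping</p>
  <a href="https://example.com/a">secure</a>
  <a href="http://example.com/b">plain</a>
  <div class="A">first</div>
  <div class="B">second</div>
  <div class="C">third</div>
  <ul><li>u1-a</li><li>u1-b</li></ul>
  <ul><li>u2-a</li><li>u2-b</li></ul>
</body>
"""
root = ET.fromstring(SAMPLE)

# Equivalent of //p[contains(text(), "price")] for this sample (case-sensitive).
price_paras = [p.text for p in root.iter("p") if "price" in (p.text or "")]

# Equivalent of //a[starts-with(@href, "https")]
secure_links = [a.get("href") for a in root.iter("a")
                if a.get("href", "").startswith("https")]

# Equivalent of //div[@class="A" or @class="B"]
a_or_b = [d.text for d in root.iter("div") if d.get("class") in ("A", "B")]

# (//ul/li)[1]: the first <li> in the whole document.
first_overall = root.findall(".//ul/li")[0].text

# //ul/li[1]: the first <li> of every <ul>, evaluated by ElementTree itself.
first_per_list = [li.text for li in root.findall(".//ul/li[1]")]

print(first_overall, first_per_list)
```

With two lists on the page, the parenthesized form returns one item while //ul/li[1] returns one item per list, which matters when a selector is meant to match a single element.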

B. Handling Tables & Lists

//table/tr[2]/td[3] (3rd cell of the 2nd row; use //table//tr[2]/td[3] when the rows sit inside a <tbody>)
//ul/li[position()<=3] (first 3 items of each list)
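
As a rough illustration of row and cell indexing, the sketch below runs the table path with Python's standard-library ElementTree on a hypothetical table whose rows are wrapped in <tbody>. ElementTree's XPath subset has no position() comparisons, so the "first 3 items" case is approximated by slicing the result list, which matches the XPath only because the sample contains a single <ul>.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: rows live inside <tbody>, as they often do in rendered pages.
SAMPLE = """
<body>
  <table><tbody>
    <tr><th>Name</th><th>SKU</th><th>Price</th></tr>
    <tr><td>Lamp</td><td>L-1</td><td>19.99</td></tr>
    <tr><td>Desk</td><td>D-7</td><td>149.00</td></tr>
  </tbody></table>
  <ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>
</body>
"""
root = ET.fromstring(SAMPLE)

# //table/tr[2]/td[3] finds nothing here because <tr> is not a direct child of <table>.
no_tbody_step = root.findall(".//table/tr[2]/td[3]")

# //table//tr[2]/td[3] skips over the <tbody> wrapper.
cell = root.find(".//table//tr[2]/td[3]")

# Stand-in for //ul/li[position()<=3] on a page with one list.
first_three = [li.text for li in root.findall(".//ul/li")[:3]]

print(no_tbody_step, cell.text, first_three)
```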

C. Wildcards & Axes

| Syntax | Example |
| --- | --- |
| * | //div/* (all child elements of each <div>) |
| following-sibling | //h2/following-sibling::p (all <p> siblings that come after an <h2>) |
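
A small sketch of both rows, assuming hypothetical markup. The wildcard query runs as-is in Python's standard-library ElementTree; the following-sibling axis is not part of ElementTree's XPath subset, so the helper following_siblings is an illustrative plain-Python stand-in, not a library function.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <div>
    <p>Intro</p>
    <h2>Specs</h2>
    <p>Weight: 2 kg</p>
    <ul><li>not a paragraph</li></ul>
    <p>Color: black</p>
  </div>
</body>
"""
root = ET.fromstring(SAMPLE)

# //div/*: every direct child element of each <div>, whatever its tag.
child_tags = [el.tag for el in root.findall(".//div/*")]

def following_siblings(root, node, tag):
    """Plain-Python stand-in for node/following-sibling::tag."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    siblings = list(parent_of[node])
    later = siblings[siblings.index(node) + 1:]
    return [el for el in later if el.tag == tag]

h2 = root.find(".//h2")
after_h2 = [p.text for p in following_siblings(root, h2, "p")]

print(child_tags, after_h2)
```

The <p> before the heading is excluded, and the <ul> between the two later paragraphs does not stop the axis from reaching the second one.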


3. Octoparse-Specific Tips

Best Practices

  • Use relative XPath (e.g., //div[@class="result"]) over absolute paths (like /html/body/div[1]) for reliability.

  • Test XPath in Browser DevTools (press F12 → Console tab → $x("//your_xpath")).

  • Combine with Octoparse’s Auto-Detect to validate selections.

Common Fixes

🔧 Broken XPath? Try:

  • normalize-space(): //p[normalize-space()="Hello"] (trims leading and trailing whitespace and collapses repeated whitespace)

  • contains() for dynamic classes: //div[contains(@class, "product-")]
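
To make the two fixes concrete, here is a sketch of what each one tests, written in plain Python because the standard-library ElementTree does not implement normalize-space() or contains(). The helper normalize_space and the sample markup are illustrative assumptions; the helper follows the XPath 1.0 definition, which treats only space, tab, carriage return, and line feed as whitespace.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(text):
    """Mimic XPath 1.0 normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", text or "").strip(" ")

# Hypothetical fragment with padded text and a generated class name.
SAMPLE = """
<body>
  <p>
     Hello
  </p>
  <p>Hello   world</p>
  <div class="card product-1042 featured">A</div>
  <div class="card banner">B</div>
</body>
"""
root = ET.fromstring(SAMPLE)

# Equivalent of //p[normalize-space()="Hello"]: compare the cleaned-up full text.
hello_paras = [p for p in root.iter("p")
               if normalize_space("".join(p.itertext())) == "Hello"]
hello_count = len(hello_paras)

# Equivalent of //div[contains(@class, "product-")]: substring test on the attribute.
product_cards = [d.text for d in root.iter("div")
                 if "product-" in d.get("class", "")]

print(hello_count, product_cards)
```

A plain //p[text()="Hello"] would miss the first paragraph because of the surrounding line breaks and indentation, which is the situation normalize-space() is meant for.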


4. Quick Reference Table

| Scenario | XPath Solution |
| --- | --- |
| Extract all links | //a/@href |
| Click "Next" button | //a[contains(text(), "Next")] |
| Select dropdown options | //select/option |
| Avoid hidden elements | //div[not(contains(@style, "display:none"))] (checks the inline style attribute only) |
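
The sketch below mirrors three of these scenarios on hypothetical markup, using Python's standard-library ElementTree plus plain Python filters where its XPath subset falls short (it cannot select attributes or call contains()). The helper is_hidden_inline is an illustrative assumption: it strips spaces before comparing, since an exact "display:none" substring test misses styles written as "display: none".

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <a href="/page/1">One</a>
  <a name="anchor-only">No href</a>
  <a href="/page/2">Next</a>
  <div style="display:none">hidden A</div>
  <div style="color: red; display: none">hidden B</div>
  <div>visible</div>
</body>
"""
root = ET.fromstring(SAMPLE)

# Equivalent of //a/@href: collect the attribute values themselves.
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]

# Equivalent of //a[contains(text(), "Next")] for this sample.
next_links = [a.get("href") for a in root.iter("a") if "Next" in (a.text or "")]

# Literal equivalent of //div[not(contains(@style, "display:none"))]
naive_visible = [d.text for d in root.iter("div")
                 if "display:none" not in d.get("style", "")]

def is_hidden_inline(el):
    """True when the inline style hides the element, ignoring spaces and case."""
    style = el.get("style", "").replace(" ", "").lower()
    return "display:none" in style

visible = [d.text for d in root.iter("div") if not is_hidden_inline(d)]

print(hrefs, next_links, naive_visible, visible)
```

Neither variant can see elements hidden through a stylesheet class, so the results are worth checking against the rendered page.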
