XPath (XML Path Language) is a powerful tool for navigating and selecting elements in HTML/XML documents. This cheatsheet provides quick-reference syntax and examples to help you write precise XPath queries for accurate data extraction in Octoparse.
1. Basic XPath Syntax
Syntax | Example | Description |
| | Selects all |
| | Selects all |
| | Selects |
| | Absolute path (avoids ambiguity) |
| | Selects from the current node |
| | Selects the parent node |
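The basic selectors described above (all elements of a tag, attribute matches, the current node, the parent node) can be tried outside Octoparse. This is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath: paths have to start from the current node with `.`, and functions such as `contains()` are not available. The sample markup and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; it is well-formed so the XML parser accepts it.
doc = ET.fromstring(
    "<html><body>"
    "<div class='result'><p>First</p></div>"
    "<div class='ad'><p>Second</p></div>"
    "</body></html>"
)

# .//div selects every <div> at any depth below the current node.
all_divs = doc.findall(".//div")

# [@class='result'] keeps only elements whose class attribute matches exactly.
results = doc.findall(".//div[@class='result']")

# .. steps up to the parent: here, the <div> that contains each <p>.
parents = doc.findall(".//p/..")

print(len(all_divs), len(results), [d.get("class") for d in parents])
```

In Octoparse or a browser the same ideas are written without the leading dot, for example `//div[@class="result"]`.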
2. Advanced XPath Techniques
A. Dynamic Element Selection
Use Case | XPath Example |
Contains text | |
Starts-with | |
Logical OR | |
Position-based selection | |
B. Handling Tables & Lists
//table/tr[2]/td[3] (3rd column of 2nd row)
//ul/li[position()<=3] (first 3 list items)
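A rough sketch of the two expressions above, assuming Python's standard-library `xml.etree.ElementTree` and a made-up page fragment. ElementTree supports integer position predicates such as `tr[2]`, but not `position()<=3`, so the second expression is only approximated here by slicing the matched list in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with one table and one list.
page = ET.fromstring(
    "<body>"
    "<table>"
    "<tr><td>A1</td><td>B1</td><td>C1</td></tr>"
    "<tr><td>A2</td><td>B2</td><td>C2</td></tr>"
    "</table>"
    "<ul><li>one</li><li>two</li><li>three</li><li>four</li></ul>"
    "</body>"
)

# Counterpart of //table/tr[2]/td[3]: third cell of the second row.
cell = page.find(".//table/tr[2]/td[3]")

# Stand-in for //ul/li[position()<=3]: take the first three matches.
# (Assumption: a single <ul>, so slicing gives the same items.)
first_three = [li.text for li in page.findall(".//ul/li")[:3]]

print(cell.text, first_three)
```

XPath positions start at 1, while the Python slice is zero-based.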
C. Wildcards & Axes
Syntax | Example |
|
|
|
|
3. Octoparse-Specific Tips
Best Practices
✔ Use relative XPath (e.g., //div[@class="result"]) over absolute paths (like /html/body/div[1]) for reliability.
✔ Test XPath in Browser DevTools (press F12 → Console tab → $x("//your_xpath")).
✔ Combine with Octoparse’s Auto-Detect to validate selections.
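To illustrate why relative XPath tends to be more reliable, here is a small sketch assuming Python's standard-library `xml.etree.ElementTree` and two invented versions of the same page. ElementTree paths start at the root element, so `./body/div[1]` stands in for `/html/body/div[1]` and `.//div[@class='result']` for `//div[@class="result"]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before a redesign.
before = ET.fromstring(
    "<html><body><div class='result'>Item</div></body></html>"
)
# The same page after a banner is added above the results.
after = ET.fromstring(
    "<html><body><div class='banner'>Sale</div>"
    "<div class='result'>Item</div></body></html>"
)

ABSOLUTE = "./body/div[1]"            # stands in for /html/body/div[1]
RELATIVE = ".//div[@class='result']"  # stands in for //div[@class="result"]

for page in (before, after):
    # The position-based path follows whatever is first;
    # the attribute-based path keeps finding the result block.
    print(page.find(ABSOLUTE).text, page.find(RELATIVE).text)
```

After the layout change the position-based path returns the banner, while the attribute-based path still returns the intended element.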
Common Fixes
🔧 Broken XPath? Try:
normalize-space(): //p[normalize-space()="Hello"] (ignores extra spaces)
contains() for dynamic classes: //div[contains(@class, "product-")]
4. Quick Reference Table
Scenario | XPath Solution |
Extract all links | |
Click "Next" button | |
Select dropdown options | |
Avoid hidden elements | |
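As one illustration of the "Extract all links" scenario, the sketch below assumes Python's standard-library `xml.etree.ElementTree` and an invented fragment. The path `.//a[@href]` is an assumption chosen for this example, not necessarily the expression the table intended; ElementTree cannot select attributes directly, so the `href` values are read from the matched elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two real links and one anchor without href.
page = ET.fromstring(
    "<body>"
    "<a href='/page/1'>One</a>"
    "<p><a href='/page/2'>Two</a></p>"
    "<a name='anchor-only'>No link</a>"
    "</body>"
)

# .//a[@href] matches <a> elements at any depth that carry an href attribute.
links = [a.get("href") for a in page.findall(".//a[@href]")]

print(links)
```

The `[@href]` predicate skips anchors that have no link target, which keeps empty values out of the extracted data.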