
How to fix incorrect data extraction

Updated over 10 months ago

Before You Begin

This tutorial applies to the latest Octoparse version. For optimal performance, upgrade now if you're using an older release.


The Problem: Misdirected Data

When running tasks (locally or in the cloud), you might encounter:

  • Data extracted to the wrong columns

  • Missing data fields

Root Cause:
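XPath expressions that do not locate the target elements consistently across pages. An XPath that matches the right field on one page can match a different element, or nothing at all, on a page whose layout differs slightly.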

Example:

This is the data we expected:

2022-06-15_11-34-16.jpg

This is the actual output. Note that some of the highlighted data is not extracted correctly.

2022-06-15_11-47-03.jpg
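
The same failure can be reproduced outside Octoparse. The sketch below is only an illustration: the two page snippets, the field names, and the extract helper are hypothetical, and it uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup. It shows a position-based XPath putting a price into the rating column on a page that has no rating, while an attribute-based XPath leaves the field empty instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample pages: page B has no rating element.
PAGE_A = """
<div class="product">
  <span class="name">Widget</span>
  <span class="rating">4.5</span>
  <span class="price">$10</span>
</div>
"""

PAGE_B = """
<div class="product">
  <span class="name">Gadget</span>
  <span class="price">$25</span>
</div>
"""

def extract(page_xml, xpath):
    """Return the text of the first element matching xpath, or None."""
    root = ET.fromstring(page_xml)
    node = root.find(xpath)
    return node.text if node is not None else None

# Position-based: "the second span", whatever it happens to be.
POSITIONAL = "./span[2]"
# Attribute-based: "the span marked as the rating".
BY_CLASS = "./span[@class='rating']"

rating_positional = [extract(p, POSITIONAL) for p in (PAGE_A, PAGE_B)]
rating_by_class = [extract(p, BY_CLASS) for p in (PAGE_A, PAGE_B)]

print(rating_positional)  # ['4.5', '$25']  -> price lands in the rating column
print(rating_by_class)    # ['4.5', None]   -> field is empty, not misplaced
```

An empty cell is usually easier to spot and handle than a value sitting in the wrong column.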


The Solution: XPath Correction

Step 1: Write a Robust XPath

Learn XPath fundamentals with our guide:
🔗 What is XPath and How to Use It in Octoparse?

Step 2: Update the XPath in Your Task

  1. Click More (···) next to the problematic data field

    2022-06-15_11-41-02.jpg
  2. Select Customize XPath

    2022-06-15_11-42-10.jpg
  3. Replace the existing XPath with your new expression

  4. Click Save

Step 3: Validate with Test Run

Always test updated tasks with Preview before full execution.
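
If you have saved copies of a few target pages, a rough offline check can complement Preview. The following is a minimal sketch, not an Octoparse feature: pages_missing_field and the sample pages are hypothetical, and xml.etree.ElementTree handles only well-formed markup and a limited XPath subset, so real HTML would likely need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

def pages_missing_field(pages, xpath):
    """Return the names of pages where xpath matches no element."""
    missing = []
    for name, page_xml in pages.items():
        root = ET.fromstring(page_xml)
        if root.find(xpath) is None:
            missing.append(name)
    return missing

# Hypothetical saved pages: page_2 has no price element.
SAMPLE_PAGES = {
    "page_1": "<div><h1 id='title'>First</h1><p class='price'>$10</p></div>",
    "page_2": "<div><h1 id='title'>Second</h1></div>",
}

title_missing = pages_missing_field(SAMPLE_PAGES, ".//h1[@id='title']")
price_missing = pages_missing_field(SAMPLE_PAGES, ".//p[@class='price']")

print(title_missing)  # []
print(price_missing)  # ['page_2']
```

A non-empty result points to pages where the XPath needs another look before you run the full task.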

Tips

✔ Use relative XPath (not absolute) for dynamic pages
✔ Bookmark our XPath cheatsheet for common scenarios
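
To illustrate the first tip, here is a small sketch under stated assumptions: the two page snippets are made up, and because xml.etree.ElementTree does not accept a leading "/" on an element, "./body/div[1]/h1" stands in for the absolute path /html/body/div[1]/h1. It shows a full step-by-step path breaking when a banner is added to the page, while a relative XPath keyed on an attribute still finds the title.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: the second one has an extra banner div before the content.
PLAIN = "<html><body><div><h1 class='title'>Hello</h1></div></body></html>"
WITH_BANNER = (
    "<html><body><div class='banner'>Sale</div>"
    "<div><h1 class='title'>Hello</h1></div></body></html>"
)

# Stand-in for the absolute path /html/body/div[1]/h1 (see note above).
ABSOLUTE_STYLE = "./body/div[1]/h1"
# Relative: any h1 with class 'title', wherever it sits in the page.
RELATIVE = ".//h1[@class='title']"

def first_text(page_xml, xpath):
    """Return the text of the first match, or None when nothing matches."""
    return ET.fromstring(page_xml).findtext(xpath)

absolute_results = [first_text(p, ABSOLUTE_STYLE) for p in (PLAIN, WITH_BANNER)]
relative_results = [first_text(p, RELATIVE) for p in (PLAIN, WITH_BANNER)]

print(absolute_results)  # ['Hello', None]
print(relative_results)  # ['Hello', 'Hello']
```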
