I need to extract data where the XPath is not constant. The only way I can see to extract it correctly every time is to use regular expressions to find where the field is, and then extract the data.

Comments

  • Fergus

    For future reference: since the titles (labels) of these fields stay the same, we can anchor the XPath to those titles using the method introduced in this tutorial:
    How to associate data with nearby text?

    To learn more about XPath, this tutorial can be very helpful: What is XPath and how to use it in Octoparse

    In the attached task, I have already revised several data fields. If you need to add more, you can use the XPath below:

    When the title is a single word, replace 'Type' with that word:
    //*[contains(text(),'Type')]/../following-sibling::tr[1]//td[1]/div[1]

    When the title is two words, replace 'Publisher' and 'Location' with those two words:
    //*[contains(text(),'Publisher') and contains(text(),'Location')]/../following-sibling::tr[1]//td[1]/div[1]

    For longer titles, add one more contains(text(),'...') condition for each additional word.
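    The templates above can also be generated and sanity-checked outside Octoparse. Below is a minimal Python sketch, assuming well-formed sample markup. label_xpath() (a hypothetical helper name) builds the XPath for any title by adding one contains() test per word, and find_value() (also hypothetical) walks a small, made-up table with the standard-library ElementTree, mirroring each step of the XPath by hand, since ElementTree's built-in XPath subset has no contains() or following-sibling support. It is only an illustration of what the expression selects; real pages may be laid out differently, and in Octoparse you would use the generated XPath itself.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample table (an assumption, not the real page): each label sits
# in its own row, and the value sits in a <div> inside the first cell of the
# row directly below it.
SAMPLE = """
<table>
  <tr><td>Type</td></tr>
  <tr><td><div>Journal article</div></td></tr>
  <tr><td>Publisher Location</td></tr>
  <tr><td><div>New York</div></td></tr>
</table>
"""


def label_xpath(title: str) -> str:
    """Build the label-relative XPath: one contains() test per word of the title.

    Assumes the words contain no single quotes, which would break the quoting.
    """
    words = title.split()
    if not words:
        raise ValueError("title must contain at least one word")
    conditions = " and ".join(f"contains(text(),'{w}')" for w in words)
    return f"//*[{conditions}]/../following-sibling::tr[1]//td[1]/div[1]"


def find_value(root: ET.Element, title: str) -> Optional[str]:
    """Follow the same steps as label_xpath(title) by hand with ElementTree."""
    words = title.split()
    # ElementTree has no parent pointers, so build a child -> parent map for '..'.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():  # //*  (every element, including the root)
        # el.text (text before the first child) stands in for XPath's text().
        text = el.text or ""
        if not all(w in text for w in words):
            continue
        row = parent_of.get(el)  # /..
        container = parent_of.get(row) if row is not None else None
        if container is None:
            continue
        siblings = list(container)
        after = siblings[siblings.index(row) + 1:]
        # /following-sibling::tr[1]  (the first <tr> after the label's row)
        next_tr = next((s for s in after if s.tag == "tr"), None)
        if next_tr is None:
            continue
        for node in next_tr.iter():  # //  (the row itself and its descendants)
            td = node.find("td")  # td[1]: first <td> child
            div = td.find("div") if td is not None else None  # div[1]
            if div is not None:
                return div.text
    return None


root = ET.fromstring(SAMPLE)
print(label_xpath("Publisher Location"))
print(find_value(root, "Type"))                # Journal article
print(find_value(root, "Publisher Location"))  # New York
```

    Note that contains() matches substrings, so a one-word title such as 'Type' would also match a longer label like 'Document Type' if that label appears earlier on the page; adding more words from the title makes the match stricter.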
