I'm wondering why the local run returns 64 records while the cloud run returns only 48 (see picture).


1 comment

  • Fergus

    For future reference, we suggest not selecting "Disable image loading" or "Block ads" in the settings, since either option might cause the webpage to load improperly in the cloud:

    I also noticed that for the data field "product URL", a regular expression tool was used to get the clean URL. That gets the job done, of course, but if we inspect the source code of the page, we can see that the target URL is actually located here:

    So we can use the XPath "//h2/a" to reach the target element first, then extract the value of "title" directly, as the image below shows:
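
    Outside Octoparse, the same idea (select the element with "//h2/a", then read an attribute instead of cleaning text with a regular expression) can be sketched in Python. This is only an illustration: the markup in SAMPLE is hypothetical, since the real page source is not shown in this thread, and extract_attribute is a helper name made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the product listing;
# the real page's HTML is not reproduced in this thread.
SAMPLE = """
<div>
  <h2><a href="/product/123" title="Example product">Example product</a></h2>
  <h2><a href="/product/456" title="Another product">Another product</a></h2>
</div>
""".strip()


def extract_attribute(markup, attribute):
    """Return the given attribute of every <a> that is a direct child of an <h2>."""
    root = ET.fromstring(markup)
    # ElementTree supports a subset of XPath; ".//h2/a" is its
    # equivalent of the "//h2/a" expression used in the task.
    return [link.get(attribute) for link in root.findall(".//h2/a")]


titles = extract_attribute(SAMPLE, "title")
links = extract_attribute(SAMPLE, "href")
print(titles)
print(links)
```

    Reading an attribute from the matched element avoids a post-processing step, which is why it tends to be less fragile than a regular expression applied to extracted text.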


    As for the cloud extraction returning fewer results, it might be because this webpage takes a while to fully load its content, so I added a "wait time" to the workflow to avoid missing data.
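
    To see why a wait can matter, here is a minimal Python sketch of a page whose item count grows while it loads. It is not how Octoparse implements "wait time"; make_fake_loader, wait_for_stable_count, and the counts used are all hypothetical and exist only for this illustration.

```python
import time


def make_fake_loader(counts):
    """Simulate a page that exposes more items on each successive poll."""
    remaining = list(counts)

    def count_items():
        # Hand out the counts in order, then keep repeating the last one.
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]

    return count_items


def wait_for_stable_count(count_items, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Poll until two consecutive polls agree or the timeout elapses."""
    deadline = time.monotonic() + timeout
    previous = count_items()
    while time.monotonic() < deadline:
        sleep(interval)
        current = count_items()
        if current == previous:
            return current
        previous = current
    return previous


# Extracting immediately sees only the first batch of items.
immediate = make_fake_loader([20, 40, 60])()

# Waiting until the count stops changing sees the full set.
# The no-op sleep keeps this demo instant; real use would keep time.sleep.
settled = wait_for_stable_count(make_fake_loader([20, 40, 60, 60]), sleep=lambda _: None)
print(immediate, settled)
```

    A fixed wait time in the workflow plays the same role in a simpler way: it gives the page time to finish loading before extraction starts.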

    Here are the related tutorials that you might find helpful:
    Set up wait time
    What is XPath and how to use it in Octoparse
