Lazy-loaded images on Gumtree
I am trying to extract the listing information from Gumtree.co.za. All the data is extracted fine except for the image URLs. You can look at https://www.gumtree.co.za/s-automotive-vehicles/free-state/v1c5l3100236p1?q=hilux to see how the site works.
As far as I can tell, they use lazy loading for the images, so you first have to scroll down past an image before it is loaded.
I have set up the page load to scroll down a few screens (scroll down = 30, interval = 0.5, scroll way = scroll down for one screen) before the scraping starts. This at least loads the images into the browser, but the data is still not correct. Only the first image is found, and that same image path is then "scraped" for every record after the first listing.
Is there a better way to extract the image paths, or do I need to configure something else to make this work?
Official comment
Hi raybez,
Thank you for reaching out.
I just checked the website, and the image URLs should all be scrapable with Octoparse. Setting up the scroll-down is a smart move. If the images still need some time to load fully after that, you can add a "wait time" in the workflow. As you mentioned, the task keeps returning only the URL of the first image. That is probably because "use loop" is not ticked or because the XPath for the data field is incorrect. To check the issue further, could you please submit a ticket through https://helpcenter.octoparse.com/hc/en-us/requests/new with the task attached, so we can help you revise it?
Please follow the instructions in this tutorial to export the task: How to export a task?
Cheers,
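
For readers who want to see the "same first image on every record" symptom outside Octoparse, here is a minimal sketch in Python. It is only an illustration: the markup, the `listing` class, the `data-src` attribute and the `img.example` URLs are all assumptions made up for the example and are not taken from the actual Gumtree page. It shows why an image XPath evaluated from the page root returns the first image every time, while one evaluated relative to the current loop item returns each listing's own image.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup. The real page structure will differ.
PAGE = """
<div id="results">
  <div class="listing">
    <span>Hilux 2.4 GD-6</span>
    <img src="placeholder.gif" data-src="https://img.example/hilux-1.jpg"/>
  </div>
  <div class="listing">
    <span>Hilux 2.8 Raider</span>
    <img src="placeholder.gif" data-src="https://img.example/hilux-2.jpg"/>
  </div>
  <div class="listing">
    <span>Hilux 3.0 D-4D</span>
    <img src="https://img.example/hilux-3.jpg"/>
  </div>
</div>
"""


def image_url(img):
    """Prefer the (assumed) lazy-load attribute, fall back to plain src."""
    if img is None:
        return None
    return img.get("data-src") or img.get("src")


def scrape_from_root(root):
    """Buggy variant: the image path is evaluated from the page root,
    so every record receives the first image on the page."""
    rows = []
    for listing in root.findall(".//div[@class='listing']"):
        img = root.find(".//img")  # ignores the current listing
        rows.append((listing.findtext("span"), image_url(img)))
    return rows


def scrape_per_listing(root):
    """Fixed variant: the image path is evaluated relative to the
    current loop item, so each record gets its own image."""
    rows = []
    for listing in root.findall(".//div[@class='listing']"):
        img = listing.find(".//img")  # relative to this listing
        rows.append((listing.findtext("span"), image_url(img)))
    return rows


root = ET.fromstring(PAGE)

for title, url in scrape_from_root(root):
    print("from root:  ", title, url)

for title, url in scrape_per_listing(root):
    print("per listing:", title, url)
```

In Octoparse terms, the second variant corresponds roughly to extracting the image field inside the loop item with a relative XPath, which is what the "use loop" suggestion above is pointing at.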