Extract hotel/restaurant website URL from Tripadvisor
Hi!
I followed your tutorial for Tripadvisor. I want to extract the hotel/restaurant website URL. The tutorial extracts a lot of data, but not the URL of the relevant hotel/restaurant, and I would really like to extract that too. Can you tell me how to do it? Is there a workaround? Otherwise I have all the address data except for the URL.
Can you help me with this?
Kr,
Freddy
Tutorial:
https://helpcenter.octoparse.com/hc/en-us/articles/360018841951-Scrape-hotel-data-from-Tripadvisor
Official comment
Hi Freddy,
I'm not sure which hotel/restaurant URL you need to scrape.
You can check how to scrape a URL here: Select and extract data/URL/image/HTML
If that doesn't help, could you please send us an example URL and show us which URL you need to scrape in a ticket: https://helpcenter.octoparse.com/hc/en-us/requests/new
This is because Tripadvisor uses encoded URLs (note the data-encoded-url attribute on the outer span):
"""
<span class="_2wKz--mA public-business-listing-ContactInfo__webLink--fVOkq public-business-listing-ContactInfo__ui_link_container--37q8W public-business-listing-ContactInfo__level_1--1s823" data-encoded-url="Uk1qXy9Db21tZXJjZT9wPVRBQkFJbmRlcGVuZGVudEhvdGVscyZzcmM9MCZnZW89NjI4MjA1JmZyb209SG90ZWxfUmV2aWV3JmFyZWE9JnNsb3Q9MiZtYXRjaElEPTEmb29zPTAmY250PTEmc2lsbz0zMDM1NiZidWNrZXQ9ODc1MTcxJnVidWNrZXQ9ODc1MTcxJm5yYW5rPTEmY3Jhbms9MSZjbHQ9Q0xEJnRtPTE1MDM4NTI1MSZtYW5hZ2VkPWZhbHNlJmNhcHBlZD1mYWxzZSZnb3NveD01XzNZNlVZd3dHamd3X2M3aEV3SWtaWjIzVWR6RWtBM0ZlZHA3NnQtS0tLc0R3WFJHdFVEU05qUEFDMV9PbFdFOGtMbmtFZU5oWEVjTGsxRU5YdXEyUSZiYXBpZD0yJmNzPTFjNDEzZDczYTA5ZjA1OWUwZDM3Yzc0YmNmNWFlZmE1Nl9DNm0=">
<span class="ui_icon internet public-business-listing-ContactInfo__offerIcon--1HFZO"></span>
<span class="public-business-listing-ContactInfo__webLinkText--2Vjgm public-business-listing-ContactInfo__ui_link--1_7Zp public-business-listing-ContactInfo__level_1--1s823">Visit hotel website</span>
<span class="ui_icon external-link-no-box public-business-listing-OutboundIcon__outboundIcon--18FnV"></span>
</span>
"""
The only thing I can think of offhand is to click each link, extract the URL from the address bar, and then return to your loop.
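If post-processing outside the scraping loop is an option, another route might be to extract the data-encoded-url attribute as text and decode it afterwards. The value in the snippet above looks like plain Base64: decoded, it starts with "RMj_/Commerce?p=TABAIndependentHotels" and ends with "_C6m", so the real path appears to be wrapped in a short token and an underscore on each side. Below is a minimal Python sketch under that assumption. The function name decode_encoded_url and the base address are hypothetical, the wrapper layout is inferred from this single sample, and the result is a Tripadvisor redirect path (which presumably forwards to the hotel's own site when opened), not the final website address.

```python
import base64
from urllib.parse import urljoin

# Assumed site root; use the Tripadvisor domain you are actually scraping.
BASE_URL = "https://www.tripadvisor.com"


def decode_encoded_url(encoded: str) -> str:
    """Decode a data-encoded-url value into an absolute redirect URL.

    Assumption (inferred from one sample): the decoded text looks like
    "<token>_<path>_<token>", so everything up to the first underscore
    and everything from the last underscore onward is discarded.
    """
    raw = base64.b64decode(encoded).decode("utf-8")
    # Drop the leading token and its underscore.
    _, _, rest = raw.partition("_")
    # Drop the trailing underscore and token.
    path, _, _ = rest.rpartition("_")
    return urljoin(BASE_URL, path)


if __name__ == "__main__":
    # Synthetic value built the same way as the sample, for illustration only.
    demo = base64.b64encode(b"RMj_/Commerce?p=Demo_Hotel&src=0_C6m").decode("ascii")
    print(decode_encoded_url(demo))
```

Underscores inside the path itself (for example in "Hotel_Review") are left untouched, because only the first and last underscore are treated as separators. If Tripadvisor changes the wrapper format, the stripping step would need to be adjusted.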