This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust.
Expedia is a popular American online travel booking website. This tutorial shows you how to scrape basic hotel information from Expedia, such as the hotel name, location, price, and amenities, with Octoparse.
To follow along, use this example URL:
The main steps are listed below. A sample task file is also available for download.
1. Create a Go to Web Page step - to open the target website
Enter the target URL on the Octoparse homepage and click Start
2. Auto-detect the webpage - to create a workflow
Click Auto-detect webpage data and wait for it to complete
Uncheck Add a page scroll
Click Create workflow
Check Data Preview to make sure the current data output is what you need
3. Set up a Page Scroll - to load more of the data on the webpage
Click the Go to Web Page step and open its Options panel
Tick Scroll down the page after it is loaded
Set Scroll Mode to scroll for one screen
Set Wait to "2s" before the next scroll
Set Scroll times to "100"
Tick Stop scrolling when there's no more content to load
Click Apply to save the settings
Repeat the steps above for the Click on a "Load More" button step
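The scroll settings above describe a bounded loop: scroll one screen, wait, and stop either after the maximum number of scrolls or once no new content appears. The Python sketch below models that behavior to make the interplay of the settings concrete. It is an illustration, not Octoparse's actual implementation: the `Page` interface, the `FakePage` class, and the "at the bottom and the page did not grow" stop rule are all assumptions made for this example.

```python
import time
from typing import Callable, Protocol


class Page(Protocol):
    """Hypothetical browser-page interface used only for this sketch."""
    def height(self) -> int: ...
    def at_bottom(self) -> bool: ...
    def scroll_one_screen(self) -> None: ...


def scroll_until_loaded(
    page: Page,
    max_scrolls: int = 100,     # "Scroll times"
    wait_seconds: float = 2.0,  # "Wait" before the next scroll
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll one screen at a time; stop early once the bottom is reached and
    waiting did not make the page any taller. Returns the number of scrolls."""
    scrolls = 0
    while scrolls < max_scrolls:
        height_before = page.height()
        page.scroll_one_screen()
        scrolls += 1
        sleep(wait_seconds)  # give lazily loaded results time to render
        if page.at_bottom() and page.height() == height_before:
            break  # assumed meaning of "no more content to load"
    return scrolls


class FakePage:
    """Hypothetical page: starts 3 screens tall, and one more screen of
    results is appended each time the bottom is reached, `batches` times."""

    def __init__(self, batches: int) -> None:
        self.position = 0  # index of the screen currently in view
        self.screens = 3   # page height, measured in screens
        self.batches_left = batches

    def height(self) -> int:
        return self.screens

    def at_bottom(self) -> bool:
        return self.position + 1 >= self.screens

    def scroll_one_screen(self) -> None:
        self.position = min(self.position + 1, self.screens - 1)
        if self.at_bottom() and self.batches_left > 0:
            self.screens += 1  # simulated lazy loading appends more results
            self.batches_left -= 1


# Pass a no-op sleep so the demo does not actually wait.
print(scroll_until_loaded(FakePage(batches=2), sleep=lambda s: None))
```

With the tutorial's settings, the worst case is 100 scrolls × 2 s = 200 s of waiting on a page that keeps loading results; the early-stop option is what keeps shorter result lists from paying that full cost.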
4. Modify XPath for the loop and data fields - to make sure Octoparse scrapes the correct data
Click Loop Item 1
Make sure Loop Mode is Variable List
Enter the XPath //div[@data-stid="lodging-card-responsive"]
Click Apply
Click the More button on the image data field
Choose Customize XPath
Enter the XPath //button[@class="uitk-image-link"]/../div/img for the image field
Click Apply
Enter the XPath for the final price the same way: //div[@data-stid="lodging-card-responsive"]/descendant-or-self::DIV[contains(@class,"uitk-spacing uitk-spacing-padding-block-half")]/span
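To see what these XPaths select, here is a rough Python sketch using the standard library's `xml.etree.ElementTree`. With a Variable List loop, a single XPath matches every hotel card, and each match becomes one loop item (one output row); the field XPaths are then evaluated inside each card. The markup is hypothetical and heavily simplified: only the `data-stid` value and the class names come from the XPaths above, and the `property-listing-ad` block is invented to show a non-matching element. ElementTree supports only a subset of XPath, so `.//` stands in for `//`, element names are lowercase because XML parsing is case-sensitive, and `contains()` is emulated by hand. The sketch also assumes the price lookup is scoped to each card, whereas the tutorial's price XPath is written from the page root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real Expedia HTML differs and changes over time.
PAGE = """<html><body>
  <div data-stid="lodging-card-responsive">
    <div><button class="uitk-image-link"/><div><img src="https://example.com/a.jpg"/></div></div>
    <div class="uitk-spacing uitk-spacing-padding-block-half"><span>$120</span></div>
  </div>
  <div data-stid="property-listing-ad"><span>Sponsored</span></div>
  <div data-stid="lodging-card-responsive">
    <div><button class="uitk-image-link"/><div><img src="https://example.com/b.jpg"/></div></div>
    <div class="uitk-spacing uitk-spacing-padding-block-half"><span>$95</span></div>
  </div>
</body></html>"""

# Loop item XPath: every hotel card, in document order.
LOOP_XPATH = ".//div[@data-stid='lodging-card-responsive']"
# ElementTree form of //button[@class="uitk-image-link"]/../div/img
IMAGE_XPATH = ".//button[@class='uitk-image-link']/../div/img"
PRICE_CLASS = "uitk-spacing uitk-spacing-padding-block-half"


def extract_rows(html: str) -> list[dict]:
    rows = []
    for card in ET.fromstring(html).findall(LOOP_XPATH):
        img = card.find(IMAGE_XPATH)
        # Emulates descendant-or-self::div[contains(@class, PRICE_CLASS)]/span:
        # iter("div") visits the card itself plus all of its div descendants,
        # and "in" performs the same substring test as contains().
        prices = [
            span.text
            for div in card.iter("div")
            if PRICE_CLASS in div.get("class", "")
            for span in div.findall("span")
        ]
        rows.append({
            "image": img.get("src") if img is not None else None,
            "price": prices[0] if prices else None,
        })
    return rows


for row in extract_rows(PAGE):
    print(row)
```

The ad block is skipped because its `data-stid` does not match the loop XPath, which is the reason for narrowing Loop Item 1 to the lodging cards in the first place.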
5. Run the task - to get your target data
Click Run to run your task either on your device or in the cloud
Select Standard Mode under the Run on your device section to run the task on your local device
Wait for the task to complete
Here's the sample data output for your reference: