Scrape hotel information from Trip.com
In this tutorial, we will show you how to collect hotel information on Trip.com with Octoparse.
We will scrape the hotel name, location, distance from the metro, price, and rating from the hotel list page.
To follow along, you might want to use this URL in the tutorial:
The website uses infinite scroll combined with a "Show More" button to load more hotels. After we scroll to the bottom of the page about twice, a "Show More" button appears, and to keep loading hotels we have to keep clicking it.
Here are the main steps in this tutorial: [Download demo task file here]
- "Go To Web Page" - open the target web page
- Auto-detect web page data - create a basic task workflow
- "Load More" - click the "show more" button to load more hotels
- Run the task to get data you need
1. "Go To Web Page" - open the target web page
- Enter the URL on the home page and click Start
2. Auto-detect web page data - create a basic task workflow
You can continue with the feature "Auto-detect web page data" on the Tips panel.
- Click "Auto-detect web page data"
- Wait until auto-detection completes (it may take a bit longer because this page uses infinite scroll to load content)
- Click "Edit" under "Add a page scroll" and set the wait time to 5-7s
- Go to "Data preview" to see if you're okay with the current data output
- You can delete unnecessary data fields directly by clicking the icon
- You can also modify the data field names here directly by clicking the icon
- If you're okay with the current data preview, click "Create workflow"
Tips! Page scrolling is widely used on websites. To handle this type of website, you can either use the "Auto-detect" feature or set up a page scroll yourself by double-clicking the "Go to Web Page" step in the workflow. Check details in the following tutorials:
3. "Load More" - click the "show more" button to load more hotels
- Select "Click on a 'Load More' button" on the Tips panel
- Choose the "Search More Hotels" button on the web page
- Set the "Number of clicks" according to your needs. Here we set it to 5.
- Extend the AJAX timeout to 7s
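For readers who like to see the logic spelled out, the settings in this step amount to a bounded loop: scroll a couple of times, then click the button up to a fixed number of times, waiting after each click. The sketch below is only an illustration in Python, not how Octoparse works internally. `FakeHotelList`, its batch sizes, and `load_hotels` are all hypothetical stand-ins for the real page, so the example runs without a browser.

```python
from dataclasses import dataclass


@dataclass
class FakeHotelList:
    """Stand-in for the hotel list page; not a real browser or Octoparse API."""
    total: int = 100   # hotels available on the (hypothetical) site
    batch: int = 10    # hotels revealed per scroll or click
    loaded: int = 10   # hotels visible when the page first opens
    scrolls: int = 0   # how many times we have scrolled so far

    def _load_batch(self) -> None:
        self.loaded = min(self.total, self.loaded + self.batch)

    def scroll_to_bottom(self) -> None:
        self.scrolls += 1
        self._load_batch()

    def button_visible(self) -> bool:
        # Assumed behavior: the button shows after two scrolls, while hotels remain.
        return self.scrolls >= 2 and self.loaded < self.total

    def click_show_more(self) -> None:
        if self.button_visible():
            self._load_batch()


def load_hotels(page: FakeHotelList, scrolls: int = 2, max_clicks: int = 5) -> int:
    """Scroll, then click the button up to max_clicks times; return hotels loaded."""
    for _ in range(scrolls):
        page.scroll_to_bottom()
    for _ in range(max_clicks):
        if not page.button_visible():
            break  # nothing left to load, so stop early
        page.click_show_more()
        # A real run would pause here for the AJAX timeout (7s in this tutorial).
    return page.loaded


print(load_hotels(FakeHotelList()))
```

The cap on clicks is what keeps the task from running indefinitely on a long list; raising "Number of clicks" trades run time for more rows.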
4. Run the task to get data you need
- Click the "Save" button
- Click the "Run" button, and then choose "Run task on your device" or "Run task in the cloud"
Here is some sample data for your reference.
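Scraped fields such as price and rating may come out as text rather than numbers. If you export the results to CSV, a short script can tidy them up. The following is a minimal sketch: the column names (`hotel_name`, `price`, `rating`) and the rows in `SAMPLE_EXPORT` are made up for illustration, and your real columns depend on the field names you chose in "Data preview".

```python
import csv
import io
import re

# Made-up rows standing in for an exported file; real column names
# depend on the field names chosen in "Data preview".
SAMPLE_EXPORT = """hotel_name,price,rating
Example Hotel A,"US$1,204",4.5/5
Example Hotel B,US$87,
"""


def first_number(text: str):
    """Return the first number in text as a float, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def clean_rows(csv_text: str) -> list:
    """Parse CSV text and convert the price and rating columns to numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "hotel_name": row["hotel_name"].strip(),
            "price": first_number(row["price"]),
            "rating": first_number(row["rating"]),
        })
    return rows


for hotel in clean_rows(SAMPLE_EXPORT):
    print(hotel)
```

To use it on a real export, read the file's text and pass it to `clean_rows`, adjusting the column names to match your task.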
Author: Lesley