Airbnb is a popular website for finding vacation accommodation. In this tutorial, you will learn how to use Octoparse to scrape hotel and room information from Airbnb.
The easiest way is to use the pre-built Airbnb task templates. You don't need to configure a scraping task; just enter keywords or URLs and wait for the data. For further details, see Task Templates.
If you want to build the task from scratch, continue with this tutorial. We will use an Airbnb room listing link as the example.
Here are the main steps in this tutorial [Download task file here]
- "Go To Web Page" - open the target website
- Auto-detect the web page - create the workflow
- Modify the settings of the "Pagination"
- Click into each detail page to get more info
- Extract data from the detail page
- Modify the XPath of "Click URLs in the list"
- Run your task - get data you want
1) "Go To Web Page" - open the target website
- Enter the URL on the home page and click "Start"
2) Auto-detect web page - create the workflow
- Click "Auto-detect web page data" and wait for the detection to complete
- Rename or delete fields in the Data Preview
- Click "Create workflow"
Octoparse will then generate a workflow automatically.
3) Modify the settings of the "Pagination"
The auto-generated XPath does not always locate the next-page button reliably. In this case, we need to replace the XPath of the Pagination.
- Click to open the settings of "Pagination"
- Enter the XPath: //a[@aria-label="Next"]
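To see what this expression selects, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The HTML fragment and its URLs are made up for illustration and are not Airbnb's real markup. ElementTree only accepts relative paths, so the expression is written with a leading `.`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination bar (not Airbnb's actual markup).
SAMPLE = """
<nav>
  <a aria-label="Previous" href="/s/homes?page=1">Prev</a>
  <a aria-label="Page 3" href="/s/homes?page=3">3</a>
  <a aria-label="Next" href="/s/homes?page=3">Next</a>
</nav>
"""


def find_next_links(markup):
    """Return the href of every <a> whose aria-label is exactly "Next"."""
    root = ET.fromstring(markup)
    # Same predicate as //a[@aria-label="Next"], made relative to the root.
    matches = root.findall('.//a[@aria-label="Next"]')
    return [a.get("href") for a in matches]


next_links = find_next_links(SAMPLE)
print(next_links)
```

Because the predicate compares the attribute value exactly, the "Previous" and numbered links are ignored. An empty result would mean there is no "Next" link left to click.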
XPath plays an important role in locating the correct element in Octoparse. To learn more about it, please refer to the Octoparse XPath tutorial.
The next page is loaded with AJAX, so we need to add an AJAX timeout to the "Click to Paginate" action.
- Click to open the settings of "Click to Paginate"
- Tick "Load with AJAX"
- Set the AJAX timeout to 7-10 s
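As a rough mental model, the pagination loop with a timeout behaves something like the sketch below. This is not Octoparse's internal code: `FakeBrowser`, `wait_for_listings`, and `scrape_all_pages` are hypothetical names that simulate a page whose listings appear a moment after the click, and the sketch assumes the timeout acts as an upper bound on how long to wait for new content.

```python
import time


class FakeBrowser:
    """Hypothetical stand-in for a browser showing paginated listings."""

    def __init__(self, pages, load_delay=0.05):
        self.pages = pages
        self.load_delay = load_delay  # simulated AJAX load time, in seconds
        self.index = 0
        self._ready_at = 0.0

    def has_next(self):
        return self.index < len(self.pages) - 1

    def click_next(self):
        # The click returns immediately; the content arrives later (AJAX).
        self.index += 1
        self._ready_at = time.monotonic() + self.load_delay

    def listings(self):
        # Until the simulated request finishes, nothing new is visible.
        if time.monotonic() < self._ready_at:
            return []
        return self.pages[self.index]


def wait_for_listings(browser, timeout):
    """Poll until listings appear or the timeout (in seconds) expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rows = browser.listings()
        if rows:
            return rows
        time.sleep(0.01)
    return []


def scrape_all_pages(browser, timeout):
    """Collect the first page, then click "Next" and wait on each later page."""
    rows = list(browser.listings())
    while browser.has_next():
        browser.click_next()
        rows.extend(wait_for_listings(browser, timeout))
    return rows


pages = [["room A", "room B"], ["room C"], ["room D", "room E"]]
all_rows = scrape_all_pages(FakeBrowser(pages), timeout=1.0)
print(all_rows)
```

In this simulation, a timeout shorter than the load time leaves the later pages empty, which is the kind of data loss a generous setting such as 7-10 s is meant to avoid.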
If all the data you need can be scraped from the listing page, you can stop here and jump to "Run your task - get data you want". If you want to open each detail page to get more info, follow the steps below.
4) Click into each detail page to get more info
- Choose “Click on link(s) to scrape the linked page(s)” on the Tips panel
- Select "Click on an extracted data field" and choose the field you want to click on from the drop-down menu (you can confirm that it is the correct link in the Data Preview)
- Click "Confirm"
Octoparse will open the first detail page automatically.
5) Extract data from the detail page
- Select information on the web page
- Choose "Extract text of the selected element"
- Repeat the above steps to extract all the data you need
- Rename the fields if needed
- Click to open the settings of "Extract Data1"
- Tick "Wait before action"
- Set the wait time to 7-10 s
6) Modify the XPath of "Click URLs in the list"
In this case, the auto-generated XPath of "Click URLs in the list" does not locate the listing links reliably, so we need to replace it.
- Click to open the settings of "Click URLs in the list"
- Enter the XPath: /descendant-or-self::A[contains(@class,"_gjfol0")]
- Click "OK" to confirm
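The `contains(@class,"_gjfol0")` predicate is a substring test on the `class` attribute. Python's standard-library ElementTree does not implement `contains()`, so the minimal sketch below expresses the same test in plain Python. The markup, URLs, and extra class names are made up for illustration, and the tag is written in lowercase because XML parsing is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing cards (not Airbnb's actual markup).
SAMPLE = """
<div>
  <a class="_gjfol0" href="/rooms/1">Room 1</a>
  <a class="card _gjfol0" href="/rooms/2">Room 2</a>
  <a class="footer-link" href="/help">Help</a>
  <span class="_gjfol0">not a link</span>
</div>
"""


def links_with_class_fragment(markup, fragment):
    """Return hrefs of <a> elements whose class attribute contains fragment."""
    root = ET.fromstring(markup)
    return [
        a.get("href")
        # iter("a") visits the root and all descendants in document order,
        # which mirrors the descendant-or-self axis.
        for a in root.iter("a")
        # Substring check, like contains(@class, fragment) in XPath.
        if fragment in a.get("class", "")
    ]


detail_links = links_with_class_fragment(SAMPLE, "_gjfol0")
print(detail_links)
```

Class names such as `_gjfol0` look auto-generated and may change when the site is updated. If the loop stops matching any links, it is worth inspecting the page again and updating the XPath.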
7) Run your task - get data you want
- Click "Run" on the upper left side
- Select "Run on your device" to run the task on your computer, or select "Run task in the Cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.