You are browsing a tutorial guide for Octoparse's latest version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Airbnb is a good place to find a vacation stay. In this tutorial, you will learn how to use Octoparse to scrape hotel information from Airbnb.

The easiest way is to use Octoparse's pre-built Airbnb task templates. You don't need to configure a scraping task; just enter keywords or URLs and wait for the data. For further details, see: Task Templates


If you want to build the task from scratch, read on. Here is the Airbnb search results URL that we will use as an example:
https://www.airbnb.com/s/New-York--NY--United-States/homes?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&children=1&place_id=ChIJOwg_06VPwokRYv534QaPC8g&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068
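If you later want to adapt the task to a different search, it may help to see which parts of this URL carry the search criteria. The minimal Python sketch below (standard library only) splits the example URL into its path and query parameters. The meanings suggested by the parameter names, such as `adults` and `children` setting the guest counts, are assumptions based on the names alone; this tutorial does not document them.

```python
from urllib.parse import urlsplit, parse_qs

# The example search URL from this tutorial, split across lines for readability.
URL = ("https://www.airbnb.com/s/New-York--NY--United-States/homes?adults=2"
       "&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab"
       "&refinement_paths%5B%5D=%2Fhomes&children=1"
       "&place_id=ChIJOwg_06VPwokRYv534QaPC8g"
       "&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068")

parts = urlsplit(URL)
# parse_qs decodes percent-escapes (%5B%5D -> [], %2F -> /) and maps each key to a list.
params = parse_qs(parts.query)

print(parts.path)  # the location segment of the search
for key, values in params.items():
    print(f"{key} = {values[0]}")
```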

Here are the main steps in this tutorial: [Download task file here]

  1. Go to Web Page - open the target website

  2. Build a Loop Item - click each hotel link

  3. Extract data - scrape information from the detail page

  4. Modify the XPath of data fields

  5. Create pagination - scrape data from multiple pages

  6. Modify the XPath of Pagination and Loop Item

  7. Run your task - get the data you want


1) Go to Web Page - open the target website

  • Enter the URL on the home page and click Start


2) Build a Loop Item - click each hotel link

  • Select the first two listing blocks so that Octoparse detects all similar blocks on the page

  • Click on "Loop click each URL" to enter the detail page

A Loop Item will be created, and Octoparse will automatically open the first hotel's detail page.


3) Extract data from the detail page

  • Select the information you want and click Extract the text of the element

  • Select Add custom field -> Page-level data -> Page URL if you would like to extract the URL of the current page

  • Double-click a data field to rename it


4) Modify the XPath of data fields

The Airbnb page layout is tricky, and auto-generated XPaths usually do not work for all pages. No worries: we have prepared the XPaths you need, so you can simply use the ones provided below.

  • Switch to Vertical View, which makes it easier to modify multiple data fields

  • Double-click the XPath of a field to edit it

  • Enter the new XPath


Here are the XPaths for each field on an Airbnb listing page:

  • Hotel Title: //h1

  • Number of reviews: //button[contains(@aria-label,'Rate')]

  • Review rating: //button[contains(@aria-label,'Rate')]/../preceding-sibling::span[1]

  • Number of guests: //span[contains(text(),'guest')]

  • Number of bedrooms: //span[contains(text(),'bedroom')]

  • Number of bathrooms: //span[contains(text(),'bathroom')]

  • Number of beds: //span[contains(text(),'bed')][not(contains(text(),'room'))]

  • Price: //div[contains(@style,'pricing')]/div[1]//span
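To see what these expressions select, here is a minimal Python sketch that applies the same matching rules to a small, hypothetical piece of listing markup. The `PAGE` snippet, its values, and the helper names `spans_containing` and `review_rating` are invented for illustration; real Airbnb HTML is far larger and changes often. Python's built-in `xml.etree.ElementTree` supports only a subset of XPath (no `contains()` and no sibling axes), so the predicates are reproduced with plain string checks rather than evaluated as XPath. The last two print lines show why the beds XPath adds `[not(contains(text(),'room'))]`: a bare `contains(text(),'bed')` also matches "1 bedroom".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup; only the structure the XPaths rely on is kept.
PAGE = """
<div>
  <h1>Sunny loft near Central Park</h1>
  <div>
    <span>4.92</span>
    <span><button aria-label="Rated 4.92 out of 5 from 118 reviews">118 reviews</button></span>
  </div>
  <ol>
    <li><span>3 guests</span></li>
    <li><span>1 bedroom</span></li>
    <li><span>2 beds</span></li>
    <li><span>1 bathroom</span></li>
  </ol>
</div>
"""

root = ET.fromstring(PAGE.strip())
# Child -> parent map, because ElementTree elements have no parent or sibling axes.
parent = {child: p for p in root.iter() for child in p}


def spans_containing(word, exclude=None):
    """Hypothetical helper mimicking //span[contains(text(),word)][not(contains(text(),exclude))]."""
    hits = []
    for span in root.iter("span"):
        text = span.text or ""  # first text node, as text() yields inside contains()
        if word in text and not (exclude and exclude in text):
            hits.append(text)
    return hits


def review_rating():
    """Hypothetical helper mimicking //button[contains(@aria-label,'Rate')]/../preceding-sibling::span[1]."""
    for button in root.iter("button"):
        if "Rate" in button.get("aria-label", ""):
            holder = parent[button]  # the '..' step
            siblings = list(parent[holder])
            before = siblings[: siblings.index(holder)]
            spans = [el for el in before if el.tag == "span"]
            # preceding-sibling::span[1] is the nearest span before the holder
            return spans[-1].text if spans else None
    return None


print(root.find(".//h1").text)                   # Hotel Title: //h1
print(review_rating())                           # Review rating
print(spans_containing("guest"))                 # Number of guests
print(spans_containing("bathroom"))              # Number of bathrooms
print(spans_containing("bed"))                   # also matches "1 bedroom"
print(spans_containing("bed", exclude="room"))   # Number of beds only
```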


5) Create pagination

  • Click on Go to Web Page to open the listing page again

  • Select the next page button (">") at the bottom of the listing page

  • Choose Loop click single element from the Tips

A Pagination will be created in the workflow.

  • Drag the workflow to the right position


6) Modify the XPath of Pagination and Loop Item

The auto-generated XPath does not always work well. In this case, we need to modify the XPaths of both the Pagination and the Loop Item.

  • Click on Pagination

  • Enter the XPath: //*[@aria-label='Next']

  • Click on Loop Item

  • Change Loop Mode to Variable list

  • Enter XPath: //a[contains(@aria-labelledby,'title')]

  • Click Apply to save
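The Pagination and Loop Item XPaths can be checked the same way. In this minimal Python sketch, the `RESULTS` markup is hypothetical and only imitates the attributes the two XPaths rely on. As we understand Variable list mode, the Loop Item XPath is a single expression that should match every listing link at once; the Pagination XPath matches the button by its `aria-label`, which presumably stays the same on every results page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified search-results markup.
RESULTS = """
<main>
  <div><a aria-labelledby="title_101" href="/rooms/101">Listing 101</a></div>
  <div><a aria-labelledby="title_102" href="/rooms/102">Listing 102</a></div>
  <div><a href="/help">Help</a></div>
  <nav>
    <a aria-label="Previous" href="?page=1">&lt;</a>
    <a aria-label="Next" href="?page=3">&gt;</a>
  </nav>
</main>
"""

root = ET.fromstring(RESULTS.strip())

# Loop Item: //a[contains(@aria-labelledby,'title')]
# ElementTree cannot evaluate contains(), so the attribute is filtered in Python.
listing_links = [a.get("href") for a in root.iter("a")
                 if "title" in a.get("aria-labelledby", "")]

# Pagination: //*[@aria-label='Next'] is an exact attribute match,
# which ElementTree's limited XPath subset does support.
next_button = root.find(".//*[@aria-label='Next']")

print(listing_links)             # one entry per listing, the "Help" link is skipped
print(next_button.get("href"))
```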


TIP: XPath plays an important role in locating the correct element in Octoparse. To learn more about it, please refer to the following tutorial: What is XPath and how to use it in Octoparse

The next page is loaded with AJAX, so we need to set an AJAX timeout on the "Click to Paginate" action.

  • Click on Click to Paginate

  • Go to Options

  • Tick Load with AJAX

  • Set the AJAX timeout to 5-10 seconds
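Conceptually, an AJAX timeout bounds how long the scraper waits for new content after a click that does not trigger a full page load. The Python sketch below is a generic illustration of such a bounded wait, not Octoparse's implementation: `wait_for` and the simulated `next_page_loaded` condition are hypothetical, and Octoparse may simply wait out the full timeout rather than polling.

```python
import time


def wait_for(condition, timeout_s, poll_s=0.1):
    """Hypothetical helper: poll condition() until it is True or timeout_s seconds pass.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Simulated page: the new listings "arrive" 0.3 s after the click.
clicked_at = time.monotonic()


def next_page_loaded():
    return time.monotonic() - clicked_at > 0.3


print(wait_for(next_page_loaded, timeout_s=5))             # True, after about 0.3 s
print(wait_for(lambda: False, timeout_s=0.5, poll_s=0.1))  # False, after about 0.5 s
```

A timeout that is too short risks extracting before the next page's listings appear, while a longer one mostly costs time, which is presumably why a 5-10 second range is suggested.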


7) Run your task - get the data you want

  • Click Save

  • Click Run in the upper-left corner

  • Select Run on your device to run the task on your computer, or select Run task in the Cloud to run the task in the Cloud (for premium users only)


Here is the sample output:

[Screenshot: sample output]