Scrape hotel details from Airbnb

Updated over 9 months ago

You are browsing a tutorial for the latest version of Octoparse. If you are running an older version, we strongly recommend that you upgrade, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so.

🐙 Did you know?

Octoparse offers a variety of preset templates for scraping data from Airbnb. We strongly suggest that you test with these templates to determine if any of them meet your data requirements.

If you would like to create the task from scratch, continue with this tutorial. We will use the Airbnb search results link below as our example.

https://www.airbnb.com/s/New-York--NY--United-States/homes?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&children=1&place_id=ChIJOwg_06VPwokRYv534QaPC8g&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068

The main steps are shown in the menu on the right, and you can download the sample task file below.
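
If you plan to adapt the example link to a different search, it can help to see which parts of it are search parameters. The following is a minimal Python sketch (not part of Octoparse) that splits the example URL above into its path and query parameters using the standard library. The variable names are our own.

```python
from urllib.parse import urlsplit, parse_qs

# The example search URL from this tutorial, split across lines for readability.
url = (
    "https://www.airbnb.com/s/New-York--NY--United-States/homes"
    "?adults=2&search_type=pagination&s_tag=A2EV74MC&tab_id=home_tab"
    "&refinement_paths%5B%5D=%2Fhomes&children=1"
    "&place_id=ChIJOwg_06VPwokRYv534QaPC8g"
    "&federated_search_session_id=2e7da092-4a51-48db-ba26-9746f41ac068"
)

parts = urlsplit(url)
params = parse_qs(parts.query)  # percent-encoded names and values are decoded

print(parts.path)
for name, values in params.items():
    print(name, "=", values)
```

Each parameter maps to a list of values, because a query string may repeat the same name more than once.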


1. Go to Web Page - open the target website

  • Enter the URL on the home page and click Start


2. Set up a Loop Item and Pagination - to click each hotel link and paginate

  • Select the first block to detect all blocks

  • Click on Select all similar elements

  • Click on Loop click each URL to enter the detail page

  • Click Yes to create a Pagination

  • Click on Next page button

  • Scroll down to the end of the page, click on the "next page" icon, and then click Confirm

The workflow created should look like this:
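
As a rough outline of that nesting, the Python sketch below mirrors the workflow with plain loops: an outer pagination loop, an inner loop over the listing links on each page, and an extraction step for each listing. It is only an illustration of the structure, not how Octoparse is implemented, and the PAGES data and function name are hypothetical placeholders.

```python
# Hypothetical stand-in for paginated search results: one list of links per page.
PAGES = [
    ["/rooms/1", "/rooms/2"],
    ["/rooms/3"],
]

def run_workflow(pages):
    """Visit every listing on every page and collect one record per listing."""
    collected = []
    # Pagination loop: one iteration per results page.
    for page_number, listing_links in enumerate(pages, start=1):
        # Loop Item: one iteration per listing block found on the page.
        for link in listing_links:
            # Click Item + Extract Data: open the detail page and record its fields.
            collected.append({"page": page_number, "link": link})
        # Click to Paginate: moving to the next page is the next loop iteration.
    return collected

records = run_workflow(PAGES)
print(len(records))
```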

The next page is loaded with AJAX, so we need to add an AJAX timeout to the Click to Paginate action.

  • Click on Click to Paginate

  • Go to the Options

  • Tick Load with AJAX

  • Set the AJAX timeout to 5-10s
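
To see why a timeout is needed at all: with AJAX, the new results arrive some time after the click, so a scraper has to wait for them, but not forever. The Python sketch below shows one common way to express "wait for the content, up to a limit". It is a generic illustration under our own assumptions, not Octoparse's internal logic, and wait_until and results_loaded are hypothetical names. The timings are shortened so the example finishes quickly.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True   # content arrived in time
        if time.monotonic() >= deadline:
            return False  # give up and let the caller move on
        time.sleep(interval)

# Stand-in for "the new results have finished loading": true on the third check.
checks = {"count": 0}

def results_loaded():
    checks["count"] += 1
    return checks["count"] >= 3

loaded = wait_until(results_loaded, timeout=1.0, interval=0.01)
print(loaded)
```

A timeout that is too short risks moving on before the new listings appear, while one that is too long slows down every page turn.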


3. Modify the XPath of the Loop Item - to locate the items accurately

The auto-generated XPath does not always work well. In this case, we will need to modify the XPath of the Loop Item.

  • Click on Loop Item

  • Switch Loop Mode to Variable list

  • Enter XPath: //div[@data-testid="card-container"]/a

  • Click Apply to save

Note: XPath plays an important role in locating the correct element in Octoparse. To learn more about it, please refer to the following tutorial: What is XPath and how to use it in Octoparse
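
To get a feel for what this XPath selects, here is a small Python sketch that applies the same pattern to a simplified, made-up snippet of markup. The sample markup is an assumption for illustration only and the real Airbnb page is far more complex. Python's built-in ElementTree supports only a subset of XPath and expects a leading "." for a search relative to the root element, so the expression is written as .//div[...]/a here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two listing cards and one unrelated link.
sample = """
<div id="results">
  <div data-testid="card-container">
    <a href="/rooms/1">Listing one</a>
  </div>
  <div data-testid="card-container">
    <a href="/rooms/2">Listing two</a>
  </div>
  <div data-testid="other">
    <a href="/help">Help</a>
  </div>
</div>
""".strip()

root = ET.fromstring(sample)

# Same pattern as in the tutorial: <a> elements that are direct children of a
# <div> whose data-testid attribute equals "card-container".
links = root.findall(".//div[@data-testid='card-container']/a")
hrefs = [a.get("href") for a in links]
print(hrefs)
```

Only the links inside card-container blocks are matched, which is what makes the Loop Item land on listings rather than on other links on the page.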


4. Extract data from the detail page

  • Click on Click Item to enter the detail page

  • Select the information you want to extract and click on Text on the Tips panel

  • Select Add custom field -> Page-level data -> Page URL if you would like to pull the page URL from the current page

  • Double-click the data field to modify the name


5. Run your task - get the data you want

  • Click Run to run your task either on your device or in the cloud

  • Select Standard Mode under Run on your device section to run the task on your local device

  • Wait for the task to complete


Below is the sample output data, which can be exported in Excel, CSV, HTML and JSON formats.
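
If you want to process the exported data afterwards, CSV and JSON are the easiest formats to work with in code. The Python sketch below shows how the same rows look in both formats, using only the standard library. The field names and values are placeholders we made up for illustration, not real scraped data, and your export will contain whichever fields you defined in step 4.

```python
import csv
import io
import json

# Placeholder rows standing in for exported records (not real data).
rows = [
    {"title": "Example listing A", "price": "100", "page_url": "https://example.com/rooms/1"},
    {"title": "Example listing B", "price": "150", "page_url": "https://example.com/rooms/2"},
]

def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_json(records):
    """Serialize a list of dicts to pretty-printed JSON text."""
    return json.dumps(records, indent=2)

csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```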
