Scraping property info from Daft.ie
In this tutorial, we will show you how to scrape property information from Daft.ie.
To follow along, you can use this URL as an example:
https://www.daft.ie/waterford-city/property-for-sale/waterford-city/?s[mxp]=850000
We will scrape data such as price, address, property overview and description from each property detail page with Octoparse.
This tutorial will also cover:
- How to locate elements correctly by modifying XPath in Octoparse
Tips! It is recommended that you use the URL of the search result page directly whenever possible. Adding keywords/filters within Octoparse can complicate the task and lead to less efficient scraping.
Here are the main steps in this tutorial: [Download task file here]
- Go To Web Page - open the target web page
- Create a pagination loop - scrape all the details from multiple pages
- Create a "Loop Item" - loop click into each item on each list
- Extract data - select the data for extraction
- Customize the data field by modifying XPath - improve the accuracy of extracted data (Optional)
- Start extraction - run the task and get data
1. Go To Web Page - open the target web page
- Click "+ Task" to start a new task with Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape from websites with complex structures, like Daft.ie, we strongly recommend Advanced Mode to start your data extraction project.
- Paste the URL into the "Website" box and click "Save URL" to move on
2. Create a pagination loop - scrape listings from all pages
- Scroll down the page in the built-in browser and click the "Next" button
- Click "Loop click next page" on the "Action Tips" panel
- Click "Save"
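If you prefer to see the pagination pattern expressed in code, below is a minimal Python sketch using requests and lxml that follows the "Next" link from page to page. The selector for the "Next" link and the URL handling are assumptions; Daft.ie's markup may change over time, and Octoparse detects the button for you in the GUI.

```python
# Minimal sketch of the pagination loop, for illustration only.
# The XPath for the "Next" link is an assumption -- inspect the live page
# (or let Octoparse detect it) before relying on it.
import requests
from lxml import html

BASE = "https://www.daft.ie"
url = "https://www.daft.ie/waterford-city/property-for-sale/waterford-city/?s[mxp]=850000"

while url:
    page = html.fromstring(requests.get(url, timeout=30).text)
    # ... scrape the listings on this page here ...
    next_links = page.xpath("//a[contains(., 'Next')]/@href")
    if next_links:
        href = next_links[0]
        url = href if href.startswith("http") else BASE + href
    else:
        url = None  # no "Next" link found, so this was the last page
```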
3. Create a "Loop Item" - scrape all the items on each page
Now the built-in browser is on the second page, but you should always start building the loop from the first page.
- Click "Go to Web Page" in the workflow to return to the first page, and then select the "Pagination" loop
This helps Octoparse decide the order of execution. Now you can start to create a "Loop Item".
- Click the price of the 1st item in the list
- Click "A" tag on the bottom of the "Action Tips" panel
- Click "Select All" on the "Action Tips" panel
- Click "Loop click each URL"
- Set "Wait before execution" as "5" seconds (optional, depends on your local network condition)
- Click "Save"
4. Extract data - select the data for extraction
- Click on the data you need on the page
- Select "Extract text of the selected element" on the "Action Tips" panel
- Rename the fields by selecting from the predefined list or entering your own names
- Click "OK" to save
5. Customize the data field by modifying XPath - improve the accuracy of the extracted data (Optional)
When you run the task, you may find some fields missing data even though they are on the web page. In this case, you need to revise the XPath to locate the element correctly.
- Select the "Property Description" field
- Click the icon of "Customize data field" and select "Customize XPath"
- Enter the modified XPath into the "Matching XPath" text box
- //section[@class='Section__container']/p[contains(@class,'PropertyDescription')]
- Click "OK" to save
You may find that the "Loop Item" does not include all the items you want, in which case you also need to revise the XPath for the "Loop Item".
- Select the "Loop Item"
- Go to "Loop Mode" and select "Variable list"
- Enter the modified XPath below into the text box for "Variable list"
- //a[@class='PropertyInformationCommonStyles__propertyPrice--link']
- Click "OK" to save
Tips! If you want to learn more about XPath and how to generate it, there are related XPath tutorials you may find helpful.
6. Start extraction - run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
For premium users, Cloud Extraction is highly recommended.
Here is a data output sample for your reference.
Article in Spanish: Scraping de la propiedad info de Daft.ie
You can also read more web scraping articles on the official website
Writer: Vanny
Editor: Fergus