
Scrape professional details from Houzz

Updated over a year ago

This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, as the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so.

Houzz is a home renovation and design platform that connects homeowners and home professionals with tools, resources, and vendors.

To save time, you can go to "Task Templates" on the main screen of Octoparse and start directly with the ready-to-use Houzz templates. With templates, there is no need to configure a scraping task yourself. For further details, see: Task Templates

This tutorial shows you how to use Octoparse to collect professional details from Houzz.com, such as title, rating, location, and description.


To follow along, you may want to use the URL below:

The main steps are shown in the menu on the right, and you can download the sample task file at the bottom of this tutorial.


1. Create a Go to Web Page - to open the target website

  • Enter the page URL on the home screen and click Start to create a new task

2. Auto-detect the webpage - to create a workflow

  • Choose Auto-detect webpage and wait for the detection to complete

  • Check Data Preview and delete unwanted fields

NOTE: Do not delete the URL field (Title_URL), as it is used later to open each detail page.

  • Uncheck Add a page scroll

  • Click Create workflow

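Auto-detect handles this pruning for you in Data Preview, but the idea can be sketched in a few lines of Python. This is a conceptual illustration only, not how Octoparse works internally: rows are assumed to be plain dicts, and the helper name `prune_fields`, the sample values, and every field name except Title_URL are hypothetical.

```python
from typing import Iterable


def prune_fields(rows: Iterable[dict], wanted: list[str],
                 url_field: str = "Title_URL") -> list[dict]:
    """Keep only the wanted fields of each auto-detected row.

    The URL field is always kept, because the Select subpage URL step
    needs it to open each professional's detail page.
    """
    # Append the URL field if the caller left it out of `wanted`.
    keep = list(wanted) if url_field in wanted else [*wanted, url_field]
    # Missing fields become empty strings instead of raising KeyError.
    return [{name: row.get(name, "") for name in keep} for row in rows]


# Hypothetical sample row; real field names come from Auto-detect.
rows = [{
    "Title": "Example Design Co.",
    "Title_URL": "https://example.com/pro/1",
    "Rating": "4.9",
    "Image": "thumb.jpg",
    "Location": "Example City",
}]
print(prune_fields(rows, ["Title", "Rating", "Location"]))
```

Even when Title_URL is not listed in `wanted`, the sketch keeps it, mirroring the NOTE above.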

3. Select subpage URL - to load extra information about each professional

  • Choose Select subpage URL on the Tips panel

  • Select the data field as Title_URL

  • Click Confirm
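Conceptually, Select subpage URL makes the workflow open the link stored in Title_URL for every row and add the fields extracted there to that row. Here is a hypothetical Python sketch of the same idea, assuming rows are dicts and that links may be relative. `enrich_with_details` and `fake_fetch` are illustrative names, and `fake_fetch` merely stands in for actually loading a page.

```python
from typing import Callable
from urllib.parse import urljoin


def enrich_with_details(rows: list[dict], list_page_url: str,
                        fetch_detail: Callable[[str], dict],
                        url_field: str = "Title_URL") -> list[dict]:
    """Open each row's detail page and merge the extra fields into the row.

    `fetch_detail` is a hypothetical callable that loads a detail page
    and returns the fields extracted from it (e.g. a description).
    """
    enriched = []
    for row in rows:
        # Resolve relative links such as "/pro/1" against the list page URL;
        # absolute links are returned unchanged by urljoin.
        detail_url = urljoin(list_page_url, row[url_field])
        extra = fetch_detail(detail_url)
        # Detail-page fields are added to the row; on a name clash,
        # the detail-page value wins.
        enriched.append({**row, **extra})
    return enriched


def fake_fetch(url: str) -> dict:
    # Stand-in for loading and scraping a detail page.
    return {"Description": f"Description scraped from {url}"}


rows = [{"Title": "Example Design Co.", "Title_URL": "/pro/1"}]
print(enrich_with_details(rows, "https://example.com/search", fake_fetch))
```

Using `urljoin` means the sketch works whether the URL field holds full addresses or site-relative paths.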


4. Extract data from the detail page - to collect the professional’s description

  • Click Read More in the About Us section

  • Choose Click button

NOTE: The position of the Read More button may differ from one detail page to another. To click it reliably on every page, we need to modify the matching XPath of the Read More button.
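To see why a position-based XPath is fragile, here is a small Python sketch using the standard-library `xml.etree.ElementTree`, which supports a subset of XPath. The two pages are hypothetical, simplified markup (the `id` attributes exist only to make the matches easy to tell apart), not Houzz's actual HTML. The attribute-based path is the one given in the following steps.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two hypothetical, simplified detail pages. The About Us block is the first
# div on page A but the third div on page B, mimicking a layout that varies.
PAGE_A = """<body>
  <div data-container="About Us"><p>About A</p><button id="a-about">Read More</button></div>
  <div data-container="Services"><button id="a-services">Read More</button></div>
</body>"""

PAGE_B = """<body>
  <div data-container="Services"><button id="b-services">Read More</button></div>
  <div data-container="Projects"><p>Gallery</p></div>
  <div data-container="About Us"><p>About B</p><button id="b-about">Read More</button></div>
</body>"""

# Same expression as //div[@data-container='About Us']/button; ElementTree
# requires the leading "." for a descendant search from the root element.
BY_ATTRIBUTE = ".//div[@data-container='About Us']/button"
# A position-based path: "the button in the first div", whatever that div is.
BY_POSITION = "./div[1]/button"


def find_read_more(page: str, path: str) -> Optional[ET.Element]:
    """Return the first element matching `path`, or None if nothing matches."""
    return ET.fromstring(page).find(path)


for name, page in [("A", PAGE_A), ("B", PAGE_B)]:
    by_attr = find_read_more(page, BY_ATTRIBUTE)
    by_pos = find_read_more(page, BY_POSITION)
    print(name, by_attr.get("id"), by_pos.get("id"))
```

On page A both paths reach the About Us button, but on page B the positional path lands on the Services button while the attribute-based path still finds the right one.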

  • Choose Click Item

  • Input the Matching XPath as: //div[@data-container='About Us']/button

  • Select the description text

  • Click Text

  • Click More, then Clean data, to remove "Read Less" from the text

  • Click Add Step and Replace

  • Enter "Read Less" in the first box and click Confirm

  • Click Apply
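The Replace step removes the unwanted label from the extracted text. A minimal Python equivalent, assuming the label text is exactly "Read Less"; `clean_description` and the sample text are hypothetical, and this is not Octoparse's own implementation.

```python
def clean_description(text: str, unwanted: str = "Read Less") -> str:
    """Sketch of a Replace cleaning step: swap `unwanted` for "" and trim.

    The .strip() call is an extra assumption of this sketch, added so the
    whitespace left behind by the removed label does not end up in the output.
    """
    # str.replace is case-sensitive: "read less" would NOT be removed.
    return text.replace(unwanted, "").strip()


raw = "We design and remodel kitchens and baths.\nRead Less"  # hypothetical text
print(clean_description(raw))
```

Because the match is case-sensitive in this sketch, the text entered in the first box should match the button label exactly.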


5. Run the task - to get the target data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is a sample output from a local run:

[Screenshot: sample output data preview]