This tutorial is written for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, as the latest version is faster, easier to use, and more robust.
Houzz is a platform that connects homeowners with home professionals, tools, resources, and vendors for home renovation and design.
To save time, you can go to "Task Templates" on the main screen of the Octoparse scraping tool and start directly with the ready-to-use Houzz templates. With templates, there is no need to configure a scraping task yourself. For further details, see: Task Templates
This tutorial will show you how to collect professional details, such as title, rating, location, and description on Houzz.com with Octoparse.
To follow along, you may want to use the URL below:
The main steps are shown in the menu on the right, and you can download the sample task file at the bottom of this tutorial.
1. Create a Go to Web Page - to open the target website
2. Auto-detect the webpage - to create a workflow
Choose Auto-detect webpage and wait for the detection to complete
Check Data Preview and delete unwanted fields
NOTE: Do not delete the URL field as we need to use it to open the detail page.
Uncheck Add a page scroll
Click Create workflow
3. Select subpage URL - to load extra information about a Professional
Choose Select subpage URL on the Tips panel
Select the data field as Title_URL
Click Confirm
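Conceptually, the Select subpage URL step loops over the rows scraped from the list page and opens the link stored in Title_URL for each one. The Python sketch below illustrates that idea outside Octoparse; it is not how Octoparse is implemented. The row values, the `example.com` URLs, and the `fetch_detail` helper are hypothetical placeholders.

```python
from typing import Callable, Dict, List


def scrape_details(
    rows: List[Dict[str, str]],
    fetch_detail: Callable[[str], Dict[str, str]],
    url_field: str = "Title_URL",
) -> List[Dict[str, str]]:
    """Merge each list-page row with the data from its detail page."""
    merged = []
    for row in rows:
        url = row.get(url_field)
        if not url:
            # Without the URL field there is no detail page to open,
            # which is why the tutorial says not to delete it.
            merged.append(dict(row))
            continue
        detail = fetch_detail(url)
        merged.append({**row, **detail})
    return merged


# Hypothetical stand-in for loading a detail page (no network access).
def fake_fetch_detail(url: str) -> Dict[str, str]:
    return {"Description": f"description loaded from {url}"}


rows = [
    {"Title": "Example Pro A", "Title_URL": "https://example.com/pro/a"},
    {"Title": "Example Pro B", "Title_URL": ""},
]
results = scrape_details(rows, fake_fetch_detail)
```

In this sketch, the second row has an empty Title_URL, so it is kept without any detail-page fields.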
4. Extract data from the detail page - to collect the Professional's description
Click Read More in the About Us section
Choose Click button
NOTE: The position of the Read More button may differ from one detail page to another. To extract the data more accurately, we need to modify the matching XPath of the Read More button.
Choose Click Item
Input the Matching XPath as: //div[@data-container='About Us']/button
Select the text needed
Click Text
Click More and Clean data to delete "Read less" in the text
Click Add Step and Replace
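To see what the matching XPath and the Replace step in this section do, here is a minimal Python sketch using only the standard library. The sample markup is a simplified, hypothetical stand-in for a detail page, not the real Houzz HTML, and the description text is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a detail page (not copied from Houzz).
SAMPLE_PAGE = """
<html>
  <body>
    <div data-container="Reviews"><button>Read More</button></div>
    <div data-container="About Us">
      <p>We design and build custom kitchens. Read less</p>
      <button>Read More</button>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_PAGE)

# ElementTree needs a leading "." for a document-wide search; otherwise this
# is the same expression entered as the Matching XPath in Octoparse.
buttons = root.findall(".//div[@data-container='About Us']/button")

# Extract the description text and strip the trailing "Read less" label,
# mirroring the Clean data > Replace step.
raw_text = root.find(".//div[@data-container='About Us']/p").text
description = raw_text.replace("Read less", "").strip()
```

Because the XPath is anchored on the `data-container='About Us'` attribute rather than on the button's position, it matches only the Read More button in the About Us section, even when other sections contain a button with the same label.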
5. Run the task - to get the target data
Click Save on the upper right to save your task
Click Run next to it and wait for a Run Task window to pop up
Select Run on your device to run the task on your local device
Wait for the task to complete
Here is a sample output from a local run: