Scrape professional details from Houzz
In this tutorial, we will show you how to collect professional details from Houzz.com with Octoparse.
For Houzz, you can also use our ready-made "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is enter a few parameters and the task is ready to run. For further details, see: Task Templates
We will scrape each detail page URL in Task 1 and extract the professionals' details, such as the title, number of reviews, and description, in Task 2. Splitting one task into two can improve extraction speed, especially when we use Octoparse Cloud Extraction.
To follow along, you can use this URL:
https://www.houzz.com/professionals/architects-and-building-designers/
Here are the main steps in this tutorial: [Download demo task file here]
Task 1: Extract all the URLs of detail pages on the search result pages
- "Go to Web Page" - to open the target web page
- Create a pagination loop - to scrape multiple listing pages
- Extract data - to scrape certain elements on each page
- Start extraction - to run the task and get data
Task 2: Collect the professional details from scraped URLs
- Input a batch of the scraped URLs - to loop open the detail pages
- Extract data - to select the data for extraction
- Start extraction - to run the task and get data
Task 1: Extract the detail page URLs on the search result pages
1. "Go to Web Page" - to open the target web page
- Click "+ Task" to start a new task with Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape from websites with complex structures, like houzz.com, we strongly recommend Advanced Mode to start your data extraction project.
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on
2. Create a pagination loop - to scrape multiple listing pages
- Scroll down and click the "Next Page" button on the webpage
- Click "Loop click next page" on "Action Tips"
3. Extract data - to scrape certain elements on each page
- Click any two professionals' titles
- Click "Extract text from selected elements" on "Action Tips" panel to create a loop
- Click any title on the page
- Click "Extract URL of the selected link" on "Action Tips" panel to extract detail page URL
- Rename the fields
4. Start extraction - to run the task and get data
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
After the data extraction is complete, export the URL results for use in Task 2.
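For readers who want to see the logic behind Task 1 outside the Octoparse interface, here is a minimal Python sketch of a pagination loop that collects detail page URLs. It is only an illustration, not how Octoparse works internally. The class names "pro-title" and "next-page", the example.com URLs, and the fake pages are all assumptions made up for this example; the real Houzz markup will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collects detail-page links and the next-page link from one listing page.

    The class names "pro-title" and "next-page" are hypothetical placeholders.
    """

    def __init__(self):
        super().__init__()
        self.detail_links = []
        self.next_link = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        classes = (attr.get("class") or "").split()
        if not href:
            return
        if "pro-title" in classes:
            self.detail_links.append(href)
        elif "next-page" in classes:
            self.next_link = href


def collect_detail_urls(start_url, fetch, max_pages=50):
    """Follow next-page links from start_url and return unique detail URLs."""
    urls, seen_pages = [], set()
    page_url = start_url
    while page_url and page_url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(page_url)
        parser = ListingParser()
        parser.feed(fetch(page_url))  # fetch returns the page HTML as a string
        for href in parser.detail_links:
            full = urljoin(page_url, href)
            if full not in urls:
                urls.append(full)
        # Stop when the page has no next-page link.
        page_url = urljoin(page_url, parser.next_link) if parser.next_link else None
    return urls


# Two fake listing pages stand in for real HTTP responses.
FAKE_PAGES = {
    "https://example.com/pros/": (
        '<a class="pro-title" href="/pro/a">A</a>'
        '<a class="pro-title" href="/pro/b">B</a>'
        '<a class="next-page" href="/pros/p2">Next</a>'
    ),
    "https://example.com/pros/p2": '<a class="pro-title" href="/pro/c">C</a>',
}

detail_urls = collect_detail_urls("https://example.com/pros/", FAKE_PAGES.__getitem__)
print(detail_urls)
```

The loop stops when a page has no next-page link, when a page repeats, or when the page limit is reached, which mirrors the idea of "Loop click next page" followed by extracting each title's link.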
Task 2: Collect the professional details from URLs
1. Input a batch of the scraped URLs - to loop open the detail pages
With Task 1, we now have a list of detail page URLs.
- Click "+ Task" to start a task using Advanced Mode to build Task 2
- Input batch URL
There are three ways to batch import URLs into a single task/crawler (up to one million URLs). In this case, we will batch import URLs from a local file. For more details, please refer to Batch Import URLs
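Before importing, it can help to clean the exported URL list. The following is a minimal Python sketch, assuming the Task 1 export is a CSV file whose URL column is named "Detail_URL" (a hypothetical name; use whatever you renamed the field to). It drops empty rows and duplicates and produces one URL per line, which is an assumed layout for the import file.

```python
import csv
import io


def urls_from_export(csv_text, column="Detail_URL"):
    """Return unique, non-empty URLs from exported CSV text, in original order.

    The column name "Detail_URL" is an assumption for this example.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen, urls = set(), []
    for row in reader:
        url = (row.get(column) or "").strip()
        # Keep only values that look like web addresses and were not seen yet.
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Sample data standing in for the file exported from Task 1.
sample_export = (
    "Title,Detail_URL\n"
    "Studio A,https://example.com/pro/a\n"
    "Studio B,https://example.com/pro/b\n"
    "Studio A,https://example.com/pro/a\n"
    "Broken row,\n"
)

url_list = urls_from_export(sample_export)
batch_text = "\n".join(url_list)  # one URL per line
print(batch_text)
```

To use it on a real export, read the file's contents into a string, pass it to `urls_from_export`, and save `batch_text` to a text file.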
2. Extract data - to select the data for extraction
As we can see, we are on the detail page now.
- Click the information you need on the page
- Select "Extract data" on the "Action Tips"
- Rename the fields
In this step, you can rename the fields by selecting a name from the pre-defined list or typing your own. Here we select three fields: Title, Number_of_Reviews, and Description.
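Once the three fields are exported, you may want to tidy them before analysis. Here is a minimal Python sketch, assuming each exported row is a dictionary with the field names Title, Number_of_Reviews, and Description, and that the review count arrives as text such as "1,204 Reviews" (an assumed format, not confirmed by this tutorial).

```python
import re


def clean_record(raw):
    """Normalize whitespace and turn the review count into an integer."""
    title = " ".join((raw.get("Title") or "").split())
    description = " ".join((raw.get("Description") or "").split())
    # Take the first number in the text, allowing thousands separators.
    match = re.search(r"\d[\d,]*", raw.get("Number_of_Reviews") or "")
    reviews = int(match.group().replace(",", "")) if match else 0
    return {
        "Title": title,
        "Number_of_Reviews": reviews,
        "Description": description,
    }


# A made-up row standing in for one line of the Task 2 export.
sample_row = {
    "Title": "  Studio   A ",
    "Number_of_Reviews": "1,204 Reviews",
    "Description": "Residential\n design and planning.",
}

cleaned = clean_record(sample_row)
print(cleaned)
```

Rows with no number in the review field default to 0 here; you may prefer to leave them empty instead.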
To extract the phone number:
- Click "Click to call" and select "Click element" on the "Action Panel"
- Uncheck "New Tab" and "Auto Retry"
- Set "Ajax Timeout" to 5s
- Click the phone number and select "Extract text of the selected element"
- Rename the field as needed
- Click "Save" to move on
3. Start extraction - to run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
For a premium user, Cloud Extraction is highly recommended.
Now you have the data you need. Using two tasks also helps avoid some problems caused by small changes to the website.
But if you only want to scrape the data once, you can also combine the two tasks into one, which makes the whole process easier.
- Follow Steps 1 and 2 in Task 1
- Click "Read More" for the first professional
- Click "Select all" on the "Action Tips" panel and then click "Loop click each element"
- Follow Steps 2 and 3 in Task 2
Happy data hunting!
Article in Spanish: Scrape detalles profesionales de Houzz
You can also read web scraping articles on El Website Oficial
Writer: Eric
Editor: Yanni