In this tutorial, we will show you how to scrape product details from Wayfair, an American e-commerce company that sells home goods.
For this example, we will use the URL below to scrape the product title, description, and price from each product detail page.
Here are the main steps in this tutorial:
- Click "+ Task" to start a new task with Advanced Mode
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on
Extracting data from a list of URLs is recommended for large scraping projects, because it is considerably more efficient and easier to manage. When the list of URLs is large, Octoparse supports batch import of URLs from local files (text or spreadsheet) or from another task, and it can also generate URLs from pre-defined patterns.
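If you prefer to prepare the URL list yourself before importing it, a pattern-based list can be sketched in a few lines of Python. This is only an illustration: the `example.com` pattern, the `{page}` placeholder, and the `urls.txt` file name are assumptions, not Wayfair's real URL structure or anything Octoparse requires.

```python
from pathlib import Path


def generate_urls(pattern: str, start: int, stop: int) -> list[str]:
    """Fill the {page} placeholder for every page number from start to stop, inclusive."""
    return [pattern.format(page=n) for n in range(start, stop + 1)]


def save_urls(urls: list[str], path: str) -> None:
    """Write one URL per line so the file can be imported as a plain-text list."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


# Hypothetical pattern; replace it with the real listing URL structure.
PATTERN = "https://www.example.com/category?page={page}"

urls = generate_urls(PATTERN, 1, 5)

if __name__ == "__main__":
    save_urls(urls, "urls.txt")  # hypothetical output file name
```

The resulting text file holds one URL per line, which is the simplest shape for a bulk import from a local file.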
- Scroll down to the bottom of the page, click the "Next" button
- Click "Loop click next page" on the "Action Tips" panel
- Set the AJAX timeout to 5 seconds (optional, depending on your local network conditions)
- Click "OK" to save
The AJAX timeout can often double as a page timeout for a Click action. For example, when a page keeps loading long after the data you need has appeared, you can use the AJAX timeout to tell Octoparse to move on to the next action once the set time is reached.
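The idea behind a timeout like this can be sketched as a bounded wait: keep checking for a condition, but give up and continue once a deadline passes. The sketch below is a generic Python illustration of that idea, not Octoparse's actual implementation; the `wait_for` function and its parameters are hypothetical names.

```python
import time


def wait_for(condition, timeout_s: float, poll_s: float = 0.1) -> bool:
    """Poll condition() until it returns True or timeout_s seconds elapse.

    Returns True if the condition was met and False if the deadline was
    reached first. In both cases the caller simply continues with the
    next action instead of waiting indefinitely.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    # A condition that never becomes true: the call gives up after about 0.5 s.
    finished = wait_for(lambda: False, timeout_s=0.5)
    print("condition met" if finished else "timed out, moving on")
```

A longer timeout is more forgiving on slow networks, while a shorter one keeps the task from stalling on pages that never finish loading.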
- Click on any product title on the page
- Click "Select all" on the "Action Tips" panel
- Click "Loop click each element"
When you select an element, Octoparse detects similar items on the same page. The selected link is highlighted in green, while the other similar links it detects are highlighted in red. When a Loop click action is added, Octoparse clicks through each link captured in the Loop Item and opens the product detail pages one by one.
Creating a loop is one of the most common steps with Octoparse.
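Conceptually, a loop item is a list of similar elements that gets visited one at a time. The sketch below illustrates this with Python's standard `html.parser`, treating links that share a CSS class as "similar". The sample HTML, the `product-title` class name, and the `LinkCollector` helper are made up for illustration; Octoparse's own detection logic may work differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that carries a given CSS class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # An attribute without a value is reported as None, so fall back to "".
        classes = (attr.get("class") or "").split()
        if self.css_class in classes and attr.get("href"):
            self.links.append(attr["href"])


# Hypothetical listing page: two product links and one unrelated link.
SAMPLE_HTML = """
<ul>
  <li><a class="product-title" href="/p/1">Sofa</a></li>
  <li><a class="product-title" href="/p/2">Lamp</a></li>
  <li><a class="nav" href="/help">Help</a></li>
</ul>
"""

collector = LinkCollector("product-title")
collector.feed(SAMPLE_HTML)
loop_items = collector.links

# Visit each captured link in turn, as a loop click action would.
for href in loop_items:
    print("would open", href)
```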
- Uncheck "Retry when page remains unchanged (use discreetly for AJAX loading)"
- Click "Save"
After you click "Loop click each element", Octoparse will open the detail page of the first product.
- Click on the data you need on the page
- Select "Extract text of the selected element" from the "Action Tips"
- Rename the fields by choosing from the pre-defined list or typing your own names
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
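Taken together, the workflow built above behaves like a nested loop: an outer loop over result pages, an inner loop over the products on each page, and a field extraction step for every product. The sketch below mirrors that structure in Python using made-up in-memory data instead of live pages. The sample products, their prices, and the output field names are all hypothetical.

```python
# Made-up stand-in for the site: each inner list is one result page.
PAGES = [
    [
        {"title": " Sofa ", "description": "Three-seat sofa", "price": "$499.99"},
        {"title": "Lamp", "description": "Desk lamp", "price": "$29.99"},
    ],
    [
        {"title": "Rug", "description": "Wool rug", "price": "$120.00"},
    ],
]

# Hypothetical mapping from raw field names to the renamed output fields.
FIELD_NAMES = {
    "title": "Product_Title",
    "description": "Description",
    "price": "Price",
}


def run_task(pages, field_names):
    """Walk every page and every product, extracting the renamed text fields."""
    rows = []
    for page in pages:        # outer loop: "Loop click next page"
        for product in page:  # inner loop: "Loop click each element"
            row = {new: product[old].strip() for old, new in field_names.items()}
            rows.append(row)
    return rows


rows = run_task(PAGES, FIELD_NAMES)

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Each resulting row corresponds to one product detail page, with one column per extracted field.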
Here's the data we extracted. You can download the task file here to try it out yourself.
Happy data hunting!