In this tutorial, we are going to show you how to scrape the product information from Gumtree.com.
To follow along, you may want to use this URL:
We will use Octoparse to scrape data such as the title, price, and update time from each property details page.
This tutorial will also cover:
- Dealing with AJAX for pagination
Here are the main steps in this tutorial: [Download task file here]
- Go To Web Page - to open the targeted web page
- Create a pagination loop - to scrape all the details from multiple pages
- Create a "Loop Item" - to click into each item in the list on every page
- Extract data - to select the data for extraction
- Start extraction - to run the task and get data
1. "Go To Web Page" - to open the targeted web page
- Click "+ Task" to start a new task with Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For people who want to scrape from websites with complex structures, like amazon.com, we strongly recommend Advanced Mode to start your data extraction project.
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on
2. Create a pagination loop - to scrape all the details from multiple pages
- Scroll down the page and click the next page button ">"
- Click "Loop click the single element" on the "Action Tips"
- Set "Wait before execution" to 15 seconds (optional)
- Uncheck "Retry when the page remains unchanged"
- Check "Load the page with Ajax" and set the AJAX Timeout to 30s (optional, depending on your network)
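Octoparse handles all of this through its interface, so no code is needed. For readers curious about the logic behind these settings, the following is a minimal Python sketch of a pagination loop that waits for AJAX content, not Octoparse's actual implementation. The names `read_listing`, `click_next`, `wait_for_change`, and `scrape_all_pages` are hypothetical; the two callables stand in for whatever browser automation you use.

```python
import time
from typing import Callable, List, Optional


def wait_for_change(
    read_listing: Callable[[], List[str]],
    previous: List[str],
    timeout: float,
    poll_interval: float = 0.5,
) -> Optional[List[str]]:
    """Poll the listing until it differs from `previous` or `timeout` elapses.

    Returns the new listing, or None if nothing changed in time
    (roughly what an AJAX timeout guards against).
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read_listing()
        if current != previous:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


def scrape_all_pages(
    read_listing: Callable[[], List[str]],
    click_next: Callable[[], bool],
    ajax_timeout: float = 30.0,
    poll_interval: float = 0.5,
    max_pages: int = 50,
) -> List[List[str]]:
    """Collect the listing from every page by clicking "next" repeatedly.

    `click_next` is assumed to return False when there is no next button.
    The loop stops instead of retrying when the page remains unchanged.
    """
    pages = [read_listing()]
    while len(pages) < max_pages:
        if not click_next():
            break  # no next page button left
        updated = wait_for_change(read_listing, pages[-1], ajax_timeout, poll_interval)
        if updated is None:
            break  # page did not change before the timeout
        pages.append(updated)
    return pages
```

The `max_pages` cap is an extra safeguard added for this sketch so that a site with endless pagination cannot keep the loop running forever.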
3. Create a "Loop Item" - to scrape all the details on one page
- Click the title of the first listing under "most recent" on the current page
- Click the title in the middle of the page
- Click "Extract text of selected element"
- Rename the data field by selecting from the pre-defined list
P.S. Click only the items located under "most recent" because other items have different web structures.
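To illustrate why the loop is restricted to one section of the page, here is a rough Python sketch that gathers item links only from inside a single container and ignores links elsewhere. It uses the standard library's `html.parser`. The container id `most-recent`, the sample markup, and the names `SectionLinkCollector` and `collect_links` are assumptions made for this example; Gumtree's real markup will differ.

```python
from html.parser import HTMLParser
from typing import List


class SectionLinkCollector(HTMLParser):
    """Collect href values of <a> tags found inside one container element."""

    def __init__(self, container_id: str) -> None:
        super().__init__()
        self.container_id = container_id
        self.container_tag = None  # tag name of the container once found
        self.depth = 0             # nesting depth of that tag; 0 means outside
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self.depth == 0:
            # Outside the container: only watch for the container itself.
            if attr.get("id") == self.container_id:
                self.container_tag = tag
                self.depth = 1
            return
        if tag == self.container_tag:
            self.depth += 1  # nested element with the same tag name
        elif tag == "a" and attr.get("href"):
            self.links.append(attr["href"])

    def handle_endtag(self, tag):
        if self.depth and tag == self.container_tag:
            self.depth -= 1


def collect_links(html: str, container_id: str) -> List[str]:
    """Return the links inside the element whose id is `container_id`."""
    parser = SectionLinkCollector(container_id)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each collected link would then be opened in turn, which is roughly the role the "Loop Item" plays in the workflow.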
4. Extract data - to select the data for extraction
- Select the data you need on the page
- Select "Extract text of the selected element" on the "Action Tips" panel
- Rename the fields by selecting from the pre-defined list or entering your own names
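The field selection above can be pictured as a mapping from page elements to named fields. Below is a minimal Python sketch of that idea using the standard library's `html.parser`. The class names `ad-title`, `ad-price`, and `ad-updated` and the helper names `FieldExtractor` and `extract_fields` are hypothetical; on a real page you would inspect the markup to find the actual selectors.

```python
from html.parser import HTMLParser
from typing import Dict


class FieldExtractor(HTMLParser):
    """Capture the text of elements whose CSS class maps to a field name."""

    def __init__(self, class_to_field: Dict[str, str]) -> None:
        super().__init__()
        self.class_to_field = class_to_field
        self.current = None  # field currently being captured, if any
        self.record: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.class_to_field:
                self.current = self.class_to_field[cls]
                self.record.setdefault(self.current, "")

    def handle_data(self, data):
        if self.current:
            self.record[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: capture stops at the first closing tag, so text
        # after a nested child element is not collected.
        self.current = None


def extract_fields(html: str, class_to_field: Dict[str, str]) -> Dict[str, str]:
    """Return a dict of field name to whitespace-normalized text."""
    parser = FieldExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split()) for name, text in parser.record.items()}
```

A call might look like `extract_fields(page_html, {"ad-title": "title", "ad-price": "price", "ad-updated": "updated"})`, where the dictionary values play the role of the renamed fields.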
5. Start extraction - to run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is a sample of the extracted data for reference.