In this tutorial, we will show you how to scrape product information from Gumtree.com.
For Gumtree, you can also use our easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is type in a few parameters and the task is ready to go. For further details, see: Task Templates
To follow along, you can use this URL:
We will scrape data such as the title, price, and update time from the property details page with Octoparse.
This tutorial will also cover:
- Dealing with AJAX for pagination
Here are the main steps in this tutorial: [Download task file here]
- Go To Web Page - to open the targeted web page
- Create a pagination loop - to scrape all the details from multiple pages
- Create a "Loop Item" - to loop click into each item on each list
- Extract data - to select the data for extraction
- Start extraction - to run the task and get data
1. "Go To Web Page" - to open the targeted web page
- Click "+ Task" to start a new task in Advanced Mode
- Paste the URL into the "Extraction URL" box
- Click "Save URL" to move on
2. Create a pagination loop - to scrape all the details from multiple pages
- Scroll down the page and click the next page button ">"
- Click "Loop click the single element" on the "Action Tips" panel
- Set "Wait before execution" to 15 seconds (optional)
- Uncheck "Retry when the page remains unchanged"
- Check "Load the page with Ajax" and set the AJAX timeout to 30s (optional; adjust it to your network speed)
The AJAX timeout can also serve as a page-load timeout for a Click action. For example, if a page keeps loading long after the data you need has appeared, the AJAX timeout tells Octoparse to move on to the next action as soon as the set time is reached.
To learn more about AJAX, see the video tutorial here.
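The behavior of the AJAX timeout can be pictured with a small polling loop. The following is a conceptual Python sketch and is not how Octoparse works internally. The function name `wait_then_continue` and the `is_ready` callback are hypothetical. `is_ready` stands in for a check of whether the page has finished loading. The loop waits until that check passes or the timeout expires. In both cases control returns to the caller, which then runs the next action.

```python
import time
from typing import Callable


def wait_then_continue(is_ready: Callable[[], bool],
                       ajax_timeout: float = 30.0,
                       poll_interval: float = 0.5) -> bool:
    """Hypothetical model of an AJAX timeout after a click.

    Polls is_ready() (a stand-in for "has the page finished loading?")
    until it returns True or ajax_timeout seconds have passed.
    Returns True if loading finished in time, False if the timeout was hit.
    Either way the caller moves on to its next action.
    """
    deadline = time.monotonic() + ajax_timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(poll_interval)
    # Timeout reached: stop waiting, even though the page is still loading.
    return False
```

Consider a page whose listings appear after a few seconds but which keeps loading for a minute. With `ajax_timeout=30`, the sketch stops waiting after at most 30 seconds and lets the next step run.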
3. Create a "Loop Item" - to scrape all the details on one page
- Scroll down until you see the "most recent" section
- Click the title of the first listing under "most recent" on the current page
- Click "Select all"
- Click "Extract text of selected element"
On Gumtree, don't click the items under "top ads around you" if you only need the items under "most recent".
4. Extract data - to select the data for extraction
- Select the data you need on the page
- Select "Extract text of the selected element" on the "Action Tips" panel
- Rename the fields by selecting names from the pre-defined list or typing your own
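Steps 2 to 4 can be read as one nested loop. The pagination loop runs over pages. Inside it, the "Loop Item" runs over the listings on each page. Inside that, data extraction runs for each listing. The following Python sketch models that structure over in-memory data only, under several assumptions. The `Listing` and `ListPage` classes are hypothetical. Each `Listing` stands in for the details page opened by clicking its title. `next_page=None` stands in for a page with no ">" button. The field names `title`, `price`, and `update_time` are example names of the kind you might choose in the renaming step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Listing:
    # Hypothetical stand-in for one listing and its details page.
    section: str   # e.g. "most recent" or "top ads around you"
    title: str
    price: str
    updated: str


@dataclass
class ListPage:
    # Hypothetical stand-in for one search-results page.
    listings: list = field(default_factory=list)
    next_page: Optional["ListPage"] = None  # None: no ">" button to click


def run_task(first_page: ListPage) -> list:
    rows = []
    page = first_page
    while page is not None:                    # step 2: pagination loop
        for item in page.listings:             # step 3: loop item
            if item.section != "most recent":  # skip "top ads around you"
                continue
            rows.append({                      # step 4: extract data
                "title": item.title,
                "price": item.price,
                "update_time": item.updated,
            })
        page = page.next_page                  # "click" > to load the next page
    return rows
```

Keeping the section check inside the item loop mirrors the advice above: only listings under "most recent" are visited, however many ads share the page.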
5. Start extraction - to run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the output sample:
Contact us any time if you need help!