In this tutorial, we will show you how to scrape car data from Kijiji as an example.
For Kijiji, you can also use the ready-made "Task Template" on the main screen of the Octoparse scraping tool. Type in a few parameters and the task is ready to run. For further details, see: Task Templates
To follow along, you may want to use this URL in the tutorial:
This tutorial will also cover:
- Dealing with AJAX
We recommend using the URL of the search result page directly whenever possible. Adding keywords or filters within Octoparse can complicate the task and lead to less efficient scraping.
Here are the main steps in this tutorial: [Download task file here]
- Go To Web Page - open the target web page
- Create a pagination loop - scrape all the details from multiple pages
- Create a "Loop Item" - click into each item in the list on every page
- Extract data - select the data for extraction
- Save and start extraction - run the task and get data
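The five steps above amount to a pagination loop wrapped around an item loop. The following is a minimal Python sketch of that control flow only, not of how Octoparse works internally. The `PAGES` and `DETAILS` dictionaries, the page and item names, and the `run_task` function are all hypothetical stand-ins for a real website.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a paginated listing site.
# Each page holds a list of item ids and the name of the next page (or None).
PAGES: Dict[str, dict] = {
    "page-1": {"items": ["car-a", "car-b"], "next": "page-2"},
    "page-2": {"items": ["car-c"], "next": None},
}

# Hypothetical detail pages, one per item id.
DETAILS: Dict[str, dict] = {
    "car-a": {"title": "Example car A"},
    "car-b": {"title": "Example car B"},
    "car-c": {"title": "Example car C"},
}


def run_task(start_page: str) -> List[dict]:
    """Walk every page, open every item, and collect one row per item."""
    rows: List[dict] = []
    page: Optional[str] = start_page            # 1. go to web page
    while page is not None:                      # 2. pagination loop
        for item_id in PAGES[page]["items"]:     # 3. loop item
            detail = DETAILS[item_id]            #    open the detail page
            rows.append({"id": item_id, "title": detail["title"]})  # 4. extract data
        page = PAGES[page]["next"]               #    follow the "Next" button
    return rows                                  # 5. the extracted data


rows = run_task("page-1")
```

The pagination loop sits on the outside and the item loop on the inside, which is the same nesting the workflow ends up with after steps 2 and 3 below.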
1. Go To Web Page - open the target web page
- Click "+ Task" to start a task using Advanced Mode
- Paste the URL into the "Website" box
- Click "Save URL" to move on
2. Create a pagination loop - scrape all the details from multiple pages
- Scroll down to the bottom of the page, click the "Next" button
- Click "Loop click the selected link" on the "Action Tips" panel
- Set the AJAX timeout to 5 seconds (optional, depending on your network conditions)
- Uncheck "Retry when page fails to load"
- Click "OK" to save
The AJAX timeout can also serve as a page-load timeout for the Click action. For example, if a page keeps loading long after the data you need has appeared, you can set an AJAX timeout to tell Octoparse to move on to the next action once the set time is reached.
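The idea of a timeout that lets the workflow move on can be illustrated with a generic bounded wait. This is only a sketch of the general pattern under assumed names. `wait_with_timeout`, `is_ready`, and `poll_s` are hypothetical, and nothing here reflects how Octoparse implements its AJAX timeout.

```python
import time
from typing import Callable


def wait_with_timeout(is_ready: Callable[[], bool],
                      timeout_s: float,
                      poll_s: float = 0.05) -> bool:
    """Poll is_ready() until it returns True or timeout_s seconds have passed.

    Returns True if the condition was met, False if the wait timed out.
    In both cases the caller simply continues with the next action.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# A page that never reports "fully loaded": the wait gives up after the timeout.
started = time.monotonic()
finished_loading = wait_with_timeout(lambda: False, timeout_s=0.2)
elapsed = time.monotonic() - started
```

Here `finished_loading` is `False`, yet the script carries on after roughly 0.2 seconds instead of waiting indefinitely, which is the behaviour the paragraph above describes for a page that never finishes loading.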
If you want to learn more about AJAX, here are some related tutorials you might need:
3. Create a "Loop Item" - click into each item in the list on every page
- Click on a product title on the page
- Click "Select all" on the "Action Tips" panel
- Click "Loop click each URL"
- Uncheck "Retry when the page remains unchanged"
- Click "Load the page with AJAX" and set the AJAX timeout to 5 seconds (optional)
- Click "Save" to move on
The first few and last few products on this page are advertisements. Be careful not to click them if you only want to scrape the real products on this page.
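If ad entries do end up in a page's item list, one simple way to discard them is to drop a fixed number of entries from each end of that list. The sketch below assumes the ads always occupy the first and last positions. The function name `drop_edge_items`, the sample titles, and the counts of one leading and one trailing ad are hypothetical, so check the actual page before choosing values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drop_edge_items(items: Sequence[T], leading: int, trailing: int) -> List[T]:
    """Return items without the first `leading` and last `trailing` entries."""
    if leading < 0 or trailing < 0:
        raise ValueError("leading and trailing must be non-negative")
    end = len(items) - trailing
    if end <= leading:
        return []  # nothing left once both edges are removed
    return list(items[leading:end])


# Hypothetical titles from one result page, with one ad at each end.
page_titles = ["Ad: top banner", "Car 1", "Car 2", "Car 3", "Ad: bottom banner"]
real_titles = drop_edge_items(page_titles, leading=1, trailing=1)
```

With the sample list above, `real_titles` keeps only the three entries in the middle.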
4. Extract data - select the data for extraction
- Click on the data you need on the page
- Select "Extract text of the selected element" on the "Action Tips" panel
- Click "Add predefined field", choose "Add current page information", and select "Web page URL" (optional)
- Rename the fields by selecting a name from the predefined list or typing your own
5. Save and start extraction - run the task and get data
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here's the data we extracted.