In this tutorial, we are going to show you how to scrape information from Craigslist.
To follow along, you can use the example URL from this tutorial:
We will use Octoparse to scrape data such as the title, time, compensation, and employment type from the job details page.
Here are the main steps in this tutorial: [Download task file here]
1) "Go To Web Page" - to open the targeted web page
- Create the task with "Advanced Mode".
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on.
2) Create a pagination loop - to scrape all the results from multiple pages
- Click the "Next>" button on the webpage
- Click "Loop click the selected link" on "Action Tips"
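For readers who like to see the logic behind the clicks, here is a rough Python sketch of what a pagination loop does: keep following the "next" link until there is none. It is not how Octoparse is implemented. The `paginate` function, the `fetch` callback, and the in-memory `FAKE_SITE` pages are all hypothetical and stand in for real page loads.

```python
from typing import Callable, Iterator, Optional

def paginate(first_url: str,
             fetch: Callable[[str], dict],
             max_pages: int = 100) -> Iterator[dict]:
    """Yield each results page, following its 'next' link until none is left."""
    url: Optional[str] = first_url
    seen = set()  # guards against a "next" link that points back to a visited page
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch(url)
        yield page
        url = page.get("next")  # None on the last page ends the loop

# Hypothetical stand-in for the website: each page lists items and a "next" link.
FAKE_SITE = {
    "/jobs?page=1": {"items": ["/job/1", "/job/2"], "next": "/jobs?page=2"},
    "/jobs?page=2": {"items": ["/job/3"], "next": None},
}

pages = list(paginate("/jobs?page=1", FAKE_SITE.__getitem__))
print(len(pages))
```

The `max_pages` cap is a safety limit we added for the sketch, so that a broken "next" link cannot make the loop run forever.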
3) Create a "Loop Item" - to click into each item in the list on every page
We are now on the second page. A "Loop Item" should always be created starting from the first item on the first page, so we need to go back to the first page.
- Click "Go To Web Page" in the workflow.
- Select the pagination loop in the workflow
By doing this, we can help Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.
- Click the title of the first item
The first item is highlighted in green, while the others are highlighted in red.
- Click "Select All" on "Action Tips"
All of the items are now highlighted in green.
- Select "Loop click each URL"
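Conceptually, "Select All" followed by "Loop click each URL" means: find every link that looks like the first item, then visit each one in turn. Here is a minimal sketch of the first half using only the Python standard library. The `result-title` class name and the sample HTML are made up for illustration and are not taken from the real Craigslist markup.

```python
from html.parser import HTMLParser

class ItemLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.css_class in classes and attr.get("href"):
            self.links.append(attr["href"])

# Hypothetical listing page: two job links and one unrelated navigation link.
SAMPLE_LISTING = """
<ul>
  <li><a class="result-title" href="/job/1">Barista</a></li>
  <li><a class="result-title" href="/job/2">Line cook</a></li>
  <li><a class="nav" href="/about">About</a></li>
</ul>
"""

def item_links(listing_html, css_class="result-title"):
    """Return the links of all items that match the first item's pattern."""
    parser = ItemLinkParser(css_class)
    parser.feed(listing_html)
    return parser.links

links = item_links(SAMPLE_LISTING)
print(links)
```

In a full scraper, this function would run once per results page inside the pagination loop, which mirrors why the Loop Item has to sit in the right position in the workflow.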
4) Extract data - to select the data you want to scrape
- Select the data you need on the item page, such as compensation, employment type, and title.
- Select "Extract text of the selected element" and rename the "Field name" column if necessary.
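The idea behind "Extract text of the selected element" and renaming the field can be sketched like this: map each selected element to a field name, then keep only its text. The class names (`posting-title`, `posted`, `comp`, `emp-type`) and the sample values below are invented for the example. The sketch also assumes each selected element contains plain text with no nested tags.

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Store the text of elements whose CSS class maps to a field name."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field
        self.current = None  # field being captured, or None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Pick the field name of the first class we recognise, if any.
        self.current = next(
            (self.class_to_field[c] for c in classes if c in self.class_to_field),
            None,
        )

    def handle_endtag(self, tag):
        self.current = None  # assumption: selected elements have no nested tags

    def handle_data(self, data):
        if self.current is not None:
            text = self.record.get(self.current, "") + data.strip()
            self.record[self.current] = text

# Hypothetical job details page with made-up values.
SAMPLE_DETAIL = """
<h1 class="posting-title">Barista</h1>
<time class="posted">2024-01-01 09:00</time>
<p class="comp">$18/hour</p>
<p class="emp-type">part-time</p>
"""

# Left: how the element is selected. Right: the renamed "Field name".
FIELDS = {
    "posting-title": "title",
    "posted": "time",
    "comp": "compensation",
    "emp-type": "employment_type",
}

def extract_fields(detail_html, class_to_field):
    parser = FieldExtractor(class_to_field)
    parser.feed(detail_html)
    return parser.record

record = extract_fields(SAMPLE_DETAIL, FIELDS)
print(record)
```

Each key in `record` plays the role of a column in the extracted data, so renaming a field only means changing the right-hand side of `FIELDS`.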
5) Run extraction - to run your task and get data
- Click "Save"
- Click "Start Extraction", then select "Local Extraction"
Here is the sample output:
Contact us anytime if you need our help!