You are reading a tutorial for the latest version of Octoparse. If you are running an older version, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so.

Indeed is one of the most popular job posting websites. Web scraping lets you collect large volumes of its job information in a structured form. In this tutorial, we will show you how to use Octoparse to scrape job posts from Indeed.com.

Before we get started, we need the URL of the target results page. To get it, search Indeed for a keyword and a location, then copy the URL of the results page.

Below is an example URL for demonstration:

https://www.indeed.com/jobs?q=devops&l=Dallas-Fort%20Worth%2C%20TX&radius=50
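If you need result-page URLs for several keywords or locations, you can assemble them instead of searching by hand. The snippet below is a minimal sketch in Python that rebuilds the example URL above. It assumes the only parameters that matter are the three visible in that URL (q, l and radius); the function name build_search_url is made up for illustration, and Indeed may accept or require other parameters.

```python
from urllib.parse import quote, urlencode

# Base path taken from the example URL in this tutorial.
BASE_URL = "https://www.indeed.com/jobs"


def build_search_url(keyword: str, location: str, radius: int) -> str:
    """Build a results-page URL from the q, l and radius parameters.

    Hypothetical helper: the parameter names are copied from the example
    URL; nothing else about Indeed's URL scheme is assumed here.
    """
    params = {"q": keyword, "l": location, "radius": radius}
    # quote_via=quote encodes spaces as %20 (not "+"), matching the example.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


search_url = build_search_url("devops", "Dallas-Fort Worth, TX", 50)
print(search_url)
```

The printed URL can be pasted into Octoparse in the same way as the example URL.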

The easiest way to scrape the website is to go to "Task Templates" on the main screen of Octoparse and start with one of the ready-to-use Indeed templates. Enter the URL into the template, run it, and wait for the data to be extracted. For further details, see: Task Templates

1__1_.png

If you would like to build the task from scratch instead, continue with the tutorial below.

Here are the main steps in this tutorial: [Download task file here]

  1. Go to Web Page - Open the targeted web page

  2. Create Pagination - Scrape data from multiple pages

  3. Create Loop Item - Scrape job information

  4. Set up the wait time for "Extract Data" - Control scraping speed

  5. Start extraction - Run the task and get data


1. Go to Web Page - Open the targeted web page

  • Enter the URL on the home page and click Start

1.1.png

2. Create Pagination - Scrape data from multiple pages

  • Click on the Next page button (>) on the page

  • Choose Loop click single element in the Tips panel

Pagnation.jpg

A Pagination loop will be created in the workflow.

pagination.jpg

To make sure the pagination works reliably on every page, we need to modify its XPath.

  • Click on Pagination

  • Enter the XPath //a[@aria-label="Next"]

  • Click Apply to save

pagination_XPath.jpg
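To see what the XPath above selects, here is a minimal sketch in Python. The markup is a made-up, simplified pagination bar, not Indeed's real HTML, and the href values are hypothetical. Python's built-in ElementTree supports only a subset of XPath and expects a relative path, so the expression is written as .//a[@aria-label='Next'] here; Octoparse itself takes the expression as given in the step above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination bar (NOT Indeed's actual markup).
SAMPLE_PAGINATION = """
<nav>
  <a aria-label="Previous" href="/jobs?q=devops&amp;start=0">&lt;</a>
  <a aria-label="2" href="/jobs?q=devops&amp;start=10">2</a>
  <a aria-label="Next" href="/jobs?q=devops&amp;start=20">&gt;</a>
</nav>
"""

root = ET.fromstring(SAMPLE_PAGINATION.strip())

# Select every <a> element whose aria-label attribute is exactly "Next".
next_links = root.findall(".//a[@aria-label='Next']")
next_hrefs = [link.get("href") for link in next_links]

print(next_hrefs)
```

Only the Next link matches. Matching on the aria-label attribute rather than on the link's position is what keeps the selection stable when the number of page links changes from one page to the next.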

TIP: If a pop-up appears on the page, turn on Browse mode in the upper right corner and close the pop-up manually. Then turn off Browse mode to continue building the workflow.


3. Create Loop Item - Scrape job information

  • Select the first two job info blocks (make sure to select the whole job block, so that it includes all the information you want)

  • Choose Extract text of the selected elements

Loop_Item.gif

A Loop Item will be created in the workflow.


Loop_Item.jpg

You may have noticed, however, that all the information for each job has been scraped into a single cell. We need to separate it into different columns.

  • Select the first job title (within the highlighted area)

  • Choose Extract the text of the element

separate_fields.jpg

  • Do the same to scrape the other information from the first job

  • Double-click on the field name to rename it if needed

rename.jpg
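The idea behind this step can be sketched in Python: loop over each job block, then pull each field out of the block with its own relative path, so every field gets its own column. The markup, class names, field names and job data below are all invented for illustration; they are not Indeed's real page structure, and this is not how Octoparse is implemented internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical job list with made-up data (NOT Indeed's actual markup).
SAMPLE_RESULTS = """
<ul>
  <li class="job">
    <h2 class="title">DevOps Engineer</h2>
    <span class="company">Example Corp</span>
    <span class="location">Dallas, TX</span>
  </li>
  <li class="job">
    <h2 class="title">Site Reliability Engineer</h2>
    <span class="company">Sample LLC</span>
    <span class="location">Fort Worth, TX</span>
  </li>
</ul>
"""

# One relative path per output column (the field names are illustrative).
FIELDS = {
    "job_title": ".//h2[@class='title']",
    "company": ".//span[@class='company']",
    "location": ".//span[@class='location']",
}


def text_of(block: ET.Element, path: str) -> str:
    """Return the stripped text of the first match, or "" if none."""
    node = block.find(path)
    return node.text.strip() if node is not None and node.text else ""


root = ET.fromstring(SAMPLE_RESULTS.strip())
job_blocks = root.findall(".//li[@class='job']")

# Extracting the whole block as text puts everything into one cell.
one_cell = " ".join("".join(job_blocks[0].itertext()).split())

# Extracting per field gives one row per job and one column per field.
rows = [
    {name: text_of(block, path) for name, path in FIELDS.items()}
    for block in job_blocks
]

print(one_cell)
for row in rows:
    print(row)
```

The first print shows the "everything in one cell" result; the rows that follow show the same job split into separate, named columns.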

4. Set up the wait time for "Extract Data" - Control scraping speed

789.gif

5. Start extraction - Run the task and get data

  • Click Save

  • Click Run on the upper left side

  • Select Run on your device to run the task on your computer, or select Run task in the Cloud to run the task in the Cloud (for premium users only)

089.png

Here is some sample data for your reference:

mceclip2__1_.png