You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, since the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

Glassdoor is one of the world's leading platforms for insights about jobs and companies, aimed at helping people find suitable employment.

This tutorial will show you how to scrape job information from glassdoor.com.

glassdoor.jpg

To follow along with this tutorial, you may want to use the URL below:

https://www.glassdoor.com/Job/us-marketing-manager-jobs-SRCH_IL.0,2_IN1_KO3,20.htm

Here are the main steps of this tutorial: [Download task file here]

  1. Create a Go to Web Page - to open the target website

  2. Auto-detect the webpage - to create a workflow

  3. Modify the XPath of the data fields - to locate the fields accurately

  4. Click on each link - to get detailed information

  5. Create an Extract Data - to add custom data fields for detailed job info

  6. Run the task - to get your desired data

1. Create a Go to Web Page - to open the target website

  • Enter the target URL into the search bar on the home screen and click Start

go_to_webpage.jpg

2. Auto-detect the webpage - to create a workflow

  • Click Auto-detect web page data in Tips and wait for the detection to complete

auto_detect.jpg
  • Check the data fields in Data preview and delete unwanted fields or rename them if needed

data_preview.jpg
  • Click Create workflow

create.jpg

3. Modify the XPath of the data fields - to locate the fields accurately

The auto-generated XPaths of some fields need to be modified so that Octoparse extracts accurate data.

  • Click the More button next to the data field to change its settings

  • Choose Customize XPath

xpath.jpg
  • Input the Matching XPath

  • Click Apply to save the change

xpath_page.jpg

We have prepared the XPaths for the fields for you. You can copy and paste them into Octoparse. Enjoy!

  • Job Title: //a[@data-test="job-link"]

  • Company: //div[contains(@class, "align-items-start")]/a

  • Location: //a[@data-test="job-link"]/following-sibling::div[1]

  • Salary: //span[@data-test="detailSalary"]

  • Rating: //a[@class='jobLink']/following-sibling::span

  • Post Date: //div[@data-test="job-age"]
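
To sanity-check expressions like these before pasting them in, it can help to run them against a small sample. The sketch below uses Python's standard-library xml.etree.ElementTree on a hypothetical, simplified job card (the markup and values are assumptions for illustration, not Glassdoor's real HTML). ElementTree supports only a subset of XPath, so the attribute-based expressions run directly (with a leading `.`), while `contains(...)` and `following-sibling::div[1]` are emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified job card; the real page markup will differ.
SAMPLE = """
<li>
  <div class="d-flex align-items-start"><a>Acme Corp</a></div>
  <div>
    <a data-test="job-link">Marketing Manager</a>
    <div>Austin, TX</div>
  </div>
  <span data-test="detailSalary">$80K - $110K</span>
  <div data-test="job-age">3d</div>
</li>
""".strip()

root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}


def text_of(node):
    """Return the stripped text of a node, or None if it is missing or empty."""
    return node.text.strip() if node is not None and node.text else None


# Attribute predicates are supported directly.
job_link = root.find('.//a[@data-test="job-link"]')
job_title = text_of(job_link)
salary = text_of(root.find('.//span[@data-test="detailSalary"]'))
post_date = text_of(root.find('.//div[@data-test="job-age"]'))

# Emulates //a[@data-test="job-link"]/following-sibling::div[1]:
# the first <div> that comes after the link under the same parent.
siblings = list(parents[job_link])
later = siblings[siblings.index(job_link) + 1:]
location = text_of(next((s for s in later if s.tag == "div"), None))

# Emulates //div[contains(@class, "align-items-start")]/a:
# a direct <a> child of a <div> whose class attribute contains the substring.
company = None
for div in root.iter("div"):
    if "align-items-start" in div.get("class", ""):
        company = text_of(div.find("a"))
        break

print(job_title, company, location, salary, post_date, sep=" | ")
```

If a field comes back as None in a check like this, the expression does not match the sample, which is a hint to re-inspect the page structure before relying on it in Octoparse.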

4. Click on each link - to get detailed information

Sometimes you may need extra information about a job, such as its responsibilities and requirements. To get it, the next step is to click each link in the job list and open the detail view.

  • Click on the first item in the job list

  • Choose Click element in Tips

click.jpg
  • Set an appropriate AJAX timeout (7-10 seconds is recommended)

ajax.jpg

Note: If you are interested in how Octoparse handles AJAX websites, please check it out here.
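
As a rough mental model only, a timeout puts an upper bound on how long to wait for content that loads after the click. The Python sketch below polls a condition until it holds or the time budget runs out. It is not Octoparse's implementation, and the names `wait_until` and `detail_loaded` are hypothetical.

```python
import time


def wait_until(condition, timeout_s, poll_interval_s=0.05):
    """Poll condition() until it returns True or timeout_s seconds elapse.

    Returns True if the condition was met in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Simulated page whose detail panel "arrives" 0.2 s after the click.
clicked_at = time.monotonic()


def detail_loaded():
    return time.monotonic() - clicked_at >= 0.2


# A generous budget: the content arrives before the deadline.
loaded_in_time = wait_until(detail_loaded, timeout_s=1.0)

# A budget that is too short: the wait gives up before the content arrives.
clicked_at = time.monotonic()
short_wait_result = wait_until(detail_loaded, timeout_s=0.05)

print(loaded_in_time, short_wait_result)
```

The second call shows the risk of a timeout that is too short: the workflow would move on before the job details are on the page, which is why a value at the higher end of the recommended range is the safer choice on slow connections.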

5. Create an Extract Data - to add custom data fields for detailed job info

  • Click the Add step button to add a step in the workflow

  • Click Extract Data

extract_data.jpg
  • Click Add Custom Field in the Data Preview

  • Click Capture data on the page

add_field.jpg
  • Input the field name as: Job_detail

  • Tick Absolute XPath and input the Matching XPath as: //div[@class="jobDescriptionContent desc"]

  • Click Confirm to save the settings

data_field.jpg
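
One detail worth noting about the Matching XPath in this step: `@class="jobDescriptionContent desc"` compares the whole class attribute as a single string, so it matches only when the attribute is exactly that text. The sketch below illustrates this with Python's standard-library xml.etree.ElementTree on hypothetical markup (an assumption for illustration, not the real page; ElementTree needs a leading `.` on the expression).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: three <div> elements with similar class attributes.
PAGE = """
<body>
  <div class="jobDescriptionContent desc">Plan and run campaigns.</div>
  <div class="desc jobDescriptionContent">Same classes, different order.</div>
  <div class="jobDescriptionContent desc extra">One more class.</div>
</body>
""".strip()

root = ET.fromstring(PAGE)

# Exact string comparison on the class attribute: only the first <div> matches.
matches = root.findall('.//div[@class="jobDescriptionContent desc"]')
job_detail = [m.text for m in matches]

print(job_detail)
```

If the Job_detail field comes back empty, a changed class attribute on the page is one plausible cause to check first.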

6. Run the task - to get your desired data

Before running the task, you will see a workflow created like the one below:

workflow.jpg
  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

Here is a sample output from a local run:

glassdoor_data.jpg