Scrape job info from Glassdoor
Glassdoor is one of the world's leading platforms for insights about jobs and companies, aimed at helping people find suitable employment.
This tutorial will show you how to scrape job information from glassdoor.com.
To follow along with this tutorial, you can use the URL below:
https://www.glassdoor.com/Job/us-marketing-manager-jobs-SRCH_IL.0,2_IN1_KO3,20.htm
Note: If you want to check whether your workflow works correctly, please download the OTD file for this case at the bottom of this page.
Here are the main steps of this tutorial:
- Create a Go to Web Page - to open the target website
- Auto-detect the webpage - to create a workflow
- Modify the XPath of the data fields - to locate the fields accurately
- Click on each link - to get detailed information
- Create an Extract Data - to add custom data fields for detailed job info
- Run the task - to get your desired data
1. Create a Go to Web Page - to open the target website
- Enter the target URL into the search bar on the home screen and click Start
2. Auto-detect the webpage - to create a workflow
- Click Auto-detect web page data in Tips and wait for the detection to complete
- Check the data fields in Data preview and delete unwanted fields or rename them if needed
- Click Create workflow
3. Modify the XPath of the data fields - to locate the fields accurately
The auto-generated XPath for some fields needs to be modified so that Octoparse extracts the correct data.
- Click the icon next to the data field to change its settings
- Choose Customize XPath
- Input the Matching XPath
- Click Apply to save the change
Use the following Matching XPath for each field:
- Job Title: //a[@data-test="job-link"]
- Company: //div[contains(@class, "align-items-start")]/a
- Location: //a[@data-test="job-link"]/following-sibling::div[1]
- Salary: //span[@data-test="detailSalary"]
- Rating: //a[@class='jobLink']/following-sibling::span
- Post Date: //div[@data-test="job-age"]
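To sanity-check expressions like these outside Octoparse, a short Python sketch can run them against a small sample. The HTML fragment below is hypothetical (invented for illustration, not Glassdoor's real markup), and `extract_fields` is an illustrative helper name. The standard-library `xml.etree.ElementTree` module supports only a subset of XPath, so only the attribute-based expressions (Job Title, Salary, Post Date) are shown; the expressions that use `contains()` or `following-sibling::` would need a fuller XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one job card (not Glassdoor's real HTML).
SAMPLE = """
<ul>
  <li>
    <a data-test="job-link">Marketing Manager</a>
    <span data-test="detailSalary">$80K - $100K</span>
    <div data-test="job-age">3d</div>
  </li>
</ul>
"""

# Field names and XPaths taken from the list above.
# ElementTree expects a leading "." for searches relative to the root element.
FIELD_XPATHS = {
    "Job Title": './/a[@data-test="job-link"]',
    "Salary": './/span[@data-test="detailSalary"]',
    "Post Date": './/div[@data-test="job-age"]',
}


def extract_fields(markup: str) -> dict:
    """Return the text of the first match for each field, or None if absent."""
    root = ET.fromstring(markup)
    row = {}
    for name, xpath in FIELD_XPATHS.items():
        node = root.find(xpath)
        row[name] = node.text.strip() if node is not None and node.text else None
    return row


row = extract_fields(SAMPLE)
print(row)
```

A field that comes back as `None` in a check like this suggests the XPath no longer matches the markup, which is the same symptom as an empty column in the Octoparse data preview.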
4. Click on each link - to get detailed information
You may also need extra information about each job, such as responsibilities and requirements. To get it, the next step is to click each link in the job list and load its details.
- Click on the first item in the job list
- Choose Click element in Tips
- Set an appropriate AJAX timeout (7-10 s is recommended)
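In general terms, a timeout of this kind bounds how long a scraper waits for content that loads after a click. The Python sketch below illustrates that generic idea only; it is not a description of how Octoparse works internally, and `wait_until` and `content_loaded` are hypothetical names.

```python
import time


def wait_until(condition, timeout_s: float, poll_s: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    return condition()  # one last check at the deadline


# Hypothetical stand-in for "the job detail content has finished loading".
calls = {"n": 0}


def content_loaded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3  # pretend the content appears on the third check


loaded = wait_until(content_loaded, timeout_s=1.0)
never = wait_until(lambda: False, timeout_s=0.2)
print(loaded, never)
```

A timeout that is too short risks moving on before the detail content exists, while one that is too long slows every click in the loop, which is why a middle value such as the recommended 7-10 s is a reasonable starting point.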
5. Create an Extract Data - to add custom data fields for detailed job info
- Click the icon in the workflow to add a step
- Click Extract Data
- Click Add Custom Field in the Data Preview
- Click Capture data on the page
- Input the field name as: Job_detail
- Choose Absolute XPath
- Tick Absolute XPath and input Matching XPath as: //div[@class="jobDescriptionContent desc"]
- Click Confirm to save the settings
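As a rough check of what the `Job_detail` XPath selects, the same expression can be tried in Python on a small sample. The fragment below is hypothetical, well-formed markup written for illustration (real pages are larger and may not parse as XML), and `extract_job_detail` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical detail fragment (not Glassdoor's real HTML).
DETAIL = """
<section>
  <div class="jobDescriptionContent desc">
    <p>Lead campaign planning.</p>
    <ul><li>5+ years of experience</li></ul>
  </div>
</section>
"""


def extract_job_detail(markup: str):
    """Return the flattened text of the job description block, or None."""
    root = ET.fromstring(markup)
    # Same expression as in the step above, with the leading "." ElementTree expects.
    node = root.find('.//div[@class="jobDescriptionContent desc"]')
    if node is None:
        return None
    # itertext() walks nested tags; join the pieces and collapse whitespace.
    return " ".join(" ".join(node.itertext()).split())


job_detail = extract_job_detail(DETAIL)
print(job_detail)
```

Note that `@class="jobDescriptionContent desc"` is an exact match on the whole attribute value, so the expression stops matching if the page adds, removes, or reorders class names.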
6. Run the task - to get your desired data
Before running the task, you will see a workflow created like the one below:
- Click Save on the upper right to save your task
- Click Run next to it and wait for a Run Task window to pop up
- Select Run on your device to run the task on your local device
- Wait for the task to complete
Here is a sample output from a local run:
If you have further issues with the task or have a suggestion that would make this a better resource for you, we’d love to hear about it. Submit a request here.
Author: Cassie
Editor: Yina