You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!

LinkedIn is a rich source of valuable job information. In this tutorial, we will show how to scrape job listings from LinkedIn.com.

To follow along, you can use the URL from this tutorial:

https://www.linkedin.com/jobs/search/?currentJobId=2011756127&geoId=105080838&keywords=accountant&location=New%20York%2C%20United%20States
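The search URL itself encodes the query: the `keywords` parameter holds the job title and `location` holds the search region, so changing them is one way to target a different search. As a quick sanity check, the tutorial URL can be decomposed with Python's standard library:

```python
# Parse the tutorial's LinkedIn job-search URL to see how the query
# parameters encode the search terms.
from urllib.parse import urlparse, parse_qs

url = ("https://www.linkedin.com/jobs/search/?currentJobId=2011756127"
       "&geoId=105080838&keywords=accountant"
       "&location=New%20York%2C%20United%20States")

# parse_qs returns a dict mapping each parameter to a list of values,
# decoding percent-escapes like %20 (space) and %2C (comma) on the way.
params = parse_qs(urlparse(url).query)
print(params["keywords"][0])   # accountant
print(params["location"][0])   # New York, United States
```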

We will scrape data such as job titles, companies, levels, types, functions, and industries in Octoparse.

The website uses infinite scroll combined with a "Show More" button to load more jobs. After we scroll the page to the bottom about 6 times, a "Show More" button appears, and to continue loading jobs, we have to click it.

Here are the main steps in this tutorial. [Download the demo task here]

  1. "Go To Web Page" - to open the targeted web page

  2. Set up scroll settings - to scroll down the page

  3. Auto-detect web page - to create a workflow

  4. Click on each link - to get more detailed information

  5. Extract data - to select the data for extraction

  6. Modify the XPath of the Loop Item - to locate the show more jobs button

  7. Start extraction - to run the task and get data


1. "Go To Web Page" - to open the targeted web page

  • Enter the URL on the home page and click Start

mceclip0.png

2. Set up scroll settings - to scroll down the page

Since the page needs to be scrolled down 6 times before the Show More button appears, you need to configure scroll settings for the Go to Web Page action.

scroll_settings.jpg

3. Auto-detect web page - to create a workflow

You can use auto-detection to scrape the list of jobs.

  • Choose Auto-detect web page data

auto-detection.jpg
  • Wait for the detection to complete

  • Check the data fields in the Data Preview and delete the unwanted fields or rename fields if needed

rename.jpg
  • Uncheck Add a page scroll from the Tips panel

  • Click Create workflow

create_workflow.jpg

4. Click on each link - to get more detailed information

If you want to scrape job details from each job post, you need to click on each job URL to load the details page.

  • Choose Click on link(s) to scrape the linked page(s) on the Tips panel

  • Select Click on an extracted data field and select the basecard__fulllink_URL from the drop-down menu (you can confirm if it's the correct link on the Data Preview)

  • Click Confirm

mceclip1.gif
  • Go to the settings of Click URLs in the list

  • Click Options tab

  • Uncheck the Open in a new tab option

  • Tick Load with AJAX and set the AJAX timeout to 5-7s

  • Click Apply to confirm

click_URLs.jpg

5. Extract data - to select the data for extraction

  • Click on any text information you want to extract from the page

  • Select Extract the text of the selected element on the Tips panel

  • Repeat the steps until you get all the data needed to be scraped

Extract_data.jpg
  • Edit the name of the data fields if needed

rename_fields.jpg
  • Uncheck the Extract data in the loop option

extract_loop.jpg
  • Set the wait time to 7s

wait_time.jpg

6. Modify the XPath of the Loop Item - to locate the show more jobs button

  • Click on Loop Item

  • Replace the Matching XPath with //button[@aria-label="Load more results"]

  • Click Apply to save

Load_more_button.jpg
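The XPath `//button[@aria-label="Load more results"]` selects any button whose `aria-label` attribute equals exactly "Load more results". The sketch below demonstrates this matching against a simplified, hypothetical stand-in for LinkedIn's actual markup (Python's ElementTree spells the document-wide search as `.//`, but the predicate works the same way as in Octoparse):

```python
# Demonstrate how the Loop Item XPath matches the "Show More" button.
# The HTML snippet is a hypothetical simplification, not LinkedIn's
# real page source.
import xml.etree.ElementTree as ET

html = """
<body>
  <ul class="jobs-list">
    <li>Accountant - Example Co</li>
  </ul>
  <button aria-label="Load more results">See more jobs</button>
</body>
"""

root = ET.fromstring(html)
# [@aria-label='Load more results'] keeps the match to the one button
# with that exact attribute value, so the loop clicks the right element.
button = root.find(".//button[@aria-label='Load more results']")
print(button.text)  # See more jobs
```

If the site's markup changes (e.g. the button's `aria-label` text is updated), the XPath stops matching and the task stops loading more jobs, so this attribute value is worth re-checking when a run returns fewer results than expected.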

7. Start extraction - to run the task and get data

  • Click Save, and click Run on the upper right side

  • Select Run on your device to run the task on your computer

TIP: Please don't run the task in the Cloud since LinkedIn requires login when it detects suspicious IPs.

Here is the sample output.

mceclip0.png