LinkedIn is a valuable source of job information. In this tutorial, we will show how to scrape job information from LinkedIn.com.
To follow along, you may want to use the URL in the tutorial:
We will scrape data such as the job title, company, level, type, function, and industry in Octoparse.
Before that, please make sure that you have downloaded our latest version, Octoparse 8.1 (see the guide "News: Octoparse 8.1 Beta Released!" for the download). Octoparse 7.3.0 no longer works with LinkedIn.
The website uses infinite scroll combined with a "Show More" button to load more jobs. After we scroll to the bottom of the page about 6 times, a "Show More" button appears, and we have to click it to continue loading jobs.
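To make the loading pattern above concrete, here is a minimal sketch that plans how many scrolls and button clicks are needed to load a given number of jobs. It is only an illustration and not how Octoparse works internally. The function name `plan_loading_actions` and the batch size of 25 jobs per load are assumptions; only the figure of about 6 scrolls before the button appears comes from this tutorial.

```python
def plan_loading_actions(target_jobs, jobs_per_load=25, scrolls_before_button=6):
    """Return the list of actions needed to load at least target_jobs.

    jobs_per_load is an assumed batch size; check the real page before relying on it.
    """
    actions = []
    loaded = jobs_per_load  # assume the first batch is present when the page opens
    while loaded < target_jobs:
        if len(actions) < scrolls_before_button:
            actions.append("scroll")  # infinite scroll still loads new jobs
        else:
            actions.append("click")   # the button has replaced scrolling
        loaded += jobs_per_load
    return actions


if __name__ == "__main__":
    plan = plan_loading_actions(250)
    print(plan.count("scroll"), "scrolls,", plan.count("click"), "clicks")
```

The number of "click" actions in the plan is a rough guide for how many clicks to configure later in the workflow.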
Here are the main steps in this tutorial. [Download the demo task from here]
- "Go To Web Page" - open the targeted web page
- Auto-detect web page - create a workflow
- Loop click "Show More" button - load more jobs
- Click into each link to get more detailed information
- Extract data - select the data for extraction
- Start extraction - run the task and get data
1. "Go To Web Page" - open the targeted web page
- Enter the URL on the home page and click Start
2. Auto-detect web page - create a workflow
- Choose "Auto-detect the web page data"
- Wait for the detection to complete
- Check the data fields in the Data Preview, and delete unwanted fields or rename fields if needed
- Click "Edit" under the "Add page scroll" option on the Tips panel
- Set the wait time to 4-5 seconds (make sure it is long enough for the page to load new jobs)
- Click "Create workflow" on the Tips panel
3. Loop click "Show More" button - load more jobs
- Choose "Click on a 'Load More' button" on the Tips panel
- Select the "See more jobs" button on the web page
- Set the number of clicks according to how many jobs you need
- Click "Confirm"
- Set AJAX Load to 5s
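The idea behind a click loop with an AJAX timeout can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not Octoparse's implementation: `click_load_more` and `FakePage` are hypothetical names, and the fake page simply stands in for a browser so the example can run on its own.

```python
import time


def click_load_more(click, count_items, max_clicks, ajax_timeout=5.0, poll=0.1):
    """Click up to max_clicks times, waiting up to ajax_timeout seconds after
    each click for new items. Returns the number of clicks that loaded items."""
    successful = 0
    for _ in range(max_clicks):
        before = count_items()
        click()
        deadline = time.monotonic() + ajax_timeout
        # Poll until new items show up or the timeout expires.
        while count_items() <= before and time.monotonic() < deadline:
            time.sleep(poll)
        if count_items() <= before:
            break  # nothing new arrived in time, so stop clicking
        successful += 1
    return successful


class FakePage:
    """Stand-in for a real page: each click reveals one more batch until none remain."""

    def __init__(self, batches_left, batch_size=25):
        self.items = batch_size
        self.batches_left = batches_left
        self.batch_size = batch_size

    def click(self):
        if self.batches_left > 0:
            self.batches_left -= 1
            self.items += self.batch_size

    def count(self):
        return self.items


if __name__ == "__main__":
    page = FakePage(batches_left=2)
    done = click_load_more(page.click, page.count, max_clicks=5, ajax_timeout=0.5)
    print(done, "clicks loaded", page.count(), "jobs")
```

A timeout that is too short makes the loop give up before the new jobs arrive, which is why the wait should be long enough for the page to finish loading.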
4. Click into each link to get more detailed information
- Choose “Click on link(s) to scrape the linked page(s)” on the Tips panel
- Select "Click on an extracted data field" and select the "resultcard__fullcardlink_URL" from the drop-down menu (you can confirm if it's the correct link on the Data Preview)
- Click "Confirm"
- Open the action settings of the "Click URLs in the list" action
- Uncheck the "Open in a new tab" option
- Tick "Load with AJAX" and set the AJAX timeout to 5-7s
- Click "OK" to confirm
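Conceptually, this step visits the URL stored in each extracted row and adds the details from the linked page to that row. The sketch below shows the idea in plain Python. The helper `scrape_linked_pages` and the `fetch_detail` callback are hypothetical; only the field name "resultcard__fullcardlink_URL" comes from the step above.

```python
def scrape_linked_pages(rows, fetch_detail, link_field="resultcard__fullcardlink_URL"):
    """Follow the link stored in each row and merge the detail data into it.

    fetch_detail is any function that takes a URL and returns a dict of fields.
    Rows without a link are skipped.
    """
    results = []
    for row in rows:
        url = row.get(link_field)
        if not url:
            continue  # no link extracted for this row
        detail = fetch_detail(url)
        results.append({**row, **detail})
    return results


if __name__ == "__main__":
    # Made-up rows and a fake fetcher, for illustration only.
    rows = [
        {"title": "Data Engineer", "resultcard__fullcardlink_URL": "https://example.com/job/1"},
        {"title": "No link here"},
    ]
    fake_fetch = lambda url: {"industry": "Software", "source": url}
    print(scrape_linked_pages(rows, fake_fetch))
```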
5. Extract data - select the data for extraction
- Click on the data you want to extract on the page
- Select "Extract the text of the selected element" on the "Tips" panel
- Repeat these steps until you have selected all the data you need
- Edit the data field names if needed
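Extracting the text of a selected element roughly means taking the visible text inside that element and storing it under a field name. Here is a small sketch of that idea using only Python's standard library. The class names in the sample HTML ("job-title", "company", "level") are made up for illustration and are not LinkedIn's real markup; `FieldExtractor` and `extract_fields` are hypothetical names as well.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class maps to a field name."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field
        self.fields = {}
        self._current = None  # field currently being captured
        self._depth = 0       # nesting depth inside the captured element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.class_to_field:
                self._current = self.class_to_field[cls]
                self._depth = 1
                self._chunks = []
                break

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the pieces and collapse runs of whitespace.
            self.fields[self._current] = " ".join("".join(self._chunks).split())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def extract_fields(html, class_to_field):
    parser = FieldExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    return parser.fields


SAMPLE = """
<div>
  <h1 class="job-title">Data <b>Engineer</b></h1>
  <span class="company">Example Corp</span>
  <span class="level">  Mid-Senior level </span>
</div>
"""

if __name__ == "__main__":
    mapping = {"job-title": "title", "company": "company", "level": "level"}
    print(extract_fields(SAMPLE, mapping))
```

The mapping from class name to field name plays the same role as renaming data fields in the last bullet above.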
6. Start extraction - run the task and get data
- Click "Save"
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer, or select "Run task in the cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.