You are browsing a tutorial for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use and more robust! Download and upgrade here if you haven't already done so!
In this tutorial, we will show you how to scrape posts from LinkedIn.com. To follow along, you can use this URL: https://www.linkedin.com/search/results/content/?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi
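If you want to scrape posts for a different search term, one option is to change the keywords value in the URL before pasting it into Octoparse. The short Python sketch below shows one way to do that. It assumes that the keywords query parameter is what controls the search term, as the sample URL suggests; the helper name with_keyword is hypothetical, and the other parameters (origin, sid) are simply carried over unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The sample URL used in this tutorial.
TUTORIAL_URL = (
    "https://www.linkedin.com/search/results/content/"
    "?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi"
)


def with_keyword(url, keyword):
    """Return url with its 'keywords' query parameter replaced (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every parameter in its original order; only swap the search term.
    query = [
        (name, keyword if name == "keywords" else value)
        for name, value in parse_qsl(parts.query)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


new_url = with_keyword(TUTORIAL_URL, "data science")
print(new_url)
```

The printed URL can then be pasted into Octoparse in step 1 in place of the sample URL.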
Here are the main steps in this tutorial: [Download task file here]
1. Go to Web Page - to open the target webpage
Paste the URL and click Start
2. Log in to the website - to access the data
Click on the Sign In button and choose Click URL to go to the log-in page
After the login page is loaded, click on the Email input box and choose Enter text
Input the LinkedIn Email address and confirm
Click the password input box, select Enter text, input the password and confirm
Click the Sign in button and choose the Click button
Set the AJAX timeout to 10s
3. Auto-detect webpage - to create the workflow
Select Auto-detect web page data
Wait for the detection to complete, then click Edit
Click Create workflow
Click on Scroll Page, set Scroll to one screen, then set the Repeat and Wait time values
Go to Data preview, double-click on a header to rename the field, or click ... to delete a field
4. Modify the XPath of Loop Item - to locate more posts
LinkedIn pages are quite complicated, so the auto-generated XPath can miss some posts. We need to update the XPath manually.
Click on Loop Item and input the XPath //div[contains(@class,'search-results-contain')]//h2/../..
Click Apply to confirm
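To see what this XPath selects, here is a minimal Python sketch that mimics it using only the standard library. The XPath finds every h2 inside the search results container and then steps up two levels (/../..), so each loop item is the element two levels above a post's heading. The sample markup is hypothetical and heavily simplified; real LinkedIn HTML is different, and the function name loop_items is only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real LinkedIn pages are far more complex.
SAMPLE = """
<html><body>
  <div class="search-results-container">
    <ul>
      <li class="post"><div class="header"><h2>Post A</h2></div><p>Text A</p></li>
      <li class="post"><div class="header"><h2>Post B</h2></div><p>Text B</p></li>
    </ul>
  </div>
  <div class="sidebar"><section><div><h2>Not a post</h2></div></section></div>
</body></html>
"""


def loop_items(root):
    """Mimic //div[contains(@class,'search-results-contain')]//h2/../.."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    items = []
    for div in root.iter("div"):
        # contains(@class, ...) is a plain substring match on the class value.
        if "search-results-contain" not in div.get("class", ""):
            continue
        for h2 in div.iter("h2"):  # //h2: any h2 below the container
            grandparent = parent.get(parent.get(h2))  # /../..: go up two levels
            if grandparent is not None and grandparent not in items:
                items.append(grandparent)
    return items


root = ET.fromstring(SAMPLE)
posts = loop_items(root)
for post in posts:
    print(post.tag, post.get("class"), post.find(".//h2").text)
```

In this sample, the two li elements are returned as loop items, while the h2 in the sidebar is ignored because it sits outside the search results container.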
5. Run task - to get the data
Click Run in the top right corner
Select Run on your device to run the task on your local device.
NOTE: We do not recommend running LinkedIn tasks in the Cloud, because the website may flag the login as coming from a suspicious IP.
Here is the sample output -