You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use and more robust. Download and upgrade here if you haven't already done so!

In this tutorial, we will show you how to scrape posts from LinkedIn.com. To follow along, you can use this example URL: https://www.linkedin.com/search/results/content/?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi
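
The example URL carries the search term in its keywords query parameter. If you want to follow the tutorial with a different search term, the small Python sketch below shows one way to rebuild the URL. It assumes that changing keywords is enough to get a different content search, which is based only on how the example URL looks; the function name search_url_for is hypothetical and not part of Octoparse, and the other parameters are simply kept as they are.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

# The example URL from this tutorial.
TUTORIAL_URL = (
    "https://www.linkedin.com/search/results/content/"
    "?keywords=google&origin=GLOBAL_SEARCH_HEADER&sid=DIi"
)


def search_url_for(keyword: str, base_url: str = TUTORIAL_URL) -> str:
    """Return base_url with its `keywords` query parameter replaced.

    Assumption: only `keywords` needs to change; every other
    parameter is copied over unchanged.
    """
    parts = urlsplit(base_url)
    params = parse_qs(parts.query)      # dict of parameter name -> list of values
    params["keywords"] = [keyword]      # swap in the new search term
    new_query = urlencode(params, doseq=True)  # URL-encodes spaces and symbols
    return urlunsplit(parts._replace(query=new_query))


if __name__ == "__main__":
    print(search_url_for("web scraping"))
```

The resulting URL can be pasted into Octoparse in step 1 in place of the example URL.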

Here are the main steps in this tutorial: [Download task file here]

  1. Go to Web Page - to open the target webpage

  2. Log in to the website - to access the data

  3. Auto-detect webpage - to create the workflow

  4. Modify the XPath of Loop Item - to locate more posts

  5. Run the task - to get the data

1. Go to Web Page - to open the target webpage

  • Paste the URL and click Start

mceclip0.png

2. Log in to the website - to access the data

  • Click on the Sign In button and choose Click URL to go to the login page

Sign_in.jpg
  • After the login page is loaded, click on the Email input box and choose Enter text

Enter_email.jpg
  • Enter your LinkedIn email address and confirm

Confirm_email.jpg
  • Click the password input box, select Enter text, enter your password and confirm

  • Click the Sign in button and choose the Click button

Click_sign_in.jpg
  • Set the AJAX timeout to 10s

AJAX_timeout.jpg

3. Auto-detect webpage - to create the workflow

  • Select Auto-detect web page data

autodetect.jpg
  • Wait for the detection to complete, then click Edit

Edit_scroll.jpg
  • Click Create workflow

mceclip2.png
  • Click on Scroll Page, select Scroll for one screen, then set the Repeat and Wait time values

set_up_scroll_repeats.jpg
  • Go to Data preview, double-click a header to rename the field, or click ... to delete a field

srghesgd.gif

4. Modify the XPath of Loop Item - to locate more posts

LinkedIn pages have a complicated structure, so the auto-generated XPath for the Loop Item does not locate every post. We need to replace it with a manually written XPath.

  • Click on Loop Item and input the XPath //div[contains(@class,'search-results-contain')]//h2/../..

  • Click Apply to confirm

Modify_loop_Xpath.jpg
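
To see what this XPath selects, it helps to read it in three parts: find a div whose class contains search-results-contain, find every h2 inside it, then step up two levels (/../..) to the element that wraps the whole post. The Python sketch below reproduces that traversal on a tiny sample. The markup is hypothetical and heavily simplified, and real LinkedIn pages will differ. Python's built-in ElementTree supports only a subset of XPath without contains(), so the sketch walks the tree by hand instead of evaluating the expression directly; the function name loop_items is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup. Real LinkedIn pages differ.
SAMPLE = """
<html><body>
  <div class="search-results-container">
    <ul>
      <li class="post">
        <div class="header"><h2>Post by Alice</h2></div>
        <p>First post text</p>
      </li>
      <li class="post">
        <div class="header"><h2>Post by Bob</h2></div>
        <p>Second post text</p>
      </li>
    </ul>
  </div>
  <div class="sidebar"><div><h2>Promoted</h2></div></div>
</body></html>
"""


def loop_items(root):
    """Mimic //div[contains(@class,'search-results-contain')]//h2/../.."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    items = []
    for div in root.iter("div"):
        # contains(@class, 'search-results-contain')
        if "search-results-contain" not in div.get("class", ""):
            continue
        # //h2 : every h2 nested anywhere inside the matching div
        for h2 in div.iter("h2"):
            parent = parent_of.get(h2)                      # first  /..
            grandparent = parent_of.get(parent) if parent is not None else None  # second /..
            if grandparent is not None and grandparent not in items:
                items.append(grandparent)
    return items


root = ET.fromstring(SAMPLE.strip())

if __name__ == "__main__":
    for item in loop_items(root):
        print(item.find(".//h2").text, "|", item.find("p").text)
```

In this sample the two post elements are selected and the h2 in the sidebar is ignored, because it does not sit inside the results container. That is the same reason the XPath anchors on the container class before looking for h2 elements.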

5. Run the task - to get the data

  • Click Run in the top right corner

  • Select Run on your device to run the task on your local device.

NOTE: We do not recommend running LinkedIn tasks in the Cloud, because LinkedIn may detect that the login comes from an unfamiliar IP address and flag it as suspicious.

mceclip7.png

Here is the sample output:

mceclip1.png