Scrape posts from LinkedIn
LinkedIn is a good resource for information about different companies. In this tutorial, we will show you how to scrape posts from LinkedIn.com.
To follow along, you can use this example URL:
https://www.linkedin.com/search/results/content/?keywords=octoparse&origin=SWITCH_SEARCH_VERTICAL
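If you want to search for a different keyword, it may help to see how the example URL is put together. The following is a minimal Python sketch that rebuilds it from its query parameters using only the standard library. The helper name `build_search_url` is made up for illustration, and the `origin` value is simply copied from the URL above, since its meaning is not covered in this tutorial.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://www.linkedin.com/search/results/content/"


def build_search_url(keywords: str, origin: str = "SWITCH_SEARCH_VERTICAL") -> str:
    """Return a LinkedIn content-search URL for the given keywords (illustrative helper)."""
    query = urlencode({"keywords": keywords, "origin": origin})
    return f"{BASE}?{query}"


url = build_search_url("octoparse")
print(url)

# Parse the URL back into its parameters to confirm what was encoded.
print(parse_qs(urlsplit(url).query))
```

Changing the `keywords` argument gives a URL you can paste into the search bar in step 1.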
Before you start, please make sure you have downloaded the latest version, Octoparse 8.1 (see this guide to download it: "News: Octoparse 8.1 Beta Released!"). Octoparse 7.3.0 no longer works with LinkedIn.
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - open the target web page
- Log in to the website
- Auto-detect web page data - create a basic task workflow
- Extract data - select data you need to scrape (optional)
- Run the task to get data you need
1. "Go To Web Page" - open the target web page
- Enter the URL into the search bar
- Click "Start" to open a new task
2. Log in to the website
LinkedIn requires you to log in before you can access the data we want. In this tutorial, we will use the "save cookies" method for demonstration.
- Enable "Auto-detect web page data" to help you set up the task
- After detecting, choose "Not the right webpage?"
- Then, choose "Login to website"
After clicking "Login to website", you are in "Browse Mode". You can navigate the page just as you would in a regular browser. The actions you take here will not be added as steps in the task workflow.
If the page shown is the sign-up page, click "Sign in" to go to the sign-in page (you can skip this if you're already on the sign-in page).
- Enter your LinkedIn account details and then click "Sign in" to log in
- You are now logged in to your account. The page redirects to the URL we entered (https://www.linkedin.com/search/results/content/?keywords=octoparse&origin=SWITCH_SEARCH_VERTICAL).
- Click "Done" on the Tips panel
You will see a notice on top saying "Cookies saved".
Then, you can move on to scrape the data you need.
Tips! Octoparse has different ways to deal with data behind a login. To learn how to add log-in steps to the workflow, see this tutorial: Scrape data behind a login
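To give a rough idea of what "saving cookies" means, here is a small Python sketch using the standard `http.cookiejar` module. It is only an illustration of the general idea (store the session cookie after one login, reload it on later runs), not a description of how Octoparse stores cookies internally. The cookie name `session_id`, its value, and the file location are all placeholders.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Build a session-style cookie; every value passed in here is a placeholder."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical cookie file; Octoparse manages its own storage.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# After a successful login the site sets a session cookie, which we save to disk.
jar = MozillaCookieJar(cookie_file)
jar.set_cookie(make_cookie("session_id", "placeholder-value", ".linkedin.com"))
jar.save(ignore_discard=True, ignore_expires=True)

# On a later run, the saved cookies are reloaded instead of logging in again.
restored = MozillaCookieJar(cookie_file)
restored.load(ignore_discard=True, ignore_expires=True)
restored_names = [cookie.name for cookie in restored]
print(restored_names)
```

Because the session cookie is reused, the task can open the target page directly without repeating the sign-in steps, which is why no log-in actions appear in the workflow.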
3. Auto-detect web page data - create a basic task workflow
You can continue with the feature "Auto-detect web page data" on the Tips panel.
- Click "Auto-detect web page data"
- Wait until auto-detection completes (it may take a little longer because this page uses infinite scrolling to load content)
- Click "Edit" under "Add a page scroll" to check whether you need to adjust the number of page scrolls
- Go to "Data preview" to see if you're okay with the current data output
- You can delete unnecessary data fields directly by clicking the icon
- You can also modify the data field names here directly by clicking the icon
- If you're okay with the current data preview, click "Create workflow"
Then, you'll see a workflow generated as below.
Tips! Page scrolling is widely used on different websites. To deal with this type of website, you can either use the "Auto-detect" feature or set up page scrolling on your own by double-clicking the "Go to Web Page" step in the workflow. Check the related tutorials for details.
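To see why the number of page scrolls matters, consider this simplified Python sketch of an infinite-scroll page. It is a simulation with made-up data, not Octoparse's implementation: `fake_feed` stands in for a page that reveals three new posts per scroll, and `scroll_and_collect` is a hypothetical helper that gathers whatever is visible after each scroll.

```python
from typing import Callable, List


def scroll_and_collect(load_batch: Callable[[int], List[str]], scroll_times: int) -> List[str]:
    """Collect items from the initial view plus `scroll_times` scrolls, skipping duplicates."""
    seen = set()
    items: List[str] = []
    for scroll in range(scroll_times + 1):  # scroll 0 is the initial page load
        for item in load_batch(scroll):
            if item not in seen:
                seen.add(item)
                items.append(item)
    return items


def fake_feed(scroll: int) -> List[str]:
    """Stand-in for a page that reveals 3 new posts per scroll (made-up data)."""
    return [f"post-{scroll * 3 + i}" for i in range(3)]


posts = scroll_and_collect(fake_feed, scroll_times=2)
print(len(posts))  # 9
```

With too few scrolls, posts further down the page are never loaded and so cannot be extracted, which is why it is worth checking the scroll setting before creating the workflow.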
4. Extract data - select data you need to scrape (optional)
Now, the workflow is almost complete. You can check the data extracted by "Auto-detect" and see whether you need to add any other fields.
- Double-click "Extract Data" in the workflow to check the details
- If you want to modify field names, just click the field names to edit
- If you want to capture other data on the web page, you can click element(s) inside the area highlighted in red, and then choose "Extract the text of the selected element"
If you need to add some fields like "Current time" or "current Page_URL", click the "+" icon to add from the list.
Tips! To learn more about working with "Extract Data", check the related guides.
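Fields such as "Current time" and "Page_URL" are not read from the page itself; they are attached to every extracted row. The Python sketch below shows the idea on a made-up row. The field names `Author` and `Post`, the output column names, and the helper `add_extra_fields` are assumptions for illustration; your actual field names depend on what you selected in the task.

```python
from datetime import datetime
from typing import Dict, List


def add_extra_fields(rows: List[Dict[str, str]], page_url: str) -> List[Dict[str, str]]:
    """Return copies of the rows with 'Current_time' and 'Page_URL' columns added."""
    stamp = datetime.now().isoformat(timespec="seconds")
    return [{**row, "Current_time": stamp, "Page_URL": page_url} for row in rows]


# Made-up example row; real rows come from the "Extract Data" step.
rows = [{"Author": "Example Author", "Post": "Example post text"}]
url = "https://www.linkedin.com/search/results/content/?keywords=octoparse&origin=SWITCH_SEARCH_VERTICAL"

enriched = add_extra_fields(rows, url)
print(enriched[0]["Page_URL"])
```

Recording the page URL and extraction time with each row makes it easier to trace where and when a post was collected once the data is exported.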
5. Run the task to get data you need
- Click the "Save" button
- Click the "Run" button, and then choose "Run task on your device"
Here is some sample data for your reference.
Tips! A LinkedIn task can only be run on your local device. It cannot run in the Cloud because of LinkedIn's anti-scraping settings.
Tutorial in Spanish: Scrapear posts de LinkedIn
You can also read more web scraping articles on the official website
Is this article helpful? Contact us anytime if you need our help!
Author: Vanny