Scrape public posts from Facebook
Facebook is a platform with a huge amount of user-generated content, and there is a lot you can do with its data. It can help you better understand your audience for business or political purposes. You can also collect posts from users or groups, along with their comments, to carry out sentiment analysis.
With Octoparse templates, you can easily get post information from Facebook without configuring a scraping task. Just enter the keywords or URLs and wait for the data to be scraped. For further details, see: Task Templates
If you want to configure the task from scratch, you can follow the tutorial below. We will show you how to scrape the public posts of a Facebook account. You can use the following URL as an example:
https://www.facebook.com/cnn/
Here are the 5 main steps in this tutorial [Download task file here]
- "Go To Web Page" - open the target website
- Auto-detect web page - create the workflow
- Modify the XPath of the "Loop Item"
- Modify the settings of "Extract Data"
- Run your task - get data you want
1) "Go To Web Page" - open the target website
- Enter the URL on the home page and click "Start"
Octoparse will automatically load the page in the built-in browser. Scroll down the page manually and a pop-up will appear.
- Switch to Browse mode by clicking
- Click "Not Now" to close the pop-up
- Turn off Browse mode
Tips! If you would like to log in to see more information, follow the tutorial on how to log in to a website in Octoparse.
2) Auto-detect the web page - create the workflow
- Click "Auto-detect web page data" and wait for the detection to complete (it may take a bit longer since this page uses infinite scrolling to load content)
- Uncheck the option of "Click on Load More button"
- Click "Edit" under "Add a page scroll"
- Set it to scroll to the bottom of the page, repeat 20 times, with a wait time of 5s
- Rename or delete fields on the Data preview if needed
3) Modify the XPath of the "Loop Item"
- Enter the "Loop Item" action settings page by clicking on the gear button on the action bar
- Enter the XPath //div[@role="article"][not(@aria-label="Comment")]/../..
- Click "OK" to save the settings.
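To make the intent of this XPath easier to see, here is a minimal sketch in Python. It is not how Octoparse evaluates XPath internally. The standard-library ElementTree module does not support not() or the full XPath syntax, so the sketch emulates the same selection by hand: take every div with role="article", skip those labelled "Comment", and step up two levels to the container that wraps the whole post. The sample markup, the ids, and the loop_items helper are hypothetical and far simpler than a real Facebook page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup. Real Facebook pages are far more complex.
SAMPLE = """
<div id="feed">
  <div id="post-1">
    <div>
      <div role="article" aria-label="Post by CNN">
        <span>First post</span>
        <div>
          <div>
            <div role="article" aria-label="Comment">A comment</div>
          </div>
        </div>
      </div>
    </div>
  </div>
  <div id="post-2">
    <div>
      <div role="article">
        <span>Second post</span>
      </div>
    </div>
  </div>
</div>
"""


def loop_items(root):
    """Emulate //div[@role="article"][not(@aria-label="Comment")]/../.."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    items = []
    for div in root.iter("div"):
        if div.get("role") != "article":
            continue  # [@role="article"]
        if div.get("aria-label") == "Comment":
            continue  # [not(@aria-label="Comment")]
        parent = parent_of.get(div)  # first ".."
        grandparent = parent_of.get(parent) if parent is not None else None  # second ".."
        if grandparent is not None and grandparent not in items:
            items.append(grandparent)
    return items


root = ET.fromstring(SAMPLE)
for item in loop_items(root):
    print(item.get("id"))
```

In this sample the comment block is skipped, and each loop item is the outer container of one post rather than the article div itself, so fields outside the article div but inside the container stay within reach of "Extract Data".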
Tips! XPath plays an important role in locating the correct elements in Octoparse. You can check the XPath tutorial to learn more about it.
4) Modify the settings of "Extract Data"
The posting time is scraped as a relative value such as "1h", which makes it hard to tell when the post was published. The exact time is stored in the page source, so we can modify the settings to extract it.
- Open the settings of "Extract Data"
- Click the "Customize XPath" button for the "Post_time" field
- Enter the XPath //abbr
- Click the "..." and choose "Customize field"
- Select "Extract attribute"
- Choose the "title" attribute from the drop-down menu
- Click "OK" to confirm
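The idea behind these settings can be sketched in a few lines of Python: the visible text of the abbr element holds the relative time, while its title attribute holds the full timestamp. This is only an illustration using the standard-library ElementTree module, assuming the XPath is evaluated relative to each loop item. The sample markup, the timestamp value, and the extract_post_time helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one loop item; the timestamp value is invented.
POST = """
<div id="post-1">
  <span>CNN</span>
  <a href="#"><abbr title="Monday, January 4, 2021 at 9:30 AM">1h</abbr></a>
</div>
"""


def extract_post_time(post):
    """Return (visible text, title attribute) of the first abbr in the post."""
    abbr = post.find(".//abbr")  # relative counterpart of the //abbr XPath
    if abbr is None:
        return None, None
    # abbr.text is what the default extraction sees; "title" holds the detail.
    return abbr.text, abbr.get("title")


relative_time, exact_time = extract_post_time(ET.fromstring(POST))
print(relative_time)
print(exact_time)
```

Choosing "Extract attribute" with "title" corresponds to the abbr.get("title") call here, while the default text extraction corresponds to abbr.text.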
5) Run your task - get data you want
Here is the sample output.
Spanish tutorial: Scrapear posts públicos desde Facebook
You can also read more web scraping articles on the official website
Author: Fergus
Editor: Yina