Scrape post data from Instagram
Instagram is a popular photo and video sharing social media website. In this tutorial, you will learn how to create a crawler that scrapes post content, photo URLs, number of likes, and more from an Instagram account.
You can also go to "Task Templates" on the main screen of Octoparse and start with the ready-to-use Instagram Template directly to save time. With this template, there is no need to configure scraping tasks. For further details, see: Task Templates
If you would like to know how to build the task from scratch, you may continue reading the following tutorial. To illustrate, we will use this URL as an example: https://www.instagram.com/izkiz/
You will need to log in to Instagram before viewing other accounts' posts, so please prepare an account before starting.
The main steps in this tutorial are listed below. [Download demo task file here]
- "Go to Web Page" - Open the target web page
- Log in to your Instagram account
- "Extract Data" - extract basic information of the poster
- "Click Item" - click open the first post
- "Extract Data1" - extract post data
- "Pagination" - click the next page button to scrape more posts
- Start extraction - run the task and get data
1. Go to Web Page - Open the target web page
- Enter the URL on the home page and click Start
2. Log in to your Instagram account
Instagram requires people to log in before they can access the data we want. In this tutorial, we will save the cookies of a logged-in session so that the task can access Instagram.
- Switch to Browse mode by clicking the Browse mode toggle
- Enter your Instagram account and password on the web page manually
- Click "Log In"
- Click the icon on the "Go to Web Page" action
- Tick "Use Cookie"
- Click "Use cookie from the current page"
- Click "OK" to confirm
Tips! Octoparse has different ways to deal with data behind a login. To add log-in steps to the workflow instead, see this tutorial: Scrape data behind a login
Note: After saving the cookies, remember to turn off "Browse mode" before continuing with the next steps.
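To make the idea behind "Use Cookie" more concrete: once the cookies of a logged-in session are saved, they can be sent again with later requests so the site treats those requests as part of the same session. The sketch below does not show how Octoparse works internally; it only builds (and does not send) a request carrying a Cookie header, using Python's standard library. The cookie names and values are placeholders assumed for illustration.

```python
from urllib.request import Request

# Hypothetical cookie names and values; real ones come from your own
# logged-in browser session and should be kept private.
saved_cookies = {"sessionid": "PLACEHOLDER", "csrftoken": "PLACEHOLDER"}


def build_cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Build the request object only; nothing is sent over the network here.
request = Request(
    "https://www.instagram.com/izkiz/",
    headers={"Cookie": build_cookie_header(saved_cookies)},
)
print(request.get_header("Cookie"))
```

Because the cookies stand in for your password, anyone holding them could act as your account until they expire, which is a good reason to use a dedicated account for scraping.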
3. "Extract Data" - extract basic information of the poster
- Select information on the web page
- Choose "Extract text of the selected element"
- Repeat the above steps to extract all the data you need
- Rename the fields if needed
4. "Click Item" - click open the first post
- Add a "Click Item" in the workflow
- Click the icon on the "Click Item2"
- Click to edit the XPath and enter: //*[@id="react-root"]/section/main/div/div[3]/article/div[1]/div/div[1]/div[1]/a
- Set the AJAX timeout to 5-7s
- Click "OK" to confirm
The first post will then open automatically.
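To see why this XPath picks out only the first post, it may help to evaluate it against a small document. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports the subset of XPath needed here. `MOCK_PAGE` is a made-up, heavily simplified stand-in for the profile page (an assumption, not Instagram's real markup, which may change at any time).

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for a profile page: the third DIV under
# <main> holds an <article> whose rows each contain several post links.
MOCK_PAGE = """
<body>
  <div id="react-root">
    <section>
      <main>
        <div>
          <div>header</div>
          <div>highlights</div>
          <div>
            <article>
              <div>
                <div>
                  <div>
                    <div><a href="/p/first/">first</a></div>
                    <div><a href="/p/second/">second</a></div>
                  </div>
                  <div>
                    <div><a href="/p/third/">third</a></div>
                  </div>
                </div>
              </div>
            </article>
          </div>
        </div>
      </main>
    </section>
  </div>
</body>
"""

# The XPath from the step above, unchanged.
FIRST_POST_XPATH = (
    '//*[@id="react-root"]/section/main/div/div[3]'
    "/article/div[1]/div/div[1]/div[1]/a"
)

root = ET.fromstring(MOCK_PAGE)
# ElementTree only accepts relative paths, so "//" becomes ".//".
matches = root.findall("." + FIRST_POST_XPATH)
first_post_href = matches[0].get("href")
print(first_post_href)
```

The positional steps `div[1]/div[1]` select the first row and the first cell in that row, so only one link matches even though the grid contains several posts.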
5. "Extract Data1" - extract post data
- Select post information on the web page
- Choose "Extract text of the selected element"
- Repeat the above steps to extract all the data you need
Image scraping: scraping the image URL of the post is a little tricky in this case.
- Select the image first
- Click the arrow to the left of the last DIV tag on the Tips panel
- Click the first DIV tag on the pop-up
- Click the arrow to the right of the last DIV tag and select IMG on the pop-up
- Choose "Extract the URL of the selected image"
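The steps above move from the clickable DIV wrapper down to the IMG element, whose `src` attribute holds the image URL. As a rough illustration of that idea outside Octoparse, the sketch below collects the `src` of every IMG tag with Python's standard `html.parser`. `MOCK_POST` and the `ImageSrcCollector` class are hypothetical names, and the markup is a simplified assumption rather than Instagram's actual page source.

```python
from html.parser import HTMLParser

# Made-up markup: the image is nested inside wrapper DIVs, which is why
# selecting the picture on the page first lands on a DIV, not the IMG.
MOCK_POST = """
<div class="post">
  <div class="media">
    <div><img alt="Photo" src="https://example.com/photo.jpg"></div>
  </div>
</div>
"""


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


collector = ImageSrcCollector()
collector.feed(MOCK_POST)
print(collector.sources)
```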
Post time scraping: the page shows only a relative time such as "6d", which does not tell us the exact post time. We can scrape the detailed post date and time from the source code.
- Open the settings of "Extract Data1"
- Click "..." and select "Customize field"
- Choose "Extract attribute" and select "datetime" (date and time) or "title" (date only) from the drop-down menu according to your needs
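The difference between the visible text and the two attributes can be illustrated with a short sketch. It assumes the post time sits in a `time` element shaped like `MOCK_TIME` below, with an ISO 8601 value ending in "Z" in `datetime`; the sample values are invented, so check the actual source code of the page before relying on this format.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Invented sample element; the attribute values are assumptions.
MOCK_TIME = (
    '<time datetime="2019-05-20T10:15:30.000Z" title="May 20, 2019">6d</time>'
)

element = ET.fromstring(MOCK_TIME)

shown_text = element.text             # what the page displays: relative time
raw_datetime = element.get("datetime")  # full date and time
date_only = element.get("title")        # date without the time

# Replace the trailing "Z" (UTC) with an explicit offset so that
# datetime.fromisoformat also accepts it on older Python 3 versions.
posted_at = datetime.fromisoformat(raw_datetime.replace("Z", "+00:00"))

print(shown_text, "|", date_only, "|", posted_at.isoformat())
```

Extracting "datetime" keeps the full timestamp, which is the safer choice if you later need to sort or filter posts by time.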
6. "Pagination" - click the next page button to scrape more posts
- Click on the next page button
- Select "Loop click next page" on the Tips panel
- Extend the AJAX timeout to 7-10s
- Drag "Extract Data1" into the "Pagination" loop
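Placing "Extract Data1" inside the pagination loop means the workflow extracts a post, clicks next, and repeats until there is no next button. That control flow can be sketched roughly as follows. `scrape_all_posts` and `FakePostViewer` are hypothetical names used only for illustration, and the in-memory viewer stands in for the real browser; in a real run, the wait after each click would correspond to the AJAX timeout.

```python
def scrape_all_posts(extract_post, click_next):
    """Extract the current post, click next, and repeat until no next page."""
    rows = []
    while True:
        rows.append(extract_post())
        # click_next returns False when there is no next button left.
        if not click_next():
            break
    return rows


class FakePostViewer:
    """In-memory stand-in for the opened post and its next button."""

    def __init__(self, posts):
        self._posts = posts
        self._index = 0

    def extract_post(self):
        return self._posts[self._index]

    def click_next(self):
        if self._index + 1 >= len(self._posts):
            return False
        self._index += 1
        return True


viewer = FakePostViewer(
    [{"caption": "first"}, {"caption": "second"}, {"caption": "third"}]
)
rows = scrape_all_posts(viewer.extract_post, viewer.click_next)
print(len(rows))
```

If the extraction step sat outside the loop, only one post would be captured, which is why the drag step matters.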
7. Start extraction - run the task and get data
- Click "Save"
- Click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer, or select "Run task in the cloud" to run the task in the Cloud (for premium users only)
Tutorial in Spanish: Extraer datos de post de Instagram
You can also read more web scraping articles on the official website
Author: Yina