Scrape photos from an Instagram account
In this tutorial, we will show you how to scrape posts from an Instagram account.
To demonstrate, we will use this URL as an example: https://www.instagram.com/9gag/
For Instagram scraping, you can use our ready-to-use Task Template available on the home page or follow this tutorial to build the task from scratch.
Here are the main steps in this tutorial: [Download task file here]
- Open Target Webpage
- Login and Save Cookies
- Extract Data from Main Page
- Add Click Item to Open Post
- Add A Pagination
- Extract Data from Post
- Rename Field Titles and Adjust Post Time
- Check the Workflow
- Run Task to Extract Data
1. Open Target Webpage
- Paste the URL and click Start
2. Login and Save Cookies
- Turn on Browse mode in the top right corner, log in to your own account, then switch Browse mode off
- Go to Options section - tick Use cookie and click Use cookie from the current page - Apply changes
3. Extract Data from Main Page
- Data such as the account name, number of posts, and number of followers can be extracted from the main page: click on each one, then select Extract the text of the element
4. Add Click Item to Open Post
- Add a Click step to the workflow and paste this XPath into the Matching XPath field: //div[1]/div/div[1]/div/div[1]/div/div/div[1]/div[1]/section/main/div/div[3]/article/div[1]/div/div[1]/div[1]/a
- Go to Options section - tick Load with AJAX - adjust Timeout to 10s - Apply changes
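The XPath above is an absolute path, so it depends on the exact nesting of the page and may stop matching if Instagram changes its layout. As a rough illustration of how an XPath picks out the first post link, here is a minimal sketch using Python's standard library on a made-up, heavily simplified fragment. The markup, the function names, and the shorter path `.//article//a` are assumptions for illustration only, not Instagram's real structure, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a profile page.
# Real Instagram markup is much more deeply nested and changes over time.
SAMPLE = """
<main>
  <article>
    <div><a href="/p/AAA/">post 1</a></div>
    <div><a href="/p/BBB/">post 2</a></div>
  </article>
  <footer><a href="/about/">about</a></footer>
</main>
"""

def first_post_link(markup):
    """Return the href of the first <a> inside an <article>, or None."""
    root = ET.fromstring(markup)
    link = root.find(".//article//a")
    return None if link is None else link.get("href")

def all_post_links(markup):
    """Return the hrefs of every <a> inside an <article>, in document order."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(".//article//a")]

print(first_post_link(SAMPLE))
print(all_post_links(SAMPLE))
```

The link in the footer is not selected because it is not inside an article element, which is the same idea the Matching XPath relies on to open a post rather than some other link on the page.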
5. Add A Pagination
- Open a post by clicking on Click Item in the workflow
- Click on the paging button (>) and select Loop click single button
- Adjust Set AJAX timeout to 10s
6. Extract Data from Post
There are 3 different types of posts on Instagram, and different data can be scraped from each:
a. post with a single image: image URL/ post content/ number of likes/ post time
b. post with multiple images: image URLs (multiple URLs in one field)/ post content/ number of likes/ post time
c. post with video: post content/ number of views/ post time
a. post with a single image
- Turn on Browse mode - open a post with only one image (a post without the multiple-image icon in the top right corner), then turn off Browse mode
- Click on Pagination in the workflow
- Click on post content/ number of likes/ post time and select Extract the text of the element
- For the image URL: click on the image - go to the Tips panel - select the penultimate DIV - last > - the first DIV in the dropdown - last > - then IMG - select Extract the URL of the selected image
b. post with multiple images
- Turn on Browse mode - open a post with multiple images (a post with the multiple-image icon in the top right corner), then turn off Browse mode
- Click on the image - go to the Tips panel - click on << - select UL - select Extract inner HTML of the element
- Go to Data Preview - click on ... (more) - select Clean data
- Add Step - Match with Regular Expression
- Select Not sure about RegEx? Try the RegEx tool!
- Input the start string as [src="] and the end string as ["] - Generate the regular expression - Apply it
- Tick Match all - Confirm
- Add Step - HTML transcoding - Confirm
Multiple image URLs will be saved into one field.
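To make the clean-data steps above more concrete, here is a minimal Python sketch of the same idea: match everything between `src="` and the next `"`, keep all matches, then decode HTML entities. The pattern is only roughly what the RegEx tool might produce (its exact output is not shown here), and the sample HTML, the example.com URLs, and the function name are made up for illustration.

```python
import html
import re

# Non-greedy capture between the start string src=" and the end string "
SRC_PATTERN = re.compile(r'src="(.*?)"')

def extract_image_urls(inner_html):
    """Return every src value in the HTML, with entities such as &amp; decoded."""
    matches = SRC_PATTERN.findall(inner_html)   # comparable to "Match all"
    return [html.unescape(m) for m in matches]  # comparable to "HTML transcoding"

# Hypothetical inner HTML of the UL element from a multi-image post
SAMPLE_UL = (
    '<ul>'
    '<li><img src="https://example.com/a.jpg?x=1&amp;y=2"></li>'
    '<li><img src="https://example.com/b.jpg?x=3&amp;y=4"></li>'
    '</ul>'
)

urls = extract_image_urls(SAMPLE_UL)
print(urls)
```

The decoding step matters because URLs inside HTML attributes usually have `&` written as `&amp;`, and a URL copied without decoding may not load.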
c. post with video: post content/post time/number of views
- Turn on Browse mode, open a video post, turn off Browse mode, click on the number of views, and select Extract the text of the element
*Note that the video URL cannot be scraped as there is no video URL in the web page source code.
7. Rename Field Titles and Adjust Post Time
- Go to Data Preview and double click on the field header to rename it
- For post time, a relative value such as "x hours ago" is not very helpful. To extract the exact timestamp instead, click ... - Customize field - Extract attribute - datetime
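Once the datetime attribute is exported, it can be converted to whatever format you need. The sketch below assumes the attribute holds an ISO 8601 UTC timestamp like the made-up sample value shown; check your own output before relying on that format. The function name is hypothetical.

```python
from datetime import datetime

def parse_post_time(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten as a UTC offset."""
    # Older Python versions of fromisoformat() do not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

# Made-up sample value for illustration
ts = parse_post_time("2023-01-05T12:34:56.000Z")
formatted = ts.strftime("%Y-%m-%d %H:%M")
print(formatted)
```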
8. Check the Workflow
- Below is what the final workflow looks like. Once everything is in place, you can run the task.
9. Run Task to Extract Data
- Run the task from the top right corner: select Run task on your device to run the task on your local device, or select Run task in the Cloud to run the task in the Cloud (for premium users only)
Here is the sample output:
Author: Scarlett
Editor: Yina