You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend that you upgrade, because the latest version is faster, easier to use, and more robust!

With a reported 211 million daily active users, Twitter has proven its worth in social media marketing. Users post an average of 6,000 tweets every second, which adds up to over 500 million tweets a day. All of this chatter and noise is a treasure chest of valuable information for marketers, brands, researchers, and analysts. Marketers and brands often scrape Twitter data from specific accounts (influencers and competitors) to analyze engagement and plan effective strategies.

Due to popular demand, this tutorial is the first in a series of tutorials that the Octoparse team has prepared for users with a need for Twitter data.

In this post, we are going to teach you how to scrape the followers/following list from any public account.

We will scrape the followers/following list for Nintendo of America. Check out the two sample URLs below:

https://twitter.com/NintendoAmerica/followers

https://twitter.com/NintendoAmerica/following

Note: Although the workflows are extremely similar, you still need to create two separate tasks to scrape the two lists, using different XPaths.
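
The two sample URLs differ only in their last path segment, so both can be built from an account handle. Here is a minimal Python sketch of that pattern; the helper name `list_urls` is hypothetical and is not part of Octoparse.

```python
def list_urls(handle: str) -> dict[str, str]:
    """Build the followers/following list URLs for a public account handle."""
    handle = handle.lstrip("@")  # accept "@NintendoAmerica" as well
    base = f"https://twitter.com/{handle}"
    return {kind: f"{base}/{kind}" for kind in ("followers", "following")}


urls = list_urls("NintendoAmerica")
print(urls["followers"])  # https://twitter.com/NintendoAmerica/followers
print(urls["following"])  # https://twitter.com/NintendoAmerica/following
```

Each URL still goes into its own task, as the note above explains.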


Here are the major steps of the tutorial:

  1. Create a Go to Web Page - to open the target web page

  2. Log into Twitter in browse mode - to save cookies for authentication

  3. Create an Extract data - to scrape the basic info for the public account from the page heading

  4. Auto-detect the web page - to create a workflow

  5. Modify the XPath of Loop Item - to locate the following/followers accurately

  6. Modify the XPath of data fields - to locate the fields accurately

  7. Modify the scroll settings - to scroll the page to load more followers/following

  8. Create another loop - to scrape the data from the first screen of the page

  9. Run the task - to get your desired data


1. Create a Go to Web Page - to open the target web page

Every workflow in Octoparse starts with the web page that the task should open first.

  • Enter the link of the followers/following list page into the search bar at the top of the home screen and click Start.

2. Log into Twitter in browse mode - to save cookies for authentication

Twitter forbids direct access to followers/following lists unless you've logged in first.

  • Toggle on Browse mode and log into Twitter as you do in a normal browser

  • Click the Go to Web Page action to open its settings panel (located at the bottom right)

  • Go to the Options tab and tick Use cookies

  • Click Use cookie from the current page

  • Click Apply to save the settings

  • Turn off Browse Mode

The login information is now saved in the task workflow, so the task is already logged into our Twitter account whenever it runs.
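
Conceptually, saving cookies means storing the session cookies from the logged-in page and sending them along with later requests, so the site treats the task as the logged-in user. The sketch below illustrates that idea with Python's standard library only. It is not how Octoparse is implemented, and the cookie names and values are made-up placeholders.

```python
from http.cookies import SimpleCookie
from urllib.request import Request


def request_with_saved_cookies(url: str, saved_cookie_string: str) -> Request:
    """Attach previously saved cookies to a new request (nothing is sent here)."""
    jar = SimpleCookie()
    jar.load(saved_cookie_string)  # parse "name=value; name2=value2"
    header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return Request(url, headers={"Cookie": header})


# Placeholder values standing in for whatever the browser stored at login.
saved = "session_id=abc123; csrf=xyz789"
req = request_with_saved_cookies("https://twitter.com/NintendoAmerica/followers", saved)
print(req.get_header("Cookie"))  # session_id=abc123; csrf=xyz789
```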


3. Create an Extract data - to scrape the basic info for the public account from the page heading

  • Click on the display name (e.g., Nintendo of America) and select Extract text of the element

  • Repeat the action to get the username

  • Click Add custom fields and select Page URL from Page-level data to get the profile URL

Tip: Twitter might change the HTML structure of the heading area, which breaks the XPath. If the data preview section does not show the correct information, you might need to rewrite the XPath.

For the display name, the XPath is //h2[@dir="auto" and @aria-level="2"]/span

For the username, the XPath is //h2[@dir="auto" and @aria-level="2"]/following-sibling::div/span
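
To see what these two XPaths select, here is a rough Python sketch that runs against a made-up, simplified stand-in for the heading markup. It uses only the standard library, whose XPath subset has no `and` operator and no `following-sibling` axis, so both expressions are approximated rather than run verbatim.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for the profile heading markup.
SAMPLE = """
<div>
  <div>
    <h2 dir="auto" aria-level="2"><span>Nintendo of America</span></h2>
    <div dir="ltr"><span>@NintendoAmerica</span></div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)

# ElementTree has no "and", so the two attribute tests become chained predicates.
heading_path = ".//h2[@dir='auto'][@aria-level='2']"
display_name = root.find(heading_path + "/span").text

# ElementTree also lacks following-sibling, so walk the heading's parent instead.
heading = root.find(heading_path)
parent = root.find(heading_path + "/..")
siblings = list(parent)
after_heading = siblings[siblings.index(heading) + 1:]
username = next(
    el.find("span").text
    for el in after_heading
    if el.tag == "div" and el.find("span") is not None
)

print(display_name)  # Nintendo of America
print(username)      # @NintendoAmerica
```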


4. Auto-detect the web page - to create a workflow

Twitter's infinite scroll pattern is designed to load content dynamically. Octoparse's auto-detection function can easily identify this kind of page and help you quickly create a workflow.

  • Click Auto-detect web page in the Tips panel and wait for the detection to complete

  • Check the data fields in the Data Preview, then delete or rename fields as needed

  • Uncheck Paginate to scrape more pages

  • Click Create workflow

Octoparse then creates the workflow automatically.

5. Modify the XPath of Loop Item - to locate the following/followers accurately

  • Click Loop Item to open its settings

  • Input the Matching XPath for each follower/following section, which will be

    • //div[@aria-label="Timeline: Followers"]//div[@data-testid="UserCell"] if you are scraping the followers list

    • //div[@aria-label="Timeline: Following"]//div[@data-testid="UserCell"] if you are scraping the following list

  • Click Apply to save the settings
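
The Matching XPath works in two steps: it first finds the timeline container by its `aria-label`, then selects every `UserCell` inside it. The Python sketch below applies the same expression to made-up, heavily simplified markup using the standard library. The helper name `loop_items` is hypothetical, and the leading `.` is only there because ElementTree requires relative paths.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily simplified markup; the real page is far more nested.
SAMPLE = """
<main>
  <div aria-label="Timeline: Followers">
    <div><div data-testid="UserCell"><span>@alice</span></div></div>
    <div><div data-testid="UserCell"><span>@bob</span></div></div>
  </div>
  <div aria-label="Who to follow">
    <div data-testid="UserCell"><span>@suggested</span></div>
  </div>
</main>
"""


def loop_items(root: ET.Element, list_name: str) -> list[ET.Element]:
    """Return one element per user cell inside the named timeline."""
    xpath = (
        f".//div[@aria-label='Timeline: {list_name}']"
        "//div[@data-testid='UserCell']"
    )
    return root.findall(xpath)


root = ET.fromstring(SAMPLE)
cells = loop_items(root, "Followers")
print([cell.find("span").text for cell in cells])  # ['@alice', '@bob']
```

Scoping the match to the timeline container is what keeps cells from other parts of the page, such as the made-up "Who to follow" block here, out of the loop.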

6. Modify the XPath of data fields - to locate the fields accurately

The auto-generated XPaths may not be accurate enough because Twitter changes its page structure regularly. We can modify the XPath of each data field to scrape it precisely.

  • Switch to Vertical view for the Data Preview

  • Edit the XPath for each field

We have prepared the XPaths for the fields for you. You can just copy and paste them into Octoparse :)

  • Follower/Following name: //div[@dir="ltr"]/preceding::a[1]

  • Follower/Following Username: //div[@dir="ltr"]

  • Follower/Following Bio: //div[@dir="auto"]/span[count(*)=0]/..

  • Follower/Following Avatar: //img[@draggable="true"]

  • Follower/Following URL: //div[@dir="ltr"]/preceding::a[1]
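
Each field XPath is evaluated once per loop item, that is, inside a single user cell. The Python sketch below shows that per-cell extraction on a made-up, simplified cell using the standard library. Only the username and avatar expressions carry over unchanged. The name/URL and bio paths are simplified stand-ins, because ElementTree supports neither the `preceding` axis nor `count()`. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for a single UserCell (one Loop Item).
CELL = """
<div data-testid="UserCell">
  <img draggable="true" src="https://example.com/avatar.jpg"/>
  <a href="/example_user"><span>Example User</span></a>
  <div dir="ltr"><span>@example_user</span></div>
  <div dir="auto"><span>Bio text goes here.</span></div>
</div>
"""


def text_of(element):
    """Concatenate all text inside an element, or return None if it is missing."""
    return "".join(element.itertext()).strip() if element is not None else None


def extract_fields(cell: ET.Element) -> dict:
    """Read each field relative to one loop item."""
    link = cell.find(".//a")  # stand-in for //div[@dir="ltr"]/preceding::a[1]
    avatar = cell.find(".//img[@draggable='true']")
    return {
        "name": text_of(link),
        "username": text_of(cell.find(".//div[@dir='ltr']")),
        "bio": text_of(cell.find(".//div[@dir='auto']/span")),  # simplified bio path
        "avatar": avatar.get("src") if avatar is not None else None,
        "url": link.get("href") if link is not None else None,
    }


row = extract_fields(ET.fromstring(CELL))
print(row["username"], row["url"])  # @example_user /example_user
```

In this sketch the name and the URL come from the same link element, one as its text and the other as its `href`, which is presumably why the two fields above share one XPath.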


7. Modify the scroll settings - to scroll the page to load more followers/following

  • Click on the Scroll Page to open the settings

  • Set the scroll pattern to "for one screen" and Repeats to 600 (or more)

  • Set wait time to 3s to fully load the list content (Important!)

  • Tick Capture data as page scrolls dynamically to minimize data loss

  • Click Apply to save the settings
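
Why capture data while scrolling? The simulation below assumes the page keeps only the currently visible rows loaded, which is an assumption about the page rather than something Octoparse documents here. Under that assumption, a single capture at the end would see only the last screen, while capturing after every scroll and dropping duplicates keeps everything. The function name and sample rows are made up for illustration.

```python
def collect_while_scrolling(screens: list[list[str]]) -> list[str]:
    """Capture the visible rows after every scroll, keeping first-seen order."""
    seen: dict[str, None] = {}  # a dict preserves insertion order and dedupes
    for visible_rows in screens:
        for username in visible_rows:
            seen.setdefault(username, None)
    return list(seen)


# Simulated page: each inner list is what is visible after one scroll.
# Consecutive screens overlap, and earlier rows drop out of view.
screens = [
    ["@a", "@b", "@c"],
    ["@c", "@d", "@e"],
    ["@e", "@f"],
]
print(collect_while_scrolling(screens))  # ['@a', '@b', '@c', '@d', '@e', '@f']
print(screens[-1])  # ['@e', '@f'], all that one final capture would see

# Lower bound on run time implied by the settings above: 600 scrolls x 3 s wait.
repeats, wait_seconds = 600, 3
print(repeats * wait_seconds / 60)  # 30.0 (minutes)
```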

8. Create another loop - to scrape the data from the first screen of the page

  • Right-click on the Loop Item in the task workflow and select Copy

  • Right-click on the Extract Data and select Paste

Now we have created the complete workflow.


9. Run the task - to get your desired data

With the workflow complete, save and run the task:

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete

A local run outputs one row per follower/following account, with the fields configured above.

Tip: Local runs are great for task troubleshooting and quick runs. For more complicated tasks, we recommend selecting Run in the Cloud to run the task on Octoparse's cloud-based platform for higher speed. You can try out this premium feature by signing up for the 14-day free trial. You can also schedule your task to run hourly, daily, or weekly and get data delivered to you regularly.