Scrape the replies of a Tweet
You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend that you upgrade, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!
Twitter is a free social networking site where users broadcast short posts known as tweets. Users post an average of 6,000 tweets every second, which adds up to over 500 million tweets each day. Tweets can contain text, videos, photos, or links, and users can interact with each other by replying to them.
In this tutorial, we will show you how to scrape the replies of a tweet on Twitter.
To follow along with this tutorial, you can use the URL below:
https://twitter.com/_aompp_/status/1522207142975094784
Note: If you want to check whether your workflow works correctly, please download the OTD file for this case at the bottom of this page.
Here are the main steps of this tutorial:
- Create a Go to Web Page - to open the target website
- Auto-detect the webpage - to create a workflow
- Modify the settings of page scroll-down - to better scroll down the page and fully load data
- Modify the XPath of the loop - to locate the data field(s) more accurately
- Run the task - to get your desired data
1. Create a Go to Web Page - to open the target website
- Enter the target URL into the search bar on the home screen and click Start
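Before pasting a URL into Octoparse, it can help to confirm that it points at a single tweet rather than a profile or search page. The sketch below is not part of Octoparse; it is a small Python check that assumes tweet links follow the same shape as the example URL in this tutorial (`/<user>/status/<numeric id>`). The function name `parse_tweet_url` and the pattern are illustrative assumptions.

```python
import re
from urllib.parse import urlparse

# Assumed URL shape, inferred from the tutorial's example link:
#   https://twitter.com/<user>/status/<numeric id>
STATUS_PATH = re.compile(r"^/(?P<user>[^/]+)/status/(?P<tweet_id>\d+)/?$")


def parse_tweet_url(url):
    """Return (user, tweet_id) if the URL looks like a single tweet, else None."""
    parts = urlparse(url)
    if parts.netloc.lower() not in {"twitter.com", "www.twitter.com"}:
        return None
    match = STATUS_PATH.match(parts.path)
    if match is None:
        return None
    return match.group("user"), match.group("tweet_id")


if __name__ == "__main__":
    print(parse_tweet_url("https://twitter.com/_aompp_/status/1522207142975094784"))
    print(parse_tweet_url("https://twitter.com/_aompp_"))  # a profile page, not a tweet
```

A `None` result suggests the link is not a tweet page, in which case the reply list this tutorial targets may not be present.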
2. Auto-detect the webpage - to create a workflow
Octoparse's Auto-detection function can help you create a workflow quickly according to the design of the target website.
- Click Auto-detect web page data in Tips and wait for the detection to complete
- Check the data fields in Data preview and delete unwanted fields or rename them if needed
- Uncheck Click on a "Load More" button
- Click Create workflow
3. Modify the settings of page scroll-down - to better scroll down the page and fully load data
- Click on Scroll Page
- Tick Scroll for one page
- Set the Wait time: 2-3s recommended
- Tick Capture data as page scrolls dynamically (possibly duplicates) - Important!
- Click Apply to save the change
Note: Check out here to find out more about extracting data while scrolling the page.
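Because data is captured while the page scrolls, the same reply may be recorded more than once. One way to clean this up after export is to drop repeated rows. The following is a minimal Python sketch, not an Octoparse feature; the field names `author` and `reply_text` are hypothetical and should be replaced with the field names in your own task.

```python
def drop_duplicate_rows(rows, key_fields):
    """Keep the first occurrence of each row, comparing only key_fields."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from the chosen fields of this row.
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    # Hypothetical rows, as they might look after a scroll-and-capture run.
    captured = [
        {"author": "user_a", "reply_text": "Nice!"},
        {"author": "user_b", "reply_text": "Agreed."},
        {"author": "user_a", "reply_text": "Nice!"},  # captured again after a scroll
    ]
    for row in drop_duplicate_rows(captured, ["author", "reply_text"]):
        print(row)
```

Comparing on a small set of fields keeps the original order of replies while removing the repeats introduced by overlapping scroll captures.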
4. Modify the XPath of the loop - to locate the data field(s) more accurately
- Click on Loop Item in the workflow
- Input the Matching XPath as: //article[@tabindex]
- Click Apply to save the change
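To see what the XPath `//article[@tabindex]` selects, here is a small Python sketch using the standard library. It runs against a hypothetical, heavily simplified stand-in for the page markup (the real page is far larger and may differ): the expression matches every `article` element that has a `tabindex` attribute, whatever its value, and skips everything else.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup used only to illustrate the XPath.
SAMPLE = """
<div>
  <article tabindex="0"><span>First reply</span></article>
  <article tabindex="0"><span>Second reply</span></article>
  <article><span>No tabindex, so not matched</span></article>
  <section tabindex="0"><span>Not an article, so not matched</span></section>
</div>
"""


def match_loop_items(markup):
    """Return the text of each <article> that carries a tabindex attribute."""
    root = ET.fromstring(markup)
    # ElementTree requires a leading "." for a descendant search from root;
    # the predicate [@tabindex] tests only that the attribute exists.
    items = root.findall(".//article[@tabindex]")
    return ["".join(item.itertext()).strip() for item in items]


if __name__ == "__main__":
    for text in match_loop_items(SAMPLE):
        print(text)
```

In this sample, only the first two elements are matched, which mirrors how the loop item is meant to pick up one entry per tweet or reply.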
5. Run the task - to get your desired data
- Click Save on the upper right to save your task
- Click Run next to it and wait for a Run Task window to pop up
- Select Run on your device to run the task on your local device
- Wait for the task to complete
Here is the sample output from a local run:
Tip: Local runs are great for task troubleshooting and quick runs. If you are dealing with more complicated tasks, it is recommended that you select Run in the Cloud to run the task in Octoparse's cloud-based platform for higher speed. Try out this premium feature by signing up for the 14-day free trial here. You can also schedule your tasks to run hourly, daily, or weekly and get data delivered to you regularly.
If you have further issues with the task or have a suggestion that would make this a better resource for you, we’d love to hear about it. Submit a request here.
Author: Cassie
Editor: Yina