As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the key Twitter user statistics that you need to know.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.
Social network X/Twitter is particularly popular in the United States, and as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and India were ranked second and third, with more than 70 million and 25 million users respectively.
Global Twitter usage
As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama.
X/Twitter and politics
X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the breakdown of Twitter users by age group.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The platform is male-dominated with 68.1% of all Twitter users being male. Just 31.9% of Twitter users are female.
This dataset is a subset of all the Twitter-users, along with their connections and locations.
The graph created by considering the users as nodes and their connections as edges is a connected component of the total Twitter graph (i.e. for every user in the subgraph, all their connections in the original graph are contained within the subgraph).
Although Twitter is a directed graph (the follower-following relation is not mutual: "X follows Y" does not imply "Y follows X"), we have treated the directed edges as undirected. Hence, whenever u→v is present in the original graph but v→u is not, we have added the edge v→u.
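The symmetrization described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code; the function name symmetrize and the representation of edges as (u, v) pairs are assumptions made for the example.

```python
from collections import defaultdict


def symmetrize(directed_edges):
    """Build an undirected adjacency map from directed (u, v) follow pairs.

    For every edge u -> v the reverse edge v -> u is added as well,
    so each user lists every account they follow or are followed by.
    """
    adjacency = defaultdict(set)
    for u, v in directed_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Sorted lists make the result deterministic and easy to compare.
    return {user: sorted(friends) for user, friends in adjacency.items()}


# Hypothetical input: 1 follows 2, and 2 and 3 follow each other.
example = symmetrize([(1, 2), (2, 3), (3, 2)])
print(example)
```

Using sets means a mutual follow (such as 2 and 3 in the example) is stored once per user rather than twice.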
There are two directories: one with 10 million users and the other with 1 million. Each directory contains two .txt files: location, and user.
The location file contains the latitude and longitude of each user. The file format is:
lat_1, long_1
lat_2, long_2
lat_3, long_3
...
where lat_1 and long_1 are the latitude and longitude of user number 1 respectively, and so on.
The user file contains the adjacency list of each user. The k-th row of this file enumerates the friends of user number k.
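A minimal sketch of reading the two files is given below. It assumes that users are numbered from 1 in row order (as the "user number 1" wording suggests), that the location rows are comma-separated as shown, and that the friend IDs in the user file are integers separated by commas or whitespace. The separator in the user file is not stated in the description, so that part is a guess; the function names are hypothetical.

```python
import re


def parse_locations(lines):
    """Map user number (1-based row index) to a (latitude, longitude) tuple."""
    locations = {}
    for k, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # keep row numbering aligned even if a row is empty
        lat, lon = (float(part) for part in line.split(","))
        locations[k] = (lat, lon)
    return locations


def parse_adjacency(lines):
    """Map user number k to the list of friend numbers found on row k."""
    friends = {}
    for k, line in enumerate(lines, start=1):
        # Assumed separators: commas and/or whitespace.
        tokens = [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]
        friends[k] = [int(tok) for tok in tokens]
    return friends


# Made-up rows in the described layout.
locations = parse_locations(["40.7, -74.0", "35.6, 139.7"])
friends = parse_adjacency(["2 3", "1", ""])
```

For the real files, the same functions can be passed an open file handle, since iterating over a text file yields its lines.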
If you use this dataset, please cite it as:
@article{DBLP:journals/pvldb/GhoshACHSL18,
author = {Bishwamittra Ghosh and
Mohammed Eunus Ali and
Farhana Murtaza Choudhury and
Sajid Hasan Apon and
Timos Sellis and
Jianxin Li},
title = {The Flexible Socio Spatial Group Queries},
journal = {{PVLDB}},
volume = {12},
number = {2},
pages = {99--111},
year = {2018},
url = {http://www.vldb.org/pvldb/vol12/p99-ghosh.pdf},
timestamp = {Mon, 03 Dec 2018 16:45:54 +0100},
biburl = {https://dblp.org/rec/bib/journals/pvldb/GhoshACHSL18},
}
@inproceedings{DBLP:conf/kdd/LiWDWC12,
author = {Rui Li and Shengjie Wang and Hongbo Deng and Rui Wang and Kevin Chen-Chuan Chang},
title = {Towards social user profiling: unified and discriminative influence model for inferring home locations},
booktitle = {KDD},
year = {2012},
pages = {1023-1031}
}
How many people use X/Twitter?
As of the first quarter of 2019, X/Twitter averaged 330 million monthly active users (MAU), a decline from its all-time high of 336 million MAU in the first quarter of 2018. As of the first quarter of 2019, the company switched its user reporting metric to monetizable daily active users (mDAU).
X/Twitter
X/Twitter is a social networking and microblogging service, enabling registered users to read and post short messages called tweets. X/Twitter messages are limited to 280 characters and users are also able to upload photos or short videos. Tweets are posted to a publicly available profile or can be sent as direct messages to other users.
Part of the social platform’s appeal is the ability of users to follow any other user with a public profile, enabling users to interact with celebrities who regularly post on the social media site. Currently, the most-followed person on Twitter is singer Katy Perry with more than 107 million followers. Twitter has also become an important communications channel for governments and heads of state – U.S. President Donald Trump was the most-followed world leader on Twitter, followed by Pope Francis and Indian Prime Minister Narendra Modi.
Despite the widespread usage among the rich and famous, the decline in active users has not impressed investors, as the platform relies largely on delivering advertising to users to generate revenue. Twitter’s company revenue in 2018 amounted to three billion U.S. dollars, up from 2.44 billion in the preceding fiscal year. Twitter was only recently able to report a positive annual result for the first time, when the company generated 1.2 billion U.S. dollars in net income in 2018.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval
This repo contains the TwitterFaveGraph dataset from our paper kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval. This work is licensed under a Creative Commons Attribution 4.0 International License.
TwitterFollowGraph
TwitterFollowGraph is a bipartite directed graph of users (consumer) nodes to author (producer) nodes… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFollowGraph.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges from [Kwak]; the ground truth labels were obtained from [SybilSCAR]. Among the nodes, 178,377 are benign and 91,263 are Sybil. We set aside 9,000 Sybil and 17,000 benign users (about 10% of all nodes) as the training set and test on the overall social graph.
[Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a news media?” in WWW, 2010.
[SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, “SybilSCAR: Sybil detection in online social networks via local rule based propagation,” in IEEE INFOCOM, 2017.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The US has the largest number of Twitter users, with over 100 million users. They account for about 16.7% of all Twitter users worldwide.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Online Social Networks (OSN) are used by millions of users daily. This user base shares and discovers different opinions on popular topics. The social influence of large groups may shape user beliefs or attract interest in particular news or products. A large number of users, gathered in a single group or as a number of followers, increases the probability of influencing OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and create an illusion of community behaviour, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users for a period of 36 months. The collected users are labeled as botnets based on long-term and frequent content similarity events. We detect over a million events where seemingly unrelated accounts tweeted nearly identical content at almost the same time. We filter these concurrent content injection events and detect a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or in part controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between the bot accounts and human users, as well as the botnet interactions with the user communities and the Twitter trending topics.
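The idea of flagging accounts that repeatedly post nearly identical content at almost the same time can be illustrated with a small sketch. This is not the method used in the paper: exact matching on whitespace- and case-normalized text stands in for "nearly identical", and the 60-second window, the min_events threshold, and all function names are assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Lower-case and collapse whitespace (a crude stand-in for similarity)."""
    return " ".join(text.lower().split())


def find_events(tweets, window_seconds=60):
    """Return sets of accounts that posted the same text within one window.

    tweets is an iterable of (account, timestamp_in_seconds, text) tuples.
    """
    by_text = defaultdict(list)
    for account, timestamp, text in tweets:
        by_text[normalize(text)].append((timestamp, account))

    events = []

    def flush(group):
        accounts = {account for _, account in group}
        if len(accounts) >= 2:  # an event needs at least two distinct accounts
            events.append(accounts)

    for posts in by_text.values():
        posts.sort()
        group = [posts[0]]
        for post in posts[1:]:
            # Compare against the first post of the current group.
            if post[0] - group[0][0] <= window_seconds:
                group.append(post)
            else:
                flush(group)
                group = [post]
        flush(group)
    return events


def suspicious_accounts(events, min_events=2):
    """Accounts that take part in at least min_events concurrent events."""
    counts = Counter(account for event in events for account in event)
    return {account for account, n in counts.items() if n >= min_events}


# Made-up data: accounts a and b co-post twice, d only once, c posts late.
sample = [
    ("a", 0, "Buy now!"), ("b", 5, "buy  now!"), ("c", 500, "buy now!"),
    ("a", 1000, "Hello world"), ("b", 1010, "hello world"),
    ("d", 1020, "hello world"),
]
flagged = suspicious_accounts(find_events(sample))
```

Requiring repeated participation, rather than a single coincidence, mirrors the abstract's emphasis on accounts that exhibit the pattern repeatedly.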
As of February 2025, 37.5 percent of X’s (formerly Twitter) global audience was aged between 25 and 34 years. The second-largest age group on the platform was users aged between 18 and 24 years, with a share of 32.1 percent. Users aged less than 18 years accounted for two percent of users, while those aged 50 or older accounted for roughly 7.3 percent.
X is a male-dominated platform
As of January 2024, more than 60 percent of X users were male. Although all mainstream social media platforms tend to have a slightly more male-skewing audience, X stands out above Instagram, Snapchat, TikTok, and Facebook when it comes to user gender demographics. Overall, Pinterest is the only mainstream platform to have a higher share of female users.
X Blue for you
It is not uncommon for social media users to now have the chance to become subscribers of their chosen online networks for a monthly fee. X Blue is a subscription service from X that gives users special benefits and features. A blue verification mark, edit post functionality, fewer ads, priority ranking in chats, and longer video upload times are some of the perks offered.
The number of Twitter users in the United Kingdom was forecast to increase continuously between 2024 and 2028 by a total of 0.9 million users (+5.1 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 18.55 million users, and therefore a new peak, in 2028. Notably, the number of Twitter users had been increasing continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of new fake news on Twitter. In particular, we have focused on fake news stories that gave rise to a simultaneous spread of the truth against them. The story of each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each of which provided a new dataset: 1- obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of the spreaders of tweets, which provides the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story, with the following structure:
Each row belongs to one captured tweet/retweet related to the rumor, and each column of the dataset presents a specific information about the tweet/retweet. These columns from left to right present the following information about the tweet/retweet:
User ID (user who has posted the current tweet/retweet)
The description sentence in the profile of the user who has published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account by which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before it was crawled
Number of times the current tweet had been retweeted before crawling it
Is there any other tweet inside of the current tweet/retweet (for example this happens when the current tweet is a quote or reply or retweet)
The source (OS) of device by which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)
Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)
Reply ID (if the post is a reply then this feature gives the ID of the tweet that the current post replies to)
Frequency of tweet occurrences which means the number of times the current tweet is repeated in the dataset (for example the number of times that a tweet exists in the dataset in the form of retweet posted by others)
State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet is a question about the fake news, but neither confirms nor denies it
n : The tweet/retweet is not related to the fake news (even though it matches the queries related to the rumor, it does not refer to the given fake news)
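As a small illustration of working with the state column, the sketch below maps the four codes listed above to readable labels and counts how often each occurs. Reading the Excel file itself is left out; the input is assumed to be a plain list of state codes already extracted from the last column, and the names STATE_LABELS and summarize_states are hypothetical.

```python
from collections import Counter

# The four annotation codes described in the column list above.
STATE_LABELS = {
    "r": "fake news post",
    "a": "truth post",
    "q": "question about the fake news",
    "n": "unrelated to the fake news",
}


def summarize_states(states):
    """Count tweets per state, returning every label even when its count is 0."""
    counts = Counter(states)
    unknown = set(counts) - set(STATE_LABELS)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    return {label: counts.get(code, 0) for code, label in STATE_LABELS.items()}


# Made-up codes standing in for the state column of one FNx_DD file.
summary = summarize_states(["r", "r", "a", "n"])
```

Raising on unknown codes makes stray values (for example, blank cells) visible instead of silently dropping them.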
DG
DG for each fake news contains two files:
A file in graph format (.graph) which includes the graph information, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)
A file in JSONL format (.jsonl) which includes the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)
The second file is needed because, in the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow in the next rows of the file (each row corresponds to one user ID). Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file when counting rows from 0, that is, the 202nd line of the file.
The user IDs of spreaders in DG (those who have had a post in DD) would be available in DD to get extra information about them and their tweet/retweet. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
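The label-to-ID lookup described above can be sketched as follows. The exact content of each JSONL row is not specified in the description, so this example assumes only that the first line holds the column labels and that every later line is a valid JSON value for one node; the function name load_node_ids is hypothetical.

```python
import json


def load_node_ids(jsonl_lines):
    """Return a list in which index i holds the parsed row for graph node i.

    The first non-empty line is treated as the header (column labels) and
    skipped without parsing, so node 0 comes from the second line.
    """
    rows = [line for line in jsonl_lines if line.strip()]
    return [json.loads(line) for line in rows[1:]]


# Made-up file content: a header line followed by two user IDs.
node_ids = load_node_ids(['"user_id"', "12345637", "987"])
first_user = node_ids[0]  # user ID of the node labeled 0 in the graph file
```

With this layout, the ID of node 200 is simply node_ids[200], which avoids any manual row arithmetic.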
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract (our paper)
Why does Smith follow Johnson on Twitter? In most cases, the reason why users follow other users is unavailable. In this work, we answer this question by proposing TagF, which analyzes the who-follows-whom network (matrix) and the who-tags-whom network (tensor) simultaneously. Concretely, our method decomposes a coupled tensor constructed from this matrix and tensor. The experimental results on million-scale Twitter networks show that TagF uncovers different, but explainable reasons why users follow other users.
Data
coupled_tensor:
The first column is the source user id (from user id), the second column is the destination user id (to user id), and the third column is the tag id.
users.id:
The first column is the user id for coupled_tensor, and the second column is the user id on Twitter.
tags.id:
The first column is the tag id for coupled_tensor, and the second column is the tag (i.e. slug or list name) on Twitter. On the tags, ###follow### and ###friend### are special tags expressing follower and following.
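A minimal sketch of reading these files is shown below. It assumes whitespace-separated columns, which the description does not state, and it separates entries carrying the special ###follow### and ###friend### tags from ordinary list tags. The function names are hypothetical and this is not the authors' loading code.

```python
SPECIAL_TAGS = {"###follow###", "###friend###"}


def load_tags(tag_lines):
    """Map tag id (as a string) to the tag text from tags.id-style rows."""
    tags = {}
    for line in tag_lines:
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            tags[parts[0]] = parts[1].strip()
    return tags


def split_entries(tensor_lines, tags):
    """Split (source, destination, tag) entries into follow and tag entries."""
    follow_entries, tag_entries = [], []
    for line in tensor_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed rows
        source, destination, tag_id = parts
        tag = tags.get(tag_id)  # None if the tag id is not in tags.id
        target = follow_entries if tag in SPECIAL_TAGS else tag_entries
        target.append((source, destination, tag))
    return follow_entries, tag_entries


# Made-up rows in the described three-column layout.
tags = load_tags(["0 ###follow###", "1 ###friend###", "2 data-science"])
follow, tagged = split_entries(["10 20 0", "10 20 2", "30 10 1"], tags)
```

Keeping the two groups apart corresponds to the matrix (who-follows-whom) and tensor (who-tags-whom) parts that the abstract describes.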
Publication
This dataset was created for our study. If you make use of this dataset, please cite:
Yuto Yamaguchi, Mitsuo Yoshida, Christos Faloutsos, Hiroyuki Kitagawa. Why Do You Follow Him? Multilinear Analysis on Twitter. Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). pp.137-138, 2015.
http://doi.org/10.1145/2740908.2742715
Code
Our code for producing the experiment results is available at:
https://github.com/yamaguchiyuto/tagf
Note
If you would like to use a larger dataset, the dataset on 1 million seed users is available at:
http://dx.doi.org/10.5281/zenodo.16267
(The dataset on 0.1 million seed users is not a subset of the dataset on 1 million seed users.)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.
The dataset consists of three files:
1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies.
2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.
3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media.
(References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 653 996 tweets related to the Coronavirus topic and highlighted by hashtags such as: #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets' crawling period started on the 27th of February and ended on the 25th of March 2020, which is spread over four weeks.
The tweets were generated by 390 458 users from 133 different countries and were written in 61 languages. English is the most used language with almost 400k tweets, followed by Spanish with around 80k tweets.
The data is stored as a CSV file, where each line represents a tweet. The CSV file provides information on the following fields:
Author: the user who posted the tweet
Recipient: contains the name of the user in case of a reply, otherwise it would have the same value as the previous field
Tweet: the full content of the tweet
Hashtags: the list of hashtags present in the tweet
Language: the language of the tweet
Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.
Location: the country of the author of the tweet, which is unfortunately not always available
Date: the publication date of the tweet
Source: the device or platform used to send the tweet
The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
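One way to build such a graph is sketched below using only the Author, Recipient and Relationship fields. It assumes the CSV has a header row whose column names match the field names listed above and that the Relationship values are spelled exactly as the four relations quoted; neither is confirmed by the description. The function name build_edge_counts is hypothetical.

```python
import csv
import io
from collections import Counter

# Relation names as quoted in the dataset description.
GRAPH_RELATIONS = {"Replies to", "Retweet", "MentionsInRetweet", "Mentions"}


def build_edge_counts(csv_file):
    """Count directed (author, recipient, relation) edges in the tweet CSV."""
    edges = Counter()
    for row in csv.DictReader(csv_file):
        relation = row["Relationship"]
        author, recipient = row["Author"], row["Recipient"]
        # Plain tweets repeat the author as recipient; skip such self-loops.
        if relation in GRAPH_RELATIONS and author != recipient:
            edges[(author, recipient, relation)] += 1
    return edges


# Made-up rows standing in for the real file.
sample = io.StringIO(
    "Author,Recipient,Relationship\n"
    "alice,bob,Replies to\n"
    "alice,alice,Tweet\n"
    "carol,bob,Retweet\n"
    "alice,bob,Replies to\n"
)
edge_counts = build_edge_counts(sample)
```

Keeping the relation in the edge key preserves the interaction type, so the counts can later be collapsed into a single weighted graph or analyzed per relation.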
https://academictorrents.com/nolicensespecified
The Twitter_2010 data set contains tweets containing URLs that were posted on Twitter during October 2010. In addition to tweets, we also include the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.
URLs: 66,059
Tweets: 2,859,764
Users: 736,930
Links: 36,743,448
Tweets
The table (in CSV format) link_status_search_with_ordering_real_csv contains tweets with the following information:
link: URL within the text of the tweet
id: tweet id
create_at: date added to the db
create_at_long
inreplyto_screen_name: screen name of the user this tweet is replying to
inreplyto_user_id: user id of the user this tweet is replying to
source: device from which the tweet originated
bad_user_id: alternate user id
user_screen_name: tweeting user screen name
order_of_users: tweet's index within the sequence of tweets of the same URL
user_id: user id
The table (in CSV format) distinct_users_from_search_table_real_map contains names of tweeting users, and the following information for