100+ datasets found

Twitter follower-followee graph, labeled with benign/Sybil
figshare.com
txt
Updated May 31, 2023
Cite
Haoyu Lu (2023). Twitter follower-followee graph, labeled with benign/Sybil [Dataset]. http://doi.org/10.6084/m9.figshare.20057300.v1
Available download formats
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.20057300.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Haoyu Lu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges from [Kwak], with ground-truth labels obtained from [SybilSCAR]. Of the nodes, 178,377 are benign and 91,263 are Sybil. We take 9,000 Sybil and 17,000 benign users (about 10% of the nodes) as the training set and test on the whole social graph.
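The counts in this description can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not part of the dataset.

```python
# Counts quoted in the dataset description.
benign_total = 178_377
sybil_total = 91_263
total_nodes = 269_640

train_sybil = 9_000
train_benign = 17_000

# The two classes should account for every node in the graph.
assert benign_total + sybil_total == total_nodes

train_size = train_sybil + train_benign
train_fraction = train_size / total_nodes
print(f"training nodes: {train_size} ({train_fraction:.1%} of the graph)")
# prints "training nodes: 26000 (9.6% of the graph)"
```

The training set is therefore slightly under the "about 10%" stated in the description.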

[Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010.
[SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, “SybilSCAR: Sybil detection in online social networks via local rule based propagation,” in IEEE INFOCOM, 2017.
SignedGraphs
huggingface.co
Updated Mar 31, 2023
Cite
Twitter (2023). SignedGraphs [Dataset]. https://huggingface.co/datasets/Twitter/SignedGraphs
Dataset updated
Mar 31, 2023
Dataset provided by
X http://x.com/
Authors
Twitter
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Learning Stance Embeddings from Signed Social Graphs

This repo contains the datasets from our paper Learning Stance Embeddings from Signed Social Graphs. [PDF] [HuggingFace Datasets] This work is licensed under a Creative Commons Attribution 4.0 International License.

Overview

A key challenge in social network analysis is understanding the position, or stance, of people in the graph on a large set of topics. In such social graphs, modeling (dis)agreement patterns… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/SignedGraphs.
COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel...
data.niaid.nih.gov
zenodo.org
Updated Jan 23, 2021
Cite
Habiba Drias (2021). COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel CoronaVirus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024176
Dataset updated
Jan 23, 2021
Dataset provided by
Habiba Drias
Yassine Drias
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains 653 996 tweets related to the Coronavirus topic, identified by hashtags such as #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets were crawled from the 27th of February to the 25th of March 2020, a period of four weeks.

The tweets were generated by 390 458 users from 133 different countries and were written in 61 languages. English is the most used language with almost 400k tweets, followed by Spanish with around 80k tweets.

The data is stored as a CSV file, where each line represents a tweet. The CSV file provides information on the following fields:

Author: the user who posted the tweet

Recipient: contains the name of the user in case of a reply, otherwise it would have the same value as the previous field

Tweet: the full content of the tweet

Hashtags: the list of hashtags present in the tweet

Language: the language of the tweet

Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.

Location: the country of the author of the tweet, which is unfortunately not always available

Date: the publication date of the tweet

Source: the device or platform used to send the tweet

The dataset can as well be used to construct a social graph since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
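A minimal sketch of how such a social graph could be derived from the CSV, using only the Python standard library. The column headers are assumed to match the field names listed above, the sample rows are made up for illustration, and the sketch assumes that Recipient names the other user for all four relation types.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; the real file may use different headers.
SAMPLE_CSV = """Author,Recipient,Tweet,Relationship
alice,alice,Stay safe everyone #COVID19,Tweet
bob,alice,@alice thanks for sharing,Replies to
carol,bob,RT @bob: wash your hands,Retweet
"""

# Relation names quoted in the dataset description.
GRAPH_RELATIONS = {"Replies to", "Retweet", "MentionsInRetweet", "Mentions"}

def build_edges(csv_file):
    """Return (author, recipient, relationship) edges for interaction rows."""
    edges = []
    for row in csv.DictReader(csv_file):
        if row["Relationship"] not in GRAPH_RELATIONS:
            continue
        if row["Author"] == row["Recipient"]:
            continue  # no distinct recipient, so no edge
        edges.append((row["Author"], row["Recipient"], row["Relationship"]))
    return edges

edges = build_edges(io.StringIO(SAMPLE_CSV))
# Number of incoming interactions per user.
in_degree = Counter(recipient for _, recipient, _ in edges)
```

Keeping the relationship label on each edge makes it possible to build separate reply, retweet and mention graphs later without re-reading the file.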
NLP feature set variables for TwiBot-20.
plos.figshare.com
xls
Updated Dec 23, 2024
Cite
Agata Skorupka (2024). NLP feature set variables for TwiBot-20. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315849.t001
Dataset updated
Dec 23, 2024
Dataset provided by
PLOS ONE
Authors
Agata Skorupka
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other and against state-of-the-art results from the literature, and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust as new technologies advance. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes’ statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard over a grid of embedding instances, each of which takes hours to generate). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Jun 26, 2025
Cite
Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Following/Followers and Tags on 0.1 million Twitter Users
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Cite
Mitsuo Yoshida; Yuto Yamaguchi (2020). Following/Followers and Tags on 0.1 million Twitter Users [Dataset]. http://doi.org/10.5281/zenodo.13966
Available download formats
application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.13966
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Mitsuo Yoshida; Yuto Yamaguchi
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Abstract (our paper)

Why does Smith follow Johnson on Twitter? In most cases, the reason why users follow other users is unavailable. In this work, we answer this question by proposing TagF, which analyzes the who-follows-whom network (matrix) and the who-tags-whom network (tensor) simultaneously. Concretely, our method decomposes a coupled tensor constructed from this matrix and tensor. The experimental results on million-scale Twitter networks show that TagF uncovers different, but explainable reasons why users follow other users.

Data

coupled_tensor:
The first column is the source user id (from user id), the second column is the destination user id (to user id), and the third column is the tag id.

users.id:
The first column is the user id for coupled_tensor, and the second column is the user id on Twitter.

tags.id:
The first column is the tag id for coupled_tensor, and the second column is the tag (i.e. slug or list name) on Twitter. On the tags, ###follow### and ###friend### are special tags expressing follower and following.
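The three files described above can be combined to separate plain follow relations from tagging relations. The sketch below is only an illustration: it assumes whitespace-delimited columns (the description does not state the delimiter), and the sample lines and function names are made up.

```python
# Special tags named in the description; everything else is a list/slug tag.
FOLLOW_TAGS = {"###follow###", "###friend###"}

def load_id_map(lines):
    """Parse two-column 'id value' lines into a dict (delimiter is assumed)."""
    mapping = {}
    for line in lines:
        if line.strip():
            key, value = line.split(maxsplit=1)
            mapping[key] = value.strip()
    return mapping

def split_triples(tensor_lines, tag_names):
    """Split (source, destination, tag id) triples by kind of tag."""
    follow_edges, tag_edges = [], []
    for line in tensor_lines:
        if not line.strip():
            continue
        src, dst, tag_id = line.split()
        tag = tag_names[tag_id]
        if tag in FOLLOW_TAGS:
            follow_edges.append((src, dst, tag))
        else:
            tag_edges.append((src, dst, tag))
    return follow_edges, tag_edges

# Made-up stand-ins for tags.id and coupled_tensor.
tags = load_id_map(["0 ###follow###", "1 ###friend###", "2 data-science"])
tensor = ["10 11 0", "11 10 1", "10 12 2"]
follow_edges, tag_edges = split_triples(tensor, tags)
```

The user ids in these triples are the internal ids of coupled_tensor; users.id would be loaded the same way with `load_id_map` to translate them to Twitter user ids.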

Publication

This dataset was created for our study. If you make use of this dataset, please cite:
Yuto Yamaguchi, Mitsuo Yoshida, Christos Faloutsos, Hiroyuki Kitagawa. Why Do You Follow Him? Multilinear Analysis on Twitter. Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). pp.137-138, 2015.
http://doi.org/10.1145/2740908.2742715

Code

Our code for producing the experiment results is available at:
https://github.com/yamaguchiyuto/tagf

Note

If you would like to use a larger dataset, the dataset on 1 million seed users is available at:
http://dx.doi.org/10.5281/zenodo.16267
(The dataset on 0.1 million seed users is not a subset of the dataset on 1 million seed users.)
Undirected Node Attributed Social Network Graph of Twitter Users interested...
zenodo.org
data.europa.eu
zip
Updated Jan 24, 2020
Cite
Ilias Dimitriadis (2020). Undirected Node Attributed Social Network Graph of Twitter Users interested in plastic pollution - created in the framework of the PlasticTwist project [Dataset]. http://doi.org/10.5281/zenodo.3611146
Available download formats
zip
Unique identifier
https://doi.org/10.5281/zenodo.3611146
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Ilias Dimitriadis
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset has been created in the framework of the Plastic Twist project (Ptwist) and more specifically using the Ptwist crowdsourcing application (crowdsourcing.plastictwist.com/). We are sharing the edge list and specific node attributes (hashtags) of Twitter users posting about plastic pollution. The dataset can be used for community detection, clustering, node importance, influence maximization tasks, etc. Each user is represented by a unique integer which has nothing to do with the official Twitter user ID. The dataset contains three (3) files:

ptwist.edgelist: A list containing all the 1,362,863 edges between the users. When loaded they create an undirected graph of 800K+ users.

node_attributes.txt: This file contains information about the hashtags used by each user (e.g. "652003": ["SingleUsePlastic"] means that user 652003 has used the hashtag SingleUsePlastic).

annotated_graph: A pickle file which, when loaded, returns a NetworkX node attributed undirected graph.
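As a rough illustration of how the edge list and the hashtag attributes fit together, the sketch below builds an undirected adjacency structure with the standard library. It assumes one whitespace-separated node pair per line in ptwist.edgelist and that node_attributes.txt parses as a single JSON object; neither is confirmed by the description, and the sample values are made up.

```python
import json
from collections import defaultdict

def load_edgelist(lines):
    """Build an undirected adjacency dict from 'u v' lines (format assumed)."""
    adjacency = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = parts[0], parts[1]
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

# Made-up stand-ins for ptwist.edgelist and node_attributes.txt.
edge_lines = ["1 2", "2 3", "652003 1"]
attributes = json.loads('{"652003": ["SingleUsePlastic"], "2": ["plastic"]}')

graph = load_edgelist(edge_lines)
# Hashtags used by the neighbours of user "1".
hashtags_of_neighbours = {n: attributes.get(n, []) for n in sorted(graph["1"])}
```

Node ids are kept as strings so that they match the quoted keys of the attribute file.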
Twitter Connections with User Location
kaggle.com
Updated Feb 13, 2019
Cite
Sajid Hasan Apon (2019). Twitter Connections with User Location [Dataset]. https://www.kaggle.com/sajidhasanapon/twitter-connections-with-user-location/code
Dataset updated
Feb 13, 2019
Dataset provided by
Kaggle http://kaggle.com/
Authors
Sajid Hasan Apon
Description
Content

This dataset is a subset of all the Twitter-users, along with their connections and locations.

The graph created by considering the users as nodes and their connections as edges is a connected component of the total Twitter graph (i.e. for every user in the subgraph, all their connections in the original graph are contained within the subgraph).

Although the Twitter graph is directed (the follower-following relation is not mutual: "X follows Y" does not imply "Y follows X"), we treat the directed edges as undirected. Hence, for every pair of users u and v, if u→v is present in the original graph but v→u is not, we have added the edge v→u.

There are two directories: one with 10 million users and the other with 1 million. Each directory contains two .txt files: location, and user.

The location file contains the latitude and longitude of each user. The file format is:

lat_1, long_1
lat_2, long_2
lat_3, long_3
...

where lat_1 and long_1 are the latitude and longitude of user number 1 respectively, and so on.

The user file contains the adjacency list of each user. The k-th row of this file enumerates the friends of user number k.
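The file layout and the symmetrization step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: users are taken to be numbered from 1 in file order, and the adjacency lists are passed in as already-parsed lists of integers because the separator used in the user file is not specified.

```python
def parse_locations(lines):
    """Each line holds 'lat, long' for user 1, 2, 3, ... in file order."""
    coords = []
    for line in lines:
        lat, lon = line.split(",")
        coords.append((float(lat), float(lon)))
    return coords

def symmetrize(adjacency):
    """Add the missing reverse edge v->u for every edge u->v.

    adjacency[k - 1] lists the friends of user k (1-based numbering assumed).
    """
    undirected = [set(friends) for friends in adjacency]
    for k, friends in enumerate(adjacency, start=1):
        for friend in friends:
            undirected[friend - 1].add(k)
    return undirected

# Made-up example: user 1 follows 2 and 3, user 2 follows 3.
directed = [[2, 3], [3], []]
undirected = symmetrize(directed)
locations = parse_locations(["23.8, 90.4", "40.7, -74.0"])
```

Since the published files already contain the added reverse edges, running `symmetrize` on them should leave the adjacency lists unchanged, which makes it usable as a consistency check.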

Bibliographic References

If you use this dataset, please cite it as:

@article{DBLP:journals/pvldb/GhoshACHSL18,
  author    = {Bishwamittra Ghosh and Mohammed Eunus Ali and Farhana Murtaza Choudhury and Sajid Hasan Apon and Timos Sellis and Jianxin Li},
  title     = {The Flexible Socio Spatial Group Queries},
  journal   = {{PVLDB}},
  volume    = {12},
  number    = {2},
  pages     = {99--111},
  year      = {2018},
  url       = {http://www.vldb.org/pvldb/vol12/p99-ghosh.pdf},
  timestamp = {Mon, 03 Dec 2018 16:45:54 +0100},
  biburl    = {https://dblp.org/rec/bib/journals/pvldb/GhoshACHSL18},
}

Source Publications

@inproceedings{DBLP:conf/kdd/LiWDWC12,
  author    = {Rui Li and Shengjie Wang and Hongbo Deng and Rui Wang and Kevin Chen-Chuan Chang},
  title     = {Towards social user profiling: unified and discriminative influence model for inferring home locations},
  booktitle = {KDD},
  year      = {2012},
  pages     = {1023-1031}
}
Graph-Based Social Media Data on Mental Health Topics
data.mendeley.com
Updated Nov 4, 2024
+ more versions
Cite
Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
Unique identifier
https://doi.org/10.17632/z45txpdp7f.2
Dataset updated
Nov 4, 2024
Authors
Samuel Ady Sanjaya
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

The dataset consists of three files:

1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies.

2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.

3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media. (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
Data from: Discovery and classification of Twitter bots
data.europa.eu
unknown
Updated Apr 23, 2021
Cite
Zenodo (2021). Discovery and classification of Twitter bots [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4715885?locale=de
Available download formats
unknown (22740)
Dataset updated
Apr 23, 2021
Dataset authored and provided by
Zenodo http://zenodo.org/
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Online Social Networks (OSNs) are used by millions of users daily. This user base shares and discovers different opinions on popular topics. Large groups can be socially influenced by users' beliefs or have their interest drawn to particular news or products. A large number of users, gathered in a single group or as a number of followers, increases the probability of influencing OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and create an illusion of community behaviour, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users for a period of 36 months. The collected users are labeled as botnets, based on long term and frequent content similarity events. We detect over a million events, where seemingly unrelated accounts tweeted nearly identical content, at almost the same time. We filter these concurrent content injection events and detect a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or in part controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between the bot accounts and human users, as well as the botnet interactions with the user communities and the Twitter trending topics.
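The idea of flagging accounts that repeatedly post near-identical content at almost the same time can be sketched as below. This is not the authors' pipeline: the exact-match text normalization, the fixed 60-second buckets, the `min_events` threshold and the sample tweets are all assumptions chosen to keep the illustration short.

```python
from collections import defaultdict

def flag_coordinated_accounts(tweets, window_seconds=60, min_events=2):
    """Flag accounts that repeatedly post the same text at almost the same time.

    tweets: iterable of (account, unix_timestamp, text) tuples.
    An "event" is one text posted by two or more distinct accounts inside the
    same fixed time bucket. Fixed buckets are a simplification: two posts a few
    seconds apart can still fall on either side of a bucket boundary.
    """
    accounts_per_event = defaultdict(set)
    for account, timestamp, text in tweets:
        normalized = " ".join(text.lower().split())  # case and spacing only
        bucket = int(timestamp) // window_seconds
        accounts_per_event[(normalized, bucket)].add(account)

    events_per_account = defaultdict(int)
    for accounts in accounts_per_event.values():
        if len(accounts) >= 2:
            for account in accounts:
                events_per_account[account] += 1

    return {a for a, n in events_per_account.items() if n >= min_events}

# Made-up tweets: "a" and "b" co-post twice, "c" joins only once.
sample = [
    ("a", 0, "Buy now!"), ("b", 10, "buy  now!"),
    ("a", 100, "Hello world"), ("b", 110, "Hello world"),
    ("c", 5, "Something unrelated"), ("c", 105, "Hello world"),
]
flagged = flag_coordinated_accounts(sample)
```

Requiring several shared events before flagging an account reflects the "repeatedly exhibit this pattern" criterion in the description; a single coincidental match, as for account "c" here, is not enough.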
Results for TwiBot-20 dataset—best 10 models.
plos.figshare.com
xls
Updated Dec 23, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agata Skorupka (2024). Results for TwiBot-20 dataset—best 10 models. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t004
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315849.t004
Dataset updated
Dec 23, 2024
Dataset provided by
PLOS ONE
Authors
Agata Skorupka
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Full results are available in S1 Appendix as Table 2d.
Data from: TwiBot22: Towards Graph-Based Twitter Bot Detection
explore.openaire.eu
Updated Aug 20, 2022
Cite
Feng Shangbin; Tan Zhaoxuan; Wan Herun; Wang Ningnan; Chen Zilong; Zhang Binchi (2022). TwiBot22: Towards Graph-Based Twitter Bot Detection [Dataset]. http://doi.org/10.5281/zenodo.7012904
Unique identifier
https://doi.org/10.5281/zenodo.7012904
Dataset updated
Aug 20, 2022
Authors
Feng Shangbin; Tan Zhaoxuan; Wan Herun; Wang Ningnan; Chen Zilong; Zhang Binchi
Description
Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of the online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they exhibit promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited dataset scale, incomplete graph structure, as well as low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented codes and datasets into the TwiBot-22 evaluation framework, where researchers could consistently evaluate new models and datasets. Due to the limitation at Zenodo, you can access the whole dataset and more information on this website.
A study on real graphs of fake news spreading on Twitter
data.niaid.nih.gov
Updated Aug 20, 2021
Cite
Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
Dataset updated
Aug 20, 2021
Dataset authored and provided by
Amirhosein Bodaghi
Description
*** Fake News on Twitter ***

These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we focused on fake news stories that gave rise to a truth spreading simultaneously against them. The story of each fake news item is as follows:

1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

The data collection was done in two stages, each of which provided a new dataset:

1- Obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets.

2- Querying the neighbors of the spreaders of tweets, which provides the Dataset of Graph (DG).

DD

DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story.

Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about the tweet/retweet. From left to right, the columns present the following information about the tweet/retweet:

User ID (user who has posted the current tweet/retweet)

The description sentence in the profile of the user who has published the tweet/retweet

The number of tweets/retweets published by the user at the time of posting the current tweet/retweet

Date and time of creation of the account by which the current tweet/retweet has been posted

Language of the tweet/retweet

Number of followers

Number of followings (friends)

Date and time of posting the current tweet/retweet

Number of likes (favorites) the current tweet had received before it was crawled

Number of times the current tweet had been retweeted before it was crawled

Whether another tweet is embedded in the current tweet/retweet (for example, this happens when the current tweet is a quote, reply or retweet)

The source (OS) of device by which the current tweet/retweet was posted

Tweet/Retweet ID

Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied to by the current post)

Frequency of tweet occurrences which means the number of times the current tweet is repeated in the dataset (for example the number of times that a tweet exists in the dataset in the form of retweet posted by others)

State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):

r : The tweet/retweet is a fake news post

a : The tweet/retweet is a truth post

q : The tweet/retweet is a question about the fake news that neither confirms nor denies it

n : The tweet/retweet is not related to the fake news (it contains the queries related to the rumor but does not refer to the given fake news)

DG

DG for each fake news contains two files:

A file in graph format (.graph), which includes the information of the graph such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)

A file in JSONL format (.jsonl), which includes the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

The second file is needed because, in the graph file, the label of each node is its order of entry into the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (because row number 0 belongs to the column labels); the other node IDs follow in the next rows of the file (each row corresponds to one user ID). Therefore, if we want to know, for example, the user ID of node 200 (labeled 200 in the graph), we should look at row number 202 in the jsonl file.

The user IDs of spreaders in DG (those who have had a post in DD) would be available in DD to get extra information about them and their tweet/retweet. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
How Popular Is Twitter In The World?
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million active monthly users.
Data from: OKG: A Knowledge Graph for Fine-grained Understanding of Social...
zenodo.org
data.niaid.nih.gov
Updated Jun 9, 2024
Cite
Inès Blin; Lise Stork; Laura Spillner; Carlo Romano Marcello Alessandro Santagiustina (2024). OKG: A Knowledge Graph for Fine-grained Understanding of Social Media Discourse on Inequality [Dataset]. http://doi.org/10.5281/zenodo.10034210
Unique identifier
https://doi.org/10.5281/zenodo.10034210
Dataset updated
Jun 9, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Inès Blin; Lise Stork; Laura Spillner; Carlo Romano Marcello Alessandro Santagiustina
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Time period covered
Oct 24, 2023
Description
The Observatory Knowledge Graph (OKG) is a knowledge graph with tweets on inequality in terms of the OBIO ontology (https://w3id.org/okg/obio-ontology/), which integrates social media metadata with various types of linguistic knowledge. The OKG can be used as the backbone of a social media observatory, to facilitate a deeper understanding of social media discourse on inequality.

We retrieved tweets and retweets published from the end (30th) of May 2020 to the beginning (1st) of May 2023.

In this version of the OKG, we use a sample of 85,247 tweets, published from May 30th to August 27th, 2020. To be compliant with Twitter's policies, we remove usernames and IDs, as well as the tweet texts and sentences. We also replace user IRIs with skolem IRIs through skolemization.

Access to the OKG as well as the SPARQL endpoint can be requested by sending a mail to the contact person (l.stork@uva.nl) with the following information:

A description of the use case

Affiliation of the researchers involved

How their work is in line with Twitter's policies: https://developer.twitter.com/en/developer-terms/policy#4-d
Why Do People Use Twitter?
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.
Twitter Statistics
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Search Logistics (2025). Twitter Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
Dataset authored and provided by
Search Logistics
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These Twitter user statistics will give you the complete story of where Twitter is at today and what the future looks like for the social media company.
Data from: What Tweets and YouTube comments have in common? Sentiment and...
zenodo.org
bin, csv, doc
Updated Apr 15, 2021
Cite
Shevtsov Alexander, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, Sotiris Ioannidis (2021). What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020. [Dataset]. http://doi.org/10.5281/zenodo.4618233
Available download formats: bin, doc, csv
Unique identifier
https://doi.org/10.5281/zenodo.4618233
Dataset updated
Apr 15, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shevtsov Alexander, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, Sotiris Ioannidis
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States, YouTube
Description
The presidential elections in the United States on November 3rd 2020 caused extensive discussions on social media. A part of the content on US elections is organic, coming from users discussing their opinions on the candidates, political positions, or relevant content presented on television. Another significant part originates from organized campaigns, both official, including communication campaigns and dissemination, and unofficial, including astroturfing and targeted manipulation of the electorate.

In this study, we obtain approximately 19.8M tweets from 4.5M users, based on prevalent hashtags related to the 2020 US election. From these, we mined 28,343 tweeted YouTube links and obtained the likes, dislikes and comments of these videos. In this paper, we study the connection between the two social networks. We employ an array of techniques, including volume analysis, exploring the retweet graph, sentiment and graph analysis on the communities formed in YouTube and Twitter. Furthermore, we propose a method to combine the results of community detection on the two social networks and measure the differences between them.

In particular, we study the daily traffic per prevalent hashtag, plot the retweet graph from July to November 2020, highlight the two main entities (‘Biden’ and ‘Trump’) and show how the discussion around those entities grows in the period closer to the elections. Additionally, we perform a sentiment analysis of both the Twitter corpus and the YouTube comments in tweeted videos. We found that 35.2% of the users contained in our Twitter dataset express positive sentiment towards Trump and 28% express positive sentiment towards Biden, while 18% of the users in our YouTube dataset express positive sentiment towards Trump and 12% express positive sentiment towards Biden. Finally, we link the Twitter retweet graph with the YouTube comment graph using tweeted video links. We measure their similarities and differences and show the interactions and the correlation between the largest communities on YouTube and Twitter.
The 10 most popular hashtags in our dataset.
plos.figshare.com
xls
Updated Jun 16, 2023
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The 10 most popular hashtags in our dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0270542.t001
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS ONE
Authors
Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The 10 most popular hashtags in our dataset.
Results for Bitcoin OTC dataset—best 10 models.
plos.figshare.com
xls
Updated Dec 23, 2024
Agata Skorupka (2024). Results for Bitcoin OTC dataset—best 10 models. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315849.t005
Dataset updated
Dec 23, 2024
Dataset provided by
PLOS ONE
Authors
Agata Skorupka
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Full results are available in the S1 Appendix as Table 2e.

Twitter follower-followee graph, labeled with benign/Sybil

2 scholarly articles cite this dataset (View in Google Scholar)

