100+ datasets found
  1. Twitter follower-followee graph, labeled with benign/Sybil

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Haoyu Lu (2023). Twitter follower-followee graph, labeled with benign/Sybil [Dataset]. http://doi.org/10.6084/m9.figshare.20057300.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Haoyu Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges from [Kwak], with ground-truth labels obtained from [SybilSCAR]. Of these nodes, 178,377 are benign and 91,263 are Sybil. We select 9,000 Sybil and 17,000 benign users (about 10% of all nodes) as the training set and test on the whole social graph.

    [Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010.
    [SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, “SybilSCAR: Sybil detection in online social networks via local rule based propagation,” in IEEE INFOCOM, 2017.
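    The training split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the labels have already been loaded into a Python dict mapping node ID to 1 (Sybil) or 0 (benign), since the layout of the txt file is not described here, and the function name `split_training_set` is hypothetical.

```python
import random


def split_training_set(labels, n_sybil=9000, n_benign=17000, seed=0):
    """Pick the nodes whose labels are revealed for training.

    labels: dict mapping node id -> 1 (Sybil) or 0 (benign).
    All nodes, including the training ones, remain in the graph used for testing.
    """
    rng = random.Random(seed)
    # Sort first so the sample depends only on the seed, not on dict order.
    sybil = sorted(n for n, y in labels.items() if y == 1)
    benign = sorted(n for n, y in labels.items() if y == 0)
    return set(rng.sample(sybil, n_sybil)) | set(rng.sample(benign, n_benign))


if __name__ == "__main__":
    # Toy labels with the class sizes reported above: 91,263 Sybil, 178,377 benign.
    labels = {i: (1 if i < 91263 else 0) for i in range(269640)}
    train = split_training_set(labels)
    print(len(train), f"{len(train) / len(labels):.1%}")  # 26000 9.6%
```

    The 26,000 training nodes make up about 9.6% of the graph, which matches the "about 10%" stated in the description.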

  2. SignedGraphs

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Twitter (2023). SignedGraphs [Dataset]. https://huggingface.co/datasets/Twitter/SignedGraphs
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    X (http://x.com/)
    Authors
    Twitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Learning Stance Embeddings from Signed Social Graphs

    This repo contains the datasets from our paper Learning Stance Embeddings from Signed Social Graphs. This work is licensed under a Creative Commons Attribution 4.0 International License.

    Overview

    A key challenge in social network analysis is understanding the position, or stance, of people in the graph on a large set of topics. In such social graphs, modeling (dis)agreement patterns… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/SignedGraphs.

  3. COVID-19 Tweets: A dataset containing more than 600k tweets on the novel...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 23, 2021
    Cite
    Habiba Drias (2021). COVID-19 Tweets : A dataset contaning more than 600k tweets on the novel CoronaVirus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024176
    Dataset updated
    Jan 23, 2021
    Dataset provided by
    Habiba Drias
    Yassine Drias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 653,996 tweets related to the coronavirus, identified by hashtags such as #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets were crawled from 27 February to 25 March 2020, a period of about four weeks.

    The tweets were posted by 390,458 users from 133 different countries and were written in 61 languages. English is the most used language, with almost 400k tweets, followed by Spanish with around 80k tweets.

    The data is stored as a CSV file, where each line represents a tweet. The CSV file provides the following fields:

    Author: the user who posted the tweet

    Recipient: the name of the user being replied to in the case of a reply; otherwise, the same value as the Author field

    Tweet: the full content of the tweet

    Hashtags: the list of hashtags present in the tweet

    Language: the language of the tweet

    Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.

    Location: the country of the author of the tweet, which is unfortunately not always available

    Date: the publication date of the tweet

    Source: the device or platform used to send the tweet

    The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
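    A minimal sketch of building such a graph with Python's standard `csv` module follows. It assumes the CSV header uses the field names listed above (`Author`, `Recipient`, `Relationship`) and, following the Recipient description, treats rows where Recipient equals Author as carrying no edge. The sample rows, including the `Tweet` relationship value, are made up for illustration.

```python
import csv
import io
from collections import Counter


def interaction_edges(csv_file):
    """Count directed Author -> Recipient edges per relationship type.

    Assumes the header names listed in the field description. Rows whose
    Recipient equals the Author describe no interaction and are skipped.
    """
    edges = Counter()
    for row in csv.DictReader(csv_file):
        author, recipient = row["Author"], row["Recipient"]
        if author != recipient:
            edges[(author, recipient, row["Relationship"])] += 1
    return edges


if __name__ == "__main__":
    sample = io.StringIO(
        "Author,Recipient,Relationship\n"
        "alice,bob,Replies to\n"
        "alice,bob,Replies to\n"
        "carol,carol,Tweet\n"
        "dave,alice,Retweet\n"
    )
    for (src, dst, rel), n in sorted(interaction_edges(sample).items()):
        print(src, "->", dst, rel, n)
```

    Keeping the relationship type in the edge key lets the same counts be collapsed later into a single weighted graph or split into one graph per relation.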

  4. NLP feature set variables for TwiBot-20.

    • plos.figshare.com
    xls
    Updated Dec 23, 2024
    Cite
    Agata Skorupka (2024). NLP feature set variables for TwiBot-20. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t001
    Available download formats: xls
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Agata Skorupka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other and against state-of-the-art results from the literature, and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust to advances in the newest technologies. The article uses three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study shows that graph-based data is a better-performing predictor than text data. It compares different graph algorithms for extracting feature sets for anomaly detection models and finds that methods based on nodes’ statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency, often reducing run time by hours or enabling modeling on significantly larger graphs (usually not feasible with embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether some types of graph embeddings and dimensions perform significantly better than others.
    The answer turned out to be dataset-specific, so the choice must be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard over a grid of embedding instances, each of which takes hours to generate). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics, which makes this engineering overhead redundant.
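    To make "features based on nodes' statistics" concrete, the sketch below computes a few generic per-node statistics from a directed edge list. It is an illustration under our own assumptions, not the algorithm proposed in the article: the chosen statistics (in-degree, out-degree, reciprocity, mean neighbor degree) and the function name `node_statistics` are hypothetical.

```python
from collections import defaultdict


def node_statistics(edges):
    """Per-node structural statistics from a directed (u, v) edge list.

    A generic example of statistics-based node features; it is not the
    specific algorithm proposed in the article.
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)

    def degree(n):
        # .get avoids inserting new keys into the defaultdicts.
        return len(in_nbrs.get(n, ())) + len(out_nbrs.get(n, ()))

    features = {}
    for n in set(out_nbrs) | set(in_nbrs):
        outs, ins = out_nbrs.get(n, set()), in_nbrs.get(n, set())
        nbrs = outs | ins
        features[n] = {
            "in_degree": len(ins),
            "out_degree": len(outs),
            # Share of followed accounts that follow back.
            "reciprocity": len(outs & ins) / len(outs) if outs else 0.0,
            "mean_neighbor_degree": (
                sum(degree(m) for m in nbrs) / len(nbrs) if nbrs else 0.0
            ),
        }
    return features


if __name__ == "__main__":
    feats = node_statistics([(1, 2), (2, 3), (3, 1), (1, 3)])
    print(feats[1])
    # {'in_degree': 1, 'out_degree': 2, 'reciprocity': 0.5, 'mean_neighbor_degree': 2.5}
```

    Each statistic takes a single pass over a node's neighborhood, so no training or dimension choice is involved, which is the kind of cost advantage over embeddings the description points to.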

  5. X/Twitter: number of worldwide users 2019-2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide
    Description

    As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.

  6. Following/Followers and Tags on 0.1 million Twitter Users

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 24, 2020
    Cite
    Mitsuo Yoshida; Yuto Yamaguchi (2020). Following/Followers and Tags on 0.1 million Twitter Users [Dataset]. http://doi.org/10.5281/zenodo.13966
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mitsuo Yoshida; Yuto Yamaguchi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract (our paper)

    Why does Smith follow Johnson on Twitter? In most cases, the reason why users follow other users is unavailable. In this work, we answer this question by proposing TagF, which analyzes the who-follows-whom network (matrix) and the who-tags-whom network (tensor) simultaneously. Concretely, our method decomposes a coupled tensor constructed from this matrix and tensor. The experimental results on million-scale Twitter networks show that TagF uncovers different but explainable reasons why users follow other users.

    Data

    coupled_tensor:
    The first column is the source user id (from user id), the second column is the destination user id (to user id), and the third column is the tag id.

    users.id:
    The first column is the user id for coupled_tensor, and the second column is the user id on Twitter.

    tags.id:
    The first column is the tag id for coupled_tensor, and the second column is the tag (i.e. slug or list name) on Twitter. Among the tags, ###follow### and ###friend### are special tags denoting the follower and following relations.
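    The three files can be read with a few lines of Python. This sketch assumes whitespace-separated columns with integer IDs in the id columns, which the description does not state explicitly, and the function names are hypothetical. It separates coupled-tensor entries carrying the special ###follow### / ###friend### tags (the who-follows-whom part) from entries carrying ordinary list tags (the who-tags-whom part).

```python
SPECIAL_TAGS = {"###follow###", "###friend###"}


def read_id_map(lines):
    """Parse lines of the form '<internal id> <value>' (users.id, tags.id)."""
    mapping = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            mapping[int(parts[0])] = parts[1]
    return mapping


def read_coupled_tensor(lines):
    """Parse '<source id> <destination id> <tag id>' lines into integer triples."""
    return [tuple(int(x) for x in line.split()[:3]) for line in lines if line.strip()]


def split_entries(triples, tags):
    """Split triples into follow-relation entries and ordinary list-tag entries."""
    follow, tagged = [], []
    for src, dst, tag_id in triples:
        if tags.get(tag_id) in SPECIAL_TAGS:
            follow.append((src, dst, tag_id))
        else:
            tagged.append((src, dst, tag_id))
    return follow, tagged


if __name__ == "__main__":
    tags = read_id_map(["0 ###follow###", "1 ###friend###", "2 data-science"])
    triples = read_coupled_tensor(["5 6 0", "6 5 1", "5 7 2"])
    follow, tagged = split_entries(triples, tags)
    print(len(follow), "follow entries,", len(tagged), "tagged entry")
```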

    Publication

    This dataset was created for our study. If you make use of this dataset, please cite:
    Yuto Yamaguchi, Mitsuo Yoshida, Christos Faloutsos, Hiroyuki Kitagawa. Why Do You Follow Him? Multilinear Analysis on Twitter. Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). pp.137-138, 2015.
    http://doi.org/10.1145/2740908.2742715

    Code

    Our code for producing the experimental results is available at:
    https://github.com/yamaguchiyuto/tagf

    Note

    If you would like to use a larger dataset, the dataset on 1 million seed users is available at:
    http://dx.doi.org/10.5281/zenodo.16267
    (The dataset on 0.1 million seed users is not a subset of the dataset on 1 million seed users.)

  7. Undirected Node Attributed Social Network Graph of Twitter Users interested...

    • zenodo.org
    • data.europa.eu
    zip
    Updated Jan 24, 2020
    Cite
    Ilias Dimitriadis (2020). Undirected Node Attributed Social Network Graph of Twitter Users interested in plastic pollution - created in the framework of the PlasticTwist project [Dataset]. http://doi.org/10.5281/zenodo.3611146
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ilias Dimitriadis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset has been created in the framework of the Plastic Twist project (Ptwist), specifically using the Ptwist crowdsourcing application (crowdsourcing.plastictwist.com/). We are sharing the edge list and specific node attributes (hashtags) of Twitter users posting about plastic pollution. The dataset can be used for community detection, clustering, node importance, influence maximization tasks, etc. Each user is represented by a unique integer that is unrelated to the official Twitter user ID. The dataset contains three files:

    • ptwist.edgelist: A list containing all the 1,362,863 edges between the users. When loaded they create an undirected graph of 800K+ users.
    • node_attributes.txt: This file contains the hashtags used by each user (e.g. "652003": ["SingleUsePlastic"] means that user 652003 has used the hashtag SingleUsePlastic).
    • annotated_graph: A pickle file which, when loaded, returns a NetworkX node attributed undirected graph.
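    A standard-library sketch for loading the edge list and hashtag attributes follows. It assumes each line of ptwist.edgelist starts with two integer user IDs and that node_attributes.txt is a single JSON object shaped like the example above; both are assumptions. As an illustrative use, it counts edges whose endpoints share at least one hashtag. The pickled graph can instead be loaded with `pickle.load`, which requires NetworkX to be installed; only unpickle files from sources you trust.

```python
import json
from collections import defaultdict


def read_undirected_edgelist(lines):
    """Build an undirected adjacency dict from lines starting with 'u v'."""
    adj = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank lines; extra columns are ignored
            u, v = int(parts[0]), int(parts[1])
            adj[u].add(v)
            adj[v].add(u)
    return adj


def read_node_attributes(text):
    """Parse the hashtag file, assumed to be one JSON object: user -> [hashtags]."""
    return {int(user): set(tags) for user, tags in json.loads(text).items()}


def edges_sharing_hashtag(adj, hashtags):
    """Count undirected edges whose two endpoints used at least one common hashtag."""
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            # u < v visits each undirected edge exactly once.
            if u < v and hashtags.get(u, set()) & hashtags.get(v, set()):
                count += 1
    return count


if __name__ == "__main__":
    adj = read_undirected_edgelist(["1 2", "2 3", "3 1", "3 4"])
    tags = read_node_attributes('{"1": ["A"], "2": ["A", "B"], "3": ["B"], "4": ["C"]}')
    print(len(adj), "users,", edges_sharing_hashtag(adj, tags), "edges share a hashtag")
```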

  8. Twitter Connections with User Location

    • kaggle.com
    Updated Feb 13, 2019
    Cite
    Sajid Hasan Apon (2019). Twitter Connections with User Location [Dataset]. https://www.kaggle.com/sajidhasanapon/twitter-connections-with-user-location/code
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 13, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sajid Hasan Apon
    Description

    Content

    This dataset is a subset of all Twitter users, along with their connections and locations.

    The graph created by considering the users as nodes and their connections as edges is a connected component of the full Twitter graph (i.e. for every user in the subgraph, all their connections in the original graph are contained within the subgraph).

    Although the Twitter graph is directed (the follower-following relation is not mutual: "X follows Y" does not imply "Y follows X"), we treat the directed edges as undirected. Hence, for every u and v, if u→v is present in the original graph but v→u is not, we have added the edge v→u.

    There are two directories: one with 10 million users and the other with 1 million. Each directory contains two .txt files: location and user.

    The location file contains the latitude and longitude of each user. The file format is:

    lat_1, long_1
    lat_2, long_2
    lat_3, long_3
    ...
    

    where lat_1 and long_1 are the latitude and longitude of user number 1 respectively, and so on.

    The user file contains the adjacency list of each user. The k-th row of this file enumerates the friends of user number k.
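    Given these formats, a socio-spatial lookup such as "which friends of user k live within r km" can be sketched as below. It assumes users are numbered from 1 in both files, that friend IDs in the user file use the same numbering, and that friends on a row are separated by spaces or commas; none of this is confirmed by the description. Distances use the haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def read_locations(lines):
    """Map user number (from 1) to (lat, lon), one 'lat, long' pair per line."""
    locations = {}
    for k, line in enumerate(lines, start=1):
        if line.strip():
            lat, lon = (float(x) for x in line.split(","))
            locations[k] = (lat, lon)
    return locations


def read_adjacency(lines):
    """Row k lists the friends of user k (separator assumed: spaces or commas)."""
    return {k: [int(x) for x in line.replace(",", " ").split()]
            for k, line in enumerate(lines, start=1)}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def friends_within(user, radius_km, adjacency, locations):
    """Friends of `user` located at most `radius_km` away."""
    here = locations[user]
    return [f for f in adjacency.get(user, [])
            if haversine_km(here, locations[f]) <= radius_km]


if __name__ == "__main__":
    locs = read_locations(["0.0, 0.0", "0.0, 1.0", "10.0, 10.0"])
    adj = read_adjacency(["2 3", "1", "1"])
    print(friends_within(1, 200, adj, locs))  # [2]
```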

    Bibliographic References

    If you use this dataset, please cite it as:

    @article{DBLP:journals/pvldb/GhoshACHSL18,
      author    = {Bishwamittra Ghosh and
                   Mohammed Eunus Ali and
                   Farhana Murtaza Choudhury and
                   Sajid Hasan Apon and
                   Timos Sellis and
                   Jianxin Li},
      title     = {The Flexible Socio Spatial Group Queries},
      journal   = {{PVLDB}},
      volume    = {12},
      number    = {2},
      pages     = {99--111},
      year      = {2018},
      url       = {http://www.vldb.org/pvldb/vol12/p99-ghosh.pdf},
      timestamp = {Mon, 03 Dec 2018 16:45:54 +0100},
      biburl    = {https://dblp.org/rec/bib/journals/pvldb/GhoshACHSL18},
    }
    

    Source Publications

    @inproceedings{DBLP:conf/kdd/LiWDWC12,
      author    = {Rui Li and Shengjie Wang and Hongbo Deng and Rui Wang and Kevin Chen-Chuan Chang},
      title     = {Towards social user profiling: unified and discriminative influence model for inferring home locations},
      booktitle = {KDD},
      year      = {2012},
      pages     = {1023--1031}
    }
    
  9. Graph-Based Social Media Data on Mental Health Topics

    • data.mendeley.com
    Updated Nov 4, 2024
    + more versions
    Cite
    Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
    Dataset updated
    Nov 4, 2024
    Authors
    Samuel Ady Sanjaya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

    The dataset consists of three files:

    1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies.
    2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.
    3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media.

    (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
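    As an example of the social network analysis the Edges file supports, the sketch below ranks users by the number of interactions directed at them (a weighted in-degree). The column headers `UserID (Source)` and `UserID (Destination)` are assumed from the field list above and may differ in the actual file; the sample data is invented.

```python
import csv
import io
from collections import Counter

# Assumed header names, taken from the Edges Data field list.
SOURCE_COL = "UserID (Source)"
DEST_COL = "UserID (Destination)"


def top_recipients(edges_csv, k=3):
    """Rank users by interactions pointing at them, ignoring self-interactions."""
    counts = Counter(
        row[DEST_COL]
        for row in csv.DictReader(edges_csv)
        if row[DEST_COL] != row[SOURCE_COL]
    )
    return counts.most_common(k)


if __name__ == "__main__":
    sample = io.StringIO(
        "UserID (Source),UserID (Destination),Post/Tweet ID,Date of Relationship\n"
        "u1,u2,t1,2024-01-01\n"
        "u3,u2,t2,2024-01-02\n"
        "u1,u3,t3,2024-01-02\n"
        "u4,u4,t4,2024-01-03\n"
    )
    print(top_recipients(sample, k=2))  # [('u2', 2), ('u3', 1)]
```

    Joining the resulting user IDs with the Nodes file (follower counts, verified status) would then show whether the most-engaged accounts are also the most followed ones.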

  10. Data from: Discovery and classification of Twitter bots

    • data.europa.eu
    unknown
    Updated Apr 23, 2021
    Cite
    Zenodo (2021). Discovery and classification of Twitter bots [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4715885?locale=de
    Available download formats: unknown (22740)
    Dataset updated
    Apr 23, 2021
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Online Social Networks (OSN) are used daily by millions of users. This user base shares and discovers different opinions on popular topics. Large groups may be socially influenced by users' beliefs or drawn to particular news or products. A large number of users, gathered in a single group or as followers, increases the probability of influencing OSN users. Botnets, collections of automated accounts controlled by a single agent, are a common mechanism for exerting maximum influence. Botnets may be used to better infiltrate the social graph over time and create an illusion of community behaviour, amplifying their message and increasing persuasion. This paper investigates Twitter botnets, their behavior, their interaction with user communities and their evolution over time. We analyze a dense crawl of a subset of Twitter traffic, amounting to nearly all interactions by Greek-speaking Twitter users over a period of 36 months. Collected users are labeled as botnet members based on long-term and frequent content-similarity events. We detect over a million events where seemingly unrelated accounts tweeted nearly identical content at almost the same time. We filter these concurrent content injection events and detect a set of 1,850 accounts that repeatedly exhibit this pattern of behavior, suggesting that they are fully or partly controlled and orchestrated by the same entity. We find botnets that appear for brief intervals and disappear, as well as botnets that evolve and grow, spanning the duration of our dataset. We analyze statistical differences between the bot accounts and human users, as well as the botnet interactions with the user communities and the Twitter trending topics.
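    The core idea of detecting concurrent content injection (seemingly unrelated accounts posting nearly identical content at almost the same time) can be sketched as below. This is a simplified illustration, not the paper's pipeline: it approximates "nearly identical" by lower-casing and stripping URLs, approximates "almost the same time" with fixed time buckets (so two posts straddling a bucket boundary are missed), and the thresholds `window_s`, `min_accounts` and `min_events` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def normalize(text):
    """Lower-case, drop URLs and collapse whitespace so near-duplicates compare equal."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return " ".join(text.split())


def injection_events(tweets, window_s=60, min_accounts=3):
    """Groups of distinct accounts posting the same normalized text in one time bucket.

    tweets: iterable of (account, unix_time_seconds, text) tuples.
    """
    buckets = defaultdict(set)
    for account, ts, text in tweets:
        buckets[(normalize(text), int(ts) // window_s)].add(account)
    return [accounts for accounts in buckets.values() if len(accounts) >= min_accounts]


def repeat_offenders(events, min_events=2):
    """Accounts that take part in at least `min_events` injection events."""
    counts = Counter(account for accounts in events for account in accounts)
    return {account for account, n in counts.items() if n >= min_events}


if __name__ == "__main__":
    tweets = [
        ("a", 0, "Buy now http://x.co/1"), ("b", 5, "buy  now"), ("c", 10, "BUY NOW"),
        ("a", 200, "vote X"), ("b", 210, "Vote x"), ("c", 215, "vote x"),
        ("d", 300, "hello"),
    ]
    events = injection_events(tweets)
    print(len(events), sorted(repeat_offenders(events)))  # 2 ['a', 'b', 'c']
```

    Requiring repeated participation (`min_events`) mirrors the description's emphasis on accounts that *repeatedly* exhibit the pattern, which filters out one-off coincidences such as many users sharing the same headline.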

  11. Results for TwiBot-20 dataset—best 10 models.

    • plos.figshare.com
    xls
    Updated Dec 23, 2024
    Cite
    Agata Skorupka (2024). Results for TwiBot-20 dataset—best 10 models. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t004
    Available download formats: xls
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Agata Skorupka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Full results are available in S1 Appendix as Table 2d.

  12. Data from: TwiBot22: Towards Graph-Based Twitter Bot Detection

    • explore.openaire.eu
    Updated Aug 20, 2022
    Cite
    Feng Shangbin; Tan Zhaoxuan; Wan Herun; Wang Ningnan; Chen Zilong; Zhang Binchi (2022). TwiBot22: Towards Graph-Based Twitter Bot Detection [Dataset]. http://doi.org/10.5281/zenodo.7012904
    Dataset updated
    Aug 20, 2022
    Authors
    Feng Shangbin; Tan Zhaoxuan; Wan Herun; Wang Ningnan; Chen Zilong; Zhang Binchi
    Description

    Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of the online discourse. State-of-the-art bot detection methods generally leverage the graph structure of the Twitter network, and they exhibit promising performance when confronting novel Twitter bots that traditional methods fail to detect. However, very few of the existing Twitter bot detection datasets are graph-based, and even these few graph-based datasets suffer from limited dataset scale, incomplete graph structure, and low annotation quality. In fact, the lack of a large-scale graph-based Twitter bot detection benchmark that addresses these issues has seriously hindered the development and evaluation of novel graph-based bot detection approaches. In this paper, we propose TwiBot-22, a comprehensive graph-based Twitter bot detection benchmark that presents the largest dataset to date, provides diversified entities and relations on the Twitter network, and has considerably better annotation quality than existing datasets. In addition, we re-implement 35 representative Twitter bot detection baselines and evaluate them on 9 datasets, including TwiBot-22, to promote a fair comparison of model performance and a holistic understanding of research progress. To facilitate further research, we consolidate all implemented code and datasets into the TwiBot-22 evaluation framework, where researchers can consistently evaluate new models and datasets. Due to limitations at Zenodo, the whole dataset and more information are available on the project website.

  13. A study on real graphs of fake news spreading on Twitter

    • data.niaid.nih.gov
    Updated Aug 20, 2021
    Cite
    Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
    Dataset updated
    Aug 20, 2021
    Dataset authored and provided by
    Amirhosein Bodaghi
    Description

    *** Fake News on Twitter ***

    These 5 datasets are the results of an empirical study on how newly emerging fake news spreads on Twitter. In particular, we focused on fake news stories that prompted the truth to spread against them simultaneously. The story of each fake news item is as follows:

    1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

    2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

    3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

    4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

    5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

    Data collection was done in two stages, each producing a dataset: 1- collecting the Dataset of Diffusion (DD), which contains the fake news/truth tweets and retweets; 2- querying the neighbors of the tweet spreaders, which produced the Dataset of Graph (DG).

    DD

    DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story.

    Each row corresponds to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about the tweet/retweet. From left to right, the columns present the following information:

    User ID (user who has posted the current tweet/retweet)

    The description sentence in the profile of the user who has published the tweet/retweet

    The number of tweets/retweets published by the user at the time of posting the current tweet/retweet

    Date and time of creation of the account by which the current tweet/retweet has been posted

    Language of the tweet/retweet

    Number of followers

    Number of followings (friends)

    Date and time of posting the current tweet/retweet

    Number of likes (favorites) the current tweet had acquired before it was crawled

    Number of times the current tweet had been retweeted before it was crawled

    Whether the current tweet/retweet contains another tweet (for example, when the current tweet is a quote, reply or retweet)

    The source (OS) of the device from which the current tweet/retweet was posted

    Tweet/Retweet ID

    Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

    Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

    Reply ID (if the post is a reply, this feature gives the ID of the tweet that the current post replies to)

    Frequency of tweet occurrences, i.e. the number of times the current tweet is repeated in the dataset (for example, the number of times it appears as retweets posted by others)

    State of the tweet, which takes one of the following values (determined by agreement between the annotators):

    r : The tweet/retweet is a fake news post

    a : The tweet/retweet is a truth post

    q : The tweet/retweet is a question about the fake news that neither confirms nor denies it

    n : The tweet/retweet is not related to the fake news (it matches the queries related to the rumor but does not refer to the given fake news)
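    Since the study focuses on fake news with the truth spreading against it at the same time, a natural first analysis of a DD file is the cumulative number of fake (r) and truth (a) posts over time. The sketch below assumes each row has already been read from the Excel file into a sequence of cell values (for example with openpyxl's `iter_rows(values_only=True)`) and that the columns appear exactly in the order listed above, so posting time is the 8th column and the state the 18th; these positions are assumptions.

```python
from collections import Counter

# Assumed 0-based positions of the columns listed above:
# posting date/time is the 8th column, the annotated state the 18th.
TIME_COL, STATE_COL = 7, 17


def cumulative_by_state(rows, states=("r", "a")):
    """Running counts of fake (r) and truth (a) posts, ordered by posting time.

    rows: sequences of cell values, one per tweet/retweet; timestamps only
    need to be mutually comparable (e.g. datetime objects).
    """
    running = Counter()
    series = []
    for row in sorted(rows, key=lambda r: r[TIME_COL]):
        state = row[STATE_COL]
        if state in states:  # q (question) and n (unrelated) rows are skipped
            running[state] += 1
            series.append((row[TIME_COL], {s: running[s] for s in states}))
    return series


if __name__ == "__main__":
    def make_row(time, state):
        row = [None] * 18
        row[TIME_COL], row[STATE_COL] = time, state
        return row

    rows = [make_row(3, "r"), make_row(1, "a"), make_row(2, "n"), make_row(4, "r")]
    for time, counts in cumulative_by_state(rows):
        print(time, counts)
```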

    DG

    DG for each fake news story contains two files:

    A file in graph format (.graph) containing the graph structure, i.e. who is linked to whom (named FNx_DG.graph, where x is the number of the fake news story).

    A file in JSONL format (.jsonl) containing the real user IDs of the nodes in the graph file (named FNx_Labels.jsonl, where x is the number of the fake news story).

    The second file is needed because, in the graph file, each node is labeled by the order in which it was entered into the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow on the next rows, one user ID per row. Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file (line 202 if lines are counted from 1).
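    The lookup from a graph node label to a real user ID can be sketched as follows. It assumes each line of FNx_Labels.jsonl holds one JSON value (a bare ID, or a one-element list or object wrapping it), with the column labels on the first line; the exact line layout is not documented here, and the function name is hypothetical.

```python
import json


def user_id_for_node(jsonl_lines, node_label):
    """Return the real user ID of graph node `node_label`.

    Row 0 of the labels file holds the column labels, so node k is on row k + 1.
    """
    value = json.loads(jsonl_lines[node_label + 1])
    # Assumption: a data row is a bare ID or a one-element list/object wrapping it.
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    if isinstance(value, dict) and len(value) == 1:
        return next(iter(value.values()))
    return value


if __name__ == "__main__":
    lines = ['["user_id"]', "[12345637]", '{"id": 42}']
    print(user_id_for_node(lines, 0), user_id_for_node(lines, 1))  # 12345637 42
```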

    The user IDs of spreaders in DG (those who have a post in DD) also appear in DD, which provides extra information about them and their tweets/retweets. The other user IDs in DG belong to the neighbors of these spreaders and might not appear in DD.

  14. How Popular Is Twitter In The World?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.

  15. Data from: OKG: A Knowledge Graph for Fine-grained Understanding of Social...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jun 9, 2024
    Cite
    Inès Blin; Lise Stork; Laura Spillner; Carlo Romano Marcello Alessandro Santagiustina (2024). OKG: A Knowledge Graph for Fine-grained Understanding of Social Media Discourse on Inequality [Dataset]. http://doi.org/10.5281/zenodo.10034210
    Dataset updated
    Jun 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inès Blin; Lise Stork; Laura Spillner; Carlo Romano Marcello Alessandro Santagiustina
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Oct 24, 2023
    Description

    The Observatory Knowledge Graph (OKG) is a knowledge graph of tweets on inequality, expressed in terms of the OBIO ontology (https://w3id.org/okg/obio-ontology/), which integrates social media metadata with various types of linguistic knowledge. The OKG can serve as the backbone of a social media observatory, facilitating a deeper understanding of social media discourse on inequality.

    We retrieved tweets and retweets published from the end (30th) of May 2020 to the beginning (1st) of May 2023.

    In this version of the OKG, we use a sample of 85,247 tweets published from May 30th to August 27th, 2020. To comply with Twitter's policies, we remove usernames and IDs, as well as the tweet texts and sentences. We also replace user IRIs with skolem IRIs through skolemization.
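    Skolemization here means replacing each user IRI with a fresh, opaque IRI, so that triples about the same user stay linked without revealing who the user is. The sketch below is an assumption about how this could be done, not the OKG's actual procedure: the base IRI and the example user IRIs and predicates are placeholders, and it uses the /.well-known/genid/ path that RDF 1.1 recommends for skolem IRIs, with random UUIDs as identifiers. The in-memory mapping would have to be discarded rather than published.

```python
import uuid

# Placeholder base IRI; RDF 1.1 recommends the /.well-known/genid/ path for skolem IRIs.
GENID_BASE = "https://example.org/.well-known/genid/"


class Skolemizer:
    """Map each original IRI to one opaque skolem IRI, consistently within a run."""

    def __init__(self, base=GENID_BASE):
        self.base = base
        self._minted = {}

    def __call__(self, iri):
        if iri not in self._minted:
            # A random UUID carries no information about the original IRI.
            self._minted[iri] = self.base + uuid.uuid4().hex
        return self._minted[iri]


def skolemize_triples(triples, is_user_iri, skolemize):
    """Replace user IRIs in any position of (s, p, o) triples."""
    return [tuple(skolemize(t) if is_user_iri(t) else t for t in triple)
            for triple in triples]


if __name__ == "__main__":
    skolemize = Skolemizer()
    triples = [
        ("https://twitter.com/user/1", "ex:posted", "ex:tweet1"),
        ("ex:tweet2", "ex:repliesTo", "https://twitter.com/user/1"),
    ]
    out = skolemize_triples(triples, lambda t: t.startswith("https://twitter.com/user/"), skolemize)
    print(out[0][0] == out[1][2])  # True: the same user gets the same skolem IRI
```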

    Access to the OKG and its SPARQL endpoint can be requested by emailing the contact person (l.stork@uva.nl) with the following information:

    1. A description of the use case
    2. Affiliation of the researchers involved
    3. How their work is in line with Twitter's policies: https://developer.twitter.com/en/developer-terms/policy#4-d
  16. Why Do People Use Twitter?

    • searchlogistics.com
    Updated Apr 1, 2025
    (2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    One of Twitter's biggest advantages is the speed at which information spreads. People use Twitter primarily for news and entertainment. This dataset breaks down why people use Twitter today.

  17. Twitter Statistics

    • searchlogistics.com
    Updated Apr 1, 2025
    Search Logistics (2025). Twitter Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    Dataset authored and provided by
    Search Logistics
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These Twitter user statistics describe where Twitter stands today and what the future may hold for the company.

  18. Data from: What Tweets and YouTube comments have in common? Sentiment and...

    • zenodo.org
    bin, csv, doc
    Updated Apr 15, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shevtsov Alexander, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, Sotiris Ioannidis (2021). What Tweets and YouTube comments have in common? Sentiment and Graph analysis on data related to US Elections 2020. [Dataset]. http://doi.org/10.5281/zenodo.4618233
    Available download formats: bin, doc, csv
    Dataset updated
    Apr 15, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shevtsov Alexander, Maria Oikonomidou, Despoina Antonakaki, Polyvios Pratikakis, Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, YouTube
    Description

    The US presidential election of November 3, 2020 prompted extensive discussion on social media. Part of the election-related content is organic, coming from users discussing their opinions on the candidates, political positions, or relevant content presented on television. Another significant part originates from organized campaigns, both official (including communication and dissemination campaigns) and unofficial (including astroturfing and targeted manipulation of the electorate).

    In this study, we obtain approximately 19.8M tweets from 4.5M users, based on prevalent hashtags related to the 2020 US election. From these, we extract 28,343 tweeted YouTube links and obtain the likes, dislikes, and comments of the linked videos. In this paper, we study the connection between the two social networks. We employ an array of techniques, including volume analysis, retweet-graph exploration, and sentiment and graph analysis of the communities formed on YouTube and Twitter. Furthermore, we propose a method to combine the results of community detection on the two social networks and measure the differences between them.

    In particular, we study the daily traffic per prevalent hashtag, plot the retweet graph from July to November 2020, highlight the two main entities (‘Biden’ and ‘Trump’), and show how the discussion around those entities grows as the election approaches. Additionally, we perform sentiment analysis on both the Twitter corpus and the YouTube comments on tweeted videos. We found that 35.2% of the users in our Twitter dataset express positive sentiment towards Trump and 28% towards Biden, while 18% of the users in our YouTube dataset express positive sentiment towards Trump and 12% towards Biden. Finally, we link the Twitter retweet graph with the YouTube comment graph using tweeted video links. We measure their similarities and differences and show the interactions and correlations between the largest communities on YouTube and Twitter.
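    The description does not spell out how communities on the two platforms are compared. One simple way to link them, sketched here as an assumption rather than the authors' method, is to represent each community by the set of YouTube video IDs it touches (tweeted on the Twitter side, commented on the YouTube side) and score every Twitter/YouTube community pair by Jaccard similarity. The function names and input shapes are hypothetical.

```python
from itertools import product


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def community_overlap(twitter_videos: dict, youtube_videos: dict) -> dict:
    """Pairwise Jaccard similarity between Twitter and YouTube communities.

    twitter_videos: Twitter community id -> set of video ids its members tweeted
    youtube_videos: YouTube community id -> set of video ids its members commented on
    """
    return {
        (t, y): jaccard(tv, yv)
        for (t, tv), (y, yv) in product(twitter_videos.items(), youtube_videos.items())
    }


def best_matches(overlap: dict) -> dict:
    """For each Twitter community, the YouTube community with the highest overlap."""
    best = {}
    for (t, y), score in overlap.items():
        if t not in best or score > best[t][1]:
            best[t] = (y, score)
    return best
```

    For example, a Twitter community that tweeted videos v1, v2, v3 and a YouTube community that commented on v1 and v2 share two of three distinct videos, giving a score of 2/3. Jaccard ignores how often each video appears; a weighted measure would be needed if link volume matters.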

  19. The 10 most popular hashtags in our dataset.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The 10 most popular hashtags in our dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t001
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 10 most popular hashtags in our dataset.
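    The table itself is only available as an .xls file. As a rough sketch of how such a ranking can be produced from raw tweet texts (not necessarily how the authors counted), the following counts case-folded hashtags once per tweet and keeps the most frequent ones. The regular expression is a simplifying assumption; Twitter's own hashtag rules are more involved, so counts may differ slightly.

```python
import re
from collections import Counter

# Simplifying assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, k=10):
    """Return the k most frequent hashtags (case-folded) with their counts."""
    counts = Counter()
    for text in tweets:
        # Use a set so a tag repeated within one tweet is counted only once.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts.most_common(k)
```

    Counting raw occurrences instead (dropping the set) would let a few spammy tweets inflate a tag; counting once per tweet measures how many tweets use it.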

  20. Results for Bitcoin OTC dataset—best 10 models.

    • plos.figshare.com
    xls
    Updated Dec 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agata Skorupka (2024). Results for Bitcoin OTC dataset—best 10 models. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t005
    Available download formats: xls
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Agata Skorupka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Full results are available in the S1 Appendix as Table 2e.

Haoyu Lu (2023). Twitter follower-followee graph, labeled with benign/Sybil [Dataset]. http://doi.org/10.6084/m9.figshare.20057300.v1

Twitter follower-followee graph, labeled with benign/Sybil

2 scholarly articles cite this dataset
Available download formats: txt
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Haoyu Lu
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges from [Kwak], with ground-truth labels obtained from [SybilSCAR]. Of these nodes, 178,377 are benign and 91,263 are Sybil. We select 9,000 Sybil and 17,000 benign users (about 10% of all nodes) as the training set and test on the entire social graph.

[Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in WWW, 2010.
[SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, “SybilSCAR: Sybil detection in online social networks via local rule based propagation,” in IEEE INFOCOM, 2017.
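The description gives the split sizes but not the sampling procedure. A minimal sketch, assuming the labels have been loaded into a dict mapping node ID to 1 (Sybil) or 0 (benign) and that training nodes are drawn uniformly at random within each class; `sample_training_set` is a hypothetical helper and the label encoding is an assumption about the txt files.

```python
import random

# Class sizes as stated in the dataset description.
N_BENIGN, N_SYBIL = 178_377, 91_263
N_NODES = 269_640
assert N_BENIGN + N_SYBIL == N_NODES  # every node carries a label


def sample_training_set(labels, n_sybil=9_000, n_benign=17_000, seed=0):
    """Draw n_sybil Sybil and n_benign benign nodes uniformly within each class.

    labels: dict mapping node id -> 1 (Sybil) or 0 (benign); this encoding is
    an assumption, so check the actual label file before relying on it.
    Returns the training node ids as a set; all nodes stay in the test graph.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    # Sort so the draw does not depend on dict insertion order.
    sybil = sorted(n for n, y in labels.items() if y == 1)
    benign = sorted(n for n, y in labels.items() if y == 0)
    return set(rng.sample(sybil, n_sybil)) | set(rng.sample(benign, n_benign))


# Training fraction implied by the description: 26,000 / 269,640.
TRAIN_FRACTION = (9_000 + 17_000) / N_NODES
```

With these class sizes the training set is 26,000 / 269,640, roughly 9.6% of the nodes, which matches the "about 10%" in the description. Sampling per class keeps the Sybil share of the training set fixed at 9,000 / 26,000 rather than leaving it to chance.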
