100+ datasets found
  1. A study on real graphs of fake news spreading on Twitter

    • zenodo.org
    bin
    Updated Aug 20, 2021
    + more versions
    Cite
    Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. http://doi.org/10.5281/zenodo.5225338
    Available download formats: bin
    Dataset updated
    Aug 20, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amirhosein Bodaghi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *** Fake News on Twitter ***

    These 5 datasets are the results of an empirical study on how newly emerging fake news spreads on Twitter. In particular, we focused on fake news stories that triggered a simultaneous spread of truth posts against them. The story behind each fake news item is as follows:

    1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

    2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

    3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

    4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

    5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

    Data collection was carried out in two stages, each producing a new dataset: (1) the Dataset of Diffusion (DD), which contains information on the fake news and truth tweets and retweets, and (2) the Dataset of Graph (DG), obtained by querying the neighbors of the users who spread those tweets.

    DD

    The DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story. Each file has the following structure:

    Each row corresponds to one captured tweet/retweet related to the rumor, and each column holds one attribute of that tweet/retweet. From left to right, the columns are:

    • User ID (the user who posted the current tweet/retweet)
    • Number of tweets/retweets the user had published at the time of posting the current tweet/retweet
    • Language of the tweet/retweet
    • Number of followers
    • Number of followings (friends)
    • Date and time the current tweet/retweet was posted
    • Number of likes (favorites) the current tweet had received before it was crawled
    • Number of times the current tweet had been retweeted before it was crawled
    • Whether the current tweet/retweet contains another tweet (for example, when it is a quote, reply, or retweet)
    • Source (OS) of the device from which the current tweet/retweet was posted
    • Tweet/Retweet ID
    • Retweet ID (if the post is a retweet, the ID of the tweet it retweets)
    • Quote ID (if the post is a quote, the ID of the tweet it quotes)
    • Reply ID (if the post is a reply, the ID of the tweet it replies to)
    • Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times it appears as a retweet posted by others)
    • State of the tweet, assigned by agreement between the annotators, which takes one of the following values:
        • r : the tweet/retweet is a fake news post
        • a : the tweet/retweet is a truth post
        • q : the tweet/retweet is a question about the fake news that neither confirms nor denies it
        • n : the tweet/retweet is not related to the fake news (it matches the queries used for the rumor but does not refer to the given fake news story)
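
    As a minimal sketch of working with an FNx_DD file, the Python below attaches names to the 16 documented columns and tallies posts by annotation state. The column names in `DD_COLUMNS`, the helper functions, and the assumption that rows have already been read from the Excel file are illustrative only; the files may use different header text, or none.

```python
from collections import Counter

# Hypothetical names for the 16 documented columns, left to right.
DD_COLUMNS = [
    "user_id", "user_post_count", "language", "followers", "followings",
    "posted_at", "likes", "retweet_count", "has_inner_tweet", "source",
    "tweet_id", "retweet_id", "quote_id", "reply_id", "frequency", "state",
]

# Annotation codes used in the "State of the tweet" column.
STATE_LABELS = {"r": "fake", "a": "truth", "q": "question", "n": "unrelated"}


def rows_to_records(rows):
    """Pair each raw row (16 cell values) with the column names."""
    records = []
    for row in rows:
        if len(row) != len(DD_COLUMNS):
            raise ValueError(f"expected {len(DD_COLUMNS)} columns, got {len(row)}")
        records.append(dict(zip(DD_COLUMNS, row)))
    return records


def count_states(records):
    """Count posts per annotation state; unrecognized codes become 'unknown'."""
    return Counter(
        STATE_LABELS.get(str(rec["state"]).strip(), "unknown") for rec in records
    )
```

    With real data, the rows could come from openpyxl's `worksheet.iter_rows(values_only=True)`, skipping the first row if the sheet has a header.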

    DG

    The DG for each fake news story consists of two files:

    • A graph file (.graph) describing the graph structure, i.e., who is linked to whom. This file is named FNx_DG.graph, where x is the number of the fake news story.
    • A JSONL file (.jsonl) containing the real user IDs of the nodes in the graph file. This file is named FNx_Labels.jsonl, where x is the number of the fake news story.

    In the graph file, each node is labeled by its order of entry into the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row 1 of the JSONL file (row 0 holds the column labels); the IDs of subsequent nodes follow on the next rows, one user ID per row. In general, node k is at row k + 1, so the user ID of node 200 (labeled 200 in the graph) is at row 201 when rows are counted from 0, which is line 202 when lines are counted from 1.
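
    The row arithmetic above can be expressed as a small lookup. This is a sketch under assumptions: the function names are hypothetical, blank lines are skipped, and because the JSON shape of each row is not documented, the parsed value is returned as-is (it may be a bare number, a list, or an object).

```python
import json


def load_label_rows(path):
    """Read an FNx_Labels.jsonl file into a list of lines (blank lines skipped).

    Row 0 is expected to hold the column labels.
    """
    with open(path, encoding="utf-8") as f:
        return [line for line in f.read().splitlines() if line.strip()]


def node_to_user_id(label_rows, node):
    """Return the parsed JSONL entry for graph node `node`.

    The header occupies row 0, so node k is stored at row k + 1.
    """
    row = node + 1
    if node < 0 or row >= len(label_rows):
        raise IndexError(f"node {node} has no entry in the labels file")
    return json.loads(label_rows[row])


if __name__ == "__main__":
    rows = ['["user_id"]', "[12345637]", "[42]"]
    print(node_to_user_id(rows, 0))  # [12345637]
```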

    The user IDs of spreaders in DG (users who posted in DD) also appear in DD, which provides extra information about them and their tweets/retweets. The other user IDs in DG belong to the neighbors of these spreaders and might not appear in DD.
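
    Separating spreaders from neighbors is a set operation on the two ID collections. A minimal sketch, assuming the DG user IDs (from the labels file) and the DD user IDs (from the User ID column) have already been loaded into Python iterables; the function name is hypothetical.

```python
def split_spreaders(dg_user_ids, dd_user_ids):
    """Split DG user IDs into spreaders (also in DD) and neighbors (DG only).

    IDs are compared as strings, since the same ID may be stored as a number
    in one file and as text in another.
    """
    dd = {str(uid) for uid in dd_user_ids}
    dg = {str(uid) for uid in dg_user_ids}
    return dg & dd, dg - dd


if __name__ == "__main__":
    spreaders, neighbors = split_spreaders([1, 2, "3"], ["1", 3])
    print(sorted(spreaders), sorted(neighbors))  # ['1', '3'] ['2']
```

    Twitter IDs can exceed the integer precision of a 64-bit float, so it is safer to read ID columns as text than as numbers.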

  2. TwitterFollowGraph

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Twitter (2023). TwitterFollowGraph [Dataset]. https://huggingface.co/datasets/Twitter/TwitterFollowGraph
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    X (http://x.com/)
    Authors
    Twitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval

    This repo contains the TwitterFollowGraph dataset from our paper kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval. This work is licensed under a Creative Commons Attribution 4.0 International License.

      TwitterFollowGraph
    

    TwitterFollowGraph is a bipartite directed graph of users (consumer) nodes to author (producer) nodes… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFollowGraph.

  3. X/Twitter: number of worldwide users 2019-2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide
    Description

    As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.

  4. COVID-19 Tweets : A dataset containing more than 600k tweets on the novel...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 23, 2021
    Cite
    Habiba Drias (2021). COVID-19 Tweets : A dataset containing more than 600k tweets on the novel CoronaVirus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4024176
    Dataset updated
    Jan 23, 2021
    Dataset provided by
    Yassine Drias
    Habiba Drias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 653,996 tweets related to the Coronavirus topic, marked by hashtags such as #COVID-19, #COVID19, #COVID, #Coronavirus, #NCoV and #Corona. The tweets were crawled from February 27 to March 25, 2020, a period of about four weeks.

    The tweets were generated by 390,458 users from 133 different countries and written in 61 languages. English is the most used language, with almost 400k tweets, followed by Spanish with around 80k tweets.

    The data is stored as a CSV file, where each line represents a tweet. The CSV file provides the following fields:

    Author: the user who posted the tweet

    Recipient: the name of the user being replied to in case of a reply; otherwise, the same value as the Author field

    Tweet: the full content of the tweet

    Hashtags: the list of hashtags present in the tweet

    Language: the language of the tweet

    Relationship: gives information on the type of the tweet, whether it is a retweet, a reply, a tweet with a mention, etc.

    Location: the country of the author of the tweet, which is unfortunately not always available

    Date: the publication date of the tweet

    Source: the device or platform used to send the tweet

    The dataset can also be used to construct a social graph, since it includes the relations "Replies to", "Retweet", "MentionsInRetweet" and "Mentions".
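
    One hedged way to derive such a graph uses Python's csv module. The sketch assumes the CSV header uses the field names listed above (Author, Recipient, Tweet, Relationship) and that Author and Recipient hold bare handles. It takes the Recipient column for replies and, because retweet and mention targets have no separate column, extracts @handles from the tweet text with a simple regular expression. The function name and the regex are illustrative assumptions.

```python
import csv
import io
import re

# Assumed handle pattern: Twitter handles are 1-15 word characters.
MENTION = re.compile(r"@(\w{1,15})")


def build_edges(csv_file):
    """Return (source, target, relationship) edges from the tweet CSV.

    Targets are the Recipient (when it differs from the Author, i.e. a reply)
    plus any @handles found in the tweet text (retweets and mentions).
    """
    edges = []
    for row in csv.DictReader(csv_file):
        author = row["Author"]
        targets = set(MENTION.findall(row.get("Tweet") or ""))
        recipient = row.get("Recipient") or ""
        if recipient and recipient != author:
            targets.add(recipient)
        for target in sorted(targets):
            if target != author:  # drop self-loops
                edges.append((author, target, row.get("Relationship") or ""))
    return edges


if __name__ == "__main__":
    sample = io.StringIO(
        "Author,Recipient,Tweet,Relationship\n"
        "alice,alice,RT @bob: hi,Retweet\n"
        "carol,dave,@dave thanks,Replies to\n"
    )
    for edge in build_edges(sample):
        print(edge)
```

    The resulting edge list can be loaded into any graph library; keeping the Relationship value on each edge preserves the distinction between replies, retweets, and mentions.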

  5. TwitterFaveGraph

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Twitter (2023). TwitterFaveGraph [Dataset]. https://huggingface.co/datasets/Twitter/TwitterFaveGraph
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    X (http://x.com/)
    Authors
    Twitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MiCRO: Multi-interest Candidate Retrieval Online

    This repo contains the TwitterFaveGraph dataset from our paper MiCRO: Multi-interest Candidate Retrieval Online. This work is licensed under a Creative Commons Attribution 4.0 International License.

      TwitterFaveGraph
    

    TwitterFaveGraph is a bipartite directed graph of user nodes to Tweet nodes where an edge represents a "fave" engagement. Each edge is binned into predetermined time chunks which… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/TwitterFaveGraph.

  6. Twitter 2010 data set

    • berd-platform.de
    csv, txt
    Updated Jul 31, 2025
    + more versions
    Cite
    Kristina Lerman; Rumi Ghosh; Tawan Surachawala (2025). Twitter 2010 data set [Dataset]. http://doi.org/10.82939/gyewh-v4e47
    Available download formats: csv, txt
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Scientific Reports
    Authors
    Kristina Lerman; Rumi Ghosh; Tawan Surachawala
    License

    https://www.isi.edu/~lerman/downloads/twitter/twitter2010.html

    Time period covered
    Oct 2010
    Description

    The Twitter_2010 data set contains tweets with URLs that were posted on Twitter during October 2010. In addition to the tweets, we also collected the followee links of tweeting users, which allows us to reconstruct the follower graph of active (tweeting) users. The dataset contains 66,059 URLs, 2,859,764 tweets, 736,930 users, and 36,743,448 links.

  7. Top retweeted users (highest indegree).

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Cite
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). Top retweeted users (highest indegree). [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t003
    Available download formats: xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Top retweeted users (highest indegree).

  8. SignedGraphs

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Twitter (2023). SignedGraphs [Dataset]. https://huggingface.co/datasets/Twitter/SignedGraphs
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    X (http://x.com/)
    Authors
    Twitter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Learning Stance Embeddings from Signed Social Graphs

    This repo contains the datasets from our paper Learning Stance Embeddings from Signed Social Graphs. This work is licensed under a Creative Commons Attribution 4.0 International License.

      Overview
    

    A key challenge in social network analysis is understanding the position, or stance, of people in the graph on a large set of topics. In such social graphs, modeling (dis)agreement patterns… See the full description on the dataset page: https://huggingface.co/datasets/Twitter/SignedGraphs.

  9. Why Do People Use Twitter?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.

  10. How Popular Is Twitter In The World?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.

  11. Twitter Key Statistics

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Twitter Key Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the key Twitter user statistics that you need to know.

  12. Twitter follower-followee graph, labeled with benign/Sybil

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Haoyu Lu (2023). Twitter follower-followee graph, labeled with benign/Sybil [Dataset]. http://doi.org/10.6084/m9.figshare.20057300.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Haoyu Lu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Twitter follower-followee graph with 269,640 nodes and 6,818,501 edges, taken from [Kwak], with ground-truth labels obtained from [SybilSCAR]. Of these nodes, 178,377 are benign and 91,263 are Sybil. We select 9,000 Sybil and 17,000 benign users (about 10% of all nodes) as the training set and test on the overall social graph.

    [Kwak] H. Kwak, C. Lee, H. Park, and S. Moon, "What is Twitter, a social network or a news media?" in WWW, 2010.
    [SybilSCAR] B. Wang, L. Zhang, and N. Z. Gong, "SybilSCAR: Sybil detection in online social networks via local rule based propagation," in IEEE INFOCOM, 2017.
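
    The training split described above (9,000 Sybil plus 17,000 benign users, i.e. 26,000 of the 269,640 nodes, roughly 9.6%) amounts to sampling a fixed number of nodes per class. A minimal sketch, assuming the labels have already been loaded into a dictionary from node ID to a hypothetical "sybil"/"benign" string; the layout of the .txt files is not described here, and the function name is illustrative.

```python
import random


def sample_training_set(labels, n_sybil, n_benign, seed=0):
    """Draw a training set with a fixed number of Sybil and benign nodes.

    `labels` maps node ID -> "sybil" or "benign" (a hypothetical encoding).
    Node lists are sorted first so that a given seed always yields the
    same sample.
    """
    rng = random.Random(seed)
    sybil = sorted(node for node, label in labels.items() if label == "sybil")
    benign = sorted(node for node, label in labels.items() if label == "benign")
    return {
        "sybil": rng.sample(sybil, n_sybil),  # raises ValueError if too few
        "benign": rng.sample(benign, n_benign),
    }


# Sanity check of the counts quoted in the description.
assert 178_377 + 91_263 == 269_640
print(f"{(9_000 + 17_000) / 269_640:.1%}")  # 9.6%
```

    On the full data this would be called as `sample_training_set(labels, 9000, 17000)`; evaluation then runs over the whole graph, as the description states.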

  13. Top retweeted users (highest out degree).

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Cite
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). Top retweeted users (highest out degree). [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t004
    Available download formats: xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Top retweeted users (highest out degree).

  14. X/Twitter: personal privacy actions H1 2024

    • statista.com
    Updated Oct 16, 2024
    Cite
    Statista Research Department (2024). X/Twitter: personal privacy actions H1 2024 [Dataset]. https://www.statista.com/topics/737/twitter/
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    During the first half of 2024 there were 34,497 pieces of content removed from X due to personal privacy violations, which include the publishing or sharing of other people's private information. These types of violations are also known as doxxing. Overall, 30,450 of these pieces of content were reported manually by users of the platform.

  15. The complete list of all entities with the corresponding keywords that were...

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Cite
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis (2023). The complete list of all entities with the corresponding keywords that were used for each one. [Dataset]. http://doi.org/10.1371/journal.pone.0270542.t002
    Available download formats: xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alexander Shevtsov; Maria Oikonomidou; Despoina Antonakaki; Polyvios Pratikakis; Sotiris Ioannidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete list of all entities with the corresponding keywords that were used for each one.

  16. How Popular Is Twitter In The US?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). How Popular Is Twitter In The US? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The US has the largest number of Twitter users, with over 100 million users. They account for about 16.7% of all Twitter users worldwide.

  17. Data from: OKG: A Knowledge Graph for Fine-grained Understanding of Social...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jun 9, 2024
    Cite
    Inès Blin; Lise Stork; Laura Spillner; Carlo Romano Marcello Alessandro Santagiustina (2024). OKG: A Knowledge Graph for Fine-grained Understanding of Social Media Discourse on Inequality [Dataset]. http://doi.org/10.5281/zenodo.10034210
    Dataset updated
    Jun 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inès Blin; Lise Stork; Laura Spillner; Carlo Romano Marcello Alessandro Santagiustina
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Oct 24, 2023
    Description

    The Observatory Knowledge Graph (OKG) is a knowledge graph with tweets on inequality in terms of the OBIO ontology (https://w3id.org/okg/obio-ontology/), which integrates social media metadata with various types of linguistic knowledge. The OKG can be used as the backbone of a social media observatory, to facilitate a deeper understanding of social media discourse on inequality.

    We retrieved tweets and retweets published from the end (30th) of May 2020 to the beginning (1st) of May 2023.

    In this version of the OKG, we use a sample of 85,247 tweets published from May 30 to August 27, 2020. To comply with Twitter's policies, we remove usernames and IDs, as well as the tweet texts and sentences. We also replace user IRIs with Skolem IRIs through skolemization.

    Access to the OKG as well as the SPARQL endpoint can be requested by sending a mail to the contact person (l.stork@uva.nl) with the following information:

    1. A description of the use case
    2. Affiliation of the researchers involved
    3. How their work is in line with Twitter's policies: https://developer.twitter.com/en/developer-terms/policy#4-d

  18. Graph-Based Social Media Data on Mental Health Topics

    • data.mendeley.com
    Updated Nov 4, 2024
    + more versions
    Cite
    Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
    Dataset updated
    Nov 4, 2024
    Authors
    Samuel Ady Sanjaya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

    The dataset consists of three files:

    1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X's data-sharing policies.
    2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.
    3. Twitter/X Content Data: Contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media.

    (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
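
    A common use of the Edges Data file is spotting influential users by counting incoming interactions (in-degree). The sketch below assumes the file is a CSV whose header contains a column named "UserID (Destination)", matching the field list above; the actual header text and file format may differ, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def top_by_in_degree(csv_file, k=5, dst_col="UserID (Destination)"):
    """Return the k users that receive the most interaction edges.

    `dst_col` is an assumed header name for the destination-user column.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter(row[dst_col] for row in reader if row.get(dst_col))
    return counts.most_common(k)


if __name__ == "__main__":
    sample = io.StringIO(
        "UserID (Source),UserID (Destination),Post/Tweet ID,Date of Relationship\n"
        "1,2,10,2024-01-01\n"
        "3,2,11,2024-01-02\n"
        "1,3,12,2024-01-03\n"
    )
    print(top_by_in_degree(sample, k=2))  # [('2', 2), ('3', 1)]
```

    Joining the returned user IDs against the Nodes Data file would then give the follower counts and verification status of the most-engaged users.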

  19. Node statistics for TwiBot-20, Bitcoin OTC and Bitcoin Alpha datasets.

    • plos.figshare.com
    xls
    Updated Dec 23, 2024
    Cite
    Agata Skorupka (2024). Node statistics for TwiBot-20, Bitcoin OTC and Bitcoin Alpha datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t002
    Available download formats: xls
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Agata Skorupka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Node statistics for TwiBot-20, Bitcoin OTC and Bitcoin Alpha datasets.

  20. X/Twitter: distribution of global audiences 2025, by age and gender

    • statista.com
    Updated Oct 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stacy Jo Dixon (2024). X/Twitter: distribution of global audiences 2025, by age and gender [Dataset]. https://www.statista.com/topics/737/twitter/
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of February 2025, 24.5 percent of X (formerly Twitter) users were men aged between 25 and 34 years, and almost 19 percent were men aged between 18 and 24 years. X has a high share of male users compared with other popular social media platforms.
