97 datasets found

Twitter users in the United States 2019-2028
statista.com
ai-chatbox.pro
Updated Jun 13, 2024
Cite
Statista Research Department (2024). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
Explore at:
Dataset updated
Jun 13, 2024
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
United States
Description
The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
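As a quick consistency check on the figures quoted in the description above, the short sketch below recomputes the percentage increase. It assumes (this is not stated in the source) that the 2024 baseline equals the 2028 forecast minus the total increase; the variable names are illustrative only.

```python
# Figures quoted in the description above (millions of users).
forecast_2028 = 85.08
increase_2024_to_2028 = 4.3

# Assumption: the 2024 baseline is the 2028 value minus the total increase.
baseline_2024 = forecast_2028 - increase_2024_to_2028
growth_pct = increase_2024_to_2028 / baseline_2024 * 100

print(f"Implied 2024 baseline: {baseline_2024:.2f} million")
print(f"Implied growth: {growth_pct:.2f} percent")
```

Under that assumption, the implied growth agrees with the quoted +5.32 percent after rounding.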
Open dataset of scholars on Twitter (X)
zenodo.org
csv
Updated Apr 2, 2024
Cite
Philippe Mongeon; Timothy Bowman; Rodrigo Costas; Wenceslao Arroyo Machado (2024). Open dataset of scholars on Twitter (X) [Dataset]. http://doi.org/10.5281/zenodo.10905839
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10905839
Dataset updated
Apr 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Philippe Mongeon; Timothy Bowman; Rodrigo Costas; Wenceslao Arroyo Machado
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This is version 2 of a dataset of paired OpenAlex author IDs (https://docs.openalex.org/about-the-data/author) and Twitter (now X) user IDs.

Major update in this version

Following the significant update to OpenAlex's author identification system, the scholars on Twitter dataset, which previously linked Twitter IDs to OpenAlex author IDs, immediately became outdated. This called for a new approach to re-establish these links, as the absence of new Twitter data made it impossible to replicate the original method of matching Twitter profiles with scholarly authors. To navigate this challenge, a bridge was constructed between the June 2022 snapshot of the OpenAlex database—used in the original matching process—and the most recent snapshot from February 2024. This bridge utilized OpenAlex works IDs and DOIs to match authors in both datasets by their shared publications and identical primary names. When a connection was established between two authors with the same name, the new OpenAlex author ID was assigned to the corresponding Twitter ID. When direct matches based on primary names were not found, an attempt was made to establish connections by matching the names from June 2022 with any corresponding alternative names found in the 2024 dataset. This method ensured continuity of identity through the system update, adapting the strategy to link profiles across the temporal divide created by the database's overhaul.

Our efficient method for re-establishing links between author IDs and Twitter profiles has been notably successful, managing to rematch 432,417 (88%) OpenAlex author IDs. This effort successfully restored connections for 388,968 unique Twitter users, which represents 92% of the original dataset. Of these, 375,316 were matched using their primary names, and 57,101 through alternative names. The simplicity and quick execution of this approach led to exceptionally favourable results, with a minimal loss of only 8% of the original Twitter-linked scholarly accounts.

The dataset includes 432,417 unique author_ids and 388,968 unique tweeter_ids forming 462,427 unique author-tweeter pairs.

File descriptions

authors_tweeters_2024_02.csv is the actual dataset of author IDs paired with tweeter IDs. The "alternative" column indicates if the match was made with the primary name (0) or an alternate name (1).

mapping_tweeters_2022_2024.csv contains the relationship made between the 2022 author IDs and the 2024 author IDs, including the names.
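A minimal sketch of how the pairs file described above could be summarized. Only the "alternative" column is named in the source; the column names `author_id` and `tweeter_id` are assumptions based on the identifiers mentioned in the description, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def summarize_pairs(csv_file):
    """Count unique authors, tweeters and pairs, plus rows per match type."""
    authors, tweeters, pairs = set(), set(), set()
    rows_by_match = Counter()
    for row in csv.DictReader(csv_file):
        # Column names author_id / tweeter_id are assumed, not confirmed.
        authors.add(row["author_id"])
        tweeters.add(row["tweeter_id"])
        pairs.add((row["author_id"], row["tweeter_id"]))
        # "alternative" is 0 for a primary-name match, 1 for an alternate name.
        rows_by_match[row["alternative"]] += 1
    return {
        "authors": len(authors),
        "tweeters": len(tweeters),
        "pairs": len(pairs),
        "primary_rows": rows_by_match["0"],
        "alternative_rows": rows_by_match["1"],
    }

# Hypothetical sample standing in for authors_tweeters_2024_02.csv.
SAMPLE = """author_id,tweeter_id,alternative
A1,T1,0
A2,T1,0
A3,T2,1
"""
summary = summarize_pairs(io.StringIO(SAMPLE))
print(summary)
```

For the real file, the same function could be called on `open("authors_tweeters_2024_02.csv", newline="")`, provided the header matches the assumed column names.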

How to cite

When using the dataset, please cite the following article providing details about the matching process:

Mongeon, P., Bowman, T. D., & Costas, R. (2023). An open data set of scholars on Twitter. Quantitative Science Studies, 1–11.
https://doi.org/10.1162/qss_a_00250
Population of X/Twitter users and web domains embedded in a multidimensional...
data.sciencespo.fr
tsv
Updated Mar 14, 2025
Cite
Antoine Vendeville; Jimena Royo-Letelier; Duncan Cassells; Jean-Philippe Cointet; Maxime Crépel; Tim Faverjon; Théophile Lenoir; Béatrice Mazoyer; Benjamin Ooghe-Tabanou; Armin Pournaki; Hiroki Yamashita; Pedro Ramaciotti (2025). Population of X/Twitter users and web domains embedded in a multidimensional political opinion space [Dataset]. http://doi.org/10.21410/7E4/QPECFF
Explore at:
Available download formats: tsv (100846), tsv (106000433), tsv (177962), tsv (32523281), tsv (146217)
Unique identifier
https://doi.org/10.21410/7E4/QPECFF
Dataset updated
Mar 14, 2025
Dataset provided by
data.sciencespo
Authors
Antoine Vendeville; Jimena Royo-Letelier; Duncan Cassells; Jean-Philippe Cointet; Maxime Crépel; Tim Faverjon; Théophile Lenoir; Béatrice Mazoyer; Benjamin Ooghe-Tabanou; Armin Pournaki; Hiroki Yamashita; Pedro Ramaciotti
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Many studies of political phenomena on social media require operationalizing the political stance of the users and content involved. Relevant examples include the study of segregation and polarization online, the study of political diversity in content diets in social media, or AI explainability. While many research designs rely on operationalizations best suited to the US setting, few address more general designs in which users and content may take stances on multiple ideology and issue dimensions, going beyond traditional Liberal-Conservative or Left-Right scales. To advance the study of more general online ecosystems, we present a dataset of a population of X/Twitter users in the French political Twittersphere and of web domains, both embedded in a political space spanned by dimensions measuring attitudes towards immigration, the EU, liberal values, elites and institutions, nationalism and the environment. We provide several benchmarks validating the positions of these entities (based on both LLM and human annotations), and discuss several applications for this dataset.
Twitter Dataset
brightdata.com
.json, .csv, .xlsx
+ more versions
Cite
Bright Data (2025). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
Russian Foreign Ministry Twitter Accounts Dataset
zenodo.org
data.niaid.nih.gov
bin
Updated Sep 29, 2024
Cite
Benjamin Shultz; E Rosalie Li (2024). Russian Foreign Ministry Twitter Accounts Dataset [Dataset]. http://doi.org/10.5281/zenodo.11489527
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.11489527
Dataset updated
Sep 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Shultz; E Rosalie Li
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jun 5, 2024
Area covered
Russia
Description
This publication introduces a novel dataset of 403 diplomatic X/Twitter accounts belonging to the Russian government (primarily the Russian Foreign Ministry) and accompanying metadata. These accounts have become a known vector in the spread of false and misleading information around the Russian invasion of Ukraine. However, given new restrictions on the accessibility of the X/Twitter API and the visibility of users' following lists, the vast majority of these accounts are no longer easily discoverable by researchers. The primary aim behind the publication of this dataset is to provide a comprehensive resource for further analysis of this disinformation vector.
Homophobia Detection Dataset (Twitter/X) Dataset
paperswithcode.com
Updated May 16, 2024
+ more versions
Cite
(2024). Homophobia Detection Dataset (Twitter/X) Dataset [Dataset]. https://paperswithcode.com/dataset/homophobia-detection-dataset-twitter-x
Explore at:
Dataset updated
May 16, 2024
Description
Dataset Description

Paper: TBC

Point of Contact: Josh McGiff (Josh.McGiff@ul.ie)

Dataset Summary

This dataset was developed to address the significant gap in online hate speech detection, particularly focusing on homophobia, which is often neglected in sentiment analysis research. It comprises tweets scraped from X (formerly Twitter), which have been labeled for the presence of homophobic content by volunteers from diverse backgrounds. This dataset is the largest open-source labelled English dataset for homophobia detection known to the authors and aims to enhance online safety and inclusivity.

Supported Tasks

Task: Homophobic hate speech detection.

Languages

English.

Dataset Structure

Data Fields:

tweet_text: The text content of the tweet.

label: Binary label indicating the presence of homophobic content (0 = no homophobic content, 1 = homophobic content).

language: The language of the tweet, as tagged by X/Twitter.

Dataset Creation

Curation Rationale: The dataset was curated to enhance the detection and classification of homophobic content on social media platforms, particularly focusing on the gap where homophobia is underrepresented in current research.

Source Data: Data was scraped from X (formerly Twitter) focusing on terms and accounts associated with the LGBTQIA+ community.

Annotation Process: Annotations were made by three volunteers from different sexualities and gender identities using a majority vote for label assignment. Annotations were conducted in Microsoft Excel over several days.

Personal and Sensitive Information: Usernames and other personal identifiers have been anonymized or removed. URLs have also been removed. The dataset contains sensitive content related to homophobia.
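The majority-vote labelling mentioned in the annotation process can be sketched as follows. This is only an illustration of the general technique, not the authors' actual procedure (they report working in Microsoft Excel); the function name and the sample votes are hypothetical.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most annotators (binary labels, odd vote count)."""
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of votes is needed to avoid ties")
    label, _count = Counter(votes).most_common(1)[0]
    return label

# Hypothetical votes from three annotators (0 = no homophobic content, 1 = homophobic content).
print(majority_label([1, 0, 1]))
print(majority_label([0, 0, 1]))
```

With three annotators and a binary label, a tie is impossible, which is why the sketch rejects an even number of votes.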

Considerations for Using the Data

Social Impact: The dataset is intended for research purposes to combat online hate speech and improve inclusivity and safety on digital platforms.

Ethical Considerations: Given the sensitive nature of hate speech, researchers should consider the impact of their work on marginalised communities and ensure that their use of the dataset aims to reduce harm and promote inclusivity.

Legal and Privacy Concerns: Researchers should comply with legal standards and ethical guidelines regarding hate speech and data privacy.

Additional Information

License: CC-BY-4.0

Citation: TBC

Acknowledgements

This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Artificial Intelligence under Grant No. 18/CRT/6223.
x_fake_profile_detection
huggingface.co
Updated May 2, 2025
Cite
Weronika Dracewicz (2025). x_fake_profile_detection [Dataset]. https://huggingface.co/datasets/drveronika/x_fake_profile_detection
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 2, 2025
Authors
Weronika Dracewicz
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset: Detecting Fake Accounts on Social Media Portals—The X Portal Case Study

This dataset was created as part of the study focused on detecting fake accounts on the X Portal (formerly known as Twitter). The primary aim of the study was to classify social media accounts using image data and machine learning techniques, offering a novel approach to identifying fake accounts. The dataset includes generated accounts, which were used to train and test a Convolutional Neural Network… See the full description on the dataset page: https://huggingface.co/datasets/drveronika/x_fake_profile_detection.
Elon Musk Daily Tweets Dataset
opendatabay.com
Updated Jul 6, 2025
Cite
Datasimple (2025). Elon Musk Daily Tweets Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/3d5a7757-1cfd-423d-b3a9-b2a8449d337c
Explore at:
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Social Media and Networking
Description
This dataset contains a collection of Elon Musk's tweets, updated and recorded automatically on a daily basis. The data collection began on 2 September 2021, subject to limitations of the Twitter API. It focuses exclusively on original content tweets, meaning it excludes replies to other tweets, providing a clear view of direct communications. This dataset offers insights into Elon Musk's public discourse, statements, and activity on the social media platform.

Columns

Id: This column represents the unique identifier for each tweet.

Date: This column indicates the date on which the tweet was created.

Text: This column contains the actual content or body of the tweet.

Distribution

The dataset is typically provided in a CSV file format. It comprises approximately 2072 unique tweet records. Data collection commenced on 2 September 2021 and extends up to 8 June 2023, with daily updates. The number of rows or records will increase as the dataset is continually updated.

Usage

This dataset is ideal for various applications, including:

* Social media analysis: Understanding trends and patterns in high-profile individual communications.

* Natural Language Processing (NLP): Developing and testing models for sentiment analysis, topic modelling, and text classification based on real-world social media text.

* News and media research: Tracking public statements and their impact.

* Research into public figures: Analysing communication strategies and thematic content.

* AI and LLM training: Providing text data for large language model development and fine-tuning.

Coverage

The dataset's coverage spans from 2 September 2021 to 8 June 2023, with ongoing daily updates. Geographically, the data is considered global, reflecting the nature of online social media platforms. There are no specific demographic notes beyond the fact that the data pertains solely to tweets from Elon Musk and includes only original content, not replies.

License

CC0

Who Can Use It

This dataset is suitable for a wide range of users, including:

* Data scientists and analysts: For research, trend analysis, and predictive modelling.

* Academics and students: For linguistic studies, social science research, and educational projects.

* AI and Machine Learning developers: For training and validating models related to text analysis and language understanding.

* Journalists and media professionals: For fact-checking, background research, and narrative development.

* Market researchers: For understanding public sentiment and perception surrounding influential figures.

Dataset Name Suggestions

Elon Musk Daily Tweets

The Elon Musk Tweet Archive

Elon Musk's Twitter Chronicle

Musk Tweet Stream

X Insights: Elon Musk

Attributes

Original Data Source: Elon Musk Tweets (Updated Daily Automatically)
twitter-dataset-tesla
huggingface.co
Updated Jul 11, 2022
Cite
fastai X Hugging Face Group 2022 (2022). twitter-dataset-tesla [Dataset]. https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 11, 2022
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
fastai X Hugging Face Group 2022
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Twitter Dataset: Tesla

Dataset Summary

This dataset contains all the Tweets regarding #Tesla or #tesla up to 12/07/2022 (dd-mm-yyyy). It can be used for sentiment analysis research, for other NLP tasks, or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in the Tweets, and other important features.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.
Graph-Based Social Media Data on Mental Health Topics
data.mendeley.com
Updated Nov 4, 2024
+ more versions
Cite
Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
Explore at:
Unique identifier
https://doi.org/10.17632/z45txpdp7f.2
Dataset updated
Nov 4, 2024
Authors
Samuel Ady Sanjaya
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

The dataset consists of three files:

1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X's data-sharing policies.

2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.

3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media. (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
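As a rough illustration of the network analysis the edges file is meant to support, the sketch below counts outgoing and incoming interactions per user. It assumes the file is a CSV whose header uses the field names listed above verbatim; the actual file format and header spelling are not confirmed by the description, and the sample rows are invented.

```python
import csv
import io
from collections import Counter

# Header names taken from the field list above; exact spelling in the file is assumed.
SRC, DST = "UserID (Source)", "UserID (Destination)"

def degree_counts(edges_file):
    """Return (out_degree, in_degree) counters keyed by user ID."""
    out_degree, in_degree = Counter(), Counter()
    for row in csv.DictReader(edges_file):
        out_degree[row[SRC]] += 1
        in_degree[row[DST]] += 1
    return out_degree, in_degree

# Hypothetical sample edges (user IDs, post IDs and dates are made up).
EDGES = """UserID (Source),UserID (Destination),Post/Tweet ID,Date of Relationship
u1,u2,p1,2024-01-01
u1,u3,p2,2024-01-02
u3,u2,p3,2024-01-03
"""
out_degree, in_degree = degree_counts(io.StringIO(EDGES))
print(out_degree.most_common(1))  # most active source account
print(in_degree.most_common(1))   # most frequently targeted account
```

Users with a high in-degree are candidates for the "influential users" the nodes file is said to help identify; joining on UserID would bring in follower counts and verification status.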
Twitter users worldwide 2019-2028
statista.com
ai-chatbox.pro
Updated Dec 10, 2024
Cite
Statista Research Department (2024). Twitter users worldwide 2019-2028 [Dataset]. https://www.statista.com/topics/2297/twitter-marketing/
Explore at:
Dataset updated
Dec 10, 2024
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
The global number of Twitter users was forecast to increase continuously between 2024 and 2028 by a total of 74.3 million users (+17.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 503.42 million users, a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in regions like South America and the Americas.
Twitter dataset about Information Operations in Honduras and UAE
zenodo.org
Updated Oct 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lorenzo Cima; Lorenzo Mannocci; Marco Avvenuti; MAURIZIO TESCONI; Stefano Cresci (2024). Twitter dataset about Information Operations in Honduras and UAE [Dataset]. http://doi.org/10.5281/zenodo.13912659
Explore at:
Available download formats: bin, application/x-troff-me
Unique identifier
https://doi.org/10.5281/zenodo.13912659
Dataset updated
Oct 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lorenzo Cima; Lorenzo Mannocci; Marco Avvenuti; MAURIZIO TESCONI; Stefano Cresci
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
United Arab Emirates, Honduras
Description
Dataset concerning coordinated behaviour in Information Operations in Honduras and United Arab Emirates, consisting of two parts:

malicious tweets, provided by Twitter/X Moderation Research Consortium (TMRC), concerning well-known Information Operations (IOs).

genuine enriching tweets, recovered using Twitter/X search APIs with Academic Elevated Access. Those tweets were published by "genuine" users (i.e. users not in the malicious dataset) and concerned the main topics of the IOs.

This dataset makes it possible to explore meaningful patterns of coordination that could distinguish conversations with malicious intent from genuine conversations.

1.2M malicious or genuine tweets about the Honduras IO, shared between 11 September 2019 and 8 January 2020

2.8M malicious or genuine tweets about the UAE IO, shared between 27 January 2019 and 26 May 2019
A study on real graphs of fake news spreading on Twitter
data.niaid.nih.gov
Updated Aug 20, 2021
Cite
Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
Explore at:
Dataset updated
Aug 20, 2021
Dataset authored and provided by
Amirhosein Bodaghi
Description
*** Fake News on Twitter ***

These 5 datasets are the results of an empirical study on the spreading process of new fake news stories on Twitter. In particular, we focused on fake news stories that gave rise to truth spreading against them at the same time. The story behind each fake news item is as follows:

1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

The data collection was done in two stages, each of which produced a new dataset: (1) obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; (2) querying the neighbors of the spreaders of those tweets, which gives the Dataset of Graph (DG).

DD

DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story, with the following structure:

Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about the tweet/retweet. From left to right, the columns present the following information about the tweet/retweet:

User ID (user who has posted the current tweet/retweet)

The description sentence in the profile of the user who has published the tweet/retweet

The number of tweets/retweets published by the user at the time of posting the current tweet/retweet

Date and time of creation of the account by which the current tweet/retweet has been posted

Language of the tweet/retweet

Number of followers

Number of followings (friends)

Date and time of posting the current tweet/retweet

Number of likes (favorites) the current tweet had acquired before it was crawled

Number of times the current tweet had been retweeted before it was crawled

Whether there is any other tweet inside the current tweet/retweet (for example, this happens when the current tweet is a quote, reply or retweet)

The source (OS) of the device from which the current tweet/retweet was posted

Tweet/Retweet ID

Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied to by the current post)

Frequency of tweet occurrences which means the number of times the current tweet is repeated in the dataset (for example the number of times that a tweet exists in the dataset in the form of retweet posted by others)

State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):

r : The tweet/retweet is a fake news post

a : The tweet/retweet is a truth post

q : The tweet/retweet is a question about the fake news that neither confirms nor denies it

n : The tweet/retweet is not related to the fake news (it contains the query terms related to the rumor but does not refer to the given fake news)

DG

DG for each fake news contains two files:

A file in graph format (.graph) which includes the information of the graph, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)

A file in JSONL format (.jsonl) which includes the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 was the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); the other node IDs follow in the next rows of the file (each row corresponds to one user ID). Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file, which is the 202nd line of the file.

The user IDs of spreaders in DG (those who have had a post in DD) would be available in DD to get extra information about them and their tweet/retweet. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
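The label-to-row rule for the labels file can be sketched as below. This assumes, as the description states, that row 0 of the file holds the column labels and that each following row holds exactly one user ID; the function names and sample lines are hypothetical, and the real rows may need JSON decoding rather than a plain strip.

```python
def user_id_row(node_label, header_rows=1):
    """Zero-based row in the labels file that holds the ID of the given node label."""
    return node_label + header_rows

def lookup_user_id(lines, node_label):
    """Return the raw content of the row belonging to a node label."""
    return lines[user_id_row(node_label)].strip()

# Hypothetical file content: a header row followed by one user ID per row.
sample_lines = ["user_id\n", "12345637\n", "222\n", "333\n"]

print(lookup_user_id(sample_lines, 0))  # first node entered into the graph
print(user_id_row(200))                 # zero-based row for node 200
```

Counting rows from zero, node 200 sits at row 201, which is the 202nd line when lines are counted from one.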
Arab Computational Propaganda on X (Twitter)
data.mendeley.com
Updated Oct 2, 2023
+ more versions
Cite
Bodor Almotairy (2023). Arab Computational Propaganda on X (Twitter) [Dataset]. http://doi.org/10.17632/58mttpbc7x.3
Explore at:
Unique identifier
https://doi.org/10.17632/58mttpbc7x.3
Dataset updated
Oct 2, 2023
Authors
Bodor Almotairy
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The database includes three datasets. All of them were extracted from a dataset published by X (Twitter Transparency Websites) that includes tweets from malicious accounts trying to manipulate public opinion in the Kingdom of Saudi Arabia. Although the propagandist tweets were published by malicious accounts, as X (Twitter) stated, the individual tweets were not classified as propaganda or non-propaganda. Propagandists usually mix propaganda and non-propaganda tweets in an attempt to hide their identities. Therefore, it was necessary to classify their tweets as propaganda or not, based on the propaganda technique used. Since the datasets are very large, we annotated a sample of 2,100 tweets. The datasets are made up of 16,355,558 tweets from propagandist users focused on sports and banking topics.
tweets
huggingface.co
Updated Dec 31, 2024
Cite
BlockMesh Network Foundation (2024). tweets [Dataset]. https://huggingface.co/datasets/blockmesh/tweets
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 31, 2024
Authors
BlockMesh Network Foundation
License
https://choosealicense.com/licenses/odc-by/
Description
BlockMesh Network

Dataset Summary

The dataset is a sample of our Twitter data collection. It has been prepared for educational and research purposes and includes public tweets. The dataset consists of JSON Lines records, one tweet per line. The format is: { "user":"Myy23081040", "id":"1870163769273589994", "link":"https://x.com/Myy23081040/status/1870163769273589994", "tweet":"Seu pai é um fofo skskks", "date":"2024-12-21", "reply":"0", "retweet":"0", "like":"2" }

user the… See the full description on the dataset page: https://huggingface.co/datasets/blockmesh/tweets.
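A record in the format shown above could be parsed roughly as follows. This is a minimal sketch: the field names and the sample line come from the description, while the function name and the decision to convert the counters and the date are assumptions (the sample stores the counters as strings, and the date looks like ISO 8601).

```python
import json
from datetime import date

# Sample record copied from the dataset description.
sample_line = (
    '{ "user":"Myy23081040", "id":"1870163769273589994", '
    '"link":"https://x.com/Myy23081040/status/1870163769273589994", '
    '"tweet":"Seu pai é um fofo skskks", "date":"2024-12-21", '
    '"reply":"0", "retweet":"0", "like":"2" }'
)


def parse_tweet(raw_line):
    """Parse one JSON Lines record and convert string fields to typed values."""
    record = json.loads(raw_line)
    # The counters are stored as strings in the sample, so convert them.
    for key in ("reply", "retweet", "like"):
        record[key] = int(record[key])
    # Assumption: dates follow the YYYY-MM-DD form seen in the sample.
    record["date"] = date.fromisoformat(record["date"])
    return record


tweet = parse_tweet(sample_line)
print(tweet["user"], tweet["date"], tweet["like"])
```

The `id` field is left as a string here, which avoids precision problems if the records are later passed to tools that treat large integers as floating-point numbers.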
Master X-Ray Catalog - Dataset - NASA Open Data Portal
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Apr 1, 2025
nasa.gov (2025). Master X-Ray Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/master-x-ray-catalog
Explore at:
Dataset updated
Apr 1, 2025
Dataset provided by
NASA, http://nasa.gov/
Description
The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.
MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane
zenodo.org
Updated May 24, 2025
Anonymous (2025). MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane [Dataset]. http://doi.org/10.5281/zenodo.15401479
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15401479
Dataset updated
May 24, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Anonymous
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MASH: A Multiplatform Annotated Dataset for Societal Impact of Hurricane

We present a Multiplatform Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 98,662 relevant social media posts from Reddit, X, TikTok, and YouTube.

In addition, all relevant posts are annotated on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes in a multi-modal approach that considers both textual and visual content (text, images, and videos), providing a rich labeled dataset for in-depth analysis.

The dataset is complemented by an Online Analytics Platform (https://hurricane.web.illinois.edu/) that allows users not only to view hurricane-related posts and articles, but also to explore high-frequency keywords, user sentiment, and the locations where posts were made.

To the best of our knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated hurricane dataset. We envision that MASH can contribute to the study of hurricanes' impact on society, in tasks such as disaster severity classification, event detection, public sentiment analysis, and bias identification.

Usage Notice

This dataset includes four annotation files:

• reddit_anno_publish.csv

• tiktok_anno_publish.csv

• twitter_anno_publish.csv

• youtube_anno_publish.csv

Each file contains post IDs and corresponding annotations on three dimensions: Humanitarian Classes, Bias Classes, and Information Integrity Classes.

To protect user privacy, only post IDs are released. We recommend retrieving the full post content via the official APIs of each platform, in accordance with their respective terms of service.

- Reddit API (https://www.reddit.com/dev/api)

- TikTok API (https://developers.tiktok.com/products/research-api)

- X/Twitter API (https://developer.x.com/en/docs/x-api)

- YouTube API (https://developers.google.com/youtube/v3)

Humanitarian Classes

Each post is annotated with seven binary humanitarian classes. For each class, the label is either:

• True – the post contains this humanitarian information

• False – the post does not contain this information

These seven humanitarian classes include:

• Casualty: The post reports people or animals who are killed, injured, or missing during the hurricane.

• Evacuation: The post describes the evacuation, relocation, rescue, or displacement of individuals or animals due to the hurricane.

• Damage: The post reports damage to infrastructure or public utilities caused by the hurricane.

• Advice: The post provides advice, guidance, or suggestions related to hurricanes, including how to stay safe, protect property, or prepare for the disaster.

• Request: The post requests help, support, or resources due to the hurricane.

• Assistance: The post describes assistance provided by individuals, communities, or organizations, including both physical aid and emotional or psychological support.

• Recovery: The post describes efforts or activities related to the recovery and rebuilding process after the hurricane.

Note: A single post may be labeled as True for multiple humanitarian categories.

Bias Classes

Each post is annotated with five binary bias classes. For each class, the label is either:

• True – the post contains this bias information

• False – the post does not contain this information

These five bias classes include:

• Linguistic Bias: The post contains biased, inappropriate, or offensive language, with a focus on word choice, tone, or expression.

• Political Bias: The post expresses political ideology, showing favor or disapproval toward specific political actors, parties, or policies.

• Gender Bias: The post contains biased, stereotypical, or discriminatory language or viewpoints related to gender.

• Hate Speech: The post contains language that expresses hatred, hostility, or dehumanization toward a specific group or individual, especially those belonging to minority or marginalized communities.

• Racial Bias: The post contains biased, discriminatory, or stereotypical statements directed toward one or more racial or ethnic groups.

Note: A single post may be labeled as True for multiple bias categories.

Information Integrity Classes

Each post is also annotated with a single information integrity class, represented by an integer:

• -1 → False information (i.e., misinformation or disinformation)

• 0 → Unverifiable information (unclear or lacking sufficient evidence)

• 1 → True information (verifiable and accurate)
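The integer coding above can be turned into readable labels with a small lookup. The sketch below is illustrative only: the three codes come from the description, but the column names `post_id` and `information_integrity` and the sample rows are hypothetical, since the actual CSV headers are not listed here.

```python
import csv
import io

# Codes as given in the dataset description.
INTEGRITY_LABELS = {-1: "false", 0: "unverifiable", 1: "true"}

# Hypothetical stand-in for one of the annotation files; column names are assumed.
sample_csv = io.StringIO(
    "post_id,information_integrity\n"
    "p1,-1\n"
    "p2,0\n"
    "p3,1\n"
)


def decode_integrity(csv_file, column="information_integrity"):
    """Map each post ID to a readable information integrity label."""
    reader = csv.DictReader(csv_file)
    return {row["post_id"]: INTEGRITY_LABELS[int(row[column])] for row in reader}


decoded = decode_integrity(sample_csv)
print(decoded)
```

An unexpected code raises a `KeyError`, which is a deliberate choice in this sketch so that malformed rows are noticed rather than silently relabeled.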

Key Notes

This dataset is also available at https://huggingface.co/datasets/YRC10/MASH.

Version 1 is no longer available.
Twitter users in Brazil 2019-2028
statista.com
Updated Jul 9, 2025
Statista (2025). Twitter users in Brazil 2019-2028 [Dataset]. https://www.statista.com/forecasts/1146589/twitter-users-in-brazil
Explore at:
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista, http://statista.com/
Area covered
Brazil
Description
The number of Twitter users in Brazil was forecast to increase continuously between 2024 and 2028 by a total of *** million users (+***** percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach ***** million users, a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
#ChatGPT 1000 Daily 🐦 Tweets
kaggle.com
Updated May 14, 2023
Enric Domingo (2023). #ChatGPT 1000 Daily 🐦 Tweets [Dataset]. http://doi.org/10.34740/kaggle/dsv/5685262
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5685262
Dataset updated
May 14, 2023
Dataset provided by
Kaggle
Authors
Enric Domingo
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
UPDATE: Because of the new Twitter API conditions introduced under Elon Musk, the Twitter (X) API is no longer free to use, and pricing is $100/month on the hobby plan. My automated ETL notebook therefore stopped adding new tweets to this dataset on May 13th, 2023.

This dataset was updated every day with 1000 new tweets per day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets were uploaded 24-72h later, so that the counts of likes, retweets, messages and impressions had enough time to become meaningful. Tweets are in any language and are selected randomly from all hours of the day. Some basic filters are applied to try to discard sensitive tweets and spam.

This dataset can be used for many different applications in data analysis and visualization, as well as NLP sentiment analysis techniques and more.

Consider upvoting this Dataset and the scheduled ETL Notebook that fed new data into it every day if you found them interesting, thanks! 🤗

Columns Description:

tweet_id: Integer. Unique identifier for each tweet. Older tweets have smaller IDs.

tweet_created: Timestamp. Time of the tweet's creation.

tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).

text: String. The raw payload text from the tweet.

lang: String. Short name for the Tweet text's language.

user_id: Integer. Twitter's unique user id.

user_name: String. The author's public name on Twitter.

user_username: String. The author's Twitter account username (@example)

user_location: String. The author's public location.

user_description: String. The author's public profile's bio.

user_created: Timestamp. Timestamp of user's Twitter account creation.

user_followers_count: Integer. The number of followers of the author's account at the moment of the tweet extraction

user_following_count: Integer. The number of followed accounts from the author's account at the moment of the Tweet extraction

user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.

user_verified: Boolean. True if the user is verified (blue mark).

source: The device/app used to publish the tweet (apparently not working; all values are NaN so far).

retweet_count: Integer. Number of retweets to the Tweet at the moment of the Tweet extraction.

like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.

reply_count: Integer. Number of reply messages to the Tweet.

impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.

More info:

Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
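As a rough illustration of how these columns fit together, the sketch below computes the lag between a tweet's creation and its extraction, plus a simple interaction rate. The column names come from the list above. The sample row is invented, and the ISO 8601 timestamp format is an assumption, since the description does not state how timestamps are serialized.

```python
import csv
import io
from datetime import datetime

# Invented sample row; timestamps are assumed to be ISO 8601 strings.
sample_csv = io.StringIO(
    "tweet_id,tweet_created,tweet_extracted,"
    "retweet_count,like_count,reply_count,impression_count\n"
    "1,2023-04-03T10:00:00+00:00,2023-04-04T12:00:00+00:00,2,10,1,500\n"
)


def summarize(row):
    """Return extraction lag in hours and interactions per impression."""
    created = datetime.fromisoformat(row["tweet_created"])
    extracted = datetime.fromisoformat(row["tweet_extracted"])
    lag_hours = (extracted - created).total_seconds() / 3600
    interactions = (
        int(row["retweet_count"]) + int(row["like_count"]) + int(row["reply_count"])
    )
    impressions = int(row["impression_count"])
    # Guard against division by zero for tweets with no recorded impressions.
    rate = interactions / impressions if impressions else None
    return {"lag_hours": lag_hours, "interactions": interactions, "rate": rate}


summaries = [summarize(row) for row in csv.DictReader(sample_csv)]
print(summaries[0])
```

Because tweets were uploaded 24-72h after posting, the lag value gives a sense of how long each tweet's counters had to accumulate before they were recorded.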
Action Dataset
kaggle.com
Updated Mar 12, 2023
Arailym (2023). Action Dataset [Dataset]. https://www.kaggle.com/datasets/nenriki/action-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 12, 2023
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Arailym
Description
Smartphones contain tri-axial accelerometers that measure acceleration in all three spatial dimensions. This article will use the raw accelerometer signal data sourced from WISDM Lab, Department of Computer & Information Science, Fordham University, NY.

This data was collected from 36 different users as they performed day-to-day activities such as walking, sitting, standing, jogging, and ascending and descending stairs for a specific period of time. In all cases, data was collected at a frequency of 20 samples per second, that is, one record every 50 milliseconds.

The dataset has 6 columns: ‘user’, ‘activity’, ‘timestamp’, ‘x-axis’, ‘y-axis’, and ‘z-axis’. ‘user’ denotes the user ID, ‘timestamp’ is the Unix timestamp in nanoseconds, and the three axis columns are the accelerometer readings along the x, y, and z axes at a given instant of time. Our target variable (class label) is ‘activity’, which we intend to predict.
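The sampling arithmetic can be made concrete with a short sketch. The 20 Hz rate and the nanosecond timestamps come from the description. The fixed, non-overlapping windows and the 2-second window length are assumptions chosen only for illustration, not something the dataset prescribes.

```python
SAMPLE_RATE_HZ = 20  # stated in the description: 20 samples per second
SAMPLE_PERIOD_NS = 1_000_000_000 // SAMPLE_RATE_HZ  # 50 ms expressed in nanoseconds


def split_into_windows(samples, window_seconds):
    """Split a sequence of readings into non-overlapping fixed-length windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    size = SAMPLE_RATE_HZ * window_seconds
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


# Invented readings: 100 (x, y, z) tuples, i.e. 5 seconds of data at 20 Hz.
readings = [(0.0, 9.8, 0.0)] * 100
windows = split_into_windows(readings, window_seconds=2)
print(len(windows), len(windows[0]), SAMPLE_PERIOD_NS)
```

Each window could then be paired with the ‘activity’ label of its samples to form one training example.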
