100+ datasets found

Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
search.gesis.org
Updated Oct 16, 2022
Pfeffer, Jürgen (2022). Just Another Day on Twitter: A Complete 24 Hours of Twitter Data [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2516
Dataset updated
Oct 16, 2022
Dataset provided by
GESIS, Köln
GESIS search
Authors
Pfeffer, Jürgen
License
https://www.gesis.org/en/institute/data-usage-terms
Description
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.
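For a sense of scale, the rounded figure of 375 million tweets over a 24-hour window implies an average of roughly 4,340 tweets per second. The short calculation below only restates that arithmetic from the two numbers in the description; actual posting volume varies over the day, so the result is a mean, not a peak rate.

```python
# Both inputs come from the dataset description; 375 million is a rounded figure.
TOTAL_TWEETS = 375_000_000
WINDOW_SECONDS = 24 * 60 * 60  # one 24-hour collection window


def mean_rate(total: int, seconds: int) -> float:
    """Average number of items per second over a time window."""
    return total / seconds


per_second = mean_rate(TOTAL_TWEETS, WINDOW_SECONDS)
per_minute = per_second * 60
print(f"about {per_second:,.0f} tweets per second, {per_minute:,.0f} per minute")
```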
Twitter Revenue Growth
searchlogistics.com
Updated Apr 1, 2025
(2025). Twitter Revenue Growth [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Advertising makes up 89% of Twitter's total revenue, and data licensing makes up about 11%.
X/Twitter: Countries with the largest audience 2025
statista.com
Updated Jun 19, 2025
Statista (2025). X/Twitter: Countries with the largest audience 2025 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description
Social network X/Twitter is particularly popular in the United States: as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and India ranked second and third, with more than 70 million and 25 million users, respectively.

Global Twitter usage

As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber, and former U.S. president Barack Obama.

X/Twitter and politics

X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump was a prolific Twitter user before the platform permanently suspended his account in January 2021. In an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
Data from: Google Analytics & Twitter dataset from a movies, TV series and...
portalcientificovalencia.univeuropea.com
figshare.com
Updated 2024
Yeste, Víctor (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed3aea56d4af0485dc8
Dataset updated
2024
Authors
Yeste, Víctor
Description
Author: Víctor Yeste, Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of content published in online media and, where possible, to predict the selected success variables.

Because data from two separate areas had to be integrated, namely web publishing and the analysis of how articles are shared and discussed on Twitter, we chose to write programs that access both the Google Analytics v4 reporting API and the Twitter Standard API, always respecting their usage limits.

The website analyzed is hellofriki.com, an online media outlet whose primary aim is to meet readers' need for information on topics that generate a large volume of news every day, alongside analyses, reports, interviews, and many other formats. All of this content falls under the site's sections on cinema, series, video games, literature, and comics.

This dataset contributed to the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Design of a cybermetric methodology for calculating success for the optimization of web content] [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data were obtained for each breaking-news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database divided into the following tables.

tesis_followers: list of user IDs of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account that share breaking news from the website. Fields:
status_id: tweet ID
created_at: date of publication
text: content of the tweet
path: URL extracted after resolving the shortened URL in the text
post_shared: WordPress ID of the article being shared
retweet_count: number of retweets
favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the website (other types, such as automatic Facebook shares and custom tweets without a link to an article). It has the same fields as tesis_hometimeline.

tesis_posts: data on articles published by the website and processed for some analysis. Fields:
stats_id: analysis ID
post_id: WordPress ID of the article
post_date: publication date of the article in WordPress
post_title: title of the article
path: URL of the article on the media website
tags: IDs of the WordPress tags related to the article
uniquepageviews: unique page views
entrancerate: entrance rate
avgtimeonpage: average time on page
exitrate: exit rate
pageviewspersession: page views per session
adsense_adunitsviewed: number of ads viewed by users
adsense_viewableimpressionpercent: ad viewability ratio
adsense_ctr: ad click-through rate
adsense_ecpm: estimated ad revenue per 1,000 page views

tesis_stats: data from a particular analysis, performed for each published breaking-news item. Fields with statistical values can be computed from the data in the other tables, but totals and averages are saved for faster and easier further processing. Fields:
id: ID of the analysis
phase: phase of the thesis in which the analysis was carried out (currently all are 1)
time: "0" if at the time of publication, "1" if 14 days later
start_date: date and time of the measurement on the day of publication
end_date: date and time of the measurement 14 days later
main_post_id: ID of the published article to be analyzed
main_post_theme: main section of the published article to be analyzed
superheroes_theme: "1" if about superheroes, "0" if not
trailer_theme: "1" if a trailer, "0" if not
name: empty field; a custom name can be added manually
notes: empty field; personalized notes can be added manually, for example if a tag was removed manually for being too generic even though the editor assigned it
num_articles: number of articles analyzed
num_articles_with_traffic: number of analyzed articles with traffic (which are taken into account for the traffic analysis)
num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
num_terms: number of terms analyzed
uniquepageviews_total: total page views
uniquepageviews_mean: average page views
entrancerate_mean: average entrance rate
avgtimeonpage_mean: average time on page
exitrate_mean: average exit rate
pageviewspersession_mean: average page views per session
total: total of ads viewed
adsense_adunitsviewed_mean: average of ads viewed
adsense_viewableimpressionpercent_mean: average ad viewability ratio
adsense_ctr_mean: average ad click-through rate
adsense_ecpm_mean: estimated ad revenue per 1,000 page views
Total: total income
retweet_count_mean: average number of retweets
favorite_count_total: total of favorites
favorite_count_mean: average of favorites
terms_ini_num_tweets: total tweets about the terms on the day of publication
terms_ini_retweet_count_total: total retweets about the terms on the day of publication
terms_ini_retweet_count_mean: average retweets about the terms on the day of publication
terms_ini_favorite_count_total: total favorites about the terms on the day of publication
terms_ini_favorite_count_mean: average favorites about the terms on the day of publication
terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet about the terms on the day of publication
terms_ini_user_num_followers_mean: average followers of users who tweeted about the terms on the day of publication
terms_ini_user_num_tweets_mean: average number of tweets published by users who tweeted about the terms on the day of publication
terms_ini_user_age_mean: average age in days of users who tweeted about the terms on the day of publication
terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets about the terms on the day of publication
terms_end_num_tweets: total tweets about the terms 14 days after publication
terms_end_retweet_count_total: total retweets about the terms 14 days after publication
terms_end_retweet_count_mean: average retweets about the terms 14 days after publication
terms_end_favorite_count_total: total favorites about the terms 14 days after publication
terms_end_favorite_count_mean: average favorites about the terms 14 days after publication
terms_end_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet about the terms 14 days after publication
terms_end_user_num_followers_mean: average followers of users who tweeted about the terms 14 days after publication
terms_end_user_num_tweets_mean: average number of tweets published by users who tweeted about the terms 14 days after publication
terms_end_user_age_mean: average age in days of users who tweeted about the terms 14 days after publication
terms_end_ur_inclusion_rate: URL inclusion ratio of tweets about the terms 14 days after publication

tesis_terms: data on the terms (tags) related to the processed articles. Fields:
stats_id: analysis ID
time: "0" if at the time of publication, "1" if 14 days later
term_id: WordPress ID of the term (tag)
name: name of the term
slug: URL slug of the term
num_tweets: number of tweets
retweet_count_total: total retweets
retweet_count_mean: average retweets
favorite_count_total: total of favorites
favorite_count_mean: average of favorites
followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet about the term
user_num_followers_mean: average followers of users who were tweeting about the term
user_num_tweets_mean: average number of tweets published by users who were tweeting about the term
user_age_mean: average age in days of users who were tweeting about the term
url_inclusion_rate: URL inclusion ratio of tweets about the term
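Since the tesis_stats totals and means are described as derivable from the other tables, a minimal sketch of recomputing a few of them from tesis_posts follows. It uses an in-memory SQLite database with hypothetical rows and only four of the documented columns; the engine behind the original database is not stated, and treating an article as "with traffic" when uniquepageviews is above zero is our assumption.

```python
import sqlite3
from statistics import mean

# Hypothetical in-memory stand-in for tesis_posts (only a few documented columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tesis_posts ("
    "stats_id INTEGER, post_id INTEGER, uniquepageviews INTEGER, avgtimeonpage REAL)"
)
conn.executemany(
    "INSERT INTO tesis_posts VALUES (?, ?, ?, ?)",
    [
        (1, 101, 1200, 45.0),
        (1, 102, 0, 0.0),  # an article without traffic
        (1, 103, 800, 30.0),
        (2, 104, 500, 20.0),
    ],
)


def stats_for(conn: sqlite3.Connection, stats_id: int) -> dict:
    """Recompute a few tesis_stats-style aggregates for one analysis."""
    rows = conn.execute(
        "SELECT uniquepageviews, avgtimeonpage FROM tesis_posts WHERE stats_id = ?",
        (stats_id,),
    ).fetchall()
    # Assumption: only articles with page views count towards traffic metrics.
    with_traffic = [r for r in rows if r[0] > 0]
    views = [r[0] for r in with_traffic]
    times = [r[1] for r in with_traffic]
    return {
        "num_articles": len(rows),
        "num_articles_with_traffic": len(with_traffic),
        "uniquepageviews_total": sum(views),
        "uniquepageviews_mean": mean(views) if views else 0,
        "avgtimeonpage_mean": mean(times) if times else 0,
    }


print(stats_for(conn, 1))
```

Keeping the stored aggregates next to a recomputation like this makes it easy to check that the saved totals still match the per-article rows.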
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Dec 13, 2022
Statista (2022). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Dataset updated
Dec 13, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Twitter Statistics
searchlogistics.com
Updated Apr 1, 2025
Search Logistics (2025). Twitter Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
Dataset authored and provided by
Search Logistics
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These Twitter user statistics give you the complete story of where Twitter stands today and what the future looks like for the social media company.
Data from: Twitter Data
kaggle.com
zip
Updated Jul 28, 2020
Shyam R (2020). Twitter Data [Dataset]. https://www.kaggle.com/darkknight98/twitter-data
Available download formats: zip (3,163,708 bytes)
Dataset updated
Jul 28, 2020
Authors
Shyam R
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset consists of very simple Twitter analytics data, including text, user information, confidence values, profile dates, etc.

Content

The dataset is largely self-explanatory; the objective is to determine which gender is more likely to make typos in their tweets.

Inspiration

Since this dataset contains simple, easy-to-handle features, I hope that emerging NLP enthusiasts who have so far built only basic linear or naive models can explore how to apply these techniques to real-world tweet data.
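As a starting point for the typo question, one crude proxy is the share of words in a tweet that do not appear in a reference vocabulary, averaged per gender. The sketch below assumes the data can be reduced to hypothetical (gender, text) pairs and uses a tiny toy vocabulary; a real analysis would need a proper dictionary, and even then slang, names, and hashtags would be miscounted as typos.

```python
import re
from collections import defaultdict

# Toy vocabulary standing in for a real spelling dictionary (hypothetical).
VOCAB = {"i", "love", "this", "new", "phone", "so", "happy", "today",
         "the", "game", "was", "great"}


def oov_rate(text: str, vocab: set = VOCAB) -> float:
    """Share of word tokens missing from the vocabulary: a crude typo proxy."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token not in vocab for token in tokens) / len(tokens)


def mean_rate_by_gender(rows) -> dict:
    """rows: iterable of (gender, text) pairs; returns the mean OOV rate per gender."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for gender, text in rows:
        totals[gender] += oov_rate(text)
        counts[gender] += 1
    return {gender: totals[gender] / counts[gender] for gender in totals}


# Hypothetical rows for illustration only.
sample = [
    ("female", "I love this new phone"),
    ("male", "teh game was grate"),
    ("female", "so happy today"),
    ("male", "the game was great"),
]
print(mean_rate_by_gender(sample))
```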
Twitter Dataset
brightdata.com
.json, .csv, .xlsx
Bright Data (2025). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
Twitter dataset
figshare.com
csv
Updated Feb 11, 2025
Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.28390334.v2
Dataset updated
Feb 11, 2025
Dataset provided by
figshare
Authors
Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
Tweets Dataset - Top 20 most followed users in Twitter social platform
dataverse.harvard.edu
Updated Aug 18, 2017
Raad Bin Tareaf (2017). Tweets Dataset - Top 20 most followed users in Twitter social platform [Dataset]. http://doi.org/10.7910/DVN/JBXKFD
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/JBXKFD
Dataset updated
Aug 18, 2017
Dataset provided by
Harvard Dataverse
Authors
Raad Bin Tareaf
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was gathered by crawling Twitter's REST API using the Python library tweepy 3. It contains the tweets of the 20 most popular Twitter users (those with the most followers), with retweets excluded. These accounts belong to public figures such as Katy Perry and Barack Obama, to platforms such as YouTube and Instagram, and to television channels and shows, e.g., CNN Breaking News and The Ellen Show.

Consequently, the dataset contains a mix of relatively structured tweets, tweets written in a formal and informative manner, and completely unstructured tweets written in a colloquial style. Unfortunately, geocoordinates were not available for these tweets.

This dataset has been used in the research paper "Machine Learning Techniques for Anomalies Detection in Post Arrays".

Crawled attributes: Author (Twitter user), Content (tweet), Date_Time, id (Twitter user ID), language (tweet language), Number_of_Likes, Number_of_Shares.

Overall: 52,543 tweets from the top 20 users on Twitter.

Screen_Name      #Tweets   Time span (days)
TheEllenShow     3,147     662
jimmyfallon      3,123     1,231
ArianaGrande     3,104     613
YouTube          3,077     411
KimKardashian    2,939     603
katyperry        2,924     1,598
selenagomez      2,913     2,266
rihanna          2,877     1,557
BarackObama      2,863     849
britneyspears    2,776     1,548
instagram        2,577     456
shakira          2,530     1,850
Cristiano        2,507     2,407
jtimberlake      2,478     2,491
ladygaga         2,329     894
Twitter          2,290     2,593
ddlovato         2,217     741
taylorswift13    2,029     2,091
justinbieber     2,000     664
cnnbrk           1,842     183
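The per-account counts can be cross-checked with a short script. Summing the #Tweets column as listed gives 52,542, one fewer than the stated overall total of 52,543. The sketch also derives an average posting rate (tweets per day of crawled time span) for each account; that rate is an illustrative metric of ours, not one reported by the dataset author.

```python
# (tweets, time span in days) per account, transcribed from the table above.
TOP20 = {
    "TheEllenShow": (3147, 662), "jimmyfallon": (3123, 1231),
    "ArianaGrande": (3104, 613), "YouTube": (3077, 411),
    "KimKardashian": (2939, 603), "katyperry": (2924, 1598),
    "selenagomez": (2913, 2266), "rihanna": (2877, 1557),
    "BarackObama": (2863, 849), "britneyspears": (2776, 1548),
    "instagram": (2577, 456), "shakira": (2530, 1850),
    "Cristiano": (2507, 2407), "jtimberlake": (2478, 2491),
    "ladygaga": (2329, 894), "Twitter": (2290, 2593),
    "ddlovato": (2217, 741), "taylorswift13": (2029, 2091),
    "justinbieber": (2000, 664), "cnnbrk": (1842, 183),
}


def total_tweets(table: dict) -> int:
    """Sum of the #Tweets column."""
    return sum(tweets for tweets, _ in table.values())


def tweets_per_day(table: dict) -> dict:
    """Average tweets per day over each account's crawled time span."""
    return {name: tweets / days for name, (tweets, days) in table.items()}


rates = tweets_per_day(TOP20)
print(total_tweets(TOP20))  # 52542
print(max(rates, key=rates.get), min(rates, key=rates.get))
```

Because each account was crawled over a very different time span, comparing raw tweet counts across accounts can mislead; the per-day rate puts them on a common footing.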
Twitter Public Sentiment Dataset
opendatabay.com
Updated Jul 6, 2025
Datasimple (2025). Twitter Public Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/04ea3224-1b10-48d4-871a-496c9a2633ff
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Telecommunications & Network Data
Description
This dataset provides a collection of 1,000 tweets designed for sentiment analysis. The tweets were sourced from Twitter using Python and systematically generated using various modules to ensure a balanced representation of different tweet types, user behaviours, and sentiments: the random module for IDs and text, the Faker module for usernames and dates, and the TextBlob module for assigning sentiment. The dataset's purpose is to offer a robust foundation for analysing and visualising sentiment trends and patterns, aiding initial data exploration and the identification of significant patterns or trends.

Columns

Tweet ID: A unique identifier assigned to each individual tweet.

Text: The actual textual content of the tweet.

User: The username of the individual who posted the tweet.

Created At: The date and time when the tweet was originally published.

Likes: The total number of likes or approvals the tweet received.

Retweets: The total count of times the tweet was shared by other users.

Sentiment: The categorised emotional tone of the tweet, typically labelled as positive, neutral, or negative.

Distribution

The dataset is provided in a CSV file format. It consists of 1000 individual tweet records, structured in a tabular layout with the columns detailed above. A sample file will be made available separately on the platform.
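The description names the random module for IDs and text, Faker for usernames and dates, and TextBlob for sentiment. The sketch below imitates that kind of pipeline with the standard library only: `random` stands in for Faker, and a tiny hand-written lexicon stands in for TextBlob's polarity score, so its labels will not match TextBlob's. The polarity thresholds (above zero positive, below zero negative, otherwise neutral), the word list, and the value ranges for likes and retweets are all assumptions chosen for illustration.

```python
import csv
import io
import random
from datetime import datetime, timedelta

# Tiny hand-made polarity lexicon: a stand-in for TextBlob, not its real scores.
LEXICON = {"love": 1.0, "great": 1.0, "happy": 1.0, "hate": -1.0, "awful": -1.0, "sad": -1.0}
WORDS = list(LEXICON) + ["today", "movie", "coffee", "weather", "team"]
START = datetime(2023, 1, 1)
END = datetime(2023, 5, 1)  # exclusive, so dates fall in January-April 2023


def polarity(text: str) -> float:
    """Mean lexicon score of the known words, in [-1, 1]; 0.0 if none are known."""
    scores = [LEXICON[w] for w in text.split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def label(score: float) -> str:
    """Assumed thresholds: above zero positive, below zero negative, else neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def generate(n: int, seed: int = 0) -> list:
    """Generate n synthetic rows with the dataset's seven columns."""
    rng = random.Random(seed)
    span = int((END - START).total_seconds())
    rows = []
    for _ in range(n):
        text = " ".join(rng.choices(WORDS, k=5))
        rows.append({
            "Tweet ID": rng.randrange(10**17, 10**18),
            "Text": text,
            "User": f"user{rng.randrange(10_000)}",  # Faker would give realistic names
            "Created At": (START + timedelta(seconds=rng.randrange(span))).isoformat(),
            "Likes": rng.randrange(500),
            "Retweets": rng.randrange(100),
            "Sentiment": label(polarity(text)),
        })
    return rows


def to_csv(rows: list) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = generate(1000)
print(to_csv(rows[:3]))
```

Because the rows are generated rather than collected, the sentiment labels follow directly from the scoring function, which is worth keeping in mind when using such data to evaluate classifiers.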

Usage

This dataset is ideal for: * Analysing and visualising sentiment trends and patterns in social media. * Initial data exploration to uncover insights into tweet characteristics and user emotions. * Identifying underlying patterns or trends within social media conversations. * Developing and training machine learning models for sentiment classification. * Academic research into Natural Language Processing (NLP) and social media dynamics. * Educational purposes, allowing students to practise data analysis and visualisation techniques.

Coverage

The dataset spans tweets created between January and April 2023, as observed from the included data samples. While specific geographic or demographic information for users is not available within the dataset, the nature of Twitter implies a general global scope, reflecting a variety of user behaviours and sentiments without specific regional or population group focus.

License

CC0

Who Can Use It

This dataset is valuable for: * Data Scientists and Machine Learning Engineers working on NLP tasks and model development. * Researchers in fields such as Natural Language Processing, Machine Learning Algorithms, Deep Learning, and Computer Science. * Data Analysts looking to extract insights from social media content. * Academics and Students undertaking projects related to sentiment analysis or social media studies. * Anyone interested in understanding online sentiment and user behaviour on social media platforms.

Dataset Name Suggestions

Twitter Public Sentiment Dataset

Social Media Text Sentiment Analysis

General Tweet Mood Data

Twitter Sentiment Collection 2023

Microblog Sentiment Dataset

Attributes

Original Data Source: Twitter Sentiment Analysis using Roberta and Vader
Data from: IA Tweets Analysis Dataset (Spanish)
data.niaid.nih.gov
zenodo.org
Updated Aug 3, 2024
Muñoz, Andrés (2024). IA Tweets Analysis Dataset (Spanish) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10821484
Dataset updated
Aug 3, 2024
Dataset provided by
Balderas-Díaz, Sara
Muñoz, Andrés
Serrano-Fernández, Alejandro
Guerrero-Contreras, Gabriel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General Description

This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.

Data Collection Method

Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.

Dataset Content

ID: A unique identifier for each tweet.

text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.

polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).

favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.

retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.

user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.

user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.

user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.

user_followers_count: The current number of followers the account has. It is a non-negative integer.

user_friends_count: The number of users that the account is following. It is a non-negative integer.

user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.

user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.

user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.

user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
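To make the column types above concrete, the following sketch models one row as a Python dataclass and checks the documented constraints: text of at most 280 characters, non-negative counts, and booleans. The assumptions that the CSV encodes booleans as the literal strings True and False and that ID is numeric, as well as the helper names `from_row` and `parse_bool`, are ours and not part of the dataset.

```python
from dataclasses import dataclass

INT_FIELDS = (
    "favorite_count", "retweet_count", "user_followers_count",
    "user_friends_count", "user_favourites_count", "user_statuses_count",
)
BOOL_FIELDS = (
    "user_verified", "user_default_profile", "user_has_extended_profile",
    "user_protected", "user_is_translator",
)


@dataclass(frozen=True)
class IATweet:
    """One row of the dataset, using the documented column names."""
    ID: int
    text: str
    polarity: str
    favorite_count: int
    retweet_count: int
    user_verified: bool
    user_default_profile: bool
    user_has_extended_profile: bool
    user_followers_count: int
    user_friends_count: int
    user_favourites_count: int
    user_statuses_count: int
    user_protected: bool
    user_is_translator: bool

    def __post_init__(self) -> None:
        # len() counts code points; Twitter's own limit weights some characters
        # (e.g. URLs and CJK text) differently, so this is only an approximation.
        if len(self.text) > 280:
            raise ValueError("text exceeds 280 characters")
        for name in INT_FIELDS:
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")


def parse_bool(value: str) -> bool:
    """Map the strings 'True'/'False' to booleans (assumed CSV encoding)."""
    if value not in ("True", "False"):
        raise ValueError(f"expected True or False, got {value!r}")
    return value == "True"


def from_row(row: dict) -> IATweet:
    """Convert one csv.DictReader row (all values are strings) to an IATweet."""
    values = dict(row)
    values["ID"] = int(row["ID"])
    for name in INT_FIELDS:
        values[name] = int(row[name])
    for name in BOOL_FIELDS:
        values[name] = parse_bool(row[name])
    return IATweet(**values)


# Hypothetical row for illustration only.
sample = {
    "ID": "1", "text": "La IA avanza muy rápido", "polarity": "Positive",
    "favorite_count": "3", "retweet_count": "1", "user_verified": "False",
    "user_default_profile": "True", "user_has_extended_profile": "True",
    "user_followers_count": "120", "user_friends_count": "80",
    "user_favourites_count": "500", "user_statuses_count": "900",
    "user_protected": "False", "user_is_translator": "False",
}
tweet = from_row(sample)
print(tweet.polarity, tweet.user_verified)
```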

Cite as

Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.

Potential Use Cases

This dataset is aimed at academic researchers and practitioners with interests in:

Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.

Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.

Exploring correlations between user engagement metrics and sentiment in discussions about AI.

Data Format and File Type

The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.

License

The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
Global New Year Tweets Dataset
opendatabay.com
Updated Jul 6, 2025
Datasimple (2025). Global New Year Tweets Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/e621ef68-74a1-4014-9005-d8e7e51fba1b
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Social Media and Networking
Description
This dataset contains a collection of approximately 100,000 tweets scraped from the Twitter API, specifically mentioning keywords related to "New Year" [1]. The tweets were collected during the evening and night of 31st December 2021 [1, 2]. The scraping process was conducted over several hours to prevent a concentration of tweets from a single timezone or country, aiming for a broad geographical representation [1]. To ensure focus on original content, retweets and quote tweets from other users were intentionally excluded [1, 2]. This dataset is ideal for analysing public sentiment and social trends around the New Year period [1].

Columns

Tweet number in the dataset: An internal tracking number for tweets within this specific dataset, provided to offer a smaller identifier compared to the large numerical Twitter IDs [1, 2].

author_id: The unique identification number assigned to the author of each tweet by Twitter [1, 2].

id: The unique identification number assigned to the tweet itself by Twitter [1, 2].

text: The full content of the tweet. This column may include various elements such as emojis, external links, and mentions of other users [1, 2].

username: The publicly visible username of the tweet's author [1, 2].
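The column names suggest, though the description does not say so, that the tweets came from the Twitter API v2, whose tweet objects carry author_id and id and whose author_id expansion returns user objects with a username under includes.users. Under that assumption, and assuming author_id and referenced_tweets were requested as tweet fields, the exclusion of retweets and quote tweets and the sequential "Tweet number in the dataset" column could be reproduced roughly as follows. Replies are kept, since the description only mentions excluding retweets and quote tweets.

```python
# Reference types that mark a tweet as non-original in a v2 tweet object.
EXCLUDED_TYPES = {"retweeted", "quoted"}


def is_original(tweet: dict) -> bool:
    """True unless the tweet references another tweet as a retweet or quote."""
    refs = tweet.get("referenced_tweets") or []
    return not any(ref.get("type") in EXCLUDED_TYPES for ref in refs)


def to_rows(page: dict, start: int = 1) -> list:
    """Turn one v2 response page into dataset rows, numbering them sequentially."""
    # With expansions=author_id, user objects arrive under includes.users.
    users = {u["id"]: u["username"] for u in page.get("includes", {}).get("users", [])}
    rows = []
    number = start
    for tweet in page.get("data", []):
        if not is_original(tweet):
            continue
        rows.append({
            "Tweet number in the dataset": number,
            "author_id": tweet["author_id"],
            "id": tweet["id"],
            "text": tweet["text"],
            "username": users.get(tweet["author_id"], ""),
        })
        number += 1
    return rows


# Hypothetical response page for illustration only.
page = {
    "data": [
        {"id": "10", "author_id": "1", "text": "Happy New Year!"},
        {"id": "11", "author_id": "2", "text": "RT @alice: Happy New Year!",
         "referenced_tweets": [{"type": "retweeted", "id": "10"}]},
        {"id": "12", "author_id": "2", "text": "@alice same to you",
         "referenced_tweets": [{"type": "replied_to", "id": "10"}]},
    ],
    "includes": {"users": [{"id": "1", "username": "alice"},
                           {"id": "2", "username": "bob"}]},
}
for row in to_rows(page):
    print(row)
```

When paging through many responses, passing the next free number as `start` keeps the tweet numbering continuous across pages.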

Distribution

The dataset is typically provided in CSV format [3]. It comprises approximately 110,000 records [1, 4, 5], representing a significant volume of social media posts. For instance, the 'Tweet number in the dataset' column has over 110,000 unique values [5].

Usage

This dataset is particularly suitable for: * Conducting sentiment analysis to understand public opinion and feelings about the start of the New Year [1]. * Natural Language Processing (NLP) tasks, such as topic modelling, text classification, and entity recognition. * Social media trend analysis specific to the New Year period. * Research into public discourse during significant global events.

Coverage

Time Range: Data was collected on the evening and night of 31st December 2021 [1, 2].

Geographic Scope: The collection methodology, involving scraping over several hours, aimed to avoid geographical clustering, suggesting a worldwide coverage of tweets from various time zones [1].

Demographic Scope: The dataset represents public tweets from general Twitter users. Specific demographic details of the authors are not available.

License

CC0

Who Can Use It

Data scientists and machine learning engineers for developing and testing NLP models.

Academic researchers studying social media behaviour, public opinion, and linguistic patterns.

Marketing and PR professionals seeking insights into consumer sentiment during holiday periods.

Analysts interested in event-driven social media activity.

Dataset Name Suggestions

New Year's Eve Tweets 2021

2021 New Year Twitter Data

New Year Sentiment Tweets

Global New Year Tweets

Attributes

Original Data Source: New Years 2021 Tweets
Twitter cascade dataset
researchdata.smu.edu.sg
figshare.com
pdf
Updated May 31, 2023
Living Analytics Research Centre (2023). Twitter cascade dataset [Dataset]. http://doi.org/10.25440/smu.12062709.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.25440/smu.12062709.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
Living Analytics Research Centre
License
http://rightsstatements.org/vocab/InC/1.0/
Description
This dataset comprises a set of information cascades generated by Singapore Twitter users. Here, a cascade is defined as a set of tweets about the same topic. The dataset was collected via the Twitter REST and streaming APIs as follows. Starting from popular seed users (i.e., users with many followers), we crawled their follow, retweet, and user-mention links. We then added those followers/followees, retweet sources, and mentioned users who state Singapore as their profile location, giving a total of 184,794 Twitter user accounts. We then crawled the tweets these users posted from 1 April to 31 August 2012, obtaining 32,479,134 tweets in all.

To identify cascades, we extracted all URL links and hashtags from these tweets and treated them as the identities of cascades. In other words, all tweets containing the same URL link (or the same hashtag) form one cascade. Mathematically, a cascade is represented as a set of user-timestamp pairs. Figure 1 provides an example: cascade C = {<u1, t1>, <u2, t2>, <u1, t3>, <u3, t4>, <u4, t5>}. For evaluation, the dataset was split into two parts: the first four months of data for training and the last month for testing. Table 1 summarizes the basic (count) statistics of the dataset.

Each line in each file represents a cascade: the first term is a hashtag or URL, and the second is a list of user-timestamp pairs. Due to privacy concerns, all user identities are anonymized.
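The description does not specify how the user-timestamp pairs are serialized within a line, so the parser below assumes a hypothetical layout: the hashtag or URL, a tab, then space-separated user,timestamp tokens with Unix timestamps. It also shows one plausible reading of the temporal split, with activity before 1 August 2012 used for training and August activity for testing; placing that boundary at midnight UTC is an assumption (Singapore local time is UTC+8). Both choices should be adjusted to the actual files.

```python
from datetime import datetime, timezone


def utc_ts(year: int, month: int, day: int) -> int:
    """Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Assumed boundary between the four training months and the test month.
CUTOFF = utc_ts(2012, 8, 1)


def parse_cascade(line: str):
    """Parse '<hashtag or URL><TAB><user>,<ts> <user>,<ts> ...' (hypothetical layout)."""
    key, pairs_field = line.rstrip("\n").split("\t", 1)
    pairs = []
    for token in pairs_field.split():
        user, ts = token.rsplit(",", 1)
        pairs.append((user, int(ts)))
    pairs.sort(key=lambda pair: pair[1])  # order adoptions by time
    return key, pairs


def split_cascade(pairs, cutoff: int = CUTOFF):
    """Split one cascade's user-timestamp pairs into training and test parts."""
    train = [p for p in pairs if p[1] < cutoff]
    test = [p for p in pairs if p[1] >= cutoff]
    return train, test


def read_cascades(path: str):
    """Read every non-empty line of a cascade file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_cascade(line) for line in handle if line.strip()]


# Hypothetical line mirroring the example cascade, with user u1 appearing twice.
line = f"#example\tu1,{utc_ts(2012, 4, 2)} u2,{utc_ts(2012, 5, 10)} u1,{utc_ts(2012, 8, 3)}\n"
key, pairs = parse_cascade(line)
train, test = split_cascade(pairs)
print(key, len(train), len(test))
```

Representing a cascade as a time-sorted list rather than a set keeps repeated participation by the same user (u1 above) and makes the temporal split a simple comparison.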
Twitter users in the United States 2019-2028
statista.com
ai-chatbox.pro
Updated Jun 13, 2024
Cite
Statista Research Department (2024). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
Dataset updated
Jun 13, 2024
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
United States
Description
The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users in 2028, a new peak. Notably, the number of Twitter users has been increasing continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
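The two forecast figures (a 2028 value of 85.08 million and a total increase of 4.3 million) imply the 2024 baseline, which makes the quoted +5.32 percent easy to check. A minimal Python check of that arithmetic follows; the average annual rate at the end is derived here for illustration and is not a Statista figure.

```python
# Figures quoted in the description (millions of monthly active users).
users_2028 = 85.08          # forecast peak in 2028
increase_2024_2028 = 4.3    # total forecast increase from 2024 to 2028

# The 2024 baseline is not stated directly; it follows from the two figures above.
users_2024 = users_2028 - increase_2024_2028
growth_pct = increase_2024_2028 / users_2024 * 100

# Derived here, not reported by Statista: the constant yearly rate that
# compounds from the 2024 baseline to the 2028 value over four steps.
annual_rate_pct = ((users_2028 / users_2024) ** (1 / 4) - 1) * 100

print(f"implied 2024 base: {users_2024:.2f}M, total growth: {growth_pct:.2f}%")
# implied 2024 base: 80.78M, total growth: 5.32%
print(f"implied average annual growth: {annual_rate_pct:.2f}%")
```

The recomputed total growth matches the quoted +5.32 percent, so the two published numbers are consistent with each other.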
Data from: IA Tweets Analysis Dataset (Spanish)
produccioncientifica.uca.es
Updated 2024
Cite
Guerrero-Contreras, Gabriel; Balderas-Díaz, Sara; Serrano-Fernández, Alejandro; Muñoz, Andrés (2024). IA Tweets Analysis Dataset (Spanish) [Dataset]. https://produccioncientifica.uca.es/documentos/67321e53aea56d4af04854c2
Dataset updated
2024
Authors
Guerrero-Contreras, Gabriel; Balderas-Díaz, Sara; Serrano-Fernández, Alejandro; Muñoz, Andrés
Description
Cite as

Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.

General Description

This dataset comprises 4,038 Spanish-language tweets about artificial intelligence (AI). It was created and used in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights" (DOI: 10.1109/IE61493.2024.10599899), presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.

Data Collection Method

Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.

Dataset Content

ID: A unique identifier for each tweet.

text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.

polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).

favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.

retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.

user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.

user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.

user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.

user_followers_count: The current number of followers the account has. It is a non-negative integer.

user_friends_count: The number of users that the account is following. It is a non-negative integer.

user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.

user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.

user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.

user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
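A minimal sketch for loading rows and enforcing the types listed above, assuming the CSV header uses exactly these field names and writes booleans as the text True/False. `parse_row`, `parse_bool`, and the sample row are hypothetical illustrations, not code shipped with the dataset.

```python
import csv
import io

# Column names as listed in the dataset description.
COUNT_COLUMNS = (
    "favorite_count", "retweet_count", "user_followers_count",
    "user_friends_count", "user_favourites_count", "user_statuses_count",
)
BOOL_COLUMNS = (
    "user_verified", "user_default_profile", "user_has_extended_profile",
    "user_protected", "user_is_translator",
)


def parse_bool(value: str) -> bool:
    """Map the documented True/False values to a Python bool (case-insensitive)."""
    normalized = value.strip().lower()
    if normalized in ("true", "false"):
        return normalized == "true"
    raise ValueError(f"expected True or False, got {value!r}")


def parse_row(row: dict) -> dict:
    """Type one CSV row (all strings) and check the documented constraints."""
    record = {"ID": row["ID"], "text": row["text"], "polarity": row["polarity"]}
    if len(record["text"]) > 280:
        raise ValueError("text exceeds the documented 280-character maximum")
    for column in COUNT_COLUMNS:
        count = int(row[column])
        if count < 0:
            raise ValueError(f"{column} must be a non-negative integer")
        record[column] = count
    for column in BOOL_COLUMNS:
        record[column] = parse_bool(row[column])
    return record


# Invented one-row sample for illustration; it is not taken from the dataset.
SAMPLE = (
    "ID,text,polarity,favorite_count,retweet_count,user_verified,"
    "user_default_profile,user_has_extended_profile,user_followers_count,"
    "user_friends_count,user_favourites_count,user_statuses_count,"
    "user_protected,user_is_translator\n"
    "1,La IA cambiará la educación,Positive,3,1,False,True,True,"
    "120,80,45,900,False,False\n"
)

if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE)):
        print(parse_row(row))
```

Because `csv.DictReader` keys each row by header name, the sketch does not depend on the actual column order in the file.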

Potential Use Cases

This dataset is aimed at academic researchers and practitioners with interests in:

Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.

Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.

Exploring correlations between user engagement metrics and sentiment in discussions about AI.
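As a first, descriptive step toward the last use case, one might compare average engagement across polarity labels. The sketch below assumes the column names from the field list (`polarity`, `favorite_count`, `retweet_count`) and uses invented rows; `mean_engagement_by_polarity` is a hypothetical helper. It is not a correlation test, and a fuller analysis would likely add a significance test and account for follower counts.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def mean_engagement_by_polarity(csv_text: str) -> Dict[str, Tuple[float, float]]:
    """Average favorite_count and retweet_count for each polarity label.

    Descriptive only: it does not test whether the differences are significant.
    """
    likes: Dict[str, List[int]] = defaultdict(list)
    retweets: Dict[str, List[int]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["polarity"]
        likes[label].append(int(row["favorite_count"]))
        retweets[label].append(int(row["retweet_count"]))
    return {label: (mean(likes[label]), mean(retweets[label])) for label in likes}


# Invented rows containing only the columns this function reads.
SAMPLE = (
    "polarity,favorite_count,retweet_count\n"
    "Positive,10,2\n"
    "Negative,4,6\n"
    "Positive,6,0\n"
)

if __name__ == "__main__":
    print(mean_engagement_by_polarity(SAMPLE))
```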

Data Format and File Type

The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.

License

The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
Twitter Key Statistics
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). Twitter Key Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the key Twitter user statistics that you need to know.
Data from: TWITTER DATA
kaggle.com
Updated Mar 30, 2024
Cite
smmmmmmmmmmmm (2024). TWITTER DATA [Dataset]. https://www.kaggle.com/datasets/smmmmmmmmmmmm/twitter-data
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 30, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
smmmmmmmmmmmm
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset consists of columns containing information about tweets posted on Twitter. Each row represents a single tweet. The columns are described below:

Tweet: This column contains the actual text content of the tweet. It includes the message that the user posted on Twitter. Tweets can vary in length from a few characters to the maximum allowed by Twitter.

Sentiment: This column indicates the sentiment or emotional tone of the tweet. Sentiment can be classified into categories such as positive, negative, or neutral. It reflects the overall opinion or attitude expressed in the tweet.

Username: This column contains the username of the Twitter account that posted the tweet. Each Twitter user has a unique username that identifies their account.

Timestamp: This column contains the timestamp indicating when the tweet was posted. It includes information about the date and time when the tweet was published on Twitter.

Retweets: This column represents the number of times the tweet has been retweeted by other Twitter users. A retweet is when a user shares another user's tweet with their followers.

Likes: This column indicates the number of likes or favorites received by the tweet. Users can express their appreciation for a tweet by liking it.

Hashtags: This column contains any hashtags included in the tweet. Hashtags are keywords or phrases preceded by the "#" symbol, used to categorize or label tweets and make them more discoverable.

Mentions: This column includes any Twitter usernames mentioned in the tweet. Mentions are when a user tags another user in their tweet by including their username preceded by the "@" symbol.

Location: This column provides information about the location associated with the tweet. It may include details such as the city, state, country, or geographical coordinates from which the tweet was posted, if available.

Source: This column specifies the source or platform used to post the tweet. It indicates whether the tweet was posted from the Twitter website, a mobile app, or a third-party application.
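Since the Hashtags and Mentions columns are defined by the "#" and "@" prefixes, a rough cross-check is to re-extract them from the Tweet text. The regular expressions below are a simplification assumed for illustration, not Twitter's official entity-parsing rules, and `extract_hashtags` and `extract_mentions` are hypothetical names.

```python
import re
from typing import List

# Simplified patterns: '#' or '@' followed by word characters, and not preceded
# by a word character (so 'x@y.com' is not read as a mention).
# Twitter's own entity rules are more detailed; this is an approximation.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def extract_hashtags(text: str) -> List[str]:
    """Return hashtags in order of appearance, without the leading '#'."""
    return HASHTAG_RE.findall(text)


def extract_mentions(text: str) -> List[str]:
    """Return mentioned usernames in order of appearance, without the '@'."""
    return MENTION_RE.findall(text)


if __name__ == "__main__":
    tweet = "Great talk by @alice and @bob_99 #AI #NLP! Slides: x@y.com"
    print(extract_hashtags(tweet))  # ['AI', 'NLP']
    print(extract_mentions(tweet))  # ['alice', 'bob_99']
```

Comparing these results with the stored Hashtags and Mentions values can flag rows where the columns and the text disagree.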
Famous Words Twitter Dataset
kaggle.com
Updated May 30, 2023
Cite
_w1998 (2023). Famous Words Twitter Dataset [Dataset]. https://www.kaggle.com/datasets/jackksoncsie/twitter-dataset-keywords-likes-and-tweets/discussion
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 30, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
_w1998
License
http://www.gnu.org/licenses/agpl-3.0.html
Description
The Famous Words Twitter Dataset is a comprehensive collection of tweets associated with famous words. The dataset provides valuable insights into the social media engagement and popularity of these words on the Twitter platform. It includes three primary columns: keyword, likes, and tweets.

The keyword column represents the specific famous word or phrase associated with each tweet. It allows researchers and analysts to explore the dynamics of user interactions and discussions surrounding these popular terms on Twitter.

The likes column indicates the number of likes received by each tweet. This metric serves as an indicator of the tweet's popularity and resonance among Twitter users.

The tweet column contains the actual tweet text, capturing the content and context of user-generated messages related to the famous words. This column provides valuable qualitative data for sentiment analysis, topic modeling, and other natural language processing tasks.

Researchers, data scientists, and social media analysts can leverage this dataset to study various aspects, such as tracking trends, sentiment analysis, understanding user engagement patterns, and identifying influential topics associated with famous words on Twitter.

Topics: "COVID-19", "Vaccine", "Zoom", "Bitcoin", "Dogecoin", "NFT", "Elon Musk", "Tesla", "Amazon", "iPhone 12", "Remote work", "TikTok", "Instagram", "Facebook", "YouTube", "Netflix", "GameStop", "Super Bowl", "Olympics", "Black Lives Matter", "India vs England", "Ukraine", "Queen Elizabeth", "World Cup", "Jeffrey Dahmer", "Johnny Depp", "Will Smith", "Weather", "xvideo", "porn", "nba", "Macdonald".

In total, the dataset contains 128,837 tweets. The plot below shows the number of tweets for each keyword.

[Figure: number of tweets per keyword, https://i.imgur.com/z4xbbyt.png]

Note: The dataset is carefully curated, anonymized, and stripped of any personally identifiable information to protect user privacy.
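The per-keyword tweet counts shown in the plot could be reproduced along these lines, assuming a CSV with the `keyword`, `likes`, and `tweets` columns named in the description and plain integer like counts. `keyword_summary` and the sample rows are hypothetical; the text column is not read.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Tuple


def keyword_summary(csv_text: str) -> Dict[str, Tuple[int, float]]:
    """Map each keyword to (number of tweets, mean likes), most frequent first."""
    tweet_counts: Counter = Counter()
    like_totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        keyword = row["keyword"]
        tweet_counts[keyword] += 1
        like_totals[keyword] += int(row["likes"])  # assumes plain integer like counts
    return {kw: (n, like_totals[kw] / n) for kw, n in tweet_counts.most_common()}


# Invented sample rows for illustration; not taken from the dataset.
SAMPLE = (
    "keyword,likes,tweets\n"
    "Bitcoin,10,to the moon\n"
    "Tesla,4,new model announced\n"
    "Bitcoin,2,price is down today\n"
)

if __name__ == "__main__":
    print(keyword_summary(SAMPLE))  # {'Bitcoin': (2, 6.0), 'Tesla': (1, 4.0)}
```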
Data from: GeoCoV19: A Dataset of Hundreds of Millions of Multilingual...
data.niaid.nih.gov
zenodo.org
Updated Jun 16, 2020
Cite
Muhammad Imran (2020). GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3878598
Dataset updated
Jun 16, 2020
Dataset provided by
Umair Qazi
Ferda Ofli
Muhammad Imran
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. As geolocation information is essential for many tasks such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from the user location and tweet content and to derive their geolocation information using Nominatim (OpenStreetMap) data at different levels of geolocation granularity. In terms of geographical coverage, the dataset spans 218 countries and 47K cities worldwide. The tweets come from more than 43 million Twitter users, including around 209K verified accounts. These users posted tweets in 62 different languages.
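To illustrate what a gazetteer-based approach means in practice, the sketch below matches word n-grams from a location string against a tiny, hypothetical gazetteer and prefers city-level over country-level hits. It is not the GeoCoV19 pipeline: it makes no Nominatim requests, the five gazetteer entries are invented for illustration, and plain string matching of this kind is prone to false positives (for example, a person named Paris).

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy gazetteer: normalized place name -> (country, city or None).
# A real system would load a large gazetteer and resolve names via a geocoder.
GAZETTEER: Dict[str, Tuple[str, Optional[str]]] = {
    "paris": ("France", "Paris"),
    "france": ("France", None),
    "new york": ("United States", "New York"),
    "usa": ("United States", None),
    "lahore": ("Pakistan", "Lahore"),
}
MAX_NGRAM = 3  # longest place name, in words, that we try to match


def find_toponyms(text: str) -> List[Tuple[str, Optional[str]]]:
    """Scan word n-grams left to right, preferring the longest gazetteer match."""
    tokens = re.findall(r"\w+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                matches.append(GAZETTEER[candidate])
                i += n  # skip the words consumed by this match
                break
        else:
            i += 1  # no place name starts at this word
    return matches


def finest_location(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Prefer a city-level match; fall back to a country-level one."""
    matches = find_toponyms(text)
    city_level = [m for m in matches if m[1] is not None]
    if city_level:
        return city_level[0]
    return matches[0] if matches else None


if __name__ == "__main__":
    print(find_toponyms("Living in New York, USA"))
    # [('United States', 'New York'), ('United States', None)]
    print(finest_location("USA | New York"))  # ('United States', 'New York')
```

Trying longer n-grams first is what keeps "New York" from being missed in favor of shorter partial matches.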


