100+ datasets found

s
Why Do People Use Twitter?
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.
X/Twitter: Countries with the largest audience 2025
statista.com
Updated Jun 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). X/Twitter: Countries with the largest audience 2025 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
Explore at:
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Feb 2025
Area covered
Worldwide
Description
Social network X/Twitter is particularly popular in the United States, and as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and the India were ranked second and third with more than 70 million and 25 million users respectively. Global Twitter usage As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama. X/Twitter and politics X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
X/Twitter users in the United Kingdom 2019-2028
statista.com
Updated Jan 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2025). X/Twitter users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/11843/x-formerly-twitter-in-the-united-kingdom-uk/
Explore at:
Dataset updated
Jan 13, 2025
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Area covered
United Kingdom
Description
The number of Twitter users in the United Kingdom was forecast to continuously increase between 2024 and 2028 by in total 0.9 million users (+5.1 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 18.55 million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
s
Twitter Key Statistics
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Key Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the key Twitter user statistics that you need to know.
Z
Data from: TWIGMA: A dataset of AI-Generated Images with Metadata From...
data.niaid.nih.gov
Updated May 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
James Zou (2024). TWIGMA: A dataset of AI-Generated Images with Metadata From Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8031784
Explore at:
Dataset updated
May 28, 2024
Dataset provided by
Yiqun Chen
James Zou
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Update May 2024: Fixed a data type issue with "id" column that prevented twitter ids from rendering correctly.

Recent progress in generative artificial intelligence (gen-AI) has enabled the generation of photo-realistic and artistically-inspiring photos at a single click, catering to millions of users online. To explore how people use gen-AI models such as DALLE and StableDiffusion, it is critical to understand the themes, contents, and variations present in the AI-generated photos. In this work, we introduce TWIGMA (TWItter Generative-ai images with MetadatA), a comprehensive dataset encompassing 800,000 gen-AI images collected from Jan 2021 to March 2023 on Twitter, with associated metadata (e.g., tweet text, creation date, number of likes).

Through a comparative analysis of TWIGMA with natural images and human artwork, we find that gen-AI images possess distinctive characteristics and exhibit, on average, lower variability when compared to their non-gen-AI counterparts. Additionally, we find that the similarity between a gen-AI image and human images (i) is correlated with the number of likes; and (ii) can be used to identify human images that served as inspiration for the gen-AI creations. Finally, we observe a longitudinal shift in the themes of AI-generated images on Twitter, with users increasingly sharing artistically sophisticated content such as intricate human portraits, whereas their interest in simple subjects such as natural scenes and animals has decreased. Our analyses and findings underscore the significance of TWIGMA as a unique data resource for studying AI-generated images.

Note that in accordance with the privacy and control policy of Twitter, NO raw content from Twitter is included in this dataset and users could and need to retrieve the original Twitter content used for analysis using the Twitter id. In addition, users who want to access Twitter data should consult and follow rules and regulations closely at the official Twitter developer policy at https://developer.twitter.com/en/developer-terms/policy.
Twitter users in the United States 2019-2028
statista.com
Updated Jul 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2025). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
Explore at:
Dataset updated
Jul 30, 2025
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Area covered
United States
Description
The number of Twitter users in the United States was forecast to continuously increase between 2024 and 2028 by in total 4.3 million users (+5.32 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 85.08 million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of Twitter users in countries like Canada and Mexico.
s
Twitter Users Broken Down By Age
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Users Broken Down By Age [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the breakdown of Twitter users by age group.
s
Twitter Users Broken down By Country
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Users Broken down By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.
d
Data from: Twitter Big Data as A Resource For Exoskeleton Research: A...
search.dataone.org
Updated Nov 8, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thakur, Nirmalya (2023). Twitter Big Data as A Resource For Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [Dataset]. http://doi.org/10.7910/DVN/VPPTRF
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/VPPTRF
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Thakur, Nirmalya
Description
Please cite the following paper when using this dataset: N. Thakur, “Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions,” Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1 Abstract The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and use cases in assisted living, military, healthcare, firefighting, and industries. With the projected increase in the diverse uses of exoskeletons in the next few years in these application domains and beyond, it is crucial to study, interpret, and analyze user perspectives, public opinion, reviews, and feedback related to exoskeletons, for which a dataset is necessary. The Internet of Everything era of today's living, characterized by people spending more time on the Internet than ever before, holds the potential for developing such a dataset by mining relevant web behavior data from social media communications, which have increased exponentially in the last few years. Twitter, one such social media platform, is highly popular amongst all age groups, who communicate on diverse topics including but not limited to news, current events, politics, emerging technologies, family, relationships, and career opportunities, via tweets, while sharing their views, opinions, perspectives, and feedback towards the same. Therefore, this work presents a dataset of about 140,000 Tweets related to exoskeletons. that were mined for a period of 5-years from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. Instructions: This dataset contains about 140,000 Tweets related to exoskeletons. that were mined for a period of 5-years from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. The dataset contains only tweet identifiers (Tweet IDs) due to the terms and conditions of Twitter to re-distribute Twitter data only for research purposes. They need to be hydrated to be used. The process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time, etc.) using its ID is known as the hydration of a tweet ID. The Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) or any similar application may be used for hydrating this dataset. Data Description This dataset consists of 7 .txt files. The following shows the number of Tweet IDs and the date range (of the associated tweets) in each of these files. Filename: Exoskeleton_TweetIDs_Set1.txt (Number of Tweet IDs – 22945, Date Range of Tweets - July 20, 2021 – May 21, 2022) Filename: Exoskeleton_TweetIDs_Set2.txt (Number of Tweet IDs – 19416, Date Range of Tweets - Dec 1, 2020 – July 19, 2021) Filename: Exoskeleton_TweetIDs_Set3.txt (Number of Tweet IDs – 16673, Date Range of Tweets - April 29, 2020 - Nov 30, 2020) Filename: Exoskeleton_TweetIDs_Set4.txt (Number of Tweet IDs – 16208, Date Range of Tweets - Oct 5, 2019 - Apr 28, 2020) Filename: Exoskeleton_TweetIDs_Set5.txt (Number of Tweet IDs – 17983, Date Range of Tweets - Feb 13, 2019 - Oct 4, 2019) Filename: Exoskeleton_TweetIDs_Set6.txt (Number of Tweet IDs – 34009, Date Range of Tweets - Nov 9, 2017 - Feb 12, 2019) Filename: Exoskeleton_TweetIDs_Set7.txt (Number of Tweet IDs – 11351, Date Range of Tweets - May 21, 2017 - Nov 8, 2017) Here, the last date for May is May 21 as it was the most recent date at the time of data collection. The dataset would be updated soon to incorporate more recent tweets.
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Jun 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Explore at:
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
S
Social media profile growth, engagement rate, and reach
data.sugarlandtx.gov
xlsx
Updated Jan 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Communications and Community Engagement (2024). Social media profile growth, engagement rate, and reach [Dataset]. https://data.sugarlandtx.gov/dataset/social-media-profile-growth-engagement-rate-and-reach
Explore at:
xlsxAvailable download formats
Dataset updated
Jan 3, 2024
Dataset authored and provided by
Communications and Community Engagement
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Profile growth - the growth on our social platforms to see where and when we're gaining followers. Engagement rate - a ratio of how many people interacted with ours posts based on when users are usually online. Reach - the number of feeds our posts appeared in (doesn't mean people interacted with the post).
s
Twitter bot profiling
researchdata.smu.edu.sg
pdf
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Living Analytics Research Centre (2023). Twitter bot profiling [Dataset]. http://doi.org/10.25440/smu.12062706.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.25440/smu.12062706.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
Living Analytics Research Centre
License
http://rightsstatements.org/vocab/InC/1.0/http://rightsstatements.org/vocab/InC/1.0/
Description
This dataset comprises a set of Twitter accounts in Singapore that are used for social bot profiling research conducted by the Living Analytics Research Centre (LARC) at Singapore Management University (SMU). Here a bot is defined as a Twitter account that generates contents and/or interacts with other users automatically (at least according to human judgment). In this research, Twitter bots have been categorized into three major types:

Broadcast bot. This bot aims at disseminating information to general audience by providing, e.g., benign links to news, blogs or sites. Such bot is often managed by an organization or a group of people (e.g., bloggers). Consumption bot. The main purpose of this bot is to aggregate contents from various sources and/or provide update services (e.g., horoscope reading, weather update) for personal consumption or use. Spam bot. This type of bots posts malicious contents (e.g., to trick people by hijacking certain account or redirecting them to malicious sites), or promotes harmless but invalid/irrelevant contents aggressively.

This categorization is general enough to cater for new, emerging types of bot (e.g., chatbots can be viewed as a special type of broadcast bots). The dataset was collected from 1 January to 30 April 2014 via the Twitter REST and streaming APIs. Starting from popular seed users (i.e., users having many followers), their follow, retweet, and user mention links were crawled. The data collection proceeds by adding those followers/followees, retweet sources, and mentioned users who state Singapore in their profile location. Using this procedure, a total of 159,724 accounts have been collected. To identify bots, the first step is to check active accounts who tweeted at least 15 times within the month of April 2014. These accounts were then manually checked and labelled, of which 589 bots were found. As many more human users are expected in the Twitter population, the remaining accounts were randomly sampled and manually checked. With this, 1,024 human accounts were identified. In total, this results in 1,613 labelled accounts. Related Publication: R. J. Oentaryo, A. Murdopo, P. K. Prasetyo, and E.-P. Lim. (2016). On profiling bots in social media. Proceedings of the International Conference on Social Informatics (SocInfo’16), 92-109. Bellevue, WA. https://doi.org/10.1007/978-3-319-47880-7_6
s
Twitter Users Broken Down By Gender
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Users Broken Down By Gender [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The platform is male-dominated with 68.1% of all Twitter users being male. Just 31.9% of Twitter users are female.
Twitter users in Brazil 2019-2028
statista.com
Updated Jul 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Twitter users in Brazil 2019-2028 [Dataset]. https://www.statista.com/forecasts/1146589/twitter-users-in-brazil
Explore at:
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Brazil
Description
The number of Twitter users in Brazil was forecast to continuously increase between 2024 and 2028 by in total *** million users (+***** percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach ***** million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Z
Dataset for the Article "A Predictive Method to Improve the Effectiveness of...
data.niaid.nih.gov
Updated May 24, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Riccardo Martoglia (2021). Dataset for the Article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4782983
Explore at:
Dataset updated
May 24, 2021
Dataset provided by
Marco Furini
Riccardo Martoglia
Manuela Montangero
Federica Mandreoli
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".

Abstract:

Museums are embracing social technologies in the attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts are competing every day to get visibility in terms of likes and shares and very little research focused on museums communication to identify best practices. In this paper, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help enhancing the message and increasing the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.

Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch

Dataset structure

The dataset contains the dataset used in the experiments of the above research paper. Only the extracted features for the museum tweet threads (and not the message full text) are provided and needed for the analyses.

We selected 23 well known world spread art museums and grouped them into five groups: G1 (museums with at least three million of followers); G2 (museums with more than one million of followers); G3 (museums with more than 400,000 followers); G4 (museums with more that 200,000 followers); G5 (Italian museums). From these museums, we analyzed ca. 40,000 tweets, with a number varying from 5k ca. to 11k ca. for each museum group, depending on the number of museums in each group.

Content features: these are the features that can be drawn form the content of the tweet itself. We further divide such features in the following two categories:

– Countable: these features have a value ranging into different intervals. We take into consideration: the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., twitter accounts preceded by @), the length of the tweet;

– On-Off : these features have binary values in {0, 1}. We observe whether the tweet has exclamation marks, question marks, person names, place names, organization names, other names. Moreover, we also take into consideration the tweet topic density: assuming that the involved topics correspond to the hashtags mentioned in the text, we define a tweet as dense of topics if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe the tweet sentiment that might be present (positive or negative) or not (neutral).

Context features: these features are not drawn form the content of the tweet itself and might give a larger picture of the context in which the tweet was sent. Namely, we take into consideration the part of the day in which the tweet was sent (morning, afternoon, evening and night respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm and from 11pm to 4:59am), and a boolean feature indicating whether the tweet is a retweet or not.

User features: these features are proper of the user that sent the tweet, and are the same for all the tweets of this user. Namely we consider the name of the museum and the number of followers of the user.
f
S3 File -
plos.figshare.com
txt
Updated Sep 28, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Patrick Bernard Washington; Pradeep Gali; Furqan Rustam; Imran Ashraf (2023). S3 File - [Dataset]. http://doi.org/10.1371/journal.pone.0286541.s003
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0286541.s003
Dataset updated
Sep 28, 2023
Dataset provided by
PLOS ONE
Authors
Patrick Bernard Washington; Pradeep Gali; Furqan Rustam; Imran Ashraf
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COVID-19 affected the world’s economy severely and increased the inflation rate in both developed and developing countries. COVID-19 also affected the financial markets and crypto markets significantly, however, some crypto markets flourished and touched their peak during the pandemic era. This study performs an analysis of the impact of COVID-19 on public opinion and sentiments regarding the financial markets and crypto markets. It conducts sentiment analysis on tweets related to financial markets and crypto markets posted during COVID-19 peak days. Using sentiment analysis, it investigates the people’s sentiments regarding investment in these markets during COVID-19. In addition, damage analysis in terms of market value is also carried out along with the worse time for financial and crypto markets. For analysis, the data is extracted from Twitter using the SNSscraper library. This study proposes a hybrid model called CNN-LSTM (convolutional neural network-long short-term memory model) for sentiment classification. CNN-LSTM outperforms with 0.89, and 0.92 F1 Scores for crypto and financial markets, respectively. Moreover, topic extraction from the tweets is also performed along with the sentiments related to each topic.
A Twitter Dataset for Spatial Infectious Disease Surveillance
zenodo.org
csv, txt, zip
Updated Jan 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roberto C.S.N.P. Souza; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos; Roberto C.S.N.P. Souza; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos (2021). A Twitter Dataset for Spatial Infectious Disease Surveillance [Dataset]. http://doi.org/10.5281/zenodo.2541440
Explore at:
csv, txt, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.2541440
Dataset updated
Jan 6, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Roberto C.S.N.P. Souza; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos; Roberto C.S.N.P. Souza; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dengue is a mosquito-borne viral disease which infects millions of people every year, specially in developing countries. Some of the main challenges facing the disease are reporting risk indicators and rapidly detecting outbreaks. Traditional surveillance systems rely on passive reporting from health-care facilities, often ignoring human mobility and locating each individual by their home address. Yet, geolocated data are becoming commonplace in social media, which is widely used as means to discuss a large variety of health topics, including the users' health status. In this dataset paper, we make available two large collections of dengue related labeled Twitter data. One is a set of tweets available through the Streaming API using the keywords dengue and aedes from 2010 to 2016. The other is the set of all geolocated tweets in Brazil during the year of 2015 (available also through the Streaming API). We detail the process of collecting and labeling each tweet containing keywords related to dengue in one of 5 categories: personal experience, information, opinion, campaign, and joke. This dataset can be useful for the development of models for spatial disease surveillance, but also scenarios such as understanding health-related content in a language other than English, and studying human mobility.
Crime Tweets
kaggle.com
Updated Apr 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samantha Kumara (2023). Crime Tweets [Dataset]. https://www.kaggle.com/datasets/samanthakumara/crime-dataset-twitter
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 27, 2023
Dataset provided by
Kaggle
Authors
Samantha Kumara
Description
More Details:

C. Sandagiri, B. T. Kumara, and B. Kuhaneswaran, “Deep neural network-based crime prediction using Twitter data,” International Journal of Systems and Service-Oriented Engineering, vol. 11, no. 1, pp. 15–30, 2021.

S. P. C. W. Sandagiri, B. T. G. S. Kumara and B. Kuhaneswaran, "ANN Based Crime Detection and Prediction using Twitter Posts and Weather Data," 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), Sakheer, Bahrain, 2020, pp. 1-5, doi: 10.1109/ICDABI51230.2020.9325660.
T
sentiment140
tensorflow.org
Updated Dec 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). sentiment140 [Dataset]. https://www.tensorflow.org/datasets/catalog/sentiment140
Explore at:
Dataset updated
Dec 23, 2022
Description
Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

The data is a CSV with emoticons removed. Data file format has 6 fields:

the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)

the id of the tweet (2087)

the date of the tweet (Sat May 16 23:58:44 UTC 2009)

the query (lyx). If there is no query, then this value is NO_QUERY.

the user that tweeted (robotickilldozr)

the text of the tweet (Lyx is cool)

For more information, refer to the paper Twitter Sentiment Classification with Distant Supervision at https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('sentiment140', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
Z
Hate Speech and Bias against Asians, Blacks, Jews, Latines, and Muslims: A...
data.niaid.nih.gov
Updated Oct 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Soemer, Katharina (2023). Hate Speech and Bias against Asians, Blacks, Jews, Latines, and Muslims: A Dataset for Machine Learning and Text Analytics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8147307
Explore at:
Dataset updated
Oct 26, 2023
Dataset provided by
Soemer, Katharina
Jikeli, Gunther
Karali, Sameer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University Dataset on bias against Asians, Blacks, Jews, Latines, and Muslims

The ISCA project compiled this dataset using an annotation portal, which was used to label tweets as either biased or non-biased, among other labels. Note that the annotation was done on live data, including images and context, such as threads. The original data comes from annotationportal.com. They include representative samples of live tweets from the years 2020 and 2021 with the keywords "Asians, Blacks, Jews, Latinos, and Muslims". A random sample of 600 tweets per year was drawn for each of the keywords. This includes retweets. Due to a sampling error, the sample for the year 2021 for the keyword "Jews" has only 453 tweets from 2021 and 147 from the first eight months of 2022 and it includes some tweets from the query with the keyword "Israel." The tweets were divided into six samples of 100 tweets, which were then annotated by three to seven students in the class "Researching White Supremacism and Antisemitism on Social Media" taught by Gunther Jikeli, Elisha S. Breton, and Seth Moller at Indiana University in the fall of 2022, see this report. Annotators used a scale from 1 to 5 (confident not biased, probably not biased, don't know, probably biased, confident biased). The definitions of bias against each minority group used for annotation are also included in the report. If a tweet called out or denounced bias against the minority in question, it was labeled as "calling out bias." The labels of whether a tweet is biased or calls out bias are based on a 75% majority vote. We considered "probably biased" and "confident biased" as biased and "confident not biased," "probably not biased," and "don't know" as not biased.
The types of stereotypes vary widely across the different categories of prejudice. While about a third of all biased tweets were classified as "hate" against the minority, the stereotypes in the tweets often matched common stereotypes about the minority. Asians were blamed for the Covid pandemic. Blacks were seen as inferior and associated with crime. Jews were seen as powerful and held collectively responsible for the actions of the State of Israel. Some tweets denied the Holocaust. Hispanics/Latines were portrayed as being in the country illegally and as "invaders," in addition to stereotypical accusations of being lazy, stupid, or having too many children. Muslims, on the other hand, were often collectively blamed for terrorism and violence, though often in conversations about Muslims in India.

Content:

This dataset contains 5880 tweets that cover a wide range of topics common in conversations about Asians, Blacks, Jews, Latines, and Muslims. 357 tweets (6.1 %) are labeled as biased and 5523 (93.9 %) are labeled as not biased. 1365 tweets (23.2 %) are labeled as calling out or denouncing bias. 1180 out of 5880 tweets (20.1 %) contain the keyword "Asians," 590 were posted in 2020 and 590 in 2021. 39 tweets (3.3 %) are biased against Asian people. 370 tweets (31,4 %) call out bias against Asians. 1160 out of 5880 tweets (19.7%) contain the keyword "Blacks," 578 were posted in 2020 and 582 in 2021. 101 tweets (8.7 %) are biased against Black people. 334 tweets (28.8 %) call out bias against Blacks. 1189 out of 5880 tweets (20.2 %) contain the keyword "Jews," 592 were posted in 2020, 451 in 2021, and ––as mentioned above––146 tweets from 2022. 83 tweets (7 %) are biased against Jewish people. 220 tweets (18.5 %) call out bias against Jews. 1169 out of 5880 tweets (19.9 %) contain the keyword "Latinos," 584 were posted in 2020 and 585 in 2021. 29 tweets (2.5 %) are biased against Latines. 181 tweets (15.5 %) call out bias against Latines. 1182 out of 5880 tweets (20.1 %) contain the keyword "Muslims," 593 were posted in 2020 and 589 in 2021. 105 tweets (8.9 %) are biased against Muslims. 260 tweets (22 %) call out bias against Muslims.

File Description:

The dataset is provided in a csv file format, with each row representing a single message, including replies, quotes, and retweets. The file contains the following columns:
'TweetID': Represents the tweet ID.
'Username': Represents the username who published the tweet (if it is a retweet, it will be the user who retweetet the original tweet.
'Text': Represents the full text of the tweet (not pre-processed). 'CreateDate': Represents the date the tweet was created.
'Biased': Represents the labeled by our annotators if the tweet is biased (1) or not (0). 'Calling_Out': Represents the label by our annotators if the tweet is calling out bias against minority groups (1) or not (0). 'Keyword': Represents the keyword that was used in the query. The keyword can be in the text, including mentioned names, or the username.

Licences

Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0)

Acknowledgements

We are grateful for the technical collaboration with Indiana University's Observatory on Social Media (OSoMe). We thank all class participants for the annotations and contributions, including Kate Baba, Eleni Ballis, Garrett Banuelos, Savannah Benjamin, Luke Bianco, Zoe Bogan, Elisha S. Breton, Aidan Calderaro, Anaye Caldron, Olivia Cozzi, Daj Crisler, Jenna Eidson, Ella Fanning, Victoria Ford, Jess Gruettner, Ronan Hancock, Isabel Hawes, Brennan Hensler, Kyra Horton, Maxwell Idczak, Sanjana Iyer, Jacob Joffe, Katie Johnson, Allison Jones, Kassidy Keltner, Sophia Knoll, Jillian Kolesky, Emily Lowrey, Rachael Morara, Benjamin Nadolne, Rachel Neglia, Seungmin Oh, Kirsten Pecsenye, Sophia Perkovich, Joey Philpott, Katelin Ray, Kaleb Samuels, Chloe Sherman, Rachel Weber, Molly Winkeljohn, Ally Wolfgang, Rowan Wolke, Michael Wong, Jane Woods, Kaleb Woodworth, and Aurora Young. This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/

Why Do People Use Twitter?

Explore at:

21 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Apr 1, 2025

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.

Clear search

Close search

Google apps

Main menu

Why Do People Use Twitter?

X/Twitter: Countries with the largest audience 2025

X/Twitter users in the United Kingdom 2019-2028

Twitter Key Statistics

Data from: TWIGMA: A dataset of AI-Generated Images with Metadata From...

Twitter users in the United States 2019-2028

Twitter Users Broken Down By Age

Twitter Users Broken down By Country

Data from: Twitter Big Data as A Resource For Exoskeleton Research: A...

X/Twitter: number of worldwide users 2019-2024

Social media profile growth, engagement rate, and reach

Twitter bot profiling

Twitter Users Broken Down By Gender

Twitter users in Brazil 2019-2028

Dataset for the Article "A Predictive Method to Improve the Effectiveness of...

S3 File -

A Twitter Dataset for Spatial Infectious Disease Surveillance

Crime Tweets

sentiment140

Hate Speech and Bias against Asians, Blacks, Jews, Latines, and Muslims: A...

Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University Dataset on bias against Asians, Blacks, Jews, Latines, and Muslims

Content:

File Description:

Licences

Acknowledgements

Why Do People Use Twitter?