100+ datasets found

s
Twitter Revenue Growth
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Revenue Growth [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Advertising makes up 89% of its total revenue and data licensing makes up about 11%.
Twitter Friends
kaggle.com
Updated Sep 2, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hubert Wassner (2016). Twitter Friends [Dataset]. https://www.kaggle.com/datasets/hwassner/TwitterFriends/discussion?sortBy=recent
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 2, 2016
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Hubert Wassner
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Twitter Friends and hashtags

Context

This datasets is an extract of a wider database aimed at collecting Twitter user's friends (other accound one follows). The global goal is to study user's interest thru who they follow and connection to the hashtag they've used.

Content

It's a list of Twitter user's informations. In the JSON format one twitter user is stored in one object of this more that 40.000 objects list. Each object holds :

avatar : URL to the profile picture

followerCount : the number of followers of this user

friendsCount : the number of people following this user.

friendName : stores the @name (without the '@') of the user (beware this name can be changed by the user)

id : user ID, this number can not change (you can retrieve screen name with this service : https://tweeterid.com/)

friends : the list of IDs the user follows (data stored is IDs of users followed by this user)

lang : the language declared by the user (in this dataset there is only "en" (english))

lastSeen : the time stamp of the date when this user have post his last tweet.

tags : the hashtags (whith or without #) used by the user. It's the "trending topic" the user tweeted about.

tweetID : Id of the last tweet posted by this user.

You also have the CSV format which uses the same naming convention.

These users are selected because they tweeted on Twitter trending topics, I've selected users that have at least 100 followers and following at least 100 other account (in order to filter out spam and non-informative/empty accounts).

Acknowledgements

This data set is build by Hubert Wassner (me) using the Twitter public API. More data can be obtained on request (hubert.wassner AT gmail.com), at this time I've collected over 5 milions in different languages. Some more information can be found here (in french only) : http://wassner.blogspot.fr/2016/06/recuperer-des-profils-twitter-par.html

Past Research

No public research have been done (until now) on this dataset. I made a private application which is described here : http://wassner.blogspot.fr/2016/09/twitter-profiling.html (in French) which uses the full dataset (Millions of full profiles).

Inspiration

On can analyse a lot of stuff with this datasets :

stats about followers & followings

manyfold learning or unsupervised learning from friend list

hashtag prediction from friend list

Contact

Feel free to ask any question (or help request) via Twitter : @hwassner

Enjoy! ;)
X/Twitter: Countries with the largest audience 2025
statista.com
Updated Jun 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). X/Twitter: Countries with the largest audience 2025 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
Explore at:
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Feb 2025
Area covered
Worldwide
Description
Social network X/Twitter is particularly popular in the United States, and as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and the India were ranked second and third with more than 70 million and 25 million users respectively. Global Twitter usage As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama. X/Twitter and politics X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
Sentiment Analysis on Financial Tweets
kaggle.com
zip
Updated Sep 5, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
Explore at:
zip(2538259 bytes)Available download formats
Dataset updated
Sep 5, 2019
Authors
Vivek Rathi
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset are created by me.

Anything you read in the description is written by David Wallach and using all this information, I happen to perform my first ever sentiment analysis.

"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries.

I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."

Content

This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'

Acknowledgements

The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot

Inspiration

I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
s
Twitter Users Broken down By Country
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Users Broken down By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.
d
Data from: Twitter Big Data as A Resource For Exoskeleton Research: A...
search.dataone.org
Updated Nov 8, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thakur, Nirmalya (2023). Twitter Big Data as A Resource For Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [Dataset]. http://doi.org/10.7910/DVN/VPPTRF
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/VPPTRF
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Thakur, Nirmalya
Description
Please cite the following paper when using this dataset: N. Thakur, “Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions,” Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1 Abstract The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and use cases in assisted living, military, healthcare, firefighting, and industries. With the projected increase in the diverse uses of exoskeletons in the next few years in these application domains and beyond, it is crucial to study, interpret, and analyze user perspectives, public opinion, reviews, and feedback related to exoskeletons, for which a dataset is necessary. The Internet of Everything era of today's living, characterized by people spending more time on the Internet than ever before, holds the potential for developing such a dataset by mining relevant web behavior data from social media communications, which have increased exponentially in the last few years. Twitter, one such social media platform, is highly popular amongst all age groups, who communicate on diverse topics including but not limited to news, current events, politics, emerging technologies, family, relationships, and career opportunities, via tweets, while sharing their views, opinions, perspectives, and feedback towards the same. Therefore, this work presents a dataset of about 140,000 Tweets related to exoskeletons. that were mined for a period of 5-years from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. Instructions: This dataset contains about 140,000 Tweets related to exoskeletons. that were mined for a period of 5-years from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons. The dataset contains only tweet identifiers (Tweet IDs) due to the terms and conditions of Twitter to re-distribute Twitter data only for research purposes. They need to be hydrated to be used. The process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time, etc.) using its ID is known as the hydration of a tweet ID. The Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) or any similar application may be used for hydrating this dataset. Data Description This dataset consists of 7 .txt files. The following shows the number of Tweet IDs and the date range (of the associated tweets) in each of these files. Filename: Exoskeleton_TweetIDs_Set1.txt (Number of Tweet IDs – 22945, Date Range of Tweets - July 20, 2021 – May 21, 2022) Filename: Exoskeleton_TweetIDs_Set2.txt (Number of Tweet IDs – 19416, Date Range of Tweets - Dec 1, 2020 – July 19, 2021) Filename: Exoskeleton_TweetIDs_Set3.txt (Number of Tweet IDs – 16673, Date Range of Tweets - April 29, 2020 - Nov 30, 2020) Filename: Exoskeleton_TweetIDs_Set4.txt (Number of Tweet IDs – 16208, Date Range of Tweets - Oct 5, 2019 - Apr 28, 2020) Filename: Exoskeleton_TweetIDs_Set5.txt (Number of Tweet IDs – 17983, Date Range of Tweets - Feb 13, 2019 - Oct 4, 2019) Filename: Exoskeleton_TweetIDs_Set6.txt (Number of Tweet IDs – 34009, Date Range of Tweets - Nov 9, 2017 - Feb 12, 2019) Filename: Exoskeleton_TweetIDs_Set7.txt (Number of Tweet IDs – 11351, Date Range of Tweets - May 21, 2017 - Nov 8, 2017) Here, the last date for May is May 21 as it was the most recent date at the time of data collection. The dataset would be updated soon to incorporate more recent tweets.
E
Slovenian Twitter hate speech dataset IMSyPP-sl
live.european-language-grid.eu
binary format
Updated Feb 16, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). Slovenian Twitter hate speech dataset IMSyPP-sl [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/8365
Explore at:
binary formatAvailable download formats
Dataset updated
Feb 16, 2021
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
A hand-labeled training (50,000 tweets labeled twice) and evaluation set (10,000 tweets labeled twice) for hate speech on Slovenian Twitter. The data files contain tweet IDs, hate speech type, hate speech target, and annotator ID. For obtaining the full text of the dataset, please contact the first author.

Hate speech type:
1. Appropriate - has no target
2. Inappropriate (contains terms that are obscene, vulgar; but the text is not directed at any person specifically) - has no target
3. Offensive (including offensive generalization, contempt, dehumanization, indirect offensive remarks)
4. Violent (author threatens, indulges, desires, or calls for physical violence against a target; it also includes calling for, denying, or glorifying war crimes and crimes against humanity)

Hate speech target:
1. Racism (intolerance based on nationality, ethnicity, language, towards foreigners; and based on race, skin color)
2. Migrants (intolerance of refugees or migrants, offensive generalization, call for their exclusion, restriction of rights, non-acceptance, denial of assistance…)
3. Islamophobia (intolerance towards Muslims)
4. Antisemitism (intolerance of Jews; also includes conspiracy theories, Holocaust denial or glorification, offensive stereotypes…)
5. Religion (other than above)
6. Homophobia (intolerance based on sexual orientation and / or identity, calls for restrictions on the rights of LGBTQ persons
7. Sexism (offensive gender-based generalization, misogynistic insults, unjustified gender discrimination)
8. Ideology (intolerance based on political affiliation, political belief, ideology… e.g. “communists”, “leftists”, “home defenders”, “socialists”, “activists for…”)
9. Media (journalists and media, also includes allegations of unprofessional reporting, false news, bias)
10. Politics (intolerance towards individual politicians, authorities, system, political parties)
11. Individual (intolerance toward any other individual due to individual characteristics; like commentator, neighbor, acquaintance )
12. Other (intolerance towards members of other groups due to belonging to this group; write in the blank column on the right which group it is)

Training dataset
The training set is sampled from data collected between December 2017 and February 2020. The sampling was intentionally biased to contain as much hate speech as possible. A simple model was used to flag potential hate speech content and additionally, filtering by users and by tweet length (number of characters) was applied. 50,000 tweets were selected for annotation.

Evaluation dataset
The evaluation set is sampled from data collected between February 2020 and August 2020. Contrary to the training set, the evaluation set is an unbiased random sample. Since the evaluation set is from a later period compared to the training set, the possibility of data linkage is minimized. Furthermore, the estimates of model performance made on the evaluation set are realistic, or even pessimistic, since the evaluation set is characterized by a new topic: Covid-19. 10,000 tweets were selected for the evaluation set.

Annotation results
Each tweet was annotated twice: In 90% of the cases by two different annotators and in 10% of the cases by the same annotator. Special attention was devoted to evening out the overlap between annotators to get agreement estimates on equally sized sets.
Ten annotators were engaged for our annotation campaign. They were given annotation guidelines, a training session, and a test on a small set to evaluate their understanding of the task and their commitment before starting the annotation procedure. Annotator agreement in terms of Krippendorff Alpha is around 0.6. Annotation agreement scores are detailed in the accompanying report files for each dataset separately.
The annotation process lasted four months, and it required about 1,200 person-hours for the ten annotators to complete the task.
s
Twitter bot profiling
researchdata.smu.edu.sg
smu.edu.sg
+1more
pdf
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Living Analytics Research Centre (2023). Twitter bot profiling [Dataset]. http://doi.org/10.25440/smu.12062706.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.25440/smu.12062706.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
Living Analytics Research Centre
License
http://rightsstatements.org/vocab/InC/1.0/http://rightsstatements.org/vocab/InC/1.0/
Description
This dataset comprises a set of Twitter accounts in Singapore that are used for social bot profiling research conducted by the Living Analytics Research Centre (LARC) at Singapore Management University (SMU). Here a bot is defined as a Twitter account that generates contents and/or interacts with other users automatically (at least according to human judgment). In this research, Twitter bots have been categorized into three major types:

Broadcast bot. This bot aims at disseminating information to general audience by providing, e.g., benign links to news, blogs or sites. Such bot is often managed by an organization or a group of people (e.g., bloggers). Consumption bot. The main purpose of this bot is to aggregate contents from various sources and/or provide update services (e.g., horoscope reading, weather update) for personal consumption or use. Spam bot. This type of bots posts malicious contents (e.g., to trick people by hijacking certain account or redirecting them to malicious sites), or promotes harmless but invalid/irrelevant contents aggressively.

This categorization is general enough to cater for new, emerging types of bot (e.g., chatbots can be viewed as a special type of broadcast bots). The dataset was collected from 1 January to 30 April 2014 via the Twitter REST and streaming APIs. Starting from popular seed users (i.e., users having many followers), their follow, retweet, and user mention links were crawled. The data collection proceeds by adding those followers/followees, retweet sources, and mentioned users who state Singapore in their profile location. Using this procedure, a total of 159,724 accounts have been collected. To identify bots, the first step is to check active accounts who tweeted at least 15 times within the month of April 2014. These accounts were then manually checked and labelled, of which 589 bots were found. As many more human users are expected in the Twitter population, the remaining accounts were randomly sampled and manually checked. With this, 1,024 human accounts were identified. In total, this results in 1,613 labelled accounts. Related Publication: R. J. Oentaryo, A. Murdopo, P. K. Prasetyo, and E.-P. Lim. (2016). On profiling bots in social media. Proceedings of the International Conference on Social Informatics (SocInfo’16), 92-109. Bellevue, WA. https://doi.org/10.1007/978-3-319-47880-7_6
A Twitter Dataset for Spatial Infectious Disease Surveillance
zenodo.org
data.niaid.nih.gov
csv, txt, zip
Updated Jan 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roberto C.S.N.P. Souza; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos; Roberto C.S.N.P. Souza; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos (2021). A Twitter Dataset for Spatial Infectious Disease Surveillance [Dataset]. http://doi.org/10.5281/zenodo.2541440
Explore at:
csv, txt, zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.2541440
Dataset updated
Jan 6, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Roberto C.S.N.P. Souza; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos; Roberto C.S.N.P. Souza; Wagner Meira Jr.; Renato M. Assuncao; Walter dos Santos
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dengue is a mosquito-borne viral disease which infects millions of people every year, specially in developing countries. Some of the main challenges facing the disease are reporting risk indicators and rapidly detecting outbreaks. Traditional surveillance systems rely on passive reporting from health-care facilities, often ignoring human mobility and locating each individual by their home address. Yet, geolocated data are becoming commonplace in social media, which is widely used as means to discuss a large variety of health topics, including the users' health status. In this dataset paper, we make available two large collections of dengue related labeled Twitter data. One is a set of tweets available through the Streaming API using the keywords dengue and aedes from 2010 to 2016. The other is the set of all geolocated tweets in Brazil during the year of 2015 (available also through the Streaming API). We detail the process of collecting and labeling each tweet containing keywords related to dengue in one of 5 categories: personal experience, information, opinion, campaign, and joke. This dataset can be useful for the development of models for spatial disease surveillance, but also scenarios such as understanding health-related content in a language other than English, and studying human mobility.
X/Twitter users in the United Kingdom 2019-2028
statista.com
Updated Jan 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2025). X/Twitter users in the United Kingdom 2019-2028 [Dataset]. https://www.statista.com/topics/11843/x-formerly-twitter-in-the-united-kingdom-uk/
Explore at:
Dataset updated
Jan 13, 2025
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Area covered
United Kingdom
Description
The number of Twitter users in the United Kingdom was forecast to continuously increase between 2024 and 2028 by in total 0.9 million users (+5.1 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 18.55 million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
S
Social media profile growth, engagement rate, and reach
data.sugarlandtx.gov
xlsx
Updated Jan 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Communications and Community Engagement (2024). Social media profile growth, engagement rate, and reach [Dataset]. https://data.sugarlandtx.gov/dataset/social-media-profile-growth-engagement-rate-and-reach
Explore at:
xlsxAvailable download formats
Dataset updated
Jan 3, 2024
Dataset authored and provided by
Communications and Community Engagement
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Profile growth - the growth on our social platforms to see where and when we're gaining followers. Engagement rate - a ratio of how many people interacted with ours posts based on when users are usually online. Reach - the number of feeds our posts appeared in (doesn't mean people interacted with the post).
Twitter users in the United States 2019-2028
statista.com
Updated Jul 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2025). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
Explore at:
Dataset updated
Jul 30, 2025
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Area covered
United States
Description
The number of Twitter users in the United States was forecast to continuously increase between 2024 and 2028 by in total 4.3 million users (+5.32 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 85.08 million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of Twitter users in countries like Canada and Mexico.
f
A Twitter Dataset on Tweets about People who Got Lost due to Dementia
figshare.com
application/gzip
Updated Jan 16, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kelvin KF Tsoi; Nicholas B Chan; Felix CH Chan; Lingling Zhang; Annisa CH Lee; Helen ML Meng (2018). A Twitter Dataset on Tweets about People who Got Lost due to Dementia [Dataset]. http://doi.org/10.6084/m9.figshare.5788125.v1
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5788125.v1
Dataset updated
Jan 16, 2018
Dataset provided by
figshare
Authors
Kelvin KF Tsoi; Nicholas B Chan; Felix CH Chan; Lingling Zhang; Annisa CH Lee; Helen ML Meng
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset used and analyzed in the paper "How can we Better Use Twitter to find a Person who Got Lost due to Dementia?".A total of five tables are included. 1. raw_tweets.rds: All tweets that mentioned (i) "Dementia" or "Alzheimer"; and (ii) "Lost" or "Missing", which were crawled from Twitter from April to May 2017. 2. raw_userinfo.rds: The corresponding Twitter user info of Tweets.3. filtered_tweets.csv: Tweets that were included in the study. Details (age, gender, place, etc.) of the corresponding lost person mentioned in each tweet were appended in this table. 4. filtered_userinfo.csv: The corresponding Twitter user info of Tweets that were included in the study. Occupation (police / media / others) of each user were appended in this table. 5. cleansed_lostcases.csv: A cleansed table that shows several features of the lost cases.
s
Twitter Key Statistics
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Key Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the key Twitter user statistics that you need to know.
s
Why Do People Use Twitter?
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.
The Twitter Parliamentarian Database
figshare.com
txt
Updated Oct 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Livia van Vliet (2023). The Twitter Parliamentarian Database [Dataset]. http://doi.org/10.6084/m9.figshare.10120685.v3
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.10120685.v3
Dataset updated
Oct 27, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Livia van Vliet
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is the Twitter Parliamentarian Database: a database consisting of parliamentarian names, parties and twitter ids from the following countries: Austria, Belgium, France, Denmark, Spain, Finland, Germany, Greece, Italy, Malta, Poland, Netherlands, United Kingdom, Ireland, Sweden, New Zealand, Turkey, United States, Canada, Australia, Iceland, Norway, Switzerland, Luxembourg, Latvia and Slovenia. In addition, the database includes the European Parliament.The tweet ids from the politicans' tweets have been collected from September 2017 - 31 October 2019 (all_tweet_ids.csv). In compliance with Twitter's policy, we only store tweet ids, which can be re-hydrated into full tweets using existing tools. More information on how to use the database can be found in the readme.txt.It is recommended that you use the .csv files to work with the data, rather than the SQL tables. Information on the relations in the SQL database can be found in the Database codebook.pdf.Update:The tweet ids for 2021 have been added as '2021.csv'Update #2:The tweet ids for 2020 have been added as '2020.csv'The last party table has been added as 'parties_2021_04_28.csv'The last members table has been added as 'members_2021_04_28.csv'
Twitter Dataset Based on Depressive Words
kaggle.com
zip
Updated May 20, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Saurabh Shahane (2021). Twitter Dataset Based on Depressive Words [Dataset]. https://www.kaggle.com/saurabhshahane/twitter-dataset-based-on-depressive-words
Explore at:
zip(119409375 bytes)Available download formats
Dataset updated
May 20, 2021
Authors
Saurabh Shahane
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

Right now we see that depression is one of the most common problems in our society. Most of the time people are committed suicide only cause of depression. And till now there is no proper lab test way for detecting depression. Generally, doctors are detecting depression by asking some knowledge-base questions. On the other hand, there are a good number of people using social media platforms right now, where they are sharing their daily experiences, emotion, and other activity with their friends. Twitter is one of the common social platforms and also popular for data collection. I was collecting these datasets from twitter based on some depressive words. I hope that this twitter datasets will help researchers to detect depression more precisely.

Content

Raw data from twitter

Acknowledgements

Chowdhury, Sawrav (2020), “Raw Twitter Datasets Based on Depressive Words”, Mendeley Data, V1, doi: 10.17632/4rd637tddf.1
d
Data from: Supersharers of fake news on Twitter
dataone.org
datadryad.org
Updated Jul 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sahar Baribi-Bartov; Briony Swire-Thompson; Nir Grinberg (2025). Supersharers of fake news on Twitter [Dataset]. http://doi.org/10.5061/dryad.44j0zpcmq
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.44j0zpcmq
Dataset updated
Jul 31, 2025
Dataset provided by
Dryad Digital Repository
Authors
Sahar Baribi-Bartov; Briony Swire-Thompson; Nir Grinberg
Time period covered
Jan 1, 2024
Description
Governments may have the capacity to flood social media with fake news, but little is known about the use of flooding by ordinary voters. In this work, we identify 2107 registered US voters that account for 80% of the fake news shared on Twitter during the 2020 US presidential election by an entire panel of 664,391 voters. We find that supersharers are important members of the network, reaching a sizable 5.2% of registered voters on the platform. Supersharers have a significant overrepresentation of women, older adults, and registered Republicans. Supersharers' massive volume does not seem automated but is rather generated through manual and persistent retweeting. These findings highlight a vulnerability of social media for democracy, where a small group of people distort the political reality for many., This dataset contains aggregated information necessary to replicate the results reported in our work on Supersharers of Fake News on Twitter while respecting and preserving the privacy expectations of individuals included in the analysis. No individual-level data is provided as part of this dataset.Â The data collection process that enabled the creation of this dataset leveraged a large-scale panel of registered U.S. voters matched to Twitter accounts. We examined the activity of 664,391 panel members who were active on Twitter during the months of the 2020 U.S. presidential election (August to November 2020, inclusive), and identified a subset of 2,107 supersharers, which are the most prolific sharers of fake news in the panel that together account for 80% of fake news content shared on the platform. We rely on a source-level definition of fake news, that uses the manually-labeled list of fake news sites by Grinberg et al. 2019 and an updated list based on NewsGuard ratings (commercial..., , # Supersharers of Fake News on Twitter

This repository contains data and code for replication of the results presented in the paper.

The folders are mostly organized by research questions as detailed below. Each folder contains the code and publicly available data necessary for the replication of results. Importantly, no individual-level data is provided as part of this repository. De-identified individual-level data can be attained for IRB-approved uses under the terms and conditions specified in the paper. Once access is granted, the restricted-access data is expected to be located under ./restricted_data.

The folders in this repository are the following:

Preprocessing

Code under the preprocessing folder contains the following:

source classifier - the code used to train a classifier based on NewsGuard domain flags to match the fake news labels source definition use in Grinberg et el. 2019 labels.

political classifier - the code used to identify political tweets, i...
f
A few samples from the dataset.
figshare.com
xls
Updated Sep 28, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Patrick Bernard Washington; Pradeep Gali; Furqan Rustam; Imran Ashraf (2023). A few samples from the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0286541.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0286541.t003
Dataset updated
Sep 28, 2023
Dataset provided by
PLOS ONE
Authors
Patrick Bernard Washington; Pradeep Gali; Furqan Rustam; Imran Ashraf
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COVID-19 affected the world’s economy severely and increased the inflation rate in both developed and developing countries. COVID-19 also affected the financial markets and crypto markets significantly, however, some crypto markets flourished and touched their peak during the pandemic era. This study performs an analysis of the impact of COVID-19 on public opinion and sentiments regarding the financial markets and crypto markets. It conducts sentiment analysis on tweets related to financial markets and crypto markets posted during COVID-19 peak days. Using sentiment analysis, it investigates the people’s sentiments regarding investment in these markets during COVID-19. In addition, damage analysis in terms of market value is also carried out along with the worse time for financial and crypto markets. For analysis, the data is extracted from Twitter using the SNSscraper library. This study proposes a hybrid model called CNN-LSTM (convolutional neural network-long short-term memory model) for sentiment classification. CNN-LSTM outperforms with 0.89, and 0.92 F1 Scores for crypto and financial markets, respectively. Moreover, topic extraction from the tweets is also performed along with the sentiments related to each topic.
Hate Speech and Bias against Asians, Blacks, Jews, Latines, and Muslims: A...
zenodo.org
csv
Updated Oct 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gunther Jikeli; Gunther Jikeli; Sameer Karali; Sameer Karali; Katharina Soemer; Katharina Soemer (2023). Hate Speech and Bias against Asians, Blacks, Jews, Latines, and Muslims: A Dataset for Machine Learning and Text Analytics [Dataset]. http://doi.org/10.5281/zenodo.8147308
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8147308
Dataset updated
Oct 26, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gunther Jikeli; Gunther Jikeli; Sameer Karali; Sameer Karali; Katharina Soemer; Katharina Soemer
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
### Institute for the Study of Contemporary Antisemitism (ISCA) at Indiana University Dataset on bias against Asians, Blacks, Jews, Latines, and Muslims

The ISCA project compiled this dataset using an annotation portal, which was used to label tweets as either biased or non-biased, among other labels. Note that the annotation was done on live data, including images and context, such as threads. The original data comes from annotationportal.com. They include representative samples of live tweets from the years 2020 and 2021 with the keywords "Asians, Blacks, Jews, Latinos, and Muslims".
A random sample of 600 tweets per year was drawn for each of the keywords. This includes retweets. Due to a sampling error, the sample for the year 2021 for the keyword "Jews" has only 453 tweets from 2021 and 147 from the first eight months of 2022 and it includes some tweets from the query with the keyword "Israel." The tweets were divided into six samples of 100 tweets, which were then annotated by three to seven students in the class "Researching White Supremacism and Antisemitism on Social Media" taught by Gunther Jikeli, Elisha S. Breton, and Seth Moller at Indiana University in the fall of 2022, see this report. Annotators used a scale from 1 to 5 (confident not biased, probably not biased, don't know, probably biased, confident biased). The definitions of bias against each minority group used for annotation are also included in the report.
If a tweet called out or denounced bias against the minority in question, it was labeled as "calling out bias."
The labels of whether a tweet is biased or calls out bias are based on a 75% majority vote. We considered "probably biased" and "confident biased" as biased and "confident not biased," "probably not biased," and "don't know" as not biased.
The types of stereotypes vary widely across the different categories of prejudice. While about a third of all biased tweets were classified as "hate" against the minority, the stereotypes in the tweets often matched common stereotypes about the minority. Asians were blamed for the Covid pandemic. Blacks were seen as inferior and associated with crime. Jews were seen as powerful and held collectively responsible for the actions of the State of Israel. Some tweets denied the Holocaust. Hispanics/Latines were portrayed as being in the country illegally and as "invaders," in addition to stereotypical accusations of being lazy, stupid, or having too many children. Muslims, on the other hand, were often collectively blamed for terrorism and violence, though often in conversations about Muslims in India.

# Content:

This dataset contains 5880 tweets that cover a wide range of topics common in conversations about Asians, Blacks, Jews, Latines, and Muslims. 357 tweets (6.1 %) are labeled as biased and 5523 (93.9 %) are labeled as not biased. 1365 tweets (23.2 %) are labeled as calling out or denouncing bias.
1180 out of 5880 tweets (20.1 %) contain the keyword "Asians," 590 were posted in 2020 and 590 in 2021. 39 tweets (3.3 %) are biased against Asian people. 370 tweets (31,4 %) call out bias against Asians.
1160 out of 5880 tweets (19.7%) contain the keyword "Blacks," 578 were posted in 2020 and 582 in 2021. 101 tweets (8.7 %) are biased against Black people. 334 tweets (28.8 %) call out bias against Blacks.
1189 out of 5880 tweets (20.2 %) contain the keyword "Jews," 592 were posted in 2020, 451 in 2021, and ––as mentioned above––146 tweets from 2022. 83 tweets (7 %) are biased against Jewish people. 220 tweets (18.5 %) call out bias against Jews.
1169 out of 5880 tweets (19.9 %) contain the keyword "Latinos," 584 were posted in 2020 and 585 in 2021. 29 tweets (2.5 %) are biased against Latines. 181 tweets (15.5 %) call out bias against Latines.
1182 out of 5880 tweets (20.1 %) contain the keyword "Muslims," 593 were posted in 2020 and 589 in 2021. 105 tweets (8.9 %) are biased against Muslims. 260 tweets (22 %) call out bias against Muslims.

# File Description:
The dataset is provided in a csv file format, with each row representing a single message, including replies, quotes, and retweets. The file contains the following columns:

'TweetID': Represents the tweet ID.
'Username': Represents the username who published the tweet (if it is a retweet, it will be the user who retweetet the original tweet.
'Text': Represents the full text of the tweet (not pre-processed).
'CreateDate': Represents the date the tweet was created.
'Biased': Represents the labeled by our annotators if the tweet is biased (1) or not (0).
'Calling_Out': Represents the label by our annotators if the tweet is calling out bias against minority groups (1) or not (0).
'Keyword': Represents the keyword that was used in the query. The keyword can be in the text, including mentioned names, or the username.

# Licences
Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0)
# Acknowledgements
We are grateful for the technical collaboration with Indiana University's Observatory on Social Media (OSoMe). We thank all class participants for the annotations and contributions, including Kate Baba, Eleni Ballis, Garrett Banuelos, Savannah Benjamin, Luke Bianco, Zoe Bogan, Elisha S. Breton, Aidan Calderaro, Anaye Caldron, Olivia Cozzi, Daj Crisler, Jenna Eidson, Ella Fanning, Victoria Ford, Jess Gruettner, Ronan Hancock, Isabel Hawes, Brennan Hensler, Kyra Horton, Maxwell Idczak, Sanjana Iyer, Jacob Joffe, Katie Johnson, Allison Jones, Kassidy Keltner, Sophia Knoll, Jillian Kolesky, Emily Lowrey, Rachael Morara, Benjamin Nadolne, Rachel Neglia, Seungmin Oh, Kirsten Pecsenye, Sophia Perkovich, Joey Philpott, Katelin Ray, Kaleb Samuels, Chloe Sherman, Rachel Weber, Molly Winkeljohn, Ally Wolfgang, Rowan Wolke, Michael Wong, Jane Woods, Kaleb Woodworth, and Aurora Young.
This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2025). Twitter Revenue Growth [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/

Twitter Revenue Growth

Explore at:

15 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Apr 1, 2025

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Advertising makes up 89% of its total revenue and data licensing makes up about 11%.

Clear search

Close search

Google apps

Main menu

Twitter Revenue Growth

Twitter Friends

Twitter Friends and hashtags

Context

Content

Acknowledgements

Past Research

Inspiration

Contact

X/Twitter: Countries with the largest audience 2025

Sentiment Analysis on Financial Tweets

Context

Content

Acknowledgements

Inspiration

Twitter Users Broken down By Country

Data from: Twitter Big Data as A Resource For Exoskeleton Research: A...

Slovenian Twitter hate speech dataset IMSyPP-sl

Twitter bot profiling

A Twitter Dataset for Spatial Infectious Disease Surveillance

X/Twitter users in the United Kingdom 2019-2028

Social media profile growth, engagement rate, and reach

Twitter users in the United States 2019-2028

A Twitter Dataset on Tweets about People who Got Lost due to Dementia

Twitter Key Statistics

Why Do People Use Twitter?

The Twitter Parliamentarian Database

Twitter Dataset Based on Depressive Words

Context

Content

Acknowledgements

Data from: Supersharers of fake news on Twitter

Preprocessing

A few samples from the dataset.

Hate Speech and Bias against Asians, Blacks, Jews, Latines, and Muslims: A...

Twitter Revenue Growth