100+ datasets found

Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July...
zenodo.org
live.european-language-grid.eu
+2more
bin, tsv, txt, zip
Updated May 20, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Gayo-Avello; Daniel Gayo-Avello (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. http://doi.org/10.5281/zenodo.3833782
Explore at:
bin, zip, txt, tsvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3833782
Dataset updated
May 20, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Daniel Gayo-Avello; Daniel Gayo-Avello
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science in the University of Oviedo, for the sole purpose of non-commercial research and it just includes tweet ids.

The dataset contains tweet IDs for all the published tweets (in any language) bettween March 21, 2006 and July 31, 2009 thus comprising the first whole three years of Twitter from its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

It covers several defining issues in Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, the first Obama’s inauguration speech or the 2009 Iran Election protests (one of the so-called Twitter Revolutions).

Finally, it does contain tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French) so it should be possible–at least in theory–to analyze international events from different cultural perspectives.

The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset and also that a substantial part of tweets in the dataset has been deleted (or locked) since 2016.

To make easier to understand the decay of tweet IDs in the dataset a number of representative samples (99% confidence level and 0.5 confidence interval) are provided.

In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the amount of tweets vary greatly throughout the period of three years covered in the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).

June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).

September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).

December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).

March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).

June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).

September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).

December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).

March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).

June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).

September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).

December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).

March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).

June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).

The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

At the moment of cleaning all the data to publish this dataset there seemed to be a gap between April 1, 2008 to July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m) I simply produced all those that were created between those two dates. All those tweets actually existed but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

In other words, what you see in that period (April to July, 2008) is not actually a huge number of tweets having been deleted but the combination of deleted *and* non-public tweets (whose IDs should not be in the dataset for performance purposes when rehydrating the dataset).

Additionally, given that not everybody will need the whole period of time the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.

For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

If you need to contact me you can find me as @PFCdgayo in Twitter.
Data from: Annotated Dataset of History-related Tweets
zenodo.org
csv
Updated Sep 19, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yasunobu Sumikawa; Adam Jatowt; Yasunobu Sumikawa; Adam Jatowt (2021). Annotated Dataset of History-related Tweets [Dataset]. http://doi.org/10.5281/zenodo.4657223
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4657223
Dataset updated
Sep 19, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Yasunobu Sumikawa; Adam Jatowt; Yasunobu Sumikawa; Adam Jatowt
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains tweet IDs and their 5 types of contextual information including 1) hashtags, 2) their categories, 3) entities obtained by NERD, 4) time-references normalized by Heideltime, and 5) Web categories for URLs attached with history-related hashtag that are related to history and that were collected for the purpose of analyzing how history-related content is disseminated in online social networks. Our IJDL paper shows the analysis results. The preliminary version of the analysis report is available here.

We used the Twitter official search API provided by Twitter to collect tweets. Note that three kinds of tweets are typically found in Twitter: tweets, retweets and quote tweets. Tweet is an original text issued as a post by a Twitter user. A retweet is a copy of an original tweet for the purpose of propagating the tweet content to more users (i.e., one's followers). Finally, a quote tweet copies the content of another tweet and allows also to add new content. A quote tweet is sometimes called a retweet with a comment. In this work, we simply treat all quote tweets as original tweets since they include additional information/text. There were however only 1,877 (0.2%) tweets recognized as quote tweets in our dataset.

To collect tweets that refer to the past or are related to collective memory of past events/entities, we performed hashtag based crawling together with bootstrapping procedure.
At the beginning, we gathered several historical hashtags selected by experts (e.g. #HistoryTeacher, #history, #WmnHist).
In addition, we prepared several hashtags that are commonly used when referring to the past: #onthisday, #thisdayinhistory, #throwbackthursday, #otd. We then collected tweets that contain these hashtags by using Twitter official search API.

The collected tweets were issued from 8 March 2016 to 2 July 2018.
Bootstrapping allowed us to search for other hashtags frequently used with the seed hashtags. The tweets tagged by such hashtags were then included into the seed set after the manual inspection of all the discovered hashtags as of their relation to the history, and filtering ones that are unrelated.
In total, we gathered 147 history-related hashtags which allowed us to collect 2,370,252 tweet IDs pointing to 882,977 tweets and 1,487,275 re-tweets.

Related papers:

Yasunobu Sumikawa, Adam Jatowt, and Marten During, "Digital History meets Microblogging: Analyzing Collective Memories in Twitter", In Proceedings of the 18th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL'18, IEEE/ACM, pp. 213 -- 222, 2018. [paper]

Yasunobu Sumikawa and Adam Jatowt, "Analyzing History-related Posts in Twitter", International Journal on Digital Libraries, Springer, 2020. https://doi.org/10.1007/s00799-020-00296-2 [paper][dataset]

Yasunobu Sumikawa and Adam Jatowt, "Annotated Dataset of History-related Tweets", Data in Brief, Vol. 38, pp. 107344, Elsevier, 2021. [paper]
d
American Historical Association 2017 Conference Tweets
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Milligan, Ian (2023). American Historical Association 2017 Conference Tweets [Dataset]. http://doi.org/10.5683/SP/CFVF1F
Explore at:
Unique identifier
https://doi.org/10.5683/SP/CFVF1F
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Milligan, Ian
Description
A list of 10,538 Twitter IDs for tweets harvested between 4 January at 11am and 9 January at 11am using Social Feed Manager. As this used the search API, the 4 January at 11am crawl went back about 5-9 days. Tweet IDs included, as is a log of the decisions made to curate this dataset.
H
Replication Data for: This Was Twitter: Introducing the Twitter History and...
datasetcatalog.nlm.nih.gov
Updated May 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
STEINERT-THRELKELD, ZACHARY (2025). Replication Data for: This Was Twitter: Introducing the Twitter History and Image Sharing v1.0 Datasets [Dataset]. http://doi.org/10.7910/DVN/14WGUG
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/14WGUG
Dataset updated
May 1, 2025
Authors
STEINERT-THRELKELD, ZACHARY
Description
Paper DOI: 10.51685/jqd.2025.011 Paper abstract: This paper introduces the Twitter History and Image Sharing (THIS) datasets. These four related datasets enable the study of Twitter \emph{without the release of tweets or user information}. Both are derived from a corpus of 14.596 billion geolocated tweets streamed from September 1, 2013 through March 14, 2023. Two Twitter History datasets provide data on the number of tweets, tweets by language, and user data by country from September 1, 2013 through March 14, 2023. A third Twitter History dataset provides data on the number of new user registrations by country from March 21, 2006, the start of Twitter, through March 14, 2023. Image Sharing is based on the 1.676 billion images shared during this period and the 956.049 million still available for download in early 2024. It provides data on the number of images shared and still available from September 1, 2013 through March 14, 2023. The THIS datasets enable the study of Twitter itself and its differential use across countries, including in response to specific events, and the paper demonstrates applications to correlates of image sharing and removal, behavior around national executive elections, event detection, and digital repression. While this paper is not the first to study Twitter, it is, as far as we are aware, the first to provide datasets enabling other researchers to do the same.

Twitter Sentiment Analysis Datasets

brightdata.com

.json, .csv, .xlsx

Updated Jul 4, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Bright Data (2024). Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis

Explore at:

.json, .csv, .xlsxAvailable download formats

Dataset updated

Jul 4, 2024

Dataset authored and provided by

Bright Datahttps://brightdata.com/

License

https://brightdata.com/licensehttps://brightdata.com/license

Area covered

Worldwide

Description

Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

Key Features:

  Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
  Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
  Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
  Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
  Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
  Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.


Use Cases:

  Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
  Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
  Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
  AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
  Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.



  Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. 
  Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.

s
Twitter Users Broken down By Country
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Twitter Users Broken down By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.
X/Twitter: U.S. users on taking a break from the platform 2023, by gender
statista.com
Updated Oct 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2024). X/Twitter: U.S. users on taking a break from the platform 2023, by gender [Dataset]. https://www.statista.com/topics/737/twitter/
Explore at:
Dataset updated
Oct 16, 2024
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Description
According to a survey conducted in March 2023, 69 percent of female X/Twitter users had taken a break from the platform for several weeks or more in the past 12 months, compared to 54 percent of male users. Overall 46 percent of male users reported they had not taken a break from X/Twitter within the past year.
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Jun 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Explore at:
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Z
Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining...
data.niaid.nih.gov
zenodo.org
Updated Aug 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Misra Sanjay (2022). Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4748716
Explore at:
Dataset updated
Aug 18, 2022
Dataset provided by
Abayomi-Alli Adebayo
Fernandez-Sanz Luis
Abayomi-Alli Olusola
Misra Sanjay
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

Social media opinion has become a medium to quickly access large, valuable, and rich details of information on any subject matter within a short period. Twitter being a social microblog site, generate over 330 million tweets monthly across different countries. Analysing trending topics on Twitter presents opportunities to extract meaningful insight into different opinions on various issues.

Aim

This study aims to gain insights into the trending yahoo-yahoo topic on Twitter using content analysis of selected historical tweets.

Methodology

The widgets and workflow engine in the Orange Data mining toolbox were employed for all the text mining tasks. 5500 tweets were collected from Twitter using the “yahoo yahoo” hashtag. The corpus was pre-processed using a pre-trained tweet tokenizer, Valence Aware Dictionary for Sentiment Reasoning (VADER) was used for the sentiment and opinion mining, Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) was used for topic modelling. In contrast, Multidimensional scaling (MDS) was used to visualize the modelled topics.

Results

Results showed that "yahoo" appeared in the corpus 9555 times, 175 unique tweets were returned after duplicate removal. Contrary to expectation, Spain had the highest number of participants tweeting on the 'yahoo yahoo' topic within the period. The result of Vader sentiment analysis returned 35.85%, 24.53%, 15.09%, and 24.53%, negative, neutral, no-zone, and positive sentiment tweets, respectively. The word yahoo was highly representative of the LDA topics 1, 3, 4, 6, and LSI topic 1.

Conclusion

It can be concluded that emojis are even more representative of the sentiments in tweets faster than the textual contents. Also, despite popular belief, a significant number of youths regard cybercrime as a detriment to society.
Tweets containing emojis 2013-2023
statista.com
Updated Oct 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stacy Jo Dixon (2024). Tweets containing emojis 2013-2023 [Dataset]. https://www.statista.com/topics/737/twitter/
Explore at:
Dataset updated
Oct 16, 2024
Dataset provided by
Statistahttp://statista.com/
Authors
Stacy Jo Dixon
Description
The share of posts on microblogging platform Twitter that contain emojis has increased significantly over the past ten years. In July 2013, 4.25 percent of tweets contained at least one emoji. Just under one decade later, in March 2023, 26.7 percent of tweets contained an emoji. The most common reason for using emojis, according to users in the United States, was to make conversations more fun.
f
Gaining Historical and International Relations Insights from Social Media:...
figshare.com
txt
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl (2023). Gaining Historical and International Relations Insights from Social Media: Spatio-Temporal Real-World News Analysis using Twitter. [Dataset]. http://doi.org/10.6084/m9.figshare.5092678.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5092678.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of metadata related to 24,508 news events, collected from Twitter spanning from August 2013 to June 2015. The events encompasses a total of 193,445,734 tweets produced by 26,127,624 different users.The files contain different aspects of the data.- components.tsv consists of the description of the events (components) of our dataset, consisting of 4 columns separated by tabs. The columns correspond to the component ID, the date of an event, the amount of tweets and a set of keywords describing the event, separated by commas (having a minimum of 2).- componentlocation.tsv consists of the description of the locations where the events happened (“protagonist locations”). The columns correspond to an ID, the component ID, the names of the locations, the frequency (how many times that location was mentioned in the component), the country code, and six more non-relevant columns. Note that one component can be in several rows, one per location being mentioned for that component.- country_protagonized-events.csv consists of the amount of events that one specific country is a protagonist of. It contains two columns, separated by comma, being the first the country code and the second the amount of events (components) that country is a protagonist of.- country_tweets.csv consists of the amount of tweets that one specific country has issued along all the events. It contains two columns, separated by comma, being the first the country code and the second the amount of tweets that country has issued.- participation_data.txt contains a matrix indicating the amount of tweets per country, per event. It contains one row per component ID, and one column per country (plus one column for the component ID); the cell value is the amount of tweets that country has issued for that event.- similarities_no_reciproco_percentile.csv corresponds to the similarity between co-protagonist countries. The columns are in the following order: Country 1, the amount of events Country 1 is a protagonist of, Country 2, the amount of events Country 2 is a protagonist of, the Jaccard Similarity between the two countries (where the country is represented by the set of the component IDs that country is a protagonist of), and the percentile of that similarity value (ranging from 0 to 1).- users_events_distinct.txt corresponds to the amount of unique users participating in an event. The columns are separated by tabs. The first columns is the component ID, the second is the amount of different users for that event, and the third is the amount of of different news sources for that event.- countries.txt is the mapping between country code and country name, separated by space.
s
How Popular Is Twitter In The World?
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Twitter is ranked as the 12h most popular social media site in the world. The platform currently has 611 million active monthly users.
H
#metoo Digital Media Collection - Twitter Dataset
dataverse.harvard.edu
Updated Mar 29, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub (2023). #metoo Digital Media Collection - Twitter Dataset [Dataset]. http://doi.org/10.7910/DVN/2SRSKJ
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/2SRSKJ
Dataset updated
Mar 29, 2023
Dataset provided by
Harvard Dataverse
Authors
Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset represents version 2 of this dataset. The previous version was published on June 30, 2020.This dataset contains the tweet ids of 39,373,774 tweets, which are part of the Schlesinger Library #metoo Digital Media Collection. This second version of the dataset represents the full set of tweets collected throughout the project, tweets range from October 15, 2017 to December 31, 2022. The previous version of this dataset extended to March 31, 2020. Tweets between October 15, 2017 and December 10, 2018 were licensed from Twitter's Historical PowerTrack and received through GNIP. Tweets after December 10, 2018 were collected weekly from the Twitter API through Social Feed Manager using the POST statuses/filter method of the Twitter Stream API.The following list of 76 terms includes the hashtags used to collect data for this dataset : #metoo, #timesup, #metoostem, #sciencetoo, #metoophd, #shittymediamen, #churchtoo, #ustoo, #metooMVMT, #ARmetoo, #TimesUpAR, #metooSociology, #metooSexScience, #timesupAcademia, #metooMedicine, #MyCampusToo, #howiwillchange, #iwill, #believewomen, #GoTeal, #BelieveChristine, #IStandWithDrFord, #IStandWithChristineBlaseyFord, #believesurvivors, #whyididntreport, #himtoo, #istandwithbrett, #confirmkavanaguhnow, #metooMcdonalds, #metoomovement, #muteRKelly, #WeBelieveDrFord, #WeBelieveSurvivors, #HandsOffPantsOn, #MeAt14, #HeToo, #MeTooLiars, #metoolynchings, #metoohucksters, #metoohustle, #ItWasMe, #Ihave, #TimesUpTech, #GoogleWalkout, #mosquemetoo, #faithandmetoo, #SilenceIsNotSpiritual, #HealMeToo, #TimesUpHarvard, #NoCarveOut, #TimesUpx2, #MeetingsToo, #metoonatsec, #healmetoo, #GamAni, #ShulToo, #harvardhearsyou, #metooarcheology, #TimesUpPayUp, #metooarcheology, #metooHBCU, #TimesUpHC, #aidtoo, #garmentmetoo, #mutemetoo, #mutetimesup, #metoopolisci, #copstoo, #TimesUpBiden, #MeTooNoMatterWho, #IBelieveTara, #BelieveAllWomen, #metoomilitary, #harvard38, #comaroff, and #harvardletter.The final four hashtags in this list were first crawled on February 10, 2022.Because of the size of the files, the list of identifiers are split in 41 files containing up to 1,000,000 ids each.Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Therefore, this dataset only contains tweet ids. In order to retrieve tweets still available (not deleted by users) tools like Hydrator are available.Subsets of only the #metoo seed are also available by quarterly datasets.
w
twitter-followers.net - Historical whois Lookup
whoisdatacenter.com
csv
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AllHeart Web Inc, twitter-followers.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter-followers.net/
Explore at:
csvAvailable download formats
Dataset authored and provided by
AllHeart Web Inc
License
https://whoisdatacenter.com/terms-of-use/https://whoisdatacenter.com/terms-of-use/
Time period covered
Mar 15, 1985 - Oct 3, 2025
Description
Explore the historical Whois records related to twitter-followers.net (Domain). Get insights into ownership history and changes over time.
s
How Popular Is Twitter In The US?
searchlogistics.com
Updated Apr 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). How Popular Is Twitter In The US? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Explore at:
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The US has the largest number of Twitter users with over a 100 million users. They account for about 16.7% of all Twitter users worldwide.
Uruguayan Media Historical Tweets
kaggle.com
Updated Jan 31, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lea Dominguez (2022). Uruguayan Media Historical Tweets [Dataset]. https://www.kaggle.com/datasets/leadominguez/uruguayan-media-historical-tweets/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 31, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Lea Dominguez
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Uruguay
Description
Context

In the context of my thesis, we had to scrape Twitter's account of Uruguayan media to create a medias analysis platform and we dicided to publicate this dataset to help every person that may need it.

Content

The file contains tweets scrapped from Twitter from six different Uruguayan media (El País, Brecha, Búsqueda, El Observador, La República and La Diaria) since the creation of each account until approximately october 2021.
w
twitter.net.ag - Historical whois Lookup
whoisdatacenter.com
csv
Updated Aug 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AllHeart Web Inc (2024). twitter.net.ag - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter.net.ag/
Explore at:
csvAvailable download formats
Dataset updated
Aug 1, 2024
Dataset authored and provided by
AllHeart Web Inc
License
https://whoisdatacenter.com/terms-of-use/https://whoisdatacenter.com/terms-of-use/
Time period covered
Mar 15, 1985 - Sep 22, 2025
Description
Explore the historical Whois records related to twitter.net.ag (Domain). Get insights into ownership history and changes over time.
X/Twitter: personal privacy actions H1 2024
statista.com
Updated Oct 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista Research Department (2024). X/Twitter: personal privacy actions H1 2024 [Dataset]. https://www.statista.com/topics/737/twitter/
Explore at:
Dataset updated
Oct 16, 2024
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Description
During the first half of 2024 there were 34,497 pieces of content removed from X due to personal privacy violations, which include the publishing or sharing of other people's private information. These types of violations are also known as doxxing. Overall, 30,450 of these pieces of content were reported manually by users of the platform.
w
twitter-design.com - Historical whois Lookup
whoisdatacenter.com
csv
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AllHeart Web Inc, twitter-design.com - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter-design.com/
Explore at:
csvAvailable download formats
Dataset authored and provided by
AllHeart Web Inc
License
https://whoisdatacenter.com/terms-of-use/https://whoisdatacenter.com/terms-of-use/
Time period covered
Mar 15, 1985 - Oct 27, 2025
Description
Explore the historical Whois records related to twitter-design.com (Domain). Get insights into ownership history and changes over time.
m
Brexit Tweets from the morning of it's announcement
data.mendeley.com
Updated Aug 10, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christopher Parker (2017). Brexit Tweets from the morning of it's announcement [Dataset]. http://doi.org/10.17632/x9wkrghz23.2
Explore at:
Unique identifier
https://doi.org/10.17632/x9wkrghz23.2
Dataset updated
Aug 10, 2017
Authors
Christopher Parker
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To further our understanding of public reaction to political movement via social media, this dataset provides 17998 unfiltered tweets taken on the morning that Brexit was announced. This dataset contains metadata such as geolocation as an independent variable, to allow for rigorous qualitative investigation to be used.

Additional tweets from trending topics were also taken at the same time provide context on trending themes at the time: - Scotland - Jeramy Corbyn - Nichola Sturgeon - David Cameron - EURefResults - Euromillions - Borris

Data was captured with NCapture from QSR.

Facebook

Twitter

Click to copy link

Link copied

Cite

Daniel Gayo-Avello; Daniel Gayo-Avello (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. http://doi.org/10.5281/zenodo.3833782

Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets)

Explore at:

bin, zip, txt, tsvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.3833782

Dataset updated

May 20, 2020

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Daniel Gayo-Avello; Daniel Gayo-Avello

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science in the University of Oviedo, for the sole purpose of non-commercial research and it just includes tweet ids.

The dataset contains tweet IDs for all the published tweets (in any language) bettween March 21, 2006 and July 31, 2009 thus comprising the first whole three years of Twitter from its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

It covers several defining issues in Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, the first Obama’s inauguration speech or the 2009 Iran Election protests (one of the so-called Twitter Revolutions).

Finally, it does contain tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French) so it should be possible–at least in theory–to analyze international events from different cultural perspectives.

The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset and also that a substantial part of tweets in the dataset has been deleted (or locked) since 2016.

To make easier to understand the decay of tweet IDs in the dataset a number of representative samples (99% confidence level and 0.5 confidence interval) are provided.

In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the amount of tweets vary greatly throughout the period of three years covered in the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).
June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).
September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).
December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).
March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).
June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).
September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).
December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).
March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).
June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).
September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).
December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).
March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).
June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).

The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

At the moment of cleaning all the data to publish this dataset there seemed to be a gap between April 1, 2008 to July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m) I simply produced all those that were created between those two dates. All those tweets actually existed but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

In other words, what you see in that period (April to July, 2008) is not actually a huge number of tweets having been deleted but the combination of deleted *and* non-public tweets (whose IDs should not be in the dataset for performance purposes when rehydrating the dataset).

Additionally, given that not everybody will need the whole period of time the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.

For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

If you need to contact me you can find me as @PFCdgayo in Twitter.

Clear search

Close search

Google apps

Main menu

Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July...

Data from: Annotated Dataset of History-related Tweets

American Historical Association 2017 Conference Tweets

Replication Data for: This Was Twitter: Introducing the Twitter History and...

Twitter Sentiment Analysis Datasets

Twitter Users Broken down By Country

X/Twitter: U.S. users on taking a break from the platform 2023, by gender

X/Twitter: number of worldwide users 2019-2024

Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining...

Tweets containing emojis 2013-2023

Gaining Historical and International Relations Insights from Social Media:...

How Popular Is Twitter In The World?

#metoo Digital Media Collection - Twitter Dataset

twitter-followers.net - Historical whois Lookup

How Popular Is Twitter In The US?

Uruguayan Media Historical Tweets

Context

Content

twitter.net.ag - Historical whois Lookup

X/Twitter: personal privacy actions H1 2024

twitter-design.com - Historical whois Lookup

Brexit Tweets from the morning of it's announcement

Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets)