100+ datasets found
  1. Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July...

    • zenodo.org
    • live.european-language-grid.eu
    • +2 more
    bin, tsv, txt, zip
    Updated May 20, 2020
    Cite
    Daniel Gayo-Avello (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. http://doi.org/10.5281/zenodo.3833782
    Explore at:
    Available download formats: bin, zip, txt, tsv
    Dataset updated
    May 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Gayo-Avello
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science in the University of Oviedo, for the sole purpose of non-commercial research and it just includes tweet ids.

    The dataset contains tweet IDs for all the tweets (in any language) published between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter since its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

    It covers several defining moments in Twitter's history, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US presidential election, Obama's first inauguration speech, and the 2009 Iran election protests (one of the so-called Twitter Revolutions).

    Finally, it does contain tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French) so it should be possible–at least in theory–to analyze international events from different cultural perspectives.

    The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset and also that a substantial part of tweets in the dataset has been deleted (or locked) since 2016.

    To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and 0.5 confidence interval) are provided.
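
    For orientation, the sample size implied by a 99% confidence level and a ±0.5 percentage-point margin can be estimated with the standard formula for a proportion. The sketch below is an assumption about how such samples are commonly sized, not the author's documented procedure; the function name `sample_size` and the worst-case proportion p = 0.5 are illustrative choices.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.99, margin=0.005, population=None, p=0.5):
    """Sample size needed to estimate a proportion.

    Uses Cochran's formula n0 = z^2 * p * (1 - p) / margin^2 and, when a
    finite population size is given, the finite population correction.
    p = 0.5 is the worst case (largest required sample).
    """
    # Two-sided critical value, about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction shrinks the sample for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size())                  # effectively unlimited population
    print(sample_size(population=5_512))  # a population of a few thousand IDs
```

    With a very large population the required sample is in the tens of thousands of IDs; for a population of only a few thousand IDs the correction brings the sample close to the whole population.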

    In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three-year period covered by the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

    In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

    • March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).
    • June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).
    • September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).
    • December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).
    • March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).
    • June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).
    • September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).
    • December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).
    • March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).
    • June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).
    • September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).
    • December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).
    • March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).
    • June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).
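
    As a rough consistency check, the per-interval ratios above can be combined into a tweet-weighted average and compared with the overall 85.5% estimate. The sketch below only reuses the figures from the list; the names `INTERVALS` and `weighted_availability` are illustrative. Exact agreement is not expected, since the overall figure comes from a separate sample.

```python
# (availability in percent, number of tweets) for each 90-day interval above.
INTERVALS = [
    (88.4, 5_512),
    (82.7, 14_820),
    (85.7, 107_975),
    (88.2, 852_463),
    (89.6, 6_341_665),
    (88.6, 11_171_090),
    (87.9, 15_545_532),
    (89.0, 23_164_663),
    (66.5, 56_416_772),
    (78.3, 62_868_189),
    (87.3, 89_947_498),
    (86.9, 169_762_425),
    (86.4, 474_581_170),
    (85.7, 589_116_341),
]


def weighted_availability(intervals):
    """Return (tweet-weighted availability in percent, total tweet count)."""
    total = sum(count for _, count in intervals)
    available = sum(pct / 100 * count for pct, count in intervals)
    return 100 * available / total, total


if __name__ == "__main__":
    pct, total = weighted_availability(INTERVALS)
    print(f"{total:,} tweet IDs, about {pct:.1f}% estimated available")
```

    The interval counts add up to roughly 1.5 billion tweet IDs, and the weighted average lands near 85%, in line with the overall estimate.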

    The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

    At the moment of cleaning all the data to publish this dataset there seemed to be a gap between April 1, 2008 and July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply produced all those that were created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

    In other words, what you see in that period (April to July, 2008) is not actually a huge number of tweets having been deleted but the combination of deleted *and* non-public tweets (whose IDs should not be in the dataset for performance purposes when rehydrating the dataset).

    Additionally, given that not everybody will need the whole period, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.
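
    A date-to-first-ID index like this can be used to cut out the IDs for a given period. The sketch below assumes each line of date-tweet-id.tsv has the form `YYYY-MM-DD<TAB>first_tweet_id`; that layout, the function names, and the sample IDs are assumptions for illustration and should be checked against the real file.

```python
import bisect
from datetime import date


def parse_date_index(lines):
    """Parse 'YYYY-MM-DD<TAB>first_tweet_id' lines into sorted (date, id) pairs.

    The column layout is an assumption; adjust it to the real file.
    """
    index = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        day, tweet_id = line.split("\t")
        index.append((date.fromisoformat(day), int(tweet_id)))
    index.sort()
    return index


def id_range(index, start, end):
    """Return (low, high) so that IDs with low <= id < high fall in [start, end].

    high is None when the index has no entry after `end`.
    """
    if end < start:
        raise ValueError("end date is before start date")
    days = [d for d, _ in index]
    lo = bisect.bisect_left(days, start)   # first indexed day >= start
    hi = bisect.bisect_right(days, end)    # first indexed day > end
    if lo == len(index):
        raise ValueError("start date is after the last indexed date")
    low = index[lo][1]
    high = index[hi][1] if hi < len(index) else None
    return low, high


if __name__ == "__main__":
    # Made-up IDs for illustration only; they are not taken from the dataset.
    sample = ["2008-11-03\t1000", "2008-11-04\t1500", "2008-11-05\t2300"]
    index = parse_date_index(sample)
    print(id_range(index, date(2008, 11, 4), date(2008, 11, 4)))
```

    The returned bounds can then be used to filter the full ID list before rehydrating, so only the period of interest is requested.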

    For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

    If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

    If you need to contact me, you can find me as @PFCdgayo on Twitter.

  2. Data from: Annotated Dataset of History-related Tweets

    • zenodo.org
    csv
    Updated Sep 19, 2021
    Cite
    Yasunobu Sumikawa; Adam Jatowt (2021). Annotated Dataset of History-related Tweets [Dataset]. http://doi.org/10.5281/zenodo.4657223
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yasunobu Sumikawa; Adam Jatowt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains tweet IDs together with five types of contextual information: 1) hashtags, 2) their categories, 3) entities obtained by NERD, 4) time references normalized by HeidelTime, and 5) Web categories for URLs attached to tweets with history-related hashtags. The tweets were collected for the purpose of analyzing how history-related content is disseminated in online social networks. Our IJDL paper shows the analysis results. The preliminary version of the analysis report is available here.

    We used the official Twitter search API to collect tweets. Note that three kinds of tweets are typically found on Twitter: tweets, retweets and quote tweets. A tweet is an original text issued as a post by a Twitter user. A retweet is a copy of an original tweet for the purpose of propagating the tweet content to more users (i.e., one's followers). Finally, a quote tweet copies the content of another tweet and also allows new content to be added. A quote tweet is sometimes called a retweet with a comment. In this work, we simply treat all quote tweets as original tweets since they include additional information/text. There were, however, only 1,877 (0.2%) tweets recognized as quote tweets in our dataset.

    To collect tweets that refer to the past or are related to the collective memory of past events/entities, we performed hashtag-based crawling together with a bootstrapping procedure.
    At the beginning, we gathered several historical hashtags selected by experts (e.g. #HistoryTeacher, #history, #WmnHist).
    In addition, we prepared several hashtags that are commonly used when referring to the past: #onthisday, #thisdayinhistory, #throwbackthursday, #otd. We then collected tweets that contain these hashtags using the official Twitter search API.

    The collected tweets were issued from 8 March 2016 to 2 July 2018.
    Bootstrapping allowed us to search for other hashtags frequently used with the seed hashtags. We manually inspected all the discovered hashtags for their relation to history, filtered out the unrelated ones, and then added the tweets tagged with the remaining hashtags to the seed set.
    In total, we gathered 147 history-related hashtags which allowed us to collect 2,370,252 tweet IDs pointing to 882,977 tweets and 1,487,275 re-tweets.
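
    One bootstrapping round of the kind described above can be sketched as a co-occurrence count. This is only an illustration under assumptions: the function name `candidate_hashtags`, the `min_count` threshold, and the toy tweets are hypothetical, and the authors' actual procedure (including the manual inspection step) is not reproduced here.

```python
from collections import Counter


def candidate_hashtags(tweets, seeds, min_count=2):
    """Rank hashtags that co-occur with seed hashtags.

    tweets: iterable of hashtag collections, one per tweet.
    seeds: hashtags already accepted as history-related.
    Returns (hashtag, count) pairs for manual inspection, most frequent first.
    """
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for tags in tweets:
        tags = {t.lower() for t in tags}
        if tags & seeds:
            # Count every non-seed hashtag used alongside at least one seed.
            counts.update(tags - seeds)
    return [(t, c) for t, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    # Toy data; only the seed hashtags are taken from the description above.
    tweets = [
        {"#history", "#ww2"},
        {"#onthisday", "#ww2", "#1944"},
        {"#otd", "#1944"},
        {"#cats", "#ww2"},
    ]
    seeds = {"#history", "#onthisday", "#otd"}
    print(candidate_hashtags(tweets, seeds))
```

    Candidates that pass manual inspection would be added to the seed set before the next round; hashtags that never appear next to a seed (here `#cats`) are not proposed.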

    Related papers:

    1. Yasunobu Sumikawa, Adam Jatowt, and Marten During, "Digital History meets Microblogging: Analyzing Collective Memories in Twitter", In Proceedings of the 18th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL'18, IEEE/ACM, pp. 213-222, 2018. [paper]
    2. Yasunobu Sumikawa and Adam Jatowt, "Analyzing History-related Posts in Twitter", International Journal on Digital Libraries, Springer, 2020. https://doi.org/10.1007/s00799-020-00296-2 [paper][dataset]
    3. Yasunobu Sumikawa and Adam Jatowt, "Annotated Dataset of History-related Tweets", Data in Brief, Vol. 38, pp. 107344, Elsevier, 2021. [paper]
  3. American Historical Association 2017 Conference Tweets

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Milligan, Ian (2023). American Historical Association 2017 Conference Tweets [Dataset]. http://doi.org/10.5683/SP/CFVF1F
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Milligan, Ian
    Description

    A list of 10,538 Twitter IDs for tweets harvested between 4 January at 11am and 9 January at 11am using Social Feed Manager. As this used the search API, the 4 January at 11am crawl went back about 5-9 days. Tweet IDs are included, as is a log of the decisions made to curate this dataset.

  4. Replication Data for: This Was Twitter: Introducing the Twitter History and...

    • datasetcatalog.nlm.nih.gov
    Updated May 1, 2025
    Cite
    STEINERT-THRELKELD, ZACHARY (2025). Replication Data for: This Was Twitter: Introducing the Twitter History and Image Sharing v1.0 Datasets [Dataset]. http://doi.org/10.7910/DVN/14WGUG
    Explore at:
    Dataset updated
    May 1, 2025
    Authors
    STEINERT-THRELKELD, ZACHARY
    Description

    Paper DOI: 10.51685/jqd.2025.011

    Paper abstract: This paper introduces the Twitter History and Image Sharing (THIS) datasets. These four related datasets enable the study of Twitter without the release of tweets or user information. Both are derived from a corpus of 14.596 billion geolocated tweets streamed from September 1, 2013 through March 14, 2023. Two Twitter History datasets provide data on the number of tweets, tweets by language, and user data by country from September 1, 2013 through March 14, 2023. A third Twitter History dataset provides data on the number of new user registrations by country from March 21, 2006, the start of Twitter, through March 14, 2023. Image Sharing is based on the 1.676 billion images shared during this period and the 956.049 million still available for download in early 2024. It provides data on the number of images shared and still available from September 1, 2013 through March 14, 2023. The THIS datasets enable the study of Twitter itself and its differential use across countries, including in response to specific events, and the paper demonstrates applications to correlates of image sharing and removal, behavior around national executive elections, event detection, and digital repression. While this paper is not the first to study Twitter, it is, as far as we are aware, the first to provide datasets enabling other researchers to do the same.

  5. Twitter Sentiment Analysis Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jul 4, 2024
    Cite
    Bright Data (2024). Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jul 4, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

    Key Features:
    
      Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
      Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
      Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
      Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
      Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
      Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
    
    
    Use Cases:
    
      Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
      Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
      Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
      AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
      Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
    
    
    
      Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. 
      Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
    
  6. Twitter Users Broken down By Country

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Twitter Users Broken down By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.

  7. X/Twitter: U.S. users on taking a break from the platform 2023, by gender

    • statista.com
    Updated Oct 16, 2024
    Cite
    Statista Research Department (2024). X/Twitter: U.S. users on taking a break from the platform 2023, by gender [Dataset]. https://www.statista.com/topics/737/twitter/
    Explore at:
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    According to a survey conducted in March 2023, 69 percent of female X/Twitter users had taken a break from the platform for several weeks or more in the past 12 months, compared to 54 percent of male users. Overall 46 percent of male users reported they had not taken a break from X/Twitter within the past year.

  8. X/Twitter: number of worldwide users 2019-2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
    Explore at:
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide
    Description

    As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.

  9. Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 18, 2022
    Cite
    Misra Sanjay (2022). Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4748716
    Explore at:
    Dataset updated
    Aug 18, 2022
    Dataset provided by
    Abayomi-Alli Adebayo
    Fernandez-Sanz Luis
    Abayomi-Alli Olusola
    Misra Sanjay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Social media opinion has become a medium for quickly accessing large, valuable, and rich details of information on any subject matter within a short period. Twitter, a social microblogging site, generates over 330 million tweets monthly across different countries. Analysing trending topics on Twitter presents opportunities to extract meaningful insight into different opinions on various issues.

    Aim

    This study aims to gain insights into the trending yahoo-yahoo topic on Twitter using content analysis of selected historical tweets.

    Methodology

    The widgets and workflow engine in the Orange Data Mining toolbox were employed for all the text mining tasks. 5500 tweets were collected from Twitter using the “yahoo yahoo” hashtag. The corpus was pre-processed using a pre-trained tweet tokenizer; Valence Aware Dictionary for Sentiment Reasoning (VADER) was used for the sentiment and opinion mining; Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) were used for topic modelling; and Multidimensional Scaling (MDS) was used to visualize the modelled topics.

    Results

    Results showed that "yahoo" appeared in the corpus 9555 times, and 175 unique tweets were returned after duplicate removal. Contrary to expectation, Spain had the highest number of participants tweeting on the 'yahoo yahoo' topic within the period. The VADER sentiment analysis returned 35.85%, 24.53%, 15.09%, and 24.53% negative, neutral, no-zone, and positive sentiment tweets, respectively. The word yahoo was highly representative of the LDA topics 1, 3, 4, 6, and LSI topic 1.

    Conclusion

    It can be concluded that emojis convey the sentiment of tweets more readily than the textual content. Also, despite popular belief, a significant number of youths regard cybercrime as a detriment to society.

  10. Tweets containing emojis 2013-2023

    • statista.com
    Updated Oct 16, 2024
    Cite
    Stacy Jo Dixon (2024). Tweets containing emojis 2013-2023 [Dataset]. https://www.statista.com/topics/737/twitter/
    Explore at:
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    The share of posts on microblogging platform Twitter that contain emojis has increased significantly over the past ten years. In July 2013, 4.25 percent of tweets contained at least one emoji. Just under one decade later, in March 2023, 26.7 percent of tweets contained an emoji. The most common reason for using emojis, according to users in the United States, was to make conversations more fun.

  11. Gaining Historical and International Relations Insights from Social Media:...

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl (2023). Gaining Historical and International Relations Insights from Social Media: Spatio-Temporal Real-World News Analysis using Twitter. [Dataset]. http://doi.org/10.6084/m9.figshare.5092678.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of metadata related to 24,508 news events collected from Twitter between August 2013 and June 2015. The events encompass a total of 193,445,734 tweets produced by 26,127,624 different users. The files contain different aspects of the data:

    • components.tsv describes the events (components) of the dataset. It has 4 tab-separated columns: the component ID, the date of the event, the number of tweets, and a comma-separated set of at least 2 keywords describing the event.
    • componentlocation.tsv describes the locations where the events happened (“protagonist locations”). The columns are an ID, the component ID, the names of the locations, the frequency (how many times that location was mentioned in the component), the country code, and six more non-relevant columns. Note that one component can appear in several rows, one per location mentioned for that component.
    • country_protagonized-events.csv gives the number of events that each country is a protagonist of. It has two comma-separated columns: the country code and the number of events (components) that country is a protagonist of.
    • country_tweets.csv gives the number of tweets that each country has issued across all the events. It has two comma-separated columns: the country code and the number of tweets that country has issued.
    • participation_data.txt contains a matrix of the number of tweets per country, per event. It has one row per component ID and one column per country (plus one column for the component ID); each cell is the number of tweets that country has issued for that event.
    • similarities_no_reciproco_percentile.csv gives the similarity between co-protagonist countries. The columns are, in order: Country 1, the number of events Country 1 is a protagonist of, Country 2, the number of events Country 2 is a protagonist of, the Jaccard similarity between the two countries (where each country is represented by the set of component IDs it is a protagonist of), and the percentile of that similarity value (ranging from 0 to 1).
    • users_events_distinct.txt gives the number of unique users participating in each event. The columns are tab-separated: the component ID, the number of different users for that event, and the number of different news sources for that event.
    • countries.txt maps country codes to country names, separated by a space.
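
    The Jaccard similarity mentioned for the country-similarity file is the size of the intersection of two sets divided by the size of their union. A minimal sketch, with hypothetical country codes and component IDs that are not taken from the dataset:

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical component IDs per country code, for illustration only.
    events = {"CL": {1, 2, 3, 5}, "AR": {2, 3, 8}}
    print(jaccard(events["CL"], events["AR"]))  # 2 shared of 5 distinct -> 0.4
```

    Two countries that are protagonists of exactly the same events score 1, and countries with no events in common score 0.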

  12. How Popular Is Twitter In The World?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.

  13. #metoo Digital Media Collection - Twitter Dataset

    • dataverse.harvard.edu
    Updated Mar 29, 2023
    Cite
    Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub (2023). #metoo Digital Media Collection - Twitter Dataset [Dataset]. http://doi.org/10.7910/DVN/2SRSKJ
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is version 2 of this dataset. The previous version was published on June 30, 2020.

    This dataset contains the tweet ids of 39,373,774 tweets, which are part of the Schlesinger Library #metoo Digital Media Collection. This second version of the dataset represents the full set of tweets collected throughout the project; tweets range from October 15, 2017 to December 31, 2022. The previous version of this dataset extended to March 31, 2020. Tweets between October 15, 2017 and December 10, 2018 were licensed from Twitter's Historical PowerTrack and received through GNIP. Tweets after December 10, 2018 were collected weekly from the Twitter API through Social Feed Manager using the POST statuses/filter method of the Twitter Stream API.

    The following list of 76 terms includes the hashtags used to collect data for this dataset: #metoo, #timesup, #metoostem, #sciencetoo, #metoophd, #shittymediamen, #churchtoo, #ustoo, #metooMVMT, #ARmetoo, #TimesUpAR, #metooSociology, #metooSexScience, #timesupAcademia, #metooMedicine, #MyCampusToo, #howiwillchange, #iwill, #believewomen, #GoTeal, #BelieveChristine, #IStandWithDrFord, #IStandWithChristineBlaseyFord, #believesurvivors, #whyididntreport, #himtoo, #istandwithbrett, #confirmkavanaguhnow, #metooMcdonalds, #metoomovement, #muteRKelly, #WeBelieveDrFord, #WeBelieveSurvivors, #HandsOffPantsOn, #MeAt14, #HeToo, #MeTooLiars, #metoolynchings, #metoohucksters, #metoohustle, #ItWasMe, #Ihave, #TimesUpTech, #GoogleWalkout, #mosquemetoo, #faithandmetoo, #SilenceIsNotSpiritual, #HealMeToo, #TimesUpHarvard, #NoCarveOut, #TimesUpx2, #MeetingsToo, #metoonatsec, #healmetoo, #GamAni, #ShulToo, #harvardhearsyou, #metooarcheology, #TimesUpPayUp, #metooarcheology, #metooHBCU, #TimesUpHC, #aidtoo, #garmentmetoo, #mutemetoo, #mutetimesup, #metoopolisci, #copstoo, #TimesUpBiden, #MeTooNoMatterWho, #IBelieveTara, #BelieveAllWomen, #metoomilitary, #harvard38, #comaroff, and #harvardletter. The final four hashtags in this list were first crawled on February 10, 2022.

    Because of the size of the files, the list of identifiers is split into 41 files containing up to 1,000,000 ids each.

    Per Twitter's Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Therefore, this dataset only contains tweet ids. Tools like Hydrator are available to retrieve the tweets that are still available (not deleted by users).

    Subsets containing only the #metoo seed are also available as quarterly datasets.

  14. twitter-followers.net - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Cite
    AllHeart Web Inc, twitter-followers.net - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter-followers.net/
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Oct 3, 2025
    Description

    Explore the historical Whois records related to twitter-followers.net (Domain). Get insights into ownership history and changes over time.

  15. How Popular Is Twitter In The US?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). How Popular Is Twitter In The US? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The US has the largest number of Twitter users, with over 100 million users. They account for about 16.7% of all Twitter users worldwide.

  16. Uruguayan Media Historical Tweets

    • kaggle.com
    Updated Jan 31, 2022
    Cite
    Lea Dominguez (2022). Uruguayan Media Historical Tweets [Dataset]. https://www.kaggle.com/datasets/leadominguez/uruguayan-media-historical-tweets/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 31, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lea Dominguez
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Uruguay
    Description

    Context

    As part of my thesis, we had to scrape the Twitter accounts of Uruguayan media outlets to build a media analysis platform, and we decided to publish this dataset to help anyone who may need it.

    Content

    The file contains tweets scraped from the Twitter accounts of six Uruguayan media outlets (El País, Brecha, Búsqueda, El Observador, La República and La Diaria), from the creation of each account until approximately October 2021.

  17. twitter.net.ag - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Updated Aug 1, 2024
    Cite
    AllHeart Web Inc (2024). twitter.net.ag - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter.net.ag/
    Explore at:
    Available download formats: csv
    Dataset updated
    Aug 1, 2024
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Sep 22, 2025
    Description

    Explore the historical Whois records related to twitter.net.ag (Domain). Get insights into ownership history and changes over time.

  18. X/Twitter: personal privacy actions H1 2024

    • statista.com
    Updated Oct 16, 2024
    Cite
    Statista Research Department (2024). X/Twitter: personal privacy actions H1 2024 [Dataset]. https://www.statista.com/topics/737/twitter/
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    During the first half of 2024 there were 34,497 pieces of content removed from X due to personal privacy violations, which include the publishing or sharing of other people's private information. These types of violations are also known as doxxing. Overall, 30,450 of these pieces of content were reported manually by users of the platform.

  19. twitter-design.com - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Cite
    AllHeart Web Inc, twitter-design.com - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter-design.com/
    Explore at:
    csv (available download formats)
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Oct 27, 2025
    Description

    Explore the historical Whois records related to twitter-design.com (Domain). Get insights into ownership history and changes over time.

  20. Brexit Tweets from the morning of it's announcement

    • data.mendeley.com
    Updated Aug 10, 2017
    Cite
    Christopher Parker (2017). Brexit Tweets from the morning of it's announcement [Dataset]. http://doi.org/10.17632/x9wkrghz23.2
    Dataset updated
    Aug 10, 2017
    Authors
    Christopher Parker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To further our understanding of public reaction to political movements via social media, this dataset provides 17,998 unfiltered tweets collected on the morning that Brexit was announced. It includes metadata such as geolocation, which can serve as an independent variable in rigorous qualitative investigation.

    Additional tweets from trending topics were collected at the same time to provide context on the trending themes of the moment: Scotland, Jeramy Corbyn, Nichola Sturgeon, David Cameron, EURefResults, Euromillions, Borris.

    Data was captured with NCapture from QSR.

Cite
Daniel Gayo-Avello; Daniel Gayo-Avello (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. http://doi.org/10.5281/zenodo.3833782

Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets)

Explore at:
bin, zip, txt, tsv (available download formats)
Dataset updated
May 20, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniel Gayo-Avello; Daniel Gayo-Avello
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science at the University of Oviedo, for the sole purpose of non-commercial research, and it includes only tweet IDs.

The dataset contains tweet IDs for all the tweets published (in any language) between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter from its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

It covers several defining moments for Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, Obama's first inauguration speech and the 2009 Iran Election protests (one of the so-called Twitter Revolutions).

Finally, it does contain tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French) so it should be possible–at least in theory–to analyze international events from different cultural perspectives.

The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset and also that a substantial part of tweets in the dataset has been deleted (or locked) since 2016.

To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and a confidence interval of 0.5) are provided.
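
As a rough illustration of what a 99% confidence level with a ±0.5 margin implies for sample size, the sketch below applies the standard formula for estimating a proportion, with an optional finite-population correction. This is an assumption about how such samples could be sized, not the procedure used to build the provided samples. The function name `sample_size`, the worst-case proportion p = 0.5, and the reading of the margin as 0.5 percentage points are all illustrative choices.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.99, margin=0.005, p=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    margin is a fraction (0.005 means 0.5 percentage points).
    p = 0.5 is the most conservative assumption about the true proportion.
    If population is given, a finite-population correction is applied.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size())                  # effectively infinite population
    print(sample_size(population=5_512))  # a small 90-day interval
```

For small intervals the correction matters: the required sample approaches the whole population, so sampling saves little there.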

In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three years covered in the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

  • March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).
  • June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).
  • September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).
  • December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).
  • March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).
  • June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).
  • September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).
  • December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).
  • March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).
  • June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).
  • September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).
  • December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).
  • March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).
  • June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).
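
The per-interval figures above can be combined into a single availability estimate by weighting each percentage by its tweet count. The sketch below does this under the assumption that each "from N tweets" figure is the number of tweet IDs in that interval; the variable and function names are illustrative.

```python
# (availability in percent, number of tweets) for each interval listed above,
# in chronological order.
INTERVALS = [
    (88.4, 5_512),
    (82.7, 14_820),
    (85.7, 107_975),
    (88.2, 852_463),
    (89.6, 6_341_665),
    (88.6, 11_171_090),
    (87.9, 15_545_532),
    (89.0, 23_164_663),
    (66.5, 56_416_772),
    (78.3, 62_868_189),
    (87.3, 89_947_498),
    (86.9, 169_762_425),
    (86.4, 474_581_170),
    (85.7, 589_116_341),
]


def weighted_availability(intervals):
    """Return (total tweet count, count-weighted availability in percent)."""
    total = sum(count for _, count in intervals)
    weighted = sum(pct * count for pct, count in intervals) / total
    return total, weighted


if __name__ == "__main__":
    total, pct = weighted_availability(INTERVALS)
    print(f"{total:,} tweets, weighted availability {pct:.1f}%")
```

The result can be compared with the overall figure from the full-period sample. The two need not match exactly, because each percentage is itself an estimate with its own margin.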

The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

When I was cleaning the data to publish this dataset, there seemed to be a gap between April 1, 2008 and July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply produced all those that were created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

In other words, what you see in that period (April to July, 2008) is not actually a huge number of tweets having been deleted but the combination of deleted *and* non-public tweets (whose IDs should not be in the dataset for performance purposes when rehydrating the dataset).

Additionally, given that not everybody will need the whole period, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.
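
A minimal sketch of how the per-date index could be used to select the tweet IDs for a date range. The layout of date-tweet-id.tsv is not described here, so the code assumes two tab-separated columns (an ISO date such as 2008-04-01, then the earliest tweet ID for that date) and assumes that IDs increase over time within this period. The function names are hypothetical.

```python
import csv
from datetime import date, timedelta


def load_first_ids(lines):
    """Map each date to the earliest tweet ID recorded for it.

    lines is any iterable of text lines, e.g. an open file.
    Rows that do not parse (such as a header) are skipped.
    """
    first_ids = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            first_ids[date.fromisoformat(row[0].strip())] = int(row[1])
        except ValueError:
            continue
    return first_ids


def id_range(first_ids, start, end):
    """Return (low, high) such that low <= tweet_id < high covers start..end.

    high is None when the index has no entry for the day after end.
    """
    low = first_ids[start]
    high = first_ids.get(end + timedelta(days=1))
    return low, high


def in_range(tweet_id, low, high):
    """True if tweet_id falls inside the half-open range [low, high)."""
    return tweet_id >= low and (high is None or tweet_id < high)
```

With the bounds in hand, the full ID list can be streamed and filtered with `in_range` so that only the period of interest is rehydrated.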

For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

If you need to contact me, you can find me as @PFCdgayo on Twitter.
