100+ datasets found
  1. Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July...

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • +2 more
    Updated May 20, 2020
    Cite
    Gayo-Avello, Daniel (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3833781
    Dataset updated
    May 20, 2020
    Dataset provided by
    University of Oviedo
    Authors
    Gayo-Avello, Daniel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science of the University of Oviedo, for the sole purpose of non-commercial research, and it includes only tweet IDs.

    The dataset contains tweet IDs for all the tweets (in any language) published between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter since its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

    It covers several defining moments in Twitter's history, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Election, Obama's first inauguration speech, and the 2009 Iranian election protests (one of the so-called Twitter Revolutions).

    Finally, it contains tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French), so it should be possible, at least in theory, to analyze international events from different cultural perspectives.

    The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were those publicly available at that moment. This means that some tweets that were public during that period may not appear in the dataset, and also that a substantial part of the tweets in the dataset have been deleted (or locked) since 2016.

    To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and 0.5 confidence interval) are provided.
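    Sample sizes of this kind follow the usual formula for estimating a proportion, n0 = z^2 p(1 - p) / E^2. As a sketch, assuming that "0.5 confidence interval" means a margin of error E of ±0.5 percentage points and taking the conservative p = 0.5 (neither is stated explicitly), the sizes can be computed as below. The finite population correction for small periods is an illustrative addition, not necessarily what the author used, and `sample_size` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def sample_size(confidence=0.99, margin=0.005, p=0.5, population=None):
    """Sample size for estimating a proportion, optionally corrected for a finite population."""
    # Two-sided critical value: about 2.576 for a 99% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Infinite-population size: n0 = z^2 * p * (1 - p) / E^2
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction, relevant only for the small early intervals
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size())                  # whole three-year period
print(sample_size(population=5_512))  # first 90-day interval
```

    Under these assumptions the whole-period sample comes to roughly 66,000 IDs, while the correction reduces the requirement for the 5,512-tweet first interval to about 5,100.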

    In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three years covered by the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

    In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

    March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).

    June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).

    September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).

    December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).

    March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).

    June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).

    September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).

    December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).

    March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).

    June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).

    September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).

    December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).

    March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).

    June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).
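    Because the intervals differ in size by about five orders of magnitude, the per-interval ratios can be combined into a single period-wide figure by weighting each one by its tweet count (a standard stratified estimate). The sketch below is an illustrative cross-check using the figures listed above, not how the overall 85.5% figure was obtained; `weighted_ratio` is a hypothetical helper.

```python
# 90-day intervals listed above: (share of tweets still public, tweets in the interval)
INTERVALS = [
    (0.884, 5_512),
    (0.827, 14_820),
    (0.857, 107_975),
    (0.882, 852_463),
    (0.896, 6_341_665),
    (0.886, 11_171_090),
    (0.879, 15_545_532),
    (0.890, 23_164_663),
    (0.665, 56_416_772),
    (0.783, 62_868_189),
    (0.873, 89_947_498),
    (0.869, 169_762_425),
    (0.864, 474_581_170),
    (0.857, 589_116_341),
]


def weighted_ratio(intervals):
    """Combine per-interval ratios into one estimate, weighting each by its tweet count."""
    total = sum(count for _, count in intervals)
    return sum(ratio * count for ratio, count in intervals) / total


print(f"{weighted_ratio(INTERVALS):.1%}")
```

    With the figures above this comes to about 85.2%, within the ±0.5 margin of the 85.5% whole-period sample.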

    The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

    When cleaning all the data to publish this dataset, there seemed to be a gap from April 1, 2008 to July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply generated all the IDs created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs, the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

    In other words, what you see in that period (April to July 2008) is not a huge number of deleted tweets but the combination of deleted and non-public tweets (whose IDs, for performance purposes when rehydrating, should not be in the dataset).

    Additionally, since not everybody will need the whole period, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.
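    Because date-tweet-id.tsv gives the earliest tweet ID for each date, a date range can be cut out of the full ID list without hydrating anything. The sketch below assumes each row has the form `YYYY-MM-DD<TAB>tweet_id` (the actual column layout is not documented here) and that IDs increase monotonically with posting time, as the remark about regenerating IDs for this era suggests; `load_date_index` and `ids_between` are hypothetical helper names.

```python
import csv
from datetime import date, timedelta


def load_date_index(lines):
    """Map each date to the earliest tweet ID posted on it.

    Assumes tab-separated rows 'YYYY-MM-DD<TAB>tweet_id'; other rows
    (for example a header) are skipped.
    """
    index = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        try:
            day = date.fromisoformat(row[0].strip())
        except ValueError:
            continue
        index[day] = int(row[1])
    return index


def ids_between(ids, index, start, end):
    """Yield IDs posted from `start` through `end` inclusive, assuming IDs grow with time."""
    lower = index[start]
    # The first ID of the following day is an exclusive upper bound, if it is known
    upper = index.get(end + timedelta(days=1))
    for tweet_id in ids:
        if tweet_id >= lower and (upper is None or tweet_id < upper):
            yield tweet_id
```

    For instance, `ids_between(all_ids, index, date(2008, 11, 4), date(2008, 11, 5))` would keep the IDs posted on November 4 and 5, 2008. If the day after `end` is missing from the index, the sketch keeps every later ID as well.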

    For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

    If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

    If you need to contact me, you can find me as @PFCdgayo on Twitter.

  2. Data from: Annotated Dataset of History-related Tweets

    • zenodo.org
    csv
    Updated Sep 19, 2021
    Cite
    Yasunobu Sumikawa; Adam Jatowt (2021). Annotated Dataset of History-related Tweets [Dataset]. http://doi.org/10.5281/zenodo.4657223
    Available download formats: csv
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yasunobu Sumikawa; Adam Jatowt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the IDs of history-related tweets, collected to analyze how history-related content is disseminated in online social networks, together with 5 types of contextual information: 1) hashtags, 2) their categories, 3) entities obtained by NERD, 4) time references normalized by HeidelTime, and 5) Web categories of the attached URLs. Our IJDL paper presents the analysis results. The preliminary version of the analysis report is available here.

    We used Twitter's official search API to collect tweets. Note that three kinds of tweets are typically found on Twitter: tweets, retweets and quote tweets. A tweet is an original text issued as a post by a Twitter user. A retweet is a copy of an original tweet made to propagate its content to more users (i.e., one's followers). Finally, a quote tweet copies the content of another tweet and also allows adding new content; it is sometimes called a retweet with a comment. In this work, we simply treat all quote tweets as original tweets, since they include additional information/text. However, only 1,877 (0.2%) tweets in our dataset were recognized as quote tweets.
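    The three kinds of tweets can be told apart from the tweet objects themselves. A minimal sketch, assuming hydrated tweets in the Twitter API v1.1 JSON format, where retweets carry a `retweeted_status` field and quote tweets carry `is_quote_status` / `quoted_status`; the description does not say how the authors detected quote tweets, and the helper names are hypothetical.

```python
def tweet_kind(tweet: dict) -> str:
    """Classify a v1.1 tweet object as 'retweet', 'quote' or 'tweet'."""
    # Check retweets first: a retweet of a quote tweet also has is_quote_status set
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("is_quote_status") or "quoted_status" in tweet:
        return "quote"
    return "tweet"


def counted_as(tweet: dict) -> str:
    """Apply the rule described above: quote tweets are counted as original tweets."""
    return "retweet" if tweet_kind(tweet) == "retweet" else "original"
```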

    To collect tweets that refer to the past or are related to the collective memory of past events/entities, we performed hashtag-based crawling together with a bootstrapping procedure.
    At the beginning, we gathered several historical hashtags selected by experts (e.g., #HistoryTeacher, #history, #WmnHist).
    In addition, we prepared several hashtags that are commonly used when referring to the past: #onthisday, #thisdayinhistory, #throwbackthursday, #otd. We then collected tweets containing these hashtags using Twitter's official search API.

    The collected tweets were issued from 8 March 2016 to 2 July 2018.
    Bootstrapping allowed us to search for other hashtags frequently used together with the seed hashtags. After manually inspecting all the discovered hashtags for their relation to history and filtering out unrelated ones, the tweets tagged with the remaining hashtags were added to the seed set.
    In total, we gathered 147 history-related hashtags, which allowed us to collect 2,370,252 tweet IDs pointing to 882,977 tweets and 1,487,275 retweets.
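    The bootstrapping step amounts to counting which hashtags co-occur with the seeds and handing the frequent ones to a human for review. A minimal sketch, assuming each tweet is represented by its list of hashtags and using a hypothetical `min_count` threshold (the description gives no frequency cutoff); one call corresponds to a single bootstrapping round.

```python
from collections import Counter


def candidate_hashtags(tweets_hashtags, seeds, min_count=2):
    """Hashtags co-occurring with any seed hashtag at least `min_count` times.

    `tweets_hashtags` holds one list of hashtags per tweet. Matching ignores case
    and a leading '#'. The result is a list of candidates for manual review.
    """
    seed_set = {tag.lower().lstrip("#") for tag in seeds}
    counts = Counter()
    for tags in tweets_hashtags:
        normalized = {tag.lower().lstrip("#") for tag in tags}
        if normalized & seed_set:
            # Count each non-seed hashtag once per tweet that also carries a seed
            counts.update(normalized - seed_set)
    return [tag for tag, n in counts.most_common() if n >= min_count]
```

    After manual review, the accepted candidates would be added to `seeds` and the function run again on the newly collected tweets.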

    Related papers:

    1. Yasunobu Sumikawa, Adam Jatowt, and Marten During, "Digital History meets Microblogging: Analyzing Collective Memories in Twitter", In Proceedings of the 18th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL'18, IEEE/ACM, pp. 213-222, 2018. [paper]
    2. Yasunobu Sumikawa and Adam Jatowt, "Analyzing History-related Posts in Twitter", International Journal on Digital Libraries, Springer, 2020. https://doi.org/10.1007/s00799-020-00296-2 [paper][dataset]
    3. Yasunobu Sumikawa and Adam Jatowt, "Annotated Dataset of History-related Tweets", Data in Brief, Vol. 38, pp. 107344, Elsevier, 2021. [paper]
  3. American Historical Association 2017 Conference Tweets

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Milligan, Ian (2023). American Historical Association 2017 Conference Tweets [Dataset]. http://doi.org/10.5683/SP/CFVF1F
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Milligan, Ian
    Description

    A list of 10,538 Twitter IDs for tweets harvested between 4 January at 11am and 9 January at 11am using Social Feed Manager. As this used the search API, the 4 January 11am crawl went back about 5-9 days. The dataset includes the tweet IDs, as well as a log of the decisions made to curate it.

  4. Twitter Users Broken down By Country

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Twitter Users Broken down By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The US has historically been the target country for Twitter since its launch in 2006. This is the full breakdown of Twitter users by country.

  5. Twitter Sentiment Analysis Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data, Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

    Key Features:
    
      Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
      Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
      Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
      Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
      Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
      Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
    
    
    Use Cases:
    
      Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
      Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
      Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
      AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
      Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
    
    
    
      Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. 
      Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
    
  6. Replication Data for: This Was Twitter: Introducing the Twitter History and...

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    Updated Oct 29, 2025
    Cite
    STEINERT-THRELKELD, ZACHARY (2025). Replication Data for: This Was Twitter: Introducing the Twitter History and Image Sharing v1.0 Datasets [Dataset]. http://doi.org/10.7910/DVN/14WGUG
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    STEINERT-THRELKELD, ZACHARY
    Time period covered
    Sep 1, 2013 - Mar 15, 2023
    Description

    Paper DOI: 10.51685/jqd.2025.011. Paper abstract: This paper introduces the Twitter History and Image Sharing (THIS) datasets. These four related datasets enable the study of Twitter without the release of tweets or user information. Both are derived from a corpus of 14.596 billion geolocated tweets streamed from September 1, 2013 through March 14, 2023. Two Twitter History datasets provide data on the number of tweets, tweets by language, and user data by country from September 1, 2013 through March 14, 2023. A third Twitter History dataset provides data on the number of new user registrations by country from March 21, 2006, the start of Twitter, through March 14, 2023. Image Sharing is based on the 1.676 billion images shared during this period and the 956.049 million still available for download in early 2024. It provides data on the number of images shared and still available from September 1, 2013 through March 14, 2023. The THIS datasets enable the study of Twitter itself and its differential use across countries, including in response to specific events, and the paper demonstrates applications to correlates of image sharing and removal, behavior around national executive elections, event detection, and digital repression. While this paper is not the first to study Twitter, it is, as far as we are aware, the first to provide datasets enabling other researchers to do the same.

  7. The Climate Change Twitter Dataset

    • data.mendeley.com
    • kaggle.com
    Updated May 19, 2022
    + more versions
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. http://doi.org/10.17632/mw8yd7z9wc.2
    Dataset updated
    May 19, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541

    The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the longest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, and they are accompanied by information on environmental disaster events. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, TextBlob, Flair, and LDA.

    The following columns are in the dataset:

    ➡ created_at: The timestamp of the tweet.
    ➡ id: The unique ID of the tweet.
    ➡ lng: The longitude where the tweet was written.
    ➡ lat: The latitude where the tweet was written.
    ➡ topic: The tweet's category among ten topics, namely: seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
    ➡ sentiment: A score on a continuous scale ranging from -1 to 1, where values closer to 1 indicate positive sentiment, values closer to -1 indicate negative sentiment, and values close to 0 indicate no sentiment (neutral).
    ➡ stance: Whether the tweet supports the belief in man-made climate change (believer), does not believe in man-made climate change (denier), or neither supports nor rejects it (neutral).
    ➡ gender: Whether the user who posted the tweet is male, female, or undefined.
    ➡ temperature_avg: The temperature deviation in Celsius, relative to the January 1951-December 1980 average, at the time and place the tweet was written.
    ➡ aggressiveness: Whether the tweet contains aggressive language or not.
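    With the columns listed above, simple aggregates need no hydration. A minimal sketch, assuming the data is read as a CSV file with a header row using these column names (the file name is not given here); `mean_sentiment_by_stance` is a hypothetical helper that averages the `sentiment` score for each `stance` value.

```python
import csv
from collections import defaultdict


def mean_sentiment_by_stance(csv_lines):
    """Average the `sentiment` score (-1 to 1) separately for each `stance` value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_lines):
        if not row.get("sentiment"):
            continue  # skip rows with an empty sentiment field
        totals[row["stance"]] += float(row["sentiment"])
        counts[row["stance"]] += 1
    return {stance: totals[stance] / counts[stance] for stance in counts}
```

    Passing an open file handle (opened with `newline=""`) works the same way as the list of lines used for testing.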

    Since Twitter forbids publicly sharing the text of tweets, retrieving it requires a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
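    Hydration tools take a plain list of IDs and look them up in batches. The sketch below shows only that preparatory step: reading an ID file, dropping duplicates and malformed lines, and splitting the IDs into chunks of 100, the per-request limit of the Twitter API v1.1 `statuses/lookup` and v2 tweet lookup endpoints. The API call itself is omitted, and the function names are hypothetical.

```python
def read_ids(lines):
    """Collect unique numeric tweet IDs as strings, preserving first-seen order."""
    seen = set()
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit() and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


def batches(ids, size=100):
    """Split IDs into lookup-sized chunks (100 IDs per request by default)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

    With Twarc, the whole process is wrapped in its `hydrate` command (for example `twarc2 hydrate ids.txt tweets.jsonl`).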

  8. X/Twitter: number of worldwide users 2019-2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2022
    Area covered
    Worldwide
    Description

    As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.

  9. Twitter Key Statistics

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Twitter Key Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the key Twitter user statistics that you need to know.

  10. #metoo Digital Media Collection - Twitter Dataset

    • dataverse.harvard.edu
    Updated Mar 29, 2023
    + more versions
    Cite
    Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub (2023). #metoo Digital Media Collection - Twitter Dataset [Dataset]. http://doi.org/10.7910/DVN/2SRSKJ
    Dataset updated
    Mar 29, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is version 2 of this dataset; the previous version was published on June 30, 2020. This dataset contains the tweet IDs of 39,373,774 tweets, which are part of the Schlesinger Library #metoo Digital Media Collection. This second version represents the full set of tweets collected throughout the project: tweets range from October 15, 2017 to December 31, 2022, whereas the previous version extended to March 31, 2020. Tweets between October 15, 2017 and December 10, 2018 were licensed from Twitter's Historical PowerTrack and received through GNIP. Tweets after December 10, 2018 were collected weekly from the Twitter API through Social Feed Manager, using the POST statuses/filter method of the Twitter Stream API.

    The following list of 76 terms includes the hashtags used to collect data for this dataset: #metoo, #timesup, #metoostem, #sciencetoo, #metoophd, #shittymediamen, #churchtoo, #ustoo, #metooMVMT, #ARmetoo, #TimesUpAR, #metooSociology, #metooSexScience, #timesupAcademia, #metooMedicine, #MyCampusToo, #howiwillchange, #iwill, #believewomen, #GoTeal, #BelieveChristine, #IStandWithDrFord, #IStandWithChristineBlaseyFord, #believesurvivors, #whyididntreport, #himtoo, #istandwithbrett, #confirmkavanaguhnow, #metooMcdonalds, #metoomovement, #muteRKelly, #WeBelieveDrFord, #WeBelieveSurvivors, #HandsOffPantsOn, #MeAt14, #HeToo, #MeTooLiars, #metoolynchings, #metoohucksters, #metoohustle, #ItWasMe, #Ihave, #TimesUpTech, #GoogleWalkout, #mosquemetoo, #faithandmetoo, #SilenceIsNotSpiritual, #HealMeToo, #TimesUpHarvard, #NoCarveOut, #TimesUpx2, #MeetingsToo, #metoonatsec, #healmetoo, #GamAni, #ShulToo, #harvardhearsyou, #metooarcheology, #TimesUpPayUp, #metooarcheology, #metooHBCU, #TimesUpHC, #aidtoo, #garmentmetoo, #mutemetoo, #mutetimesup, #metoopolisci, #copstoo, #TimesUpBiden, #MeTooNoMatterWho, #IBelieveTara, #BelieveAllWomen, #metoomilitary, #harvard38, #comaroff, and #harvardletter. The final four hashtags in this list were first crawled on February 10, 2022.

    Because of the size of the files, the list of identifiers is split into 41 files containing up to 1,000,000 IDs each. Per Twitter's Developer Policy, tweet IDs may be publicly shared for academic purposes; tweets may not. Therefore, this dataset only contains tweet IDs. To retrieve the tweets that are still available (not deleted by users), tools like Hydrator can be used. Subsets containing only the #metoo seed are also available as quarterly datasets.
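    The tweets in this collection (2017-2022) carry Snowflake IDs, which Twitter has used since late 2010 and which encode the creation time in the bits above the lowest 22. This makes it possible to group the IDs by date, for example into quarters, before any hydration. The sketch below relies on that commonly documented ID scheme rather than on anything in the dataset description; it does not apply to pre-2010 IDs, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Twitter's Snowflake epoch (2010-11-04T01:42:54.657Z), in milliseconds
TWITTER_EPOCH_MS = 1288834974657


def snowflake_time(tweet_id: int) -> datetime:
    """Recover the UTC creation time encoded in a Snowflake tweet ID."""
    # Bits above the lowest 22 (worker and sequence bits) hold milliseconds since the epoch
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def quarter(tweet_id: int) -> str:
    """Label an ID with the calendar quarter it was created in, e.g. '2018Q1'."""
    created = snowflake_time(tweet_id)
    return f"{created.year}Q{(created.month - 1) // 3 + 1}"
```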

  11. Twitter Users Broken Down By Age

    • searchlogistics.com
    Updated Apr 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Twitter Users Broken Down By Age [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the breakdown of Twitter users by age group.

  12. Gaining Historical and International Relations Insights from Social Media:...

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl (2023). Gaining Historical and International Relations Insights from Social Media: Spatio-Temporal Real-World News Analysis using Twitter. [Dataset]. http://doi.org/10.6084/m9.figshare.5092678.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of metadata related to 24,508 news events collected from Twitter, spanning from August 2013 to June 2015. The events encompass a total of 193,445,734 tweets produced by 26,127,624 different users. The files contain different aspects of the data:

    - components.tsv describes the events (components) of the dataset in 4 tab-separated columns: the component ID, the date of the event, the number of tweets, and a comma-separated set of keywords describing the event (at least 2).
    - componentlocation.tsv describes the locations where the events happened (“protagonist locations”). The columns correspond to an ID, the component ID, the names of the locations, the frequency (how many times that location was mentioned in the component), the country code, and six more non-relevant columns. Note that one component can span several rows, one per location mentioned for that component.
    - country_protagonized-events.csv contains the number of events each country is a protagonist of, in two comma-separated columns: the country code and the number of events (components) that country is a protagonist of.
    - country_tweets.csv contains the number of tweets each country has issued across all the events, in two comma-separated columns: the country code and the number of tweets that country has issued.
    - participation_data.txt contains a matrix of the number of tweets per country per event: one row per component ID and one column per country (plus one column for the component ID); each cell holds the number of tweets that country issued for that event.
    - similarities_no_reciproco_percentile.csv contains the similarity between co-protagonist countries. The columns are, in order: Country 1, the number of events Country 1 is a protagonist of, Country 2, the number of events Country 2 is a protagonist of, the Jaccard similarity between the two countries (where each country is represented by the set of component IDs it is a protagonist of), and the percentile of that similarity value (ranging from 0 to 1).
    - users_events_distinct.txt contains the number of unique users participating in each event, in tab-separated columns: the component ID, the number of distinct users for that event, and the number of distinct news sources for that event.
    - countries.txt maps country codes to country names, separated by a space.
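    The similarity column in similarities_no_reciproco_percentile.csv is a Jaccard similarity between sets of component IDs, |A ∩ B| / |A ∪ B|. A sketch of how such values could be recomputed from componentlocation.tsv, assuming tab-separated rows without a header and taking the component ID and country code from the second and fifth columns as listed above; the percentile column is not reproduced, and the helper names are hypothetical.

```python
import csv
from collections import defaultdict
from itertools import combinations


def read_component_locations(lines):
    """Yield (component_id, country_code) pairs from componentlocation.tsv rows."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 5:
            yield row[1], row[4]


def protagonist_sets(pairs):
    """Map each country code to the set of component IDs it is a protagonist of."""
    sets = defaultdict(set)
    for component_id, country in pairs:
        sets[country].add(component_id)
    return sets


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarities(sets):
    """Jaccard similarity for every pair of countries, keyed by sorted country codes."""
    return {
        (c1, c2): jaccard(sets[c1], sets[c2])
        for c1, c2 in combinations(sorted(sets), 2)
    }
```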

  13. Twitter Stock Market Data 2014 - 2022

    • kaggle.com
    zip
    Updated Nov 7, 2022
    Cite
    Amritha R J (2022). Twitter Stock Market Data 2014 - 2022 [Dataset]. https://www.kaggle.com/datasets/amritharj/twitter-stock-market-data-since-2014
    Available download formats: zip (43,631 bytes)
    Dataset updated
    Nov 7, 2022
    Authors
    Amritha R J
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset contains daily historical stock market data for Twitter since 2014, based on Yahoo! Finance. The data can be used to analyze the rise and fall of Twitter's stock price each year, and it is well suited to exploratory data analysis and visualization.
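    Year-by-year movement can be computed directly from the daily prices. A minimal sketch, assuming a Yahoo! Finance-style CSV with `Date` (YYYY-MM-DD) and `Close` columns sorted by date; the dataset page does not list its columns, so these names are assumptions. `yearly_change` is a hypothetical helper that reports the percent change between the first and last closing price of each calendar year.

```python
import csv


def yearly_change(csv_lines, price_column="Close"):
    """Percent change from the first to the last trading day of each year.

    Assumes a 'Date' column in YYYY-MM-DD format with rows sorted ascending.
    """
    first, last = {}, {}
    for row in csv.DictReader(csv_lines):
        year = row["Date"][:4]
        price = float(row[price_column])
        first.setdefault(year, price)  # first price seen for the year
        last[year] = price             # overwritten until the year's last row
    return {year: (last[year] / first[year] - 1) * 100 for year in first}
```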

  14. X/Twitter: platform manipulation and spam actions H2 2024

    • statista.com
    Updated Oct 16, 2024
    Cite
    Statista Research Department (2024). X/Twitter: platform manipulation and spam actions H2 2024 [Dataset]. https://www.statista.com/topics/737/twitter/
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    Between July and December 2024, over 335 million accounts on X (formerly Twitter) were suspended for reasons of spam or platform manipulation. User-informed labels were added to 66 million posts after being reported for spam.

  15. 3805 Tweet IDs from User 25073877 [Thu Feb 25 16:35:12 +0000 2016 to Mon Apr...

    • city.figshare.com
    txt
    Updated Apr 3, 2017
    Cite
    Ernesto Priego (2017). 3805 Tweet IDs from User 25073877 [Thu Feb 25 16:35:12 +0000 2016 to Mon Apr 03 12:51:01 +0000 2017] [Dataset]. http://doi.org/10.6084/m9.figshare.4811284.v1
    Available download formats: txt
    Dataset updated
    Apr 3, 2017
    Dataset provided by
    City, University of London
    Authors
    Ernesto Priego
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a CSV file containing the Tweet IDs of 3,805 Tweets from user ID 25073877 posted publicly between Thursday February 25 2016 16:35:12 +0000 and Monday April 03 2017 12:51:01 +0000. This file does not include the Tweets' texts or URLs. The columns in the file are: id_str, from_user_id_str, created_at, time, source, user_followers_count, user_friends_count.

    Motivations to Share this Data

    Archived Tweets can provide interesting insights for the study of the contemporary history of media, politics, diplomacy, etc. The queried account is a public account widely agreed to be of exceptional national and international public interest. Though they provide public access to tweeted content in real time, the Twitter Web and mobile clients are not suited to proper Tweet corpus analysis. For anyone researching social media, access to the data is absolutely essential in order to perform, review and reproduce studies. Archiving Tweets of public interest due to their historic significance is a means both to preserve and to enable reproducible study of this form of rapid online communication, which could otherwise very likely become unretrievable as time passes. Due to Twitter's current business model and API limits, collecting in real time is to date the only relatively reliable method to archive Tweets at a small scale.

    Methodology and Limitations

    The Tweets contained in this file were collected by Ernesto Priego using a Python script. The data collection search query was from:realdonaldtrump. A trigger was scheduled to collect automatically every hour. The original data harvest was refined to delete duplicates, to comply with Twitter's Terms and Conditions, and to sort the data in chronological order. Duplication of data due to the automated collection is possible, so further data refining might be required. The file may not contain data from Tweets deleted by the queried user account immediately after original publication.
Both research and experience show that the Twitter search API is not 100% reliable. (Gonzalez-Bailon, Sandra, et al. 2012).Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet posted by the queried account during the indicated period. This file dataset is shared for archival, comparative and indicative educational research purposes only. The content included is from a public Twitter account and was obtained from the Twitter Search API. The shared data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.The original Tweets, their contents and associated metadata were published openly on the Web from the queried public account and are responsibility of the original authors. Original Tweets are likely to be copyright their individual authors but please check individually.No private personal information is shared in this dataset. As indicated above this dataset does not contain the text of the Tweets. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road.This dataset is shared to archive, document and encourage open educational research into political activity on Twitter.Other ConsiderationsAll Twitter users agree to Twitter's Privacy and data sharing policies. Social media research remains in its infancy and though work has been done to develop best practices there is yet no agreement on a series of grey areas relating to reseach methodologies including ad hoc social media specific research ethics guidelines for reproducible research. 
Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time. Reproducibility is considered here a key value for robust and trustworthy research. Different scholarly professional associations like the Modern Language Association recognise Tweets, datasets and other online and digital resources as citeable scholarly outputs.The data contained in the deposited file is otherwise available elsewhere through different methods.
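    The description above notes that duplicates from the automated hourly collection are possible and that further refining might be required. A minimal Python sketch of that refinement, assuming the file is comma-separated with a header row that includes the `id_str` column (the file path passed to `load_archive` is hypothetical): it keeps the first row for each ID and sorts by numeric ID. Using the ID as a proxy for posting time is an assumption based on Twitter's post-2010 Snowflake IDs, which grow roughly with creation time; it avoids depending on the exact `created_at` format, which the description does not specify.

```python
import csv
import io


def dedupe_and_sort(rows):
    """Keep the first row for each id_str and sort the result by numeric tweet ID."""
    seen = set()
    unique = []
    for row in rows:
        tweet_id = row["id_str"].strip()
        if tweet_id in seen:
            continue  # duplicate produced by overlapping hourly collections
        seen.add(tweet_id)
        unique.append(row)
    # Assumption: Snowflake IDs grow roughly with posting time,
    # so numeric order approximates chronological order.
    unique.sort(key=lambda row: int(row["id_str"]))
    return unique


def load_archive(path):
    """Read the CSV at a (hypothetical) path and return de-duplicated, ID-sorted rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return dedupe_and_sort(csv.DictReader(handle))


if __name__ == "__main__":
    # Tiny in-memory example instead of the real file.
    sample = "id_str,from_user_id_str\n30,25073877\n10,25073877\n30,25073877\n20,25073877\n"
    rows = dedupe_and_sort(csv.DictReader(io.StringIO(sample)))
    print([row["id_str"] for row in rows])  # ['10', '20', '30']
```

    Rows are kept whole, so the remaining columns (such as user_followers_count) travel with each ID.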

  16. w

    twitter.net.ag - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Updated Aug 1, 2024
    AllHeart Web Inc (2024). twitter.net.ag - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter.net.ag/
    Available download formats: csv
    Dataset updated
    Aug 1, 2024
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Nov 8, 2025
    Description

    Explore the historical Whois records related to twitter.net.ag (Domain). Get insights into ownership history and changes over time.

  17. w

    twitter-eventt22.com - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, twitter-eventt22.com - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter-eventt22.com/
    Available download formats: csv
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Nov 25, 2025
    Description

    Explore the historical Whois records related to twitter-eventt22.com (Domain). Get insights into ownership history and changes over time.

  18. s

    Twitter Users Broken Down By Gender

    • searchlogistics.com
    Updated Apr 1, 2025
    (2025). Twitter Users Broken Down By Gender [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The platform is male-dominated, with 68.1% of all Twitter users being male. Just 31.9% of Twitter users are female.

  19. s

    How Popular Is Twitter In The World?

    • searchlogistics.com
    Updated Apr 1, 2025
    (2025). How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 611 million monthly active users.

  20. w

    twitter-design.com - Historical whois Lookup

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, twitter-design.com - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/twitter-design.com/
    Available download formats: csv
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Nov 27, 2025
    Description

    Explore the historical Whois records related to twitter-design.com (Domain). Get insights into ownership history and changes over time.

Gayo-Avello, Daniel (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3833781

Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets)

Dataset updated
May 20, 2020
Dataset provided by
University of Oviedo
Authors
Gayo-Avello, Daniel
License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science of the University of Oviedo, for the sole purpose of non-commercial research, and it only includes tweet IDs.

The dataset contains tweet IDs for all the published tweets (in any language) between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter since its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

It covers several defining developments on Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Election, Obama's first inauguration speech and the 2009 Iranian election protests (one of the so-called Twitter Revolutions).

Finally, it contains tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French), so it should be possible, at least in theory, to analyze international events from different cultural perspectives.

The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets that were public during that period but do not appear in the dataset, and also that a substantial part of the tweets in the dataset have been deleted (or locked) since 2016.

To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and a confidence interval of ±0.5) are provided.
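The description does not say how the sample sizes were computed. One common choice, used here purely as an assumption, is Cochran's formula for estimating a proportion, n0 = z²·p(1−p)/e², followed by the finite-population correction n = n0 / (1 + (n0 − 1)/N). This Python sketch reads "99% confidence level and 0.5 confidence interval" as z ≈ 2.5758 and a margin e of ±0.5 percentage points (0.005), with the conservative p = 0.5.

```python
import math

Z_99 = 2.5758  # two-sided z-score for a 99% confidence level


def cochran_sample_size(population, margin=0.005, z=Z_99, p=0.5):
    """Sample size for estimating a proportion (Cochran) with finite-population correction.

    margin=0.005 is an assumed reading of "0.5 confidence interval" as ±0.5 percentage points.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)       # shrink it for a finite population
    return math.ceil(n)


if __name__ == "__main__":
    # Interval sizes taken from the 90-day breakdown given in this description.
    for size in (5_512, 852_463, 589_116_341):
        print(size, cochran_sample_size(size))
```

For a population of about 1.5 billion the correction is negligible and the size stays close to n0 (roughly 66,000 IDs); for the smallest 90-day interval it shrinks substantially.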

In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three-year period covered by the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).

June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).

September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).

December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).

March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).

June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).

September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).

December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).

March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).

June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).

September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).

December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).

March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).

June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).
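Because the intervals differ in size by about five orders of magnitude, the overall availability is dominated by the 2009 intervals. A short Python sketch that weights each ratio listed above by its tweet count; the result is only an approximation (each ratio carries its own ±0.5 error) and can be compared with the overall 85.5% ±0.5 figure from the full-dataset sample.

```python
# (ratio of publicly available tweets, number of tweets) per interval,
# copied from the list above, in chronological order.
INTERVALS = [
    (0.884, 5_512),
    (0.827, 14_820),
    (0.857, 107_975),
    (0.882, 852_463),
    (0.896, 6_341_665),
    (0.886, 11_171_090),
    (0.879, 15_545_532),
    (0.890, 23_164_663),
    (0.665, 56_416_772),
    (0.783, 62_868_189),
    (0.873, 89_947_498),
    (0.869, 169_762_425),
    (0.864, 474_581_170),
    (0.857, 589_116_341),
]


def weighted_availability(intervals):
    """Return (total tweets, count-weighted share of tweets still available)."""
    total = sum(count for _, count in intervals)
    available = sum(ratio * count for ratio, count in intervals)
    return total, available / total


if __name__ == "__main__":
    total, share = weighted_availability(INTERVALS)
    print(f"{total:,} tweets, about {share:.1%} estimated available")
```

The summed counts also give a quick consistency check against the "about 1.5 billion tweets" stated for the whole dataset.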

The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

At the moment of cleaning all the data to publish this dataset, there seemed to be a gap from April 1, 2008 to July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply produced all those that were created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

In other words, what you see in that period (April to July 2008) is not a huge number of deleted tweets but the combination of deleted and non-public tweets (whose IDs should not be in the dataset, for performance reasons when rehydrating it).
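The regeneration described above relies on tweet IDs from that era being assigned in sequence. The author's actual code is in generate-ids.m and is not reproduced here; the following is an independent Python sketch under two assumptions: IDs in the gap are consecutive integers, and both the first ID of the gap and the first ID after it are known (for instance from date-tweet-id.tsv). The bounds in the example are hypothetical placeholders. A generator keeps memory use flat, since such a gap can span millions of IDs.

```python
def regenerate_ids(first_id, stop_id):
    """Yield every integer ID in [first_id, stop_id).

    Assumption: IDs in the gap were assigned consecutively, so every integer
    in the range is a candidate tweet ID (public, private, or deleted).
    """
    if stop_id < first_id:
        raise ValueError("stop_id must not be smaller than first_id")
    yield from range(first_id, stop_id)


def write_ids(path, first_id, stop_id):
    """Write one candidate ID per line without building the whole list in memory."""
    with open(path, "w", encoding="utf-8") as handle:
        for tweet_id in regenerate_ids(first_id, stop_id):
            handle.write(f"{tweet_id}\n")


if __name__ == "__main__":
    # Hypothetical bounds, for illustration only.
    print(list(regenerate_ids(1000, 1005)))  # [1000, 1001, 1002, 1003, 1004]
```

Since many of these IDs belong to private or deleted tweets, a rehydration pass over them will return only a fraction of the candidates.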

Additionally, given that not everybody will need the whole period, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.
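To restrict work to a sub-period, the earliest ID on the first day and the earliest ID after the last day bound the range of tweet IDs to keep. A Python sketch, assuming (the layout is not documented here) that date-tweet-id.tsv has one `YYYY-MM-DD<TAB>tweet_id` pair per line, possibly with a header row, and that IDs increase with date; the function names are hypothetical.

```python
import csv
from datetime import date


def load_first_ids(lines):
    """Map each date to its earliest tweet ID from two-column TSV lines (assumed layout)."""
    first_ids = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue  # skip a header row or blank lines
        first_ids[date.fromisoformat(row[0].strip())] = int(row[1])
    return first_ids


def id_bounds(first_ids, start, end):
    """Return (lo, hi) such that lo <= tweet_id < hi covers start..end inclusive.

    lo is the earliest ID on or after `start`; hi is the earliest ID after `end`,
    or None when the table has no later date. Assumes IDs increase with date.
    """
    lo_candidates = [tid for day, tid in first_ids.items() if day >= start]
    hi_candidates = [tid for day, tid in first_ids.items() if day > end]
    if not lo_candidates:
        raise ValueError("no date on or after start in the table")
    lo = min(lo_candidates)
    hi = min(hi_candidates) if hi_candidates else None
    return lo, hi


def in_period(tweet_id, lo, hi):
    """True if tweet_id falls inside the half-open range [lo, hi)."""
    return tweet_id >= lo and (hi is None or tweet_id < hi)
```

Usage would be along the lines of `with open("date-tweet-id.tsv", newline="") as f: table = load_first_ids(f)`, then filtering the dataset's IDs with `in_period`. Looking up the next available date rather than exactly `end + 1 day` keeps the bounds correct even if some dates are missing from the table.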

For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

If you need to contact me, you can find me as @PFCdgayo on Twitter.
