100+ datasets found
  1. X/Twitter: Countries with the largest audience 2025

    • statista.com
    Cite
    Statista, X/Twitter: Countries with the largest audience 2025 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2025
    Area covered
    Worldwide
    Description

    As of October 2025, social network X (formerly known as Twitter) was most popular in the United States, with an audience reach of approximately 99.04 million users. Japan ranked second, recording more than 71 million users on the platform.

    Global Twitter usage

    As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber, and former U.S. president Barack Obama.

    X/Twitter and politics

    X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. During an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.

  2. Twitter Geospatial Data

    • kaggle.com
    zip
    Updated Apr 2, 2025
    Cite
    Sahitya Setu (2025). Twitter Geospatial Data [Dataset]. https://www.kaggle.com/datasets/sahityasetu/twitter-geospatial-data
    Available download formats: zip (187,153,686 bytes)
    Dataset updated
    Apr 2, 2025
    Authors
    Sahitya Setu
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Information

    Note that this is the full week of data sampled from Twitter. The count of 10,005,301 reported in the introductory paper (Helwig et al., 2015) refers to the weekday portion of the data (Monday through Friday). Removing Saturday (Jan 12, 2013) and Sunday (Jan 13, 2013) yields the Monday-through-Friday portion analyzed in the paper.

    Has Missing Values? No

    Dataset Characteristics: Multivariate, Time-Series, Spatiotemporal

    Subject Area: Social Science

    Associated Tasks: Classification, Regression, Clustering

    Variable Information

    This dataset contains geospatial and timestamp data for one week's worth of Tweets in the contiguous United States, created between January 12, 2013 and January 18, 2013. The exact location (longitude and latitude) and timestamp (hour, minute, second) of each Tweet were recorded. All timestamps are reported in Central Standard Time in the format "YYYY-MM-DD HH:MM:SS". The geo-tag information was used to assign each Tweet to one of the four standard time zones (for details, see Helwig et al., 2015). The data were collected by the CyberGIS Center for Advanced Digital and Spatial Studies at the University of Illinois at Urbana-Champaign. Details on the data preprocessing and analysis can be found in Helwig et al. (2015).

    Class Labels

    1. longitude: exact longitude coordinate of the Tweet (real valued)
    2. latitude: exact latitude coordinate of the Tweet (real valued)
    3. timestamp: e.g., 20130112000000 = 2013-01-12 00:00:00 CST (integer)
    4. timezone: 1 = Eastern, 2 = Central, 3 = Mountain, 4 = Pacific (integer)
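    The integer timestamp and the weekday note lend themselves to a short preprocessing step. The following is a minimal Python sketch, assuming each row has already been loaded as a tuple in the column order listed above (longitude, latitude, timestamp, timezone). The layout of the files inside the zip is not described here, so loading is omitted; the helper names and sample rows are hypothetical.

```python
from datetime import datetime

# Time-zone codes as documented for the "timezone" column.
TIMEZONES = {1: "Eastern", 2: "Central", 3: "Mountain", 4: "Pacific"}


def parse_timestamp(value: int) -> datetime:
    """Turn an integer such as 20130112000000 into a naive datetime (CST)."""
    return datetime.strptime(str(value), "%Y%m%d%H%M%S")


def weekday_rows(rows):
    """Keep Monday-Friday rows only, i.e. drop Saturday Jan 12 and Sunday Jan 13, 2013."""
    return [row for row in rows if parse_timestamp(row[2]).weekday() < 5]


# Hypothetical sample rows: (longitude, latitude, timestamp, timezone)
rows = [
    (-87.63, 41.88, 20130112083000, 2),   # Saturday
    (-74.01, 40.71, 20130114120000, 1),   # Monday
    (-122.42, 37.77, 20130118235959, 4),  # Friday
]
kept = weekday_rows(rows)
print(len(kept), [TIMEZONES[row[3]] for row in kept])  # 2 ['Eastern', 'Pacific']
```

    Because datetime.weekday() returns 5 and 6 for Saturday and Sunday, the filter keeps only the weekday portion that the dataset note describes.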

  3. X/Twitter: number of monthly active users 2010-2019

    • statista.com
    Cite
    Statista, X/Twitter: number of monthly active users 2010-2019 [Dataset]. https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    As of the first quarter of 2019, X/Twitter averaged 330 million monthly active users, a decline from its all-time high of 336 million MAU in the first quarter of 2018. As of the first quarter of 2019, the company switched its user reporting metric to monetizable daily active users (mDAU).

    X/Twitter

    X/Twitter is a social networking and microblogging service that enables registered users to read and post short messages called tweets. X/Twitter messages are limited to 280 characters, and users can also upload photos or short videos. Tweets are posted to a publicly available profile or can be sent as direct messages to other users. Part of the platform's appeal is that users can follow any other user with a public profile, which lets them interact with celebrities who regularly post on the site. Currently, the most-followed person on Twitter is singer Katy Perry, with more than 107 million followers. Twitter has also become an important communications channel for governments and heads of state: U.S. President Donald Trump was the most-followed world leader on Twitter, followed by Pope Francis and Indian Prime Minister Narendra Modi. Despite widespread use among the rich and famous, the decline in active users has not impressed investors, as the platform relies largely on delivering advertising to users to generate revenue. Twitter's revenue in 2018 amounted to three billion U.S. dollars, up from 2.44 billion in the preceding fiscal year. Twitter reported a positive annual result for the first time in 2018, generating 1.2 billion U.S. dollars in net income.

  4. tweet-per-sec.com Website Traffic, Ranking, Analytics [October 2025]

    • semrush.ebundletools.com
    • sr01.toolswala.net
    Updated Nov 12, 2025
    Cite
    Semrush (2025). tweet-per-sec.com Website Traffic, Ranking, Analytics [October 2025] [Dataset]. https://semrush.ebundletools.com/website/tweet-per-sec.com/overview/
    Dataset updated
    Nov 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Nov 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    tweet-per-sec.com is listed with no US rank value and 0 traffic. Categories: Online Services.

  5. X/Twitter: platform manipulation and spam actions H2 2024

    • statista.com
    Updated Oct 16, 2024
    Cite
    Statista Research Department (2024). X/Twitter: platform manipulation and spam actions H2 2024 [Dataset]. https://www.statista.com/topics/737/twitter/
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    Between July and December 2024, over 335 million accounts on X (formerly Twitter) were suspended for reasons of spam or platform manipulation. User-informed labels were added to 66 million posts after being reported for spam.

  6. Just Another Day on Twitter: A Complete 24 Hours of Twitter Data

    • search.gesis.org
    • datacatalogue.cessda.eu
    Updated Oct 15, 2022
    Cite
    Pfeffer, Jürgen (2022). Just Another Day on Twitter: A Complete 24 Hours of Twitter Data [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2516
    Dataset updated
    Oct 15, 2022
    Dataset provided by
    GESIS search
    GESIS, Köln
    Authors
    Pfeffer, Jürgen
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Description

    At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.
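    For a sense of scale, 375 million tweets in 24 hours corresponds to an average of about 4,340 tweets per second. A trivial Python check of that arithmetic (the helper name is illustrative, not part of the dataset):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def mean_tweets_per_second(total_tweets: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Average posting rate over the collection window."""
    return total_tweets / seconds


print(round(mean_tweets_per_second(375_000_000)))  # 4340
```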

  7. 2016 United States Presidential Election Tweet Ids

    • dataverse.harvard.edu
    Updated Dec 13, 2016
    Cite
    Justin Littman; Laura Wrubel; Daniel Kerchner (2016). 2016 United States Presidential Election Tweet Ids [Dataset]. http://doi.org/10.7910/DVN/PDI7IN
    Dataset updated
    Dec 13, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Justin Littman; Laura Wrubel; Daniel Kerchner
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset contains the tweet IDs of approximately 280 million tweets related to the 2016 United States presidential election. They were collected between July 13, 2016 and November 10, 2016 from the Twitter API using Social Feed Manager. The tweet IDs are split into 12 collections, each collected with either the GET statuses/user_timeline method of the Twitter REST API or the POST statuses/filter method of the Twitter Streaming API. The collections are:

    - Candidates and key election hashtags (Twitter filter): election-filter[1-6].txt
    - Democratic candidates (Twitter user timeline): democratic-candidate-timelines.txt
    - Democratic Convention (Twitter filter): democratic-convention-filter.txt
    - Democratic Party (Twitter user timeline): democratic-party-timelines.txt
    - Election Day (Twitter filter): election-day.txt
    - First presidential debate (Twitter filter): first-debate.txt
    - GOP Convention (Twitter filter): republican-convention-filter.txt
    - Republican candidates (Twitter user timeline): republican-candidate-timelines.txt
    - Republican Party (Twitter user timeline): republican-party-timelines.txt
    - Second presidential debate (Twitter filter): second-debate.txt
    - Third presidential debate (Twitter filter): third-debate.txt
    - Vice Presidential debate (Twitter filter): vp-debate.txt

    There is also a README.txt file for each collection with additional documentation on how it was collected.

    The GET statuses/lookup method supports retrieving the complete tweet for a tweet ID (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. When hydrating, be aware that:

    - Twitter limits hydration to 900 requests of 100 tweet IDs per 15-minute window per set of user credentials. This works out to 8,640,000 tweets per day, so hydrating the entire dataset will take about 32 days.
    - The Twitter API will not return tweets that have been deleted or that belong to accounts that have been suspended, deleted, or made private. You should expect a large number of these tweets to be unavailable.
    - There may be duplicate tweets across collections.
    - According to the Twitter documentation, duplicate tweets are possible for tweets collected from the Twitter filter stream.
    - For tweets collected from the Twitter filter stream, this is not a complete set of the tweets that match the filter. Gaps may exist because:
        - Twitter limits the number of tweets returned by the filter at any point in time.
        - Social Feed Manager stops and restarts the Twitter filter stream every 30 minutes.
        - In Social Feed Manager, collecting is turned off while a user is changing the collection criteria.
        - There were some operational issues (e.g., network interruptions) during the collection period.
    - Since some of the terms used to collect from the Twitter filter stream were broad (e.g., "election"), the collections may contain tweets about elections other than the U.S. presidential election, including state elections, local elections, or elections in other countries.

    Per Twitter's Developer Policy, tweet IDs may be publicly shared; tweets may not. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets. This work is supported by grant #NARDI-14-50017-14 from the National Historical Publications and Records Commission.
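    The rate-limit arithmetic above, and the need to deduplicate IDs that appear in more than one collection, can be sketched in Python. This is a minimal planning sketch only: it makes no API calls (tools such as Twarc or Hydrator do the actual hydration), the function names are hypothetical, and the default limits are the ones quoted above, which may not match the current API.

```python
def hydration_days(total_ids: int, requests_per_window: int = 900,
                   ids_per_request: int = 100, window_minutes: int = 15) -> float:
    """Days needed to hydrate total_ids under a per-window request limit."""
    windows_per_day = 24 * 60 // window_minutes  # 96 fifteen-minute windows
    ids_per_day = requests_per_window * ids_per_request * windows_per_day
    return total_ids / ids_per_day


def lookup_batches(tweet_ids, size: int = 100):
    """Drop duplicate IDs (collections may overlap) and yield batches of `size`."""
    unique = list(dict.fromkeys(tweet_ids))  # keeps first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


print(900 * 100 * 96)                                      # 8640000 tweets per day
print(round(hydration_days(280_000_000), 1))               # 32.4 days
print(list(lookup_batches(["7", "8", "7", "9"], size=2)))  # [['7', '8'], ['9']]
```

    Deduplicating before batching avoids spending scarce lookup requests on IDs that appear in several collections.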

  8. Sundanese Twitter Dataset emotions classification

    • kaggle.com
    zip
    Updated Apr 24, 2024
    Cite
    Rabie El Kharoua (2024). Sundanese Twitter Dataset emotions classification [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/sundanese-twitter-dataset
    Available download formats: zip (104,015 bytes)
    Dataset updated
    Apr 24, 2024
    Authors
    Rabie El Kharoua
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains tweets in Sundanese, the second-largest local language in Indonesia, and is used for emotion classification.

    Dataset Characteristics: Tabular

    Subject Area: Computer Science

    Associated Tasks: Classification

    Instances: 2510

    Dataset Information

    For what purpose was the dataset created?

    This dataset was created as a contribution to NLP research, particularly in Indonesia.

    Who funded the creation of the dataset?

    This dataset is self-funded

    What do the instances in this dataset represent?

    Tweets

    Are there recommended data splits?

    No

    Was there any data preprocessing performed?

    tokenization, stopword removal, stemming

    Has Missing Values?

    No

    Introductory Paper

    Title: Sundanese Twitter Dataset for Emotion Classification

    Authors: Oddy Virgantara Putra; Fathin Muhammad Wasmanson; Triana Harmini; Shoffin Nahwa Utama. 2020

    Journal: Published in Conference

    Link: https://ieeexplore.ieee.org/abstract/document/9297929

    Abstract of Introductory Paper

    Sundanese is the second-largest tribe in Indonesia and possesses many dialects. This has drawn the attention of many researchers seeking to analyze emotion, especially on social media. However, because Sundanese datasets are barely available, understanding Sundanese emotion is a challenging task. In this research, we propose a dataset for emotion classification of Sundanese text. The preprocessing includes case folding, stopword removal, stemming, tokenizing, and text representation. Prior to classification, we use term frequency-inverse document frequency (TF-IDF) for feature generation. We evaluated our dataset using k-fold cross-validation. Our experiments with the proposed method show effective results for machine learning classification. Furthermore, as far as we know, this is the first publicly available Sundanese emotion dataset.
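    The feature-generation and evaluation steps named in the abstract (TF-IDF features, k-fold cross-validation) can be illustrated without any library. This is a minimal Python sketch, not the authors' pipeline: the TF-IDF variant (relative term frequency times log(N/df)) is one common choice assumed here, documents are assumed to be already tokenized and non-empty, and the toy tokens are English placeholders rather than Sundanese text.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


docs = [["happy", "day"], ["sad", "day"], ["happy", "news"]]
vectors = tfidf(docs)
print(vectors[1]["sad"] > vectors[1]["day"])  # True: "sad" is rarer than "day"
print(list(k_fold_indices(5, 2)))  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
```

    In practice the instances would usually be shuffled (and possibly stratified by emotion label) before splitting; contiguous folds simply keep the example deterministic.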

    Cite

    Citation: Putra, Oddy Virgantara. (2021). Sundanese Twitter Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5MK8C.

    BibTex: @misc{misc_sundanese_twitter_dataset_695, author = {Putra,Oddy Virgantara}, title = {{Sundanese Twitter Dataset}}, year = {2021}, howpublished = {UCI Machine Learning Repository}, note = {{DOI}: https://doi.org/10.24432/C5MK8C} }

  9. Google Analytics & Twitter dataset from a movies, TV series and videogames website

    • portalcientificovalencia.univeuropea.com
    • figshare.com
    Updated 2024
    Cite
    Yeste, Víctor (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed3aea56d4af0485dc8?lang=en
    Dataset updated
    2024
    Authors
    Yeste, Víctor
    Description

    Author: Víctor Yeste, Universitat Politècnica de València.

    The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of content published in online media and to predict the selected success variables. Because data from two separate areas had to be integrated (web publishing, and the analysis of how articles and related topics are shared on Twitter), the author wrote programs that access both the Google Analytics v4 Reporting API and the Twitter Standard API, always respecting their usage limits.

    The website analyzed is hellofriki.com, an online media outlet whose primary aim is to meet the demand for information on topics that generate a large volume of news every day, published as news items as well as analyses, reports, interviews, and many other formats. All of this content falls under the site's sections on cinema, series, video games, literature, and comics.

    This dataset contributed to the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Design of a cybermetric methodology for calculating success to optimize web content] [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

    Data were obtained for each breaking-news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database divided into the following tables:

    tesis_followers: list of user IDs of the media account's followers.

    tesis_hometimeline: tweets posted by the media account that share breaking news from the website.
        status_id: tweet ID
        created_at: date of publication
        text: content of the tweet
        path: URL extracted after expanding the shortened URL in the text
        post_shared: WordPress ID of the article being shared
        retweet_count: number of retweets
        favorite_count: number of favorites

    tesis_hometimeline_other: tweets posted by the media account that do not share breaking news from the website (other types, such as automatic Facebook shares or custom tweets without a link to an article). Same fields as tesis_hometimeline.

    tesis_posts: data on articles published on the website and processed for analysis.
        stats_id: analysis ID
        post_id: article ID in WordPress
        post_date: article publication date in WordPress
        post_title: title of the article
        path: URL of the article on the media website
        tags: IDs of the WordPress tags related to the article
        uniquepageviews: unique page views
        entrancerate: entrance rate
        avgtimeonpage: average time on page
        exitrate: exit rate
        pageviewspersession: page views per session
        adsense_adunitsviewed: number of ads viewed by users
        adsense_viewableimpressionpercent: viewable ad impression ratio
        adsense_ctr: ad click-through ratio
        adsense_ecpm: estimated ad revenue per 1,000 page views

    tesis_stats: data from a particular analysis, performed for each published breaking-news item. Fields with statistical values can be computed from the data in the other tables, but totals and averages are saved for faster and easier further processing.
        id: ID of the analysis
        phase: phase of the thesis in which the analysis was carried out (currently all are 1)
        time: "0" if at the time of publication, "1" if 14 days later
        start_date: date and time of the measurement on the day of publication
        end_date: date and time of the measurement made 14 days later
        main_post_id: ID of the published article to be analyzed
        main_post_theme: main section of the published article to be analyzed
        superheroes_theme: "1" if about superheroes, "0" if not
        trailer_theme: "1" if a trailer, "0" if not
        name: empty field; a custom name can be added manually
        notes: empty field; personalized notes can be added manually, e.g., if a tag was removed manually for being considered too generic even though the editor added it
        num_articles: number of articles analyzed
        num_articles_with_traffic: number of analyzed articles with traffic (which are taken into account for traffic analysis)
        num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
        num_terms: number of terms analyzed
        uniquepageviews_total: total page views
        uniquepageviews_mean: average page views
        entrancerate_mean: average entrance rate
        avgtimeonpage_mean: average visit duration
        exitrate_mean: average exit rate
        pageviewspersession_mean: average page views per session
        total: total of ads viewed
        adsense_adunitsviewed_mean: average of ads viewed
        adsense_viewableimpressionpercent_mean: average viewable ad impression ratio
        adsense_ctr_mean: average ad click-through ratio
        adsense_ecpm_mean: estimated ad revenue per 1,000 page views
        Total: total income
        retweet_count_mean: average retweets
        favorite_count_total: total of favorites
        favorite_count_mean: average of favorites
        terms_ini_num_tweets: total tweets about the terms on the day of publication
        terms_ini_retweet_count_total: total retweets on the terms on the day of publication
        terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
        terms_ini_favorite_count_total: total favorites on the terms on the day of publication
        terms_ini_favorite_count_mean: average favorites on the terms on the day of publication
        terms_ini_followers_talking_rate: ratio of followers of the media's Twitter account who have recently published a tweet about the terms on the day of publication
        terms_ini_user_num_followers_mean: average followers of users who tweeted about the terms on the day of publication
        terms_ini_user_num_tweets_mean: average number of tweets published by users who tweeted about the terms on the day of publication
        terms_ini_user_age_mean: average age in days of users who tweeted about the terms on the day of publication
        terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets about the terms on the day of publication
        terms_end_num_tweets: total tweets about the terms 14 days after publication
        terms_end_retweet_count_total: total retweets on the terms 14 days after publication
        terms_end_retweet_count_mean: average retweets on the terms 14 days after publication
        terms_end_favorite_count_total: total favorites on the terms 14 days after publication
        terms_end_favorite_count_mean: average favorites on the terms 14 days after publication
        terms_end_followers_talking_rate: ratio of followers of the media's Twitter account who have recently published a tweet about the terms 14 days after publication
        terms_end_user_num_followers_mean: average followers of users who tweeted about the terms 14 days after publication
        terms_end_user_num_tweets_mean: average number of tweets published by users who tweeted about the terms 14 days after publication
        terms_end_user_age_mean: average age in days of users who tweeted about the terms 14 days after publication
        terms_end_ur_inclusion_rate: URL inclusion ratio of tweets about the terms 14 days after publication

    tesis_terms: data on the terms (tags) related to the processed articles.
        stats_id: analysis ID
        time: "0" if at the time of publication, "1" if 14 days later
        term_id: term (tag) ID in WordPress
        name: name of the term
        slug: URL of the term
        num_tweets: number of tweets
        retweet_count_total: total retweets
        retweet_count_mean: average retweets
        favorite_count_total: total of favorites
        favorite_count_mean: average of favorites
        followers_talking_rate: ratio of followers of the media's Twitter account who have recently published a tweet about the term
        user_num_followers_mean: average followers of users who tweeted about the term
        user_num_tweets_mean: average number of tweets published by users who tweeted about the term
        user_age_mean: average age in days of users who tweeted about the term
        url_inclusion_rate: URL inclusion ratio
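    Since the tesis_stats totals and averages can be recomputed from the other tables, two of them can be sketched in Python. This is a minimal sketch under stated assumptions: rows are plain dicts keyed by the field names above, an article is assumed to "have traffic" when uniquepageviews > 0, and followers_talking_rate is taken as the share of the account's followers found among users who tweeted about a term. The helper names and sample values are hypothetical.

```python
from statistics import mean


def traffic_stats(posts):
    """Aggregate uniquepageviews over articles that received traffic.

    Assumption: an article "has traffic" when uniquepageviews > 0.
    """
    with_traffic = [p for p in posts if p["uniquepageviews"] > 0]
    views = [p["uniquepageviews"] for p in with_traffic]
    return {
        "num_articles": len(posts),
        "num_articles_with_traffic": len(with_traffic),
        "uniquepageviews_total": sum(views),
        "uniquepageviews_mean": mean(views) if views else 0.0,
    }


def followers_talking_rate(follower_ids, talking_user_ids):
    """Share of the media account's followers who tweeted about a term."""
    followers = set(follower_ids)
    if not followers:
        return 0.0
    return len(followers & set(talking_user_ids)) / len(followers)


# Hypothetical tesis_posts rows
posts = [{"post_id": 1, "uniquepageviews": 120},
         {"post_id": 2, "uniquepageviews": 0},
         {"post_id": 3, "uniquepageviews": 80}]
print(traffic_stats(posts))
print(followers_talking_rate([10, 11, 12, 13], [11, 13, 99]))  # 0.5
```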

  10. Gaining Historical and International Relations Insights from Social Media: Spatio-Temporal Real-World News Analysis using Twitter

    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl (2023). Gaining Historical and International Relations Insights from Social Media: Spatio-Temporal Real-World News Analysis using Twitter. [Dataset]. http://doi.org/10.6084/m9.figshare.5092678.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Figshare (http://figshare.com/)
    Authors
    Mauricio Quezada; vpena@dcc.uchile.cl; bpoblete@dcc.uchile.cl; dparras@uc.cl
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset consists of metadata on 24,508 news events collected from Twitter between August 2013 and June 2015. The events encompass a total of 193,445,734 tweets produced by 26,127,624 distinct users. The files cover different aspects of the data:

    - components.tsv: description of the events (components), in 4 tab-separated columns: the component ID, the date of the event, the number of tweets, and a comma-separated set of keywords describing the event (at least 2).
    - componentlocation.tsv: the locations where the events happened ("protagonist locations"). The columns are an ID, the component ID, the location name, the frequency (how many times that location was mentioned in the component), the country code, and six further non-relevant columns. A component can span several rows, one per location mentioned for it.
    - country_protagonized-events.csv: the number of events each country is a protagonist of, in two comma-separated columns: the country code and the number of events (components).
    - country_tweets.csv: the number of tweets each country issued across all events, in two comma-separated columns: the country code and the number of tweets.
    - participation_data.txt: a matrix of tweets per country per event, with one row per component ID and one column per country (plus one column for the component ID); each cell is the number of tweets that country issued for that event.
    - similarities_no_reciproco_percentile.csv: the similarity between co-protagonist countries. The columns are, in order: Country 1, the number of events Country 1 is a protagonist of, Country 2, the number of events Country 2 is a protagonist of, the Jaccard similarity between the two countries (each country represented by the set of component IDs it is a protagonist of), and the percentile of that similarity value (ranging from 0 to 1).
    - users_events_distinct.txt: the number of unique users participating in each event, in tab-separated columns: the component ID, the number of distinct users for that event, and the number of distinct news sources for that event.
    - countries.txt: the mapping between country code and country name, separated by a space.
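    The similarity column can be reproduced from per-country sets of component IDs. A minimal Python sketch, assuming those sets have already been built (for example, by grouping componentlocation.tsv rows by country code). The sample country codes and IDs are hypothetical, and the percentile definition used here (fraction of similarity values less than or equal to the given one) is an assumption, since the exact definition is not stated.

```python
from bisect import bisect_right
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets of component IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def percentile_rank(value, all_values):
    """Fraction of values <= value, in [0, 1] (one possible definition)."""
    ordered = sorted(all_values)
    return bisect_right(ordered, value) / len(ordered)


# Hypothetical: country code -> set of component IDs it is a protagonist of
events = {"CL": {1, 2, 3}, "AR": {2, 3, 4}, "US": {5}}

pairs = {(c1, c2): jaccard(events[c1], events[c2])
         for c1, c2 in combinations(sorted(events), 2)}
all_sims = list(pairs.values())
for (c1, c2), sim in pairs.items():
    print(c1, c2, round(sim, 2), round(percentile_rank(sim, all_sims), 2))
```

    Since the file covers co-protagonist countries, a faithful reproduction might keep only pairs that share at least one event before computing percentiles.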

  11. Twitter: number of tweets by board members 2019

    • statista.com
    Cite
    Statista, Twitter: number of tweets by board members 2019 [Dataset]. https://www.statista.com/statistics/151247/twitter-number-of-tweets-by-board-members/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    This statistic presents the number of tweets by Twitter board members as of July 2019. As of the measured month, Twitter founder Jack Dorsey had posted 25,900 tweets. Second-ranked Martha Lane Fox was a similarly prolific Twitter user with over 23,000 tweets.

  12. X/Twitter: global enforcement actions H2 2024, by violation

    • statista.com
    Updated Oct 16, 2024
    Cite
    Statista Research Department (2024). X/Twitter: global enforcement actions H2 2024, by violation [Dataset]. https://www.statista.com/topics/737/twitter/
    Dataset updated
    Oct 16, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    Between July and December 2024, there were over four million account suspensions on X (formerly Twitter), and over 10.1 million posts were either removed or labeled. Of these, 927,892 account suspensions and 1.49 million labeled or removed posts were for reasons of abuse or harassment.

  13. Twitter cascade dataset

    • figshare.com
    • researchdata.smu.edu.sg
    pdf
    Updated Mar 12, 2021
    Cite
    Living Analytics Research Centre (2021). Twitter cascade dataset [Dataset]. http://doi.org/10.25440/smu.12062709.v1
    Available download formats: pdf
    Dataset updated
    Mar 12, 2021
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    Living Analytics Research Centre
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    This dataset comprises a set of information cascades generated by Singapore Twitter users, where a cascade is defined as a set of tweets about the same topic. The dataset was collected via the Twitter REST and streaming APIs as follows. Starting from popular seed users (i.e., users with many followers), we crawled their follow, retweet, and user-mention links. We then added those followers/followees, retweet sources, and mentioned users who state Singapore as their profile location, giving a total of 184,794 Twitter user accounts. Tweets from these users were then crawled from 1 April to 31 August 2012, yielding 32,479,134 tweets in all.

    To identify cascades, we extracted all URLs and hashtags from these tweets and treated each URL or hashtag as the identity of a cascade: all tweets containing the same URL (or the same hashtag) form one cascade. Mathematically, a cascade is represented as a set of user-timestamp pairs. Figure 1 provides an example: cascade C = {<u1, t1>, <u2, t2>, <u1, t3>, <u3, t4>, <u4, t5>}. For evaluation, the dataset was split into two parts: the first four months of data for training and the last month for testing. Table 1 summarizes the basic count statistics of the dataset.

    Each line in each file represents a cascade. The first term in each line is a hashtag or URL; the second term is a list of user-timestamp pairs. Due to privacy concerns, all user identities are anonymized.
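    The time-based train/test split can be sketched per cascade. A minimal Python sketch, assuming timestamps have already been parsed into datetime objects (their on-disk format is not described here) and that the split falls at 1 August 2012, between the four training months and the final test month. Splitting each cascade individually, the cutoff constant, and the sample cascade are all hypothetical choices for illustration.

```python
from datetime import datetime

# Assumed boundary between the training months (Apr-Jul 2012) and the test month (Aug 2012).
CUTOFF = datetime(2012, 8, 1)


def split_cascade(pairs, cutoff=CUTOFF):
    """Split one cascade's (user, timestamp) pairs into train/test by time."""
    ordered = sorted(pairs, key=lambda pair: pair[1])
    train = [pair for pair in ordered if pair[1] < cutoff]
    test = [pair for pair in ordered if pair[1] >= cutoff]
    return train, test


# Hypothetical cascade C = {<u1, t1>, <u2, t2>, <u1, t3>} for one hashtag
cascade = [("u1", datetime(2012, 7, 30, 9, 0)),
           ("u2", datetime(2012, 8, 2, 14, 5)),
           ("u1", datetime(2012, 7, 31, 18, 30))]
train, test = split_cascade(cascade)
print([u for u, _ in train], [u for u, _ in test])  # ['u1', 'u1'] ['u2']
```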

  14. Two datasets of tweets.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Su Yeon Han; Ming-Hsiang Tsou; Keith C. Clarke (2023). Two datasets of tweets. [Dataset]. http://doi.org/10.1371/journal.pone.0132464.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Su Yeon Han; Ming-Hsiang Tsou; Keith C. Clarke
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Every tweet in the first dataset includes the name of at least one large city, in the U.S. or elsewhere. The second dataset does not include names of cities outside the U.S., but contains the names of small, mid-sized, and large cities in the U.S.

  15. Dataset used in the paper: "Scaling laws and dynamics of hashtags on...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1 more
    Updated Apr 27, 2020
    Cite
    Hongjia H. Chen; Tristram J. Alexander; Diego F. M. Oliveira; Eduardo G. Altmann (2020). Dataset used in the paper: "Scaling laws and dynamics of hashtags on Twitter" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3673743
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    The University of Auckland
    The University of Sydney
    US Army Research Laboratory & the Rensselaer Polytechnic Institute
    Authors
    Hongjia H. Chen; Tristram J. Alexander; Diego F. M. Oliveira; Eduardo G. Altmann
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used in the manuscript "Scaling laws and dynamics of hashtags on Twitter".

    The Twitter data was obtained from a sample of 10% of all public tweets, provided by the Twitter streaming application programming interface. We extracted the hashtags from each tweet and counted how many times they were used in different time intervals. Time intervals of three different lengths were used: days, hours, and minutes. The tweets were published between November 1st 2015 and November 30th 2016, but not all time intervals between these dates are available.
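    The per-interval counting described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes tweets are available as hypothetical (Unix timestamp, text) pairs, uses a simplified `#\w+` hashtag pattern, and keeps hashtags case-sensitive because the description does not say whether they were normalized. Interval labels are formatted in GMT (UTC), matching the file naming described below.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

# Simplified hashtag pattern; the paper's exact extraction rule is not stated here.
HASHTAG_RE = re.compile(r"#(\w+)")

# strftime labels for the three interval lengths, in GMT (UTC).
INTERVAL_FORMATS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}


def count_hashtags(tweets: Iterable[Tuple[int, str]], interval: str) -> Dict[str, Counter]:
    """Count hashtag uses per time interval from (unix_timestamp, text) pairs."""
    fmt = INTERVAL_FORMATS[interval]
    buckets: Dict[str, Counter] = defaultdict(Counter)
    for ts, text in tweets:
        label = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
        # Every occurrence is counted; no case folding is applied.
        buckets[label].update(HASHTAG_RE.findall(text))
    return dict(buckets)
```

    Switching `interval` between "day", "hour", and "minute" changes only the bucket label, so the three granularities come from one pass-compatible function.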

    Each of the four files in this dataset is a tar archive of one folder, and each folder contains gzip-compressed .csv files. The contents of the .csv files in each folder are:

    hashtags_frequency_day.tar: counts of hashtags in each day. The name of each file in the folder indicates the date (GMT). The entries in each file are the hashtag and its count in the interval.

    hashtags_frequency_hour.tar: counts of hashtags in each hour. The name of each file in the folder indicates the date (GMT). The entries in each file are the hashtag and its count in the interval.

    hashtags_frequency_minutes.tar: counts of hashtags in each minute. The name of each file in the folder indicates the date (GMT; only a fraction of all days is available). The entries in each file are the hashtag and its count in the interval.

    number_of_tweets.tar: counts of the number of tweets in each minute. The name of each file in the folder indicates the day. The entries in each file are the minute of the day (GMT) and the count of tweets in our dataset.
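    Reading one of these archives needs only the Python standard library. The sketch below sums hashtag counts across every gzip-compressed CSV inside a tar file. It assumes each CSV row is `hashtag,count` with no header row, which the description implies but does not state explicitly; the function name `hashtag_totals` is hypothetical.

```python
import csv
import gzip
import tarfile
from collections import Counter


def hashtag_totals(tar_path: str) -> Counter:
    """Sum hashtag counts over every gzip-compressed CSV inside one tar archive.

    Assumes rows of the form `hashtag,count` with no header row.
    """
    totals: Counter = Counter()
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip the folder entry itself
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # gzip.open accepts an already-open binary file object.
            with gzip.open(raw, mode="rt", encoding="utf-8", newline="") as text:
                for row in csv.reader(text):
                    if len(row) >= 2:
                        totals[row[0]] += int(row[1])
    return totals
```

    For example, `hashtag_totals("hashtags_frequency_day.tar").most_common(10)` would list the ten hashtags with the highest summed daily counts, assuming the row layout above holds.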

  16. Tweets-second VCC

    • splitgraph.com
    Updated Jul 5, 2017
    Cite
    data-act-gov-au (2017). Tweets-second VCC [Dataset]. https://www.splitgraph.com/data-act-gov-au/tweetssecond-vcc-a53h-ua8z
    Available download formats: json, application/openapi+json, application/vnd.splitgraph.image
    Dataset updated
    Jul 5, 2017
    Authors
    data-act-gov-au
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter analysis of the second ACT Virtual Community Cabinet, held on 30 August 2011.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.

  17. Covid-19 Twitter Dataset

    • kaggle.com
    zip
    Updated Mar 13, 2023
    Cite
    Arunava Kr. Chakraborty (2023). Covid-19 Twitter Dataset [Dataset]. https://www.kaggle.com/arunavakrchakraborty/covid19-twitter-dataset
    Available download formats: zip (51063255 bytes)
    Dataset updated
    Mar 13, 2023
    Authors
    Arunava Kr. Chakraborty
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Data Collection

    I streamed live tweets from Twitter after the WHO declared Covid-19 a pandemic. Since the pandemic has affected the entire world, I collected worldwide Covid-19-related English tweets at a rate of almost 10k per day in three phases: April to June 2020, August to October 2020, and April to June 2021. The first-phase dataset contains about 235k tweets collected from 19 April to 20 June 2020. After a one-month gap I resumed collecting tweets, as the pandemic was then spreading with fatal intensity, and collected almost 320k tweets from 20 August to 20 October 2020 for the second-phase dataset. Finally, after a gap of six months, I collected almost 489k tweets from 26 April to 27 June 2021 for the third-phase dataset.

    Content

    The datasets I developed contain the key attributes of the collected tweets. The main attributes of all three datasets are:

    • Tweet ID
    • Creation Date & Time
    • Source Link
    • Original Tweet
    • Favorite Count
    • Retweet Count
    • Original Author
    • Hashtags
    • User Mentions
    • Place

    In total, I collected 235,240, 320,316, and 489,269 tweets for the first-, second-, and third-phase datasets, respectively, containing hashtagged keywords such as #covid-19, #coronavirus, #covid, #covaccine, #lockdown, #homequarantine, #quarantinecenter, #socialdistancing, #stayhome, and #staysafe. Here I present an overview of the collected dataset.

    Data Pre-Processing

    I pre-processed the collected data with a user-defined pre-processing function based on NLTK (Natural Language Toolkit, a Python library for NLP). First, it converts all tweets to lowercase. It then removes extra white space, numbers, special characters, non-ASCII characters, URLs, punctuation, and stopwords. Next, it converts every ‘covid’ token into ‘covid19’, because all numbers have already been removed. Finally, it applies stemming to reduce inflected words to their word stems.
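    The cleaning steps described above can be sketched with the standard library alone. This is not the author's function: the stopword set is a tiny hypothetical stand-in for NLTK's English list, "ASCII characters" is interpreted as non-ASCII characters (such as emoji), and the stemmer is passed in as a parameter because the description does not name the stemmer used. URLs are removed before punctuation so that their fragments do not survive as tokens.

```python
import re
from typing import Callable, Iterable, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# After lowercasing, anything that is not a-z or whitespace is dropped:
# digits, punctuation, special characters, and non-ASCII characters.
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Tiny illustrative stopword list; the original pipeline used NLTK's list.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for", "at"}


def preprocess(tweet: str,
               stopwords: Iterable[str] = STOPWORDS,
               stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Return cleaned, stemmed tokens for one tweet."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)         # remove URLs first
    text = NON_LETTER_RE.sub(" ", text)  # remove numbers and other non-letters
    stop = set(stopwords)
    # str.split() also collapses any runs of extra white space.
    tokens = [t for t in text.split() if t not in stop]
    # Numbers are gone, so "covid-19" has become "covid"; restore the tag.
    tokens = ["covid19" if t == "covid" else t for t in tokens]
    return [stem(t) for t in tokens]
```

    With NLTK installed, `PorterStemmer().stem` from `nltk.stem` could be passed as `stem`, and NLTK's own stopword list as `stopwords`; whether either matches the original setup is an assumption.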

    Sentiment Analysis

    I calculated the sentiment polarity of each cleaned and pre-processed tweet using the NLTK-based Sentiment Analyzer, obtaining the sentiment scores for the positive, negative, and neutral categories to calculate the compound sentiment score for each tweet. Based on the compound scores, I classified the tweets into three classes (Positive, Negative, and Neutral) and assigned each tweet a sentiment polarity rating using the following algorithm:

    Algorithm: Sentiment Classification of Tweets (compound, sentiment)

        for each tweet in the dataset:
            if tweet[compound] < 0:
                tweet[sentiment] = 0.0    # 0.0 for Negative tweets
            elif tweet[compound] > 0:
                tweet[sentiment] = 1.0    # 1.0 for Positive tweets
            else:
                tweet[sentiment] = 0.5    # 0.5 for Neutral tweets
        end
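    A runnable Python version of the classification algorithm above, as a sketch: it assumes each tweet is a dict that already holds a `compound` score, applies the same strict zero thresholds, and additionally returns how many tweets fall into each class (that summary is an addition, not part of the original algorithm).

```python
from collections import Counter
from typing import Dict, List

NEGATIVE, NEUTRAL, POSITIVE = 0.0, 0.5, 1.0


def sentiment_rating(compound: float) -> float:
    """Map a compound score to the rating used above (strict zero thresholds)."""
    if compound < 0:
        return NEGATIVE
    if compound > 0:
        return POSITIVE
    return NEUTRAL


def classify_tweets(tweets: List[Dict[str, float]]) -> Counter:
    """Add a 'sentiment' rating to each tweet dict in place and return class counts."""
    for tweet in tweets:
        tweet["sentiment"] = sentiment_rating(tweet["compound"])
    return Counter(tweet["sentiment"] for tweet in tweets)
```

    If the "NLTK-based Sentiment Analyzer" is NLTK's VADER implementation (an assumption), the input scores could come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in `nltk.sentiment`, after downloading the `vader_lexicon` resource.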

    Acknowledgements

    I wouldn't be here without the help of my project guide, Dr. Anup Kumar Kolya, Assistant Professor, Dept. of Computer Science and Engineering, RCCIIT, whose kind and valuable suggestions and excellent guidance gave me the best opportunity to prepare these datasets. If you owe any attributions or thanks, include him here along with any citations of past research.

    These datasets are part of the following publications:

    • Chakraborty, A. K., Das, D., & Kolya, A. K. (2023). Sentiment Analysis on Large-Scale Covid-19 Tweets using Hybrid Convolutional LSTM Based on Naïve Bayes Sentiment Modeling. ECTI Transactions on Computer and Information Technology (ECTI-CIT), 17(3), 343–357. https://doi.org/10.37936/ecti-cit.2023173.252549
    • Chakraborty, A. K., & Das, S. (2023). A comparative study of a novel approach with baseline attributes leading to sentiment analysis of Covid-19 tweets. In Elsevier eBooks (pp. 179–208). https://doi.org/10.1016/b978-0-32-390535-0.00013-6
    • Chakraborty, A. K., Das, S., & Kolya, A. K. (2021). Sentiment analysis of COVID-19 tweets using Evolutionary Classification-Based LSTM model. In Advances in intelligent systems and computing (pp. 75–86). https://doi.org/10.1007/978-981-16-1543-6_7

  18. HaterNet a system for detecting and analyzing hate speech in Twitter

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Lara Quijano-Sanchez; Juan Carlos Pereira Kohatsu; Federico Liberatore; Miguel Camacho-Collados (2020). HaterNet a system for detecting and analyzing hate speech in Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2592148
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    State Secretariat for Security Interior Ministry, Madrid, Spain
    Universidad Autonoma de Madrid
    Complutense University of Madrid
    Authors
    Lara Quijano-Sanchez; Juan Carlos Pereira Kohatsu; Federico Liberatore; Miguel Camacho-Collados
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of two corpora used in the paper "Detecting and analyzing hate speech in Twitter: HaterNet a system in the Spanish prevention of hate crime office". The first consists of 2 million tweets collected on random dates between February 2017 and December 2017. The second contains 6,000 tweets labeled, as described in the paper, according to whether or not they contain hate speech.

  19. X/Twitter: distribution of global audiences 2025, by age group

    • statista.com
    Updated Nov 25, 2025
    Cite
    Statista (2025). X/Twitter: distribution of global audiences 2025, by age group [Dataset]. https://www.statista.com/statistics/283119/age-distribution-of-global-twitter-users/
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, 37.5 percent of X’s (formerly Twitter) global audience was aged between 25 and 34 years. The second-largest age group on the platform was users aged between 18 and 24 years, with a share of 32.1 percent. Users under 18 years accounted for two percent of users, while those aged 50 or older accounted for roughly 7.3 percent.

    X is a male-dominated platform

    As of January 2024, more than 60 percent of X users were male. Although all mainstream social media platforms tend to have a slightly male-skewed audience, X stands out above Instagram, Snapchat, TikTok, and Facebook when it comes to user gender demographics. Overall, Pinterest is the only mainstream platform with a higher share of female users.

    X Blue for you

    Social media users can now often subscribe to their chosen online networks for a monthly fee. X Blue is a subscription service from X that gives users special benefits and features, including a blue verification mark, post editing, fewer ads, priority ranking in chats, and longer video uploads.

  20. 100 Days of Tweet IDs and Most Frequent Terms in Tweets from_user_id_str...

    • city.figshare.com
    xls
    Updated Apr 29, 2017
    Cite
    Ernesto Priego (2017). 100 Days of Tweet IDs and Most Frequent Terms in Tweets from_user_id_str 25073877 [Dataset]. http://doi.org/10.6084/m9.figshare.4955231.v1
    Available download formats: xls
    Dataset updated
    Apr 29, 2017
    Dataset provided by
    City, University of London
    Authors
    Ernesto Priego
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an Excel workbook containing two sheets. The first sheet contains 503 rows corresponding to 503 Tweet ID strings for from_user_id_str 25073877, with the following metadata: created_at, time, user_lang, in_reply_to_user_id_str, from_user_id_str, in_reply_to_status_id_str, source, user_followers_count, and user_friends_count. Tweet texts, URLs, and other metadata such as profile_image_url, status_url, and entities_str have not been included. An attempt was made to remove duplicate entries, but some duplicates may remain, so further data refining might be required before analysis.

    The second sheet contains 400 rows corresponding to the most frequent terms in the texts of the dataset's Tweets. The text analysis was performed with the Terms Tool from Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (2017). An edited English stop-word list was applied to remove Twitter-specific terms such as t.co, https, user names, etc. The analysed Tweets contained emojis and other special characters; due to character encoding, these appear in the terms list as character combinations.

    Motivations to Share this Data

    Archived Tweets can provide interesting insights for the study of the contemporary history of media, politics, diplomacy, etc. The queried account is a public account widely agreed to be of exceptional national and international public interest. Though they provide public access to tweeted content in real time, Twitter Web and mobile clients are not suited for appropriate Tweet corpus analysis. For anyone researching social media, access to the data is absolutely essential in order to perform, review, and reproduce studies. Archiving Tweets of public interest due to their historic significance is a means both to preserve and to enable reproducible study of this form of rapid online communication, which otherwise can very likely become unretrievable as time passes.
    Due to Twitter's current business model and API limits, collecting in real time is to date the only relatively reliable method to archive Tweets at a small scale. So far, Twitter data analysis and visualisation have been done without researchers providing access to the source data that would allow reproducibility. It is appreciated that an Excel workbook is far from ideal as a file format, but due to the small scale the intention is to make this data human-readable and available to researchers in a variety of non-technical fields.

    Methodology and Limitations

    The Tweets contained in this file were collected by Ernesto Priego using a Python script. The data collection search query was from:realdonaldtrump. A trigger was scheduled to collect automatically every hour, which means that any Tweets deleted immediately after publication have not been collected. The original data harvesting was refined to delete duplicates, to comply with Twitter's Terms and Conditions, and to sort the data in chronological order. Duplication of data due to the automated collection is possible, so further data refining might be required. The file may not contain data from Tweets deleted by the queried user account immediately after original publication. Both research and experience show that the Twitter Search API is not 100% reliable (Gonzalez-Bailon, Sandra, et al. 2012). Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet posted by the queried account during the indicated period. This dataset is shared for archival, comparative, and indicative educational research purposes only. The content included is from a public Twitter account and was obtained from the Twitter Search API.
    The shared data is also publicly available to all Twitter users via the Twitter Search API, and to anyone with an Internet connection via the Twitter and Twitter Search web clients and mobile apps, without the need for a Twitter account. The original Tweets, their contents, and associated metadata were published openly on the Web from the queried public account and are the responsibility of the original authors. Original Tweets are likely to be copyrighted by their individual authors, but please check individually. The license on this output applies to the data collection; third-party content should be attributed to the original authors and copyright owners. Please note that usernames, user profile pictures, and the full text of the collected Tweets have not been included in this file. No private personal information is shared in this dataset. As indicated above, this dataset does not contain the text of the Tweets. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road. This dataset is shared to archive, document, and encourage open educational research into political activity on Twitter.

    Other Considerations

    All Twitter users agree to Twitter's Privacy and data sharing policies. Social media research remains in its infancy, and though work has been done to develop best practices, there is as yet no agreement on a series of grey areas relating to research methodologies, including ad hoc social-media-specific research ethics guidelines for reproducible research. It is understood that public figures Tweet publicly with the conscious intention of having their Tweets publicly accessed and discussed.
    It is assumed that a public figure Tweeting publicly is of public interest and that such a figure, as a Twitter user, has given implicit consent, by explicitly agreeing to Twitter's Terms and Conditions, for their Tweets to be publicly accessed and discussed, including critical analysis, without the need for prior written permission. There is therefore no difference between collecting and analysing data from a public printed or online publication and collecting and analysing a dataset of Twitter data from a public account of a public user in a public role. Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time. Reproducibility is considered here a key value for robust and trustworthy research. Different scholarly professional associations, such as the Modern Language Association, recognise Tweets, datasets, and other online and digital resources as citeable scholarly outputs. The data contained in the deposited file is otherwise available elsewhere through different methods.
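    The methodology describes refining hourly harvests to remove duplicates and sort Tweets chronologically. A minimal Python sketch of that refining step follows; it is not the author's collection script. It assumes each harvested row is a dict with hypothetical `id_str` and `created_at` keys, the latter in the classic Twitter API timestamp layout (for example "Wed Oct 10 20:19:24 +0000 2018"), and it keeps the first copy of any repeated Tweet ID.

```python
from datetime import datetime
from typing import Dict, Iterable, List

# Classic Twitter API timestamp layout; %a and %b assume an English/C locale.
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def dedupe_and_sort(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop repeated Tweet IDs (keeping the first copy) and sort chronologically."""
    unique: Dict[str, Dict[str, str]] = {}
    for row in rows:
        unique.setdefault(row["id_str"], row)
    return sorted(
        unique.values(),
        key=lambda r: datetime.strptime(r["created_at"], TWITTER_TIME),
    )
```

    Deduplicating on the Tweet ID rather than on whole rows matters here, because the same Tweet fetched in two hourly runs can differ in fields such as follower counts.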
