100+ datasets found

A Twitter Dataset of 70+ million tweets related to COVID-19
zenodo.org
csv, tsv, zip
Updated Apr 17, 2023
Cite
Juan M. Banda; Ramya Tekumalla; Gerardo Chowell (2023). A Twitter Dataset of 70+ million tweets related to COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3732460
Available download formats
csv, tsv, zip
Unique identifier
https://doi.org/10.5281/zenodo.3732460
Dataset updated
Apr 17, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan M. Banda; Ramya Tekumalla; Gerardo Chowell
Description
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts because they were filtered from other data we were collecting for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 29th and yielded over 4 million tweets a day.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (70,569,368 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (13,535,912 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.

More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions on re-distributing Twitter data. They need to be hydrated to be used.
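Because the files hold only tweet identifiers, a common first step before hydration is to read the TSV and group the IDs into batches. The following is a minimal sketch, not the authors' tooling: the column name `tweet_id`, the sample rows, and the batch size of 100 are all assumptions, so check the real file header and the limits of whichever hydration tool you use.

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_tweet_ids(tsv_lines: Iterable[str], id_column: str = "tweet_id") -> Iterator[str]:
    """Yield tweet IDs from tab-separated lines that start with a header row.

    The column name is an assumption; adjust it to match the real header.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    for row in reader:
        tweet_id = (row.get(id_column) or "").strip()
        if tweet_id:
            yield tweet_id


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Made-up rows standing in for full_dataset-clean.tsv.
SAMPLE = (
    "tweet_id\tdate\ttime\n"
    "111\t2020-03-22\t00:00:01\n"
    "222\t2020-03-22\t00:00:02\n"
    "333\t2020-03-22\t00:00:05\n"
)

if __name__ == "__main__":
    for batch in batched(read_tweet_ids(io.StringIO(SAMPLE)), size=2):
        print(batch)
```

For the real file, pass an open file handle (for example `open("full_dataset-clean.tsv", newline="")`) instead of the in-memory sample; the generator-based design avoids loading millions of rows at once.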
Share of tweets per user per day from MENA by country 2016
statista.com
Updated Jan 9, 2025
Cite
Statista (2025). Share of tweets per user per day from MENA by country 2016 [Dataset]. https://www.statista.com/statistics/729693/mena-tweets-per-user-per-day-by-country/
Dataset updated
Jan 9, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2016
Area covered
Asia, MENA
Description
This statistic describes the distribution of tweets per user per day from the Middle East and North Africa in March 2016, by country. The average Twitter user in Kuwait sent out 4.2 tweets per day in March 2016.
A Twitter Dataset of 150+ million tweets related to COVID-19 for open...
zenodo.org
application/gzip, csv +1
Updated Apr 17, 2023
Cite
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell (2023). A Twitter Dataset of 150+ million tweets related to COVID-19 for open research [Dataset]. http://doi.org/10.5281/zenodo.3738018
Available download formats
application/gzip, csv, tsv
Unique identifier
https://doi.org/10.5281/zenodo.3738018
Dataset updated
Apr 17, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell
Description
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started from March 11th yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (152,920,832 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (30,990,645 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.

More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

As always, the tweets distributed here are only tweet identifiers (with date and time added), because Twitter's terms and conditions permit re-distributing Twitter data ONLY for research purposes. They need to be hydrated to be used.
Data from: A large-scale COVID-19 Twitter chatter dataset for open...
explore.openaire.eu
data.niaid.nih.gov
+1more
Updated Aug 9, 2020
Cite
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Katya Artemova; Elena Tutubalina; Gerardo Chowell (2020). A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration [Dataset]. http://doi.org/10.5281/zenodo.3977558
Unique identifier
https://doi.org/10.5281/zenodo.3977558
Dataset updated
Aug 9, 2020
Authors
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Katya Artemova; Elena Tutubalina; Gerardo Chowell
Description
In version 22 of the dataset, we have refactored the full_dataset.tsv and full_dataset_clean.tsv files (since version 20) to include two additional columns: language and place country code (when available). This change now includes language and country code for ALL the tweets in the dataset, not only clean tweets. With this change we have removed the clean_place_country.tar.gz and clean_languages.tar.gz files. While refactoring the dataset-generating code we also found a small bug that caused some of the retweets not to be counted properly, hence the extra increase in tweets available.

Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started from March 11th yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (602,921,788 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (142,360,288 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files.

For more statistics and some visualizations visit: http://www.panacealab.org/covid19/

More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our pre-print about the dataset (https://arxiv.org/abs/2004.03688).

As always, the tweets distributed here are only tweet identifiers (with date and time added), because Twitter's terms and conditions permit re-distributing Twitter data ONLY for research purposes. They need to be hydrated to be used. This dataset will be updated at least bi-weekly with additional tweets; look at the GitHub repo for these updates.

Release: We have standardized the name of the resource to match our pre-print manuscript and to not have to update it every week.
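With language and country code available for every tweet, the identifier list can be narrowed before hydration. The sketch below assumes hypothetical column names (`tweet_id`, `lang`, `country_code`) and treats a missing country code as an empty string; neither detail is confirmed by the description, so verify both against the actual file header.

```python
import csv
import io
from typing import Iterable, Iterator, Optional


def filter_tweet_ids(
    tsv_lines: Iterable[str],
    lang: Optional[str] = None,
    country_code: Optional[str] = None,
) -> Iterator[str]:
    """Yield IDs of rows matching the requested language and/or country code.

    Column names are assumptions; a filter left as None is not applied.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    for row in reader:
        if lang is not None and row.get("lang") != lang:
            continue
        if country_code is not None and row.get("country_code") != country_code:
            continue
        yield row["tweet_id"]


# Made-up rows; the country code is left empty where it is "not available".
SAMPLE = (
    "tweet_id\tdate\ttime\tlang\tcountry_code\n"
    "111\t2020-07-26\t00:00:01\ten\tUS\n"
    "222\t2020-07-26\t00:00:02\tes\t\n"
    "333\t2020-07-26\t00:00:03\ten\t\n"
)

if __name__ == "__main__":
    print(list(filter_tweet_ids(io.StringIO(SAMPLE), lang="en")))
    print(list(filter_tweet_ids(io.StringIO(SAMPLE), country_code="US")))
```

Filtering on the identifier file first keeps the number of tweets to hydrate small, which matters because hydration is the slow, rate-limited step.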
Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
search.gesis.org
Updated Oct 16, 2022
Cite
Pfeffer, Jürgen (2022). Just Another Day on Twitter: A Complete 24 Hours of Twitter Data [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-2516
Dataset updated
Oct 16, 2022
Dataset provided by
GESIS search
GESIS, Köln
Authors
Pfeffer, Jürgen
License
https://www.gesis.org/en/institute/data-usage-terms
Description
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change.
Average daily Twitter brand posts 2017-2019
statista.com
ai-chatbox.pro
Updated Apr 28, 2022
Cite
Statista (2022). Average daily Twitter brand posts 2017-2019 [Dataset]. https://www.statista.com/statistics/656736/daily-twitter-tweets-posts-average/
Dataset updated
Apr 28, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
This statistic presents the average number of daily brand posts on Twitter from 2017 to 2019. As of the last measured period, brands posted an average of 0.77 tweets to the social network every day. This represents an approximately ten percent decline from the previous year, continuing the trend of less brand engagement on Twitter.
Coronavirus (COVID-19) Tweets Dataset
search.datacite.org
ieee-dataport.org
+1more
Updated Dec 23, 2020
+ more versions
Cite
Rabindra Lamsal (2020). Coronavirus (COVID-19) Tweets Dataset [Dataset]. http://doi.org/10.21227/dkv1-r475
Unique identifier
https://doi.org/10.21227/dkv1-r475
Dataset updated
Dec 23, 2020
Dataset provided by
Institute of Electrical and Electronics Engineers (http://www.ieee.ro/)
DataCite (https://www.datacite.org/)
Authors
Rabindra Lamsal
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes CSV files that contain IDs and sentiment scores of the tweets related to the COVID-19 pandemic. The tweets have been collected by an on-going project deployed at https://live.rlamsal.com.np. The model monitors the real-time Twitter feed for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used while referencing the pandemic. This dataset was wholly re-designed on March 20, 2020, to comply with the content redistribution policy set by Twitter.

The paper associated with this dataset is available here: Design and analysis of a large-scale COVID-19 tweets dataset

Related datasets:
(a) Tweets Originating from India During COVID-19 Lockdowns
(b) Coronavirus (COVID-19) Tweets Sentiment Trend (Global)

Below is a quick overview of this dataset.
- Dataset name: COV19Tweets Dataset
- Number of tweets: 857,809,018 tweets
- Coverage: Global
- Language: English (EN)
- Dataset usage terms: By using this dataset, you agree to (i) use the content of this dataset and the data generated from the content of this dataset for non-commercial research only, (ii) remain in compliance with Twitter's Developer Policy and (iii) cite the following paper: Lamsal, R. Design and analysis of a large-scale COVID-19 tweets dataset. Applied Intelligence (2020). https://doi.org/10.1007/s10489-020-02029-z
- Geo-tagged Version: Coronavirus (COVID-19) Geo-tagged Tweets Dataset (GeoCOV19Tweets Dataset)
- Dataset updates: Everyday
- Active keywords and hashtags (archive: keywords.tsv): "corona", "#corona", "coronavirus", "#coronavirus", "covid", "#covid", "covid19", "#covid19", "covid-19", "#covid-19", "sarscov2", "#sarscov2", "sars cov2", "sars cov 2", "covid_19", "#covid_19", "#ncov", "ncov", "#ncov2019", "ncov2019", "2019-ncov", "#2019-ncov", "pandemic", "#pandemic", "#2019ncov", "2019ncov", "quarantine", "#quarantine", "flatten the curve", "flattening the curve", "#flatteningthecurve", "#flattenthecurve", "hand sanitizer", "#handsanitizer", "#lockdown", "lockdown", "social distancing", "#socialdistancing", "work from home", "#workfromhome", "working from home", "#workingfromhome", "ppe", "n95", "#ppe", "#n95", "#covidiots", "covidiots", "herd immunity", "#herdimmunity", "pneumonia", "#pneumonia", "chinese virus", "#chinesevirus", "wuhan virus", "#wuhanvirus", "kung flu", "#kungflu", "wearamask", "#wearamask", "wear a mask", "vaccine", "vaccines", "#vaccine", "#vaccines", "corona vaccine", "corona vaccines", "#coronavaccine", "#coronavaccines", "face shield", "#faceshield", "face shields", "#faceshields", "health worker", "#healthworker", "health workers", "#healthworkers", "#stayhomestaysafe", "#coronaupdate", "#frontlineheroes", "#coronawarriors", "#homeschool", "#homeschooling", "#hometasking", "#masks4all", "#wfh", "wash ur hands", "wash your hands", "#washurhands", "#washyourhands", "#stayathome", "#stayhome", "#selfisolating", "self isolating"

Dataset Files (the local time mentioned below is GMT+5:45)
corona_tweets_01.csv + corona_tweets_02.csv + corona_tweets_03.csv: 2,475,980 tweets (March 20, 2020 01:37 AM - March 21, 2020 09:25 AM)
corona_tweets_04.csv: 1,233,340 tweets (March 21, 2020 09:27 AM - March 22, 2020 07:46 AM)
corona_tweets_05.csv: 1,782,157 tweets (March 22, 2020 07:50 AM - March 23, 2020 09:08 AM)
corona_tweets_06.csv: 1,771,295 tweets (March 23, 2020 09:11 AM - March 24, 2020 11:35 AM)
corona_tweets_07.csv: 1,479,651 tweets (March 24, 2020 11:42 AM - March 25, 2020 11:43 AM)
corona_tweets_08.csv: 1,272,592 tweets (March 25, 2020 11:47 AM - March 26, 2020 12:46 PM)
corona_tweets_09.csv: 1,091,429 tweets (March 26, 2020 12:51 PM - March 27, 2020 11:53 AM)
corona_tweets_10.csv: 1,172,013 tweets (March 27, 2020 11:56 AM - March 28, 2020 01:59 PM)
corona_tweets_11.csv: 1,141,210 tweets (March 28, 2020 02:03 PM - March 29, 2020 04:01 PM)
corona_tweets_12.csv: 793,417 tweets (March 30, 2020 02:01 PM - March 31, 2020 10:16 AM)
corona_tweets_13.csv: 1,029,294 tweets (March 31, 2020 10:20 AM - April 01, 2020 10:59 AM)
corona_tweets_14.csv: 920,076 tweets (April 01, 2020 11:02 AM - April 02, 2020 12:19 PM)
corona_tweets_15.csv: 826,271 tweets (April 02, 2020 12:21 PM - April 03, 2020 02:38 PM)
corona_tweets_16.csv: 612,512 tweets (April 03, 2020 02:40 PM - April 04, 2020 11:54 AM)
corona_tweets_17.csv: 685,560 tweets (April 04, 2020 11:56 AM - April 05, 2020 12:54 PM)
corona_tweets_18.csv: 717,301 tweets (April 05, 2020 12:56 PM - April 06, 2020 10:57 AM)
corona_tweets_19.csv: 722,921 tweets (April 06, 2020 10:58 AM - April 07, 2020 12:28 PM)
corona_tweets_20.csv: 554,012 tweets (April 07, 2020 12:29 PM - April 08, 2020 12:34 PM)
corona_tweets_21.csv: 589,679 tweets (April 08, 2020 12:37 PM - April 09, 2020 12:18 PM)
corona_tweets_22.csv: 517,718 tweets (April 09, 2020 12:20 PM - April 10, 2020 09:20 AM)
corona_tweets_23.csv: 601,199 tweets (April 10, 2020 09:22 AM - April 11, 2020 10:22 AM)
corona_tweets_24.csv: 497,655 tweets (April 11, 2020 10:24 AM - April 12, 2020 10:53 AM)
corona_tweets_25.csv: 477,182 tweets (April 12, 2020 10:57 AM - April 13, 2020 11:43 AM)
corona_tweets_26.csv: 288,277 tweets (April 13, 2020 11:46 AM - April 14, 2020 12:49 AM)
corona_tweets_27.csv: 515,739 tweets (April 14, 2020 11:09 AM - April 15, 2020 12:38 PM)
corona_tweets_28.csv: 427,088 tweets (April 15, 2020 12:40 PM - April 16, 2020 10:03 AM)
corona_tweets_29.csv: 433,368 tweets (April 16, 2020 10:04 AM - April 17, 2020 10:38 AM)
corona_tweets_30.csv: 392,847 tweets (April 17, 2020 10:40 AM - April 18, 2020 10:17 AM)
> With the addition of some more coronavirus-specific keywords, the number of tweets captured per day has increased significantly; therefore, the CSV files hereafter will be zipped. Let's save some bandwidth.
corona_tweets_31.csv: 2,671,818 tweets (April 18, 2020 10:19 AM - April 19, 2020 09:34 AM)
corona_tweets_32.csv: 2,393,006 tweets (April 19, 2020 09:43 AM - April 20, 2020 10:45 AM)
corona_tweets_33.csv: 2,227,579 tweets (April 20, 2020 10:56 AM - April 21, 2020 10:47 AM)
corona_tweets_34.csv: 2,211,689 tweets (April 21, 2020 10:54 AM - April 22, 2020 10:33 AM)
corona_tweets_35.csv: 2,265,189 tweets (April 22, 2020 10:45 AM - April 23, 2020 10:49 AM)
corona_tweets_36.csv: 2,201,138 tweets (April 23, 2020 11:08 AM - April 24, 2020 10:39 AM)
corona_tweets_37.csv: 2,338,713 tweets (April 24, 2020 10:51 AM - April 25, 2020 11:50 AM)
corona_tweets_38.csv: 1,981,835 tweets (April 25, 2020 12:20 PM - April 26, 2020 09:13 AM)
corona_tweets_39.csv: 2,348,827 tweets (April 26, 2020 09:16 AM - April 27, 2020 10:21 AM)
corona_tweets_40.csv: 2,212,216 tweets (April 27, 2020 10:33 AM - April 28, 2020 10:09 AM)
corona_tweets_41.csv: 2,118,853 tweets (April 28, 2020 10:20 AM - April 29, 2020 08:48 AM)
corona_tweets_42.csv: 2,390,703 tweets (April 29, 2020 09:09 AM - April 30, 2020 10:33 AM)
corona_tweets_43.csv: 2,184,439 tweets (April 30, 2020 10:53 AM - May 01, 2020 10:18 AM)
corona_tweets_44.csv: 2,223,013 tweets (May 01, 2020 10:23 AM - May 02, 2020 09:54 AM)
corona_tweets_45.csv: 2,216,553 tweets (May 02, 2020 10:18 AM - May 03, 2020 09:57 AM)
corona_tweets_46.csv: 2,266,373 tweets (May 03, 2020 10:09 AM - May 04, 2020 10:17 AM)
corona_tweets_47.csv: 2,227,489 tweets (May 04, 2020 10:32 AM - May 05, 2020 10:17 AM)
corona_tweets_48.csv: 2,218,774 tweets (May 05, 2020 10:38 AM - May 06, 2020 10:26 AM)
corona_tweets_49.csv: 2,164,251 tweets (May 06, 2020 10:35 AM - May 07, 2020 09:33 AM)
corona_tweets_50.csv: 2,203,686 tweets (May 07, 2020 09:55 AM - May 08, 2020 09:35 AM)
corona_tweets_51.csv: 2,250,019 tweets (May 08, 2020 09:39 AM - May 09, 2020 09:49 AM)
corona_tweets_52.csv: 2,273,705 tweets (May 09, 2020 09:55 AM - May 10, 2020 10:11 AM)
corona_tweets_53.csv: 2,208,264 tweets (May 10, 2020 10:23 AM - May 11, 2020 09:57 AM)
corona_tweets_54.csv: 2,216,845 tweets (May 11, 2020 10:08 AM - May 12, 2020 09:52 AM)
corona_tweets_55.csv: 2,264,472 tweets (May 12, 2020 09:59 AM - May 13, 2020 10:14 AM)
corona_tweets_56.csv: 2,339,709 tweets (May 13, 2020 10:24 AM - May 14, 2020 11:21 AM)
corona_tweets_57.csv: 2,096,878 tweets (May 14, 2020 11:38 AM - May 15, 2020 09:58 AM)
corona_tweets_58.csv: 2,214,205 tweets (May 15, 2020 10:13 AM - May 16, 2020 09:43 AM)
> The server and the databases have been optimized; therefore, there is a significant rise in the number of tweets captured per day.
corona_tweets_59.csv: 3,389,090 tweets (May 16, 2020 09:58 AM - May 17, 2020 10:34 AM)
corona_tweets_60.csv: 3,530,933 tweets (May 17, 2020 10:36 AM - May 18, 2020 10:07 AM)
corona_tweets_61.csv: 3,899,631 tweets (May 18, 2020 10:08 AM - May 19, 2020 10:07 AM)
corona_tweets_62.csv: 3,767,009 tweets (May 19, 2020 10:08 AM - May 20, 2020 10:06 AM)
corona_tweets_63.csv: 3,790,455 tweets (May 20, 2020 10:06 AM - May 21, 2020 10:15 AM)
corona_tweets_64.csv: 3,582,020 tweets (May 21, 2020 10:16 AM - May 22, 2020 10:13 AM)
corona_tweets_65.csv: 3,461,470 tweets (May 22, 2020 10:14 AM - May 23, 2020 10:08 AM)
corona_tweets_66.csv: 3,477,564 tweets (May 23, 2020 10:08 AM - May 24, 2020 10:02 AM)
corona_tweets_67.csv: 3,656,446 tweets (May 24, 2020 10:02 AM - May 25, 2020 10:10 AM)
corona_tweets_68.csv: 3,474,952 tweets (May 25, 2020 10:11 AM - May 26, 2020 10:22 AM)
corona_tweets_69.csv: 3,422,960 tweets (May 26, 2020 10:22 AM - May 27, 2020 10:16 AM)
corona_tweets_70.csv: 3,480,999 tweets (May 27, 2020 10:17 AM - May 28, 2020 10:35 AM)
corona_tweets_71.csv: 3,446,008 tweets (May 28, 2020 10:36 AM - May 29, 2020 10:07 AM)
corona_tweets_72.csv: 3,492,841 tweets (May 29, 2020 10:07 AM - May 30, 2020 10:14 AM)
corona_tweets_73.csv: 3,098,817 tweets (May 30, 2020 10:15 AM - May 31, 2020 10:13 AM)
corona_tweets_74.csv: 3,234,848 tweets (May 31, 2020 10:13 AM - June 01, 2020 10:14 AM)
corona_tweets_75.csv: 3,206,132 tweets (June 01, 2020 10:15 AM - June 02, 2020 10:07 AM)
corona_tweets_76.csv: 3,206,417 tweets (June 02, 2020 10:08 AM - June 03, 2020 10:26 AM)
corona_tweets_77.csv: 3,256,225 tweets (June 03, 2020
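The file listing states its capture windows in local time (GMT+5:45), so comparing them with timestamps from other sources usually means converting to UTC first. A small sketch of that conversion follows; the function name is hypothetical, and the format string assumes the month names and AM/PM markers are parsed under an English locale.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset stated in the listing: GMT+5:45.
LISTING_TZ = timezone(timedelta(hours=5, minutes=45))


def listing_time_to_utc(text: str) -> datetime:
    """Parse a timestamp such as 'March 20, 2020 01:37 AM' and return it in UTC."""
    local = datetime.strptime(text, "%B %d, %Y %I:%M %p").replace(tzinfo=LISTING_TZ)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    # Start of the first capture window in the listing.
    print(listing_time_to_utc("March 20, 2020 01:37 AM").isoformat())
    # 2020-03-19T19:52:00+00:00
```

Note that the first window therefore begins on March 19 in UTC, one calendar day earlier than the local date shown.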
Frequency of U.S. Twitter user platform visits per day 2021, by volume of...
statista.com
Updated Apr 4, 2022
Cite
Statista (2022). Frequency of U.S. Twitter user platform visits per day 2021, by volume of activity [Dataset]. https://www.statista.com/statistics/1279590/united-states-twitter-user-daily-frequency-by-volume-of-activity/
Dataset updated
Apr 4, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 17, 2021 - May 31, 2021
Area covered
United States
Description
In a May 2021 survey, adult U.S. Twitter users were asked how many times per day they visited the social media service. Of high volume tweeters, those who produced an average of 20 or more tweets per month, 28 percent reported that they visited Twitter occasionally and 21 percent said that they visited the site too many times to count. Low volume tweeters visited the networking service significantly less with 8 percent stating that they visited Twitter just once or twice per day.
Geotagged Twitter posts from the United States: A tweet collection to...
search.gesis.org
Updated Mar 4, 2021
+ more versions
Cite
Pfeffer, Jürgen; Morstatter, Fred (2021). Geotagged Twitter posts from the United States: A tweet collection to investigate representativeness [Dataset]. https://search.gesis.org/research_data/SDN-10.7802-1166
Dataset updated
Mar 4, 2021
Dataset provided by
GESIS search
GESIS, Köln
Authors
Pfeffer, Jürgen; Morstatter, Fred
License
https://www.gesis.org/en/institute/data-usage-terms
Area covered
United States
Description
This dataset consists of IDs of geotagged Twitter posts from within the United States. They are provided as files per day and state as well as per day and county. In addition, files containing the aggregated number of hashtags from these tweets are provided per day and state and per day and county. This data is organized as a ZIP-file per month containing several zip-files per day which hold the txt-files with the ID/hash information.

Also part of the dataset are two shapefiles for the US counties and states and Python scripts for the data collection and sorting geotags into counties.
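The nested layout described above (a ZIP file per month holding ZIP files per day, which in turn hold txt files) can be read without unpacking anything to disk. The sketch below walks that structure in memory; the member names `2020-01-01.zip` and `CA.txt` are made up for the demonstration, and the real archives may name and encode their files differently.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_txt_members(month_zip_bytes: bytes) -> Iterator[Tuple[str, str, str]]:
    """Yield (day_zip_name, txt_name, text) for every txt file in a month archive."""
    with zipfile.ZipFile(io.BytesIO(month_zip_bytes)) as month_zip:
        for day_name in month_zip.namelist():
            if not day_name.lower().endswith(".zip"):
                continue
            # Open the inner per-day archive from its bytes, still in memory.
            day_bytes = month_zip.read(day_name)
            with zipfile.ZipFile(io.BytesIO(day_bytes)) as day_zip:
                for txt_name in day_zip.namelist():
                    if txt_name.lower().endswith(".txt"):
                        text = day_zip.read(txt_name).decode("utf-8")
                        yield day_name, txt_name, text


def build_sample_month() -> bytes:
    """Build a tiny month archive with one day archive, for demonstration only."""
    day_buffer = io.BytesIO()
    with zipfile.ZipFile(day_buffer, "w") as day_zip:
        day_zip.writestr("CA.txt", "1001\n1002\n")
    month_buffer = io.BytesIO()
    with zipfile.ZipFile(month_buffer, "w") as month_zip:
        month_zip.writestr("2020-01-01.zip", day_buffer.getvalue())
    return month_buffer.getvalue()


if __name__ == "__main__":
    for day, name, text in iter_txt_members(build_sample_month()):
        print(day, name, text.split())
```

For a downloaded month archive, read its bytes from disk and pass them in; for very large archives it may be preferable to open the outer file by path and stream the inner archives one at a time.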
Data from: Google Analytics & Twitter dataset from a movies, TV series and...
figshare.com
portalcientificovalencia.univeuropea.com
txt
Updated Feb 7, 2024
Cite
Víctor Yeste (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. http://doi.org/10.6084/m9.figshare.16553061.v4
Available download formats
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.16553061.v4
Dataset updated
Feb 7, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Víctor Yeste
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Author: Víctor Yeste. Universitat Politècnica de Valencia.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and, where possible, to predict the selected success variables.

Because data from two separate areas had to be integrated (web publishing, and the analysis of its shares and related topics on Twitter), the data were collected programmatically through both the Google Analytics v4 reporting API and the Twitter Standard API, always respecting their limits.

The website analyzed is hellofriki.com. It is an online media outlet whose primary intention is to meet the need for information on topics that generate a vast amount of daily news, and it also offers analysis, reports, interviews, and many other information formats. All this content falls under the sections of cinema, series, video games, literature, and comics.

This dataset has contributed to the elaboration of the PhD thesis:
Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data have been obtained from each last-minute news article published online according to the indicators described in the doctoral thesis.
All related data are stored in a database, divided into the following tables:tesis_followers: User ID list of media account followers.tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.status_id: Tweet IDcreated_at: date of publicationtext: content of the tweetpath: URL extracted after processing the shortened URL in textpost_shared: Article ID in WordPress that is being sharedretweet_count: number of retweetsfavorite_count: number of favoritestesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web. Other typologies, automatic Facebook shares, custom tweets without link to an article, etc. With the same fields as tesis_hometimeline.tesis_posts: data of articles published by the web and processed for some analysis.stats_id: Analysis IDpost_id: Article ID in WordPresspost_date: article publication date in WordPresspost_title: title of the articlepath: URL of the article in the middle webtags: Tags ID or WordPress tags related to the articleuniquepageviews: unique page viewsentrancerate: input ratioavgtimeonpage: average visit timeexitrate: output ratiopageviewspersession: page views per sessionadsense_adunitsviewed: number of ads viewed by usersadsense_viewableimpressionpercent: ad display ratioadsense_ctr: ad click ratioadsense_ecpm: estimated ad revenue per 1000 page viewstesis_stats: data from a particular analysis, performed at each published breaking news item. 
Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.

id: ID of the analysis
phase: phase of the thesis in which analysis has been carried out (right now all are 1)
time: "0" if at the time of publication, "1" if 14 days later
start_date: date and time of measurement on the day of publication
end_date: date and time when the measurement is made 14 days later
main_post_id: ID of the published article to be analysed
main_post_theme: Main section of the published article to analyze
superheroes_theme: "1" if about superheroes, "0" if not
trailer_theme: "1" if trailer, "0" if not
name: empty field, possibility to add a custom name manually
notes: empty field, possibility to add personalized notes manually, as if some tag has been removed manually for being considered too generic, despite the fact that the editor put it
num_articles: number of articles analysed
num_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)
num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter account
num_terms: number of terms analyzed
uniquepageviews_total: total page views
uniquepageviews_mean: average page views
entrancerate_mean: average input ratio
avgtimeonpage_mean: average duration of visits
exitrate_mean: average output ratio
pageviewspersession_mean: average page views per session
total: total of ads viewed
adsense_adunitsviewed_mean: average of ads viewed
adsense_viewableimpressionpercent_mean: average ad display ratio
adsense_ctr_mean: average ad click ratio
adsense_ecpm_mean: estimated ad revenue per 1000 page views
Total: total income
retweet_count_mean: average income
favorite_count_total: total of favorites
favorite_count_mean: average of favorites
terms_ini_num_tweets: total tweets on the terms on the day of publication
terms_ini_retweet_count_total: total retweets on the terms on the day of publication
terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
terms_ini_favorite_count_total: total of favorites on the terms on the day of publication
terms_ini_favorite_count_mean: average of favorites on the terms on the day of publication
terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publication
terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
terms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publication
terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publication
terms_end_num_tweets: total tweets on terms 14 days after publication
terms_ini_retweet_count_total: total retweets on terms 14 days after publication
terms_ini_retweet_count_mean: average retweets on terms 14 days after publication
terms_ini_favorite_count_total: total bookmarks on terms 14 days after publication
terms_ini_favorite_count_mean: average of favorites on terms 14 days after publication
terms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publication
terms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
terms_ini_user_age_mean: the average age in days of users who have spoken of the terms 14 days after publication
terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication.

tesis_terms: data of the terms (tags) related to the processed articles.

stats_id: Analysis ID
time: "0" if at the time of publication, "1" if 14 days later
term_id: Term ID (tag) in WordPress
name: Name of the term
slug: URL of the term
num_tweets: number of tweets
retweet_count_total: total retweets
retweet_count_mean: average retweets
favorite_count_total: total of favorites
favorite_count_mean: average of favorites
followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
user_num_followers_mean: average followers of users who were talking about the term
user_num_tweets_mean: average number of tweets published by users who were talking about the term
user_age_mean: average age in days of users who were talking about the term
url_inclusion_rate: URL inclusion ratio
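
The remark that the statistical fields can be recomputed from the other tables can be illustrated with a minimal sketch. The per-tweet records, their keys (`retweet_count`, `favorite_count`, `has_url`), and the definition of the URL inclusion ratio as the share of tweets containing a URL are assumptions made for illustration; they are not taken from the dataset's documentation.

```python
from statistics import mean

# Hypothetical per-tweet records for one term; the keys are illustrative
# and are not the dataset's actual column names.
tweets = [
    {"retweet_count": 4, "favorite_count": 10, "has_url": True},
    {"retweet_count": 0, "favorite_count": 2, "has_url": False},
    {"retweet_count": 2, "favorite_count": 6, "has_url": True},
    {"retweet_count": 6, "favorite_count": 2, "has_url": False},
]


def summarize_term(rows):
    """Derive total/mean style fields from per-tweet rows."""
    retweets = [r["retweet_count"] for r in rows]
    favorites = [r["favorite_count"] for r in rows]
    return {
        "num_tweets": len(rows),
        "retweet_count_total": sum(retweets),
        "retweet_count_mean": mean(retweets),
        "favorite_count_total": sum(favorites),
        "favorite_count_mean": mean(favorites),
        # Assumed definition: share of tweets that contain a URL.
        "url_inclusion_rate": sum(r["has_url"] for r in rows) / len(rows),
    }


summary = summarize_term(tweets)
print(summary)
```

Storing these aggregates, as the description says, avoids rescanning the per-tweet rows every time a total or mean is needed.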
Global Covid-19 Tweets with Sentiment Analysis
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Global Covid-19 Tweets with Sentiment Analysis [Dataset]. https://www.opendatabay.com/data/healthcare/f445ec28-4fdd-4832-8d8e-da282f16c84b
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Data Science and Analytics
Description
This dataset captures Twitter activity related to Covid-19, focusing on the initial phase of the pandemic from April to June 2020 [1, 2]. It comprises 235,240 worldwide tweets in English, streamed live at a rate of approximately 10,000 tweets per day after the World Health Organisation declared Covid-19 a pandemic [1, 2]. The tweets were collected using relevant hashtags such as #covid-19, #coronavirus, #covid, #covaccine, #lockdown, #homequarantine, #quarantinecenter, #socialdistancing, #stayhome, and #staysafe [1, 2].

The data has undergone pre-processing, which involved converting all tweets to lowercase and removing extra white spaces, numbers, special characters, ASCII characters, URLs, punctuation, and stopwords [2]. Additionally, all instances of 'covid' were converted to 'covid19', and stemming was applied to reduce inflected words to their root forms [2]. Sentiment analysis has been performed on each cleaned tweet using an NLTK-based Sentiment Analyser, providing sentiment scores for positive, negative, and neutral categories, and a compound sentiment score [2]. Tweets are classified as Positive, Negative, or Neutral based on these scores [2].
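
The cleaning and labelling steps described above can be sketched roughly as follows, using only the standard library. This is an illustration rather than the dataset authors' pipeline: the stopword list is a tiny placeholder, stemming is omitted, and the ±0.05 cut-off on the compound score is an assumed convention, since the description does not state the thresholds used.

```python
import re

# Tiny illustrative stopword list (an assumption); the description does not
# say which stopword list was actually used.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "we"}


def clean_tweet(text):
    """Lowercase, strip URLs/numbers/punctuation, normalise 'covid', drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # remove numbers, punctuation, special characters
    tokens = text.split()                      # splitting also collapses extra whitespace
    tokens = ["covid19" if t == "covid" else t for t in tokens]
    return " ".join(t for t in tokens if t not in STOPWORDS)


def label_from_compound(compound, threshold=0.05):
    """Map a compound sentiment score to a label (threshold is an assumption)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


cleaned = clean_tweet("We are in LOCKDOWN!!  Covid-19 cases: 1200 https://t.co/abc #StaySafe")
print(cleaned)
print(label_from_compound(0.6), label_from_compound(-0.3), label_from_compound(0.0))
```

Because numbers are stripped before the 'covid' replacement, both "covid-19" and "covid19" in the raw text end up as the single token `covid19`.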

Columns

id: Unique identifier for the tweet [1].

Tweet ID: Unique identifier for the tweet [2]. (Note: Appears to be the same as 'id')

created_at: The date and time when the tweet was created [1].

Creation Date & Time: The date and time when the tweet was created [2]. (Note: Appears to be the same as 'created_at')

source: The source link from which the tweet was posted [1].

Source Link: The source link from which the tweet was posted [2]. (Note: Appears to be the same as 'source')

original_text: The full text of the original tweet [1].

Original Tweet: The full text of the original tweet [2]. (Note: Appears to be the same as 'original_text')

lang: The language of the tweet [1].

favorite_count: The number of times the tweet was favourited [1].

Favorite Count: The number of times the tweet was favourited [2]. (Note: Appears to be the same as 'favorite_count')

retweet_count: The number of times the tweet was retweeted [1].

Retweet Count: The number of times the tweet was retweeted [2]. (Note: Appears to be the same as 'retweet_count')

original_author: The original author of the tweet [3].

Original Author: The original author of the tweet [2]. (Note: Appears to be the same as 'original_author')

hashtags: Hashtags included in the tweet [3].

Hashtags: Hashtags included in the tweet [2]. (Note: Appears to be the same as 'hashtags')

user_mentions: User mentions within the tweet [3].

User Mentions: User mentions within the tweet [2]. (Note: Appears to be the same as 'user_mentions')

Place: Location associated with the tweet [2].

Distribution

The dataset consists of 235,240 tweets from the first phase of collection [1, 2]. Data files are typically provided in CSV format [4]. The tweets were collected from 19th April to 20th June 2020 [1].

Usage

This dataset is ideal for various data science and analytics applications, including Natural Language Processing (NLP), Deep Learning, Text Classification, and Ensembling [2]. Its pre-processed nature and included sentiment scores make it particularly useful for sentiment analysis research related to public opinion during the Covid-19 pandemic [2].

Coverage

The dataset covers a time range from 19th April to 20th June 2020 [1]. It includes worldwide tweets [2] and is limited to English language content [2]. Tweet sources are primarily Twitter for Android (31%) and Twitter for iPhone (28%), with 41% originating from other sources [5].

License

CC-BY-SA

Who Can Use It

Data Scientists and Analysts: For conducting social media analysis, trend identification, and public sentiment tracking during the pandemic [2].

Researchers in NLP and Machine Learning: To train and evaluate text classification models, conduct deep learning experiments, and explore ensembling techniques [2].

Public Health Researchers: To understand public response, concerns, and sentiment towards Covid-19, lockdowns, and vaccines [2].

Academics and Students: For academic projects, dissertations, and learning about real-world social media data analysis and sentiment classification [2].

Dataset Name Suggestions

COVID-19 Twitter Sentiment (Apr-Jun 2020)

Pandemic Twitter Activity Dataset (Phase 1)

Global Covid-19 Tweets with Sentiment Analysis

Social Media Response to Covid-19: April-June 2020

Twitter Covid-19 Discourse (Early Pandemic)

Attributes

Original Data Source: Covid-19 Twitter Dataset
#ChatGPT 1000 Daily 🐦 Tweets
kaggle.com
Updated May 14, 2023
Cite
Enric Domingo (2023). #ChatGPT 1000 Daily 🐦 Tweets [Dataset]. http://doi.org/10.34740/kaggle/dsv/5685262
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5685262
Dataset updated
May 14, 2023
Dataset provided by
Kaggle
Authors
Enric Domingo
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
UPDATE: Under the new Twitter API conditions introduced by Elon Musk, the Twitter (X) API is no longer free to use, and pricing is $100/month on the hobby plan. As a result, my automated ETL notebook stopped adding new tweets to this dataset on May 13th 2023.

This dataset was updated every day with 1000 tweets/day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets are uploaded 24-72h later, so the counters for each tweet's likes, retweets, messages and impressions have enough time to become relevant. Tweets are in any language and are selected randomly from all hours of the day. Some basic filters are applied to try to discard sensitive tweets and spam.

This dataset can be used for many different applications in Data Analysis and Visualization, as well as NLP Sentiment Analysis techniques and more.

Consider upvoting this Dataset and the scheduled ETL Notebook that feeds new data into it every day if you found them interesting, thanks! 🤗

Columns Description:

tweet_id: Integer. unique identifier for each tweet. Older tweets have smaller IDs.

tweet_created: Timestamp. Time of the tweet's creation.

tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).

text: String. The raw payload text from the tweet.

lang: String. Short name for the Tweet text's language.

user_id: Integer. Twitter's unique user id.

user_name: String. The author's public name on Twitter.

user_username: String. The author's Twitter account username (@example)

user_location: String. The author's public location.

user_description: String. The author's public profile's bio.

user_created: Timestamp. Timestamp of user's Twitter account creation.

user_followers_count: Integer. The number of followers of the author's account at the moment of the tweet extraction

user_following_count: Integer. The number of followed accounts from the author's account at the moment of the Tweet extraction

user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.

user_verified: Boolean. True if the user is verified (blue mark).

source: The device/app used to publish the tweet (apparently not working; all values are NaN so far).

retweet_count: Integer. Number of retweets to the Tweet at the moment of the Tweet extraction.

like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.

reply_count: Integer. Number of reply messages to the Tweet.

impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.

More info:

Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
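
As a rough illustration of how the columns above fit together, the sketch below reads a few rows and computes the delay between `tweet_created` and `tweet_extracted`, which the description says should fall in the 24-72h range. The sample rows are invented, and the ISO 8601 timestamp format is an assumption; the real file may format timestamps differently.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows using column names from the list above;
# the values and the timestamp format are assumptions.
SAMPLE = """tweet_id,tweet_created,tweet_extracted,like_count
1,2023-04-03T10:00:00+00:00,2023-04-04T12:00:00+00:00,5
2,2023-04-03T18:30:00+00:00,2023-04-06T06:30:00+00:00,0
"""


def extraction_delays_hours(csv_text):
    """Return {tweet_id: hours between creation and extraction}."""
    delays = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        created = datetime.fromisoformat(row["tweet_created"])
        extracted = datetime.fromisoformat(row["tweet_extracted"])
        delays[int(row["tweet_id"])] = (extracted - created).total_seconds() / 3600
    return delays


delays = extraction_delays_hours(SAMPLE)
print(delays)
```

A check like this can help confirm that engagement counters such as `like_count` were captured after a comparable waiting period for every tweet.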
Tweet Sentiment's Impact on Stock Returns
kaggle.com
Updated Jan 16, 2023
Cite
The Devastator (2023). Tweet Sentiment's Impact on Stock Returns [Dataset]. https://www.kaggle.com/datasets/thedevastator/tweet-sentiment-s-impact-on-stock-returns
Dataset updated
Jan 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Tweet Sentiment's Impact on Stock Returns

862,231 Labeled Instances

By [source]

About this dataset

This dataset contains 862,231 labeled tweets and associated stock returns, providing a comprehensive look into the impact of social media on company-level stock market performance. For each tweet, researchers have extracted data such as the date of the tweet and its associated stock symbol, along with metrics such as last price and various returns (1-day return, 2-day return, 3-day return, 7-day return). Also recorded are volatility scores for both 10-day and 30-day intervals. Finally, sentiment scores from both Long Short-Term Memory (LSTM) and TextBlob models have been included to quantify the overall tone in which these messages were delivered. With this dataset you will be able to explore how tweets can affect a company's share price in both the short term and the long term by leveraging all of these data points for analysis!

How to use the dataset

To use this dataset, apply descriptive statistics (such as histograms) or regression techniques to relate tweet content and sentiment to the corresponding stock returns, such as the 1-day and 7-day returns.

The primary fields used for analysis are the tweet text (TWEET), stock symbol (STOCK), date (DATE), and closing price at the time of the tweet (LAST_PRICE), plus two volatility measures for each stock, 10-day volatility (VOLATILITY_10D) and 30-day volatility (VOLATILITY_30D), which capture market fluctuation over different periods around the time of the tweet. Sentiment polarity scores from two machine learning models, LSTM polarity (LSTM_POLARITY) and TextBlob polarity, indicate whether people are expressing positive or negative sentiment about each company at a given time, which could in turn influence stock prices over shorter horizons such as 1-day returns (1_DAY_RETURN) and 2-day returns (2_DAY_RETURN), or over a longer horizon such as 7-day returns. Finally, the MENTION field indicates whether names or acronyms associated with a company were specifically mentioned in each tweet, which shows whether company-specific context was present (“company relevancy”).
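
One simple way to relate sentiment to returns, as suggested above, is a Pearson correlation between a polarity column and a return column. The sketch below uses the `LSTM_POLARITY` and `1_DAY_RETURN` column names from the description, but the rows are invented for illustration and the value ranges are assumptions.

```python
import csv
import io
import math

# Hypothetical rows; only the column names come from the description above.
SAMPLE = """STOCK,LSTM_POLARITY,1_DAY_RETURN
AAA,1,0.02
AAA,-1,-0.01
BBB,1,0.01
BBB,-1,-0.02
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
polarity = [float(r["LSTM_POLARITY"]) for r in rows]
returns = [float(r["1_DAY_RETURN"]) for r in rows]
r_value = pearson(polarity, returns)
print(round(r_value, 3))
```

A correlation of this kind only describes association in the sample; it does not by itself show that tweets move prices.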

Research Ideas

Analyzing the degree to which tweets can influence stock prices. By analyzing relationships between variables such as tweet sentiment and stock returns, correlations can be identified that could be used to inform investment decisions.

Exploring natural language processing (NLP) models for predicting future market trends based on textual data such as tweets. Through testing and evaluating different text-based models using this dataset, better predictive models may emerge that can give investors advance warning of upcoming market shifts due to news or other events.

Investigating the impact of different types of tweets (positive/negative, factual/opinionated) on stock prices over specific time frames. By studying correlations between the sentiment or nature of a tweet and its effect on stocks, insights may be gained into what sort of news or events have a greater impact on markets in general

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: reduced_dataset-release.csv

| Column name | Description |
|:------------|:------------|
| TWEET | Text of the tweet. (String) |
| STOCK | Company's stock mentioned in the tweet. (String) |
| DATE | Date the tweet was posted. (Date) |
| LAST_PRICE | Company's last price at the time of tweeting. (Float) |

...
COVID-19 Twitter Chatter Dataset
opendatalab.com
zip
Updated Jun 21, 2020
Cite
University of Missouri (2020). COVID-19 Twitter Chatter Dataset [Dataset]. https://opendatalab.com/OpenDataLab/COVID-19_Twitter_Chatter_Dataset
Available download formats: zip (15836618735 bytes)
Dataset updated
Jun 21, 2020
Dataset provided by
National Research University Higher School of Economics
Universität Duisburg-Essen
University of Missouri
Universitat Autònoma de Barcelona
Kazan Federal University
Georgia State University
Carl von Ossietzky Universität Oldenburg
License
https://zenodo.org/record/3902855
Description
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files.
X/Twitter: Countries with the largest audience 2025
statista.com
Updated Jun 19, 2025
Cite
Statista (2025). X/Twitter: Countries with the largest audience 2025 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description
Social network X/Twitter is particularly popular in the United States: as of February 2025, the microblogging service had an audience reach of 103.9 million users in the country. Japan and India ranked second and third, with more than 70 million and 25 million users respectively.

Global Twitter usage

As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber and former U.S. president Barack Obama.

X/Twitter and politics

X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. In an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
A Twitter Dataset of 100+ million tweets related to COVID-19
zenodo.org
application/gzip, csv +1
Updated Apr 17, 2023
Cite
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell (2023). A Twitter Dataset of 100+ million tweets related to COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3735274
Available download formats: application/gzip, tsv, csv
Unique identifier
https://doi.org/10.5281/zenodo.3735274
Dataset updated
Apr 17, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell
Description
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts because they were filtered from other data we were collecting for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 30th and yielded over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to February 27th, to provide extra longitudinal coverage.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (101,400,452 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (20,244,746 unique tweets). There are several practical reasons for us to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.

More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions on re-distributing Twitter data. They need to be hydrated to be used.
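
Since the files contain only tweet identifiers with a date and time, a first look at the data usually amounts to counting identifiers per day. The sketch below assumes a tab-separated layout with a header row and the column names `tweet_id`, `date`, and `time`; these names and the sample rows are assumptions for illustration and should be checked against the actual files.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a TSV of tweet identifiers; the column names
# tweet_id/date/time are assumed, not confirmed by the description.
SAMPLE_TSV = (
    "tweet_id\tdate\ttime\n"
    "1\t2020-03-11\t00:00:01\n"
    "2\t2020-03-11\t00:00:05\n"
    "3\t2020-03-12\t09:15:00\n"
)


def tweets_per_day(tsv_text):
    """Count tweet identifiers per calendar date."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["date"] for row in reader)


counts = tweets_per_day(SAMPLE_TSV)
for day, n in sorted(counts.items()):
    print(day, n)
```

For the full file, the same loop can read from an open file handle instead of a string, so the data never has to be held in memory at once.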
Economy-Related Tweets
figshare.com
xlsx
Updated Feb 6, 2019
Cite
Sasha Cove (2019). Economy-Related Tweets [Dataset]. http://doi.org/10.6084/m9.figshare.7679363.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.7679363.v1
Dataset updated
Feb 6, 2019
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sasha Cove
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data file consists of all tweets made by @realDonaldTrump, those determined to be economy-related, and their assigned sentiment (positive, negative, neutral). The data file also contains the data in a day-to-day format, with the total tweets made per day, the total economy-related tweets per day, the % of tweets about the economy for each day, and the percent change in the S&P 500 and VIX for each day.
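
The two daily percentages mentioned above follow from simple arithmetic, sketched below. The function names and the example numbers are hypothetical, and the percent change is assumed to be measured against the previous day's closing value, which the description does not state.

```python
def economy_share(total_tweets, economy_tweets):
    """Percent of a day's tweets that are economy-related."""
    if total_tweets == 0:
        return 0.0  # no tweets that day, so the share is taken as zero
    return 100.0 * economy_tweets / total_tweets


def percent_change(previous_close, current_close):
    """Percent change of an index relative to the previous close (assumed basis)."""
    return 100.0 * (current_close - previous_close) / previous_close


# Hypothetical day: 8 tweets, 2 of them about the economy,
# and an index moving from 100.0 to 101.5.
share = economy_share(8, 2)
change = percent_change(100.0, 101.5)
print(share, change)
```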
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Dec 13, 2022
Cite
Statista (2022). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Dataset updated
Dec 13, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over *** million monthly active users worldwide. This figure was projected to ******** to approximately *** million by 2024, a ******* of around **** percent compared to 2022.
Twitter Users Broken Down By Age
searchlogistics.com
Updated Apr 1, 2025
Cite
(2025). Twitter Users Broken Down By Age [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
Dataset updated
Apr 1, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the breakdown of Twitter users by age group.
ChatGPT Social Media Insights Dataset
opendatabay.com
Updated Jul 6, 2025
Cite
Datasimple (2025). ChatGPT Social Media Insights Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/2cf951da-3ce1-4606-a8d6-3f865c4d8a3b
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Social Media and Networking
Description
This dataset captures a daily collection of tweets containing keywords such as "ChatGPT", "GPT3", or "GPT4". It was designed to provide a rich source of social media data for analysis, particularly for applications concerning Natural Language Processing (NLP) and sentiment analysis. The collection process began on 3rd April 2023, with approximately 1,000 tweets added daily. Tweets were extracted 24-72 hours after creation to allow for relevant engagement metrics like likes and retweets to accumulate. However, updates to this dataset ceased on 13th May 2023, due to changes in Twitter (X) API conditions, which introduced a cost for its use. The dataset includes tweets from various languages, selected randomly throughout the day, with basic filters applied to discard sensitive content and spam.

Columns

tweet_id: An integer serving as a unique identifier for each tweet. Older tweets typically have smaller IDs.

tweet_created: A timestamp indicating the exact time the tweet was published.

tweet_extracted: A UTC timestamp recording when the ETL (Extract, Transform, Load) pipeline pulled the tweet and its associated metadata (e.g., likes count, retweets count).

text: A string containing the raw text content of the tweet payload.

lang: A string providing the short name for the language of the tweet's text.

user_id: An integer representing the author's unique user ID on Twitter.

user_name: A string displaying the author's public name on Twitter.

user_username: A string showing the author's Twitter account username (e.g., @example).

user_location: A string detailing the author's publicly stated location.

user_description: A string containing the author's public profile biography.

user_created: A timestamp indicating when the user's Twitter account was created.

user_followers_count: An integer showing the number of followers the author's account had at the moment the tweet was extracted.

user_following_count: An integer indicating the number of accounts the author was following at the moment of tweet extraction.

user_tweet_count: An integer representing the total number of tweets the author had published at the time of tweet extraction.

user_verified: A boolean value (True/False) indicating if the user is verified (i.e., has a blue tick).

source: This column was intended to show the device or application used to publish the tweet but currently contains only 'NaN' (Not a Number) values.

retweet_count: An integer displaying the number of times the tweet had been retweeted at the moment of extraction.

like_count: An integer showing the number of likes the tweet had received at the moment of extraction.

reply_count: An integer indicating the number of reply messages to the tweet.

impression_count: An integer representing the number of times the tweet had been seen at the moment of extraction.

Distribution

The dataset is provided in a CSV file format, generated from a Pandas DataFrame, with each row containing the tweet's text and its metadata, along with the author's information. The collection started on 3rd April 2023, adding approximately 1,000 tweets per day, and stopped updating on 13th May 2023. While specific total row counts are not available, various segments show substantial data, such as 43,000 tweets collected between 22nd September 2022 and 12th May 2023. Daily additions of 1,000 to 7,000 tweets are noted for the period of 8th April 2023 to 14th May 2023. The dataset includes unique values for over 25,000 tweet IDs, over 37,000 unique user IDs, and over 38,000 unique user locations.

Usage

This dataset is ideal for various data analysis and visualisation applications. It is particularly well-suited for Natural Language Processing (NLP) techniques, including sentiment analysis, to understand public opinion and trends related to ChatGPT, GPT3, and GPT4. Researchers can use it for social media listening, trend tracking, and studying the evolution of discussions around large language models.

Coverage

The dataset primarily covers tweets from 3rd April 2023 to 13th May 2023, with some older tweets included, particularly from September 2022. Tweets are from any language, randomly selected globally. English (en) tweets constitute approximately 48% of the dataset, Japanese (ja) tweets make up about 23%, and other languages account for 30%. User locations vary widely, with a significant portion (41%) being null, 1% from Japan, and the remaining 59% from various other global locations.

License

CC0

Who Can Use It

Data Analysts: For exploring social media trends and user engagement related to AI.

Researchers: Studying the public reception, discussion patterns, and sentiment around large language models.

Machine Learning Engineers: Developing and testing NLP models for s
