100+ datasets found

Twitter Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jan 8, 2023
Cite
Bright Data (2024). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 8, 2023
Dataset authored and provided by
Bright Data: https://brightdata.com/
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
twitter-dataset-tesla
huggingface.co
Updated Jul 11, 2022
Cite
fastai X Hugging Face Group 2022 (2022). twitter-dataset-tesla [Dataset]. https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla
Croissant metadata available. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Dataset updated
Jul 11, 2022
Dataset provided by
Hugging Face: https://huggingface.co/
Authors
fastai X Hugging Face Group 2022
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Twitter Dataset: Tesla

Dataset Summary

This dataset contains Tweets tagged #Tesla or #tesla posted up to 12/07/2022 (dd-mm-yyyy), i.e., 12 July 2022. It can be used for sentiment analysis research, for other NLP tasks, or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in each Tweet, and other important features.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.
COVID-19 Twitter Dataset
borealisdata.ca
Updated Nov 10, 2020
Cite
Anatoliy Gruzd; Philip Mai (2020). COVID-19 Twitter Dataset [Dataset]. http://doi.org/10.5683/SP2/PXF2CU
Croissant metadata available. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP2/PXF2CU
Dataset updated
Nov 10, 2020
Dataset provided by
Borealis
Authors
Anatoliy Gruzd; Philip Mai
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July of 2020.

Sampling method: hourly requests sent to the Twitter Search API using Social Feed Manager, open-source software that harvests social media data and related content from Twitter and other platforms.

NOTE:
1) In accordance with the Twitter API Terms, only Tweet IDs are provided as part of this dataset.
2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs such as Hydrator (https://github.com/DocNow/hydrator) or the Python library Twarc (https://github.com/DocNow/twarc).
3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset, either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because they used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV).
4) To broaden this sample, consider comparing or merging this dataset with other public COVID-19 related datasets, such as:
https://github.com/thepanacealab/covid19_twitter
https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
https://github.com/echen102/COVID-19-TweetIDs
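Note 4 suggests merging this sample with other public COVID-19 ID lists. A minimal Python sketch of that step follows. It assumes each input is a plain-text file with one numeric Tweet ID per line (check each dataset's actual layout first), and `merge_tweet_ids` is a hypothetical helper name, not part of any of the listed projects. IDs are kept as Python integers, which represent 64-bit Tweet IDs exactly.

```python
from pathlib import Path
from typing import Iterable


def merge_tweet_ids(paths: Iterable[Path]) -> list[int]:
    """Merge one-ID-per-line files into a sorted, de-duplicated list.

    Assumes plain-text files with one numeric Tweet ID per line; blank
    lines and lines that are not purely ASCII digits are skipped.
    """
    ids: set[int] = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                token = line.strip()
                # isascii() guards against Unicode digits that int() rejects
                if token.isascii() and token.isdigit():
                    ids.add(int(token))  # Python ints hold 64-bit IDs exactly
    return sorted(ids)
```

Writing the merged list back out one ID per line produces a file in the shape that rehydration tools such as Hydrator or twarc read.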
#metoo Digital Media Collection - Twitter Dataset
dataverse.harvard.edu
Updated Mar 29, 2023
Cite
Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub (2023). #metoo Digital Media Collection - Twitter Dataset [Dataset]. http://doi.org/10.7910/DVN/2SRSKJ
Croissant metadata available. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/2SRSKJ
Dataset updated
Mar 29, 2023
Dataset provided by
Harvard Dataverse
Authors
Zachary Maiorana; Pablo Morales Henry; Jennifer Weintraub
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.7910/DVN/2SRSKJ
Description
This is version 2 of this dataset. The previous version was published on June 30, 2020. This dataset contains the tweet IDs of 39,373,774 tweets, which are part of the Schlesinger Library #metoo Digital Media Collection. This second version represents the full set of tweets collected throughout the project; tweets range from October 15, 2017 to December 31, 2022. The previous version of this dataset extended to March 31, 2020. Tweets between October 15, 2017 and December 10, 2018 were licensed from Twitter's Historical PowerTrack and received through GNIP. Tweets after December 10, 2018 were collected weekly from the Twitter API through Social Feed Manager using the POST statuses/filter method of the Twitter Stream API.

The following list of 76 terms includes the hashtags used to collect data for this dataset: #metoo, #timesup, #metoostem, #sciencetoo, #metoophd, #shittymediamen, #churchtoo, #ustoo, #metooMVMT, #ARmetoo, #TimesUpAR, #metooSociology, #metooSexScience, #timesupAcademia, #metooMedicine, #MyCampusToo, #howiwillchange, #iwill, #believewomen, #GoTeal, #BelieveChristine, #IStandWithDrFord, #IStandWithChristineBlaseyFord, #believesurvivors, #whyididntreport, #himtoo, #istandwithbrett, #confirmkavanaguhnow, #metooMcdonalds, #metoomovement, #muteRKelly, #WeBelieveDrFord, #WeBelieveSurvivors, #HandsOffPantsOn, #MeAt14, #HeToo, #MeTooLiars, #metoolynchings, #metoohucksters, #metoohustle, #ItWasMe, #Ihave, #TimesUpTech, #GoogleWalkout, #mosquemetoo, #faithandmetoo, #SilenceIsNotSpiritual, #HealMeToo, #TimesUpHarvard, #NoCarveOut, #TimesUpx2, #MeetingsToo, #metoonatsec, #healmetoo, #GamAni, #ShulToo, #harvardhearsyou, #metooarcheology, #TimesUpPayUp, #metooarcheology, #metooHBCU, #TimesUpHC, #aidtoo, #garmentmetoo, #mutemetoo, #mutetimesup, #metoopolisci, #copstoo, #TimesUpBiden, #MeTooNoMatterWho, #IBelieveTara, #BelieveAllWomen, #metoomilitary, #harvard38, #comaroff, and #harvardletter. The final four hashtags in this list were first crawled on February 10, 2022.

Because of the size of the files, the identifiers are split into 41 files containing up to 1,000,000 IDs each. Per Twitter's Developer Policy, tweet IDs may be publicly shared for academic purposes; tweets may not. Therefore, this dataset only contains tweet IDs. To retrieve tweets that are still available (i.e., not deleted by users), tools such as Hydrator can be used. Subsets containing only the #metoo seed are also available as quarterly datasets.
CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19...
zenodo.org
data.niaid.nih.gov
pdf, zip
Updated Jul 19, 2024
Cite
Shahan Ali Memon; Kathleen M. Carley (2024). CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19 Misinformation [Dataset]. http://doi.org/10.5281/zenodo.4024154
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.4024154
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Shahan Ali Memon; Kathleen M. Carley
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. Detecting and characterizing misinformation requires annotated datasets. Most of the published COVID-19 Twitter datasets are generic, lack annotations or labels, rely on automated annotations produced by transfer learning or semi-supervised methods, or are not specifically designed for misinformation. Existing annotated datasets either focus only on "fake news", are small, or have limited class diversity.

Here, we present a novel Twitter misinformation dataset called "CMU-MisCov19" with 4573 annotated tweets over 17 themes around the COVID-19 discourse. We also present our annotation codebook for the different COVID-19 themes on Twitter, along with their descriptions and examples, for the community to use for collecting further annotations. Further details about the dataset and our analysis based on it can be found at https://arxiv.org/abs/2008.00791. In adherence to Twitter’s terms and conditions, we do not provide the full tweet JSONs but provide a ".csv" file with the tweet IDs so that the tweets can be rehydrated. We also provide the annotations and the date of creation for each tweet so that the results of our analyses can be reproduced.

Note: If for any reason you are unable to rehydrate all the tweets, contact Shahan Ali Memon (shahan@nyu.edu).

If you use this data, please cite our paper as follows:

"Shahan Ali Memon and Kathleen M. Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset, In Proceedings of The 5th International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2020), co-located with CIKM, virtual event due to COVID-19, 2020."
large-twitter-tweets-sentiment
huggingface.co
Updated Mar 6, 2024
Cite
Gong Xiangbo (2024). large-twitter-tweets-sentiment [Dataset]. https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment
Croissant metadata available. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Dataset updated
Mar 6, 2024
Authors
Gong Xiangbo
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for "Large twitter tweets sentiment analysis"

Dataset Description

Dataset Summary

This dataset is a collection of tweets formatted in a tabular data structure, annotated for sentiment analysis. Each tweet is associated with a sentiment label, with 1 indicating a Positive sentiment and 0 for a Negative sentiment.
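Because the labels are plain integers, a short Python sketch can map them to readable names and check class balance before training. The mapping follows the card (1 = Positive, 0 = Negative); the input is assumed to be a plain list of integer labels taken from the dataset's label column, and `label_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Label semantics as stated in the dataset card.
LABEL_NAMES = {0: "Negative", 1: "Positive"}


def label_distribution(labels: list[int]) -> dict[str, int]:
    """Count how many examples carry each named sentiment label.

    Raises ValueError on any label outside {0, 1}, which would indicate
    a schema different from the one described in the card.
    """
    counts = Counter(labels)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {LABEL_NAMES[k]: counts.get(k, 0) for k in sorted(LABEL_NAMES)}
```

A strongly skewed result is a signal to consider stratified splits or class weighting.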

Languages

The tweets are in English.

Dataset Structure

Data Instances

An instance of… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
#RoeOverturned: Twitter Dataset on the Abortion Rights Controversy
dataverse.harvard.edu
search.dataone.org
Updated Feb 6, 2023
Cite
Ashwin Rao; Rong-Ching Chang; Qiankun Zhong; Magdalena Wojcieszak; Kristina Lerman (2023). #RoeOverturned: Twitter Dataset on the Abortion Rights Controversy [Dataset]. http://doi.org/10.7910/DVN/STU0J5
Croissant metadata available. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/STU0J5
Dataset updated
Feb 6, 2023
Dataset provided by
Harvard Dataverse
Authors
Ashwin Rao; Rong-Ching Chang; Qiankun Zhong; Magdalena Wojcieszak; Kristina Lerman
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
On June 24, 2022, the United States Supreme Court overturned landmark rulings made in its 1973 verdict in Roe v. Wade. By majority vote in Dobbs v. Jackson Women's Health Organization, the justices decided that abortion was not a constitutional right and returned the issue of abortion to elected representatives. This decision triggered multiple protests and debates across the US, especially in the context of the midterm elections in November 2022. Given that many citizens use social media platforms to express their views and mobilize for collective action, and given that online debate has tangible effects on public opinion, political participation, news media coverage, and political decision-making, it is crucial to understand online discussions surrounding this topic. Toward this end, we present the first large-scale Twitter dataset collected on the abortion rights debate in the United States: a set of 74M tweets systematically collected over the course of one year, from January 1, 2022 to January 6, 2023.
Data from: Twitter Dataset on the Russo-Ukrainian War
zenodo.org
Updated Oct 20, 2023
Cite
Alexander Shevtsov; Despoina Antonakaki; Ioannis Lamprou; Sotiris Ioannidis; Polyvios Pratikakis (2023). Twitter Dataset on the Russo-Ukrainian War [Dataset]. http://doi.org/10.5281/zenodo.8431047
Unique identifier
https://doi.org/10.5281/zenodo.8431047
Dataset updated
Oct 20, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Alexander Shevtsov; Despoina Antonakaki; Ioannis Lamprou; Sotiris Ioannidis; Polyvios Pratikakis
Time period covered
Feb 23, 2022
Area covered
Ukraine
Description
On 24 February 2022, Russia invaded Ukraine, in what is now also known as the Russo-Ukrainian War. We obtained our dataset through the Twitter API from 23 February 2022 until 23 June 2023. The collected dataset has 127,275,386 tweets, shared in the form of anonymized text, where the tweet/user IDs and user mentions are anonymized and do not provide any personal information. The dataset contains user discussion in more than 70 languages; the 20 most popular are: 'eng', 'fr', 'de', 'mix', 'it', 'es', 'ja', 'ru', 'pl', 'uk', 'tr', 'th', 'hi', 'qme', 'qht', 'nl', 'fi', 'ar', 'zh' and 'pt'. For information integrity, tweets are separated and stored in different files ordered by creation date. The dataset is shared for further research purposes. Additionally, we provide the list of tweet IDs in the GitHub repository, from which the tweets can be retrieved via the Twitter API. Furthermore, we performed some initial analyses, including volume/activity, hashtag popularity, sentiment, and military intelligence, and published the results on the web portal.
twitter-dataset
huggingface.co
Updated Nov 26, 2024
Cite
Tiky ekotto yanis (2024). twitter-dataset [Dataset]. https://huggingface.co/datasets/yanisTiky/twitter-dataset
Croissant metadata available. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Dataset updated
Nov 26, 2024
Authors
Tiky ekotto yanis
License
https://choosealicense.com/licenses/cc0-1.0/
Description
yanisTiky/twitter-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Digital Narratives of Covid-19: a Twitter Dataset
data.niaid.nih.gov
live.european-language-grid.eu
Updated Jun 24, 2020
Cite
Dieyun Song (2020). Digital Narratives of Covid-19: a Twitter Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3824949
Dataset updated
Jun 24, 2020
Dataset provided by
Romina De León
Nidia Hernández
Susanna Allés Torrent
Dieyun Song
Gimena del Rio Riande
Jerry Bonnell
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We are releasing a Twitter dataset connected to our project Digital Narratives of Covid-19 (DHCOVID), which, among other goals, aims to explore over one year (May 2020 to May 2021) the narratives behind data about the coronavirus pandemic.

In this first version, we deliver a Twitter dataset organized as follows:

Each folder corresponds to daily data (one folder for each day): YEAR-MONTH-DAY

In every folder there are 9 different plain text files whose names consist of "dhcovid" followed by the date (YEAR-MONTH-DAY), the language ("en" for English, "es" for Spanish), and the region abbreviation ("fl", "ar", "mx", "co", "pe", "ec", "es"):

dhcovid_YEAR-MONTH-DAY_es_fl.txt: Dataset containing tweets geolocated in South Florida. Geolocation is determined from tweet coordinates, place, or user information.

dhcovid_YEAR-MONTH-DAY_en_fl.txt: Dataset containing only English-language tweets that refer to the Miami and South Florida area. The reason for this choice is that multiple projects are already harvesting English data, and our project is particularly interested in this area because of our home institution (University of Miami) and because we aim to study public conversations from a bilingual (EN/ES) point of view.

dhcovid_YEAR-MONTH-DAY_es_ar.txt: Dataset containing tweets from Argentina.

dhcovid_YEAR-MONTH-DAY_es_mx.txt: Dataset containing tweets from Mexico.

dhcovid_YEAR-MONTH-DAY_es_co.txt: Dataset containing tweets from Colombia.

dhcovid_YEAR-MONTH-DAY_es_pe.txt: Dataset containing tweets from Perú.

dhcovid_YEAR-MONTH-DAY_es_ec.txt: Dataset containing tweets from Ecuador.

dhcovid_YEAR-MONTH-DAY_es_es.txt: Dataset containing tweets from Spain.

dhcovid_YEAR-MONTH-DAY_es.txt: This dataset contains all tweets in Spanish, regardless of geolocation.
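Because every file name encodes the date, language, and (optionally) region, the daily folders can be indexed without opening any file. The Python sketch below parses names following the scheme above; the regular expression and the `parse_dhcovid_name` helper are our own illustration, not part of the project's tooling, and the region part is treated as optional to cover the all-Spanish `dhcovid_YEAR-MONTH-DAY_es.txt` file.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# dhcovid_YYYY-MM-DD_<lang>[_<region>].txt, per the naming scheme above.
_NAME = re.compile(
    r"^dhcovid_(\d{4})-(\d{2})-(\d{2})_(en|es)(?:_([a-z]{2}))?\.txt$"
)


class DhcovidFile(NamedTuple):
    day: date
    language: str          # "en" or "es"
    region: Optional[str]  # e.g. "fl", "mx"; None for the all-Spanish file


def parse_dhcovid_name(filename: str) -> DhcovidFile:
    """Split a dhcovid file name into its date, language and region parts."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a dhcovid file name: {filename!r}")
    year, month, day, language, region = match.groups()
    return DhcovidFile(date(int(year), int(month), int(day)), language, region)
```

Grouping parsed names by `region` makes it easy to, for example, compare daily volumes across the Spanish-speaking countries.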

For English, we collect all tweets with the following keywords and hashtags: covid, coronavirus, pandemic, quarantine, stayathome, outbreak, lockdown, socialdistancing. For Spanish, we search for: covid, coronavirus, pandemia, quarentena, confinamiento, quedateencasa, desescalada, distanciamiento social.

The corpus of tweets consists of a list of Tweet IDs; to obtain the original tweets, you can use the "Twitter hydrator" (Hydrator), which takes the IDs and downloads all their metadata for you into a CSV file.

We started collecting this Twitter dataset on April 24th, 2020, and we are adding daily data to our GitHub repository. There is a known problem with the file 2020-04-24/dhcovid_2020-04-24_es.txt: we could not gather its data for technical reasons.

For more information about our project visit https://covid.dh.miami.edu/

For more updated datasets and detailed criteria, check our GitHub Repository: https://github.com/dh-miami/narratives_covid19/
X/Twitter: Countries with the largest audience 2024
statista.com
Updated Apr 29, 2024
Cite
Statista (2024). X/Twitter: Countries with the largest audience 2024 [Dataset]. https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/
Dataset updated
Apr 29, 2024
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
Apr 2024
Area covered
Worldwide
Description
Social network X/Twitter is particularly popular in the United States: as of April 2024, the microblogging service had an audience reach of 106.23 million users in the country. Japan and India ranked second and third, with more than 69 million and 25 million users, respectively.

Global Twitter usage

As of the second quarter of 2021, X/Twitter had 206 million monetizable daily active users worldwide. The most-followed Twitter accounts include figures such as Elon Musk, Justin Bieber, and former U.S. president Barack Obama.

X/Twitter and politics

X/Twitter has become an increasingly relevant tool in domestic and international politics. The platform has become a way to promote policies and interact with citizens and other officials, and most world leaders and foreign ministries have an official Twitter account. Former U.S. president Donald Trump used to be a prolific Twitter user before the platform permanently suspended his account in January 2021. In an August 2018 survey, 61 percent of respondents stated that Trump's use of Twitter as President of the United States was inappropriate.
Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...
data.niaid.nih.gov
dataverse.harvard.edu
Updated Aug 10, 2022
Cite
Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6624080
Dataset updated
Aug 10, 2022
Dataset authored and provided by
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset:

N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

Abstract

The COVID-19 Omicron variant, reported to be the most immune-evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations related to online learning, centered around information seeking and sharing. A dataset developed by mining such conversations (Tweets) can serve as a data resource for interdisciplinary research on interests, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, and with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description

The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.

The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.

Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)

Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)

Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)

Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)

Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)

Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)

Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)

Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)

Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)
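The per-file counts above add up to the stated total of 52,984 Tweet IDs, which gives a cheap completeness check for a download. The Python sketch below copies the figures from the list and compares them with line counts of local files; it assumes one Tweet ID per line, and `count_ids` and `check_download` are hypothetical helper names.

```python
from pathlib import Path

# Tweet ID counts per file, as listed in the dataset description.
EXPECTED = {
    "TweetIDs_November_2021.txt": 1283,
    "TweetIDs_December_2021.txt": 10545,
    "TweetIDs_January_2022.txt": 23078,
    "TweetIDs_February_2022.txt": 4751,
    "TweetIDs_March_2022.txt": 3434,
    "TweetIDs_April_2022.txt": 3355,
    "TweetIDs_May_2022.txt": 3120,
    "TweetIDs_June_2022.txt": 2361,
    "TweetIDs_July_2022.txt": 1057,
}
STATED_TOTAL = 52984

# The listed per-file counts add up to the stated total.
assert sum(EXPECTED.values()) == STATED_TOTAL


def count_ids(path: Path) -> int:
    """Count non-blank lines, assuming one Tweet ID per line."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def check_download(folder: Path) -> dict[str, tuple[int, int]]:
    """Return {filename: (expected, found)} for every file whose count differs.

    A missing file is reported with a found count of 0.
    """
    mismatches = {}
    for name, expected in EXPECTED.items():
        path = folder / name
        found = count_ids(path) if path.exists() else 0
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

An empty result means every file has the listed number of IDs; note that this says nothing about how many tweets will still be retrievable after hydration.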

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.

Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development

Terminology | List of synonyms and terms
COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus
online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
TRACES Bulgarian Twitter Dataset on Covid-19 Annotated with Linguistic...
data.niaid.nih.gov
zenodo.org
Updated Apr 16, 2023
Cite
Silvia Gargova (2023). TRACES Bulgarian Twitter Dataset on Covid-19 Annotated with Linguistic Markers of Lies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7614246
Dataset updated
Apr 16, 2023
Dataset provided by
Irina Temnikova
Veneta Kireva
Tsvetelina Stefanova
Silvia Gargova
Description
This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 61,411 IDs of tweets written in Bulgarian, with annotations. It can be used for general purposes or for building lie and disinformation detection applications.

Note: this dataset is not fact-checked; the social media messages were retrieved via keywords. For fact-checked datasets, see our other datasets.

The tweets (written between 1 Jan 2020 and 28 June 2022) were collected via the Twitter API under academic access in June 2022 with the following keywords:

(Covid OR коронавирус OR Covid19 OR Covid-19 OR Covid_19) - without replies and without retweets

(Корона OR корона OR Corona OR пандемия OR пандемията OR Spikevax OR SARS-CoV-2 OR бустерна доза) - with replies, but without retweets

Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper (please cite it when using this dataset):

Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.
Geotagged Twitter posts from the United States: A tweet collection to...
datacatalogue.cessda.eu
search.gesis.org
Updated Dec 6, 2024
Cite
Pfeffer, Jürgen; Morstatter, Fred (2024). Geotagged Twitter posts from the United States: A tweet collection to investigate representativeness [Dataset]. http://doi.org/10.7802/1166
Unique identifier
https://doi.org/10.7802/1166
Dataset updated
Dec 6, 2024
Dataset provided by
Arizona State University
Carnegie Mellon University
Authors
Pfeffer, Jürgen; Morstatter, Fred
Area covered
United States
Measurement technique
Recording
Description
This dataset consists of IDs of geotagged Twitter posts from within the United States. They are provided as files per day and state, as well as per day and county. In addition, files containing the aggregated number of hashtags from these tweets are provided per day and state and per day and county. The data is organized as one ZIP file per month, containing several ZIP files per day, which hold the TXT files with the ID/hash information.

The dataset also includes two shapefiles for US counties and states, as well as Python scripts for data collection and for sorting geotags into counties.
Lerman Twitter 2010 Dataset
academictorrents.com
marketplace.sshopencloud.eu
bittorrent
Updated Aug 18, 2014
Cite
Lerman Twitter 2010 Dataset [Dataset]. https://academictorrents.com/details/d8b3a315172c8d804528762f37fa67db14577cdb
Available download formats: bittorrent
Dataset updated
Aug 18, 2014
Dataset authored and provided by
Kristina Lerman
License
https://academictorrents.com/nolicensespecified
Description
The Twitter_2010 data set contains tweets with URLs that were posted on Twitter during October 2010. In addition to tweets, we also have the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.
URLs: 66,059
tweets: 2,859,764
users: 736,930
links: 36,743,448
Tweets table (in csv format): link_status_search_with_ordering_real_csv contains tweets with the following information:
link: URL within the text of the tweet
id: tweet id
create_at: date added to the db
create_at_long
inreplyto_screen_name: screen name of user this tweet is replying to
inreplyto_user_id: user id of user this tweet is replying to
source: device from which the tweet originated
bad_user_id: alternate user id
user_screen_name: tweeting user screen name
order_of_users: tweet's index within the sequence of tweets of the same URL
user_id: user id
Table (in csv format): distinct_users_from_search_table_real_map contains names of tweeting users, and the following information for
A Twitter Dataset of 100+ million tweets related to COVID-19
zenodo.org
application/gzip, tsv, csv
Updated Apr 17, 2023
Cite
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell (2023). A Twitter Dataset of 100+ million tweets related to COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3735274
Available download formats: application/gzip, tsv, csv
Unique identifier
https://doi.org/10.5281/zenodo.3735274
Dataset updated
Apr 17, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell
Description
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts because we filtered other data we were collecting for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 30th and yielded over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to February 27th to provide extra longitudinal coverage.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (101,400,452 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (20,244,746 unique tweets). We have several practical reasons for keeping the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks, we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.
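Per-day counts similar to those in the statistics files can be approximated from the ID files themselves. A minimal sketch, assuming a header row with tweet_id and date columns (a hypothetical layout; the actual column names should be checked against the repository), that counts unique tweet IDs per day. The function name and sample rows are illustrative only.

```python
import csv
import io
from collections import Counter


def unique_tweets_per_day(tsv_text):
    """Count unique tweet IDs per date in a tab-separated file.

    Assumes a header row containing 'tweet_id' and 'date' columns
    (hypothetical layout; check the actual files before relying on it).
    """
    seen = set()
    counts = Counter()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["tweet_id"] in seen:
            continue  # skip duplicate IDs so each tweet is counted once
        seen.add(row["tweet_id"])
        counts[row["date"]] += 1
    return counts


# Made-up sample rows; tweet 2 appears twice.
sample = (
    "tweet_id\tdate\ttime\n"
    "1\t2020-03-12\t10:00:00\n"
    "2\t2020-03-12\t11:30:00\n"
    "2\t2020-03-12\t11:30:00\n"
    "3\t2020-03-13\t09:15:00\n"
)
print(sorted(unique_tweets_per_day(sample).items()))  # [('2020-03-12', 2), ('2020-03-13', 1)]
```

For files with roughly 100 million rows, the in-memory set of seen IDs is the main cost; streaming the file from disk with open() instead of io.StringIO keeps the rest of the approach unchanged.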

More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

As always, the tweets distributed here are only tweet identifiers (with date and time added) due to Twitter's terms and conditions for redistributing Twitter data. They need to be hydrated to be used.
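Hydration means looking up each ID through the Twitter (now X) API to retrieve the full tweet. Twitter's tweet-lookup endpoints have historically accepted at most 100 IDs per request, so IDs are usually sent in batches. The sketch below covers only the batching step; the batch size of 100 is an assumption about API limits, the function name is hypothetical, and the request itself is left out because access terms and endpoints have changed over time.

```python
from typing import Iterable, Iterator, List


def batched_ids(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield lists of at most `size` non-empty tweet IDs.

    size=100 reflects the per-request limit historically documented for
    Twitter's tweet-lookup endpoints; treat it as an assumption.
    """
    batch: List[str] = []
    for raw in ids:
        tweet_id = raw.strip()
        if not tweet_id:
            continue  # ignore blank lines in ID files
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


# 250 made-up IDs plus a trailing blank line, as read from a text file.
ids = [f"{n}\n" for n in range(250)] + ["\n"]
print([len(b) for b in batched_ids(ids)])  # [100, 100, 50]
```

Because batched_ids is a generator, it can be fed directly from an open file object without loading millions of IDs into memory.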
How Popular Is Twitter In The World?
searchlogistics.com
Cite
How Popular Is Twitter In The World? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As of February 2024, Twitter is ranked as the 12th most popular social media site in the world. The platform currently has 436 million monthly active users.
Why Do People Use Twitter?
searchlogistics.com
Cite
Why Do People Use Twitter? [Dataset]. https://www.searchlogistics.com/learn/statistics/twitter-user-statistics/
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
One of the biggest advantages of Twitter is the speed at which information can be passed around. People use Twitter primarily to get news and for entertainment. This is the breakdown of why people use Twitter today.
Data from: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022...
dataverse.harvard.edu
Updated Nov 19, 2022
Cite
Nirmalya Thakur (2022). MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions [Dataset]. http://doi.org/10.7910/DVN/CR7T5E
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Unique identifier
https://doi.org/10.7910/DVN/CR7T5E
Dataset updated
Nov 19, 2022
Dataset provided by
Harvard Dataverse
Authors
Nirmalya Thakur
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
May 7, 2022 - Nov 11, 2022
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087.

Abstract

The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, past virus outbreaks, such as COVID-19, Ebola, Zika virus, and flu, to name a few, were associated with various works that analyzed the multimodal components of Tweets to infer the characteristics of Twitter conversations related to these outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, generating tremendous amounts of Big Data. No prior work in this field has thus far focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description

The dataset consists of a total of 571,831 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th November 2022 (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The details of these dataset files are as follows:

- TweetIDs_Part1.txt (No. of Tweet IDs: 13926; date range of the associated Tweet IDs: May 7, 2022, to May 21, 2022)
- TweetIDs_Part2.txt (No. of Tweet IDs: 17705; date range of the associated Tweet IDs: May 21, 2022, to May 27, 2022)
- TweetIDs_Part3.txt (No. of Tweet IDs: 17585; date range of the associated Tweet IDs: May 27, 2022, to June 5, 2022)
- TweetIDs_Part4.txt (No. of Tweet IDs: 19718; date range of the associated Tweet IDs: June 5, 2022, to June 11, 2022)
- TweetIDs_Part5.txt (No. of Tweet IDs: 46718; date range of the associated Tweet IDs: June 12, 2022, to June 30, 2022)
- TweetIDs_Part6.txt (No. of Tweet IDs: 138711; date range of the associated Tweet IDs: July 1, 2022, to July 23, 2022)
- TweetIDs_Part7.txt (No. of Tweet IDs: 105890; date range of the associated Tweet IDs: July 24, 2022, to July 31, 2022)
- TweetIDs_Part8.txt (No. of Tweet IDs: 93959; date range of the associated Tweet IDs: August 1, 2022, to August 9, 2022)
- TweetIDs_Part9.txt (No. of Tweet IDs: 50832; date range of the associated Tweet IDs: August 10, 2022, to August 24, 2022)
- TweetIDs_Part10.txt (No. of Tweet IDs: 39042; date range of the associated Tweet IDs: August 25, 2022, to September 19, 2022)
- TweetIDs_Part11.txt (No. of Tweet IDs: 12341; date range of the associated Tweet IDs: September 20, 2022, to October 9, 2022)
- TweetIDs_Part12.txt (No. of Tweet IDs: 15404; date range of the associated Tweet IDs: October 10, 2022, to November 11, 2022)

Please note: The dataset contains only Tweet IDs, in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application may be used (download: https://github.com/DocNow/hydrator/releases; step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets).
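Because the release is split across 12 files, a quick consistency check after download can catch missing or truncated files. The per-file counts listed in the Data Description sum to exactly 571,831. A sketch, assuming each .txt file holds one Tweet ID per line (an assumption about the layout) and that all files sit together in one directory; the helper names load_ids and find_mismatches are hypothetical.

```python
from pathlib import Path

# Per-file Tweet ID counts as listed in the Data Description.
PART_COUNTS = {
    1: 13926, 2: 17705, 3: 17585, 4: 19718, 5: 46718, 6: 138711,
    7: 105890, 8: 93959, 9: 50832, 10: 39042, 11: 12341, 12: 15404,
}
EXPECTED_TOTAL = 571831

# The listed per-file counts add up to the stated total.
assert sum(PART_COUNTS.values()) == EXPECTED_TOTAL


def load_ids(path):
    """Read one Tweet ID per line (assumed layout), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_mismatches(directory):
    """Return {part: (expected, found)} for files that are missing or incomplete."""
    mismatches = {}
    for part, expected in PART_COUNTS.items():
        path = Path(directory) / f"TweetIDs_Part{part}.txt"
        found = len(load_ids(path)) if path.is_file() else 0
        if found != expected:
            mismatches[part] = (expected, found)
    return mismatches
```

Calling find_mismatches on the download directory returns an empty dict when every file is present with the listed number of IDs; a missing file shows up with a found count of 0.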
X/Twitter: number of worldwide users 2019-2024
statista.com
Updated Nov 15, 2023
Cite
Statista (2023). X/Twitter: number of worldwide users 2019-2024 [Dataset]. https://www.statista.com/statistics/303681/twitter-users-worldwide/
Dataset updated
Nov 15, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2022
Area covered
Worldwide
Description
As of December 2022, X/Twitter's audience accounted for over 368 million monthly active users worldwide. This figure was projected to decrease to approximately 335 million by 2024, a decline of around five percent compared to 2022.
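The projection can be checked with simple arithmetic. The sketch below computes both the overall drop from 368 million to 335 million and the equivalent compound annual rate over the two years from 2022 to 2024. The overall drop works out to about 9%, while the compound annual rate is about 4.6%; the source does not say whether "around five percent" refers to a total or a yearly figure, so this is only one possible reading. The function name decline is illustrative.

```python
def decline(start, end, years):
    """Return (total fractional decline, equivalent compound annual decline)."""
    total = (start - end) / start
    annual = 1 - (end / start) ** (1 / years)
    return total, annual


# Figures from the description: 368 million (2022) to about 335 million (2024).
total, annual = decline(368, 335, years=2)
print(f"total: {total:.1%}, per year: {annual:.1%}")  # total: 9.0%, per year: 4.6%
```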
Twitter Dataset

twitter-dataset-tesla

COVID-19 Twitter Dataset

#metoo Digital Media Collection - Twitter Dataset

CMU-MisCov19: A Novel Twitter Dataset for Characterizing COVID-19...

large-twitter-tweets-sentiment

#RoeOverturned: Twitter Dataset on the Abortion Rights Controversy

Data from: Twitter Dataset on the Russo-Ukrainian War

twitter-dataset

Digital Narratives of Covid-19: a Twitter Dataset

X/Twitter: Countries with the largest audience 2024

Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...

TRACES Bulgarian Twitter Dataset on Covid-19 Annotated with Linguistic...

Geotagged Twitter posts from the United States: A tweet collection to...

Lerman Twitter 2010 Dataset

A Twitter Dataset of 100+ million tweets related to COVID-19

How Popular Is Twitter In The World?

Why Do People Use Twitter?

Data from: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022...

X/Twitter: number of worldwide users 2019-2024
