Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Preprints, 2022, DOI: 10.20944/preprints202206.0146.v1
Abstract
The COVID-19 Omicron variant, reported to be the most immune-evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations related to online learning, centered around information seeking and sharing. Mining such conversations (Tweets) to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset files contain a raw version that comprises 52,868 Tweet IDs (corresponding to the same number of Tweets) and a cleaned and preprocessed version that contains 46,208 unique Tweet IDs. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description
The dataset comprises 7 .txt files. The raw version of this dataset comprises 6 .txt files (TweetIDs_Corona Virus.txt, TweetIDs_Corona.txt, TweetIDs_Coronavirus.txt, TweetIDs_Covid.txt, TweetIDs_Omicron.txt, and TweetIDs_SARS CoV2.txt) that contain Tweet IDs grouped together based on certain synonyms or terms that were used to refer to online learning and the Omicron variant of COVID-19 in the respective tweets. Table 1 shows the list of all the synonyms or terms that were used for the dataset development. The cleaned and preprocessed version of this dataset is provided in the .txt file - TweetIDs_Duplicates_Removed.txt. A description of these dataset files is provided in Table 2.
The dataset contains only Tweet IDs, in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated before use. For hydrating this dataset, the Hydrator application may be used (a download link and a step-by-step tutorial on how to use Hydrator are available).
Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development
Terminology | List of synonyms and terms
COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus
online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
Table 2: Description of the dataset files along with the information about the number of Tweet IDs in each of them
Filename | No. of Tweet IDs | Description
TweetIDs_Corona Virus.txt | 321 | Tweet IDs correspond to tweets that comprise the keyword "corona virus" and one or more keywords/terms that refer to online learning
TweetIDs_Corona.txt | 1,819 | Tweet IDs correspond to tweets that comprise the keyword "corona" or "coronaoutbreak" and one or more keywords/terms that refer to online learning
TweetIDs_Coronavirus.txt | 1,429 | Tweet IDs correspond to tweets that comprise the keyword "coronavirus" or "coronaviruspandemic" and one or more keywords/terms that refer to online learning
TweetIDs_Covid.txt | 41,088 | Tweet IDs correspond to tweets that comprise the keyword "COVID", "COVID19", or "COVID-19" and one or more keywords/terms that refer to online learning
TweetIDs_Omicron.txt | 8,198 | Tweet IDs correspond to tweets that comprise the keyword "omicron" or "omicron variant" and one or more keywords/terms that refer to online learning
TweetIDs_SARS CoV2.txt | 13 | Tweet IDs correspond to tweets that comprise the keyword "SARS-CoV-2" and one or more keywords/terms that refer to online learning
TweetIDs_Duplicates_Removed.txt | 46,208 | A collection of unique Tweet IDs from all the 6 .txt files mentioned above after data preprocessing, data cleaning, and removal of duplicate tweets
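The cleaned file can be reproduced from the six raw files by merging their Tweet IDs and dropping duplicates. A minimal sketch (file handling only; the dataset's full preprocessing also involved cleaning steps not shown here):

```python
# Merge several one-ID-per-line Tweet ID files and drop duplicates,
# mirroring how TweetIDs_Duplicates_Removed.txt was derived.
def merge_unique_ids(paths):
    seen = set()
    ordered = []
    for path in paths:
        with open(path) as fh:
            for line in fh:
                tweet_id = line.strip()
                if tweet_id and tweet_id not in seen:
                    seen.add(tweet_id)
                    ordered.append(tweet_id)
    return ordered
```

Calling `merge_unique_ids` on the six raw files listed in Table 2 yields the deduplicated ID list.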
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains two data sheets summarizing the results of a class activity involving the use of Twitter to share information and to learn. Table S1 is a summary of the tweets produced by the user GeoHumana_UV (teacher/subject account) between 2020-09-16 and 2020-12-15, plus the interactions recorded for each tweet (including number of impressions, URL clicks, likes, retweets, and replies); all data were provided by Twitter Analytics. Table S2 is a summary of student activity related to the class activity for the same period of time. It includes anonymized data on tweeting activity, level of interaction, performance, and self-assessment by each participating student. Data were provided by Twitter Analytics and the participants.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains 15,000 Arabic tweets annotated for depression detection and includes linguistic feature augmentations to support research in natural language processing (NLP), sentiment analysis, and mental health detection. The dataset was curated to enable studies on automatic depression detection in Arabic social media and to support machine learning and deep learning approaches in the domain of computational mental health.
Contents
The dataset consists of the following columns:
- tweet: The original Arabic tweet text.
- label: Binary label indicating whether the tweet expresses signs of depression (1 = Depression, 0 = Non-depression).
- negation_flag: Indicates presence (1) or absence (0) of negation in the tweet.
- intensifier_flag: Indicates presence (1) or absence (0) of intensifiers (words that strengthen the degree of emotion).
- Class: Textual label corresponding to the binary label (Depression or Non-depression); redundant but included for convenience.
- Binary Classification: Contains the count of instances in each class (appears as an artifact in the provided file).
Key Features
- Language: Arabic (varied dialects and Modern Standard Arabic).
- Source: Publicly available tweets collected from Twitter (X).
- Annotation: Manual labeling by native Arabic speakers trained in psychology and linguistics.
- Linguistic augmentation: Flags for negation and intensifier usage are included to support linguistically informed NLP models.
Potential Use Cases
- Depression detection models for Arabic texts.
- Linguistic analysis of depression expression in Arabic social media.
- Cross-lingual studies comparing depression signals across languages.
- Development of clinical decision support systems leveraging social media data.
Licensing & Ethical Considerations
The dataset consists of public social media posts. Researchers are advised to use it strictly for research purposes, respecting privacy and ethical guidelines. No personally identifiable information (PII) is included.
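The linguistic flags can be combined with the binary label for slicing the data. A minimal sketch of working with rows in the documented schema (the row values here are made up; only the column names come from the dataset description):

```python
# Illustrative rows following the documented columns; real data would be
# loaded from the distributed file, whose name is not given here.
rows = [
    {"tweet": "...", "label": 1, "negation_flag": 1, "intensifier_flag": 0},
    {"tweet": "...", "label": 0, "negation_flag": 0, "intensifier_flag": 1},
    {"tweet": "...", "label": 1, "negation_flag": 0, "intensifier_flag": 1},
]

# Example: select depression-labeled tweets that contain an intensifier,
# a slice linguistically informed models may treat separately.
intensified_depression = [
    r for r in rows if r["label"] == 1 and r["intensifier_flag"] == 1
]
```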
Citation If you use this dataset, please cite it appropriately in your research publications and acknowledge the creators.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their topic.
The dataset holds 21,107 documents annotated with 20 labels:
topics = { "LABEL_0": "Analyst Update", "LABEL_1": "Fed | Central Banks", "LABEL_2": "Company | Product News", "LABEL_3": "Treasuries | Corporate Debt", "LABEL_4": "Dividend"… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-topic.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data collection was performed using the standard Twitter API on Arabic tweets and code-mixed datasets. The collection was carried out over a duration of three months, from April 2023 to June 2023, via a combination of keyword-based, thread-based, and profile-based search approaches. A total of 120 terms, including various spelling variants, were used to identify tweets containing code-mixing related to regional hate speech. For the thread-based search, we incorporated hashtags related to contentious subjects that are deemed essential markers of hateful speech. Throughout the data-gathering phase, we monitored Twitter trends and designated ten hashtags for information retrieval. Given that hateful tweets are usually less common than regular tweets, we expanded our dataset and improved the representation of the hate class by incorporating the most impactful terms from a lexicon of religious hate terms (Albadi et al., 2018). We gathered exclusively original Arabic tweets for all queries, excluding retweets and non-Arabic tweets. In all, we obtained 200,000 tweets, of which we sampled 35k for annotation.
https://creativecommons.org/publicdomain/zero/1.0/
By hate_speech_offensive (From Huggingface) [source]
This dataset, named hate_speech_offensive, is a meticulously curated collection of annotated tweets with the specific purpose of detecting hate speech and offensive language. The dataset primarily consists of English tweets and is designed to train machine learning models or algorithms in the task of hate speech detection. It should be noted that the dataset has not been divided into multiple subsets, and only the train split is currently available for use.
The dataset includes several columns that provide valuable information for understanding each tweet's classification. The column count represents the total number of annotations provided for each tweet, whereas hate_speech_count signifies how many annotations classified a particular tweet as hate speech. On the other hand, offensive_language_count indicates the number of annotations categorizing a tweet as containing offensive language. Additionally, neither_count denotes how many annotations identified a tweet as neither hate speech nor offensive language.
For researchers and developers aiming to create effective models or algorithms capable of detecting hate speech and offensive language on Twitter, this comprehensive dataset offers a rich resource for training and evaluation purposes.
Introduction:
Dataset Overview:
- The dataset is presented in a CSV file format named 'train.csv'.
- It consists of annotated tweets with information about their classification as hate speech, offensive language, or neither.
- Each row represents a tweet along with the corresponding annotations provided by multiple annotators.
- The main columns that will be essential for your analysis are: count (total number of annotations), hate_speech_count (number of annotations classifying a tweet as hate speech), offensive_language_count (number of annotations classifying a tweet as offensive language), neither_count (number of annotations classifying a tweet as neither hate speech nor offensive language).
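The count columns above can be collapsed into a single class per tweet by majority vote. A minimal sketch (the function name and the tie-breaking order, which favors the first listed class, are illustrative assumptions):

```python
# Derive a single class per tweet from the annotation counts
# (column names follow the dataset card; ties resolve to the
# first maximal class in the dict's insertion order).
def majority_class(hate_speech_count, offensive_language_count, neither_count):
    counts = {
        "hate_speech": hate_speech_count,
        "offensive_language": offensive_language_count,
        "neither": neither_count,
    }
    return max(counts, key=counts.get)
```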
Data Collection Methodology: The data collection methodology used to create this dataset involved obtaining tweets from Twitter's public API using specific search terms related to hate speech and offensive language. These tweets were then manually labeled by multiple annotators who reviewed them for classification purposes.
Data Quality: Although efforts have been made to ensure the accuracy of the data, it is important to acknowledge that annotations are subjective opinions provided by individual annotators. As such, there may be variations in classifications between annotators.
Preprocessing Techniques: Prior to training machine learning models or algorithms on this dataset, it is recommended to apply standard preprocessing techniques such as removing URLs, usernames/handles, special characters/punctuation marks, stop words removal, tokenization, stemming/lemmatization etc., depending on your analysis requirements.
Exploratory Data Analysis (EDA): Conducting EDA on the dataset will help you gain insights and understand the underlying patterns in hate speech and offensive language. Some potential analysis ideas include:
- Distribution of tweet counts per classification category (hate speech, offensive language, neither).
- Most common words/phrases associated with each class.
- Co-occurrence analysis to identify correlations between hate speech and offensive language.
Building Machine Learning Models: To train models for automatic detection of hate speech and offensive language, you can follow these steps: a) Split the dataset into training and testing sets for model evaluation purposes. b) Choose appropriate features/
- Sentiment Analysis: This dataset can be used to train models for sentiment analysis on Twitter data. By classifying tweets as hate speech, offensive language, or neither, the dataset can help in understanding the sentiment behind different tweets and identifying patterns of negative or offensive language.
- Hate Speech Detection: The dataset can be used to develop models that automatically detect hate speech on Twitter. By training machine learning algorithms on this annotated dataset, it becomes possible to create systems that can identify and flag hate speech in real-time, making social media platforms safer and more inclusive.
- Content Moderation: Social media platforms can use this dataset to improve their content m...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ISCA project compiled this dataset using an annotation portal, which was used to label tweets as either biased or non-biased, among other labels. Note that the annotation was done on live data, including images and context, such as threads. The original data comes from annotationportal.com. They include representative samples of live tweets from the years 2020 and 2021 with the keywords "Asians, Blacks, Jews, Latinos, and Muslims".
A random sample of 600 tweets per year was drawn for each of the keywords. This includes retweets. Due to a sampling error, the sample for the year 2021 for the keyword "Jews" has only 453 tweets from 2021 and 147 from the first eight months of 2022 and it includes some tweets from the query with the keyword "Israel." The tweets were divided into six samples of 100 tweets, which were then annotated by three to seven students in the class "Researching White Supremacism and Antisemitism on Social Media" taught by Gunther Jikeli, Elisha S. Breton, and Seth Moller at Indiana University in the fall of 2022, see this report. Annotators used a scale from 1 to 5 (confident not biased, probably not biased, don't know, probably biased, confident biased). The definitions of bias against each minority group used for annotation are also included in the report.
If a tweet called out or denounced bias against the minority in question, it was labeled as "calling out bias."
The labels of whether a tweet is biased or calls out bias are based on a 75% majority vote. We considered "probably biased" and "confident biased" as biased and "confident not biased," "probably not biased," and "don't know" as not biased.
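The 75% majority rule over the 1-to-5 annotator scale can be sketched as follows (labels and threshold come from the description above; the function name is illustrative):

```python
# Aggregate 1-5 annotator ratings into a binary "biased" label.
# Ratings 4 ("probably biased") and 5 ("confident biased") count as biased;
# ratings 1-3 count as not biased. A tweet is labeled biased when at least
# 75% of annotators rated it as biased.
def is_biased(ratings, threshold=0.75):
    biased_votes = sum(1 for r in ratings if r >= 4)
    return biased_votes / len(ratings) >= threshold
```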
The types of stereotypes vary widely across the different categories of prejudice. While about a third of all biased tweets were classified as "hate" against the minority, the stereotypes in the tweets often matched common stereotypes about the minority. Asians were blamed for the Covid pandemic. Blacks were seen as inferior and associated with crime. Jews were seen as powerful and held collectively responsible for the actions of the State of Israel. Some tweets denied the Holocaust. Hispanics/Latines were portrayed as being in the country illegally and as "invaders," in addition to stereotypical accusations of being lazy, stupid, or having too many children. Muslims, on the other hand, were often collectively blamed for terrorism and violence, though often in conversations about Muslims in India.
This dataset contains 5880 tweets that cover a wide range of topics common in conversations about Asians, Blacks, Jews, Latines, and Muslims. 357 tweets (6.1%) are labeled as biased and 5523 (93.9%) are labeled as not biased. 1365 tweets (23.2%) are labeled as calling out or denouncing bias.
- 1180 out of 5880 tweets (20.1%) contain the keyword "Asians"; 590 were posted in 2020 and 590 in 2021. 39 tweets (3.3%) are biased against Asian people; 370 tweets (31.4%) call out bias against Asians.
- 1160 out of 5880 tweets (19.7%) contain the keyword "Blacks"; 578 were posted in 2020 and 582 in 2021. 101 tweets (8.7%) are biased against Black people; 334 tweets (28.8%) call out bias against Blacks.
- 1189 out of 5880 tweets (20.2%) contain the keyword "Jews"; 592 were posted in 2020, 451 in 2021, and, as mentioned above, 146 in 2022. 83 tweets (7%) are biased against Jewish people; 220 tweets (18.5%) call out bias against Jews.
- 1169 out of 5880 tweets (19.9%) contain the keyword "Latinos"; 584 were posted in 2020 and 585 in 2021. 29 tweets (2.5%) are biased against Latines; 181 tweets (15.5%) call out bias against Latines.
- 1182 out of 5880 tweets (20.1%) contain the keyword "Muslims"; 593 were posted in 2020 and 589 in 2021. 105 tweets (8.9%) are biased against Muslims; 260 tweets (22%) call out bias against Muslims.
The dataset is provided in a csv file format, with each row representing a single message, including replies, quotes, and retweets. The file contains the following columns:
'TweetID': Represents the tweet ID.
'Username': Represents the username that published the tweet (if it is a retweet, it will be the user who retweeted the original tweet).
'Text': Represents the full text of the tweet (not pre-processed).
'CreateDate': Represents the date the tweet was created.
'Biased': Represents the label assigned by our annotators indicating whether the tweet is biased (1) or not (0).
'Calling_Out': Represents the label assigned by our annotators indicating whether the tweet is calling out bias against minority groups (1) or not (0).
'Keyword': Represents the keyword that was used in the query. The keyword can be in the text, including mentioned names, or the username.
Data is published under the terms of the "Creative Commons Attribution 4.0 International" licence (https://creativecommons.org/licenses/by/4.0)
We are grateful for the technical collaboration with Indiana University's Observatory on Social Media (OSoMe). We thank all class participants for the annotations and contributions, including Kate Baba, Eleni Ballis, Garrett Banuelos, Savannah Benjamin, Luke Bianco, Zoe Bogan, Elisha S. Breton, Aidan Calderaro, Anaye Caldron, Olivia Cozzi, Daj Crisler, Jenna Eidson, Ella Fanning, Victoria Ford, Jess Gruettner, Ronan Hancock, Isabel Hawes, Brennan Hensler, Kyra Horton, Maxwell Idczak, Sanjana Iyer, Jacob Joffe, Katie Johnson, Allison Jones, Kassidy Keltner, Sophia Knoll, Jillian Kolesky, Emily Lowrey, Rachael Morara, Benjamin Nadolne, Rachel Neglia, Seungmin Oh, Kirsten Pecsenye, Sophia Perkovich, Joey Philpott, Katelin Ray, Kaleb Samuels, Chloe Sherman, Rachel Weber, Molly Winkeljohn, Ally Wolfgang, Rowan Wolke, Michael Wong, Jane Woods, Kaleb Woodworth, and Aurora Young. This work used Jetstream2 at Indiana University through allocation HUM200003 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Financial Sentiment Analysis Dataset
Overview
This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.
Data Description
The dataset comprises tweets… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
https://choosealicense.com/licenses/unknown/
Dataset Card for tweet_eval
Dataset Summary
TweetEval consists of seven heterogeneous tasks in Twitter, all framed as multi-class tweet classification. The tasks include irony, hate, offensive, stance, emoji, emotion, and sentiment. All tasks have been unified into the same benchmark, with each dataset presented in the same format and with fixed training, validation, and test splits.
Supported Tasks and Leaderboards
text_classification: The dataset can be… See the full description on the dataset page: https://huggingface.co/datasets/cardiffnlp/tweet_eval.
https://creativecommons.org/publicdomain/zero/1.0/
Collection of 13M tweets divided into training, validation, and test sets for the purposes of predicting emoji based on text and/or images.
The data provides the tweet status ID and the emoji annotations associated with it. In the case of image-containing subsets, the image URL is also listed.
The Full, unbalanced dataset consists of a random test and validation sets of 1M tweets, with the remainder in the training set.
The Balanced test set is a subset of the test set chosen to improve emoji class balance.
The Image subsets are image-containing tweets.
Finally, emoji_map_1791.csv provides information regarding the emoji labels and potential metadata.
URL to get the tweet based on ID: `https://twitter.com/anyuser/status/<tweet ID>`
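Since Twitter resolves the status from the numeric ID regardless of the username in the path, a viewing URL can be built from the ID alone. A minimal sketch:

```python
# Build a viewable URL from a tweet status ID; Twitter redirects the
# "anyuser" placeholder to the correct account based on the numeric ID.
def tweet_url(status_id):
    return f"https://twitter.com/anyuser/status/{status_id}"
```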
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Overview
This is an entity-level sentiment analysis dataset of Twitter. Given a message and an entity, the task is to judge the sentiment of the message about the entity. There are three classes in this dataset: Positive, Negative, and Neutral. We regard messages that are not relevant to the entity (i.e., Irrelevant) as Neutral.
Usage Please use twitter_training.csv as the training set and twitter_validation.csv as the validation set. Top 1 classification accuracy is used as the metric.
Original Data Source: Twitter Sentiment Analysis
StEduCov is a dataset annotated for stances toward online education during the COVID-19 pandemic. StEduCov has 17,097 tweets gathered over 15 months, from March 2020 to May 2021, using the Twitter API. The tweets are manually annotated into agree, disagree, or neutral classes. We used a set of relevant hashtags and keywords; specifically, we utilised combinations of hashtags, such as '#COVID 19' or '#Coronavirus', with keywords, such as 'education', 'online learning', 'distance learning', and 'remote learning'. To ensure high annotation quality, each tweet was annotated by three different annotators and revised by at least one of three judges. Annotators were guided by instructions, for example that for the disagree class there should be a clear negative statement about online education or its impact, with guidance also covering tweets that are negative but refer to other people (e.g., 'my children hate online learning').
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hot-bed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. Detection and characterization of misinformation requires an availability of annotated datasets. Most of the published COVID-19 Twitter datasets are generic, lack annotations or labels, employ automated annotations using transfer learning or semi-supervised methods, or are not specifically designed for misinformation. Annotated datasets are either only focused on "fake news", are small in size, or have less diversity in terms of classes.
Here, we present a novel Twitter misinformation dataset called "CMU-MisCov19" with 4573 annotated tweets over 17 themes around the COVID-19 discourse. We also present our annotation codebook for the different COVID-19 themes on Twitter, along with their descriptions and examples, for the community to use for collecting further annotations. Further details related to the dataset and our analysis based on this dataset can be found at https://arxiv.org/abs/2008.00791. In adherence to Twitter's terms and conditions, we do not provide the full tweet JSONs but provide a ".csv" file with the tweet IDs so that the tweets can be rehydrated. We also provide the annotations and the date of creation for each tweet for the reproduction of the results of our analyses.
Note: If for any reason, you are not able to rehydrate all the tweets, reach out to Shahan Ali Memon at (shahan@nyu.edu).
If you use this data, please cite our paper as follows:
"Shahan Ali Memon and Kathleen M. Carley. Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset, In Proceedings of The 5th International Workshop on Mining Actionable Insights from Social Networks (MAISoN 2020), co-located with CIKM, virtual event due to COVID-19, 2020."
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Disaster Tweets Dataset For Binary Classification
This dataset contains tweets classified as either disastrous (label 1) or not disastrous (label 0). It is designed to train and evaluate machine learning models for disaster-related tweet classification.
Files Included
train.csv: Contains 7,613 tweets with their respective labels. test.csv: Contains 3,263 tweets without labels.
Columns
Each CSV file contains the following columns:
id – Unique identifier for… See the full description on the dataset page: https://huggingface.co/datasets/mhamza-007/binary-class-tweets-dataset.
Data Collection
I streamed live tweets from Twitter after the WHO declared Covid-19 a pandemic. Since the Covid-19 epidemic has affected the entire world, I collected worldwide Covid-19-related English tweets at a rate of almost 10k per day in three phases: April-June 2020, August-October 2020, and April-June 2021. I prepared the first-phase dataset of about 235k tweets collected from 19th April to 20th June 2020. After one month, I started collecting tweets again, as at that time the pandemic was spreading with fatal intensity; I collected almost 320k tweets in the period 20 August to 20 October 2020 for the second-phase dataset. Finally, after six months, I collected almost 489k tweets in the period 26th April to 27th June 2021 for the third-phase dataset.
Content
The datasets I developed contain important information about most of the tweets and their attributes. The main attributes of these datasets are:
Tweet ID, Creation Date & Time, Source Link, Original Tweet, Favorite Count, Retweet Count, Original Author, Hashtags, User Mentions, and Place.
Finally, I collected 235,240, 320,316, and 489,269 tweets for the first, second, and third phase datasets, containing hash-tagged keywords such as #covid-19, #coronavirus, #covid, #covaccine, #lockdown, #homequarantine, #quarantinecenter, #socialdistancing, #stayhome, #staysafe, etc. Here I present an overview of the collected dataset.
Data Pre-Processing
I pre-processed the collected data by developing a user-defined pre-processing function based on NLTK (Natural Language Toolkit, a Python library for NLP). At the initial stage, it converts all the tweets into lowercase. Then it removes all extra white spaces, numbers, special characters, non-ASCII characters, URLs, punctuation, and stopwords from the tweets. It then converts all 'covid' words into 'covid19', as all numbers have already been removed from the tweets. Using stemming, the pre-processing function reduces inflected words to their word stems.
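A minimal sketch of such a pipeline (illustrative only: the stop-word list is a tiny placeholder for NLTK's full list, the stemming step is omitted, and this is not the author's exact function):

```python
import re

STOP_WORDS = {"the", "is", "a", "an", "of", "are"}  # placeholder; NLTK ships a full list

def preprocess(tweet):
    text = tweet.lower()                          # 1. lowercase
    text = re.sub(r"https?://\S+", " ", text)     # 2. remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)         # 3. numbers, punctuation, special chars
    text = re.sub(r"\s+", " ", text).strip()      # 4. collapse extra white space
    text = re.sub(r"\bcovid\b", "covid19", text)  # 5. normalize 'covid' to 'covid19'
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```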
Sentiment Analysis
I calculated the sentiment polarity of each cleaned and pre-processed tweet using the NLTK-based sentiment analyzer, obtaining sentiment scores for the positive, negative, and neutral categories and computing a compound sentiment score for each tweet. I classified the tweets into three classes, Positive, Negative, and Neutral, on the basis of the compound sentiment score. I then assigned the sentiment polarity rating for each tweet based on the following algorithm:
Algorithm Sentiment Classification of Tweets (compound, sentiment):
for each tweet in the dataset:
    if tweet[compound] < 0:
        tweet[sentiment] = 0.0  # assigned 0.0 for Negative Tweets
    elif tweet[compound] > 0:
        tweet[sentiment] = 1.0  # assigned 1.0 for Positive Tweets
    else:
        tweet[sentiment] = 0.5  # assigned 0.5 for Neutral Tweets
end

Acknowledgements
I wouldn't be here without the help of my project guide Dr. Anup Kumar Kolya, Assistant Professor, Dept. of Computer Science and Engineering, RCCIIT, whose kind and valuable suggestions and excellent guidance gave me the best opportunity in preparing these datasets.
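A runnable version of the mapping in the algorithm above (the compound score itself would come from NLTK's VADER SentimentIntensityAnalyzer, which is omitted here):

```python
def sentiment_rating(compound):
    """Map a compound sentiment score to the dataset's polarity rating."""
    if compound < 0:
        return 0.0  # Negative
    elif compound > 0:
        return 1.0  # Positive
    return 0.5      # Neutral
```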
These datasets are part of the publications entitled:
Chakraborty, A. K., Das, D., & Kolya, A. K. (2023). Sentiment Analysis on Large-Scale Covid-19 Tweets using Hybrid Convolutional LSTM Based on Naïve Bayes Sentiment Modeling. ECTI Transactions on Computer and Information Technology (ECTI-CIT), 17(3), 343–357. https://doi.org/10.37936/ecti-cit.2023173.252549
Chakraborty, A. K., & Das, S. (2023). A comparative study of a novel approach with baseline attributes leading to sentiment analysis of Covid-19 tweets. In Elsevier eBooks (pp. 179–208). https://doi.org/10.1016/b978-0-32-390535-0.00013-6
Chakraborty, A. K., Das, S., & Kolya, A. K. (2021). Sentiment analysis of COVID-19 tweets using Evolutionary Classification-Based LSTM model. In Advances in Intelligent Systems and Computing (pp. 75–86). https://doi.org/10.1007/978-981-16-1543-6_7
Original Data Source: Covid-19 Twitter Dataset
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains a set of domain sharing actions that occurred on Twitter during the month of June 2017. Each domain sharing action can be thought of as a triple (twitter user id, tweet id, domain). The tweets in the dataset were collected through Twitter's decahose API. Each user in the dataset was responsible for sharing at least one news article, and at least one article that can be labeled as misinformation.
The data is distributed in domain-shares.data in the following JSON format:
{
  "<twitter user id>": {
    "<tweet id>": ["<domain>", ...],
    ...
  },
  ...
}
For example:
{
"2274775459": {
"880214796779573249": ["palmerreport.com"],
"870801925054373888": ["reuters.com", "abcn.ws"],
"879899831808012292": ["mobile.nytimes.com"]
},
"2909434459": {
"879856813755256832": ["dallasnews.com"]
}
}
In addition, a TAB-separated version with (twitter user id, tweet id, domain) triples is also available.
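The nested JSON mapping can be flattened into the same (twitter user id, tweet id, domain) triples as the TAB-separated version. A minimal sketch, using an inline sample in place of the actual domain-shares.data file:

```python
import json

# Inline sample mirroring the example above; the real data lives in domain-shares.data
raw = """{
  "2274775459": {
    "880214796779573249": ["palmerreport.com"],
    "870801925054373888": ["reuters.com", "abcn.ws"]
  },
  "2909434459": {
    "879856813755256832": ["dallasnews.com"]
  }
}"""

def to_triples(data):
    """Flatten the user -> tweet -> domains mapping into (user, tweet, domain) triples."""
    return [(user, tweet, domain)
            for user, tweets in data.items()
            for tweet, domains in tweets.items()
            for domain in domains]

triples = to_triples(json.loads(raw))
```

Writing each triple as a TAB-joined line then yields the TAB-separated distribution of the same data.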
For further information on how the dataset was constructed and on analyses that have been conducted on it, please refer to the accompanying Github repository at https://github.com/dimitargnikolov/twitter-bias.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wiki-MID Dataset
Wiki-MID is a LOD-compliant multi-domain interest dataset for training and testing recommender systems. Our English dataset includes an average of 90 multi-domain preferences per user on music, books, movies, celebrities, sport, politics, and much more, for about half a million Twitter users traced during six months in 2017. Preferences are either extracted from messages of users who use Spotify, Goodreads, and other similar content sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to categorize preferred items by exploiting available semantic resources linked to Wikipedia, such as the Wikipedia Category Graph, DBpedia, BabelNet, and others.
Data model: Our resource is designed on top of the Semantically-Interlinked Online Communities (SIOC) core ontology. The SIOC ontology favors the inclusion of data mined from social network communities into the Linked Open Data (LOD) cloud. We represent Twitter users as instances of the SIOC UserAccount class. Topical users and message-based user interests are then associated, through the Simple Knowledge Organization System (SKOS) predicate relatedMatch, with a corresponding Wikipedia page as a result of our automated mapping methodology.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study seeks to develop a method for identifying the occurrences and proportions of researchers, media, and other professionals active in Twitter discussions. As a case example, an anonymised dataset from Twitter vaccine discussions is used. The study proposes a method of using keywords as strings within lists to identify classes from user biographies. This provides a way to apply multiple classification principles to a set of Twitter biographies using semantic rules through the Python programming language. The script used for the study is deposited here.
Method development for Twitter biography classification concerning occurrences of academics, academically related groups and individuals, media, other groups and members of the general public. Written in the Python programming language.
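The keyword-list approach described above can be sketched as follows; the class names and keyword lists here are illustrative, not the ones used in the deposited script:

```python
# Illustrative keyword lists; the deposited script defines its own, more extensive lists.
CLASSES = {
    "academic": ["professor", "phd", "researcher", "lecturer"],
    "media": ["journalist", "reporter", "editor", "news"],
}

def classify_bio(bio):
    """Return every class whose keyword list matches the biography text."""
    bio = bio.lower()
    labels = [name for name, keywords in CLASSES.items()
              if any(keyword in bio for keyword in keywords)]
    return labels or ["general public"]  # fall back when no keyword matches
```

A biography can match several classes at once, which reflects the study's goal of applying multiple classification principles to the same set of biographies.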
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was built to analyze the spread of misinformation about CoronaVac in Brazil by using data from Twitter for two specific events: the approval for emergency use in adults over 18 years old (January 17, 2021) and the approval for use in children aged 6 to 17 years (January 20, 2022).
We chose to label the original tweets with at least one retweet in the analyzed period. The manual labeling of such tweets was initially performed by two annotators with deep knowledge of the dataset and the considered context. In cases where there was no agreement between the two annotators, a third annotator was brought in to define the class of the tweet.
The final dataset contains 1,010 tweets from January 17, 2021, and 816 tweets from January 20, 2022.
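The tie-breaking protocol described above can be sketched as a small helper; the label names used here are illustrative:

```python
def resolve_label(annotator1, annotator2, annotator3=None):
    """Keep the label when the two primary annotators agree;
    otherwise the third annotator's label defines the class."""
    if annotator1 == annotator2:
        return annotator1
    return annotator3
```

With this rule, the third annotator is only consulted for the tweets on which the two primary annotators disagree.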
This dataset was originally built for a conference paper accepted at BraSNAM 2022. If you make use of the dataset, please also cite the following paper:
Gabriel P. Oliveira, Beatriz F. Paiva, Ana Paula Couto da Silva, and Mirella M. Moro. Characterizing the Diffusion of Misinformation Regarding the CoronaVac Vaccine in Brazil. In Proceedings of the XI Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2022), 2022.
@inproceedings{brasnam/OliveiraPSM22,
title = {Characterizing the Diffusion of Misinformation Regarding the CoronaVac Vaccine in Brazil},
author = {Gabriel P. Oliveira and
Beatriz F. Paiva and
Ana Paula Couto da Silva and
Mirella M. Moro},
booktitle = {Proceedings of the XI Brazilian Workshop on Social Network Analysis and Mining (BraSNAM)},
year = {2022}
}
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created as part of the Master's thesis titled "Multi-Class Depression Detection Through Tweets Using Artificial Intelligence." It contains tweets labeled for five types of depression (Bipolar, Major, Psychotic, Atypical, and Postpartum) using lexicons verified by psychiatrists.
Purpose: Designed for multi-class classification of depression using AI, focusing on Explainable AI for highlighting key words in the tweets influencing the predictions.
Applications: The dataset is suitable for research in natural language processing, sentiment analysis, mental health prediction, and Explainable AI.
This dataset is shared under the Creative Commons Attribution 4.0 International (CC BY) license, requiring proper attribution for any use or modification.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Preprints, 2022, DOI: 10.20944/preprints202206.0146.v1
Abstract
The COVID-19 Omicron variant, reported to be the most immune-evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations related to online learning, centered around information seeking and sharing. Mining such conversations (i.e., Tweets) to develop a dataset can serve as a data resource for interdisciplinary research on the interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset files comprise a raw version containing 52,868 Tweet IDs (corresponding to the same number of Tweets) and a cleaned and preprocessed version containing 46,208 unique Tweet IDs. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.
Data Description
The dataset comprises 7 .txt files. The raw version of this dataset comprises 6 .txt files (TweetIDs_Corona Virus.txt, TweetIDs_Corona.txt, TweetIDs_Coronavirus.txt, TweetIDs_Covid.txt, TweetIDs_Omicron.txt, and TweetIDs_SARS CoV2.txt) that contain Tweet IDs grouped together based on certain synonyms or terms that were used to refer to online learning and the Omicron variant of COVID-19 in the respective tweets. Table 1 shows the list of all the synonyms or terms that were used for the dataset development. The cleaned and preprocessed version of this dataset is provided in the .txt file - TweetIDs_Duplicates_Removed.txt. A description of these dataset files is provided in Table 2.
The dataset contains only Tweet IDs, in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated before use; for hydrating this dataset, the Hydrator application may be used.
Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development
Terminology | List of synonyms and terms
COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus
online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
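Based on Table 1, a tweet was included when it contained at least one COVID-19 term and at least one online learning term. A minimal sketch of that inclusion rule follows; the term lists are abbreviated, and the lowercase substring matching is an assumption about the actual collection pipeline:

```python
# Term lists from Table 1 (abbreviated; "covid" also covers COVID19 and COVID-19)
COVID_TERMS = ["omicron", "covid", "coronavirus", "coronaviruspandemic",
               "corona", "coronaoutbreak", "sars cov-2", "corona virus"]
LEARNING_TERMS = ["online education", "online learning", "remote education",
                  "remote learning", "e-learning", "elearning",
                  "distance learning", "virtual learning", "online class"]

def is_candidate(text):
    """True when the tweet text mentions both a COVID-19 term
    and an online learning term."""
    t = text.lower()
    return (any(term in t for term in COVID_TERMS)
            and any(term in t for term in LEARNING_TERMS))
```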
Table 2: Description of the dataset files along with the information about the number of Tweet IDs in each of them
Filename | No. of Tweet IDs | Description
TweetIDs_Corona Virus.txt | 321 | Tweet IDs of tweets that comprise the keyword "corona virus" and one or more keywords/terms that refer to online learning
TweetIDs_Corona.txt | 1819 | Tweet IDs of tweets that comprise the keyword "corona" or "coronaoutbreak" and one or more keywords/terms that refer to online learning
TweetIDs_Coronavirus.txt | 1429 | Tweet IDs of tweets that comprise the keyword "coronavirus" or "coronaviruspandemic" and one or more keywords/terms that refer to online learning
TweetIDs_Covid.txt | 41088 | Tweet IDs of tweets that comprise the keyword "COVID", "COVID19", or "COVID-19" and one or more keywords/terms that refer to online learning
TweetIDs_Omicron.txt | 8198 | Tweet IDs of tweets that comprise the keyword "omicron" or "omicron variant" and one or more keywords/terms that refer to online learning
TweetIDs_SARS CoV2.txt | 13 | Tweet IDs of tweets that comprise the keyword "SARS-CoV-2" and one or more keywords/terms that refer to online learning
TweetIDs_Duplicates_Removed.txt | 46208 | A collection of unique Tweet IDs from all six raw .txt files mentioned above after data preprocessing, data cleaning, and removal of duplicate tweets
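The relationship between the six raw files and TweetIDs_Duplicates_Removed.txt can be sketched as a simple merge-and-deduplicate step; file handling is omitted, and the toy lists below stand in for the per-keyword ID files:

```python
def deduplicate(id_lists):
    """Merge per-keyword Tweet ID lists, keeping the first occurrence of each ID."""
    seen, unique = set(), []
    for ids in id_lists:
        for tweet_id in ids:
            if tweet_id not in seen:
                seen.add(tweet_id)
                unique.append(tweet_id)
    return unique

# Toy example; the real input is the six raw TweetIDs_*.txt files
merged = deduplicate([["111", "222"], ["222", "333"], ["111", "444"]])
```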