Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
Data Instances
[More Information Needed]
Data Fields
[More Information Needed]
Data Splits
[More Information Needed]
Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset.
http://opendatacommons.org/licenses/dbcl/1.0/
This comprehensive dataset is a meticulously curated collection of mental health statuses tagged from various statements. The dataset amalgamates raw data from multiple sources, cleaned and compiled to create a robust resource for developing chatbots and performing sentiment analysis.
The dataset integrates information from the following Kaggle datasets:
The dataset consists of statements tagged with one of the following seven mental health statuses:
- Normal
- Depression
- Suicidal
- Anxiety
- Stress
- Bi-Polar
- Personality Disorder
The data is sourced from diverse platforms including social media posts, Reddit posts, Twitter posts, and more. Each entry is tagged with a specific mental health status, making it an invaluable asset for:
This dataset is ideal for training machine learning models aimed at understanding and predicting mental health conditions based on textual data. It can be used in various applications such as:
This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a collection of tweets designed for multilingual sentiment analysis tasks. It consists of three primary columns: "tweet", "language", and "sentiment". The dataset's goal is to facilitate research and development in natural language processing and multi-task learning applications.
The "tweet" column contains the scraped tweet text. The tweets cover a wide range of topics, opinions, and sentiments expressed by users across social media platforms, and are provided in their original text format.
The "language" column specifies the language in which each tweet is written. The dataset is curated to include tweets from multiple languages.
The "sentiment" column contains a sentiment rating for each tweet on a scale of 1 to 5 stars. 1 star represents strongly negative sentiment, whereas 5 stars represent strongly positive sentiment. The intermediate values of 2, 3, and 4 stars represent negative, neutral, and positive sentiment, respectively.
Researchers and developers can leverage this dataset for a wide range of NLP and MTL tasks where they can jointly try to predict language and sentiment given a tweet.
The dataset is well structured and formatted in CSV file format.
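The star scale described above can be written down as a small lookup. The following is a minimal sketch, assuming the ratings are stored as integers from 1 to 5; the function names and the three-class folding are hypothetical and not part of the dataset.

```python
# Mapping from the 1-5 star rating to a sentiment name, following the
# column description (1 = strongly negative ... 5 = strongly positive).
STAR_TO_SENTIMENT = {
    1: "strongly negative",
    2: "negative",
    3: "neutral",
    4: "positive",
    5: "strongly positive",
}


def star_to_sentiment(stars: int) -> str:
    """Return the sentiment name for a 1-5 star rating."""
    if stars not in STAR_TO_SENTIMENT:
        raise ValueError(f"expected a rating from 1 to 5, got {stars!r}")
    return STAR_TO_SENTIMENT[stars]


def collapse_to_three_classes(stars: int) -> str:
    """Hypothetical coarser view: fold the five ratings into three classes."""
    name = star_to_sentiment(stars)
    if name.endswith("negative"):
        return "negative"
    if name.endswith("positive"):
        return "positive"
    return "neutral"
```

Folding to three classes is only one possible design choice; it can be handy when comparing against the three-label datasets listed elsewhere on this page.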
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset
This dataset contains positive, negative, and notr (neutral) sentences from several data sources given in the references. Most sentiment models have only two labels: positive and negative. However, user input can be a completely neutral sentence, and I could find no data for such cases. Therefore I created this dataset with 3 classes. Positive and negative sentences are listed below. Notr examples are extracted from the Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
model-generated predictions
Model Card for Sentiment Analysis on Financial News
Overview
This dataset contains sentiments for financial news headlines from the perspective of a retail investor. The data is derived from the research by Malo et al. (2014), which focuses on detecting semantic orientations in economic texts.
Dataset Details
Source: Malo, P., Sinha, A., Takala, P., Korhonen, P., and Wallenius, J. (2014). “Good debt or bad debt: Detecting semantic orientations in economic… See the full description on the dataset page: https://huggingface.co/datasets/mltrev23/financial-sentiment-analysis.
https://creativecommons.org/publicdomain/zero/1.0/
Preprocessed Amazon product review data for the Gen3EcoDot, scraped entirely from amazon.in. The text is stemmed and lemmatized using nltk, and sentiment labels are generated using TextBlob polarity scores.
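A polarity score can be turned into a label with a simple cutoff. The sketch below assumes the score is a float in [-1.0, 1.0], which is the range TextBlob reports for polarity; the cutoff value, the label names, and the function name are assumptions, since the description does not say how the labels were derived from the scores.

```python
def label_from_polarity(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The threshold is a hypothetical dead zone around zero: scores whose
    absolute value does not exceed it are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must lie in [-1.0, 1.0], got {polarity!r}")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# With TextBlob installed, the score for a review would typically come from
# TextBlob(review_text).sentiment.polarity (not executed here).
```

For instance, `label_from_polarity(0.05, threshold=0.1)` falls inside the dead zone and is labeled neutral under these assumptions.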
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our Multimodal Sentiment Dataset, featuring 100 diverse classes of images and corresponding texts with sentiment labels. Ideal for AI-driven sentiment analysis, image classification, and multimodal fusion tasks.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A Sentiment Analysis Dataset for Finetuning Large Models in Chat-style
More details can be found at https://github.com/l294265421/chat-sentiment-analysis
Supported Tasks
Aspect Term Extraction (ATE)
Opinion Term Extraction (OTE)
Aspect Term-Opinion Term Pair Extraction (AOPE)
Aspect term, Sentiment, Opinion term Triplet Extraction (ASOTE)
Aspect Category Detection (ACD)
Aspect Category-Sentiment Pair Extraction (ACSA)
Aspect-Category-Opinion-Sentiment (ACOS) Quadruple… See the full description on the dataset page: https://huggingface.co/datasets/yuncongli/chat-sentiment-analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiment classes: negative, neutral, and positive. It contains two fields: the tweet and its label.
letijo03/sentiment-analysis-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by mansh_anand
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Humans spend most of their time in settlements, and the built environment of settlements may affect the residents’ sentiments. Research in this field is interdisciplinary, integrating urban planning and public health. However, it has been limited by the difficulty of quantifying subjective sentiments and the small sample size. This study uses 147,613 Weibo text check-ins in Xiamen from 2017 to quantify residents' sentiments in 1,096 neighborhoods in the city. A multilevel regression model and gradient boosting decision tree (GBDT) model are used to investigate the multilevel and nonlinear effects of the built environment of neighborhoods and subdistricts on residents' sentiments. The results show the following: 1) The multilevel regression model indicates that at the neighborhood level, a high land value, low plot ratio, low population density, more security facilities, and neighborhoods close to water are more likely to improve the residents’ sentiments. At the subdistrict level, more green space and commercial land, less industry, higher building density and road density, and a smaller migrant population are more likely to promote positive sentiments. Approximately 19% of the total variance in the sentiments occurred among subdistricts. 2) The number of security facilities, the proportion of green space and commercial land, and the density of buildings and roads are linearly correlated with residents' sentiments. The land value and the number of security facilities are basic needs and exhibit nonlinear correlations with sentiments. The plot ratio, population density, and the proportions of industrial land and the migrant population are advanced needs and are nonlinearly correlated with sentiments. The quantitative analysis of sentiments enables setting a threshold of the influence of the built environment on residents' sentiments in neighborhoods and surrounding areas. 
Our results provide data support for urban planning and implementing targeted measures to improve the living environment of residents.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Understand the mood of social media with sentiment analysis. Monitor brand mentions, analyze feedback, and tailor strategies.
Problem Statement
A global consumer goods company struggled to understand customer sentiment across various social media platforms. With millions of posts, reviews, and comments generated daily, manually tracking and analyzing public opinion was inefficient. The company needed an automated solution to monitor brand perception, address negative feedback promptly, and leverage insights for marketing strategies.
Challenge
Analyzing social media sentiment posed the following challenges:
Processing vast amounts of unstructured text data from multiple platforms like Twitter, Facebook, and Instagram.
Accurately interpreting slang, emojis, and nuanced language used by social media users.
Identifying trends and actionable insights in real-time to respond to potential crises or opportunities effectively.
Solution Provided
An advanced sentiment analysis system was developed using Natural Language Processing (NLP) and sentiment analysis algorithms. The solution was designed to:
Classify social media posts into positive, negative, and neutral sentiments.
Extract key topics and trends related to the brand and its products.
Provide real-time dashboards for monitoring customer sentiment and identifying areas of improvement.
Development Steps
Data Collection
Aggregated data from major social media platforms using APIs, focusing on brand mentions, hashtags, and product keywords.
Preprocessing
Cleaned and normalized text data, including handling slang, emojis, and misspellings, to prepare it for analysis.
Model Training
Trained NLP models for sentiment classification using supervised learning. Implemented topic modeling algorithms to identify recurring themes and discussions.
Validation
Tested the sentiment analysis models on labeled datasets to ensure high accuracy and relevance in classifying social media posts.
Deployment
Integrated the sentiment analysis system with a real-time analytics dashboard, enabling the marketing and customer support teams to track trends and respond proactively.
Monitoring & Improvement
Established a continuous feedback mechanism to refine models based on evolving language patterns and new social media trends.
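The validation step above, testing a classifier against labeled posts, can be sketched as follows. This is a minimal illustration and not the company's actual tooling: the function name, the choice of accuracy and per-class recall as metrics, and the sample labels are all assumptions.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return (accuracy, per-class recall) for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot evaluate on an empty label list")
    # Count how often each gold label occurs and how often it is predicted correctly.
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    accuracy = sum(hits.values()) / len(y_true)
    recall = {label: hits[label] / support[label] for label in support}
    return accuracy, recall


# Hypothetical gold labels and model predictions for four posts.
gold = ["positive", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "positive", "positive"]
accuracy, recall = evaluate(gold, predicted)
```

Per-class recall is reported alongside accuracy because a model can score well overall while missing most posts of a rare class, such as neutral in this toy sample.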
Results
Gained Actionable Insights
The system provided detailed insights into customer opinions, helping the company identify strengths and areas for improvement.
Improved Brand Reputation Management
Real-time monitoring enabled swift responses to negative feedback, mitigating potential reputation risks.
Informed Marketing Strategies
Insights from sentiment analysis guided targeted marketing campaigns, resulting in higher engagement and ROI.
Enhanced Customer Relationships
Proactive engagement with customers based on sentiment analysis improved customer satisfaction and loyalty.
Scalable Monitoring Solution
The system scaled efficiently to analyze data across multiple languages and platforms, broadening the company’s reach and understanding.
Title: Text-Analysis Dataset with Stopwords, Positive Words, and Negative Words
Description: This dataset is designed for text analysis tasks and contains three types of words: stopwords, positive words, and negative words. Stopwords are common words that are typically removed from text during preprocessing because they don't carry much meaning, such as "the," "and," "a," etc. Positive words are words that convey a positive sentiment, while negative words are words that convey a negative sentiment.
The stopwords were obtained from a standard list used in natural language processing, while the positive and negative words were obtained from publicly available sentiment lexicons.
Each word is provided as a separate entry in the dataset.
The dataset is provided in CSV format and is suitable for use in various text analysis tasks, such as sentiment analysis, text classification, and natural language processing.
Columns: All the CSVs contain a single column holding the specified set of words.
Example (positive-words.txt): a+, abound, abounds, abundance, abundant, accessable, accessible, acclaim, acclaimed, acclamation, accolade, accolades, accommodative, and so on.
This dataset can be used to build models that can automatically classify text as positive or negative, or to identify which words are likely to carry more meaning in a given text.
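A lexicon-based classifier of the kind described above can be sketched in a few lines. The word sets below are tiny hypothetical stand-ins (the positive words are taken from the example listing, the negative words and stopwords are invented for illustration); in practice they would be loaded from the dataset's files. Returning "neutral" on a tie is also an assumption.

```python
import re

# Hypothetical miniature word lists standing in for the dataset's files.
STOPWORDS = {"the", "and", "a", "is", "was"}
POSITIVE = {"abundant", "acclaimed", "accessible"}
NEGATIVE = {"awful", "broken", "poor"}


def lexicon_sentiment(text, positive=POSITIVE, negative=NEGATIVE, stopwords=STOPWORDS):
    """Classify text by counting positive and negative lexicon hits."""
    # Lowercase, split into word-like tokens, and drop stopwords.
    tokens = [t for t in re.findall(r"[a-z+']+", text.lower()) if t not in stopwords]
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # tie or no lexicon words found
```

This counting approach ignores negation and word order, so it is best read as a baseline rather than a finished model.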
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There is an increasing demand for sentiment analysis of text from social media, which is mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels of the text. However, very few resources are available for code-mixed data to create models specific for this data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Only a few datasets for popular languages such as English-Spanish, English-Hindi, and English-Chinese are available. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators. This gold standard corpus obtained a Krippendorff’s alpha above 0.8 for the dataset. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.
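For readers unfamiliar with the agreement measure mentioned above, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is not the authors' code; the function name and the toy annotations are hypothetical, and each inner list is assumed to hold the labels that different annotators gave to one item.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists, one inner list of labels per annotated item
    (missing annotations are simply left out).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single label carries no agreement information
        # Every ordered pair of labels from different annotators counts 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n * (n - 1)).
    return 1.0 - (n - 1) * observed / expected


# Toy example: two annotators, three items, one disagreement.
alpha = krippendorff_alpha_nominal([["a", "a"], ["a", "b"], ["b", "b"]])
```

In this toy case alpha works out to 4/9, well below the 0.8 reported for the corpus; perfect agreement across at least two label values gives 1.0.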
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:
0: Negative
1: Neutral
2: Positive
The data has been gathered from multiple websites:
Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed (lowercased and cleaned of punctuation, URLs, numbers, and stopwords) and is ready for NLP pipelines.
📊 Columns
Comment: User-generated text content
Sentiment: Sentiment label (0=Negative, 1=Neutral, 2=Positive)
🚀 Use Cases
🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa
🔍 Evaluate preprocessing and tokenization strategies
📈 Benchmark NLP models on multi-class classification tasks
🎓 Educational projects and research in opinion mining or text classification
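The preprocessing this description mentions (lowercasing and removal of punctuation, URLs, numbers, and stopwords) can be approximated as below. This is a sketch under assumptions, not the dataset author's pipeline: the stopword list is a tiny hypothetical stand-in, and the order of the steps and the regular expressions are guesses.

```python
import re
import string

# Hypothetical miniature stopword list; the list actually used is not stated.
STOPWORDS = {"the", "a", "an", "is", "and", "to", "of"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Label encoding as given in the dataset description.
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}


def clean_comment(text, stopwords=STOPWORDS):
    """Lowercase a comment and strip URLs, numbers, punctuation, and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove URLs before punctuation is stripped
    text = NUMBER_RE.sub(" ", text)    # remove digit runs
    text = text.translate(PUNCT_TABLE)  # remove ASCII punctuation
    return " ".join(tok for tok in text.split() if tok not in stopwords)
```

URLs are removed first on purpose: stripping punctuation beforehand would break them into fragments that the URL pattern no longer matches.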
Original Data Source: Sentiment Analysis Dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
because of COVID-19
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is automatically generated by webscraping from sites such as Tripadvisor or Google Maps reviews. In these sites, the users post comments with ratings, allowing us to have tagged data. The code that generated this dataset can be found at the following URL: