License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
About this Dataset
This dataset is designed for sentiment analysis tasks, specifically to classify text comments as positive or negative. It's a supervised dataset, meaning each comment is already labeled with its corresponding sentiment.
Key Features:
Two Columns:
- Text: Contains the raw text of the comments.
- Tag: Indicates the sentiment of the comment, labeled as either "positive" or "negative".
Supervised Learning: Ideal for training and evaluating machine learning models for sentiment classification.
Potential Applications:
- Sentiment Analysis: Build models to automatically analyze emotions and opinions in various text data.
- Social Media Analysis: Understand public sentiment towards brands, products, or topics on social media platforms.
- Customer Feedback Analysis: Gauge customer satisfaction and identify areas for improvement based on reviews and feedback.
- Text Classification: Develop text categorization systems for diverse applications.
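A minimal sketch of loading a file in this two-column layout with Python's standard csv module and encoding the Tag column as integers. The sample rows and the encoding (negative = 0, positive = 1) are assumptions for illustration; the dataset itself stores the string tags, and its file name is not given here.

```python
import csv
import io

# Hypothetical sample rows in the Text/Tag layout described above.
SAMPLE_CSV = """Text,Tag
"Great product, works well",positive
"Arrived broken",negative
"""

# Assumed integer encoding for model training.
TAG_TO_ID = {"negative": 0, "positive": 1}


def load_labeled_comments(stream):
    """Return (texts, labels) from a CSV stream with Text and Tag columns."""
    texts, labels = [], []
    for row in csv.DictReader(stream):
        tag = row["Tag"].strip().lower()
        if tag not in TAG_TO_ID:
            raise ValueError(f"unexpected tag: {tag!r}")
        texts.append(row["Text"])
        labels.append(TAG_TO_ID[tag])
    return texts, labels


texts, labels = load_labeled_comments(io.StringIO(SAMPLE_CSV))
print(labels)  # [1, 0]
```

Raising on an unknown tag, rather than silently skipping the row, makes stray labels visible before training.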
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This is a large-scale Bangla dataset of positive, negative, and neutral comments. It has four features: Platform (where the comment was collected), Comment, Sentiment, and Label.
I collected the Bangla comments from Twitter, YouTube, and Google. The comments are positive, negative, or neutral. The Sentiment column classifies each comment as toxic, neutral, sad, funny, or happy, and these classes are labeled 0, 1, 2, 3, and 4, respectively, in the Label column.
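Because the Sentiment and Label columns carry the same information in two forms, a quick consistency check can catch mislabeled rows. This sketch assumes the labels follow the order listed above (0 = toxic through 4 = happy) and that the Sentiment column stores the class name as text; neither detail is confirmed beyond the description.

```python
# Assumed mapping, following the order given in the description.
SENTIMENT_BY_LABEL = {0: "toxic", 1: "neutral", 2: "sad", 3: "funny", 4: "happy"}
LABEL_BY_SENTIMENT = {name: label for label, name in SENTIMENT_BY_LABEL.items()}


def check_row(sentiment: str, label: int) -> bool:
    """Return True if a row's Sentiment text and Label integer agree under the assumed mapping."""
    return LABEL_BY_SENTIMENT.get(sentiment.strip().lower()) == label


print(check_row("Happy", 4))  # True
print(check_row("sad", 1))    # False
```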
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Bangla Product Comments Dataset is a comprehensive collection of product reviews gathered from diverse ecommerce platforms in Bangladesh. This dataset offers a rich source of information reflecting customer opinions and sentiments towards various products available online. This dataset holds significant value for businesses, researchers, and data scientists interested in understanding consumer behavior, product perception, and sentiment analysis within the Bangladeshi ecommerce landscape. By leveraging this dataset, stakeholders can derive actionable insights to enhance product quality, marketing strategies, and overall customer satisfaction.
Columns:
License: CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains 44,236 Bengali sentences with corresponding sentiment labels, synthetically generated using ChatGPT. The dataset is designed for Bengali natural language processing tasks, particularly sentiment analysis.
Each entry contains:
- text: Bengali sentence/phrase
- label: Sentiment label (integer)
  0: Negative sentiment
  1: Neutral sentiment
  2: Positive sentiment
Example:
[
{
"text": "আজকের দিনটা একদম ভালো যায়নি।",
"label": 0
},
{
"text": "বাসা থেকে বের হতে দেরি হয়ে গেল।",
"label": 0
},
{
"text": "আমার খুব ভাল লাগছে।",
"label": 2
}
]
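The example entries above are plain JSON, so the standard json module can read them directly. This sketch assumes the full dataset uses the same array-of-objects layout as the example; it decodes the integer labels with the 0/1/2 mapping given above and counts entries per class.

```python
import json
from collections import Counter

# The three example entries shown above.
RAW = """[
  {"text": "আজকের দিনটা একদম ভালো যায়নি।", "label": 0},
  {"text": "বাসা থেকে বের হতে দেরি হয়ে গেল।", "label": 0},
  {"text": "আমার খুব ভাল লাগছে।", "label": 2}
]"""

# Label mapping from the dataset description.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

entries = json.loads(RAW)
counts = Counter(LABEL_NAMES[entry["label"]] for entry in entries)
print(dict(counts))  # {'negative': 2, 'positive': 1}
```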
Run Code:
https://www.kaggle.com/code/piketar/bengali-sentiment-analysis
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sentiment analysis is becoming increasingly significant for the Bangla language. Aspect-based sentiment analysis (ABSA) predicts sentiment polarity at the aspect level. The data were collected from numerous individuals, and each comment covers at least two aspects. Every comment is a complex or compound sentence. The datasets are organized in a folder named "BANGLA_ABSA dataset", which contains four Excel files, one per dataset: Car_ABSA, Mobile_phone_ABSA, Movie_ABSA, and Restaurant_ABSA. Each Excel file contains three columns: Id, Comment, and {Aspect category, Sentiment Polarity}. The Car_ABSA, Mobile_phone_ABSA, Movie_ABSA, and Restaurant_ABSA datasets have 1,149, 975, 800, and 801 rows of data, respectively.
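The description does not show the exact cell format of the {Aspect category, Sentiment Polarity} column. Assuming each pair is written as "{aspect, polarity}" and that a comment with several aspects holds several such pairs in one cell, a small regular-expression parser might look like the following; the cell format is hypothetical and should be checked against the real Excel files first.

```python
import re

# Hypothetical cell format: one or more "{aspect, polarity}" pairs.
PAIR_RE = re.compile(r"\{\s*([^,{}]+?)\s*,\s*([^,{}]+?)\s*\}")


def parse_aspect_cell(cell: str) -> list[tuple[str, str]]:
    """Extract (aspect_category, polarity) pairs from one cell, lowercasing the polarity."""
    return [(aspect, polarity.lower()) for aspect, polarity in PAIR_RE.findall(cell)]


pairs = parse_aspect_cell("{Food, Positive}, {Service, Negative}")
print(pairs)  # [('Food', 'positive'), ('Service', 'negative')]
```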
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises 3,290 Bengali political comments sourced from social media platforms, news comment sections, and online political discussions, curated for sentiment analysis research in Bengali NLP. It provides a resource for training and evaluating sentiment classification models in the political domain.
The 3,290 instances are distributed across five sentiment classes with close balance (class sizes vary by less than 8%): Very Negative (675, 20.5%), Negative (663, 20.2%), Neutral (626, 19.0%), Very Positive (664, 20.2%), and Positive (662, 20.1%). The data is stored in Excel format with two columns: Bengali political comments (Unicode text) and their sentiment labels. There are no missing values, and the annotations have been verified. Comments average 83 characters in length, ranging from 11 to 398 characters.
The collection covers diverse political discourse, including government policies and governance, electoral processes and democracy, political parties and leadership dynamics, social and economic issues, current affairs and political events, and public opinion and citizen responses to political developments.
The dataset supports research in Bengali sentiment model development and benchmarking, political discourse analysis and opinion mining, NLP for low-resource languages, cross-lingual sentiment analysis, social media analytics for Bengali content, multi-class text classification, and comparative political sentiment studies across linguistic and cultural contexts.
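The class counts above can be checked with a few lines of arithmetic. Reading the stated under-8% balance figure as the relative gap between the largest class (675) and the smallest (626) is an assumption; under that reading the gap is (675 - 626) / 626, about 7.8%.

```python
# Class counts from the description above.
COUNTS = {
    "Very Negative": 675,
    "Negative": 663,
    "Neutral": 626,
    "Positive": 662,
    "Very Positive": 664,
}

total = sum(COUNTS.values())
# Percentage share of each class, rounded as in the description.
shares = {name: round(100 * n / total, 1) for name, n in COUNTS.items()}
# Relative gap between the largest and smallest class.
spread = (max(COUNTS.values()) - min(COUNTS.values())) / min(COUNTS.values())

print(total)            # 3290
print(shares)
print(f"{spread:.1%}")  # 7.8%
```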
This is a dataset for sentiment analysis of Bangla news comments. Each sample was annotated by three different individuals to capture three perspectives, and the final tag was chosen by majority decision. The dataset contains 13,802 samples in total.
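With three annotators, a tag chosen by at least two of them forms a strict majority. The description does not say how rows were handled when all three annotators disagreed, so this sketch simply returns None in that case; that choice is an assumption.

```python
from collections import Counter
from typing import Optional


def majority_tag(tags: list[str]) -> Optional[str]:
    """Return the tag chosen by a strict majority of annotators, or None if there is none."""
    if not tags:
        return None
    tag, votes = Counter(tags).most_common(1)[0]
    return tag if votes > len(tags) / 2 else None


print(majority_tag(["positive", "positive", "negative"]))  # positive
print(majority_tag(["positive", "neutral", "negative"]))   # None
```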
License: GPL 2 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
This dataset was created by Sushmit
Released under GPL 2
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains around 1,300 positive and negative Bengali (Bangla) sentiment words. The lexicon was created from a Bengali review corpus.
If you use this lexicon, please cite the following paper:
@inproceedings{sazzed2020development, title={Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources}, author={Sazzed, Salim}, booktitle={2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)}, pages={237--244}, year={2020}, organization={IEEE Computer Society} }
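A sentiment lexicon is typically used for simple count-based classification: tally positive and negative word hits and label the text by the sign of the difference. The four words below are a hypothetical mini-lexicon for illustration only, not entries taken from the published lexicon, and whitespace tokenization with punctuation stripping is a simplifying assumption.

```python
# Hypothetical mini-lexicon; NOT taken from the published lexicon.
POSITIVE = {"ভালো", "সুন্দর"}
NEGATIVE = {"খারাপ", "বিরক্তিকর"}

# Punctuation to strip from token edges, including the Bengali danda.
PUNCTUATION = "।,!?.;:"


def lexicon_score(text: str) -> int:
    """Count positive minus negative lexicon hits over whitespace-separated tokens."""
    tokens = [token.strip(PUNCTUATION) for token in text.split()]
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def lexicon_label(text: str) -> str:
    """Map the score's sign to a label; a score of zero is treated as neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("খুব ভালো নাটক।"))  # positive
```

Exact-match lookup misses inflected or variant spellings, so a real pipeline would usually add normalization or stemming.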
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
• Bangla, a language spoken by more than 230 million people worldwide, is significantly underrepresented in speech and sentiment analysis research compared to high-resource languages.
• This dataset addresses that gap. Researchers and developers working on low-resource language technologies, such as sentiment analysis, speech recognition, and multimodal learning frameworks, should find it a helpful resource.
• Applications that can be developed and evaluated with this dataset include sentiment-aware speech recognition, speech-based emotion detection, emotionally expressive text-to-speech systems, multimodal sentiment classification, and speaker-independent recognition models.
• Its modular structure supports continued research expansion by letting contributors add new regional vocabularies, dialectal variations, or additional sentiment classes over time.
• The dataset is precisely balanced, with 4,000 audio recordings by four native speakers (two male and two female) and 500 samples for each sentiment category. The sentences reflect natural, everyday use of Bangla across a wide range of topics, including events, emotions, personal experiences, and general statements.
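For the speaker-independent models mentioned above, a common evaluation scheme is leave-one-speaker-out: each fold holds out every recording from one speaker. This sketch assumes hypothetical metadata records with a "speaker" field and, for the demo only, an even 1,000 clips per speaker; the description does not state how recordings are divided among speakers.

```python
from collections import defaultdict


def leave_one_speaker_out(records):
    """Yield (held_out_speaker, train, test) so no speaker appears in both train and test.

    Each record is assumed to be a dict with a "speaker" key (hypothetical metadata).
    """
    by_speaker = defaultdict(list)
    for rec in records:
        by_speaker[rec["speaker"]].append(rec)
    for held_out in sorted(by_speaker):
        test = by_speaker[held_out]
        train = [r for spk, recs in by_speaker.items() if spk != held_out for r in recs]
        yield held_out, train, test


# Synthetic metadata: 4 speakers x 1,000 clips (even split assumed for the demo).
records = [{"speaker": f"spk{i % 4}", "clip": i} for i in range(4000)]
for speaker, train, test in leave_one_speaker_out(records):
    print(speaker, len(train), len(test))  # spk0 3000 1000, then spk1, spk2, spk3
```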
License: CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset was created by Rhs Liza
Released under CC0: Public Domain
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset presents the news items published on a well-known Bengali YouTube news channel, along with the public comments, replies, and other related information. There are 762,678 samples with 15 features. The features include the video URL, news title, video likes, video views, publishing date, hashtags, video description, comments with their likes, and replies with their likes. To protect the commenters' privacy, their names have been encoded.
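The description does not say how commenter names were encoded. One common approach, shown here as an assumption rather than the dataset's actual method, is a keyed hash (HMAC-SHA256): the same commenter always maps to the same pseudonym, so comment and reply threads stay linkable, but names cannot be recovered without the secret key. The key and the "user_" prefix are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; it would be kept private and never released with the data.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(name: str, key: bytes = SECRET_KEY) -> str:
    """Map a commenter name to a stable pseudonym using a truncated HMAC-SHA256 digest."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


print(pseudonymize("Alice") == pseudonymize("Alice"))  # True
```

Keying the hash matters: an unkeyed hash of a name can be reversed by hashing a list of candidate names.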
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the technological advancement of the internet, Bangladeshi users are increasingly active on social networks, and social media influencers, including Bangladeshi food review influencers, are attracting a growing number of users. This study analyzes social network interaction data on food reviews, using a state-of-the-art Bengali sequence classification model to classify comments collected from social media. Our findings indicate that most viewers hold a positive perception of Bengali food reviews on social media, while a small number of outliers express contrasting opinions. Our classifier, BanglaBERT, achieves a prediction accuracy of 83.76%.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository contains 3,307 negative and 8,500 positive reviews collected from YouTube Bengali dramas and manually annotated. If you use this dataset, please cite the following paper:
@inproceedings{sazzed2020cross, title={Cross-lingual sentiment classification in low-resource Bengali language}, author={Sazzed, Salim}, booktitle={Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)}, pages={50--60}, year={2020} }
If you have any questions, please email me at salimsazzad222@gmail.com.
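With roughly 2.6 positive reviews for every negative one, a classifier trained on this data may drift toward "positive". A common mitigation is to weight each class by total / (num_classes * class_count), the same heuristic scikit-learn uses for class_weight="balanced". This is a sketch of that general technique using the counts above, not something the dataset author prescribes.

```python
# Review counts from the description above.
COUNTS = {"negative": 3307, "positive": 8500}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by total / (num_classes * class_count), so rarer classes weigh more."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = inverse_frequency_weights(COUNTS)
print({label: round(w, 3) for label, w in weights.items()})  # {'negative': 1.785, 'positive': 0.695}
```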
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Critical studies have found that NLP systems exhibit bias based on gender and racial identities. However, few studies have focused on identities defined by cultural factors such as religion and nationality. Compared to English, such research is even more limited in major languages like Bengali because labeled datasets are unavailable. Our paper (see the reference) describes a process for developing a bias evaluation dataset that highlights cultural influences on identity. We also provide this Bengali dataset as an artifact that can contribute to future critical research.
If you find this dataset useful, please cite the associated paper:
Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).
BibTeX:
@inproceedings{das-etal-2023-toward, title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity", author = "Das, Dipto and Guha, Shion and Semaan, Bryan", booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)", month = may, year = "2023", address = "Dubrovnik, Croatia", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.c3nlp-1.8", pages = "68--83", }
This dataset was created by Nuhash Afnan
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a manually annotated Bangla emotion corpus that captures the diversity of fine-grained emotion expressions in social-media text. We used fine-grained emotion labels, namely Sadness, Happiness, Disgust, Surprise, Fear, and Anger, which according to Paul Ekman (1999) are the six basic emotion categories. For this task, we collected a large amount of raw text from users' comments in two Facebook groups (Ekattor TV and Airport Magistrates) and on the public posts of the popular blogger and activist Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and to the economic successes and failures of Bangladesh. We scraped a total of 32,923 comments from these three sources, of which 6,314 were annotated into the six categories. The distribution of the annotated corpus is as follows:
sad = 1341
happy = 1908
disgust = 703
surprise = 562
fear = 384
angry = 1416
We have also provided a balanced set derived from the data above, split into training and test sets at a 5:1 ratio for training and evaluation. More information on the dataset and the experiments on it can be found in our paper (related links below).
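The description does not say how the balanced set was built. One plausible reading, assumed here, is to undersample every emotion to the size of the smallest class (fear, 384) and then split each class 5:1, which gives 320 training and 64 test comments per emotion. The placeholder strings stand in for the annotated comments.

```python
import random

# Annotated counts per emotion, from the distribution above.
COUNTS = {"sad": 1341, "happy": 1908, "disgust": 703, "surprise": 562, "fear": 384, "angry": 1416}


def balanced_split(items_by_label, train_parts=5, test_parts=1, seed=0):
    """Undersample each class to the smallest class size, then split each class train:test."""
    rng = random.Random(seed)
    per_class = min(len(items) for items in items_by_label.values())
    n_train = per_class * train_parts // (train_parts + test_parts)
    train, test = [], []
    for label, items in items_by_label.items():
        sample = rng.sample(items, per_class)
        train += [(x, label) for x in sample[:n_train]]
        test += [(x, label) for x in sample[n_train:]]
    return train, test


# Placeholder items standing in for the annotated comments.
data = {label: [f"{label}_{i}" for i in range(n)] for label, n in COUNTS.items()}
train, test = balanced_split(data)
print(len(train), len(test))  # 1920 384
```

Splitting inside each class keeps the class proportions identical in the training and test sets.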
This is a dataset for Bengali sentiment analysis, merged from publicly available sentiment datasets. The sources I used to build the merged Bangla sentiment dataset are:
1) https://www.kaggle.com/datasets/cryptexcode/sentnob-sentiment-analysis-in-noisy-bangla-texts
2) https://github.com/atik-05/Bangla_ABSA_Datasets/tree/master
3) https://data.mendeley.com/datasets/n53xt69gnf/3
4) https://github.com/shakkhor/Academic-Thesis/blob/master/450/comments.csv
5) https://github.com/mohsinulkabir14/BanglaBook/tree/main/data/csv
After merging, I applied some cleaning and preprocessing. The dataset has two columns, "Data" and "Label", and three sentiment labels:
1) Neutral: 0
2) Positive: 1
3) Negative: 2
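Merging several sources mainly means translating each source's label scheme into the unified one (Neutral 0, Positive 1, Negative 2) and removing duplicates. The per-source label spellings below are hypothetical, and whitespace collapsing plus exact-duplicate removal is only an assumed example of "cleaning"; the actual preprocessing steps are not described.

```python
# Unified scheme used by the merged dataset.
UNIFIED = {"neutral": 0, "positive": 1, "negative": 2}

# Hypothetical per-source label spellings; the real sources may differ.
SOURCE_LABEL_MAPS = {
    "source_a": {0: "neutral", 1: "positive", 2: "negative"},
    "source_b": {"pos": "positive", "neg": "negative", "neu": "neutral"},
}


def merge_sources(rows_by_source):
    """Convert (text, source_label) rows to {"Data", "Label"} rows, dropping duplicate texts."""
    merged, seen = [], set()
    for source, rows in rows_by_source.items():
        label_map = SOURCE_LABEL_MAPS[source]
        for text, raw_label in rows:
            text = " ".join(text.split())  # collapse whitespace as a minimal cleaning step
            if not text or text in seen:
                continue
            seen.add(text)
            merged.append({"Data": text, "Label": UNIFIED[label_map[raw_label]]})
    return merged


rows = {
    "source_a": [("ভালো লাগলো", 1)],
    "source_b": [("খুব খারাপ", "neg"), ("ভালো  লাগলো", "pos")],  # second row duplicates source_a
}
merged = merge_sources(rows)
print(len(merged))  # 2
```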
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, the surge in reviews and comments on newspapers and social media has made sentiment analysis a focal point of interest for researchers, and it is gaining popularity for the Bengali language as well. However, aspect-based sentiment analysis is considered difficult in Bengali due to the shortage of well-labeled datasets and the complex variations of the language. This study used two open-source Bengali benchmark datasets, Cricket and Restaurant, for aspect-based sentiment analysis. The original work was based on Random Forest, Support Vector Machine, K-Nearest Neighbors, and Convolutional Neural Network models. In this work, we used Bidirectional Encoder Representations from Transformers (BERT), the Robustly Optimized BERT Approach (RoBERTa), and our proposed hybrid transformative Random Forest and BERT (tRF-BERT) model, and compared the results with the existing work. All the models used in our work achieved better results than previous work on the same datasets, and the proposed tRF-BERT model achieved the highest F1 score and accuracy. For aspect detection, the accuracy and F1 score were 0.89 and 0.85 on the Cricket dataset and 0.92 and 0.89 on the Restaurant dataset, respectively.
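The abstract reports accuracy and F1 for aspect detection but does not state how F1 is averaged across aspect categories. This sketch assumes macro-averaging (per-class F1, then the unweighted mean), using F1 = 2TP / (2TP + FP + FN); the aspect labels in the usage example are hypothetical.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    labels = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Hypothetical aspect labels for a restaurant-style dataset.
acc, f1 = accuracy_and_macro_f1(
    ["food", "food", "price", "service"],
    ["food", "price", "price", "service"],
)
print(acc, round(f1, 4))  # 0.75 0.7778
```

Macro-averaging gives rare aspect categories the same weight as frequent ones; a weighted average would instead track the label distribution.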
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of 34,812 Bengali posts and comments sourced from Facebook, Twitter, and Instagram, as well as Bengali news portals and literature. Data acquisition involved scraping social media accounts through APIs and scraping only the text from websites. Microblogs (posts and comments from platforms like Facebook, Twitter, and Instagram) capture informal and emotionally rich text. Newspaper and magazine articles provide formal, sentiment-related information through opinions. Online literature, including Bengali novels, poems, and blogs, contributes semantic relationships and linguistic nuances. Text data was collected from public sources with automated Selenium scripts written in Python, and APIs were used to obtain structured social media data. Data collection complied with privacy, data collection, and ethics requirements.
The dataset contains 5 emotion classes and 5 sentiment classes. For emotion, "Creepy" is the most frequent with 12,000 entries, followed by "Unbiased" (8,500), "Joyful" (7,500), "Bullying" (4,000), and "Surprise" (2,500). For sentiment, "Negative" is the most frequent with 8,000 entries, followed by "Neutral" (7,000), "Strongly Negative" (6,800), "Positive" (5,500), and "Strongly Positive" (4,500).