This dataset is used for multilabel sentiment analysis and emotion detection on YouTube comments from various kinds of Bengali videos.
There are two files in the folder. Some comments may share the same text. Note also that the collected comments contain abusive and vulgar words, slang, and personal attacks; therefore, we ensured that all annotators were adults.
Sentiment.csv
Id - Unique ID number for the comment.
Text - Text of the comment.
Label - 1 (3-class label) or 2 (5-class label).
Score - Polarity of the comment. In three-class labelling: 1 (positive), 0 (neutral), -1 (negative). In five-class labelling: 2 (highly positive), 1 (positive), 0 (neutral), -1 (negative), -2 (highly negative).
Lan - Language of the comment: EN (English), BN (Bengali), RN (Romanized Bangla).
Domain - Category of the video.
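One way to compare the two labelling schemes is to collapse the five-point scores onto the three-point scale. The following is a minimal Python sketch, assuming the CSV header uses the field names listed above (Id, Text, Label, Score, Lan, Domain) and that Score is stored as an integer. The helper names and sample rows are hypothetical, not part of the dataset.

```python
import csv
import io

def to_three_class(score: int) -> int:
    """Map a polarity score from either scheme onto {-1, 0, 1}.

    Assumption: 2 (highly positive) folds into 1 and -2 (highly negative)
    folds into -1; three-class scores pass through unchanged.
    """
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0

# Hypothetical rows in the documented column layout (not real dataset rows).
SAMPLE = """Id,Text,Label,Score,Lan,Domain
1,example one,2,2,EN,music
2,example two,2,-1,RN,drama
3,example three,1,0,BN,news
"""

def load_three_class(fh):
    """Yield (Id, three-class polarity) pairs from a Sentiment.csv-style file."""
    for row in csv.DictReader(fh):
        yield row["Id"], to_three_class(int(row["Score"]))

print(list(load_three_class(io.StringIO(SAMPLE))))
# → [('1', 1), ('2', -1), ('3', 0)]
```

Folding "highly" labels into their parent class discards intensity, so this is only appropriate when a single three-class task is wanted across both subsets.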
Emotion.csv
Id - Unique ID number for the comment.
Text - Text of the comment.
emotion - Emotion expressed in the comment: Anger/Joy/Disgust/Fear/Surprise/Sad/None (no emotion found).
Lan - Language of the comment: EN (English), BN (Bengali), RN (Romanized Bangla).
Domain - Category of the video.
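To check the emotion distribution per language before training, the labels can be tallied in one pass. This is a minimal sketch, assuming Emotion.csv has a header row with the field names listed above and that labels are spelled exactly as documented; the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Documented label set; "None" is a string label meaning "no emotion found",
# not Python's None.
EMOTIONS = {"Anger", "Joy", "Disgust", "Fear", "Surprise", "Sad", "None"}

# Hypothetical rows in the documented column layout (not real dataset rows).
SAMPLE = """Id,Text,emotion,Lan,Domain
1,example one,Joy,EN,music
2,example two,Anger,BN,news
3,example three,None,BN,news
4,example four,Joy,RN,drama
"""

def emotion_counts_by_language(fh):
    """Count emotion labels per language code (EN, BN, RN)."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(fh):
        label = row["emotion"].strip()
        if label not in EMOTIONS:
            raise ValueError(f"unexpected emotion label: {label!r}")
        counts[row["Lan"]][label] += 1
    return counts

counts = emotion_counts_by_language(io.StringIO(SAMPLE))
print(dict(counts["BN"]))
```

Raising on unknown labels surfaces spelling variants early; if the real file uses different casing, normalize before the membership check.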
If you use the dataset in any research work, please cite the following paper:
N. Irtiza Tripto and M. Eunus Ali, "Detecting Multilabel Sentiment and Emotions from Bangla YouTube Comments," 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), Sylhet, 2018, pp. 1-6.
doi: 10.1109/ICBSLP.2018.8554875
It will be especially helpful for researchers analyzing sentiment in non-English social media content.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Bangla Product Comments Dataset is a comprehensive collection of product reviews gathered from diverse ecommerce platforms in Bangladesh. This dataset offers a rich source of information reflecting customer opinions and sentiments towards various products available online. This dataset holds significant value for businesses, researchers, and data scientists interested in understanding consumer behavior, product perception, and sentiment analysis within the Bangladeshi ecommerce landscape. By leveraging this dataset, stakeholders can derive actionable insights to enhance product quality, marketing strategies, and overall customer satisfaction.
Columns:
This dataset was created by Nuhash Afnan
This is a dataset for sentiment analysis on Bangla news comments. Each sample was annotated by three different individuals to capture three perspectives, and the final label was chosen by majority vote. The dataset contains 13,802 samples in total.
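The majority-vote step can be sketched as follows. This is an illustration, not the authors' code: the label names are hypothetical, and because the description does not say how a three-way disagreement was resolved, this sketch returns None in that case.

```python
from collections import Counter
from typing import Optional, Sequence

def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, or None."""
    label, count = Counter(votes).most_common(1)[0]
    # With three annotators, a strict majority means at least two agree.
    return label if count > len(votes) / 2 else None

print(majority_label(["positive", "positive", "negative"]))  # positive
print(majority_label(["positive", "neutral", "negative"]))   # None
```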
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset presents news articles published on a renowned Bengali YouTube news channel, along with public comments, replies, and related information. It contains 762,678 samples with 15 features, including the video URL, news title, video likes, video views, publishing date, hashtags, video description, comments with their likes, and replies with their likes. To protect commenters' privacy, their names have been encoded.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Critical studies have found that NLP systems exhibit biases based on gender and racial identities. However, few studies have focused on identities defined by cultural factors such as religion and nationality. Compared with English, such research is even more limited in major languages like Bengali because labeled datasets are unavailable. Our paper (see the reference) describes a process for developing a bias evaluation dataset that highlights cultural influences on identity. We also release this Bengali dataset as an artifact that can support future critical research.
If you find this dataset useful, please cite the associated paper:
Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).
BibTeX:
@inproceedings{das-etal-2023-toward,
  title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity",
  author = "Das, Dipto and Guha, Shion and Semaan, Bryan",
  booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.c3nlp-1.8",
  pages = "68--83",
}
This dataset contains around 1,300 positive and negative Bengali (Bangla) sentiment words. The lexicon was created from a Bengali review corpus.
If you use this lexicon, please cite the following paper:
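A common way to use a polarity lexicon like this is simple hit counting. The sketch below assumes the lexicon has been loaded into two Python sets; the three example words are hypothetical stand-ins, not confirmed entries of this lexicon, and whitespace tokenization is a simplification.

```python
import unicodedata

# Hypothetical lexicon entries for illustration; load the real word lists instead.
POSITIVE = {"ভালো", "সুন্দর"}  # "good", "beautiful"
NEGATIVE = {"খারাপ"}           # "bad"

PUNCTUATION = "।,.!?\"'"  # includes the Bengali danda (।)

def _normalize(word: str) -> str:
    # NFC so that visually identical Bengali strings compare equal.
    return unicodedata.normalize("NFC", word)

POS_NORM = {_normalize(w) for w in POSITIVE}
NEG_NORM = {_normalize(w) for w in NEGATIVE}

def lexicon_score(text: str) -> int:
    """Count positive minus negative lexicon hits among whitespace tokens."""
    tokens = [_normalize(t.strip(PUNCTUATION)) for t in text.split()]
    return sum(t in POS_NORM for t in tokens) - sum(t in NEG_NORM for t in tokens)

def lexicon_label(text: str) -> str:
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_label("বইটি খুব ভালো।"))  # → positive ("The book is very good.")
```

Unicode normalization matters for Bengali because some vowel signs can be encoded either precomposed or as two code points.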
@inproceedings{sazzed2020development,
  title={Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources},
  author={Sazzed, Salim},
  booktitle={2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)},
  pages={237--244},
  year={2020},
  organization={IEEE Computer Society}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The repository contains 3,307 negative and 8,500 positive reviews collected from YouTube Bengali dramas and manually annotated. If you use this dataset, please cite the following paper:
@inproceedings{sazzed2020cross,
  title={Cross-lingual sentiment classification in low-resource Bengali language},
  author={Sazzed, Salim},
  booktitle={Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)},
  pages={50--60},
  year={2020}
}
If you have any questions, please email me: salimsazzad222@gmail.com.
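With roughly 2.6 positive reviews per negative one, an unweighted classifier can drift toward predicting "positive". One standard remedy (the same inverse-frequency formula scikit-learn uses for class_weight="balanced") is sketched below using the counts stated above; whether weighting helps on this dataset is an assumption to verify empirically.

```python
# Review counts stated above for the YouTube drama dataset.
counts = {"negative": 3307, "positive": 8500}

def balanced_weights(class_counts):
    """Inverse-frequency weights: total / (num_classes * count_for_class)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * n) for label, n in class_counts.items()}

weights = balanced_weights(counts)
print({label: round(w, 4) for label, w in weights.items()})
# → {'negative': 1.7852, 'positive': 0.6945}
```

With these weights, each class contributes the same total weight (half of the 11,807 samples) to the loss.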
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Bangla Sentiment Dataset is a curated collection of sentiment-rich textual data in Bangla, focused on recent and trending topics. This dataset has been compiled from diverse sources, including Bangladeshi online newspapers, social media platforms, and blogs, ensuring a wide spectrum of language styles and sentiment expressions.
Key features:
Focus on recent topics: The dataset emphasizes contemporary issues, trending discussions, and popular topics in Bangladeshi society, including sentiment on political developments, social movements, entertainment, cultural events, and other recent happenings.
Source variety:
- Online newspapers: Articles, editorials, headlines, and reader comments provide structured, semi-formal sentiment data.
- Social media: Posts, tweets, and comments reflect informal, conversational language with high emotional expressiveness.
- Blogs: Opinion pieces and discussions offer detailed, context-rich sentiment content.
Sentiment labels: Each entry in the dataset is annotated with one of the following sentiment categories:
- Positive (1): Texts expressing happiness, agreement, or optimism.
- Negative (0): Texts reflecting criticism, disagreement, or pessimism.
- Neutral (2): Texts presenting balanced or factual statements with minimal emotional bias.
Linguistic and stylistic diversity: The dataset captures a range of Bangla language variations, including:
- Formal and informal Bangla usage.
- Regional dialects.
- Transliterated Bangla (Banglish), commonly used on social media.
Real-world context: The inclusion of recent topics makes the dataset relevant for analyzing public sentiment around current events and trends, and particularly useful for real-time sentiment analysis applications.
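Note that the numeric codes above are categorical, not ordinal: Neutral (2) sits numerically above Positive (1). The sketch below decodes the documented codes and, as a hypothetical convenience, remaps them onto an ordered scale for tasks that need one (for example, regression or ordinal metrics).

```python
# Codes as documented for this dataset.
LABEL_NAMES = {1: "Positive", 0: "Negative", 2: "Neutral"}

# Hypothetical remapping onto an ordered scale (Negative < Neutral < Positive),
# since the raw codes put Neutral (2) above Positive (1).
TO_ORDINAL = {0: -1, 2: 0, 1: 1}

def decode(code: int) -> str:
    """Return the documented sentiment name for a raw label code."""
    if code not in LABEL_NAMES:
        raise ValueError(f"unknown sentiment code: {code}")
    return LABEL_NAMES[code]

def to_ordinal(code: int) -> int:
    """Map a raw code onto -1/0/1 after validating it."""
    decode(code)
    return TO_ORDINAL[code]

print([decode(c) for c in (0, 1, 2)])     # ['Negative', 'Positive', 'Neutral']
print(sorted((0, 1, 2), key=to_ordinal))  # [0, 2, 1]
```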
This dataset provides an invaluable resource for researchers and practitioners aiming to explore sentiment analysis in Bangla, with a special emphasis on modern-day relevance and real-world applicability.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews
This repository contains the code, data, and models of the paper titled "BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews" published in the Findings of the Association for Computational Linguistics: ACL 2023.
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
Data Format
Each row consists of a book review sample.… See the full description on the dataset page: https://huggingface.co/datasets/Starscream-11813/BanglaBook.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mahfuz Ahmed Masum
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains 1,443 Bangla book reviews: 471 are annotated as negative and 972 as positive. All reviews were collected from various online bookshops and social media groups and were manually annotated by two native Bengali speakers. Although the dataset is relatively small, it can be used for learning and research purposes.
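Because the dataset is small and imbalanced (about two positive reviews per negative one), a stratified train/test split keeps the class ratio the same in both parts. This is a plain-Python sketch of that technique; the 20% test fraction and the seed are arbitrary choices, not values from the dataset authors.

```python
import random
from typing import List, Tuple

def stratified_split(labels: List[str], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class keeps its proportion in the test set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Class counts stated above: 471 negative, 972 positive.
labels = ["negative"] * 471 + ["positive"] * 972
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 1155 288
```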
GPL 2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This dataset was created by Sushmit
Released under GPL 2
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Air travel is one of the most widely used modes of transport in our daily lives, so it is no surprise that more and more people share their experiences with airlines and airports online. This dataset supports topic modeling and sentiment analysis of postings on Skytrax (airlinequality.com) and Tripadvisor (tripadvisor.com), which attract strong interest and engagement from people who have flown with an airline or plan to. Users of Skytrax and Tripadvisor aim to make better decisions based on the actual experiences of other customers who have flown with an airline. We gathered online reviews submitted by consumers who have flown with Bangladesh airlines, covering all reviews from September 2018 to July 2021. The dataset contains 1,047 online reviews of Bangladesh Airlines in total: 519 for domestic routes and 528 for international routes. A second dataset, created by combining the Skytrax and Tripadvisor data, contains 1,095 review records for Bangladesh Airlines.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In recent years, the surge in reviews and comments on newspapers and social media has made sentiment analysis a focal point for researchers, and it is gaining popularity for the Bengali language as well. However, aspect-based sentiment analysis (ABSA) remains difficult in Bengali because of the shortage of well-labeled datasets and the complex variations of the language. This study used two open-source Bengali benchmark datasets, Cricket and Restaurant, for ABSA. The original work on these datasets used Random Forest, Support Vector Machine, K-Nearest Neighbors, and Convolutional Neural Network models. In this work, we used Bidirectional Encoder Representations from Transformers (BERT), the Robustly Optimized BERT Approach (RoBERTa), and our proposed hybrid transformative Random Forest and BERT model (tRF-BERT), and compared the results with the existing work. All the models used in this work outperformed previous work on the same datasets, and the proposed tRF-BERT achieved the highest F1 score and accuracy. The accuracy and F1 score of aspect detection were 0.89 and 0.85 on the Cricket dataset, and 0.92 and 0.89 on the Restaurant dataset.
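To make the reported metrics concrete, the sketch below computes accuracy and macro-averaged F1 for single-label aspect detection. The abstract does not say which F1 averaging was used, so macro averaging here is an assumption, and the aspect names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

def accuracy_and_macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy, macro F1) for single-label multi-class predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    f1s = []
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(f1s) / len(f1s)

y_true = ["bowling", "batting", "batting", "team"]
y_pred = ["bowling", "batting", "team", "team"]
acc, f1 = accuracy_and_macro_f1(y_true, y_pred)
print(round(acc, 4), round(f1, 4))  # 0.75 0.7778
```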
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Sentiment analysis is becoming increasingly important for the Bangla language. Aspect-based sentiment analysis (ABSA) predicts sentiment polarity at the aspect level. The data were collected from numerous individuals; each comment covers at least two aspects and is a complex or compound sentence. The datasets are organized in a folder named "BANGLA_ABSA dataset", which contains four Excel files, one per dataset: Car_ABSA, Mobile_phone_ABSA, Movie_ABSA, and Restaurant_ABSA. Each Excel file contains three columns: Id, Comment, and {Aspect category, Sentiment Polarity}. The Car_ABSA, Mobile_phone_ABSA, Movie_ABSA, and Restaurant_ABSA datasets have 1,149, 975, 800, and 801 rows, respectively.
Introduces three datasets, covering hate expressions, commonly discussed topics, and opinions, for hate speech detection, document classification, and sentiment analysis, respectively.
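The third column pairs each aspect category with its polarity. The exact cell format is not documented here, so the sketch below assumes a hypothetical format of one or more "{aspect, polarity}" groups per cell; adjust the pattern once you have inspected the real files (for example, after loading them with pandas.read_excel).

```python
import re
from typing import List, Tuple

# Hypothetical cell format: one or more "{aspect, polarity}" groups.
PAIR = re.compile(r"\{\s*([^{},]+?)\s*,\s*([^{},]+?)\s*\}")

def parse_aspects(cell: str) -> List[Tuple[str, str]]:
    """Extract (aspect category, polarity) pairs from one annotation cell."""
    return PAIR.findall(cell)

def has_min_aspects(cell: str, minimum: int = 2) -> bool:
    """Check the stated property that each comment covers at least two aspects."""
    return len(parse_aspects(cell)) >= minimum

cell = "{Price, Positive}, {Design, Negative}"
print(parse_aspects(cell))    # [('Price', 'Positive'), ('Design', 'Negative')]
print(has_min_aspects(cell))  # True
```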
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains 44,001 Bengali comments, curated to detect cyberbullying using Natural Language Processing (NLP) techniques. Each comment is labeled by experts, categorizing different forms of harassment and offensive behavior. The dataset enables the identification of inappropriate content, ranging from mild to severe harassment, facilitating precise classification and analysis. This resource is designed for researchers and developers working on cyberbullying detection, sentiment… See the full description on the dataset page: https://huggingface.co/datasets/faisalahmed/Bengali_Cyberbullying_Detection_Comments_Dataset.
This dataset was created by Towhidul.Tonmoy
This dataset was created by Mohd. Istiaq Hossain Junaid