Attribution 2.5 (CC BY 2.5) https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
Mahadih534/Bengali-E-commerce-sentiments dataset hosted on Hugging Face and contributed by the HF Datasets community
This is a sentiment analysis dataset of Bangla news comments. Each comment was annotated by three different individuals to capture three perspectives, and the final tag was chosen by majority decision. The dataset contains 13,802 entries in total.
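The majority-decision rule described above can be sketched as follows. This is a minimal illustration rather than the dataset authors' code: the label names and the choice to return `None` when no label has a strict majority are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(annotations: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    if not annotations:
        return None
    label, count = Counter(annotations).most_common(1)[0]
    # A strict majority needs more than half of the votes.
    return label if count > len(annotations) / 2 else None


# Hypothetical annotations from three annotators for two comments.
print(majority_label(["positive", "positive", "negative"]))
print(majority_label(["positive", "negative", "neutral"]))
```

With three annotators and more than two possible labels, a three-way disagreement is possible; how the original annotation process handled that case is not stated in the description.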
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository contains 3307 Negative reviews and 8500 Positive reviews collected from Bengali dramas on YouTube and manually annotated.
If you use this dataset, please cite the following paper:
@inproceedings{sazzed2020cross,
  title={Cross-lingual sentiment classification in low-resource Bengali language},
  author={Sazzed, Salim},
  booktitle={Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)},
  pages={50--60},
  year={2020}
}
If you have any questions, please email me: salimsazzad222@gmail.com.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
ROOTS Subset: roots_indic-bn_bangla_sentiment_classification_datasets
Bangla Sentiment Classification Datasets
Dataset uid: bangla_sentiment_classification_datasets
Description
Multiple sentiment classification datasets for Bengali, which can also be used for training LMs. The datasets are the following: ABSA_datasets -- This dataset was developed to perform aspect-based sentiment analysis in Bangla. License: CC BY 4.0. SAIL_data -- This dataset consists of tweet… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-bn_bangla_sentiment_classification_datasets.
2-2325: From Twitter datasets, May-November 2013
2326-16127: From http://dx.doi.org/10.17632/n53xt69gnf.3
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Welcome to the Bangla Financial Lexicon Data Dictionary project!
The financial lexicon data dictionary is a list of words used to calculate the sentiment of financial news articles. Bangla words were collected from an online Bangla dictionary API and manually categorized into 6 weighted groups. To accurately determine the sentiment of sentences, a lexicon data dictionary is required. This project's lexicon data dictionary only contains Bangla words and includes words with positive sentiment and words with negative sentiment.
This dataset was a crucial part of our research published in the journal paper titled "Stock Market Prediction of Bangladesh Using Multivariate Long Short-Term Memory with Sentiment Identification." The paper can be accessed and cited at http://doi.org/10.11591/ijece.v13i5.pp5696-5706.
Understanding the Categories:
Bull words: This word collection is called bull words because, from a financial standpoint, the words are considered to have positive connotations. They are typically associated with upward market trends, increasing stock prices, and overall economic growth. In this sense, bull words are viewed as desirable and are often used by financial analysts and investors to convey optimism about the state of the economy.
Bear words: The bear word list is the counterpart of the bull word list in financial sentiment analysis. For the purpose of evaluating the sentiment of business news, every word on this list is regarded as carrying the opposite, negative sentiment. Bear word lists typically consist of words associated with downward trends in the stock market, such as recession, inflation, unemployment, and bankruptcy.
Negative words: The negative word list has words like “না”, “নয়”, and “নেই”, which can make a full sentence negative in the Bangla language. These negative words can have a significant impact on the overall sentiment of a sentence, even if the other words in the sentence are positive. The negative word list is a crucial tool for sentiment analysis in the Bangla language.
Coordinating conjunction words (Co con.): In the Bangla language, conjunctions like “কিন্তু”, “আদপে”, “এবং”, and “অথবা” play an important role in sentence construction. They should have their own weighted effect value in sentiment analysis; assigning weighted effect values to Bangla conjunctions results in more accurate sentiment analysis.
Subordinating conjunctions (Sub con.): Another list of conjunctions, with words like "অধিকন্ত", "এমনকি", and "বিশেষত". These conjunctions are often used to indicate a shift in tone or emphasis in a sentence and can play a significant role in shaping the overall sentiment. By assigning weighted effect values to these conjunctions, financial analysts can further refine their sentiment analysis, providing more accurate insights into the sentiment of financial news and information.
Adjectives and adverbs (Adj.): We listed some adjectives and adverbs like "সবচাইতে", "অধিক", and "সর্বাধিক", as they intensify the sentiment of a sentence more than other simple words. We categorized them into 3 weighted categories: high, medium, and low. Words with high weight have the greatest impact, words with medium weight have a moderate impact, and words with low weight have the least impact.
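To make the role of these categories concrete, here is a minimal sketch of lexicon-based scoring. It is not the scoring method from the cited paper: the bull and bear words, every numeric weight, and the rule that a negation flips the whole sentence are assumptions chosen for illustration, and the two conjunction categories are left out for brevity.

```python
# Hypothetical mini-lexicon; the words and weights are illustrative only
# and are not taken from the published data dictionary.
BULL_WORDS = {"বৃদ্ধি": 1.0, "লাভ": 1.0}        # positive financial terms
BEAR_WORDS = {"মন্দা": -1.0, "ক্ষতি": -1.0}      # negative financial terms
NEGATION_WORDS = {"না", "নেই"}                   # flip the sentence polarity
INTENSIFIERS = {"সর্বাধিক": 2.0, "অধিক": 1.5}     # scale the next sentiment word


def score_sentence(sentence: str) -> float:
    """Sum weighted bull/bear words; intensifiers scale, negations flip."""
    score = 0.0
    multiplier = 1.0
    negated = False
    for token in sentence.split():
        if token in NEGATION_WORDS:
            negated = not negated
        elif token in INTENSIFIERS:
            multiplier = INTENSIFIERS[token]
        else:
            weight = BULL_WORDS.get(token, BEAR_WORDS.get(token, 0.0))
            if weight:
                score += weight * multiplier
                multiplier = 1.0  # an intensifier applies to one word only
    return -score if negated else score


print(score_sentence("অধিক লাভ"))
print(score_sentence("লাভ নেই"))
```

Whitespace tokenization is used only to keep the sketch short; real Bangla text would need proper tokenization and handling of inflected forms.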
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Critical studies have found NLP systems to be biased with respect to gender and racial identities. However, few studies have focused on identities defined by cultural factors like religion and nationality. Compared to English, such research efforts are even more limited in major languages like Bengali due to the unavailability of labeled datasets. Our paper (see the reference) describes a process for developing a bias evaluation dataset highlighting cultural influences on identity. We also provide this Bengali dataset as an artifact that can contribute to future critical research.
If you find this dataset useful, please cite the associated paper:
Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).
BibTeX:
@inproceedings{das-etal-2023-toward,
  title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity",
  author = "Das, Dipto and Guha, Shion and Semaan, Bryan",
  booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.c3nlp-1.8",
  pages = "68--83",
}
https://choosealicense.com/licenses/cc0-1.0/
Bengali Sentiment Analysis Dataset
Dataset Description
This dataset contains 44,236 Bengali sentences with corresponding sentiment labels, synthetically generated using ChatGPT for natural language processing and machine learning research.
Dataset Summary
Language: Bengali (বাংলা)
Total Entries: 44,236 synthetic sentences
Task: Sentiment Classification
Format: JSON
Generation Method: OpenAI ChatGPT (GPT-4)
License: CC0 1.0 Universal (Public Domain)
… See the full description on the dataset page: https://huggingface.co/datasets/shaikh25/synthetic-bengali-sentiment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Motamot" dataset contains 7,058 data points labeled with Positive and Negative sentiments, tailored specifically for political sentiment analysis in the Bengali language. It comprises 4,132 instances labeled Positive and 2,926 instances labeled Negative.
Specifics of the core data:
Split        Positive  Negative  Total
Train        3306      2341      5647
Test         413       293       706
Validation   413       292       705
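As a quick consistency check, the per-split counts above can be summed and compared with the stated totals. The sketch below uses only the numbers given in this description; the dictionary layout and variable names are illustrative.

```python
# Per-split class counts as reported in the dataset description.
splits = {
    "train": {"positive": 3306, "negative": 2341},
    "test": {"positive": 413, "negative": 293},
    "validation": {"positive": 413, "negative": 292},
}

# Total number of instances in each split.
split_totals = {name: sum(counts.values()) for name, counts in splits.items()}

# Total number of instances per class across all splits.
class_totals = {
    label: sum(counts[label] for counts in splits.values())
    for label in ("positive", "negative")
}

# Fraction of positive instances in each split.
positive_share = {
    name: counts["positive"] / split_totals[name]
    for name, counts in splits.items()
}

print(split_totals)
print(class_totals)
for name, share in positive_share.items():
    print(f"{name}: {share:.1%} positive")
```

The positive share is close to the same in all three splits, which suggests the split preserves the class ratio, although the description does not say so explicitly.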
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by leveraging social media platforms such as Twitter to develop corpora across multiple languages. The corpus creation methodology is applicable to resource-scarce languages, provided the speakers of that language are active users of social media platforms. We present an approach to extract social media microblogs such as tweets (Twitter). We created a corpus for multilingual sentiment analysis and emoji prediction in Hindi, Bengali and Telugu. Further, we perform and analyze multiple NLP tasks using the corpus and report our observations.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset offers a comprehensive collection of Bengali news articles specifically curated for the purpose of fake news detection. The data has been meticulously gathered and processed to aid researchers and practitioners in developing and testing models for distinguishing between real and fake news in the Bengali language.
The dataset underwent extensive cleaning to remove HTML symbols, unusual commas, and other formatting issues. It was then structured into a CSV format for ease of use and analysis. The data is well-suited for training and evaluating machine learning models aimed at fake news detection, text classification, and sentiment analysis.
This dataset is an invaluable resource for researchers, developers, and data scientists working on text classification and fake news detection in Bengali. Its extensive coverage and detailed attributes provide a robust foundation for developing advanced analytical and machine learning models.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Introduces three datasets, covering expressions of hate, commonly used topics, and opinions, for hate speech detection, document classification, and sentiment analysis, respectively.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BanglaBlend dataset is a collection of Bangla (Bengali) sentences categorized into two specific forms: Saint (Sadhu) and Common (Cholito). It comprises a total of 7350 annotated Bangla sentences and is a preprocessed dataset to which several data preprocessing techniques have been applied. The dataset is designed to facilitate research and development in natural language processing (NLP) and computational linguistics, particularly for Bangla, a widely spoken language in Bangladesh and parts of India. In particular, it can be used for several natural language processing tasks such as text summarization, text classification, sentiment analysis, and automatic language translation.
Social Media User Sentiment Analysis Dataset. Each user comment is labeled as positive (1), negative (2), or neutral (0).
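Because the integer codes are not in the usual negative/neutral/positive order, an explicit mapping helps avoid mistakes when decoding labels. The sketch below assumes only the three codes stated above; the function and variable names are hypothetical.

```python
# Label codes as stated in the dataset description.
ID_TO_LABEL = {0: "neutral", 1: "positive", 2: "negative"}


def decode_labels(label_ids):
    """Map integer label codes to their sentiment names."""
    return [ID_TO_LABEL[label_id] for label_id in label_ids]


# Hypothetical label column from a few rows of the dataset.
print(decode_labels([1, 2, 0]))
```

An unknown code raises `KeyError`, which surfaces unexpected labels instead of silently mapping them.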
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, the surge in reviews and comments on newspapers and social media has made sentiment analysis a focal point of interest for researchers. Sentiment analysis is also gaining popularity in the Bengali language. However, Aspect-Based Sentiment Analysis is considered a difficult task in the Bengali language due to the shortage of perfectly labeled datasets and the complex variations in the Bengali language. This study used two open-source benchmark datasets of the Bengali language, Cricket, and Restaurant, for our Aspect-Based Sentiment Analysis task. The original work was based on the Random Forest, Support Vector Machine, K-Nearest Neighbors, and Convolutional Neural Network models. In this work, we used the Bidirectional Encoder Representations from Transformers, the Robustly Optimized BERT Approach, and our proposed hybrid transformative Random Forest and Bidirectional Encoder Representations from Transformers (tRF-BERT) models to compare the results with the existing work. After comparing the results, we can clearly see that all the models used in our work achieved better results than any of the previous works on the same dataset. Amongst them, our proposed transformative Random Forest and Bidirectional Encoder Representations from Transformers achieved the highest F1 score and accuracy. The accuracy and F1 score of aspect detection for the Cricket dataset were 0.89 and 0.85, respectively, and for the Restaurant dataset were 0.92 and 0.89 respectively.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a manually annotated Bangla Emotion corpus, which incorporates the diversity of fine-grained emotion expressions in social-media text. We considered fine-grained emotion labels such as Sadness, Happiness, Disgust, Surprise, Fear and Anger, which are, according to Paul Ekman (1999), the six basic emotion categories. For this task, we collected a large amount of raw text data from user comments in two different Facebook groups (Ekattor TV and Airport Magistrates) and from the public posts of a popular blogger and activist, Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and to the economic success and failure of Bangladesh. We scraped a total of 32923 comments from these three sources. Out of these, a total of 6314 comments were annotated into the six categories. The distribution of the annotated corpus is as follows:
sad = 1341
happy = 1908
disgust = 703
surprise = 562
fear = 384
angry = 1416
We also provide a balanced set drawn from the above data and split the dataset into training and test sets, using a proportion of 5:1 for training and evaluation. More information on the dataset and the experiments on it can be found in our paper (related links below).
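The class shares implied by the distribution above, and the sizes a 5:1 split would produce, can be worked out with a short script. This is a sketch under stated assumptions: the description does not say how the split or the balanced set was built, so applying the 5:1 ratio per class with integer rounding (the hypothetical helper `split_5_to_1`) is only one plausible reading.

```python
# Annotated class counts as reported in the dataset description.
counts = {
    "sad": 1341,
    "happy": 1908,
    "disgust": 703,
    "surprise": 562,
    "fear": 384,
    "angry": 1416,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}


def split_5_to_1(n: int) -> tuple:
    """Return (train, test) sizes for a 5:1 split of n items.

    Assumption: one sixth goes to the test set, rounded down.
    """
    test = n // 6
    return n - test, test


per_class_split = {label: split_5_to_1(n) for label, n in counts.items()}

print(f"total annotated comments: {total}")
for label in counts:
    train, test = per_class_split[label]
    print(f"{label}: {shares[label]:.1%} of corpus, train={train}, test={test}")
```

The counts sum to the 6314 annotated comments stated in the description, and the class sizes differ by a factor of about five between the largest and smallest class, which is presumably why a balanced set is also provided.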
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
BanglaSER is a specialized dataset designed for the task of Bangla speech emotion recognition. This dataset includes a rich collection of speech-audio recordings that capture a variety of fundamental human emotions. It is curated to support research and development in the field of speech emotion recognition, particularly for the Bangla language, and is suitable for various deep learning architectures.
We extend our gratitude to the contributors and participants who made this dataset possible. Their efforts have greatly enriched the field of speech emotion recognition and provided valuable resources for the community.
Feel free to explore the dataset and use it in your research and projects. We look forward to seeing the applications and advancements that will emerge from the use of BanglaSER.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews
This repository contains the code, data, and models of the paper titled "BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews" published in the Findings of the Association for Computational Linguistics: ACL 2023.
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
Data Format
Each row consists of a book review sample. The… See the full description on the dataset page: https://huggingface.co/datasets/Starscream-11813/BanglaBook.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 44,001 Bengali comments, curated to detect cyberbullying using Natural Language Processing (NLP) techniques. Each comment is labeled by experts, categorizing different forms of harassment and offensive behavior. The dataset enables the identification of inappropriate content, ranging from mild to severe harassment, facilitating precise classification and analysis. This resource is designed for researchers and developers working on cyberbullying detection, sentiment… See the full description on the dataset page: https://huggingface.co/datasets/faisalahmed/Bengali_Cyberbullying_Detection_Comments_Dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The key characteristics of the collected data are summarized below:
Data types: Audio
Data size: 1750 (speech-text-phase), 1000 (speech-base-phase)
Linguistic diversity: Short text
Audio capturing quality: 44.1 kHz, Mono
We created a new personality traits dataset for our research because there is a noticeable absence of datasets for automatically assessing personality from Bangla speech. This data, processed with machine learning models, demonstrated that different personalities produce varying magnitudes at different frequencies, exhibiting distinct patterns.