43 datasets found
  1. Bengali-E-commerce-sentiments

    • huggingface.co
    Updated Aug 31, 2024
    Cite
    Mahadi Hassan (2024). Bengali-E-commerce-sentiments [Dataset]. https://huggingface.co/datasets/Mahadih534/Bengali-E-commerce-sentiments
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 31, 2024
    Authors
    Mahadi Hassan
    License

    Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Description

    Mahadih534/Bengali-E-commerce-sentiments dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Data Set For Sentiment Analysis On Bengali News Comments

    • narcis.nl
    • data.mendeley.com
    Updated Sep 15, 2019
    Cite
    Chowdhury, M (via Mendeley Data) (2019). Data Set For Sentiment Analysis On Bengali News Comments [Dataset]. http://doi.org/10.17632/n53xt69gnf.2
    Dataset updated
    Sep 15, 2019
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Chowdhury, M (via Mendeley Data)
    Description

    This data set supports sentiment analysis of Bangla news comments. Every entry was annotated by three different individuals to capture three different perspectives, and the final tag was chosen by majority decision. The data set contains 13,802 entries in total.
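
    The majority-decision step described above can be sketched as follows. This is a minimal illustration, not the authors' annotation tooling: the label names and the tie-handling rule (returning None when no label has a majority) are assumptions.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None when there is a tie."""
    counts = Counter(votes).most_common()
    # A tie exists when the two most frequent labels share the same count.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical label names, used only for illustration.
print(majority_label(["positive", "positive", "negative"]))
print(majority_label(["positive", "negative", "neutral"]))
```

    With three annotators, a tie can only occur when all three disagree, so such items would need a separate resolution rule.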

  3. Bangla (Bengali) Drama Review Dataset

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    salim sazzed (2023). Bangla (Bengali) Drama Review Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.13162085.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    salim sazzed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The repository contains 3307 negative reviews and 8500 positive reviews collected from YouTube Bengali dramas and manually annotated. If you use this dataset, please cite the following paper:

    @inproceedings{sazzed2020cross, title={Cross-lingual sentiment classification in low-resource Bengali language}, author={Sazzed, Salim}, booktitle={Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)}, pages={50--60}, year={2020}}

    If you have any questions, please email me: salimsazzad222@gmail.com.

  4. roots_indic-bn_bangla_sentiment_classification_datasets

    • huggingface.co
    Updated Sep 23, 2022
    Cite
    BigScience Data (2022). roots_indic-bn_bangla_sentiment_classification_datasets [Dataset]. https://huggingface.co/datasets/bigscience-data/roots_indic-bn_bangla_sentiment_classification_datasets
    Dataset updated
    Sep 23, 2022
    Dataset authored and provided by
    BigScience Data
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    ROOTS Subset: roots_indic-bn_bangla_sentiment_classification_datasets

      Bangla Sentiment Classification Datasets
    

    Dataset uid: bangla_sentiment_classification_datasets

      Description
    

    Multiple sentiment classification datasets for Bengali, which can also be used for training LMs. The datasets are the following: ABSA_datasets -- This dataset was developed to perform the aspect-based sentiment analysis task in Bangla. License: CC BY 4.0. SAIL_data -- This dataset consists of tweet… See the full description on the dataset page: https://huggingface.co/datasets/bigscience-data/roots_indic-bn_bangla_sentiment_classification_datasets.

  5. Bangla-Sentiment-Analysis

    • kaggle.com
    Updated Jan 18, 2021
    Cite
    FARID (2021). Bangla-Sentiment-Analysis [Dataset]. https://www.kaggle.com/datasets/faridmiah/banglasentimentanalysis
    Explore at:
    Croissant
    Dataset updated
    Jan 18, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    FARID
    Description

    Context

    2-2325: From Twitter datasets, May-November 2013
    2326-16127: From http://dx.doi.org/10.17632/n53xt69gnf.3
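
    The row ranges above can be turned into a small lookup, sketched below. Reading the numbers as inclusive, 1-based file rows (with row 1 as a header) is an assumption, and the function names are hypothetical.

```python
# Row ranges quoted in the description. Treating them as inclusive,
# 1-based rows with a header on row 1 is an assumption.
SOURCES = [
    (2, 2325, "Twitter datasets, May-November 2013"),
    (2326, 16127, "http://dx.doi.org/10.17632/n53xt69gnf.3"),
]


def source_for_row(row):
    """Return the source that a given row number falls into, or None."""
    for first, last, name in SOURCES:
        if first <= row <= last:
            return name
    return None


def rows_in(first, last):
    """Number of rows in an inclusive range."""
    return last - first + 1


twitter_rows = rows_in(2, 2325)
mendeley_rows = rows_in(2326, 16127)
print(twitter_rows, mendeley_rows)
```

    Under this reading the second range holds 13,802 rows, which matches the total reported for the Bengali news comments data set listed earlier on this page.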

  6. Bangla Financial lexicon Sentiment dictionary

    • kaggle.com
    Updated Jul 30, 2023
    Cite
    Md. Ashraful Islam (2023). Bangla Financial lexicon Sentiment dictionary [Dataset]. https://www.kaggle.com/datasets/mdashrafulislam1998/bangla-financial-lexicon-data-dictionary
    Explore at:
    Croissant
    Dataset updated
    Jul 30, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Md. Ashraful Islam
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Welcome to the Bangla Financial Lexicon Data Dictionary project!

    The financial lexicon data dictionary is a list of words used to calculate the sentiment of financial news articles. Bangla words were collected from an online Bangla dictionary API and manually categorized into 6 weighted groups. To accurately determine the sentiment of sentences, a lexicon data dictionary is required. This project's lexicon data dictionary only contains Bangla words and includes words with positive sentiment and words with negative sentiment.

    This dataset was a crucial part of our research published in the journal paper titled "Stock Market Prediction of Bangladesh Using Multivariate Long Short-Term Memory with Sentiment Identification." The paper can be accessed and cited at http://doi.org/10.11591/ijece.v13i5.pp5696-5706.

    Understanding the Categories:

    Bull words: This word collection is called bull words because, from a financial standpoint, they are considered to have positive connotations. These words are typically associated with upward market trends, increasing stock prices, and overall economic growth. In this sense, bull words are viewed as desirable and are often used by financial analysts and investors to convey optimism about the state of the economy.

    Bear words: The bear word list is the opposite of the positive sentiment words in financial sentiment analysis. For the purpose of evaluating the sentiment around business news, every entry on this list is regarded as having the opposite (negative) sentiment. Bear word lists typically consist of words that are associated with downward trends in the stock market, such as recession, inflation, unemployment, and bankruptcy.

    Negative words: The negative word list has words like “না”, “নয়”, and “নেই”, which can make a full sentence negative in the Bangla language. These negative words can have a significant impact on the overall sentiment of a sentence, even if the other words in the sentence are positive. The negative word list is a crucial tool for sentiment analysis in the Bangla language.

    Coordinating conjunction words (Co con.): In the Bangla language, conjunctions like “কিন্তু”, “আদপে”, “এবং”, “অথবা” play an important role in sentence construction. They should have their own weighted effect value in sentiment analysis. Assigning weighted effect values to Bangla conjunctions results in more accurate sentiment analysis.

    Subordinating conjunctions (Sub con.): Another conjunction list, with words like "অধিকন্ত", "এমনকি", "বিশেষত". These conjunctions are often used to indicate a shift in tone or emphasis in a sentence and can play a significant role in shaping the overall sentiment. By assigning weighted effect values to these conjunctions, financial analysts can further refine their sentiment analysis, providing even more accurate insights into the sentiment of financial news and information.

    Adjectives and adverbs (Adj.): We listed some adjectives and adverbs like "সবচাইতে", "অধিক", "সর্বাধিক", as they intensify the sentiment of a sentence more than other simple words. We categorized them into 3 weighted categories: high, medium, and low. Words with high weight have the greatest impact, words with medium weight have a moderate impact, and words with low weight have the least impact.
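
    To make the category descriptions above concrete, here is a minimal sketch of how a weighted lexicon could score a tokenized sentence. It is not the scoring method of the cited paper: the bull and bear entries, all weights, and the rule that a negation word flips the sentence polarity are assumptions made for illustration. Only the three negation words are quoted from the description.

```python
# Illustrative entries only. The negation words are quoted in the description;
# the bull/bear words and every weight below are made-up placeholders.
BULL = {"বৃদ্ধি": 1.0}   # "growth" (hypothetical bull entry)
BEAR = {"মন্দা": -1.0}   # "recession" (hypothetical bear entry)
NEGATIONS = {"না", "নয়", "নেই"}


def score(tokens):
    """Sum bull/bear weights over the tokens of one sentence."""
    total = sum(BULL.get(t, 0.0) + BEAR.get(t, 0.0) for t in tokens)
    # Assumed rule: any negation word flips the polarity of the whole sentence.
    if any(t in NEGATIONS for t in tokens):
        total = -total
    return total


print(score(["বৃদ্ধি"]))
print(score(["বৃদ্ধি", "নেই"]))
```

    Conjunction and intensifier weights could be added in the same way, for example as multipliers applied before the negation step, but how the dataset's six groups are combined is not stated here.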

  7. Bengali Identity Bias Evaluation Dataset (BIBED)

    • data.niaid.nih.gov
    • kaggle.com
    Updated Aug 7, 2023
    Cite
    Das, Dipto (2023). Bengali Identity Bias Evaluation Dataset (BIBED) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7775520
    Dataset updated
    Aug 7, 2023
    Dataset provided by
    Semaan, Bryan
    Guha, Shion
    Das, Dipto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Critical studies have found NLP systems to be biased with respect to gender and racial identities. However, few studies have focused on identities defined by cultural factors like religion and nationality. Compared to English, such research efforts are even more limited in major languages like Bengali due to the unavailability of labeled datasets. Our paper (see the reference) describes a process for developing a bias evaluation dataset highlighting cultural influences on identity. We also provide this Bengali dataset as an artifact outcome that can contribute to future critical research.

    If you find this dataset useful, please cite the associated paper:

    Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).

    BibTeX:

    @inproceedings{das-etal-2023-toward, title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity", author = "Das, Dipto and Guha, Shion and Semaan, Bryan", booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)", month = may, year = "2023", address = "Dubrovnik, Croatia", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.c3nlp-1.8", pages = "68--83", }

  8. synthetic-bengali-sentiment

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    shaikh R (2025). synthetic-bengali-sentiment [Dataset]. http://doi.org/10.57967/hf/5762
    Dataset updated
    Jun 12, 2025
    Authors
    shaikh R
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Bengali Sentiment Analysis Dataset

      Dataset Description
    

    This dataset contains 44,236 Bengali sentences with corresponding sentiment labels, synthetically generated using ChatGPT for natural language processing and machine learning research.

      Dataset Summary
    

    • Language: Bengali (বাংলা)
    • Total Entries: 44,236 synthetic sentences
    • Task: Sentiment Classification
    • Format: JSON
    • Generation Method: OpenAI ChatGPT (GPT-4)
    • License: CC0 1.0 Universal (Public Domain)…

    See the full description on the dataset page: https://huggingface.co/datasets/shaikh25/synthetic-bengali-sentiment.

  9. Motamot: A Dataset for Revealing the Supremacy of Large Language Models over...

    • data.mendeley.com
    Updated May 13, 2024
    Cite
    Fatema Tuj Johora Faria (2024). Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis [Dataset]. http://doi.org/10.17632/hdhnrrwdz2.1
    Dataset updated
    May 13, 2024
    Authors
    Fatema Tuj Johora Faria
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "Motamot" dataset contains 7,058 data points labeled with Positive and Negative sentiments, tailored specifically for political sentiment analysis in the Bengali language. The dataset comprises 4,132 instances labeled as Positive and 2,926 instances labeled as Negative.

    Specifics of the core data: Train 5647, Test 706, Validation 705

    Split        Positive   Negative
    Train        3306       2341
    Test         413        293
    Validation   413        292
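
    The split figures above are internally consistent, which the following short check illustrates. The dictionary layout and variable names are only for illustration; the numbers are taken from the description.

```python
# Per-split label counts as listed in the description.
SPLITS = {
    "train": {"positive": 3306, "negative": 2341},
    "test": {"positive": 413, "negative": 293},
    "validation": {"positive": 413, "negative": 292},
}

# Size of each split and total per label across all splits.
split_sizes = {name: sum(counts.values()) for name, counts in SPLITS.items()}
label_totals = {
    label: sum(counts[label] for counts in SPLITS.values())
    for label in ("positive", "negative")
}
total = sum(split_sizes.values())

print(split_sizes)
print(label_totals)
print(total)
```

    The split sizes work out to roughly an 80/10/10 division of the 7,058 data points.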

  10. Data from: Twitter corpus of Resource-Scarce Languages for Sentiment...

    • figshare.com
    zip
    Updated Jun 12, 2018
    Cite
    Rajat Singh; Nurendra Choudhary (2018). Twitter corpus of Resource-Scarce Languages for Sentiment Analysis and Multilingual Emoji Prediction [Dataset]. http://doi.org/10.6084/m9.figshare.6477782.v6
    Available download formats: zip
    Dataset updated
    Jun 12, 2018
    Dataset provided by
    figshare
    Authors
    Rajat Singh; Nurendra Choudhary
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created by leveraging social media platforms such as Twitter to develop corpora across multiple languages. The corpus creation methodology is applicable to resource-scarce languages, provided the speakers of that language are active users of social media platforms. We present an approach to extract social media microblogs such as tweets (Twitter). We created a corpus for multilingual sentiment analysis and emoji prediction in Hindi, Bengali and Telugu. Further, we perform and analyze multiple NLP tasks using the corpus to obtain interesting observations.

  11. Bengali Fake News Dataset

    • kaggle.com
    Updated Aug 6, 2024
    Cite
    Evil Spirit05 (2024). Bengali Fake News Dataset [Dataset]. https://www.kaggle.com/datasets/evilspirit05/bengali-fake-news-dataset
    Explore at:
    Croissant
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Evil Spirit05
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    This dataset offers a comprehensive collection of Bengali news articles specifically curated for the purpose of fake news detection. The data has been meticulously gathered and processed to aid researchers and practitioners in developing and testing models for distinguishing between real and fake news in the Bengali language.
    

    Key Features:

    • Source: The dataset includes news articles scraped from popular news websites and public APIs. Major sources include reputable Bengali news portals to ensure a diverse range of content.
    • Coverage: The dataset spans from January 2018 to November 2018, providing a rich historical perspective on news trends.

    Attributes:

    • Author’s Name: The individual or organization responsible for the news article.
    • Title: The headline of the news article.
    • Main Body: The full text of the news article.
    • News Date: The publication date of the article.
    • URL: The web address where the article was originally published.
    • Country Name: The country associated with the news source.
    • Source: The original news outlet or media.
    • Word Count: The total number of words in the article.

    Data Processing:

    The dataset underwent extensive cleaning to remove HTML symbols, unusual commas, and other formatting issues. It was then structured into a CSV format for ease of use and analysis. The data is well-suited for training and evaluating machine learning models aimed at fake news detection, text classification, and sentiment analysis.
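
    As a rough sketch of working with a CSV laid out like the attribute list above, the snippet below recomputes the word count of each article and compares it with the stored value. The header names (title, main_body, word_count) are hypothetical, since the actual column names are not given here, and the sample row is invented.

```python
import csv
import io

# Hypothetical header names modeled on the attribute list; the real CSV may differ.
SAMPLE = io.StringIO(
    "title,main_body,word_count\n"
    "Example headline,one two three four,4\n"
)


def check_word_counts(handle):
    """Yield (title, stored_count, recomputed_count) for each CSV row."""
    for row in csv.DictReader(handle):
        # Whitespace splitting is a simple approximation of word counting.
        recomputed = len(row["main_body"].split())
        yield row["title"], int(row["word_count"]), recomputed


rows = list(check_word_counts(SAMPLE))
print(rows)
```

    For the real file, the in-memory sample would be replaced by an opened file handle, for example one created with open(path, newline="", encoding="utf-8").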
    

    Applications:

    • Fake News Detection: Train models to identify fake news articles.
    • Text Classification: Classify news articles based on their content.
    • Sentiment Analysis: Analyze the sentiment of Bengali news articles.

    This dataset is an invaluable resource for researchers, developers, and data scientists working on text classification and fake news detection in Bengali. Its extensive coverage and detailed attributes provide a robust foundation for developing advanced analytical and machine learning models.
    
  12. Bengali Hate Speech

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Sep 21, 2022
    Cite
    Institute of Data Science, National University of Ireland (2022). Bengali Hate Speech [Dataset]. https://opendatalab.com/OpenDataLab/Bengali_Hate_Speech
    Available download formats: zip (2949793 bytes)
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    RWTH Aachen University
    Nanjing University of Science and Technology
    Institute of Data Science, National University of Ireland
    Vrije University Amsterdam
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Introduces three datasets, covering expressions of hate, commonly used topics, and opinions, for hate speech detection, document classification, and sentiment analysis, respectively.

  13. BanglaBlend: A Large-Scale Nobel Dataset of Bangla Sentences Categorized by...

    • data.mendeley.com
    Updated Dec 9, 2024
    + more versions
    Cite
    Umme Ayman ayman (2024). BanglaBlend: A Large-Scale Nobel Dataset of Bangla Sentences Categorized by Saint(Sadhu) and Common(Cholito) Form of Bengali Language [Dataset]. http://doi.org/10.17632/7rx9mk8v4m.3
    Dataset updated
    Dec 9, 2024
    Authors
    Umme Ayman ayman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BanglaBlend dataset is a comprehensive collection of Bangla (Bengali) sentences meticulously categorized into two specific forms: Saint (Sadhu) and Common (Cholito). The dataset comprises a total of 7,350 annotated Bangla sentences and has been preprocessed using several data preprocessing techniques. It is designed to facilitate research and development in natural language processing (NLP) and computational linguistics, particularly for Bangla, a widely spoken language in Bangladesh and parts of India. In particular, this dataset can be used for several natural language processing tasks such as text summarization, text classification, sentiment analysis, and automatic language translation.

  14. SentNoB Dataset

    • paperswithcode.com
    Updated Jan 24, 2024
    Cite
    Khondoker Ittehadul Islam; Sudipta Kar; Md Saiful Islam; Mohammad Ruhul Amin (2024). SentNoB Dataset [Dataset]. https://paperswithcode.com/dataset/sentnob
    Dataset updated
    Jan 24, 2024
    Authors
    Khondoker Ittehadul Islam; Sudipta Kar; Md Saiful Islam; Mohammad Ruhul Amin
    Description

    Social Media User Sentiment Analysis Dataset. Each user comment is labeled as either positive (1), negative (2), or neutral (0).
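
    Because the label ids above do not follow the usual negative/neutral/positive ordering, an explicit mapping helps avoid mix-ups. The sketch below encodes the ids exactly as stated; the variable and function names are hypothetical.

```python
# Label ids as stated in the description: neutral = 0, positive = 1, negative = 2.
ID2LABEL = {0: "neutral", 1: "positive", 2: "negative"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(ids):
    """Convert a sequence of numeric label ids to label names."""
    return [ID2LABEL[i] for i in ids]


print(decode([1, 2, 0]))
```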

  15. Aspect detection for restaurant dataset.

    • plos.figshare.com
    xls
    Updated Sep 20, 2024
    + more versions
    Cite
    Shihab Ahmed; Moythry Manir Samia; Maksuda Haider Sayma; Md. Mohsin Kabir; M. F. Mridha (2024). Aspect detection for restaurant dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0308050.t009
    Available download formats: xls
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Shihab Ahmed; Moythry Manir Samia; Maksuda Haider Sayma; Md. Mohsin Kabir; M. F. Mridha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, the surge in reviews and comments on newspapers and social media has made sentiment analysis a focal point of interest for researchers. Sentiment analysis is also gaining popularity in the Bengali language. However, Aspect-Based Sentiment Analysis is considered a difficult task in the Bengali language due to the shortage of perfectly labeled datasets and the complex variations in the Bengali language. This study used two open-source benchmark datasets of the Bengali language, Cricket, and Restaurant, for our Aspect-Based Sentiment Analysis task. The original work was based on the Random Forest, Support Vector Machine, K-Nearest Neighbors, and Convolutional Neural Network models. In this work, we used the Bidirectional Encoder Representations from Transformers, the Robustly Optimized BERT Approach, and our proposed hybrid transformative Random Forest and Bidirectional Encoder Representations from Transformers (tRF-BERT) models to compare the results with the existing work. After comparing the results, we can clearly see that all the models used in our work achieved better results than any of the previous works on the same dataset. Amongst them, our proposed transformative Random Forest and Bidirectional Encoder Representations from Transformers achieved the highest F1 score and accuracy. The accuracy and F1 score of aspect detection for the Cricket dataset were 0.89 and 0.85, respectively, and for the Restaurant dataset were 0.92 and 0.89 respectively.

  16. BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis

    • data.mendeley.com
    Updated Nov 20, 2020
    + more versions
    Cite
    Md Ataur Rahman (2020). BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis [Dataset]. http://doi.org/10.17632/24xd7w7dhp.1
    Dataset updated
    Nov 20, 2020
    Authors
    Md Ataur Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a manually annotated Bangla Emotion corpus, which incorporates the diversity of fine-grained emotion expressions in social-media text. We considered fine-grained emotion labels such as Sadness, Happiness, Disgust, Surprise, Fear and Anger, which are, according to Paul Ekman (1999), the six basic emotion categories. For this task, we collected a large amount of raw text data from users' comments on two different Facebook groups (Ekattor TV and Airport Magistrates) and from the public posts of a popular blogger and activist, Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and to the economic success and failure of Bangladesh. We scraped a total of 32,923 comments from the three aforementioned sources. Out of these, a total of 6,314 comments were annotated into the six categories. The distribution of the annotated corpus is as follows:

    sad = 1341, happy = 1908, disgust = 703, surprise = 562, fear = 384, angry = 1416

    We have also provided a balanced set from the above data and split the dataset into training and test sets of equal ratio. We considered a proportion of 5:1 for training and evaluation purposes. More information on the dataset and the experiments on it can be found in our paper (related links below).
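
    The class distribution above is noticeably uneven, which the short sketch below quantifies. The counts come from the description; the inverse-frequency class weights are a common heuristic added here for illustration and are not something the authors describe.

```python
# Annotated comment counts per emotion, as listed in the description.
COUNTS = {
    "sad": 1341,
    "happy": 1908,
    "disgust": 703,
    "surprise": 562,
    "fear": 384,
    "angry": 1416,
}

total = sum(COUNTS.values())
shares = {label: n / total for label, n in COUNTS.items()}

# Inverse-frequency weights: rarer classes receive larger weights (assumed heuristic).
weights = {label: total / (len(COUNTS) * n) for label, n in COUNTS.items()}

print(total)
for label in COUNTS:
    print(f"{label}: share={shares[label]:.3f} weight={weights[label]:.2f}")
```

    The counts sum to the 6,314 annotated comments reported above, with "happy" the largest class and "fear" the smallest.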

  17. BanglaSER: Bangla Audio for Emotion Recognition

    • kaggle.com
    Updated Aug 27, 2024
    + more versions
    Cite
    Evil Spirit05 (2024). BanglaSER: Bangla Audio for Emotion Recognition [Dataset]. https://www.kaggle.com/datasets/evilspirit05/emotion
    Explore at:
    Croissant
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    Kaggle
    Authors
    Evil Spirit05
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    BanglaSER is a specialized dataset designed for the task of Bangla speech emotion recognition. This dataset includes a rich collection of speech-audio recordings that capture a variety of fundamental human emotions. It is curated to support research and development in the field of speech emotion recognition, particularly for the Bangla language, and is suitable for various deep learning architectures.
    

    Dataset Composition:

    • Total Number of Recordings: 1,467
    • Number of Speakers: 34 (17 male and 17 female)
    • Age Range of Speakers: 19 to 47 years
    • Recording Devices: Smartphones and laptops

    Emotional States Covered:

    • Angry
    • Happy
    • Neutral
    • Sad
    • Surprise

    Recording Structure:

    Each emotional state is represented by:

    • 3 Statements spoken three times by each participant.
    • For Angry, Happy, Sad, and Surprise: 3 statements × 3 repetitions × 34 speakers × 4 emotions = 1,224 recordings.
    • For Neutral: 3 statements × 3 repetitions × 27 speakers = 243 recordings.
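
    The recording counts above can be checked with a few lines of arithmetic. The sketch reads the 1,224 figure as covering all four non-neutral emotions together, which is the reading that reproduces the stated total of 1,467; the variable names are illustrative.

```python
STATEMENTS = 3
REPETITIONS = 3

# Recordings contributed by one speaker for one emotional state.
per_speaker = STATEMENTS * REPETITIONS

# Angry, Happy, Sad, and Surprise were recorded by all 34 speakers.
four_emotions = 4 * per_speaker * 34

# Neutral was recorded by 27 speakers.
neutral = per_speaker * 27

total = four_emotions + neutral
print(four_emotions, neutral, total)
```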

    Key Features:

    Balanced Representation:

    • The dataset is carefully balanced with an equal number of male and female participants, ensuring that the recordings reflect diverse voices and emotional expressions.
    • Emotions are evenly distributed across the dataset, providing a robust basis for training and evaluating emotion recognition models.

    Realistic Recording Conditions:

    • Recordings are made using commonly available devices, such as smartphones and laptops, which helps in preserving the naturalistic quality of the audio.
    • The dataset reflects real-life acoustic environments, making it more applicable to real-world applications.

    Deep Learning Compatibility:

    • BanglaSER is designed to be compatible with various deep learning architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory Networks (LSTMs), and Bidirectional LSTMs (BiLSTMs).
    • The dataset can be used for a range of tasks, from emotion classification to sentiment analysis, and more.

    Usage and Applications:

    • Emotion Recognition Models: BanglaSER provides a diverse set of recordings that are ideal for training models to recognize and classify emotions in Bangla speech.
    • Benchmarking and Evaluation: The dataset serves as a benchmark for evaluating the performance of emotion recognition systems and can help in comparing different model architectures and techniques.
    • Research and Development: Researchers can use BanglaSER to explore new methods in speech emotion recognition, develop novel algorithms, and enhance the understanding of emotion in speech.

    Dataset Access:

    Download Link: https://data.mendeley.com/datasets/t9h6p943xy/5

    • Documentation: Detailed documentation and guidelines for using the dataset are provided to assist users in effectively leveraging the data.

    Acknowledgments:

    We extend our gratitude to the contributors and participants who made this dataset possible. Their efforts have greatly enriched the field of speech emotion recognition and provided valuable resources for the community.
    
    Feel free to explore the dataset and utilize it in your research and projects. We look forward to seeing the innovative applications and advancements that will emerge from the use of BanglaSER
    
  18. BanglaBook

    • huggingface.co
    Updated Jul 17, 2024
    + more versions
    Cite
    Syed Rifat Raiyan (2024). BanglaBook [Dataset]. https://huggingface.co/datasets/Starscream-11813/BanglaBook
    Explore at:
    Croissant
    Dataset updated
    Jul 17, 2024
    Authors
    Syed Rifat Raiyan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews

    This repository contains the code, data, and models of the paper titled "BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews" published in the Findings of the Association for Computational Linguistics: ACL 2023.

    License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

      Data Format
    

    Each row consists of a book review sample. The… See the full description on the dataset page: https://huggingface.co/datasets/Starscream-11813/BanglaBook.

  19. Bengali_Cyberbullying_Detection_Comments_Dataset

    • huggingface.co
    Updated Oct 3, 2024
    + more versions
    Cite
    Faisal Ahmed (2024). Bengali_Cyberbullying_Detection_Comments_Dataset [Dataset]. https://huggingface.co/datasets/faisalahmed/Bengali_Cyberbullying_Detection_Comments_Dataset
    Dataset updated
    Oct 3, 2024
    Authors
    Faisal Ahmed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 44,001 Bengali comments, curated to detect cyberbullying using Natural Language Processing (NLP) techniques. Each comment is labeled by experts, categorizing different forms of harassment and offensive behavior. The dataset enables the identification of inappropriate content, ranging from mild to severe harassment, facilitating precise classification and analysis. This resource is designed for researchers and developers working on cyberbullying detection, sentiment… See the full description on the dataset page: https://huggingface.co/datasets/faisalahmed/Bengali_Cyberbullying_Detection_Comments_Dataset.

  20. Bangla Speech Personality Traits Data

    • data.mendeley.com
    Updated Apr 17, 2024
    Cite
    Md. Sajeebul Islam Sk. (2024). Bangla Speech Personality Traits Data [Dataset]. http://doi.org/10.17632/fb6dm3yb6m.1
    Dataset updated
    Apr 17, 2024
    Authors
    Md. Sajeebul Islam Sk.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The key characteristics of the collected data are summarized below:

    • Data type: Audio
    • Data size: 1750 (speech-text-phase), 1000 (speech-base-phase)
    • Linguistic diversity: Short text
    • Audio capturing quality: 44.1 kHz, mono

    We created a new personality traits dataset for our research work because there is a noticeable absence of datasets for automatically assessing personality from Bangla speech. This data, processed with machine learning models, demonstrated that different personalities produce varying magnitudes at different frequencies, exhibiting distinct patterns.
