98 datasets found
  1. Large Sentiment Analysis Bangla Dataset

    • kaggle.com
    zip
    Updated Jul 27, 2024
    Cite
    Mir Tahmid (2024). Large Sentiment Analysis Bangla Dataset [Dataset]. https://www.kaggle.com/datasets/tahmidmir/largesentimentdata
    Explore at:
    Available download formats: zip (275593 bytes)
    Dataset updated
    Jul 27, 2024
    Authors
    Mir Tahmid
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a large-scale Bangla dataset of positive, negative, and neutral comments. It has four features: platform (the source where the comments were collected), comment, sentiment, and label.

    The four columns are Platform, Comment, Sentiment, and Label. The Bangla comments were collected from Twitter, YouTube, and Google. Each comment is positive, negative, or neutral. The sentiment categories are toxic, neutral, sad, funny, and happy, and they are labeled 0, 1, 2, 3, and 4.

  2. Large Bangla Sentiment Dataset

    • kaggle.com
    zip
    Updated Jun 6, 2024
    Cite
    Parthaa_ghosh (2024). Large Bangla Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/parthaaghosh/large-bangla-sentiment-dataset
    Explore at:
    Available download formats: zip (14689290 bytes)
    Dataset updated
    Jun 6, 2024
    Authors
    Parthaa_ghosh
    Description

    This is a dataset for Bengali sentiment analysis, merged from publicly available sentiment datasets. The sources used to build the merged Bangla sentiment dataset are:

    1) https://www.kaggle.com/datasets/cryptexcode/sentnob-sentiment-analysis-in-noisy-bangla-texts
    2) https://github.com/atik-05/Bangla_ABSA_Datasets/tree/master
    3) https://data.mendeley.com/datasets/n53xt69gnf/3
    4) https://github.com/shakkhor/Academic-Thesis/blob/master/450/comments.csv
    5) https://github.com/mohsinulkabir14/BanglaBook/tree/main/data/csv

    After merging, I applied some cleaning and preprocessing. The dataset has 2 columns: "Data" and "Label". There are 3 sentiment labels:

    1) Neutral: 0
    2) Positive: 1
    3) Negative: 2
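
    As a minimal sketch of how the label codes described above could be decoded, the Python snippet below maps the integer codes to their names. The sample rows and the helper name decode_labels are hypothetical and are not taken from the dataset itself.

```python
# Label codes as stated in the dataset description.
LABEL_NAMES = {0: "Neutral", 1: "Positive", 2: "Negative"}


def decode_labels(rows):
    """Map (text, code) pairs to (text, label name) pairs.

    Raises KeyError if a code is not one of the three documented labels.
    """
    return [(text, LABEL_NAMES[code]) for text, code in rows]


# Hypothetical rows standing in for the "Data" and "Label" columns.
sample = [("ভালো লেগেছে", 1), ("ঠিক আছে", 0), ("খুব খারাপ", 2)]
decoded = decode_labels(sample)
```

    Failing loudly on an unknown code is a deliberate choice here: a merged dataset may carry stray labels from one of its sources, and a KeyError surfaces them early.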

  3. RevBangla: Bangla Product Sentiment Analysis Dataset

    • data.mendeley.com
    Updated Mar 6, 2024
    Cite
    Saieef Sarower Sunny (2024). RevBangla: Bangla Product Sentiment Analysis Dataset [Dataset]. http://doi.org/10.17632/bnbbcdsf4m.1
    Explore at:
    Dataset updated
    Mar 6, 2024
    Authors
    Saieef Sarower Sunny
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Bangla Product Comments Dataset is a comprehensive collection of product reviews gathered from diverse ecommerce platforms in Bangladesh. This dataset offers a rich source of information reflecting customer opinions and sentiments towards various products available online. This dataset holds significant value for businesses, researchers, and data scientists interested in understanding consumer behavior, product perception, and sentiment analysis within the Bangladeshi ecommerce landscape. By leveraging this dataset, stakeholders can derive actionable insights to enhance product quality, marketing strategies, and overall customer satisfaction.

    Columns:

    1. Product_ID: A unique identifier for each product, facilitating organization and referencing.
    2. Date: The date when the comment was posted, providing temporal context for analysis.
    3. Customer Name: The name or identifier of the customer who submitted the comment, ensuring traceability and potential user segmentation.
    4. Rating: A numerical representation (typically on a scale of 1 to 5) reflecting the customer's overall satisfaction level with the product.
    5. Label Sentiment: A categorical label assigned to each comment indicating the sentiment expressed by the customer (e.g., positive, negative). This classification facilitates sentiment analysis tasks.
    6. Comment: The actual text of the customer's review or comment, conveying specific opinions, feedback, or experiences regarding the product.

  4. MONOBHAV: A Large-Scale Bengali Dataset for Fine-Grained Sentiment Analysis

    • data.mendeley.com
    Updated Jan 27, 2026
    Cite
    Meherunnesa Hossain Ibnath (2026). MONOBHAV: A Large-Scale Bengali Dataset for Fine-Grained Sentiment Analysis [Dataset]. http://doi.org/10.17632/968kvv98m4.3
    Explore at:
    Dataset updated
    Jan 27, 2026
    Authors
    Meherunnesa Hossain Ibnath
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MONOBHAV is a Bengali dataset for fine-grained sentiment analysis. It contains 10,000 Bengali texts collected from social media platforms and newspaper websites. Each text is manually annotated by native Bengali speakers into five sentiment classes - Strongly Negative, Negative, Neutral, Positive, and Strongly Positive. This dataset enhances the resources available for Bengali sentiment analysis and supports the development and evaluation of more accurate sentiment models for the language.

  5. SentiFive: A Multi-Class Bengali Dataset

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    Amer Mahbub (2025). SentiFive: A Multi-Class Bengali Dataset [Dataset]. https://www.kaggle.com/datasets/amermahbub01/sentifive
    Explore at:
    Available download formats: zip (1566621 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    Amer Mahbub
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    SentiFive is a multi-class Bengali sentiment analysis dataset consisting of 31,411 YouTube comments, manually annotated into five sentiment categories: Strongly Negative, Weakly Negative, Neutral, Weakly Positive, and Strongly Positive. The dataset is designed to support research in fine-grained sentiment classification and low-resource language processing.

    Unlike previous Bengali sentiment datasets that focus on binary or ternary sentiment, SentiFive enables more nuanced modeling of user opinions expressed in informal social media contexts. Data were collected from a diverse set of YouTube videos, covering topics such as news, entertainment, and politics.

    Rahman, M. A., Mahbub, A., Paul, B. N., Bhattacharjee, P., & Ashik, M. A.-Z. (2025). SentiFive: A Multi-Class Bengali Dataset for Sentiment Analysis. Accepted for presentation at the IEEE 7th International Conference on Sustainable Technologies for Industry 5.0 (STI 2025), Dhaka, Bangladesh, December 11–12, 2025.

  6. Data Set For Sentiment Analysis On Bengali News Comments

    • narcis.nl
    • data.mendeley.com
    Updated Sep 15, 2019
    Cite
    Chowdhury, M (via Mendeley Data) (2019). Data Set For Sentiment Analysis On Bengali News Comments [Dataset]. http://doi.org/10.17632/n53xt69gnf.2
    Explore at:
    Dataset updated
    Sep 15, 2019
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Chowdhury, M (via Mendeley Data)
    Description

    This is a data set for sentiment analysis on Bangla news comments. Every item was annotated by three different individuals to get three different perspectives, and the final tag was chosen by majority decision. The data set contains 13802 items in total.
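
    The majority-decision step described above can be sketched in Python as follows. This is an illustration only: the tag names are hypothetical, and returning None when all three annotators disagree is an assumption, since the description does not say how such cases were resolved.

```python
from collections import Counter


def majority_tag(tags):
    """Return the tag chosen by a strict majority of annotators, else None."""
    tag, votes = Counter(tags).most_common(1)[0]
    return tag if votes > len(tags) / 2 else None


# Hypothetical tags from three annotators for two comments.
agreed = majority_tag(["positive", "negative", "positive"])
split = majority_tag(["positive", "negative", "neutral"])
```

    With three annotators, a strict majority means at least two matching tags, so the only undecided case is a three-way split.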

  7. EBLICT Bangla Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Nov 30, 2021
    Cite
    Sushmit (2021). EBLICT Bangla Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/sushmit0109/eblict-bangla-sentiment-analysis-dataset
    Explore at:
    Available download formats: zip (18422037 bytes)
    Dataset updated
    Nov 30, 2021
    Authors
    Sushmit
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Dataset

    This dataset was created by Sushmit

    Released under GPL 2

    Contents

  8. BANGLA-ABSA: Unique Aspect Based Sentiment Analysis datasets in Bangla...

    • data.mendeley.com
    Updated Jul 9, 2024
    Cite
    Mahmudul Hasan (2024). BANGLA-ABSA: Unique Aspect Based Sentiment Analysis datasets in Bangla Language [Dataset]. http://doi.org/10.17632/998m4jy3m9.3
    Explore at:
    Dataset updated
    Jul 9, 2024
    Authors
    Mahmudul Hasan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the Bangla language, sentiment analysis is becoming more and more significant. Aspect-based sentiment analysis (ABSA) predicts the sentiment polarity on an aspect level. The data were collected from numerous individuals with a minimum of two aspects. Every comment is a complex or compound sentence. The datasets are organized in a folder named "BANGLA_ABSA dataset" which has four Excel files, one for each of the datasets: Car_ABSA, Mobile_phone_ABSA, Movie_ABSA, and Restaurant_ABSA. Each Excel file contains three columns namely Id, Comment, and {Aspect category, Sentiment Polarity}. Car_ABSA, Mobile_phone_ABSA, Movie_ABSA, and Restaurant_ABSA datasets have 1149, 975, 800, and 801 rows of data respectively.

  9. Bengali Political Sentiment Analysis Dataset

    • data.mendeley.com
    Updated Oct 2, 2025
    Cite
    Adib Mahmud (2025). Bengali Political Sentiment Analysis Dataset [Dataset]. http://doi.org/10.17632/x5yc4m5yg2.2
    Explore at:
    Dataset updated
    Oct 2, 2025
    Authors
    Adib Mahmud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises 3,290 Bengali political comments sourced from social media platforms, news comment sections, and online political discussions, specifically curated for sentiment analysis research in Bengali NLP. The corpus provides a comprehensive resource for training and evaluating sentiment classification models within the political domain.

    The dataset features 3,290 instances distributed across five sentiment classes with excellent balance (variance <8%): Very Negative (675, 20.5%), Negative (663, 20.2%), Neutral (626, 19.0%), Very Positive (664, 20.2%), and Positive (662, 20.1%). Stored in Excel format with two columns containing Bengali political comments (Unicode text) and corresponding sentiment labels, the dataset maintains high quality with no missing values and verified annotations. Comment lengths average 83 characters, ranging from 11 to 398 characters.

    The collection encompasses diverse political discourse including government policies and governance, electoral processes and democracy, political parties and leadership dynamics, social and economic issues, current affairs and political events, along with public opinion and citizen responses to political developments.

    This dataset serves multiple research purposes, including Bengali sentiment analysis model development and benchmarking, political discourse analysis and opinion mining, natural language processing research for low-resource languages, cross-lingual sentiment analysis studies, social media analytics for Bengali content, multi-class text classification research, and comparative political sentiment studies across different linguistic and cultural contexts.
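
    The class counts quoted above can be checked with a short Python calculation. The sketch below recomputes the total and the percentage shares. The final "spread" figure is only one possible reading of the stated "variance <8%" (largest class relative to smallest) and is an assumption, not the author's definition.

```python
# Class counts as quoted in the description.
counts = {
    "Very Negative": 675,
    "Negative": 663,
    "Neutral": 626,
    "Very Positive": 664,
    "Positive": 662,
}

total = sum(counts.values())

# Percentage share of each class, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Assumed balance measure: gap between the largest and smallest class,
# relative to the smallest class, in percent.
spread = 100 * (max(counts.values()) - min(counts.values())) / min(counts.values())
```

    Under these assumptions the counts add up to the stated 3,290 instances and the rounded shares match the quoted percentages.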

  10. Bangla (Bengali) Drama Review Dataset

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    salim sazzed (2023). Bangla (Bengali) Drama Review Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.13162085.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    salim sazzed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The repository contains 3307 negative reviews and 8500 positive reviews, collected from YouTube Bengali drama and manually annotated.

    If you use this dataset, please cite the following paper:

    @inproceedings{sazzed2020cross, title={Cross-lingual sentiment classification in low-resource Bengali language}, author={Sazzed, Salim}, booktitle={Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)}, pages={50--60}, year={2020} }

    If you have any questions, please email me: salimsazzad222@gmail.com.

  11. A Dataset for Sentiment Polarity Detection of Bengali Book Reviews

    • data.mendeley.com
    Updated Aug 8, 2020
    Cite
    Eftekhar Hossain (2020). A Dataset for Sentiment Polarity Detection of Bengali Book Reviews [Dataset]. http://doi.org/10.17632/2wcw3sxxr3.1
    Explore at:
    Dataset updated
    Aug 8, 2020
    Authors
    Eftekhar Hossain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 1443 Bangla book reviews. Of these, 471 reviews are annotated as negative sentiment and 972 are labelled as positive sentiment. All the reviews were collected from different online book shops and social media groups and were manually annotated by two native Bengali speakers. Although the dataset is relatively small, it can be used for learning as well as research purposes.

  12. Bangla Bengali sentiment lexicon dictionary with positive and negative words...

    • narcis.nl
    • data.mendeley.com
    Updated Mar 9, 2021
    Cite
    Sazzed, S (via Mendeley Data) (2021). Bangla Bengali sentiment lexicon dictionary with positive and negative words [Dataset]. http://doi.org/10.17632/zggnjpnmwp.2
    Explore at:
    Dataset updated
    Mar 9, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Sazzed, S (via Mendeley Data)
    Description

    This dataset contains around 1300 positive and negative Bengali (Bangla) sentiment words. The lexicon was created from a Bengali review corpus.

    If you use this lexicon, please cite the following paper:

    @inproceedings{sazzed2020development, title={Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources}, author={Sazzed, Salim}, booktitle={2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)}, pages={237--244}, year={2020}, organization={IEEE Computer Society} }

    https://www.cs.odu.edu/~ssazzed/IEEE_IRI_2020.pdf

  13. all_results.json

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    json
    Updated Apr 18, 2025
    Cite
    Sadia Sharmin (2025). all_results.json [Dataset]. http://doi.org/10.6084/m9.figshare.28824797.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Apr 18, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Sadia Sharmin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As a result of the technological advancements of the internet, Bangladeshi users are increasingly active on social networks. In this sense, social media influencers are becoming more well-known and attracting a growing number of users. Bangladeshi food review influencers are becoming more and more well-known every day. The most sophisticated Bengali sequence classification model was used in this study's analysis of social network interaction data. Through an extensive exploration of the social media landscape, we delve into the realm of food reviews. We used the sequence classification model to classify the comments collected from social media for our study. Our findings reveal that the majority of viewers hold a positive perception of Bengali food reviews on social media, while a small number of outliers may express contrasting opinions. Notably, our classifier, BanglaBERT, achieves an impressive prediction accuracy of 83.76%, emphasizing the reliability and effectiveness of our approach.

  14. Bengali Text Classification Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2025
    Cite
    Md. Rasel Meya (2025). Bengali Text Classification Dataset [Dataset]. https://www.kaggle.com/datasets/raselmeya/bengali-text-classification-dataset
    Explore at:
    Available download formats: zip (2992074 bytes)
    Dataset updated
    Jan 27, 2025
    Authors
    Md. Rasel Meya
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Contents

    This dataset is a consolidated collection of four public Bengali text datasets curated for sentiment analysis, toxic comment classification, and Bengali news classification. It consists of Bengali text comments annotated with multiple categories, covering a wide range of sentiment and content-based labels. Aimed at advancing research in Bengali language processing, this dataset is particularly suited for tasks like sentiment analysis, hate speech detection, and contextual comment categorization.

    Categories and Labels

    The dataset spans over 23 distinct categories, including:

    • Sentiment Labels: Positive, Very Positive, Negative, Very Negative, Neutral
    • Toxicity Labels: Hate, Vulgar, Insult, Threat, Troll
    • Content Themes: Political, Religious, National, International, Sports, Entertainment, Lifestyle, Crime, Editorial, Education, Finance, Technology, Miscellaneous

    Label Distribution

    The dataset contains approximately 56,219 entries distributed across the following categories:

    শ্রেণী (Category) | সংখ্যা (Count)
    নিরপেক্ষ (Neutral) | 10,536
    ইতিবাচক (Positive) | 9,945
    নেতিবাচক (Negative) | 6,505
    জাতীয় (National) | 5,321
    খুব নেতিবাচক (Very Negative) | 3,928
    অশ্লীল (Vulgar) | 2,505
    খুব ইতিবাচক (Very Positive) | 2,280
    আন্তর্জাতিক (International) | 1,898
    ঘৃণা (Hate) | 1,894
    ক্রীড়া (Sports) | 1,858
    ট্রল (Troll) | 1,389
    বিবিধ (Miscellaneous) | 1,236
    ধর্মীয় (Religious) | 1,188
    সম্পাদকীয় (Editorial) | 1,021
    হুমকি (Threat) | 916
    রাজনীতি (Politics) | 879
    বিনোদন (Entertainment) | 788
    অপমান (Insult) | 596
    লাইফস্টাইল (Lifestyle) | 342
    অপরাধ (Crime) | 335
    শিক্ষা (Education) | 308
    অর্থ (Finance) | 300
    প্রযুক্তি (Technology) | 251
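
    As a quick consistency check on the distribution above, the Python sketch below sums the per-category counts and groups them under the three label families named in "Categories and Labels". The grouping and variable names are illustrative assumptions. The counts themselves are copied from the table.

```python
# Counts copied from the label distribution table, grouped by label family.
sentiment = {
    "Neutral": 10536, "Positive": 9945, "Negative": 6505,
    "Very Negative": 3928, "Very Positive": 2280,
}
toxicity = {
    "Vulgar": 2505, "Hate": 1894, "Troll": 1389, "Threat": 916, "Insult": 596,
}
content = {
    "National": 5321, "International": 1898, "Sports": 1858,
    "Miscellaneous": 1236, "Religious": 1188, "Editorial": 1021,
    "Politics": 879, "Entertainment": 788, "Lifestyle": 342,
    "Crime": 335, "Education": 308, "Finance": 300, "Technology": 251,
}

# Per-family subtotals, then the overall number of categories and entries.
group_totals = {
    "sentiment": sum(sentiment.values()),
    "toxicity": sum(toxicity.values()),
    "content": sum(content.values()),
}
num_categories = len(sentiment) + len(toxicity) + len(content)
total_entries = sum(group_totals.values())
```

    Summed this way, the table covers 23 categories and 56,219 entries, which agrees with the figures quoted in the description.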

    Future Work and Research Topics

    1. Multi-Label Classification for Toxicity Detection in Bengali

      • Toxicity in online content often features overlapping categories (e.g., hate and insult). Developing a robust multi-label classification model to accurately differentiate among these types could improve automated content moderation for Bengali users. Future work could also explore hierarchical labeling strategies to handle nested toxicity types.
    2. Transfer Learning and Domain Adaptation for Bengali Language Processing

      • Given the limited annotated resources in Bengali, leveraging transfer learning from large multilingual models (e.g., mBERT, XLM-R) could enhance performance. Exploring domain adaptation (such as fine-tuning for specific domains like news or social media comments) may yield insights into the domain-specific nuances in sentiment and toxicity classification.
    3. Sentiment Analysis in Code-Mixed Language

      • Bengali users often incorporate English words or phrases, creating “code-mixed” content. This dataset could be extended to support research in Bengali-English code-mixed sentiment analysis, enabling models to effectively capture sentiment across mixed-language data—especially relevant for social media and online forums.
    4. Explainable Toxicity and Sentiment Detection Models

      • Explainable models can be valuable for toxicity classification, where transparency in predictions is critical. By integrating attention mechanisms or explainability layers, researchers can develop models that highlight key words or phrases influencing each label, helping contextualize toxicity in Bengali content.
    5. Time-Based Sentiment Analysis for Socio-Political Events

      • With categories like politics and national affairs, this dataset could support temporal sentiment analysis, especially around key events (e.g., elections, sports events). Time-series models could be developed to observe sentiment shifts and toxic comment trends in response to socio-political developments.
    6. Low-Resource Language Model Development and Benchmarking

      • Future research could focus on developing Bengali-specific language models trained entirely on Bengali data, benchmarking architectures like BERT, LSTM, and Transformer models to identify which best serve Bengali sentiment and toxicity analysis. This work would contribute significantly to resources for low-resource language processing.

    This dataset serves as a valuable asset for advancing NLP research in Bengali, supporting applications such as social media moderation, sentiment-based recommendation systems, public sentiment analysis, and automated hate speech regulation tools.

    Dataset Acknowledgments

    This dataset was cr...

  15. Bengali Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Sep 27, 2025
    Cite
    Rhs Liza (2025). Bengali Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/rhsliza/bengali-sentiment-analysis-dataset
    Explore at:
    Available download formats: zip (16210 bytes)
    Dataset updated
    Sep 27, 2025
    Authors
    Rhs Liza
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Rhs Liza

    Released under CC0: Public Domain

    Contents

  16. Bengali Identity Bias Evaluation Dataset (BIBED)

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +2more
    Updated Aug 7, 2023
    Cite
    Das, Dipto; Guha, Shion; Semaan, Bryan (2023). Bengali Identity Bias Evaluation Dataset (BIBED) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7775520
    Explore at:
    Dataset updated
    Aug 7, 2023
    Dataset provided by
    University of Colorado Boulder
    University of Toronto
    Authors
    Das, Dipto; Guha, Shion; Semaan, Bryan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Critical studies have found NLP systems to be biased with respect to gender and racial identities. However, few studies have focused on identities defined by cultural factors like religion and nationality. Compared to English, such research efforts are even more limited in major languages like Bengali due to the unavailability of labeled datasets. Our paper (see the reference) describes a process for developing a bias evaluation dataset highlighting cultural influences on identity. We also provide this Bengali dataset as an artifact outcome that can contribute to future critical research.

    If you find this dataset useful, please cite the associated paper:

    Das, D., Guha, S., & Semaan, B. (2023, May). Toward Cultural Bias Evaluation Datasets: The Case of Bengali Gender, Religious, and National Identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP) (pp. 68-83).

    BibTeX:

    @inproceedings{das-etal-2023-toward, title = "Toward Cultural Bias Evaluation Datasets: The Case of {B}engali Gender, Religious, and National Identity", author = "Das, Dipto and Guha, Shion and Semaan, Bryan", booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)", month = may, year = "2023", address = "Dubrovnik, Croatia", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.c3nlp-1.8", pages = "68--83", }

  17. A Multimodal Bangla Text–Audio Dataset for Sentiment Analysis

    • data.mendeley.com
    Updated Dec 15, 2025
    Cite
    Md. Darun Nayeem (2025). A Multimodal Bangla Text–Audio Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.17632/5yb4jjzrx3.1
    Explore at:
    Dataset updated
    Dec 15, 2025
    Authors
    Md. Darun Nayeem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    • Bangla, a language spoken by more than 230 million people worldwide, is significantly underrepresented in speech and sentiment analysis research when compared to high-resource languages.
    • This is addressed with the dataset. Researchers and developers working on low-resource language technologies, such as sentiment analysis, speech recognition, and multimodal learning frameworks, should find this extensive resource very helpful.
    • Sentiment-aware speech recognition, speech-based emotion detection, emotionally expressive text-to-speech systems, multimodal sentiment classification, and speaker-independent recognition models are just a few of the many applications that can be developed and evaluated using this dataset.
    • Its modular structure promotes continuous research expansion by enabling contributors to add new regional vocabularies, dialectal variations, or additional sentiment classes over time.
    • The dataset is precisely balanced, with 4,000 audio recordings created by four native speakers (two male and two female) and 500 samples for each sentiment category. The sentences capture the natural and everyday use of the Bangla language, spanning a wide range of topics that include events, emotions, personal experiences, and general statements.

  18. Bengali YouTube News Opinion Data for Temporal Sentiment Analysis

    • data.mendeley.com
    Updated Oct 5, 2023
    Cite
    Lomat Haider Chowdhury (2023). Bengali YouTube News Opinion Data for Temporal Sentiment Analysis [Dataset]. http://doi.org/10.17632/3c3j3bkxvn.1
    Explore at:
    Dataset updated
    Oct 5, 2023
    Authors
    Lomat Haider Chowdhury
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    The dataset presents the news articles published on a renowned Bengali YouTube news channel along with the public comments, replies, and other corresponding information. There are 762,678 samples with 15 features. The features include video URL, title of the news, likes on the video, video views, publishing date, hashtags, video description, comments with corresponding likes, and replies with likes. To protect the privacy of the commenters, their names have been encoded.

  19. RBE_Sent

    • huggingface.co
    Updated Apr 20, 2025
    Cite
    Dalia Barua (2025). RBE_Sent [Dataset]. https://huggingface.co/datasets/DaliaBarua/RBE_Sent
    Explore at:
    Dataset updated
    Apr 20, 2025
    Authors
    Dalia Barua
    Description

    RBE_Sent Dataset Description: The RBE_Sent (Roman Bengali-English Sentiment) dataset is a synthetic, gold-standard code-mixed dataset developed for sentiment analysis tasks involving Romanized Bengali and English. It captures real-world bilingual usage by blending Roman Bengali with English tokens within the same textual instances. The dataset is designed to support research in multilingual natural language processing, especially in the context of low-resource, code-mixed languages. Each… See the full description on the dataset page: https://huggingface.co/datasets/DaliaBarua/RBE_Sent.

  20. Bengali Sentiment Dataset

    • kaggle.com
    zip
    Updated Jul 25, 2020
    Cite
    Nuhash Afnan (2020). Bengali Sentiment Dataset [Dataset]. https://www.kaggle.com/nuhashafnan/pseudolabel
    Explore at:
    Available download formats: zip (631287 bytes)
    Dataset updated
    Jul 25, 2020
    Authors
    Nuhash Afnan
    Description

    Dataset

    This dataset was created by Nuhash Afnan

    Contents
