93 datasets found
  1. oyo-reviews-dataset

    • kaggle.com
    zip
    Updated Jun 24, 2023
    Cite
    Deepkumar patel (2023). oyo-reviews-dataset [Dataset]. https://www.kaggle.com/datasets/deeppatel9095/oyo-reviews-dataset
    Explore at:
    Available download formats: zip (32300432 bytes)
    Dataset updated
    Jun 24, 2023
    Authors
    Deepkumar patel
    Description

    The inspiration behind creating the OYO Review Dataset for sentiment analysis was to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can provide valuable insights into the overall satisfaction of guests, identify areas for improvement, and assist in making data-driven decisions to enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil aimed to contribute to the field of sentiment analysis in the context of the hospitality industry. Sentiment analysis allows us to classify the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This analysis can help hotel management and stakeholders understand customer sentiments, identify common patterns, and address concerns or issues that may affect the reputation and customer satisfaction of OYO Hotels. The dataset provides a valuable resource for training and evaluating sentiment analysis models specifically tailored to the hospitality domain. Researchers, data scientists, and practitioners can utilize this dataset to develop and test various machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models. Overall, the goal of creating the OYO Review Dataset for sentiment analysis was to facilitate research and analysis in the area of customer sentiments and opinions in the hotel industry. By understanding the sentiment of hotel reviews, businesses can strive to improve their services, enhance customer satisfaction, and make data-driven decisions to elevate the overall guest experience.

    Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/
    Nikki Patel: https://www.linkedin.com/in/nikipatel9/
    Nimil lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/

  2. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Campos Arias (repository creator)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets that were used in some of the studies that served as research material for this Master's thesis. The datasets used in the experimental part of this work are also included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains the ratings and reviews of more than 1,000 Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
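    Because the rows are ordered by class, any split taken before shuffling could end up with a single label. The following is a minimal sketch of a reproducible shuffle using only the Python standard library. It assumes a header row with the column names reviews and labels as described above; the helper name read_shuffled, the seed, and the in-memory sample standing in for data_rt.csv are illustrative, not part of the dataset.

```python
import csv
import io
import random


def read_shuffled(csv_file, seed=42):
    """Read a CSV with a header row and return its rows in random order."""
    rows = list(csv.DictReader(csv_file))
    # A fixed seed keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows


# Tiny stand-in for data_rt.csv: negatives (0) first, then positives (1).
sample_text = (
    "reviews,labels\n"
    + "".join(f"bad {i},0\n" for i in range(5))
    + "".join(f"good {i},1\n" for i in range(5))
)

rows = read_shuffled(io.StringIO(sample_text))
print([row["labels"] for row in rows])
```

    For the real file, the same function can be called with open("data_rt.csv", newline="", encoding="utf-8") in place of the in-memory sample.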

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
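    The description does not say how the polarity score was turned into the division label. As a sketch only, one common convention treats scores above zero as positive, below zero as negative, and zero as neutral (TextBlob polarity lies in the range -1.0 to 1.0). The function name, the label strings, and the optional neutral_band parameter below are assumptions for illustration, not the dataset author's rule.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Assumed rule: scores within +/- neutral_band of zero count as neutral.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


scores = [0.8, -0.35, 0.0]
labels = [polarity_to_division(score) for score in scores]
print(labels)  # ['positive', 'negative', 'neutral']
```

    Widening neutral_band (for example to 0.1) moves weakly polarised reviews into the neutral class, which may or may not match the labels shipped in EcoPreprocessed.csv.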

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  3. Flipkart Product Sentiment Analysis

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Flipkart Product Sentiment Analysis [Dataset]. https://www.opendatabay.com/data/ai-ml/b2f4e6f3-5c3e-4a16-bff7-9176e12787f8
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset provides a valuable collection of customer reviews for products purchased from Flipkart, a prominent e-commerce platform. It captures the customer experience and feedback regarding specific products, including their assigned ratings. The dataset is ideal for sentiment analysis, product insights, and understanding customer satisfaction.

    Columns

    • Index: A numerical identifier for each record.
    • Product_name: The name of the product, often accompanied by brief details.
    • Review: The textual review provided by the customer about their experience with the product.
    • Rating: A numerical rating given by the customer for the product, typically ranging from 1 to 5 stars.

    Distribution

    The dataset is typically provided in a CSV format. It contains approximately 2,303 records across its columns. While the exact file size is not specified, its structure is well-defined with distinct columns for product information, reviews, and ratings.

    Usage

    This dataset is highly suitable for various analytical and machine learning applications, including:

    • Performing sentiment analysis on customer reviews to gauge product perception.
    • Extracting product insights and understanding common customer feedback themes.
    • Developing and training Natural Language Processing (NLP) models for text classification and opinion mining.
    • Building recommendation systems based on user ratings and review content.
    • Analysing customer satisfaction levels and identifying areas for product improvement.

    Coverage

    The dataset focuses on customer reviews from Flipkart. While Flipkart primarily operates in India, the dataset's stated region for availability is global. There are no specific notes on time range or demographic scope within the provided information for the reviews themselves. The dataset was listed on 17/06/2025.

    License

    CC0

    Who Can Use It

    This dataset is intended for a wide range of users, including:

    • Data scientists and machine learning engineers for building sentiment analysis models.
    • NLP researchers for advancing text understanding and processing techniques.
    • Product managers seeking to understand customer feedback and improve product offerings.
    • Business analysts looking to derive actionable insights from customer reviews and ratings.
    • Individuals interested in AI and LLM data for training and experimentation.

    Dataset Name Suggestions

    • Flipkart Customer Reviews Dataset
    • Flipkart Product Sentiment Analysis
    • E-commerce Customer Feedback Data
    • Product Ratings and Reviews (Flipkart)
    • Indian E-commerce Review Data

    Attributes

    Original Data Source: Flipkart Reviews Sentiment Analysis

  4. Amazon Customer Review Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Amazon Customer Review Dataset [Dataset]. https://www.opendatabay.com/data/consumer/3769a0a1-dc8b-44e7-9bcf-1c8f2d3fdddc
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Reviews & Ratings
    Description

    This dataset is a collection of customer reviews obtained from Amazon.com. It is designed for multilingual sentiment analysis and opinion mining, containing reviews in five different languages: Italian, German, Spanish, French, and English. The dataset is valuable for natural language processing tasks, sentiment analysis algorithms, and various machine learning applications that require diverse language data for training and evaluation. It can be used to train and fine-tune models to automatically classify sentiments, predict customer satisfaction, and extract key information from customer reviews.

    Columns

    • user_name: The name of the reviewer.
    • stars: The number of stars awarded in the review.
    • country: The country of the reviewer.
    • date: The date when the review was posted.
    • title: The title of the review.
    • text: The main body of the review text.
    • helpful: The count of people who found the review useful.

    Distribution

    The dataset is typically provided in a CSV file format. While specific total row counts are not available, examples of column value distributions are present, such as 675 total values for user names and 640 total values for star ratings, with 92% being 5/5 reviews. The dataset is structured to support various text and NLP applications.

    Usage

    This dataset is ideal for a range of applications, including:

    • Multilingual sentiment analysis.
    • Opinion mining studies.
    • Developing and testing natural language processing tasks.
    • Building sentiment analysis algorithms.
    • Training machine learning models to classify sentiments.
    • Predicting customer satisfaction from review data.
    • Extracting key insights and information from customer feedback.

    Coverage

    The dataset's coverage is global, drawing reviews from Amazon.com. It includes content in Italian, German, Spanish, French, and English, indicating its relevance to regions where these languages are spoken. The dataset contains a 'date' column for each review; however, a specific time range for the reviews themselves is not provided.

    License

    CC-BY-NC

    Who Can Use It

    This dataset is suitable for:

    • Data Scientists and Researchers: For developing and testing machine learning models for sentiment analysis, NLP, and text classification across multiple languages.
    • E-commerce Analysts: To understand customer satisfaction, product performance, and market sentiment from user reviews.
    • Language Model Developers: To fine-tune large language models with diverse text data for improved natural language understanding.
    • Businesses: To gain insights into customer feedback and improve product or service offerings.

    Dataset Name Suggestions

    • Amazon Customer Review Data
    • Multilingual Amazon Product Reviews
    • E-commerce Customer Sentiment Data
    • Global Amazon Review Collection

    Attributes

    Original Data Source: Amazon Review Dataset LLM

  5. Amazon Product Reviews

    • kaggle.com
    Updated Nov 26, 2023
    Cite
    The Devastator (2023). Amazon Product Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/amazon-product-reviews/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Amazon Product Reviews

    18 Years of Customer Ratings and Experiences

    By Huggingface Hub [source]

    About this dataset

    The Amazon Reviews Polarity Dataset discloses eighteen years of customers' ratings and reviews from Amazon.com, offering an unparalleled trove of insight and knowledge. Drawing from the immense pool of over 35 million customer reviews, this dataset presents a broad spectrum of customer opinions on products they have bought or used. This invaluable data is a gold mine for improving products and services as it contains comprehensive information regarding customers' experiences with a product including ratings, titles, and plaintext content. At the same time, this dataset contains both customer-specific data along with product information which encourages deep analytics that could lead to great advances in providing tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Use Amazon Reviews Polarity to gain deeper insights into what your customers want - explore now!

    How to use the dataset

    1. Analyze customer ratings to identify trends: Take a look at how many customers have rated the same product or service with the same score (e.g., 4 stars). You can use this information to identify what customers like or don't like about it by examining common sentiment throughout the reviews. Identifying these patterns can help you decide which features of your products or services to emphasize in order to boost sales and satisfaction rates.

    2. Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product/service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search algorithms can quickly reveal general topics that are discussed in relation to your product/service across multiple reviews, allowing you to quickly pinpoint areas that may need improvement for particular items within your lines of business.

    3. Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific related to your product/service, such as a negative response toward a feature that was introduced, proved unpopular among customers, and was removed shortly afterwards. This can save time and money by identifying issues before they become widespread concerns among larger sets of consumers who spend money on your company's item(s).

    4. Visualize sentiment data over time: Visualizations such as bar graphs can help identify trends across different categories more quickly than raw numbers alone. Combining numeric values with color differences between scores lets you spot anomalies more easily, which speeds up working out why certain spikes occurred where others stayed stable (or vice versa) when comparing similar data points in time-series visualizations.
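    The "simple keyword search" idea from the review content analysis step can be sketched in a few lines of standard-library Python: count, per sentiment label, how many reviews mention each keyword. The function name, the keywords, and the three sample reviews are made up for illustration; with the real data the (label, text) pairs would come from train.csv.

```python
import re
from collections import Counter, defaultdict


def keyword_counts_by_label(reviews, keywords):
    """Count how many reviews of each label mention each keyword."""
    counts = defaultdict(Counter)
    for label, text in reviews:
        # Lowercase word set, so each review counts a keyword at most once.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        for keyword in keywords:
            if keyword in tokens:
                counts[label][keyword] += 1
    return counts


# Hypothetical sample reviews as (label, text) pairs.
reviews = [
    ("positive", "Great battery life and a sharp screen."),
    ("negative", "Battery died after a week."),
    ("negative", "The screen cracked and the battery swelled."),
]
counts = keyword_counts_by_label(reviews, ["battery", "screen"])
for label, counter in sorted(counts.items()):
    print(label, dict(counter))
```

    A keyword that appears far more often under one label than the other is a candidate topic to read in detail; this is only a first pass, not a substitute for Word2Vec or LDA.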

    Research Ideas

    • Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.
    • Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.
    • Training a machine learning model to accurately predict customers’ ratings on new products they have not yet tried and leverage this for further product development optimization initiatives

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | label | The sentiment of the review, either positive or negative. (String) |
    | title | The title of the review. (String) |
    ...

  6. Analyzing sentiments related to various products

    • kaggle.com
    Updated Sep 11, 2020
    Cite
    Tanya (2020). Analyzing sentiments related to various products [Dataset]. https://www.kaggle.com/tanyadayanand/analyzing-sentiments-related-to-various-products
    Explore at:
    Croissant
    Dataset updated
    Sep 11, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tanya
    Description

    Analyzing sentiments related to various products such as tablets, mobiles and other gizmos can be fun and difficult, especially when the reviews are collected across various demographics around the world. The task for this dataset is to develop a machine learning model that accurately classifies product reviews into 4 different classes of sentiment, based on the raw text review provided by the user. Analyzing these sentiments will not only help serve customers better but can also reveal a lot of customer traits present or hidden in the reviews.

    Sentiment analysis requires a lot to be taken into account, mainly because of the preprocessing needed to represent raw text in a machine-understandable form. Usually, we stem and lemmatize the raw text and then represent it using TF-IDF, word embeddings, etc. However, with state-of-the-art NLP models such as Transformer-based BERT models, one can skip manual feature engineering such as TF-IDF and count vectorizers.
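    To make the TF-IDF representation mentioned above concrete, here is a minimal standard-library sketch. It uses one common variant, tf(t, d) * log(N / df(t)), with whitespace tokenisation and no stemming or lemmatisation; library implementations typically add smoothing and normalisation, so their numbers will differ. The function name and the three sample documents are illustrative only.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical mini-corpus of product reviews.
docs = ["good tablet good price", "bad tablet", "good phone"]
vectors = tfidf(docs)
print(vectors[1])
```

    In the second document, "bad" receives a higher weight than "tablet" because "tablet" also occurs in another document; a term that occurs in every document gets weight zero under this variant.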

  7. Global Customer Product Feedback Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Global Customer Product Feedback Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/28775c3e-a835-4f3c-a0fb-06360defabcf
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset features 171,000 product reviews, meticulously labelled with sentiment indicators. It includes associated metadata such as product names and prices. The core purpose of this dataset is to facilitate sentiment analysis of product reviews, allowing for the automatic classification of textual content into positive, negative, or neutral sentiments. This resource is invaluable for understanding customer perception and informing business strategies or consumer purchasing decisions.

    Columns

    • product_name: The name of the product being reviewed (e.g., "Bumtum Baby Pull-Up Diaper Pants Combo Pack").
    • product_price: The price of the product at the time of review. Price ranges in the dataset span from 59.00 to 65257.25, with some outliers up to 86990.00.
    • Rate: The numerical rating given by the reviewer. Ratings in the dataset primarily fall within ranges such as 1.00-1.20, 2.00-2.20, 3.00-3.20, 4.00-4.20, and 4.80-5.00.
    • Review: The full text of the customer review. Examples include reviews described as 'good' or 'ok'.
    • Summary: A concise summary of the review, often including 'Nan' (Not a Number) or 'Other' categories for some entries.
    • Sentiment: The assigned sentiment label, indicating whether the review is positive, negative, or neutral. The dataset shows an approximate distribution of 34% positive, 34% neutral, and 33% other (likely negative) sentiments.

    Distribution

    The dataset comprises approximately 171,000 product reviews. It typically exists in a tabular structure, often suitable for formats like CSV. The price data exhibits a wide range, with a significant number of entries between 59.00 and 4405.55. Review ratings are distributed across the scale, with notable counts for ratings between 4.80-5.00 and 1.00-1.20. Sentiment labels are well-distributed across positive, neutral, and negative categories.

    Usage

    This dataset is ideal for:

    • Developing and training machine learning algorithms for sentiment analysis.
    • Automating the classification of product reviews by sentiment.
    • Tracking customer sentiment trends over time for specific products or brands.
    • Identifying product strengths and areas for improvement based on customer feedback.
    • Empowering consumers to make informed purchasing decisions by aggregating sentiment.

    Coverage

    The dataset's region of coverage is global. No specific time range for the reviews themselves is specified within the available information, though the dataset was listed on 17/06/2025. No specific demographic scope is provided.

    License

    CC0

    Who Can Use It

    • Businesses and product managers: To monitor and understand customer sentiment for their offerings, identify product improvements, and gauge market reception.
    • Data scientists and machine learning engineers: For training and validating natural language processing (NLP) models focused on sentiment classification.
    • Market researchers: To analyse broad trends in consumer opinion and behaviour related to products.
    • Consumers: To inform their purchasing choices by reviewing aggregated sentiment.
    • Academics and researchers: For studies on consumer behaviour, text analysis, and sentiment modelling.

    Dataset Name Suggestions

    • Product Review Sentiment Dataset
    • E-commerce Product Review Sentiment Data
    • Global Customer Product Feedback Dataset
    • Sentiment Labelled Product Reviews

    Attributes

    Original Data Source: 171k product review with Sentiment Dataset

  8. NLP Preprocessed Sentiment Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). NLP Preprocessed Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/6323a1b5-7112-49bd-ad55-c1ef6968abc3
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset is a substantial collection of over 241,000 English-language comments, gathered from various online platforms. Each comment within the dataset has been carefully annotated with a sentiment label: 0 for negative sentiment, 1 for neutral, and 2 for positive. The primary aim of this dataset is to facilitate the training and evaluation of multi-class sentiment analysis models, designed to work effectively with real-world text data. The dataset has undergone a preprocessing stage, ensuring comments are in lowercase, and are cleaned of punctuation, URLs, numbers, and stopwords, making it readily usable for Natural Language Processing (NLP) pipelines.
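    The preprocessing steps listed above (lowercasing, then removing URLs, numbers, punctuation, and stopwords) can be approximated as follows, for example to bring new text into the same shape before scoring it with a model trained on this dataset. This is a sketch under assumptions: the dataset's actual pipeline and stopword list are not documented, so the regular expressions and the short STOPWORDS set here are illustrative.

```python
import re

# Illustrative stopword list; the list used for the dataset is not documented.
STOPWORDS = {"a", "an", "and", "at", "for", "is", "it", "of", "the", "this", "to"}


def preprocess(comment):
    """Lowercase a comment and strip URLs, digits, punctuation, and stopwords."""
    text = comment.lower()
    # Remove URLs first, while their punctuation is still intact.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Replace everything except ASCII letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(word for word in text.split() if word not in STOPWORDS)


cleaned = preprocess("This phone is GREAT!!! See https://example.com for 2 reviews.")
print(cleaned)  # phone great see reviews
```

    The order matters: stripping punctuation before URLs would leave fragments such as "https" and "example" behind as ordinary words.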

    Columns

    • Comment: This column contains the user-generated text content.
    • Sentiment: This column provides the corresponding sentiment label for each comment, where 0 denotes Negative, 1 denotes Neutral, and 2 denotes Positive.

    Distribution

    The dataset comprises over 241,000 records. While the specific file format is not detailed, such datasets are typically provided in a tabular format, often as a CSV file. It is structured with two distinct columns as described above, suitable for direct integration into machine learning workflows.

    Usage

    This dataset is ideally suited for a variety of applications and use cases, including:

    • Training sentiment classifiers utilising advanced models such as LSTM, BiLSTM, CNN, BERT, or RoBERTa.
    • Evaluating the efficacy of different preprocessing and tokenisation strategies for text data.
    • Benchmarking NLP models on multi-class classification tasks to assess their performance.
    • Supporting educational projects and research initiatives in the fields of opinion mining or text classification.
    • Fine-tuning transformer models on a large and diverse collection of sentiment-annotated text.

    Coverage

    The dataset's coverage is global, comprising English-language comments. It focuses on general user-generated text content without specific demographic notes. The dataset is listed with a version of 1.0.

    License

    CC0

    Who Can Use It

    This dataset is suitable for individuals and organisations involved in data science and analytics. Intended users include:

    • Data Scientists and Machine Learning Engineers for developing and deploying sentiment analysis models.
    • Researchers and Academics for studies in NLP, text classification, and opinion mining.
    • Students undertaking educational projects in artificial intelligence and machine learning.

    Dataset Name Suggestions

    • Multi-class Comment Sentiment Data
    • User Text Sentiment Collection
    • Online Comment Sentiment Analysis Dataset
    • English Sentiment Labelled Comments
    • Preprocessed Sentiment Dataset

    Attributes

    Original Data Source: Sentiment Analysis Dataset

  9. Pakistan Restaurants Reviews

    • kaggle.com
    Updated Mar 25, 2021
    Cite
    Kanwal Zahoor (2021). Pakistan Restaurants Reviews [Dataset]. https://www.kaggle.com/kanwalzahoor/pakistan-restaurants-reviews/code
    Explore at:
    Croissant
    Dataset updated
    Mar 25, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kanwal Zahoor
    Area covered
    Pakistan
    Description

    Dataset

    This dataset was created by Kanwal Zahoor

    Contents

  10. Pakistani Traffic Sentiment Analysis

    • kaggle.com
    Updated Feb 24, 2023
    Cite
    Muhammad Altaf Khan (2023). Pakistani Traffic Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/altafk/pakistani-traffic-sentiment-analysis
    Explore at:
    Croissant
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Muhammad Altaf Khan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Pakistan
    Description

    The dataset has two columns: "Text" and "Label". The "Text" column contains sentiments about Pakistani traffic, including both positive and negative reviews. The "Label" column classifies each sentiment as either positive or negative, where positive reviews are labeled "0" and negative reviews are labeled "1". This dataset can be used for sentiment analysis tasks, which use natural language processing techniques to analyze and classify text based on the emotions and opinions it expresses. By training a machine learning model on this dataset, you can create a system that automatically classifies new traffic sentiments as either positive or negative. Possible applications include monitoring public opinion about traffic-related issues, identifying areas where improvements are needed, and evaluating the effectiveness of traffic-related policies and initiatives. Additionally, businesses in the transportation industry could use this type of analysis to understand customer feedback and improve their services accordingly.
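
    As a minimal sketch of the label convention described above (positive = "0", negative = "1"), the following reads rows with "Text" and "Label" columns and maps each label to a readable name. The sample rows and the helper names `decode_label` and `read_rows` are hypothetical and not part of the dataset.

```python
import csv
import io

# Label convention stated in the description: "0" = positive, "1" = negative.
LABEL_NAMES = {"0": "positive", "1": "negative"}


def decode_label(raw):
    """Map a raw "Label" value (string or int) to a readable sentiment name."""
    return LABEL_NAMES[str(raw).strip()]


def read_rows(csv_text):
    """Parse CSV text with "Text" and "Label" columns into (text, sentiment) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Text"], decode_label(row["Label"])) for row in reader]


# Hypothetical sample rows, invented for illustration only.
SAMPLE = "Text,Label\nRoads were clear today,0\nStuck in a jam for two hours,1\n"

rows = read_rows(SAMPLE)
for text, sentiment in rows:
    print(f"{sentiment:>8}: {text}")
```

    Because positive is encoded as 0 here, any metric that treats 1 as the "positive class" would actually measure how well negative reviews are detected, so it may be worth decoding labels before reporting results.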

  11. Consumer Product Reviews and Sentiment Analysis

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Consumer Product Reviews and Sentiment Analysis [Dataset]. https://www.opendatabay.com/data/consumer/2d257b09-10c2-4d4a-b01e-bc2c00f0b679
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset contains customer reviews for various products, including details about product categories, brands, user ratings, and sentiment analysis. It is designed for applications such as sentiment classification, product recommendation systems, and the analysis of consumer behaviour. The dataset allows users to identify trends in customer satisfaction and gain insights into consumer preferences based on brand and category.

    Columns

    • item_category: The category identifier of the product under review.
    • item_id: The unique identifier for a specific product.
    • brand: The brand identifier associated with the product.
    • user_id: The unique identifier of the customer who submitted the review.
    • date: The date when the review was posted, typically in YYYY-MM-DD format.
    • comment: The textual content of the review as provided by the user.
    • rating: The numerical rating given by the user, often on a scale (e.g., 1 to 5).
    • tonality: The sentiment classification of the review, indicating whether it is positive or negative.

    Distribution

    The data file is typically available in CSV format. The dataset comprises approximately 14,221 records. Analysis of the sentiment distribution within the dataset indicates that 84% of reviews are classified as positive, while 16% are classified as negative.
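
    The class imbalance quoted above can be turned into rough numbers. The sketch below converts the stated shares into approximate record counts and computes the accuracy of a trivial model that always predicts the majority class. The function names are illustrative, and the counts are estimates because the listing gives only rounded percentages.

```python
def approximate_class_counts(total, shares):
    """Convert percentage shares into approximate record counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


def majority_baseline_accuracy(shares):
    """Accuracy obtained by always predicting the most common class."""
    return max(shares.values()) / 100


TOTAL_RECORDS = 14221                      # "approximately 14,221 records" per the listing
SHARES = {"positive": 84, "negative": 16}  # percentages per the listing

counts = approximate_class_counts(TOTAL_RECORDS, SHARES)
baseline = majority_baseline_accuracy(SHARES)

print(counts)
print(baseline)
```

    A classifier trained on the `tonality` column would need to beat this majority-class baseline to be useful, which is why accuracy alone can be misleading on a split this uneven.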

    Usage

    This dataset is ideally suited for several applications, including:

    • Performing sentiment analysis on product reviews to gauge public opinion.
    • Identifying patterns and trends in customer satisfaction over time.
    • Developing and improving product recommendation systems.
    • Understanding consumer preferences based on specific brands and product categories.

    Coverage

    The dataset covers a time range from 30th July 2009 to 25th July 2017. The data has a global regional scope. No specific demographic scope is detailed within the available information.

    License

    CC0

    Who Can Use It

    This dataset is valuable for a range of users and their specific applications:

    • Data Scientists and Machine Learning Engineers: To train and evaluate sentiment analysis models, develop natural language processing (NLP) applications, and build recommendation engines.
    • Marketing Professionals: To understand customer feedback, identify popular products, and assess the impact of marketing campaigns on brand perception.
    • Businesses and Product Managers: To inform product development strategies, monitor customer satisfaction, and identify areas for improvement based on consumer feedback.
    • Researchers: For academic studies on consumer behaviour, sentiment analysis techniques, and market trends.

    Dataset Name Suggestions

    • Consumer Product Reviews and Sentiment Analysis
    • Customer Feedback and Ratings
    • Product Review Tonality Dataset
    • E-commerce Customer Insights
    • Global Product Review Data

    Attributes

    Original Data Source: 🏬🛍️😀 Consumer Sentiments and Ratings

  12. Airline Reviews Dataset

    • kaggle.com
    Updated Mar 6, 2024
    Cite
    Sujal Suthar (2024). Airline Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/sujalsuthar/airlines-reviews
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sujal Suthar
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains reviews of the top 10 rated airlines in 2023 sourced from the Airline Quality (https://www.airlinequality.com) website. The reviews cover various aspects of the flight experience, including seat comfort, staff service, food and beverages, inflight entertainment, value for money, and overall rating. The dataset is suitable for sentiment analysis, customer satisfaction analysis, and other similar tasks.

    Usage

    • Download the dataset file airlines_reviews.csv.
    • Use the dataset for analysis, visualization, and machine learning tasks.

    List of Airlines

    1. Singapore Airlines
    2. Qatar Airways
    3. All Nippon Airways
    4. Emirates
    5. Japan Airlines
    6. Turkish Airlines
    7. Air France
    8. Cathay Pacific Airways
    9. EVA Air
    10. Korean Air

    This dataset is provided under the MIT License.

  13. Customer Feedback Analysis Collection

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Customer Feedback Analysis Collection [Dataset]. https://www.opendatabay.com/data/ai-ml/338afca5-aa8c-4d80-9be0-7f844ef4e85e
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset comprises a collection of product reviews gathered from prominent e-commerce platforms, including Hepsiburada, Trendyol, and N11. It provides a valuable resource for various data science and analytics tasks, offering insights into customer feedback and sentiment towards diverse products. The dataset is particularly well-suited for developing and evaluating models for classification, text mining, and natural language processing applications.

    Columns

    • Metin (Review Text): This column contains the actual text of the product review comments.
    • Durum (Sentiment Status): This column indicates the sentiment of the review, categorised as 0 for negative, 1 for positive, and 2 for neutral.

    Distribution

    The dataset is provided in CSV UTF-8 format. It contains a total of 15,170 reviews. Within this collection, there are 6,799 positive reviews, 6,978 negative reviews, and 1,393 neutral reviews, providing a varied distribution of sentiment.
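
    The counts above can be checked and put to use with a few lines of arithmetic. This sketch verifies that the three classes sum to the stated total and derives inverse-frequency class weights, a common heuristic for compensating for the small neutral class. The weighting formula is an illustrative choice, not something the listing prescribes.

```python
# Encoding stated in the column description: 0 = negative, 1 = positive, 2 = neutral.
DURUM_NAMES = {0: "negative", 1: "positive", 2: "neutral"}

# Review counts quoted in the listing.
COUNTS = {"positive": 6799, "negative": 6978, "neutral": 1393}


def class_shares(counts):
    """Fraction of the dataset belonging to each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


total = sum(COUNTS.values())
shares = class_shares(COUNTS)
weights = inverse_frequency_weights(COUNTS)

print(total)
for label in COUNTS:
    print(f"{label:>8}: share={shares[label]:.3f} weight={weights[label]:.2f}")
```

    With these figures the neutral class makes up roughly 9% of the reviews, so it receives a weight several times larger than the other two classes.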

    Usage

    This dataset is ideal for:

    • Sentiment Analysis: Building and training models to classify product review sentiment.
    • Text Classification: Developing algorithms to categorise text based on expressed opinions.
    • Natural Language Processing (NLP) Research: Exploring various NLP techniques such as topic modelling, named entity recognition, and language understanding in the context of e-commerce feedback.
    • Customer Feedback Analysis: Gaining insights into customer satisfaction and product performance.

    Coverage

    The data originates from various e-commerce platforms, offering a global scope in terms of its potential applicability. The dataset was listed on 8th June 2025. Specific demographic or precise temporal ranges for the collected reviews are not detailed in the available information, though it pertains to online consumer product commentary.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • Data Scientists: For machine learning projects focused on text analysis and classification.
    • Machine Learning Engineers: To train and test sentiment analysis models.
    • Academic Researchers: For studies in computational linguistics, natural language processing, and e-commerce analytics.
    • Businesses: To understand customer opinions and improve product offerings or services.

    Dataset Name Suggestions

    • E-commerce Product Review Sentiment
    • Turkish Online Review Dataset
    • Customer Feedback Analysis Collection
    • Multi-Platform Product Opinions

    Attributes

    Original Data Source: E-Ticaret Ürün Yorumları

  14. IMDb Movie Review Sentiment

    • kaggle.com
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). IMDb Movie Review Sentiment [Dataset]. https://www.kaggle.com/datasets/thedevastator/imdb-movie-review-sentiment-dataset/suggestions?status=pending&yourSuggestions=true
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    IMDb Movie Review Sentiment

    Movie Review Sentiment

    By imdb (From Huggingface) [source]

    About this dataset

    The IMDb Large Movie Review Dataset is a comprehensive collection of movie reviews used for sentiment classification. The dataset includes a wide range of movie reviews along with their corresponding sentiment labels, which indicate whether the review is positive or negative in nature. This invaluable dataset is aimed at facilitating sentiment analysis and classification tasks in the field of natural language processing.

    The main purpose of the train.csv file within this dataset is to provide a curated collection of movie reviews, each accompanied by its respective sentiment label. This file proves particularly useful for training machine learning models to accurately predict sentiment and classify reviews based on their emotional tone.

    Similarly, the test.csv file contains another set of movie reviews along with corresponding sentiment labels. Meant for testing and validating the performance of trained models, this dataset enables researchers and developers to evaluate their models' effectiveness in real-world scenarios.

    Additionally, the unsupervised.csv file offers an alternative subset within the dataset. Unlike train.csv and test.csv, unsupervised.csv does not include any associated sentiment labels for individual movie reviews. This specific subset serves as a valuable resource for exploring unsupervised learning techniques within the domain of sentiment classification.

    By utilizing this meticulously compiled IMDb Large Movie Review Dataset, researchers and data scientists can delve into various aspects related to analyzing sentiments in textual data. With its carefully labeled data points covering both positive and negative sentiments expressed in diverse film critiques, this dataset empowers users to develop sophisticated machine learning algorithms that accurately assess subjective opinions from text data

    How to use the dataset

    Introduction:

    Dataset Overview:

    • Train.csv: This file contains a set of movie reviews along with their sentiment labels. It is intended for training your sentiment analysis models.
    • Test.csv: This file provides another set of movie reviews along with their corresponding sentiment labels. You can use this file to evaluate the performance of your trained models.
    • Unsupervised.csv: This file includes movie reviews without any associated sentiment labels. It can be used for unsupervised sentiment classification tasks.

    Columns in the Dataset:

    • text: The main column containing the text of each movie review.
    • label: The sentiment label assigned to each review, indicating whether it is positive or negative.

    Guidelines for Using the Dataset:

    • Training Your Model:

      • Begin by loading and preprocessing the data from train.csv
      • Treat 'text' as your input feature and 'label' as your target variable
      • Explore different machine learning or deep learning algorithms suitable for text classification
      • Train your model using various techniques, such as bag-of-words, word embeddings, or transformers
      • Evaluate and fine-tune your model's performance using test.csv
    • Evaluating Your Model:

      • Load test.csv and preprocess the data similar to what you did with train.csv
      • Use this preprocessed test data to evaluate the accuracy, precision, recall, F1 score or other relevant metrics of your trained model on unseen data
      • Analyze these metrics to understand how well your model is performing in predicting sentiments
    • Advancing Your Model (Unsupervised Classification):

      • Utilize unsupervised.csv for unsupervised sentiment classification tasks
      • Preprocess the movie reviews in this file and explore techniques like clustering, topic modeling, or self-supervised learning
      • Extract patterns, themes, or sentiments from the reviews without any guidance from labeled data
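
    The evaluation step in the guidelines above names accuracy, precision, recall, and F1 score. As a minimal sketch of how those metrics relate, the function below computes them from true and predicted labels using only the standard library. It assumes labels are encoded as 1 = positive and 0 = negative, which the listing does not state, and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and predictions (assumed encoding: 1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    In practice the predictions would come from a model trained on train.csv and applied to test.csv; the metric definitions stay the same regardless of the model.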

    Conclusion:

    Research Ideas

    • Sentiment Analysis: This dataset can be used to train models for sentiment analysis, where the goal is to predict whether a movie review is positive or negative based on its text.
    • NLP Research: The dataset can be used for various natural language processing (NLP) tasks such as text classification, information extraction, or named entity recognition. Researchers and practitioners can leverage this dataset to develop and evaluate new algorithms and techniques in the field of NLP.
    • Recommendation Systems: The sentiment labels in this dataset can be used as a source of feedback or user preferences for recommendation systems. By analyzing the sentiments expressed in reviews,...
  15. ‘uHack Sentiments 2.0: Decode Code Words’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 28, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘uHack Sentiments 2.0: Decode Code Words’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-uhack-sentiments-2-0-decode-code-words-ce3a/88e2b3fd/?iid=004-193&v=presentation
    Dataset updated
    Dec 28, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘uHack Sentiments 2.0: Decode Code Words’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/manishtripathi86/uhack-sentiments-20-decode-code-words on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    The challenge here is to analyze and deep dive into the natural language text (reviews) and bucket them based on their topics of discussion. Furthermore, analyzing the overall sentiment will also help the business to make tangible decisions.

    The data set provided to you has a mix of customer reviews for products across categories and retailers. We would like you to model the data to:

    • bucket future reviews into their respective topics (Note: a review can talk about multiple topics)
    • predict the overall polarity (positive/negative sentiment)

    Train: 6136 rows x 14 columns

    Test: 2631 rows x 14 columns

    Topics (Components, Delivery and Customer Support, Design and Aesthetics, Dimensions, Features, Functionality, Installation, Material, Price, Quality and Usability)

    Polarity (Positive/Negative)

    Note: The target variables are all encoded in the train dataset for convenience. Please submit the test results in the same encoded fashion so that we can evaluate your results.

    | Field Name | Data Type | Purpose | Variable type |
    | --- | --- | --- | --- |
    | Id | Integer | Unique identifier for each review | Input |
    | Review | String | Review written by customers on a retail website | Input |
    | Components | String | 1: aspects related to components; 0: None | Target |
    | Delivery and Customer Support | String | 1: some aspects related to delivery, return, exchange and customer support; 0: None | Target |
    | Design and Aesthetics | String | 1: some aspects related to components; 0: None | Target |
    | Dimensions | String | 1: related to product dimension and size; 0: None | Target |
    | Features | String | 1: related to product features; 0: None | Target |
    | Functionality | String | 1: related to working of a product; 0: None | Target |
    | Installation | String | 1: related to installation of the product; 0: None | Target |
    | Material | String | 1: related to material of the product; 0: None | Target |
    | Price | String | 1: related to pricing details of a product; 0: None | Target |
    | Quality | String | 1: related to quality aspects of a product; 0: None | Target |
    | Usability | String | 1: related to usability of a product; 0: None | Target |
    | Polarity | Integer | 1: Positive sentiment; 0: Negative sentiment | Target |
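
    Since a review can touch several topics at once, the target columns described above form a multi-label (multi-hot) encoding plus one binary polarity column. The following sketch builds such a row from a set of topic names. The topic list is taken from the description, while `encode_targets` and the example input are hypothetical.

```python
# The eleven topic columns named in the dataset description.
TOPICS = [
    "Components", "Delivery and Customer Support", "Design and Aesthetics",
    "Dimensions", "Features", "Functionality", "Installation", "Material",
    "Price", "Quality", "Usability",
]


def encode_targets(topics, polarity):
    """Build a 0/1 target row: one flag per topic plus the Polarity flag.

    topics   -- iterable of topic names the review talks about
    polarity -- "positive" or "negative"
    """
    topics = set(topics)
    unknown = topics - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    if polarity not in ("positive", "negative"):
        raise ValueError("polarity must be 'positive' or 'negative'")

    row = {name: int(name in topics) for name in TOPICS}
    row["Polarity"] = 1 if polarity == "positive" else 0  # 1 = positive, 0 = negative
    return row


# Hypothetical review that discusses price and quality with positive sentiment.
example = encode_targets({"Price", "Quality"}, "positive")
print(example)
```

    Together with the Id and Review input columns, the eleven topic flags and the Polarity flag account for the 14 columns quoted for the train and test files.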

    Skills:

    • Text pre-processing: lemmatization, tokenization, n-grams and other relevant methods
    • Multi-class classification, multi-label classification
    • Optimizing log loss

    Overview

    Ugam, a Merkle company, is a leading analytics and technology services company. Our customer-centric approach delivers impactful business results for large corporations by leveraging data, technology, and expertise.

    We consistently deliver superior, impactful results through the right blend of human intelligence and AI. With 3300+ people spread across locations worldwide, we successfully deploy our services to create success stories across industries like Retail & Consumer Brands, High Tech, BFSI, Distribution, and Market Research & Consulting. Over the past 21 years, Ugam has been recognized by several firms including Forrester and Gartner, named the No.1 data science company in India by Analytics Insight, and certified as a Great Place to Work®.

    Problem Statement: The last two decades have witnessed a significant change in how consumers purchase products and express their experience/opinions in reviews, posts, and content across platforms. These online reviews are not only useful to reflect customers’ sentiment towards a product but also help businesses fix gaps and find potential opportunities which could further influence future purchases.

    Participants need to develop a machine learning model that can analyse customers’ sentiments based on their reviews and feedback.

    NOTE: The prize money will be for the interested candidates who are willing to get interviewed or hired by Ugam. Winners are requested to come to the Machine Learning Developers Summit 2022, happening in Bangalore, to receive the prize money.

    dataset link: https://machinehack.com/hackathon/uhack_sentiments_20_decode_code_words/overview

    --- Original source retains full ownership of the source dataset ---

  16. Nykaa Customer Review Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Nykaa Customer Review Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/9c4972ad-2725-4037-b30a-44f15eab04d8
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    E-commerce & Online Transactions
    Description

    This dataset contains Google Play Store reviews for Nykaa, a multi-brand cosmetics e-commerce company, collected up to August 2021. It aims to provide insights into customer sentiment, categorised as positive, neutral, or negative based on star ratings. The dataset is valuable for understanding customer satisfaction and identifying key themes within app reviews.

    Columns

    • content: The raw app review text, without filters for length or language.
    • sentiment_labels: The sentiment score, binned into categories. Reviews with a 5-star rating are classified as POSITIVE, 3 and 4-star ratings are NEUTRAL, and all other ratings are NEGATIVE. The sentiment distribution shows 71% POSITIVE and 20% NEUTRAL sentiments.
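
    The binning rule described for `sentiment_labels` is simple enough to write down directly. The sketch below applies it to star ratings. The function name `bin_rating` is illustrative, and ratings are assumed to be integers from 1 to 5.

```python
def bin_rating(stars):
    """Map a star rating to a sentiment label using the rule stated above."""
    if stars == 5:
        return "POSITIVE"
    if stars in (3, 4):
        return "NEUTRAL"
    return "NEGATIVE"  # all other ratings (1 and 2 stars under the assumed scale)


labels = [bin_rating(stars) for stars in range(1, 6)]
print(labels)
```

    Note that 4-star reviews land in NEUTRAL under this rule, which is stricter than the more common convention of treating 4 and 5 stars as positive.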

    Distribution

    The dataset is structured with two columns and is provided in a Fasttext-compatible format. A test split constitutes 20% of the total data, which is approximately one-quarter the size of the training data. Specific total row or record counts are not available in the provided information.

    Usage

    This dataset is ideal for a range of natural language processing (NLP) tasks, including sentiment analysis, text classification, and customer feedback analysis. It can be utilised by data scientists and machine learning engineers to build and train models for predicting customer sentiment, identifying common complaints or praises, and gaining actionable insights into user experience for e-commerce applications.

    Coverage

    The dataset covers Google Play Store reviews for Nykaa, collected up until August 2021. It has global regional coverage, capturing a broad spectrum of user feedback.

    License

    CC-BY-NC

    Who Can Use It

    Intended users include data scientists, machine learning practitioners, NLP researchers, and businesses in the e-commerce sector. They can use this dataset to develop sentiment analysis models, understand customer satisfaction trends, inform product development, and enhance user engagement strategies.

    Dataset Name Suggestions

    Nykaa App Reviews Sentiment, E-commerce App Review Sentiment, Nykaa Customer Review Data, Mobile App User Sentiment

    Attributes

    Original Data Source: Nykaa App Review Sentiment

  17. Product sentiment analysis

    • kaggle.com
    zip
    Updated Sep 4, 2020
    + more versions
    Cite
    ask9 (2020). Product sentiment analysis [Dataset]. https://www.kaggle.com/arbazkhan971/product-sentiment-analysis
    Explore at:
    zip(406932 bytes)Available download formats
    Dataset updated
    Sep 4, 2020
    Authors
    ask9
    Description

    Overview

    Analyzing sentiments related to various products such as Tablet, Mobile and various other gizmos can be fun and difficult, especially when collected across various demographics around the world. In this weekend hackathon, we challenge the machinehackers community to develop a machine learning model to accurately classify various products into 4 different classes of sentiments based on the raw text review provided by the user. Analyzing these sentiments will not only help us serve the customers better but can also reveal a lot of customer traits present/hidden in the reviews.

    Sentiment analysis requires a lot to be taken into account, mainly due to the preprocessing involved to represent raw text and make it machine-understandable. Usually, we stem and lemmatize the raw information and then represent it using TF-IDF, word embeddings, etc. However, given state-of-the-art NLP models such as Transformer-based BERT models, one can skip manual feature engineering like TF-IDF and count vectorizers.

    In this short span of time, we would encourage you to leverage the ImageNet moment (Transfer Learning) in NLP using various pre-trained models.

    Dataset Description:

    • Train.csv - 6364 rows x 4 columns (includes the Sentiment column as target)
    • Test.csv - 2728 rows x 3 columns
    • Sample Submission.csv - Please check the Evaluation section for more details on how to generate a valid submission

    Attribute Description:

    • Text_ID - Unique identifier
    • Product_Description - Description of the product review by a user
    • Product_Type - Different types of product (9 unique products)
    • Class - Represents various sentiments
      • 0 - Cannot Say
      • 1 - Negative
      • 2 - Positive
      • 3 - No Sentiment

    Skills:

    • NLP, sentiment analysis
    • Feature extraction from raw text using TF-IDF, CountVectorizer
    • Using word embeddings to represent words as vectors
    • Using pretrained models like Transformers, BERT
    • Optimizing multi-class log loss to generalize well on unseen data
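
    Multi-class log loss, mentioned in the skills above, is the average negative log of the probability a model assigns to the true class. The sketch below is a plain-Python version for the four sentiment classes (0 to 3). The probabilities are hypothetical, and the clipping constant `eps` is an illustrative choice to avoid taking the log of zero.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Average negative log-probability assigned to the true class.

    y_true -- list of integer class indices
    probs  -- list of per-class probability lists, one per example
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip to keep log() finite
        total += -math.log(p)
    return total / len(y_true)


# Hypothetical predictions over classes 0..3 (Cannot Say, Negative, Positive, No Sentiment).
y_true = [2, 1, 3]
probs = [
    [0.10, 0.10, 0.70, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.05, 0.10, 0.80],
]

loss = multiclass_log_loss(y_true, probs)
uniform_loss = multiclass_log_loss([0], [[0.25] * 4])

print(round(loss, 4))
print(round(uniform_loss, 4))
```

    A model that always predicts a uniform 25% for each of the four classes scores ln 4, so any useful model should come in below that value.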

  18. UCI ML Drug Review dataset

    • kaggle.com
    Updated Dec 13, 2018
    Cite
    Jessica Li (2018). UCI ML Drug Review dataset [Dataset]. https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018/home
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 13, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jessica Li
    Description

    This dataset was used for the Winter 2018 Kaggle University Club Hackathon and is now publicly available. See the Acknowledgments section for citation and licensing. Note: The types of data and recommendation-based solutions provided by the contestants are purely for NLP learning purposes. They are not suitable for real-world drug recommendation solutions.

    Welcome to the Kaggle University Club Hackathon!

    If you are interested in joining Kaggle University Club, please e-mail Jessica Li at lijessica@google.com

    This Hackathon is open to all undergraduate, master, and PhD students who are part of the Kaggle University Club program. The Hackathon provides students with a chance to build capacity via hands-on ML, learn from one another, and engage in a self-defined project that is meaningful to their careers.

    Teams must register via Google Form to be eligible for the Hackathon. The Hackathon starts on Monday, November 12, 2018 and ends on Monday, December 10, 2018. Teams have one month to work on a team submission. Teams must do all work within the Kernel editor and set Kernel(s) to public at all times.

    Prompt

    The freestyle format of hackathons has time and again stimulated groundbreaking and innovative data insights and technologies. The Kaggle University Club Hackathon recreates this environment virtually on our platform. We challenge you to build a meaningful project around the UCI Machine Learning - Drug Review Dataset. Teams are free to let their creativity run and propose methods to analyze this dataset and form interesting machine learning models.

    Machine learning has permeated nearly all fields and disciplines of study. One hot topic is using natural language processing and sentiment analysis to identify, extract, and make use of subjective information. The UCI ML Drug Review dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites. This data was published in a study on sentiment analysis of drug experience over multiple facets, ex. sentiments learned on specific aspects such as effectiveness and side effects (see the acknowledgments section to learn more).

    The sky's the limit here in terms of what your team can do! Teams are free to add supplementary datasets in conjunction with the drug review dataset in their Kernel. Discussion is highly encouraged within the forum and Slack so everyone can learn from their peers.

    Here are just a couple ideas as to what you could do with the data:

    • Classification: Can you predict the patient's condition based on the review?
    • Regression: Can you predict the rating of the drug based on the review?
    • Sentiment analysis: What elements of a review make it more helpful to others? Which patients tend to have more negative reviews? Can you determine if a review is positive, neutral, or negative?
    • Data visualizations: What kind of drugs are there? What sorts of conditions do these patients have?

    Top Submissions

    There is no one correct answer to this Hackathon, and teams are free to define the direction of their own project. That being said, there are certain core elements generally found across all outstanding Kernels on the Kaggle platform. The best Kernels are:

    1. Complex: How many domains of analysis and topics does this Kernel cover? Does it attempt machine learning methods? Does the Kernel offer a variety of unique analyses and interesting conclusions or solutions?
    2. Original: What is the subject matter of this Kernel? Does it have a well-defined and interesting project scope, narrative or problem? Could the results make an impact? Is it thought provoking?
    3. Approachable: How easy is it to understand this Kernel? Are all thought processes clear? Is the code clean, with useful comments? Are visualizations and processes articulated and self-explanatory?

    Teams with top submissions have a chance to receive exclusive Kaggle University Club swag and be featured on our official blog and across social media.

    IMPORTANT: Teams must set all Kernels to public at all times. This is so we can track each team's progression, but more importantly it encourages collaboration, productive discussion, and healthy inspiration to all teams. It is not so that teams can simply copycat good ideas. If a team's Kernel isn't their own organic work, it will not be considered a top submission. Teams must come up with a project on their own.

    Submission Styling

    The final Kernel submission for the Hackathon must contain the following information:

    • All team members added as collaborators to the Kernel
    • Somewhere at the top of your Kernel, find a space to put down all team member names, university name, club name, and team name (as specified whe...
  19. Social Media Product Sentiment Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Social Media Product Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/9b65bd60-fc6f-4688-8485-65a329695762
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    This dataset contains tweets posted for various services and products along with the emotion contained in each tweet. It is designed to be used for training various machine learning models focused on analysing sentiments in tweets. The dataset includes key information such as the tweet text, the specific product or service the tweet references, and the emotion expressed within the tweet.

    Columns

    • tweet_text: The actual text content of the tweet.
    • emotion_in_tweet_is_directed_at: Identifies the product or service that is the subject of the tweet's emotion.
      • Example values include "iPad" (10% of values), and "Other" (2345 entries, 26% of values).
    • is_there_an_emotion_directed_at_a_brand_or_product: Indicates the type of emotion present in the tweet.
      • There are 9066 unique values for this column.
      • Emotion distribution:
        • 64% of entries show no emotion toward a brand or product.
        • 59% of entries indicate positive emotion.
        • 33% of entries are categorised as "Other" (726 entries).
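A minimal sketch of how the label distribution could be tallied directly from the file, which is a useful sanity check since shares over a single column should sum to 100%. The column names are taken from the list above; the sample rows, the label strings, and the `label_shares` helper are assumptions made up for illustration, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: the column names come from the listing,
# but the tweet texts and label strings are invented for illustration.
SAMPLE_CSV = """tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
"Love the new iPad",iPad,Positive emotion
"Battery died again",iPhone,Negative emotion
"Heading to the conference",,No emotion toward brand or product
"The iPad app is great",iPad,Positive emotion
"""

LABEL_COLUMN = "is_there_an_emotion_directed_at_a_brand_or_product"


def label_shares(csv_text):
    """Return each label's share of all rows as a fraction between 0 and 1."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[LABEL_COLUMN] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


shares = label_shares(SAMPLE_CSV)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

To run this against the real file, replace `io.StringIO(csv_text)` with an opened file handle for the downloaded CSV.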

    Distribution

    The dataset is typically provided in a tabular format, suitable for data analysis and machine learning tasks. While the exact number of rows or records is not specified in the provided information, it consists of a collection of tweet entries. Data files are usually in CSV format.

    Usage

    This dataset is ideally suited for:

    • Developing and training machine learning models for sentiment analysis.
    • Analysing customer feedback and public opinion towards products and services expressed on social media.
    • Research into natural language processing (NLP) and text classification.
    • Understanding trends in public sentiment related to specific brands or industries.

    Coverage

    The dataset has a global coverage, making it applicable for analysis of tweets from various regions. Specific time ranges or demographic scopes are not detailed in the available information.

    License

    CC0

    Who Can Use It

    This dataset is intended for:

    • Machine Learning Engineers and Data Scientists for model development.
    • Researchers in natural language processing, social media analysis, and marketing.
    • Businesses looking to analyse public sentiment regarding their products or market trends.
    • Students learning about data analysis, NLP, and machine learning.

    Dataset Name Suggestions

    • Product Tweets Sentiment Dataset
    • Tweet Emotion Analysis Data
    • Social Media Product Sentiment
    • Tweets for Sentiment Models
    • Brand Sentiment Tweet Data

    Attributes

    Original Data Source: Product Tweets Dataset

  20. App Store Ratings & Feedback

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). App Store Ratings & Feedback [Dataset]. https://www.opendatabay.com/data/consumer/bca613d5-9f17-4e0e-aaff-892f0b8e3281
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset provides a collection of over 12,000 user reviews for various applications from an app store. It includes user-assigned ratings, which can be used to classify reviews as either positive or negative. The dataset is a valuable resource for conducting sentiment analysis tasks and can assist beginners in working with annotated, real-world data to understand user feedback on mobile applications. It serves as a foundation for exploring consumer sentiment and application performance insights.

    Columns

    • reviewId: A unique identifier assigned to each individual review.
    • userName: The username of the person who submitted the review.
    • userImage: The location of the image associated with the user who posted the review.
    • content: The full text of the user's review.
    • score: The rating given to the application by the user, ranging from 1 to 5, where a score of 5 indicates the most positive sentiment and 1 signifies the most negative.
    • thumbsUpCount: The total number of users who have upvoted a particular review.
    • reviewCreatedVersion: The specific version of the application that the review pertains to.
    • at: The precise date and time when the review was originally posted.
    • replyContent: Any reply provided to the original user review by the app developer or another party.
    • repliedAt: The date and time when a reply to the review was posted.
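The description says the `score` column can be used to classify reviews as positive or negative. One way this might be done is sketched below. The thresholds (4 and above is positive, 2 and below is negative, 3 is neutral) and the `score_to_label` helper are assumptions for illustration; the listing does not state which cut-offs to use.

```python
def score_to_label(score):
    """Map a 1-5 star rating to a coarse sentiment label.

    The cut-offs are an assumption: the dataset description only says
    that 5 is the most positive score and 1 the most negative.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score!r}")
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return "neutral"


# Example: label a few hypothetical ratings.
labels = [score_to_label(s) for s in (5, 1, 3, 4, 2)]
print(labels)
```

For a strictly binary task, rows labelled "neutral" could be dropped before training. That is a design choice rather than something the dataset prescribes.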

    Distribution

    The dataset contains over 12,000 distinct reviews, with 12,495 unique review identifiers recorded. Ratings are distributed across the 1 to 5 scale, with significant counts for scores like 1.00-1.20 (2,506 reviews), 2.00-2.20 (2,344 reviews), 3.00-3.20 (1,991 reviews), 4.00-4.20 (2,775 reviews), and 4.80-5.00 (2,879 reviews). The number of upvotes (thumbsUpCount) for reviews spans a wide range, from 0 to 397. Many reviews (17%) do not specify a version, while '1.5.11' accounts for 4% of review versions. A substantial portion of reviews (53%) do not have a corresponding reply content. The data is typically provided in a CSV file format.
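Figures like the share of reviews without a reply or the frequency of each app version can be recomputed from the file. The sketch below shows one possible approach using only the standard library. The column names match the list above, while the sample rows and the `summarize` helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows using three of the listed columns.
SAMPLE_CSV = """reviewId,reviewCreatedVersion,replyContent
r1,1.5.11,Thanks for the feedback
r2,,
r3,1.5.11,
r4,1.4.0,We are looking into it
"""


def summarize(csv_text):
    """Compute the no-reply share and per-version counts for a reviews CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total = len(rows)
    # A review counts as unanswered when replyContent is empty or whitespace.
    no_reply = sum(1 for row in rows if not row["replyContent"].strip())
    # Group empty version strings under an explicit "(missing)" key.
    versions = Counter(
        row["reviewCreatedVersion"].strip() or "(missing)" for row in rows
    )
    return {
        "no_reply_share": no_reply / total if total else 0.0,
        "version_counts": versions,
    }


summary = summarize(SAMPLE_CSV)
print(f"No reply: {summary['no_reply_share']:.0%}")
print(summary["version_counts"].most_common(2))
```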

    Usage

    This dataset is ideally suited for a variety of analytical and machine learning applications. It is particularly useful for:

    • Performing sentiment analysis to gauge public opinion on mobile applications.
    • Developing and training natural language processing (NLP) models, such as BERT-based sentiment classifiers.
    • Extracting key insights and trends from user feedback to inform app development and marketing strategies.
    • Educating beginners in the field of sentiment analysis and text mining using annotated, real-world data.
    • Analysing user engagement and the impact of replies on review visibility.

    Coverage

    The dataset offers a global scope, encompassing reviews from users worldwide. The time range for user-posted reviews extends from 8th February 2015 to 28th October 2020. Replies to reviews cover a slightly broader period, from 14th January 2013 to 28th October 2020. The data reflects feedback from real users of various app store applications, providing a diverse demographic perspective on mobile app usage and satisfaction.

    License

    CC0

    Who Can Use It

    This dataset is beneficial for a wide range of users, including:

    • Data Scientists and Machine Learning Engineers: For building and evaluating sentiment analysis models, text classification systems, and other NLP applications.
    • Researchers: To study user behaviour, app success factors, and the dynamics of online reviews.
    • App Developers and Product Managers: To understand user feedback, identify pain points, and prioritise feature development based on sentiment.
    • Market Analysts: To monitor brand perception, conduct competitor analysis, and track market trends in the app industry.
    • Students: As an excellent practical resource for learning about data cleaning, text preprocessing, and sentiment analysis techniques.

    Dataset Name Suggestions

    • Google Play Store User Reviews
    • Mobile App Sentiment Analysis Dataset
    • App Store Ratings & Feedback
    • Digital Product Review Data
    • Consumer App Review Dataset

    Attributes

    Original Data Source: Google Play Store Reviews

Cite
Deepkumar patel (2023). oyo-reviews-dataset [Dataset]. https://www.kaggle.com/datasets/deeppatel9095/oyo-reviews-dataset

oyo-reviews-dataset

Exploring Customer Sentiments in OYO Hotel Reviews: A Dataset for Sentiment Anal

Explore at:
zip(32300432 bytes)Available download formats
Dataset updated
Jun 24, 2023
Authors
Deepkumar patel
Description

The inspiration behind creating the OYO Review Dataset for sentiment analysis was to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can provide valuable insights into the overall satisfaction of guests, identify areas for improvement, and assist in making data-driven decisions to enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil aimed to contribute to the field of sentiment analysis in the context of the hospitality industry. Sentiment analysis allows us to classify the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This analysis can help hotel management and stakeholders understand customer sentiments, identify common patterns, and address concerns or issues that may affect the reputation and customer satisfaction of OYO Hotels. The dataset provides a valuable resource for training and evaluating sentiment analysis models specifically tailored to the hospitality domain. Researchers, data scientists, and practitioners can utilize this dataset to develop and test various machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models. Overall, the goal of creating the OYO Review Dataset for sentiment analysis was to facilitate research and analysis in the area of customer sentiments and opinions in the hotel industry. By understanding the sentiment of hotel reviews, businesses can strive to improve their services, enhance customer satisfaction, and make data-driven decisions to elevate the overall guest experience.

Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/
Nikki Patel: https://www.linkedin.com/in/nikipatel9/
Nimil lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/
