100+ datasets found
  1. Amazon review data 2018

    • cseweb.ucsd.edu
    • nijianmo.github.io
    Cite
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:

      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:

      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata:

      • We have added transaction metadata for each review shown on the review page.
      • Added more detailed metadata of the product landing page.

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
  2. Amazon reviews Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Mar 21, 2023
    Cite
    Bright Data (2023). Amazon reviews Dataset [Dataset]. https://brightdata.com/products/datasets/amazon/reviews
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Mar 21, 2023
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Amazon reviews dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset can aid in understanding customer behavior, product performance, and market trends, empowering organizations to refine their product and marketing strategies. Access the entire dataset or tailor a subset to fit your requirements. Popular use cases include:

    • Product Performance Analysis: Analyze Amazon reviews to assess product performance, uncovering customer satisfaction levels, common issues, and highly praised features to inform product improvements and marketing messages.
    • Customer Behavior Insights: Gain insights into customer behavior, purchasing patterns, and preferences, enabling more personalized marketing and product recommendations.
    • Demand Forecasting: Leverage Amazon reviews to predict future product demand by analyzing historical review data and identifying trends, helping to optimize inventory management and sales strategies.

    Accessing and analyzing the Amazon reviews dataset supports market strategy optimization by leveraging insights to analyze key market trends and customer preferences, enhancing overall business decision-making.

  3. Bangladesh Airlines Sentiment Review Dataset

    • ieee-dataport.org
    Updated Oct 25, 2022
    Cite
    Khan Md Hasib (2022). Bangladesh Airlines Sentiment Review Dataset [Dataset]. https://ieee-dataport.org/documents/bangladesh-airlines-sentiment-review-dataset
    Dataset updated
    Oct 25, 2022
    Authors
    Khan Md Hasib
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Air travel is one of the most widely used modes of transit in our daily lives, so it is no wonder that more and more people share their experiences with airlines and airports through web-based surveys. This dataset is intended for topic modeling and sentiment analysis of postings on Skytrax (airlinequality.com) and Tripadvisor (tripadvisor.com), which attract a lot of interest and engagement from people who have used, or want to use, the airlines.

  4. Synthetic E-commerce Product Reviews Dataset

    • kaggle.com
    Updated May 1, 2025
    Cite
    Aryan Kumar (2025). Synthetic E-commerce Product Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/aryan208/synthetic-e-commerce-product-reviews-dataset/data
    Dataset updated
    May 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aryan Kumar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Synthetic E-commerce Product Reviews Dataset

    This dataset contains 4 million synthetic e-commerce product reviews across 8 popular categories, including:

    • Electronics
    • Home & Kitchen
    • Fashion
    • Beauty
    • Toys & Games
    • Books
    • Health & Personal Care
    • Sports & Outdoors

    Each row includes:

    • product_id: Synthetic product identifier
    • product_title: Product name (e.g., “Wireless Bluetooth Earbuds”)
    • category: One of 8 categories
    • review_text: Realistic user review
    • rating: Integer (1 to 5 stars)
    • sentiment: Sentiment derived from review text (Positive, Neutral, Negative)
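As a minimal sketch of how the row layout above might be checked after download, the following Python reads CSV text and flags rows whose rating or sentiment falls outside the documented values. The column names come from the description; the function name `validate_rows` and the two sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = ["product_id", "product_title", "category",
                    "review_text", "rating", "sentiment"]
VALID_SENTIMENTS = {"Positive", "Neutral", "Negative"}


def validate_rows(csv_text):
    """Split CSV rows into valid rows and (line_number, reason) problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    valid, problems = [], []
    # The header occupies line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            rating = int(row["rating"])
        except (TypeError, ValueError):
            problems.append((line_no, "rating is not an integer"))
            continue
        if not 1 <= rating <= 5:
            problems.append((line_no, "rating outside 1-5"))
        elif row["sentiment"] not in VALID_SENTIMENTS:
            problems.append((line_no, "unknown sentiment label"))
        else:
            row["rating"] = rating
            valid.append(row)
    return valid, problems


# Hypothetical sample rows, not taken from the real dataset.
SAMPLE = """product_id,product_title,category,review_text,rating,sentiment
P001,Wireless Bluetooth Earbuds,Electronics,Great sound for the price,5,Positive
P002,Yoga Mat,Sports & Outdoors,Arrived damaged,9,Negative
"""

valid, problems = validate_rows(SAMPLE)
print(len(valid), problems)
```

For the real 4-million-row file, the same checks would more likely be run in chunks from disk rather than from an in-memory string.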

    💡 Use Cases

    • NLP sentiment analysis
    • Product review summarization
    • E-commerce recommender systems
    • Fake review detection
    • Fine-tuning LLMs on product-related tasks

    📦 Format

    CSV format (UTF-8 encoded)

    🔄 License

    Public Domain – CC0 1.0 Universal

  5. 2005 - 2017 School Quality Review Ratings

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Nov 29, 2024
    Cite
    data.cityofnewyork.us (2024). 2005 - 2017 School Quality Review Ratings [Dataset]. https://catalog.data.gov/dataset/2005-2017-school-quality-review-ratings
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    Yearly data of Quality Review ratings from 2005 to 2017

  6. ChatGPT User Satisfaction Ratings

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). ChatGPT User Satisfaction Ratings [Dataset]. https://www.opendatabay.com/data/ai-ml/fd21bbf8-e5bf-4a34-93c2-57ae36ffbaf0
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset provides user reviews for ChatGPT, offering valuable qualitative feedback, satisfaction ratings, and submission dates. It captures a diverse array of user sentiments, from concise remarks to more detailed feedback. The ratings are provided on a scale of 1 to 5, indicating different levels of user satisfaction. The dataset spans several months, which allows for temporal analysis of sentiment trends, as each review includes a timestamp. This data is ideal for gaining insights into user characteristics and for improving application features and services.

    Columns

    • Review Id: A unique identifier for each individual review. This is formatted as a String, typically in a UUID structure.
    • Review: The actual text of the user's feedback, offering qualitative insights into their experience with the application. This is a String data type.
    • Ratings: User-submitted numerical ratings, ranging from 1 (lowest satisfaction) to 5 (highest satisfaction), indicating their level of contentment. This is an Integer data type.
    • Review Date: The timestamp when the review was originally submitted, recorded in MM/DD/YYYY HH:MM format, serving as a Date_Time data type.
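The column layout above lends itself to a simple temporal aggregation. The sketch below parses the documented MM/DD/YYYY HH:MM timestamp and averages ratings per month. The column names follow the description; the function name `monthly_mean_rating` and the three sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Timestamp layout as documented for the "Review Date" column.
DATE_FORMAT = "%m/%d/%Y %H:%M"


def monthly_mean_rating(rows):
    """Return {"YYYY-MM": mean rating} for an iterable of row dicts."""
    totals = defaultdict(lambda: [0, 0])  # month -> [rating sum, review count]
    for row in rows:
        timestamp = datetime.strptime(row["Review Date"], DATE_FORMAT)
        month = timestamp.strftime("%Y-%m")
        totals[month][0] += int(row["Ratings"])
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


# Hypothetical rows for illustration only.
rows = [
    {"Review Id": "a1", "Review": "Very helpful", "Ratings": "5",
     "Review Date": "07/25/2023 09:15"},
    {"Review Id": "b2", "Review": "Sometimes slow", "Ratings": "3",
     "Review Date": "07/30/2023 18:40"},
    {"Review Id": "c3", "Review": "Good overall", "Ratings": "4",
     "Review Date": "08/02/2023 11:05"},
]

means = monthly_mean_rating(rows)
print(means)
```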

    Distribution

    The dataset is provided as a free resource. While a sample file will be updated separately to the platform, the data quality is assessed as 5 out of 5, with the current version being 1.0. It was listed on 08/06/2025, with 1 view and 0 downloads recorded so far. The dataset contains approximately 193,154 unique reviews.

    Usage

    This dataset is particularly useful for various analytical applications, including:

    • Sentiment Analysis: Developing models to predict the emotional tone or sentiment conveyed in user reviews.
    • Customer Feedback Analysis: Extracting actionable insights that can inform and guide improvements to application features and services.
    • Review Classification: Building machine learning models to categorise user reviews, for instance, as positive or negative.
    • Data Visualisation: Creating visual representations of review patterns and trends.
    • Exploratory Data Analysis: Investigating the characteristics and underlying patterns within the review data.
    • Natural Language Processing (NLP): Applying NLP techniques to understand and process the textual feedback.
    • Text Mining: Discovering patterns and insights from the large collection of text reviews.
    • Time-Series Analysis: Examining how sentiment and ratings evolve over time based on review timestamps.

    Coverage

    This dataset comprises user reviews for ChatGPT collected from 25th July 2023 to 24th August 2024. The data collection is global, reflecting feedback from users worldwide.

    License

    CC0

    Who Can Use It

    This dataset is ideal for a range of users interested in understanding user feedback and sentiment, including:

    • Data Scientists and Machine Learning Engineers for building and training sentiment analysis and classification models.
    • Product Managers and App Developers to gain actionable insights for product improvement and feature development.
    • Market Researchers to understand user satisfaction and market perception of AI applications.
    • Academic Researchers studying human-computer interaction, natural language processing, or user behaviour.

    Dataset Name Suggestions

    • ChatGPT User Reviews
    • GPT User Review Sentiment Data
    • AI App User Feedback Dataset
    • ChatGPT User Satisfaction Ratings

    Attributes

    Original Data Source: ChatGPT Users Reviews

  7. Data from: Towards a Data-Driven Requirements Engineering Approach:...

    • paperswithcode.com
    • data.niaid.nih.gov
    Updated Oct 31, 2022
    Cite
    Jialiang Wei; Anne-Lise Courbis; Thomas Lambolais; Binbin Xu; Pierre Louis Bernard; Gérard Dray (2022). Towards a Data-Driven Requirements Engineering Approach: Automatic Analysis of User Reviews Dataset [Dataset]. https://paperswithcode.com/dataset/towards-a-data-driven-requirements
    Dataset updated
    Oct 31, 2022
    Authors
    Jialiang Wei; Anne-Lise Courbis; Thomas Lambolais; Binbin Xu; Pierre Louis Bernard; Gérard Dray
    Description

    6000 French user reviews from three applications on Google Play (Garmin Connect, Huawei Health, Samsung Health) are labelled manually. We selected four labels: rating, bug report, feature request and user experience.

    Ratings are simple text that expresses the overall evaluation of the app, including praise, criticism, or dissuasion. Bug reports describe the problems that users have met while using the app, such as loss of data, app crashes, connection errors, etc. Feature requests reflect users' demands for new functions, new content, a new interface, etc. In user experience, users describe their experience with the functionality of the app and how certain functions are helpful.

    As the following table of labelled user reviews shows, each review can belong to one or more categories.

    App             Total  Rating  Bug report  Feature request  User experience
    Garmin Connect   2000    1260         757              170              493
    Huawei Health    2000    1068         819              384              289
    Samsung Health   2000    1324         491              486              349
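The multi-label nature of the annotation can be checked with simple arithmetic: if every review carried exactly one label, the four label counts would sum to the total for each app. The sketch below assumes the table digits split into total, rating, bug report, feature request and user experience, in that order; the dictionary keys and the function name `average_labels_per_review` are hypothetical.

```python
# Counts as read from the table, assuming the column order
# Total, Rating, Bug report, Feature request, User experience.
LABEL_COUNTS = {
    "Garmin Connect": {"total": 2000, "rating": 1260, "bug_report": 757,
                       "feature_request": 170, "user_experience": 493},
    "Huawei Health": {"total": 2000, "rating": 1068, "bug_report": 819,
                      "feature_request": 384, "user_experience": 289},
    "Samsung Health": {"total": 2000, "rating": 1324, "bug_report": 491,
                       "feature_request": 486, "user_experience": 349},
}


def average_labels_per_review(counts):
    """Sum the label counts and divide by the number of reviews."""
    label_sum = sum(v for key, v in counts.items() if key != "total")
    return label_sum / counts["total"]


averages = {app: average_labels_per_review(c) for app, c in LABEL_COUNTS.items()}
for app, avg in averages.items():
    print(f"{app}: {avg:.3f} labels per review")
```

A value above 1 for an app means that at least some of its reviews carry more than one label.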
  8. 1. Amazon Fitbit Review data set

    • scidb.cn
    Updated Sep 2, 2022
    Cite
    linweizhen (2022). 1. Amazon Fitbit Review data set [Dataset]. http://doi.org/10.57760/sciencedb.j00133.00042
    Dataset updated
    Sep 2, 2022
    Dataset provided by
    Science Data Bank
    Authors
    linweizhen
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    The Amazon review data set was crawled with the Instant Data Scraper crawler.

  9. State Review Framework Manager Database

    • catalog.data.gov
    Updated Jan 24, 2022
    Cite
    OECA, Office of Compliance (2022). State Review Framework Manager Database [Dataset]. https://catalog.data.gov/dataset/state-review-framework-manager-database
    Dataset updated
    Jan 24, 2022
    Description

    The State Review Framework is a primary means by which EPA conducts oversight of three core federal statutes: Clean Air Act, Clean Water Act, and Resource Conservation and Recovery Act. The routine, nationwide review provides a consistent process for evaluating the performance of state, local and EPA compliance and enforcement programs. The overarching goal of the reviews is to ensure fair and consistent enforcement necessary to protect human health and the environment.

  10. Curated Email-Based Code Reviews Datasets

    • figshare.com
    bin
    Updated Feb 7, 2024
    Cite
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam (2024). Curated Email-Based Code Reviews Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.24679656.v1
    Available download formats: bin
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mingzhao Liang; Ping Charoenwet; Patanamon Thongtanunam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review tool, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through emails. However, due to the nature of unstructured email-based data, it can be challenging to mine email-based code reviews, hindering researchers from delving into the code review practice of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping a submitted patch and its revised versions and grouping email aliases. Our datasets include a total of 4.2M patches with 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly delve into an email-based code review practice of large OSS projects without additional effort in data collection and curation.

  11. Overwatch 2 - Steam Review Dataset

    • opendatabay.com
    Updated Jun 27, 2025
    Cite
    Datasimple (2025). Overwatch 2 - Steam Review Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/2acc096a-d2df-4630-a0aa-ebda8024d61c
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset comprises user reviews and associated data for Overwatch 2, a popular video game title, sourced from the official Steam store. Overwatch 2 is the highly anticipated sequel to the original Overwatch game, developed by Blizzard Entertainment. As we know, it's renowned for its unfavorable reviews on Steam.

    I don't scrape many reviews because it would take a wicked amount of time and resources to do so.

    Disclaimer: All data belongs to Valve Corporation and is not mine.

    License

    CC0

    Original Data Source: Overwatch 2 - Steam Review Dataset

  12. Data from: Paper Reviews Data Set

    • kaggle.com
    Updated Jan 22, 2018
    Cite
    Felipe Navarro (2018). Paper Reviews Data Set [Dataset]. https://www.kaggle.com/fnbalves/paper-reviews-data-set/code
    Dataset updated
    Jan 22, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Felipe Navarro
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Felipe Navarro

    Released under Database: Open Database, Contents: Database Contents


  13. Playstore Review Analytics Data

    • opendatabay.com
    Updated Jul 7, 2025
    Cite
    Datasimple (2025). Playstore Review Analytics Data [Dataset]. https://www.opendatabay.com/data/ai-ml/a62f86b2-2039-45fa-8758-a78fbbcedf6a
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Reviews & Ratings
    Description

    This dataset is a collection of user reviews for various Google Apps available on the Play Store. It provides detailed insights into user feedback, ratings, and engagement with different applications. The dataset's primary purpose is to offer a rich resource for understanding user sentiment, identifying app performance issues, and tracking user satisfaction over time. It is a valuable asset for analytics and natural language processing tasks related to app reviews.

    Columns

    • reviewId: A unique identifier for each individual user review.
    • userName: The name of the user who submitted the review.
    • userImage: The URL pointing to the user's profile image.
    • content: The textual review provided by the user about the app.
    • score: The numerical rating given by the user for the app, typically on a scale of 1 to 5.
    • thumbsUpCount: The total number of likes or "thumbs up" received by that specific review.
    • reviewCreatedVersion: The version of the app that was being reviewed at the time the review was created.
    • at: The date and time when the user's review was created.
    • replyContent: The textual content of the reply provided by the app developer to the user's review. A significant portion of reviews do not have a developer reply.
    • repliedAt: The date and time when the developer's reply was issued. Many entries in this column are null, indicating no developer response.
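Because replyContent and repliedAt are often null, code that consumes these columns needs to treat them as optional. The sketch below models a subset of the listed columns as a dataclass and computes the share of reviews with a developer reply. The field names mirror the column names; the class name `PlayStoreReview`, the function `reply_rate`, and the sample reviews are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayStoreReview:
    """Subset of the documented columns; reply fields may be missing."""
    reviewId: str
    content: str
    score: int
    thumbsUpCount: int
    replyContent: Optional[str] = None
    repliedAt: Optional[str] = None


def reply_rate(reviews):
    """Fraction of reviews that have a non-empty developer reply."""
    if not reviews:
        return 0.0
    replied = sum(1 for review in reviews if review.replyContent)
    return replied / len(reviews)


# Hypothetical reviews for illustration only.
reviews = [
    PlayStoreReview("r1", "Crashes on startup", 1, 12,
                    replyContent="Sorry, a fix is on the way.",
                    repliedAt="2021-02-04 10:00:00"),
    PlayStoreReview("r2", "Works fine", 4, 0),
    PlayStoreReview("r3", "Love it", 5, 3),
    PlayStoreReview("r4", "Too many ads", 2, 7),
]

rate = reply_rate(reviews)
print(f"{rate:.0%} of reviews have a developer reply")
```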

    Distribution

    The dataset contains over 90,000 app reviews. The score column shows a distribution across ratings, with substantial counts for scores like 1.00-1.20, 2.00-2.20, 3.00-3.20, 4.00-4.20, and 4.80-5.00. For thumbsUpCount, the majority of reviews have a relatively low number of likes (0-720), but there are instances with significantly higher counts, reaching up to over 14,000 likes. The reviewCreatedVersion column shows a variety of app versions, with some being more frequently reviewed than others. Review creation dates span a period from April 2014 to February 2021, with a notable increase in review volume towards the later years, particularly between May 2020 and February 2021.

    Usage

    This dataset is ideal for:

    • Sentiment analysis of app reviews.
    • Natural Language Processing (NLP) tasks, such as topic modelling, text classification, and entity recognition.
    • App performance monitoring and identifying user pain points.
    • Market research on user satisfaction and trends in app usage.
    • Developing AI and Machine Learning models for predicting app ratings or automatically classifying feedback.

    Coverage

    The dataset offers global coverage for app reviews. The time range for review creation spans from 10th April 2014 to 4th February 2021. While developer replies are included, the data on repliedAt primarily indicates a single latest date (4th February 2021) with the majority being null, suggesting that developer reply timestamps are not as broadly distributed across the dataset as review creation times.

    License

    CC0

    Who Can Use It

    • App Developers: To understand user feedback, identify bugs, and improve app features.
    • Data Analysts: For trends analysis, user behaviour insights, and reporting.
    • Researchers: In fields like computer science, internet studies, and data analytics for academic studies on online reviews.
    • Machine Learning Engineers: To train models for sentiment analysis, user support automation, or content moderation.
    • Product Managers: To gather insights for product iteration and strategic planning.

    Dataset Name Suggestions

    • Google Play Store App Reviews
    • Play Store User Feedback
    • Google Apps Ratings and Reviews
    • Mobile App Review Data
    • Playstore Review Analytics Data

    Attributes

    Original Data Source: Google Apps Playstore Reviews

  14. IMDB review data

    • kaggle.com
    Updated May 10, 2024
    Cite
    Muaaz9922 (2024). IMDB review data [Dataset]. https://www.kaggle.com/muaaz9922/imdb-review-data/discussion
    Dataset updated
    May 10, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muaaz9922
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Muaz Tahir

    Released under CC0: Public Domain


  15. Beginner NLP Sentiment Dataset

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Beginner NLP Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/8a243c80-023c-42ed-8519-63f8a4a353ff
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset contains 1,000 text reviews gathered from various restaurants, with each review clearly marked as either positive or negative. It has been created with beginners in mind, particularly for those delving into the fields of sentiment analysis and natural language processing (NLP). The dataset serves as an excellent starting point for understanding how to process and classify textual data.

    Columns

    • Unnamed: 0: An index or identifier for each individual review. This column can generally be disregarded for analytical purposes.
    • sentence: This column holds the actual text content of the restaurant review itself.
    • label: This indicates the sentiment associated with the review. A value of 1 signifies a positive review, while 0 denotes a negative review. There are 500 positive and 500 negative reviews within the dataset.

    Distribution

    The dataset is provided as a CSV (Comma-Separated Values) file, named Beginner_Reviews_dataset.csv. It has a file size of approximately 66.84 kB. The dataset consists of 1,000 records or rows, with each row representing a single restaurant review and its corresponding sentiment label.
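A first step with a file of this shape is usually to drop the index column and confirm the class balance. The sketch below does this on an in-memory string; for the real file, the text would be read from Beginner_Reviews_dataset.csv instead. The header spelling "Unnamed: 0" is assumed to appear literally in the file, and the function name `load_reviews` and the four sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def load_reviews(csv_text):
    """Return (sentence, label) pairs, ignoring the index column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["sentence"], int(row["label"])) for row in reader]


# Hypothetical rows in the documented layout (1 = positive, 0 = negative).
SAMPLE = """Unnamed: 0,sentence,label
0,The food was wonderful.,1
1,Service was painfully slow.,0
2,Best pasta in town.,1
3,I would not go back.,0
"""

reviews = load_reviews(SAMPLE)
balance = Counter(label for _, label in reviews)
print(balance)
```

On the full dataset the same counter would be expected to report 500 reviews for each label, as stated in the column description.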

    Usage

    This dataset is designed to be user-friendly for those new to data science. It can be utilised to train and evaluate sentiment analysis models, making it ideal for binary classification tasks. It is well-suited for educational purposes, assisting learners in developing skills in text preprocessing, feature extraction, and various classification algorithms within the NLP domain.

    Coverage

    The reviews included in this dataset originate from various restaurants, implying a global scope rather than a specific geographic region. There is no specific time range for the reviews themselves detailed in the provided information, nor any particular demographic focus beyond being restaurant reviews.

    License

    CC0

    Who Can Use It

    This dataset is primarily intended for beginners in sentiment analysis and natural language processing. It is suitable for:

    • Students learning text analytics and machine learning.
    • New practitioners looking for simple datasets to practise building classification models.
    • Anyone interested in educational projects involving text data and sentiment classification.

    Dataset Name Suggestions

    • Restaurant Review Sentiment Analysis
    • Beginner NLP Sentiment Dataset
    • Food Review Opinion Data
    • Simple Restaurant Sentiment Reviews
    • Binary Restaurant Review Sentiment

    Attributes

    Original Data Source: ❤️ vs 😡: Sentiment Analysis 📝

  16. Language Generation Dataset: 200M Samples

    • kaggle.com
    zip
    Updated Sep 7, 2019
    Cite
    Abhishek Chatterjee (2019). Language Generation Dataset: 200M Samples [Dataset]. https://www.kaggle.com/datasets/imdeepmind/language-generation-dataset-200m-samples
    Available download formats: zip (3416608411 bytes)
    Dataset updated
    Sep 7, 2019
    Authors
    Abhishek Chatterjee
    Description

    Context

    Amazon Customer Reviews Dataset is a dataset of user-generated product reviews on the shopping website Amazon. It contains over 130 million product reviews.

    This dataset contains a tiny fraction of that dataset processed and prepared specifically for language generation.

    To learn how the dataset was prepared, please check the GitHub repository for this dataset: https://github.com/imdeepmind/AmazonReview-LanguageGenerationDataset

    Content

    The dataset is stored in an SQLite database. The database contains one table called reviews. This table contains two columns sequence and next.

    The sequence column contains sequences of characters. In this dataset, each sequence is 40 characters long.

    The next column contains the next character after the sequence.

    There are about 200 million samples in the dataset.
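To make the table layout concrete, the sketch below builds a tiny in-memory SQLite database with the same table and column names, using a sliding window over one hypothetical review. The window step of one character is an assumption; the actual preparation code lives in the GitHub repository mentioned above and may differ. The helper names `make_samples` and `build_database` are also hypothetical.

```python
import sqlite3

SEQ_LEN = 40  # sequence length stated in the dataset description


def make_samples(text, seq_len=SEQ_LEN):
    """Yield (sequence, next character) pairs with a step of one character."""
    for start in range(len(text) - seq_len):
        yield text[start:start + seq_len], text[start + seq_len]


def build_database(text):
    """Create an in-memory database with a `reviews` table of samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE reviews ("sequence" TEXT, "next" TEXT)')
    conn.executemany("INSERT INTO reviews VALUES (?, ?)", make_samples(text))
    conn.commit()
    return conn


# Hypothetical review text for illustration only.
text = "This product works exactly as described and arrived on time."
conn = build_database(text)

count = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
first_seq, first_next = conn.execute(
    'SELECT "sequence", "next" FROM reviews ORDER BY rowid LIMIT 1'
).fetchone()
print(count, repr(first_seq), repr(first_next))
```

Reading the published database would follow the same query pattern, with the path of the downloaded SQLite file passed to `sqlite3.connect` in place of ":memory:".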

    Acknowledgements

    Thanks to Amazon for making this awesome dataset. Here is the link for the dataset: https://s3.amazonaws.com/amazon-reviews-pds/readme.html

    Inspiration

    This dataset can be used for Language Generation. As it contains 200 million samples, complex Deep Learning models can be trained on this data.

  17. School Nutrition Programs - Administrative Review Summary - Program Year...

    • catalog.data.gov
    • data.texas.gov
    Updated Dec 25, 2024
    Cite
    data.austintexas.gov (2024). School Nutrition Programs - Administrative Review Summary - Program Year 2016 - 2017 [Dataset]. https://catalog.data.gov/dataset/school-nutrition-programs-administrative-review-summary-program-year-2016-2017
    Dataset updated
    Dec 25, 2024
    Dataset provided by
    data.austintexas.gov
    Description

    About the Dataset

    This dataset contains a list of School Food Authorities (SFAs) that have recently undergone Administrative Reviews with TDA, including:

    • Types of school nutrition program operated
    • Special provision programs utilized
    • Whether or not there were Findings

    This report can be found on SquareMeals Compliance for NSLP for the current program year and will be posted to ODP within three months after the end of the program year.

  18. The White paper on Telecommunication markets review - Dataset - Open...

    • opendata.gov.jo
    Updated Dec 19, 2024
    Cite
    (2024). The White paper on Telecommunication markets review - Dataset - Open Government Data [Dataset]. https://opendata.gov.jo/dataset/the-white-paper-on-telecommunication-markets-review-3550-2019
    Dataset updated
    Dec 19, 2024
    Description

    The White paper on Telecommunication markets review

  19. autotrain-data-imdb-sentiment-analysis

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Feng Peng (2023). autotrain-data-imdb-sentiment-analysis [Dataset]. https://huggingface.co/datasets/linktimecloud/autotrain-data-imdb-sentiment-analysis
    Dataset updated
    Aug 8, 2023
    Authors
    Feng Peng
    Description

    AutoTrain Dataset for project: imdb-sentiment-analysis

      Dataset Description
    

    This dataset has been automatically processed by AutoTrain for project imdb-sentiment-analysis.

      Languages
    

    The BCP-47 code for the dataset's language is en.

      Dataset Structure

      Data Instances

    A sample from this dataset looks as follows: [ { "text": "Me neither, but this flick is unfortunately one of those movies that are too bad to be good and… See the full description on the dataset page: https://huggingface.co/datasets/linktimecloud/autotrain-data-imdb-sentiment-analysis.

  20. Rotten Tomatoes Reviews Dataset

    • kaggle.com
    • huggingface.co
    Updated Jan 12, 2022
    Cite
    Emre Baloğlu (2022). Rotten Tomatoes Reviews Dataset [Dataset]. https://www.kaggle.com/mrbaloglu/rotten-tomatoes-reviews-dataset/discussion
    Dataset updated
    Jan 12, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Emre Baloğlu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words.

    Acknowledgements

    This dataset was converted to a CSV file using the data described in the paper of Pang and Lee (2005), which is frequently used as a benchmark for text classification tasks. Also, special thanks to Product School for enabling the photo used in the banner for public use.

    References

    Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

Amazon review data 2018

83 scholarly articles cite this dataset (View in Google Scholar)