98 datasets found

SYNERGY - Open machine learning dataset on study selection in systematic...
dataverse.nl
csv, json, txt, zip
Updated Apr 24, 2023
Cite
Jonathan De Bruin; Yongchao Ma; Gerbrich Ferdinands; Jelle Teijema; Rens Van de Schoot (2023). SYNERGY - Open machine learning dataset on study selection in systematic reviews [Dataset]. http://doi.org/10.34894/HE6NAQ
Explore at:
csv, json, txt, zip (available download formats)
Unique identifier
https://doi.org/10.34894/HE6NAQ
Dataset updated
Apr 24, 2023
Dataset provided by
DataverseNL
Authors
Jonathan De Bruin; Yongchao Ma; Gerbrich Ferdinands; Jelle Teijema; Rens Van de Schoot
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
SYNERGY is a free and open dataset on study selection in systematic reviews, comprising 169,288 academic works from 26 systematic reviews. Only 2,834 (1.67%) of the academic works in the binary classified dataset are included in the systematic reviews. This makes the SYNERGY dataset a unique dataset for the development of information retrieval algorithms, especially for sparse labels. Because many variables are available per record (i.e. titles, abstracts, authors, references, topics), this dataset is useful for researchers in NLP, machine learning, network analysis, and more. In total, the dataset contains 82,668,134 trainable data points. The easiest way to get the SYNERGY dataset is via the synergy-dataset Python package. See https://github.com/asreview/synergy-dataset for all information.
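The label sparsity quoted in the description can be checked with a few lines of Python. This is only an illustrative sketch: the two counts come from the description, while the variable names and the "balanced" inverse-frequency weighting are assumptions for illustration and are not part of SYNERGY or its Python package.

```python
# Counts quoted in the SYNERGY description.
total_records = 169_288
included = 2_834
excluded = total_records - included

# Share of records that were included in the systematic reviews.
inclusion_rate = included / total_records
print(f"Inclusion rate: {inclusion_rate:.2%}")

# Assumed heuristic (not from SYNERGY): weight each class by
# total / (number_of_classes * class_count), so the rare class counts more.
class_weights = {
    "included": total_records / (2 * included),
    "excluded": total_records / (2 * excluded),
}
print(class_weights)
```

With labels this sparse, the rare "included" class receives a much larger weight than the "excluded" class under this heuristic.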
Goodreads Book Reviews
cseweb.ucsd.edu
json
Cite
UCSD CSE Research Project, Goodreads Book Reviews [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Explore at:
json (available download formats)
Dataset authored and provided by
UCSD CSE Research Project
Description
These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, ranging from adding to a shelf, to rating, to reading.

Metadata includes

reviews

add-to-shelf, read, review actions

book attributes: title, isbn

graph of similar books

Basic Statistics:

Items: 1,561,465

Users: 808,749

Interactions: 225,394,930
Booking dot com reviews datasets
crawlfeeds.com
csv, zip
Updated Jun 15, 2025
Cite
Crawl Feeds (2025). Booking dot com reviews datasets [Dataset]. https://crawlfeeds.com/datasets/booking-dot-com-reviews-datasets
Explore at:
csv, zip (available download formats)
Dataset updated
Jun 15, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
The Booking.com Reviews Dataset is a comprehensive collection of user-generated reviews for hotels, hostels, bed & breakfasts, and other accommodations listed on Booking.com. This dataset provides detailed information on customer reviews, including ratings, review text, review dates, customer demographics, and more. It is a valuable resource for analyzing customer sentiment, service quality, and overall guest experiences across different types of accommodations worldwide.

Key Features:

Review Data: Includes detailed customer reviews with both positive and negative feedback, providing insights into customer experiences and satisfaction levels.

Ratings: Features individual ratings for various aspects of the accommodations, such as cleanliness, location, service, value for money, and overall satisfaction.

Review Dates: Provides the dates of each review, enabling trend analysis over time.

Accommodation Details: Includes information about the accommodations being reviewed, such as name and location.

Language Support: Reviews are available in multiple languages, reflecting the diverse user base of Booking.com.

Use Cases:

Sentiment Analysis: Ideal for businesses and researchers conducting sentiment analysis to understand customer opinions and trends in the hospitality industry.

Market Research: Useful for market research and competitive analysis, identifying strengths and weaknesses of different accommodation types and regions.

Machine Learning: Beneficial for developing machine learning models for natural language processing, sentiment classification, and recommendation systems.

Customer Experience Improvement: Helps hotel managers and owners understand customer feedback to improve services and guest experiences.

Academic Research: Suitable for academic research in hospitality management, consumer behavior, data science, and artificial intelligence.

Dataset Format:

The dataset is available in CSV format making it easy to use for data analysis, machine learning, and application development.

Amazon Customer Review Data
zenodo.org
pdf
Updated Jul 22, 2024
Cite
Akash Shashikant Vaykar; Abhishek Kaushik (2024). Amazon Customer Review Data [Dataset]. http://doi.org/10.5281/zenodo.3549704
Explore at:
pdf (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.3549704
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Akash Shashikant Vaykar; Abhishek Kaushik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset: Amazon Customer Review Data for sentiment analysis

Size: approx. 60,889

Format: .CSV

Period: 2013 to 2019

Categories: 5 (Mobiles, Smart TV, Books, Mobile Accessories, Refrigerator)

Unique_ID: Customized (primary key)

Review_Header: User's comment in a few words

Review_Text: User's comment in detail (3-4 lines)

Rating: 1 = Very Low, 2 = Low, 3 = Avg, 4 = Good, 5 = Excellent

Posting Period: 2013 to 2019

Own_Rating: 1-2 = Negative, 3 = Neutral, 4-5 = Positive
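The Own_Rating rule (1-2 negative, 3 neutral, 4-5 positive) can be written as a small function. This is a minimal sketch: the function name `own_rating` and the exact label strings are assumptions for illustration, since the description does not show how the column was actually derived.

```python
def own_rating(rating: int) -> str:
    """Map a 1-5 star rating to the sentiment label described for Own_Rating."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"


# Apply the mapping to every possible rating.
labels = {rating: own_rating(rating) for rating in range(1, 6)}
print(labels)
```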
Amazon Product Reviews
kaggle.com
Updated Nov 26, 2023
Cite
The Devastator (2023). Amazon Product Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/amazon-product-reviews/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 26, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Amazon Product Reviews

18 Years of Customer Ratings and Experiences

By Huggingface Hub [source]

About this dataset

The Amazon Reviews Polarity Dataset discloses eighteen years of customers' ratings and reviews from Amazon.com, offering an unparalleled trove of insight and knowledge. Drawing from the immense pool of over 35 million customer reviews, this dataset presents a broad spectrum of customer opinions on products they have bought or used. This invaluable data is a gold mine for improving products and services as it contains comprehensive information regarding customers' experiences with a product including ratings, titles, and plaintext content. At the same time, this dataset contains both customer-specific data along with product information which encourages deep analytics that could lead to great advances in providing tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Use Amazon Reviews Polarity to gain deeper insights into what your customers want - explore now!

How to use the dataset

1 Analyze customer ratings to identify trends: Look at how many customers have rated the same product or service with the same score (e.g., 4 stars). Examining the common sentiment across those reviews shows what customers like or dislike about it, which can help you decide which features of your products or services to emphasize in order to boost sales and satisfaction rates.

2 Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product or service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search can quickly reveal the general topics discussed across multiple reviews, allowing you to pinpoint areas that may need improvement for particular items within your lines of business.

3 Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific related to your product or service, such as a negative response to a feature that was introduced, proved unpopular, and was removed shortly afterwards. This can save time and money by identifying issues before they become widespread concerns among larger groups of customers.

4 Visualize sentiment data over time: Visualizations such as bar graphs can help identify trends across different categories more quickly than raw numbers alone. Combining numeric values with color differences between scores makes anomalies easier to spot, which speeds up working out why certain spikes occurred while other values stayed stable (or vice versa) when comparing similar data points in time-series visualizations.

Research Ideas

Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.

Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.

Training a machine learning model to accurately predict customers’ ratings on new products they have not yet tried, and using those predictions to guide further product development.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:--------------|:-------------------------------------------------------------------|
| label | The sentiment of the review, either positive or negative. (String) |
| title | The title of the review. (String) ...
Apple mobile phones reviews
crawlfeeds.com
json, zip
Updated Apr 29, 2025
Cite
Crawl Feeds (2025). Apple mobile phones reviews [Dataset]. https://crawlfeeds.com/datasets/apple-mobile-phones-reviews
Explore at:
zip, json (available download formats)
Dataset updated
Apr 29, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Structured dataset of Apple mobile phone reviews. This small dataset is ideal for NLP and for testing machine learning algorithms.

Get larger datasets from our resources.

Extracted from Amazon.

Includes data for Apple mobile phones only.

Reach out to us for larger datasets.
Consumer Review of Clothing Product
data.mendeley.com
Updated Feb 19, 2024
Cite
Nadhif Girawan (2024). Consumer Review of Clothing Product [Dataset]. http://doi.org/10.17632/pg3s4hw68k.3
Explore at:
Unique identifier
https://doi.org/10.17632/pg3s4hw68k.3
Dataset updated
Feb 19, 2024
Authors
Nadhif Girawan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is collected on our own from various sources. This dataset comprises a comprehensive collection of reviews pertaining to clothing products and serves as a valuable resource for multilabel classification research. Each data entry is meticulously annotated with relevant labels, allowing researchers to explore various dimensions of the clothing products being reviewed. The dataset offers a rich diversity of perspectives and opinions, enabling the development and evaluation of robust classification models that can accurately predict multiple aspects of a given clothing item. With its focus on multilabel classification, this data contributes significantly to advancing the understanding and application of machine learning algorithms in the fashion industry.
drug-reviews
huggingface.co
Cite
Mouwiya S. A. Al-Qaisieh, drug-reviews [Dataset]. https://huggingface.co/datasets/Mouwiya/drug-reviews
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
Mouwiya S. A. Al-Qaisieh
License
https://choosealicense.com/licenses/odbl/
Description
Dataset Details

1. Dataset Loading:

Initially, we load the Drug Review Dataset from the UC Irvine Machine Learning Repository. This dataset contains patient reviews of different drugs, along with the medical condition being treated and the patients' satisfaction ratings.

2. Data Preprocessing:

The dataset is preprocessed to ensure data integrity and consistency. We handle missing values and ensure that each patient ID is unique across the dataset.

3.Text… See the full description on the dataset page: https://huggingface.co/datasets/Mouwiya/drug-reviews.
IMDb Movie Reviews Dataset
ieee-dataport.org
Updated Aug 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aditya Pal (2022). IMDb Movie Reviews Dataset [Dataset]. https://ieee-dataport.org/open-access/imdb-movie-reviews-dataset
Explore at:
Dataset updated
Aug 2, 2022
Authors
Aditya Pal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
R
SaudiShopInsights Dataset: Saudi Customer Reviews in Clothes and Electronics...
ieee-dataport.org
Updated Dec 19, 2023
Cite
razan alrefaey (2023). SaudiShopInsights Dataset: Saudi Customer Reviews in Clothes and Electronics [Dataset]. https://ieee-dataport.org/documents/saudishopinsights-dataset-saudi-customer-reviews-clothes-and-electronics
Explore at:
Dataset updated
Dec 19, 2023
Authors
razan alrefaey
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Saudi Arabia
Description
natural language processing
amazon_us_reviews
huggingface.co
tensorflow.org
Updated Jun 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Polina Kazakova (2023). amazon_us_reviews [Dataset]. https://huggingface.co/datasets/polinaeterna/amazon_us_reviews
Explore at:
Dataset updated
Jun 30, 2023
Authors
Polina Kazakova
License
https://choosealicense.com/licenses/other/
Description
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

More than 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

Each Dataset contains the following columns:

marketplace: 2 letter country code of the marketplace where the review was written.

customer_id: Random identifier that can be used to aggregate reviews written by a single author.

review_id: The unique ID of the review.

product_id: The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.

product_parent: Random identifier that can be used to aggregate reviews for the same product.

product_title: Title of the product.

product_category: Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).

star_rating: The 1-5 star rating of the review.

helpful_votes: Number of helpful votes.

total_votes: Number of total votes the review received.

vine: Review was written as part of the Vine program.

verified_purchase: The review is on a verified purchase.

review_headline: The title of the review.

review_body: The review text.

review_date: The date the review was written.
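Because the files are tab delimited with no quote or escape characters, a reader should treat quotation marks as ordinary text. The sketch below shows one way to do that with Python's standard `csv` module. The column names are taken from the list above; the sample row is entirely made up for illustration, and the `read_reviews` helper is a hypothetical name rather than part of any official loader.

```python
import csv
import io

COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id", "product_parent",
    "product_title", "product_category", "star_rating", "helpful_votes",
    "total_votes", "vine", "verified_purchase", "review_headline",
    "review_body", "review_date",
]

# Made-up sample standing in for a real TSV file (header line plus one review).
sample_row = [
    "US", "12345678", "R1EXAMPLE", "B00EXAMPLE", "987654321",
    'Example "travel" kettle', "Kitchen", "5", "3", "4", "N", "Y",
    "Works well", 'Said "great" on the box and it is.', "2015-08-31",
]
sample = "\t".join(COLUMNS) + "\n" + "\t".join(sample_row) + "\n"


def read_reviews(handle):
    """Yield one dict per review, converting the numeric columns to int."""
    # QUOTE_NONE keeps double quotes as literal characters, matching the
    # "no quote and escape characters" note in the description.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for key in ("star_rating", "helpful_votes", "total_votes"):
            row[key] = int(row[key])
        yield row


rows = list(read_reviews(io.StringIO(sample)))
print(rows[0]["star_rating"], rows[0]["review_body"])
```

For a real file, the same function could be given an open file handle instead of the in-memory sample.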
Language Generation Dataset: 200M Samples
kaggle.com
zip
Updated Sep 7, 2019
Cite
Abhishek Chatterjee (2019). Language Generation Dataset: 200M Samples [Dataset]. https://www.kaggle.com/datasets/imdeepmind/language-generation-dataset-200m-samples
Explore at:
zip, 3416608411 bytes (available download formats)
Dataset updated
Sep 7, 2019
Authors
Abhishek Chatterjee
Description
Context

Amazon Customer Reviews Dataset is a dataset of user-generated product reviews on the shopping website Amazon. It contains over 130 million product reviews.

This dataset contains a tiny fraction of that dataset processed and prepared specifically for language generation.

To learn how the dataset was prepared, please check the GitHub repository for this dataset: https://github.com/imdeepmind/AmazonReview-LanguageGenerationDataset

Content

The dataset is stored in an SQLite database. The database contains one table called reviews. This table contains two columns, sequence and next.

The sequence column contains sequences of characters. In this dataset, each sequence is 40 characters long.

The next column contains the next character after the sequence.

There are about 200 million samples in the dataset.
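The layout described above (a reviews table with a 40-character sequence column and a next column) can be sketched with Python's built-in `sqlite3` module. The example builds a tiny in-memory stand-in rather than opening the real file. The sample sentence, the one-character sliding window, and the column types are assumptions for illustration, since the description does not say how the real samples were generated.

```python
import sqlite3

SEQ_LEN = 40  # sequence length stated in the description

# In-memory stand-in for the real database file. The table and column names
# follow the description; the TEXT column types are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE reviews (sequence TEXT, "next" TEXT)')

# Made-up review text, cut into (40-character sequence, next character) pairs.
text = "This product works exactly as described and I like it."
samples = [
    (text[i:i + SEQ_LEN], text[i + SEQ_LEN])
    for i in range(len(text) - SEQ_LEN)
]
conn.executemany("INSERT INTO reviews VALUES (?, ?)", samples)

# Read a few samples back, as one might when batching training data.
rows = conn.execute(
    'SELECT sequence, "next" FROM reviews ORDER BY rowid LIMIT 3'
).fetchall()
for sequence, next_char in rows:
    print(repr(sequence), "->", repr(next_char))

conn.close()
```

With about 200 million rows in the real database, reading in limited batches like this is likely more practical than loading the whole table at once.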

Acknowledgements

Thanks to Amazon for making this awesome dataset. Here is the link for the dataset: https://s3.amazonaws.com/amazon-reviews-pds/readme.html

Inspiration

This dataset can be used for Language Generation. As it contains 200 million samples, complex Deep Learning models can be trained on this data.
Mobile App Logo and User Reviews Recommendation
data.mendeley.com
Updated Aug 15, 2024
+ more versions
Cite
Iconix Sas (2024). Mobile App Logo and User Reviews Recommendation [Dataset]. http://doi.org/10.17632/v4ndw78f9b.1
Explore at:
Unique identifier
https://doi.org/10.17632/v4ndw78f9b.1
Dataset updated
Aug 15, 2024
Authors
Iconix Sas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset offers thorough app metadata from the Google Play Store and a sentiment analysis of user reviews for the app. The first dataset (App_Sentiment_Analysis.csv) provides insights into user views and experiences via translated review texts, sentiment classifications, and numerical ratings for sentiment polarity and subjectivity. The second dataset (Review.csv) covers various program parameters, including ratings, review counts, sizes, installation counts, content ratings, genres, and more. When combined, these datasets allow for an in-depth examination of user reviews and app performance, which supports tactics for app suggestion and enhancement. The dataset also uses app logo images for recommendations.
Dataset for Machine Learning Assisted Citation Screening for Systematic...
data.niaid.nih.gov
Updated Dec 22, 2023
Cite
Dhrangadhariya, Anjani (2023). Dataset for Machine Learning Assisted Citation Screening for Systematic Reviews [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10423426
Explore at:
Dataset updated
Dec 22, 2023
Dataset provided by
Dhrangadhariya, Anjani
Müller, Henning
Hilfiker, Roger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The work "Machine Learning Assisted Citation Screening for Systematic Reviews" explored the problem of automating citation screening with machine learning (ML), with the aim of accelerating the process of generating systematic reviews. The manual process of citation screening involves two reviewers manually screening the searched studies using predefined inclusion criteria. If a study passes the inclusion criteria, it is included for further analysis; otherwise it is excluded. As is apparent from the manual screening process, the work treated citation screening as a binary classification problem in which any ML classifier could be trained to separate the searched studies into these two classes (include and exclude).

A physiotherapy citation screening dataset was used to test automation approaches and the dataset includes the studies identified for citation screening in an update to the systematic review by Hilfiker et al. The dataset included titles and abstracts (citations) from 31,279 (deduplicated: 25,540) studies identified during the search phase of this SR. These studies were already manually assessed for relevance and labelled by two reviewers into two mutually exclusive labels. The uploaded file consists of 25,540 data samples, with each data sample separated by a new line. It is a tab separated file and the data in it is structured as shown below. This dataset was manually labelled into include and exclude by Hilfiker et al.

Title PMID Abstract Class MeSH terms (separated by a pipe)

Structured exercise improves physical functioning in women with stages I and II breast cancer: results of a randomized controlled trial.
11157015 Abstract PURPOSE: Self-directed and supervised exercise were compared with usual care in a clinical trial designed to evaluate the effect of structured exercise on physical functioning and other dimensions of health-related quality of life in women with stages I and II breast cancer. PATIENTS AND METHODS: One hundred twenty-three women with stages I and II breast cancer completed baseline evaluations of generic and disease- and site-specific health-related quality of life, aerobic capacity, and body weight. Participants were randomly allocated to one of three intervention groups: usual care (control group), self-directed exercise, or supervised exercise. Quality of life, aerobic capacity, and body weight measures were repeated at 26 weeks... include or exclude Clinical Trial | Comparative Study | Randomized Controlled Trial | Research Support, Non-U.S. Gov't | Antineoplastic Combined Chemotherapy Protocols | Breast Neoplasms | Breast Neoplasms | Breast Neoplasms | Chemotherapy, Adjuvant | Exercise | Female | Humans | Middle Aged | Neoplasm Staging | Quality of Life | Radiotherapy, Adjuvant
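The record layout shown above (tab-separated fields, with MeSH terms separated by a pipe) could be parsed along the following lines. This is a sketch under stated assumptions: the `Citation` class and `parse_line` function are hypothetical names, the abstract is a placeholder rather than the real text, and the "include" label on the example row is chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    title: str
    pmid: str
    abstract: str
    label: str  # "include" or "exclude"
    mesh_terms: List[str]


def parse_line(line: str) -> Citation:
    """Split one tab-separated record into its five fields."""
    title, pmid, abstract, label, mesh = line.rstrip("\n").split("\t")
    # MeSH terms are pipe-separated; strip the padding around each term.
    terms = [term.strip() for term in mesh.split("|") if term.strip()]
    return Citation(title, pmid, abstract, label, terms)


# Example record: title and PMID from the description above, the rest made up.
example = "\t".join([
    "Structured exercise improves physical functioning in women with stages I "
    "and II breast cancer: results of a randomized controlled trial.",
    "11157015",
    "Placeholder abstract text.",
    "include",
    "Clinical Trial | Exercise | Female",
])

citation = parse_line(example)
print(citation.pmid, citation.label, citation.mesh_terms)
```

For the binary classification task, the `label` field would typically be mapped to 1 for include and 0 for exclude before training.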

If you use this dataset in your research, please cite our papers.
Amazon Reviews Full
figshare.com
application/x-gzip
Updated Nov 13, 2020
Cite
Luís Fred (2020). Amazon Reviews Full [Dataset]. http://doi.org/10.6084/m9.figshare.13232537.v1
Explore at:
application/x-gzip (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.13232537.v1
Dataset updated
Nov 13, 2020
Dataset provided by
figshare
Authors
Luís Fred
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Amazon Review Full Score Dataset

Version 3, Updated 09/09/2015

ORIGIN

The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18 years, including ~35 million reviews up to March 2013. Reviews include product and user information, ratings, and a plaintext review. For more information, please refer to the following paper: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.

The Amazon reviews full score dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION

The Amazon reviews full score dataset is constructed by randomly taking 600,000 training samples and 130,000 testing samples for each review score from 1 to 5. In total there are 3,000,000 training samples and 650,000 testing samples.

The files train.csv and test.csv contain the training and testing samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 5), review title and review text. The review title and text are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed by an "n" character, that is "\n".
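The escaping rules described for train.csv and test.csv (fields in double quotes, internal quotes doubled, new lines stored as a backslash followed by "n") map onto Python's standard `csv` module plus one extra unescaping step. The following is a minimal sketch: the sample line is made up, and `read_samples` is a hypothetical helper name, not part of the dataset's tooling.

```python
import csv
import io

# Made-up sample line in the described format: class index, title, text.
# "\\n" in this Python literal is a literal backslash followed by "n".
sample = '"3","Okay ""value"" buy","Works fine.\\nBattery life is short."\n'


def read_samples(handle):
    """Yield (class_index, title, text) with escaped new lines restored."""
    # The csv module's default dialect already handles doubled double quotes.
    for class_index, title, text in csv.reader(handle):
        yield (
            int(class_index),
            title.replace("\\n", "\n"),
            text.replace("\\n", "\n"),
        )


rows = list(read_samples(io.StringIO(sample)))
print(rows[0])
```

Class indices run from 1 to 5, so a model that expects zero-based labels would need to subtract 1.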
walmart-reviews-dataset
huggingface.co
Updated Feb 17, 2021
Cite
Crawl Feeds (2021). walmart-reviews-dataset [Dataset]. https://huggingface.co/datasets/crawlfeeds/walmart-reviews-dataset
Explore at:
Dataset updated
Feb 17, 2021
Authors
Crawl Feeds
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
🛒 Walmart Product Reviews Dataset (6.7K Records)

This dataset contains 6,700+ structured customer reviews from Walmart.com. Each entry includes product-level metadata along with review details, making it ideal for small-scale machine learning models, sentiment analysis, and ecommerce insights.

📑 Dataset Fields

| Column | Description |
|:-------|:------------|
| url | Direct product page URL |
| name | Product name/title |
| sku | Product SKU (Stock Keeping Unit) |
| price | Product price (numeric, USD)…

See the full description on the dataset page: https://huggingface.co/datasets/crawlfeeds/walmart-reviews-dataset.
Chemical product and function dataset
data.wu.ac.at
catalog.data.gov
xls
Updated Jun 4, 2017
Cite
U.S. Environmental Protection Agency (2017). Chemical product and function dataset [Dataset]. https://data.wu.ac.at/schema/data_gov/NWM4NDc4MWItYTE3NS00NjdhLWJkZDYtYjkyNDRlYTMzZjgw
Explore at:
xls (available download formats)
Dataset updated
Jun 4, 2017
Dataset provided by
U.S. Environmental Protection Agency
Description
Merged product weight fraction and chemical function data.

This dataset is associated with the following publication: Isaacs, K., M. Goldsmith, P. Egeghy, K. Phillips, R. Brooks, T. Hong, and J. Wambaugh. Characterization and prediction of chemical functions and weight fractions in consumer products. Toxicology Reports. Elsevier B.V., Amsterdam, NETHERLANDS, 3: 723-732, (2016).
Apple iPhone SE reviews & ratings Dataset
cubig.ai
Updated Feb 25, 2025
Cite
CUBIG (2025). Apple iPhone SE reviews & ratings Dataset [Dataset]. https://cubig.ai/store/products/143/apple-iphone-se-reviews-ratings-dataset
Explore at:
Dataset updated
Feb 25, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data introduction
• The Apple-iphone-se-reviews dataset was scraped from the Flipkart website using Selenium and BeautifulSoup.

2) Data utilization
(1) Characteristics of the Apple-iphone-se-reviews data:
• It covers user ratings for the Apple iPhone SE on the Indian e-commerce website Flipkart. It is aimed at NLP text classification using user ratings, review titles, and review text.
(2) The Apple-iphone-se-reviews data can be used for:
• Rating prediction: You can support automated review analysis and summarization by developing machine learning models to predict ratings based on review text.
• Product improvement: Insights gained from reviews can help identify common issues and areas for improvement in the iPhone SE and guide product development and quality improvements.
The Artificial Intelligence in Retail Market size was USD 4951.2 Million in...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Mar 1, 2024
Cite
Cognitive Market Research (2024). The Artificial Intelligence in Retail Market size was USD 4951.2 Million in 2023 [Dataset]. https://www.cognitivemarketresearch.com/artificial-intelligence-in-retail-market-report
Explore at:
pdf, excel, csv, ppt (available download formats)
Dataset updated
Mar 1, 2024
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Artificial Intelligence in Retail market size was USD 4951.2 million in 2023 and is projected to expand at a compound annual growth rate (CAGR) of 39.50% from 2023 to 2030.
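The CAGR figure can be read as ordinary compound growth. The sketch below shows the arithmetic under the assumption that the rate compounds once per year over the seven years from 2023 to 2030; the function names are hypothetical, and the resulting 2030 value is only what the two stated figures imply, not a number reported in the description.

```python
def project_market_size(base_value, annual_growth_rate, years):
    """Compound a starting value forward: base * (1 + rate) ** years."""
    return base_value * (1.0 + annual_growth_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Invert the compounding formula to recover the annual growth rate."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


base_2023_musd = 4951.2  # USD million in 2023, as stated in the description
cagr = 0.395             # 39.50% per year, as stated in the description
years = 2030 - 2023      # assumption: 7 annual compounding periods

projected_2030_musd = project_market_size(base_2023_musd, cagr, years)
print(f"Implied 2030 market size: USD {projected_2030_musd:,.1f} million")
```

Running `implied_cagr(base_2023_musd, projected_2030_musd, years)` recovers the 39.50% rate, which is a quick consistency check on the compounding convention.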

• Enhanced customer personalization to provide viable market output.
• Demand for online remains higher in the Artificial Intelligence in Retail market.
• The machine learning and deep learning category held the highest Artificial Intelligence in Retail market revenue share in 2023.
• The North American Artificial Intelligence in Retail market will continue to lead, whereas the Asia-Pacific Artificial Intelligence in Retail market will experience the most substantial growth until 2030.

Market Dynamics of the Artificial Intelligence in the Retail Market

Key Drivers for Artificial Intelligence in Retail Market

Enhanced Customer Personalization to Provide Viable Market Output

A primary driver of Artificial Intelligence in the Retail market is the pursuit of enhanced customer personalization. A.I. algorithms analyze vast datasets of customer behaviors, preferences, and purchase history to deliver highly personalized shopping experiences. Retailers leverage this insight to offer tailored product recommendations, targeted marketing campaigns, and personalized promotions. The drive for superior customer personalization not only enhances customer satisfaction but also increases engagement and boosts sales. This focus on individualized interactions through A.I. applications is a key driver shaping the dynamic landscape of A.I. in the retail market.

January 2023 - Microsoft and digital start-up AiFi worked together to offer Smart Store Analytics. It is a cloud-based tracking solution that helps merchants with operational and shopper insights for intelligent, cashierless stores.

Source: techcrunch.com/2023/01/10/aifi-microsoft-smart-store-analytics/

Improved Operational Efficiency to Propel Market Growth

Another pivotal driver is the quest for improved operational efficiency within the retail sector. A.I. technologies streamline various aspects of retail operations, from inventory management and demand forecasting to supply chain optimization and cashier-less checkout systems. By automating routine tasks and leveraging predictive analytics, retailers can enhance efficiency, reduce costs, and minimize errors. The pursuit of improved operational efficiency is a key motivator for retailers to invest in AI solutions, enabling them to stay competitive, adapt to dynamic market conditions, and meet the evolving demands of modern consumers in the highly competitive artificial intelligence (AI) retail market.

January 2023 - The EY Retail Intelligence solution, which is based on Microsoft Cloud, was introduced by the Fintech business EY to give customers a safe and efficient shopping experience. In order to deliver insightful information, this solution makes use of Microsoft Cloud for Retail and its technologies, which include image recognition, analytics, and artificial intelligence (A.I.).

Source: www.ey.com/en_gl/news/2023/01/ey-announces-launch-of-retail-solution-that-builds-on-the-microsoft-cloud-to-help-achieve-seamless-consumer-shopping-experiences

Key Restraints for Artificial Intelligence in Retail Market

Data Security Concerns to Restrict Market Growth

A prominent restraint in Artificial Intelligence in the Retail market is the pervasive concern over data security. As retailers increasingly rely on A.I. to process vast amounts of customer data for personalized experiences, there is a growing apprehension regarding the protection of sensitive information. The potential for data breaches and cyberattacks poses a significant challenge, as retailers must navigate the delicate balance between utilizing customer data for AI-driven initiatives and safeguarding it against potential security threats. Addressing these concerns is crucial to building and maintaining consumer trust in A.I. applications within the retail sector.

Key Trends for Artificial Intelligence in Retail Market

Surge in Voice-Enabled Shopping Interfaces Reshaping Retail Experiences

Voice-enabled A.I. assistants such as Amazon Alexa and Google Assistant are revolutionizing the way consumers engage with retail platforms. Shoppers can now utilize voice commands to search, compare, and purchase products, thereby streamlining and accelerating the buying process. Retailers...
drugscom_reviews
huggingface.co
Updated Feb 24, 2024
Zakia Salod (2024). drugscom_reviews [Dataset]. https://huggingface.co/datasets/Zakia/drugscom_reviews
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 24, 2024
Authors
Zakia Salod
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for "DrugsCom Reviews"

Dataset Summary

The DrugsCom Reviews dataset is originally sourced from the UCI Machine Learning Repository. It provides patient reviews on specific drugs along with related conditions and a 10-star patient rating reflecting overall patient satisfaction. The dataset has been uploaded to Hugging Face to facilitate easier access and use by the machine learning community. It contains 161,297 instances in the training set and 53,766… See the full description on the dataset page: https://huggingface.co/datasets/Zakia/drugscom_reviews.

Jonathan De Bruin; Yongchao Ma; Gerbrich Ferdinands; Jelle Teijema; Rens Van de Schoot (2023). SYNERGY - Open machine learning dataset on study selection in systematic reviews [Dataset]. http://doi.org/10.34894/HE6NAQ

SYNERGY - Open machine learning dataset on study selection in systematic reviews

13 scholarly articles cite this dataset.

Available download formats: csv, json, txt, zip

Unique identifier

https://doi.org/10.34894/HE6NAQ

Dataset updated

Apr 24, 2023

Dataset provided by

DataverseNL

Authors

Jonathan De Bruin; Yongchao Ma; Gerbrich Ferdinands; Jelle Teijema; Rens Van de Schoot

License

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

SYNERGY is a free and open dataset on study selection in systematic reviews, comprising 169,288 academic works from 26 systematic reviews. Only 2,834 (1.67%) of the academic works in the binary classified dataset are included in the systematic reviews. This makes SYNERGY a unique dataset for the development of information retrieval algorithms, especially for sparse labels. Because many variables are available per record (i.e. titles, abstracts, authors, references, topics), this dataset is useful for researchers in NLP, machine learning, network analysis, and more. In total, the dataset contains 82,668,134 trainable data points. The easiest way to get the SYNERGY dataset is via the synergy-dataset Python package. See https://github.com/asreview/synergy-dataset for all information.
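The sparse-label property can be made concrete with the two counts given above. The following sketch recomputes the inclusion rate and derives class weights with the common "balanced" heuristic (total count divided by the number of classes times the class count). The function names are hypothetical, and the weighting scheme is an illustrative assumption, not something the SYNERGY authors prescribe.

```python
def inclusion_rate(n_included, n_total):
    """Fraction of records labelled as included in a review."""
    return n_included / n_total


def balanced_class_weights(n_included, n_total):
    """Weights inversely proportional to class frequency.

    Uses n_total / (2 * n_class) for the two classes, so each class
    contributes the same total weight.
    """
    n_excluded = n_total - n_included
    return {
        1: n_total / (2 * n_included),  # included (rare class)
        0: n_total / (2 * n_excluded),  # excluded (common class)
    }


n_total = 169_288    # academic works, as stated in the description
n_included = 2_834   # included works, as stated in the description

rate = inclusion_rate(n_included, n_total)
weights = balanced_class_weights(n_included, n_total)

print(f"Inclusion rate: {rate:.2%}")
print(f"Weight for included records: {weights[1]:.2f}")
print(f"Weight for excluded records: {weights[0]:.3f}")
```

With labels this imbalanced, plain accuracy is not very informative, so weighting of this kind (or a ranking-based metric) is usually worth considering.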
