Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset comprises customer reviews for Amazon, an online retail giant, featuring insights into customer experiences, including ratings, review titles, texts, and metadata. It is valuable for analyzing customer satisfaction, sentiment, and trends.
Column Descriptions:
- Reviewer Name: Identifies the reviewer.
- Profile Link: Links to the reviewer's profile for additional insights.
- Country: Indicates the reviewer's location.
- Review Count: Number of reviews by the same user, showing engagement level.
- Review Date: When the review was posted, useful for time analysis.
- Rating: Numerical satisfaction measure.
- Review Title: Summarizes the review sentiment.
- Review Text: Detailed customer feedback.
- Date of Experience: When the service/product was experienced.
Prospective applications:
- Sentiment Analysis: Analyze review texts and titles to assess overall customer sentiment toward products, enabling the identification of strengths and weaknesses.
- Customer Satisfaction Tracking: Track and visualize rating trends over time to understand fluctuations in customer satisfaction.
- Product Improvement: Identify common themes in reviews to highlight areas for product enhancement or development.
- Market Segmentation: Use country and demographic information to customize marketing strategies and gain insights into regional preferences.
- Competitor Analysis: Evaluate customer feedback on Amazon products in comparison to competitors to determine market positioning.
- Recommendation Systems: Leverage review data to enhance recommendation algorithms, improving personalized shopping experiences.
- Trend Analysis: Investigate temporal patterns in reviews to link sentiment changes with marketing efforts or product launches.
This extensive dataset serves as a valuable asset for various analyses focused on enhancing customer engagement and refining business strategies.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset was created from reviews scraped from Amazon product pages for the purpose of text classification. There are three classes:
- Negative reviews
- Neutral reviews
- Positive reviews
Data columns include:
- Sentiments
- Cleaned Review
- Cleaned Review Length
- Review Score
This dataset poses a multiclass classification problem that can be approached with both classical ML and deep learning algorithms. There is also a class imbalance: negative reviews are the least frequent class, compared with positive and neutral reviews.
For ML algorithms, use the mapping: negative → -1, neutral → 0, positive → 1.
For deep learning algorithms, use the mapping: negative → 0, neutral → 1, positive → 2.
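A minimal sketch of applying the two label mappings above and countering the class imbalance with inverse-frequency class weights. It assumes the `Sentiments` column holds the strings "negative", "neutral", and "positive" (case-insensitive); the helper names are hypothetical. The weight formula is the same one scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter

# Label encodings from the dataset card: one for classic ML, one for deep learning.
ML_LABELS = {"negative": -1, "neutral": 0, "positive": 1}
DL_LABELS = {"negative": 0, "neutral": 1, "positive": 2}


def encode(sentiments, mapping):
    """Map string sentiment labels (assumed values) to integers."""
    return [mapping[s.strip().lower()] for s in sentiments]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count(class)).

    The rarest class (here, negative reviews) gets the largest weight,
    which can be passed to a loss function to counter the imbalance.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


sentiments = ["positive", "positive", "neutral", "positive", "negative", "neutral"]
y_dl = encode(sentiments, DL_LABELS)
print(y_dl)  # [2, 2, 1, 2, 0, 1]
print(balanced_class_weights(y_dl))
```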
Looking forward to your model discoveries on this dataset.
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
2 useful files:
This is a large-scale Amazon Reviews dataset, collected in 2023 by McAuley Lab, and it includes rich features such as:
- User Reviews (ratings, text, helpfulness votes, etc.)
- Item Metadata (descriptions, price, raw image, etc.)
- Links (user-item / bought together graphs)
What's New? In the Amazon Reviews'23, we provide:
- Larger Dataset: We collected 571.54M reviews, **245.2%** larger than the last version.
- Newer Interactions: Current interactions range from May 1996 to September 2023.
- Richer Metadata: More descriptive features in item metadata.
- Fine-grained Timestamp: Interaction timestamps at the second or finer level.
- Cleaner Processing: Cleaner item metadata than previous versions.
- Standard Splitting: Standard data splits to encourage RecSys benchmarking.
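The release ships its own standard splits. The sketch below only illustrates the general idea behind one common RecSys benchmarking protocol, a chronological leave-last-out split per user, and is not necessarily the exact procedure used by McAuley Lab. The `(user_id, item_id, timestamp)` tuple layout is an assumption.

```python
from collections import defaultdict


def leave_last_out(interactions):
    """Chronological leave-last-out split per user.

    interactions: iterable of (user_id, item_id, timestamp) tuples (assumed layout).
    For each user, the most recent interaction goes to test, the second most
    recent to validation, and everything earlier to training.
    Users with fewer than 3 interactions keep all of them in training.
    """
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))

    train, valid, test = [], [], []
    for user, events in by_user.items():
        events.sort()  # oldest first; ties broken by item_id
        if len(events) < 3:
            train.extend((user, item, ts) for ts, item in events)
            continue
        train.extend((user, item, ts) for ts, item in events[:-2])
        ts, item = events[-2]
        valid.append((user, item, ts))
        ts, item = events[-1]
        test.append((user, item, ts))
    return train, valid, test


data = [
    ("u1", "B001", 1588687728923),
    ("u1", "B002", 1590000000000),
    ("u1", "B003", 1600000000000),
    ("u2", "B001", 1595000000000),
]
train, valid, test = leave_last_out(data)
```

Sorting by timestamp rather than shuffling keeps "future" interactions out of training, which is the point of a chronological split.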
Attribution 1.0 (CC BY 1.0) (https://creativecommons.org/licenses/by/1.0/)
License information was derived automatically
About
This is a mock dataset with Amazon product reviews. Classes are hierarchical: 6 "level 1" classes, 64 "level 2" classes, and 510 "level 3" classes.
3 files are shared:
Level 1 classes are: health personal care, toys games, beauty, pet supplies, baby products, and grocery gourmet food.
Dataset originally from https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification
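Before training a hierarchical classifier, it helps to check that the three label levels really form a tree, i.e. every level-2 label has exactly one level-1 parent and every level-3 label exactly one level-2 parent. A minimal sketch, assuming each review provides a `(level1, level2, level3)` label triple; the helper names and the level-2/level-3 labels in the example are hypothetical.

```python
def build_hierarchy(rows):
    """Build child -> parent maps for a 3-level label taxonomy.

    rows: iterable of (level1, level2, level3) label triples, one per review.
    Raises ValueError if a label appears under two different parents,
    which would mean the taxonomy is not a strict tree.
    """
    parent2, parent3 = {}, {}  # level2 -> level1, level3 -> level2
    for l1, l2, l3 in rows:
        for child, parent, table in ((l2, l1, parent2), (l3, l2, parent3)):
            seen = table.setdefault(child, parent)
            if seen != parent:
                raise ValueError(f"{child!r} has two parents: {seen!r} and {parent!r}")
    return parent2, parent3


def path_to_root(level3_label, parent2, parent3):
    """Return the (level1, level2, level3) path for a level-3 label."""
    l2 = parent3[level3_label]
    return parent2[l2], l2, level3_label


rows = [
    ("toys games", "games", "board games"),  # level-2/3 names are made up
    ("toys games", "games", "puzzles"),
    ("beauty", "skin care", "face"),
]
parent2, parent3 = build_hierarchy(rows)
print(path_to_root("puzzles", parent2, parent3))
```

If the check fails, one option is to make labels unique by prefixing them with their parent's name.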
This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plaintext review.
- Number of reviews: 568,454
- Number of users: 256,059
- Number of products: 74,258
Citation: J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. WWW, 2013.
These datasets contain 1.48 million question and answer pairs about products from Amazon.
Metadata includes
question and answer text
whether the question is binary (yes/no) and, if so, whether it has a yes/no answer
timestamps
product ID (to reference the review dataset)
Basic Statistics:
Questions: 1.48 million
Answers: 4,019,744
Labeled yes/no questions: 309,419
Number of unique products with questions: 191,185
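The UCSD Amazon files are often distributed as "loose JSON": one Python dict literal per line, with single quotes, which `json.loads` rejects. A minimal sketch that parses such lines safely with `ast.literal_eval` and counts answer polarities among yes/no questions. The field names `questionType` and `answerType` (with values such as "Y", "N", "?") are assumptions based on the single-answer files; the multi-answer files may nest records differently.

```python
import ast
from collections import Counter


def parse_loose_json(lines):
    """Parse one Python-literal dict per line (single-quoted 'loose JSON').

    ast.literal_eval only accepts literals, so unlike eval() it cannot
    execute arbitrary code embedded in the file.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield ast.literal_eval(line)


def yes_no_answer_counts(records):
    """Count answer polarities among questions marked as binary (assumed fields)."""
    return Counter(
        r.get("answerType", "?")
        for r in records
        if r.get("questionType") == "yes/no"
    )


sample = [
    "{'asin': 'B000050B6Z', 'questionType': 'yes/no', 'answerType': 'Y', "
    "'question': 'Does it fit?', 'answer': 'Yes.'}",
    "{'asin': 'B000050B6Z', 'questionType': 'open-ended', "
    "'question': 'How long is it?', 'answer': 'About 2 ft.'}",
]
print(yes_no_answer_counts(parse_loose_json(sample)))
```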
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
This dataset contains over 4,900 customer reviews from Amazon, including text-based feedback, star ratings, and helpfulness votes.
Columns:
- reviewText: Full written review
- overall: Star rating (1 to 5)
- summary: Short summary of the review
- helpful_yes: Number of users who found the review helpful
- total_vote: Total votes on helpfulness
- day_diff: Days since the review was written
This dataset is suitable for natural language processing (NLP) and supervised learning tasks.
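The `helpful_yes` and `total_vote` columns can be combined into a helpfulness score. The raw ratio overrates reviews with very few votes (1 of 1 scores 100%), so the sketch below also computes the lower bound of the Wilson score interval, a standard technique swapped in here for ranking by helpfulness; it is not something the dataset card prescribes.

```python
import math


def helpful_ratio(helpful_yes, total_vote):
    """Share of helpfulness votes that were positive; None if there are no votes."""
    return helpful_yes / total_vote if total_vote > 0 else None


def wilson_lower_bound(helpful_yes, total_vote, z=1.96):
    """Lower bound of the Wilson score interval (about 95% for z=1.96).

    Penalizes reviews with few votes, so 1 helpful vote out of 1
    ranks below 90 helpful votes out of 100.
    """
    if total_vote == 0:
        return 0.0
    n = total_vote
    p = helpful_yes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


print(helpful_ratio(1, 1), wilson_lower_bound(1, 1))
print(helpful_ratio(90, 100), wilson_lower_bound(90, 100))
```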
This is a publicly available dataset for educational and research use.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. Its purpose is to store the datasets used in some of the studies that served as research material for this Master's thesis. The datasets used in the experimental part of this work are also included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on Amazon's official website. The data was scraped from the website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
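Because the rows are sorted by class, taking the last 20% as a test set would yield only positive samples. A minimal sketch of a seeded, class-stratified shuffle-and-split; `load_rows` uses the `reviews` and `labels` columns described above, while the UTF-8 encoding and the helper names are assumptions.

```python
import csv
import random


def load_rows(path):
    """Read data_rt.csv into (review, label) pairs; 1 = fresh, 0 = rotten.

    The file encoding is an assumption; adjust it if decoding fails.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["reviews"], int(row["labels"])) for row in csv.DictReader(f)]


def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle (review, label) rows and split them into train/test.

    Each class is shuffled and cut separately, so both parts keep the
    original 50/50 balance even though the file is ordered by class.
    """
    rng = random.Random(seed)
    by_label = {}
    for review, label in rows:
        by_label.setdefault(label, []).append((review, label))

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = int(len(group) * test_fraction)
        test.extend(group[:cut])
        train.extend(group[cut:])
    rng.shuffle(train)  # interleave classes so mini-batches are mixed
    rng.shuffle(test)
    return train, test
```

Usage: `train, test = shuffled_split(load_rows("data_rt.csv"))`.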
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using NLTK.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
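TextBlob's `TextBlob(text).sentiment.polarity` returns a score in [-1.0, 1.0]. A minimal sketch of turning such a score into a categorical label like the `division` column; the thresholds and the label strings are assumptions, since the dataset card does not state which cut-offs were used.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Scores within +/- neutral_band count as neutral. The cut-offs and the
    label names are hypothetical, not taken from the dataset.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# With TextBlob installed, the score would come from:
#   from textblob import TextBlob
#   polarity = TextBlob(review).sentiment.polarity
print(polarity_to_division(0.35), polarity_to_division(-0.2), polarity_to_division(0.0))
```

Recomputing labels this way and comparing them with the stored `division` column is one way to probe which thresholds were actually used.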
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9,930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw), and division (manually added; categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
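Two of these columns need parsing before use. A minimal sketch, assuming `helpful` is stored in the CSV as a string such as "[2, 3]" meaning [helpful votes, total votes] (the layout used in the UCSD Amazon dumps) and `unixReviewTime` is in seconds since the epoch; the helper names are hypothetical.

```python
import ast
from datetime import datetime, timezone


def parse_helpful(value):
    """Parse the `helpful` field, e.g. "[2, 3]" -> (2, 3).

    Assumed layout: [helpful votes, total votes].
    """
    helpful, total = ast.literal_eval(value)
    return int(helpful), int(total)


def review_datetime(unix_review_time):
    """Convert `unixReviewTime` (seconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(int(unix_review_time), tz=timezone.utc)


print(parse_helpful("[2, 3]"))             # (2, 3)
print(review_datetime("1393545600").date())  # 2014-02-28
```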
These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, ranging from adding a book to a shelf to rating and reading it.
Metadata includes
reviews
add-to-shelf, read, review actions
book attributes: title, ISBN
graph of similar books
Basic Statistics:
Items: 1,561,465
Users: 808,749
Interactions: 225,394,930
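The multiple interaction levels can be turned into implicit-feedback data at different strengths. A minimal sketch, assuming each interaction record exposes `is_read`, `rating` (0 meaning "not rated"), and `is_reviewed`; these field names, the ordinal scale, and the helpers are hypothetical choices for illustration.

```python
def engagement_level(is_read, rating, is_reviewed):
    """Collapse one shelf interaction into an ordinal engagement level.

    0 = only shelved, 1 = read, 2 = rated, 3 = reviewed.
    Assumes rating == 0 means "not rated" (hypothetical convention).
    """
    if is_reviewed:
        return 3
    if rating > 0:
        return 2
    if is_read:
        return 1
    return 0


def positive_pairs(interactions, min_level=1):
    """Keep (user_id, book_id) pairs whose engagement reaches min_level.

    interactions: iterable of dicts with the hypothetical keys
    user_id, book_id, is_read, rating, is_reviewed.
    """
    return [
        (row["user_id"], row["book_id"])
        for row in interactions
        if engagement_level(row["is_read"], row["rating"], row["is_reviewed"]) >= min_level
    ]
```

Raising `min_level` trades data volume for a stronger signal: shelving is plentiful but weak, while reviews are rare but explicit.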
CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
What is this?
This is a cleaned version of Amazon Product Dataset 2020 from Kaggle.
Why?
Loading it via the Hugging Face API is easier; the Kaggle API is inconvenient because its authentication requires keeping credentials in a local folder. The dataset was cleaned because 13 of its 28 columns are empty.
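One plausible way to reproduce the "drop empty columns" step with pandas, treating empty or whitespace-only strings as missing; this is a sketch, not the author's actual cleaning script, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without the columns that hold no usable value.

    Empty or whitespace-only strings are treated as missing for the check,
    so a column is kept only if at least one cell has real content.
    The original (unreplaced) values of the kept columns are returned.
    """
    as_missing = df.replace(r"^\s*$", np.nan, regex=True)
    keep = as_missing.notna().any(axis=0)  # one boolean per column
    return df.loc[:, keep]


df = pd.DataFrame({"a": [1, 2], "b": [None, None], "c": ["", "  "], "d": ["x", ""]})
print(list(drop_empty_columns(df).columns))  # ['a', 'd']
```

For the Kaggle CSV this would be `drop_empty_columns(pd.read_csv(path))`, where `path` is a placeholder.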
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon’s iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.
These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.
Metadata includes
reviews
purchases, plays, recommends (likes)
product bundles
pricing information
Basic Statistics:
Reviews: 7,793,069
Users: 2,567,538
Items: 15,474
Bundles: 615
This dataset contains images (scenes) that show fashion products; each product is labeled with a bounding box and a link to the corresponding product.
Metadata includes
product IDs
bounding boxes
Basic Statistics:
Scenes: 47,739
Products: 38,111
Scene-Product Pairs: 93,274
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Book ratings
This dataset has two files:
- Books_rating.csv: information about book ratings made by users
- books_data.csv: metadata about the books (title, author, genre, etc.)
It is intended as an input dataset to train a recommender system. It was obtained from a dataset of Amazon book reviews on Kaggle.
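As a starting point for such a recommender, a minimal item-item collaborative filtering sketch: each book is represented by the ratings users gave it, and books are compared with cosine similarity. It assumes `(user_id, title, score)` triples extracted from Books_rating.csv; the helper names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """ratings: iterable of (user_id, title, score) -> {title: {user_id: score}}."""
    vectors = defaultdict(dict)
    for user, title, score in ratings:
        vectors[title][user] = float(score)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def most_similar(title, vectors, k=5):
    """Return up to k titles whose rating vectors are closest to `title`'s."""
    target = vectors[title]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != title]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:k]]


ratings = [("u1", "A", 5), ("u1", "B", 5), ("u2", "A", 4), ("u2", "B", 4), ("u3", "C", 5)]
vectors = item_vectors(ratings)
print(most_similar("A", vectors, k=1))  # ['B']
```

This in-memory approach is only practical for small samples; at full scale a sparse-matrix library would be the usual choice.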
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes
peer-to-peer trades
have and want lists
image data (Tradesy)
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
By Hugging Face Hub
The Amazon Reviews Polarity Dataset covers eighteen years of customer ratings and reviews from Amazon.com. Drawn from a pool of over 35 million customer reviews, it presents a broad spectrum of customer opinions on products they have bought or used. It contains information about customers' experiences with a product, including ratings, titles, and plaintext content, which makes it useful for improving products and services. Because it combines customer-specific data with product information, it supports deeper analyses aimed at tailoring solutions to customers: Has a product been favored by the majority? Are there aspects that need extra care?
1. Analyze customer ratings to identify trends: Look at how many customers have rated the same product or service with the same score (e.g., 4 stars). You can use this information to identify what customers like or dislike about it by examining common sentiment throughout the reviews. Identifying these patterns can help you decide which features of your products or services to emphasize to boost sales and satisfaction rates.
2. Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product or service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search can quickly reveal the general topics discussed in relation to your product or service across multiple reviews, allowing you to quickly pinpoint areas that may need improvement for particular items within your lines of business.
3. Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific to your product or service, such as a negative response to a feature that was introduced, proved unpopular with customers, and was removed shortly afterwards. This can save time and money by identifying issues before they become widespread concerns for a larger group of customers who spend money on your company's products.
4. Visualize sentiment data over time: Visualizations such as bar graphs can reveal trends across different categories faster than raw numbers alone. Combining numeric values with color coding for different scores makes anomalies easier to spot, which speeds up figuring out why certain spikes occurred while other series stayed stable (or vice versa) when comparing similar data points in time-series visualizations.
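The "simple keyword search" option from item 2 can be sketched as per-label term counting: the most frequent words in positive versus negative reviews hint at what customers praise or criticize. This is a minimal illustration assuming `(label, text)` pairs; the tokenizer and the tiny stopword list are hypothetical simplifications, not Word2Vec or LDA.

```python
import re
from collections import Counter

# Deliberately tiny, hypothetical stopword list; real pipelines use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "it", "is", "to", "of", "this", "i", "was", "for", "in"}


def top_terms_by_label(reviews, n=10):
    """Return the n most frequent non-stopword tokens for each label.

    reviews: iterable of (label, text) pairs, e.g. ("positive", "Great value").
    """
    counts = {}
    for label, text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(label, Counter()).update(t for t in tokens if t not in STOPWORDS)
    return {label: [term for term, _ in c.most_common(n)] for label, c in counts.items()}


reviews = [
    ("positive", "Great battery, great screen"),
    ("negative", "Battery died. Terrible."),
    ("positive", "Great value"),
]
print(top_terms_by_label(reviews, n=2))
```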
- Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.
- Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.
- Training a machine learning model to accurately predict customers' ratings on new products they have not yet tried, and leveraging this for further product development optimization initiatives.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:--------------|:-------------------------------------------------------------------|
| label | The sentiment of the review, either positive or negative. (String) |
| title | The title of the review. (String) |
...
These datasets include ratings as well as social (or trust) relationships between users. Data are from LibraryThing (a book review website) and Epinions (general consumer reviews).
Metadata includes
reviews
price paid (Epinions)
helpfulness votes (LibraryThing)
flags (LibraryThing)
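One of the simplest ways to use the trust links is to predict a user's rating of an item from the ratings of the users they trust. The sketch below is a basic trust-neighbour average chosen here for illustration, not a method from the dataset's authors; the dict layouts and the function name are hypothetical.

```python
def trust_predict(user, item, ratings, trust, fallback=None):
    """Predict `user`'s rating for `item` from the users they trust.

    ratings: {user_id: {item_id: rating}}
    trust:   {user_id: set of trusted user_ids}
    Returns the mean rating of trusted users who rated the item; otherwise
    the item's mean rating over all other users; otherwise `fallback`.
    """
    trusted = [ratings[v][item] for v in trust.get(user, ()) if item in ratings.get(v, {})]
    if trusted:
        return sum(trusted) / len(trusted)
    others = [r[item] for u, r in ratings.items() if u != user and item in r]
    if others:
        return sum(others) / len(others)
    return fallback


ratings = {"alice": {"x": 4}, "bob": {"x": 2, "y": 5}, "carol": {"y": 3}}
trust = {"dave": {"alice"}, "erin": set()}
print(trust_predict("dave", "x", ratings, trust))  # 4.0 (alice is trusted)
print(trust_predict("erin", "x", ratings, trust))  # 3.0 (mean of alice and bob)
```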
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
- The dataset contains reviews of Apple mobile devices written on Amazon over time. It was created by CrawlFeeds and contains around 180K reviews, along with country, date, and other features such as:
- Did the user buy the product?
- On what product did the user write the review?
- and more.
- Analyze the sentiment of the review, try to isolate the phrases associated with positive/negative reviews.
- Study the connection between country and review sentiment
- Study the connection between the time of day and sentiment
If you use this dataset in your research, please credit CrawlFeeds
This is a collection of recipes paired with variants, e.g. a recipe matched with a vegan version of the same recipe.