This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
- More reviews: The total number of reviews is 233.1 million (142.8 million in 2014).
- New reviews: Current data includes reviews in the range May 1996 - Oct 2018.
- Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper: Jianmo Ni, Jiacheng Li, Julian McAuley, "Justifying recommendations using distantly-labeled reviews and fine-grained aspects", EMNLP 2019.
In 2024, ** percent of surveyed U.S. consumers were confident they had seen fake product reviews on Amazon. Although the figure might seem very high, it has decreased from 2023, when ** percent of respondents said the same.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was created by Alexander Stephens
Released under Database: Open Database, Contents: © Original Authors
Fake reviews from Amazon
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises a large-scale collection of Amazon Reviews, gathered in 2023 by the McAuley Lab. Specifically, it focuses on the Amazon Fashion category, containing a total of 800K+ user reviews. It serves as a valuable resource for conducting sentiment analysis.
We categorize rating ranges into ternary classes as follows:
- Ratings of 1 to 2: Negative (-1)
- Ratings of 3: Neutral (0)
- Ratings of 4 to 5: Positive (1)
Total Positive Sentiments: 346,924
Total Negative Sentiments: 346,924
Total Neutral Sentiments: 173,462
This dataset is suitable for binary classification as it already contains a balance between positive and negative sentiments. However, for ternary classification, it's necessary to balance the target values using undersampling or oversampling techniques.
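As a quick illustration, here is a minimal pandas sketch of this mapping followed by undersampling for the ternary case; the file name is an assumption, and the rating column follows the field list below:

```python
import pandas as pd

def to_ternary(rating: float) -> int:
    """Map a 1-5 star rating to a ternary sentiment label."""
    if rating <= 2:
        return -1   # Negative
    if rating == 3:
        return 0    # Neutral
    return 1        # Positive

# File name is illustrative; "rating" matches the field list below.
df = pd.read_json("Amazon_Fashion.jsonl", lines=True)
df["target"] = df["rating"].apply(to_ternary)

# For ternary classification, undersample each class to the minority size.
n_min = df["target"].value_counts().min()
balanced = (
    df.groupby("target", group_keys=False)
      .apply(lambda g: g.sample(n=n_min, random_state=42))
)
print(balanced["target"].value_counts())
```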
The dataset encompasses the following fields:
1. rating: 1.0 to 5.0
2. title: title of the user review
3. text: text body of the user review
4. images: images that users post after they have received the product. Each image has different sizes (small, medium, large), represented by small_image_url, medium_image_url, and large_image_url respectively
5. asin: ID of the product
6. parent_asin: parent ID of the product. Note: products with different colors, styles, or sizes usually belong to the same parent ID. The "asin" in previous Amazon datasets is actually the parent ID, so please use the parent ID to find product metadata
7. user_id: ID of the reviewer
8. timestamp: time of the review (Unix time)
9. helpful_vote: helpful votes of the review
10. verified_purchase: user purchase verification
11. target: labels for text reviews, where Positive (1), Negative (-1), and Neutral (0) represent the related sentiments.
Dataset DOI: https://doi.org/10.48550/arXiv.2403.03952. Cite: Hou et al. (2024), "Bridging Language and Items for Retrieval and Recommendation".
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This is a small subset of the dataset of book reviews from the Amazon Kindle Store category.
5-core dataset of product reviews from the Amazon Kindle Store category, May 1996 - July 2014. Contains a total of 982,619 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.

Columns:
- asin - ID of the product, like B000FA64PK
- helpful - helpfulness rating of the review, e.g. 2/3
- overall - rating of the product
- reviewText - text of the review (heading)
- reviewTime - time of the review (raw)
- reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
- reviewerName - name of the reviewer
- summary - summary of the review (description)
- unixReviewTime - unix timestamp
There are two files: one is preprocessed and ready for sentiment analysis; the other is unprocessed, so you have to process the dataset yourself before performing sentiment analysis.
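A minimal pandas sketch for the unprocessed file, assuming the columns listed above and that helpful is stored as a ratio string like 2/3; the file name is illustrative:

```python
import pandas as pd

# Illustrative file name; use whichever of the two provided files you prefer.
df = pd.read_csv("kindle_reviews.csv")

# "helpful" is documented above as a ratio like "2/3"
# (helpful votes / total votes); split it into two numeric columns.
votes = df["helpful"].str.split("/", expand=True).astype(float)
df["helpful_yes"], df["votes_total"] = votes[0], votes[1]

# A simple binary sentiment label from the star rating ("overall"):
# ratings of 4-5 are positive, 1-2 negative; drop the neutral 3s.
df = df[df["overall"] != 3]
df["sentiment"] = (df["overall"] >= 4).astype(int)
```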
This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/
The license to the data files belongs to them.
Inspiration:
- Sentiment analysis on reviews.
- Understanding how people rate the usefulness of a review / what factors influence the helpfulness of a review.
- Fake reviews / outliers.
- Best-rated product IDs, or similarity between products based on reviews alone (not the best idea, I know).
- Any other interesting analysis.
A small subset of the dataset of product reviews from the Amazon Kindle Store category.
5-core dataset of product reviews from the Amazon Kindle Store category, May 1996 - July 2014. Contains a total of 982,619 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.

- asin - ID of the product, like B000FA64PK
- helpful - helpfulness rating of the review, e.g. 2/3
- overall - rating of the product
- reviewText - text of the review (heading)
- reviewTime - time of the review (raw)
- reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
- reviewerName - name of the reviewer
- summary - summary of the review (description)
- unixReviewTime - unix timestamp

The dataset originally contained a json file of the reviews, but some people had issues opening it and getting it to work, so I've added a csv file which contains the same data. You can use whichever one is easier to work with.
This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/
The license to the data files belongs to them.
These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.
Metadata includes:
- ratings
- product images
- user identities
- item sizes, user genders
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
One of the most important problems in e-commerce is correctly computing the scores given to products after sale. Solving this problem means greater customer satisfaction for the e-commerce site, more visibility for sellers' products, and a seamless shopping experience for buyers. Another problem is correctly ordering the reviews given to products. If misleading reviews rank prominently, they cause both financial losses and customer losses. By solving these two basic problems, the e-commerce site and its sellers increase their sales, while customers complete their purchasing journey without any problems.
This dataset is about rating products and sorting their reviews on Amazon. Please review this notebook to see how I came up with this dataset. This Amazon product dataset includes product categories and various metadata.
It contains the user ratings and reviews of the product with the most reviews in the electronics category, on which you are expected to perform sentiment analysis with your own methods.
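One common approach to the review-ordering problem, offered here only as an illustration rather than the method used in the referenced notebook, is to sort reviews by the Wilson lower bound of their helpful-vote ratio:

```python
import math

def wilson_lower_bound(helpful_yes: int, total_votes: int, z: float = 1.96) -> float:
    """Lower bound of the 95% confidence interval for the helpful-vote ratio.
    Reviews with few votes are pulled toward 0, so they rank below
    consistently helpful reviews with the same ratio."""
    if total_votes == 0:
        return 0.0
    phat = helpful_yes / total_votes
    return (phat + z * z / (2 * total_votes)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * total_votes)) / total_votes)
            ) / (1 + z * z / total_votes)

# Sort reviews by this score instead of by raw helpful votes:
print(wilson_lower_bound(2, 3))      # few votes -> modest score
print(wilson_lower_bound(200, 300))  # same ratio, more votes -> higher score
```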
Dataset of Amazon Reviews for Electronics Category.
5-core dataset of product reviews from the Amazon Electronics category, May 1996 - July 2014. Contains a total of 1,689,188 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.

Columns are:
- asin - ID of the product, like B000FA64PK
- helpful - helpfulness rating of the review, e.g. 2/3
- overall - rating of the product
- reviewText - text of the review (heading)
- reviewTime - time of the review (raw)
- reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
- reviewerName - name of the reviewer
- summary - summary of the review (description)
- unixReviewTime - unix timestamp
This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/
License to the data files belongs to them.
We introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) and "reading comprehension" models for synthesizing an answer (given a question and review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we collect additional annotations, marking each question as either answerable or unanswerable based on the available reviews.
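As a rough illustration of the retrieval step (not the paper's actual pipeline), here is a minimal TF-IDF sketch that ranks reviews by similarity to a question, using scikit-learn; the example reviews are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "Battery easily lasts two days with heavy use.",
    "The screen scratches if you look at it wrong.",
    "Charges fully in about an hour.",
]
question = "How long does the battery last?"

# Rank reviews by cosine similarity to the question in TF-IDF space.
vec = TfidfVectorizer(stop_words="english")
review_matrix = vec.fit_transform(reviews)
scores = cosine_similarity(vec.transform([question]), review_matrix)[0]

# The top-scoring reviews would be passed to a reading-comprehension model.
top = scores.argsort()[::-1][:2]
print([reviews[i] for i in top])
```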
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains ~3000 Amazon Product Reviews from ~200 products in the Electronics department in order to better understand the sentiment of products listed on Amazon. Further analysis and the source code for curating this dataset can be found in the Github Repository below:
Github Repository for the corresponding code: https://github.com/laxman-22/Amazon-Product-Reviews-Sentiment-Analysis/tree/main
This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. The goal of quantification is not to predict the class label of each individual instance, but the distribution of labels in unlabeled sets of data.

The data is extracted from the McAuley data set of product reviews on Amazon, where the goal is to predict the 5-star rating of each textual review. We have sampled this data according to three protocols that are suited for quantification research. The first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ(50%), is a variant thereof, where only the smoothest 50% of all APP samples are considered. This variant is targeted at ordinal quantification, where classes are ordered and a similarity of neighboring classes can be assumed. 5-star ratings of product reviews lie on an ordinal scale and, hence, pose such an ordinal quantification task. The third protocol considers "real" distributions of labels. These distributions stem from actual products in the original data set.

The data is represented by a RoBERTa embedding. In our experience, logistic regression classifiers work well with this representation. You can extract our data sets yourself, for instance, if you require a raw textual representation. The original McAuley data set is public already and we provide all of our extraction scripts.

Extraction scripts and experiments: https://github.com/mirkobunse/regularized-oq
Original data by McAuley: https://jmcauley.ucsd.edu/data/amazon/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Diverse fake-text generation practices by spammers make spam detection challenging. Existing works use manually designed discrete textual or behavioral features, which cannot capture the complex global semantics of text and reviews. Some studies use a limited set of features while neglecting other significant ones; however, with a large feature set, selecting all features leads to overfitting and expensive computation. This paper addresses the challenges of feature selection and of evolving spammer behavior and linguistic features, with the goal of devising an efficient model for spam detection. The primary objective was to identify the most effective subset of features and patterns for the task. Spammer behavior features and linguistic features often exhibit complex relationships that influence the nature of spam reviews, and the unified representation of these features is a further challenge. Various deep learning approaches have been proposed for spam detection and classification; these methods are specialized in extracting features but fail to capture dependencies between features effectively, and comprehensive models that integrate linguistic and behavioral features to improve detection accuracy are lacking. The proposed spam detection framework, SD-FSL-CLSTM, uses a fusion of spammer behavior features and linguistic features to automatically detect and classify spam reviews. Fusion enables the model to learn the interactions between the features during training, allowing it to capture complex relationships and make predictions based on both types of features. The SD-FSL-CLSTM framework shows promising results, obtaining a minimum accuracy of 97%.
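The abstract does not include code, but as a rough illustration of this kind of feature fusion, here is a minimal Keras sketch of a CNN-LSTM text branch concatenated with numeric behavior features; every layer size, vocabulary size, and feature count below is an assumption, not the authors' SD-FSL-CLSTM configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Text branch: token IDs -> embedding -> Conv1D -> LSTM (a C-LSTM-style stack).
# Sequence length and vocabulary size are illustrative.
text_in = layers.Input(shape=(200,), name="review_tokens")
x = layers.Embedding(input_dim=20000, output_dim=64)(text_in)
x = layers.Conv1D(64, 5, activation="relu")(x)
x = layers.LSTM(64)(x)

# Behavior branch: numeric spammer-behavior features (e.g. review count,
# rating deviation, burstiness); the count of 8 is an assumption.
beh_in = layers.Input(shape=(8,), name="behavior_features")
b = layers.Dense(32, activation="relu")(beh_in)

# Fusion: concatenate both representations so interactions between
# linguistic and behavioral features are learned jointly during training.
fused = layers.concatenate([x, b])
out = layers.Dense(1, activation="sigmoid", name="is_spam")(fused)

model = Model([text_in, beh_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```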
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Context 😀: This is a small subset of a dataset of headphone reviews from Amazon.
Content: this dataset has 6 columns 🎧
1. Customer Name - name of the customer who bought the product
2. REVIEW_TITLE - the review in short
3. Color - color of the product
4. REVIEW_DATE - date when the customer gave the rating, e.g. 05-Sep-21
5. COMMENTS - the customer's comment; how the customer feels about the product
6. RATINGS - how the customer rates the product out of 5 stars, e.g. 4/5
Which file to use? There is only one file, which is preprocessed and ready for sentiment analysis.
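Since REVIEW_DATE arrives as strings like 05-Sep-21 and RATINGS as strings like 4/5, here is a small pandas sketch (the file name is illustrative) to coerce both into usable types:

```python
import pandas as pd

df = pd.read_csv("boat_headphone_reviews.csv")  # illustrative file name

# "05-Sep-21" -> datetime
df["REVIEW_DATE"] = pd.to_datetime(df["REVIEW_DATE"], format="%d-%b-%y")

# "4/5" -> numeric star rating
df["stars"] = df["RATINGS"].str.split("/").str[0].astype(float)
```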
Acknowledgements: This dataset is taken from Amazon product data, https://www.amazon.in/boat-headphones/s?k=boat+headphones. The license to the data files belongs to them.
Inspiration:
- Sentiment analysis on reviews.
- Understanding how people rate the usefulness of a review / what factors influence the helpfulness of a review.
- Fake reviews / outliers.
This dataset consists of product details from Amazon. The details include product and user information (a productName column has been added), ratings, and a plain-text review. It also includes reviews from all other Amazon categories.
It comprises a very small and simple subset of the wide range of food products on Amazon, but it will suffice for simple projects, or projects that need the product name to be present.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Amazon.com, Inc. is an American multinational technology company based in Seattle, Washington, which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. This dataset contains daily data on the top 100 most popular products based on sales. It will be updated on a weekly basis. The data in this dataset was extracted from Amazon Best Sellers page: https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/
As of now, the dataset covers February-July 2021 and has 6 features:
- Date
- Number in rating (the product's position in the top-100 ranking)
- Product name
- Rating
- Number of reviews
- Price
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a comprehensive collection of information about books, designed for use in recommendation systems and chatbot development. It includes details about a wide range of books, making it suitable for various applications in the fields of machine learning, natural language processing, and artificial intelligence.
Key Features:
- Book Information: each entry contains details such as title, author, genre, publication year, and synopsis.
- User Ratings: user-generated ratings and reviews, allowing for collaborative filtering and personalized recommendations.
- Cover Images: URLs or file paths to cover images for visual representation.
- Links and References: links to external databases, Amazon pages, or other relevant sources for more in-depth information.
Potential Use Cases:
- Recommendation Systems: utilize the dataset to build and train recommendation models for suggesting books based on user preferences (see the sketch after this list).
- Chatbot Development: enhance chatbots with a rich source of book-related information, enabling more engaging and context-aware conversations.
- Natural Language Processing (NLP): use the dataset for text analysis, sentiment analysis, and other NLP tasks related to book reviews and synopses.
- Data Exploration and Analysis: explore trends in book preferences, popular genres, and author popularity.
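As a starting point for the recommendation use case, here is a minimal item-based collaborative-filtering sketch on a toy user-book rating matrix; all titles and values are invented:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy user x book rating matrix; 0 means unrated.
ratings = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]],
    index=["u1", "u2", "u3", "u4"],
    columns=["Dune", "Hyperion", "Emma", "Persuasion"],
)

# Similarity between books, based on how users rated them.
sim = pd.DataFrame(
    cosine_similarity(ratings.T), index=ratings.columns, columns=ratings.columns
)

# Recommend for u1: score each unrated book by similarity-weighted ratings.
user = ratings.loc["u1"]
unrated = user[user == 0].index
scores = {b: (sim[b] * user).sum() / sim[b][user > 0].sum() for b in unrated}
print(scores)
```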