17 datasets found
  1. Amazon review data 2018

    • cseweb.ucsd.edu
    • nijianmo.github.io
    • +1 more
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, it includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:

      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:

      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata:

      • We have added transaction metadata for each review shown on the review page.
      • Added more detailed metadata of the product landing page.

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
  2. U.S. consumers confident in having seen fake reviews on Amazon 2024

    • statista.com
    Updated Nov 28, 2025
    Statista (2025). U.S. consumers confident in having seen fake reviews on Amazon 2024 [Dataset]. https://www.statista.com/statistics/997026/amazon-shopping-categories-largest-share-fake-product-reviews/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2024, ** percent of U.S. consumers answering a survey were confident in having seen fake product reviews on Amazon. Although the number might seem very high, the figure has decreased compared to 2023, when ** percent of respondents stated the same.

  3. Fake reviews Amazon

    • kaggle.com
    zip
    Updated Apr 7, 2025
    Alexander Stephens (2025). Fake reviews Amazon [Dataset]. https://www.kaggle.com/datasets/alexanderstephens440/fake-reviews-amazon
    Available download formats: zip (5187845 bytes)
    Dataset updated
    Apr 7, 2025
    Authors
    Alexander Stephens
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Alexander Stephens

    Released under Database: Open Database, Contents: © Original Authors

    Contents

    Fake reviews from Amazon

  4. Amazon Fashion 800k+ User Reviews Dataset

    • kaggle.com
    zip
    Updated Mar 25, 2024
    Fawad Hossaini 1415 (2024). Amazon Fashion 800k+ User Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/fawadhossaini1415/amazon-fashion-800k-user-reviews-dataset
    Available download formats: zip (103546328 bytes)
    Dataset updated
    Mar 25, 2024
    Authors
    Fawad Hossaini 1415
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset comprises a large-scale collection of Amazon Reviews, gathered in 2023 by the McAuley Lab. Specifically, it focuses on the Amazon Fashion category, containing a total of 800K+ user reviews. It serves as a valuable resource for conducting sentiment analysis.

    We will categorize rating ranges into ternary classes as follows:

    • Ratings from 1 to 2: Negative (-1)
    • Ratings of 3: Neutral (0)
    • Ratings from 4 to 5: Positive (1)

    Class totals:

    • Total Positive Sentiments: 346924
    • Total Negative Sentiments: 346924
    • Total Neutral Sentiments: 173462

    This dataset is suitable for binary classification as it already contains a balance between positive and negative sentiments. However, for ternary classification, it's necessary to balance the target values using undersampling or oversampling techniques.
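The rating-to-class mapping and the undersampling step mentioned above can be sketched as follows. This is a minimal illustration, not the dataset author's code: the function names, the toy rows, and the choice of random undersampling down to the smallest class are all assumptions.

```python
import random
from collections import defaultdict


def rating_to_label(rating: float) -> int:
    """Map a 1-5 star rating to the ternary classes listed in the description."""
    if rating <= 2:
        return -1  # negative
    if rating == 3:
        return 0   # neutral
    return 1       # positive


def undersample(rows, label_of, seed=0):
    """Randomly drop rows so each class keeps as many rows as the smallest class."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy rows of (rating, review text); not taken from the dataset.
toy_rows = [(5.0, "love it"), (4.0, "nice"), (1.0, "bad"), (2.0, "poor fit"), (3.0, "ok")]
balanced_rows = undersample(toy_rows, lambda row: rating_to_label(row[0]))
```

Oversampling would instead repeat rows from the smaller classes; which of the two suits a given model is a judgment call the description leaves open.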

    The dataset encompasses the following fields:

    1. Rating: 1.0 to 5.0
    2. title: Title of the user review
    3. text: Text body of the user review
    4. images: Images that users post after they have received the product. Each image comes in different sizes (small, medium, large), represented by small_image_url, medium_image_url, and large_image_url respectively
    5. asin: ID of the product
    6. parent_asin: Parent ID of the product. Note: Products with different colors, styles, or sizes usually belong to the same parent ID. The “asin” in previous Amazon datasets is actually the parent ID. Please use the parent ID to find product metadata
    7. user_id: ID of the reviewer
    8. timestamp: Time of the review (Unix time)
    9. helpful_vote: Helpful votes of the review
    10. verified_purchase: User purchase verification
    11. target: Labels for text reviews, where Positive (1), Negative (-1), and Neutral (0) represent the related sentiments

    Dataset DOI: https://doi.org/10.48550/arXiv.2403.03952

    Cite Article: Hou et al. (2024) proposed a method for bridging language and items for retrieval and recommendation.

  5. Amazon Kindle Book Review for Sentiment Analysis

    • kaggle.com
    zip
    Updated Sep 3, 2021
    Meet Nagadia (2021). Amazon Kindle Book Review for Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/meetnagadia/amazon-kindle-book-review-for-sentiment-analysis
    Available download formats: zip (6686485 bytes)
    Dataset updated
    Sep 3, 2021
    Authors
    Meet Nagadia
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Context

    This is a small subset of a dataset of book reviews from the Amazon Kindle Store category.

    Content

    5-core dataset of product reviews from Amazon Kindle Store category from May 1996 - July 2014. Contains a total of 982619 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.

    Columns

    • asin - ID of the product, like B000FA64PK
    • helpful - helpfulness rating of the review - example: 2/3.
    • overall - rating of the product.
    • reviewText - text of the review (heading).
    • reviewTime - time of the review (raw).
    • reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
    • reviewerName - name of the reviewer.
    • summary - summary of the review (description).
    • unixReviewTime - unix timestamp.

    Which file to use?

    There are two files: one is preprocessed and ready for sentiment analysis; the other is unprocessed, so you have to process the dataset yourself before performing sentiment analysis.

    Acknowledgements

    This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/

    License to the data files belongs to them.

    Inspiration

    • Sentiment analysis on reviews.
    • Understanding how people rate usefulness of a review/ What factors influence helpfulness of a review.
    • Fake reviews/ outliers.
    • Best rated product IDs, or similarity between products based on reviews alone (not the best idea ikr).
    • Any other interesting analysis.

  6. Amazon reviews: Kindle Store Category

    • kaggle.com
    zip
    Updated May 22, 2018
    Bharadwaj Srigiriraju (2018). Amazon reviews: Kindle Store Category [Dataset]. https://www.kaggle.com/bharadwaj6/kindle-reviews
    Available download formats: zip (550036223 bytes)
    Dataset updated
    May 22, 2018
    Authors
    Bharadwaj Srigiriraju
    Description

    Context

    A small subset of a dataset of product reviews from the Amazon Kindle Store category.

    Content

    5-core dataset of product reviews from Amazon Kindle Store category from May 1996 - July 2014. Contains a total of 982619 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.
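The "5-core" property (every reviewer and every product has at least 5 reviews) can be obtained by repeated filtering, because dropping a sparse reviewer may push a product below the threshold and vice versa. The sketch below shows one way such a subset could be computed. It is not the script used to build this dataset; the function name and toy records are hypothetical, and only the `reviewerID` and `asin` field names come from the column list.

```python
from collections import Counter


def k_core(reviews, k=5):
    """Repeatedly drop reviews until every reviewer and product has at least k reviews."""
    kept = list(reviews)
    while True:
        per_reviewer = Counter(r["reviewerID"] for r in kept)
        per_product = Counter(r["asin"] for r in kept)
        filtered = [
            r for r in kept
            if per_reviewer[r["reviewerID"]] >= k and per_product[r["asin"]] >= k
        ]
        if len(filtered) == len(kept):
            return filtered  # nothing was removed, so the k-core is stable
        kept = filtered


# Hypothetical toy records, filtered with k=2 to keep the example small.
toy_reviews = [
    {"reviewerID": "u1", "asin": "p1"},
    {"reviewerID": "u1", "asin": "p2"},
    {"reviewerID": "u2", "asin": "p1"},
    {"reviewerID": "u2", "asin": "p2"},
    {"reviewerID": "u3", "asin": "p1"},  # u3 has a single review and is dropped
]
core = k_core(toy_reviews, k=2)
```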

    Columns

    • asin - ID of the product, like B000FA64PK
    • helpful - helpfulness rating of the review - example: 2/3.
    • overall - rating of the product.
    • reviewText - text of the review (heading).
    • reviewTime - time of the review (raw).
    • reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
    • reviewerName - name of the reviewer.
    • summary - summary of the review (description).
    • unixReviewTime - unix timestamp.
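Two of the columns above usually need a small conversion before analysis: `helpful` and `unixReviewTime`. The sketch below is illustrative only. It assumes `helpful` arrives either as a "2/3" string (as in the example above), as a "[2, 3]" string, or as a two-element list; the actual encoding in the files may differ, and both function names are hypothetical.

```python
from datetime import datetime, timezone


def helpful_ratio(helpful):
    """Return helpful votes divided by total votes, or None when nobody voted."""
    if isinstance(helpful, str):
        # Accept "2/3" or "[2, 3]"; both encodings are assumptions.
        cleaned = helpful.strip("[] ").replace("/", ",")
        up, total = (int(part) for part in cleaned.split(","))
    else:
        up, total = helpful
    return up / total if total else None


def review_datetime(unix_review_time):
    """Convert the unixReviewTime column to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(unix_review_time), tz=timezone.utc)
```

Returning `None` for reviews without votes keeps them distinguishable from reviews that were voted unhelpful.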

    Which file to use?

    The dataset originally contained a json file of the reviews, but some people had issues opening it and getting it to work so I've added a csv file which contains same data. You can use whichever one is easier to work with.

    Acknowledgements

    This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/

    License to the data files belongs to them.

    Inspiration

    • Sentiment analysis on reviews.
    • Understanding how people rate usefulness of a review/ What factors influence helpfulness of a review.
    • Fake reviews/ outliers.
    • best rated product IDs, or similarity between products based on reviews alone (not the best idea ikr).
    • Any other interesting analysis.
  7. Marketing Bias data

    • cseweb.ucsd.edu
    json
    + more versions
    UCSD CSE Research Project, Marketing Bias data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation.

    Metadata includes

    • ratings

    • product images

    • user identities

    • item sizes, user genders

  8. amazon reviews for sentiment analysis

    • kaggle.com
    zip
    Updated Jul 21, 2022
    Tarık Kaan Koç (2022). amazon reviews for sentiment analysis [Dataset]. https://www.kaggle.com/datasets/tarkkaanko/amazon/data
    Available download formats: zip (595680 bytes)
    Dataset updated
    Jul 21, 2022
    Authors
    Tarık Kaan Koç
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    One of the most important problems in e-commerce is correctly calculating the ratings that products receive after sale. Solving this problem brings greater customer satisfaction for the e-commerce site, more prominence for sellers' products, and a seamless shopping experience for buyers. Another problem is correctly ordering the comments left on products. Prominent misleading comments cause both financial losses and lost customers. By solving these two basic problems, the e-commerce site and its sellers will increase their sales, while customers will complete their purchasing journey without problems.

    This dataset consists of ranking product ratings and reviews on Amazon. Please review this notebook to observe how I came up with this dataset. The dataset, which contains Amazon product data, includes product categories and various metadata.

    What is expected of you?

    The dataset contains the user ratings and comments for the product with the most comments in the electronics category. We expect you to perform sentiment analysis on it with your own methods.

  9. Amazon Electronics Reviews

    • kaggle.com
    zip
    Updated Mar 1, 2021
    Shivam Parab (2021). Amazon Electronics Reviews [Dataset]. https://kaggle.com/shivamparab/amazon-electronics-reviews
    Available download formats: zip (506763411 bytes)
    Dataset updated
    Mar 1, 2021
    Authors
    Shivam Parab
    Description

    Context

    Dataset of Amazon Reviews for Electronics Category.

    Content

    5-core dataset of product reviews from Amazon Electronics category from May 1996 - July 2014. Contains a total of 1689188 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.

    Columns

    • asin - ID of the product, like B000FA64PK
    • helpful - helpfulness rating of the review - example: 2/3.
    • overall - rating of the product.
    • reviewText - text of the review (heading).
    • reviewTime - time of the review (raw).
    • reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
    • reviewerName - name of the reviewer.
    • summary - summary of the review (description).
    • unixReviewTime - unix timestamp.

    Acknowledgements

    This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/

    License to the data files belongs to them.

    Inspiration

    • Sentiment analysis on reviews.
    • Understanding how people rate the usefulness of a review/ What factors influence helpfulness of a review.
    • Fake reviews/ outliers.
    • Any other interesting analysis.
  10. AmazonQA

    • berd-platform.de
    Updated Jul 31, 2025
    Mansi Gupta; Nitish Kulkarni; Raghuveer Chanda; Anirudha Rayasam; Zachary C. Lipton (2025). AmazonQA [Dataset]. http://doi.org/10.82939/svjqd-94a72
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
    Authors
    Mansi Gupta; Nitish Kulkarni; Raghuveer Chanda; Anirudha Rayasam; Zachary C. Lipton
    Description

    We introduce a new dataset and propose a method that combines information retrieval techniques for selecting relevant reviews (given a question) and "reading comprehension" models for synthesizing an answer (given a question and review). Our dataset consists of 923k questions, 3.6M answers and 14M reviews across 156k products. Building on the well-known Amazon dataset, we collect additional annotations, marking each question as either answerable or unanswerable based on the available reviews.
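To make the "select relevant reviews given a question" step concrete, here is a generic TF-IDF cosine-similarity ranking written with the standard library only. It is a stand-in for illustration, not the retrieval method the AmazonQA authors used; the tokenizer, the smoothed IDF formula, and the toy texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_reviews(question, reviews):
    """Return review indices sorted by TF-IDF cosine similarity to the question."""
    docs = [Counter(tokenize(review)) for review in reviews]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

    def vectorize(counts):
        # Terms never seen in any review are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vectorize(Counter(tokenize(question)))
    scores = [cosine(query, vectorize(doc)) for doc in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Hypothetical toy reviews; not drawn from the dataset.
toy_reviews = ["battery lasts two days", "strap feels comfortable", "screen looks bright"]
ranking = rank_reviews("how long does battery life last", toy_reviews)
```

The top-ranked reviews would then be passed, together with the question, to an answer-synthesis model.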

  11. Amazon Product Reviews for Sentiment Analysis

    • kaggle.com
    zip
    Updated Nov 25, 2024
    Aryan Laxman Sirohi (2024). Amazon Product Reviews for Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/laxman22/amazon-product-reviews-for-sentiment-analysis/discussion
    Available download formats: zip (1078096 bytes)
    Dataset updated
    Nov 25, 2024
    Authors
    Aryan Laxman Sirohi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains ~3000 Amazon product reviews from ~200 products in the Electronics department, collected in order to better understand the sentiment of products listed on Amazon. Further analysis and the source code for curating this dataset can be found in the GitHub repository below:

    GitHub repository for the corresponding code: https://github.com/laxman-22/Amazon-Product-Reviews-Sentiment-Analysis/tree/main

  12. Product Reviews for Ordinal Quantification

    • data.europa.eu
    • data.niaid.nih.gov
    • +1 more
    unknown
    Updated Jul 3, 2025
    Zenodo (2025). Product Reviews for Ordinal Quantification [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8176791?locale=nl
    Available download formats: unknown
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. The goal of quantification is not to predict the class label of each individual instance, but the distribution of labels in unlabeled sets of data.

    The data is extracted from the McAuley data set of product reviews in Amazon, where the goal is to predict the 5-star rating of each textual review. We have sampled this data according to three protocols that are suited for quantification research.

    The first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with an equal probability. The second protocol, APP-OQ(50%), is a variant thereof, where only the smoothest 50% of all APP samples are considered. This variant is targeted at ordinal quantification, where classes are ordered and a similarity of neighboring classes can be assumed. 5-star ratings of product reviews lie on an ordinal scale and, hence, pose such an ordinal quantification task. The third protocol considers "real" distributions of labels. These distributions stem from actual products in the original data set.

    The data is represented by a RoBERTa embedding. In our experience, logistic regression classifiers work well with this representation.

    You can extract our data sets yourself, for instance, if you require a raw textual representation. The original McAuley data set is public already and we provide all of our extraction scripts.

    Extraction scripts and experiments: https://github.com/mirkobunse/regularized-oq

    Original data by McAuley: https://jmcauley.ucsd.edu/data/amazon/
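The artificial prevalence protocol (APP) described above draws label distributions uniformly from all possible distributions. A minimal sketch of that idea follows. It is not the authors' extraction script (that lives in the linked repository): the function names are hypothetical, and drawing items with replacement is an assumption made to keep the example short. The APP-OQ(50%) smoothness filter is not shown.

```python
import random


def sample_prevalence(n_classes, rng):
    """Draw a label distribution uniformly at random from the probability simplex."""
    # Normalized i.i.d. exponential draws follow a flat Dirichlet(1, ..., 1).
    draws = [rng.expovariate(1.0) for _ in range(n_classes)]
    total = sum(draws)
    return [d / total for d in draws]


def draw_sample(items_by_class, prevalence, size, rng):
    """Draw `size` items (with replacement) whose labels follow `prevalence`."""
    labels = rng.choices(range(len(prevalence)), weights=prevalence, k=size)
    return [(label, rng.choice(items_by_class[label])) for label in labels]


rng = random.Random(0)
prevalence = sample_prevalence(5, rng)  # one distribution over the 5 star ratings
# Hypothetical pool: class index 0-4 maps to placeholder review IDs.
pool = {c: [f"review_{c}_{i}" for i in range(10)] for c in range(5)}
sample = draw_sample(pool, prevalence, size=100, rng=rng)
```

Repeating both steps many times yields test samples whose class proportions cover the whole range of possible distributions, which is the property a quantifier is evaluated on.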

  13. Amazon dataset used in the proposed framework.

    • plos.figshare.com
    xls
    Updated Feb 6, 2025
    Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed (2025). Amazon dataset used in the proposed framework. [Dataset]. http://doi.org/10.1371/journal.pone.0313628.t002
    Available download formats: xls
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Amna Iqbal; Muhammad Younas; Muhammad Kashif Hanif; Muhammad Murad; Rabia Saleem; Muhammad Aater Javed
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The diverse fake-text generation practices used by spammers make spam detection challenging. Existing works use manually designed discrete textual or behavioral features, which cannot capture the complex global semantics of text and reviews. Some studies use a limited set of features while neglecting other significant ones. With a large feature set, however, selecting all features leads to model overfitting and expensive computation. This research addresses challenges in feature selection and in evolving spammer behavior and linguistic features, with the goal of devising an efficient model for spam detection. The primary objective was to identify the most effective subset of features and patterns for spam detection. Spammer behavior features and linguistic features often exhibit complex relationships that influence the nature of spam reviews. A unified representation of features is another challenge in spam detection. Various deep learning approaches have been proposed for spam detection and classification, but these methods specialize in extracting features and fail to capture dependencies between features effectively; comprehensive models that integrate linguistic and behavioral features to improve spam detection accuracy are lacking. The proposed spam detection framework, SD-FSL-CLSTM, fuses spammer behavior features and linguistic features to automatically detect and classify spam reviews. Fusion enables the proposed model to learn the interactions between the features automatically during training, allowing it to capture complex relationships and make predictions based on both types of features. The SD-FSL-CLSTM framework shows promising results, achieving a minimum accuracy of 97%.

  14. HEADPHONE DATASET REVIEW ANALYSIS

    • kaggle.com
    zip
    Updated Jul 1, 2022
    Md Waquar Azam (2022). HEADPHONE DATASET REVIEW ANALYSIS [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/headphone-dataset-review-analysis/discussion
    Available download formats: zip (77958 bytes)
    Dataset updated
    Jul 1, 2022
    Authors
    Md Waquar Azam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context 😀

    This is a small subset of a dataset of headphone reviews from Amazon.

    Content

    This dataset has 6 columns 🎧:

    1. Customer Name: name of the customer who bought the product
    2. REVIEW_TITLE: the review in short
    3. Color: color of the product
    4. REVIEW_DATE: date when the customer gave the rating, e.g. 05-Sep-21
    5. COMMENTS: the customer's comment, describing how the customer feels about the product
    6. RATINGS: the customer's rating out of 5 stars, e.g. 4/5
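The REVIEW_DATE and RATINGS columns are stored as text, so they typically need parsing before analysis. The sketch below assumes the two formats shown in the examples above ("05-Sep-21" and "4/5") hold for every row, which the description does not guarantee; both function names are hypothetical.

```python
from datetime import datetime


def parse_rating(text):
    """Turn a rating such as '4/5' into a fraction between 0 and 1."""
    stars, out_of = text.strip().split("/")
    return float(stars) / float(out_of)


def parse_review_date(text):
    """Parse a date such as '05-Sep-21' (day, abbreviated English month, two-digit year)."""
    return datetime.strptime(text.strip(), "%d-%b-%y").date()
```

Note that `%b` matches month abbreviations in the current locale, so this assumes an English (or default C) locale.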

    Which file to use?

    There is only one file; it is preprocessed and ready for sentiment analysis.

    Acknowledgements

    This dataset is taken from Amazon product data, https://www.amazon.in/boat-headphones/s?k=boat+headphones

    License to the data files belongs to them.

    Inspiration

    • Sentiment analysis on reviews.
    • Understanding how people rate usefulness of a review/ What factors influence helpfulness of a review.
    • Fake reviews/ outliers.

  15. Amazon Food

    • kaggle.com
    zip
    Updated Feb 3, 2022
    AIStct (2022). Amazon Food [Dataset]. https://www.kaggle.com/datasets/aistct/amazonfood
    Available download formats: zip (8223533 bytes)
    Dataset updated
    Feb 3, 2022
    Authors
    AIStct
    Description

    Context

    This dataset consists of product details from Amazon. The details include product and user information (added productName), ratings, and a plain text review. It also includes reviews from all other Amazon categories.

    Content

    Comprises a very small and simple subset of the wide range of food products on Amazon, but it will suffice for simple projects or for projects that need the product name to be present.

  16. Amazon Top 100 Best Sellers in Electronics 2021

    • kaggle.com
    zip
    Updated Aug 5, 2021
    Anna Pastushko (2021). Amazon Top 100 Best Sellers in Electronics 2021 [Dataset]. https://www.kaggle.com/annpastushko/amazon-top-100-best-sellers-in-electronics-2021
    Available download formats: zip (241622 bytes)
    Dataset updated
    Aug 5, 2021
    Authors
    Anna Pastushko
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Amazon.com, Inc. is an American multinational technology company based in Seattle, Washington, which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. This dataset contains daily data on the top 100 most popular products based on sales. It will be updated on a weekly basis. The data in this dataset was extracted from Amazon Best Sellers page: https://www.amazon.com/Best-Sellers-Electronics/zgbs/electronics/

    Content

    As of now, the dataset consists of data for February-July 2021 and 6 features:

    • Date
    • Number in rating
    • Product name
    • Rating
    • Number of reviews
    • Price

  17. Books Dataset

    • kaggle.com
    zip
    Updated Dec 13, 2023
    Abdallah Wagih Ibrahim (2023). Books Dataset [Dataset]. https://www.kaggle.com/datasets/abdallahwagih/books-dataset
    Available download formats: zip (1657228 bytes)
    Dataset updated
    Dec 13, 2023
    Authors
    Abdallah Wagih Ibrahim
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset is a comprehensive collection of information about books, designed for use in recommendation systems and chatbot development. It includes details about a wide range of books, making it suitable for various applications in the fields of machine learning, natural language processing, and artificial intelligence.

    Key Features

    Book Information: Each entry contains details such as title, author, genre, publication year, and synopsis.

    User Ratings: User-generated ratings and reviews, allowing for collaborative filtering and personalized recommendations.

    Cover Images: URLs or file paths to cover images for visual representation.

    Links and References: Links to external databases, Amazon pages, or other relevant sources for more in-depth information.

    Potential Use Cases

    Recommendation Systems: Utilize the dataset to build and train recommendation models for suggesting books based on user preferences.

    Chatbot Development: Enhance chatbots with a rich source of book-related information, enabling more engaging and context-aware conversations.

    Natural Language Processing (NLP): Use the dataset for text analysis, sentiment analysis, and other NLP tasks related to book reviews and synopses.

    Data Exploration and Analysis: Explore trends in book preferences, popular genres, and author popularity.

Amazon review data 2018: 93 scholarly articles cite this dataset (View in Google Scholar)