Amazon Prime continues to grow in the United States: as of December 2019, there were an estimated 112 million U.S. Amazon Prime subscribers, up from 95 million in June 2018. On average, Amazon Prime members spent 1,400 U.S. dollars on the e-retail platform per year. March 2019 data also indicates that non-Prime members spent only 600 U.S. dollars annually.
Amazon Prime
Amazon Prime is a paid subscription service offered by online retail platform Amazon. The subscription includes services such as music and video streaming, free two-day (or faster) shipping, as well as many other benefits. The program was launched in 2005 and is available internationally. In 2019, Amazon generated 19.21 billion U.S. dollars in revenues through its subscription services segment. Subscription services include not only Amazon Prime revenues, but also audiobook, e-book, digital video, digital music and other non-AWS subscription services.
Prime shoppers
The most popular product categories purchased by Amazon Prime shoppers in the United States were electronics, apparel, and home and kitchen goods. Amazon Prime shoppers are more engaged than non-members: during a February 2019 survey, 20 percent of Amazon Prime members stated that they shopped on Amazon a few times per week, with seven percent saying that they did so on an (almost) daily basis.
For the first time since 2016, the number of Amazon Prime members has declined in the United States. In the first quarter of 2023, *** million users had a Prime account on Amazon, down by roughly ***** compared to the same quarter in 2022. In 2024, it picked up again, reaching *** million subscribers.
Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
Dataset Statistics
# Nodes | %Fraud Nodes (Class=1) |
---|---|
11,944 | 9.5 |
Relation | # Edges |
---|---|
U-P-U | |
U-S-U | |
U-V-U | 1,036,737 |
All | |
Graph Construction
The Amazon dataset includes product reviews under the Musical Instruments category. Similar to this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users reviewing at least one same product; 2) U-S-U: it connects users having at least one same star rating within one week; 3) U-V-U: it connects users with top 5% mutual review text similarities (measured by TF-IDF) among all users.
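The labeling rule and the U-P-U relation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the dataset authors' code: the function names, the `(user_id, product_id)` input shape, and the choice to leave users between the two thresholds (or without votes) unlabeled are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_votes, total_votes):
    """Return 0 (benign), 1 (fraudulent), or None (unlabeled).

    Thresholds follow the text: more than 80% helpful votes is benign,
    less than 20% is fraudulent. Leaving everyone else unlabeled is an
    assumption made for this sketch.
    """
    if total_votes == 0:
        return None  # assumption: users with no votes stay unlabeled
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0
    if ratio < 0.2:
        return 1
    return None


def build_upu_edges(reviews):
    """Connect every pair of users who reviewed at least one same product.

    `reviews` is an iterable of (user_id, product_id) pairs (hypothetical
    input shape). Returns a set of undirected edges, each stored as a
    sorted (user_a, user_b) tuple.
    """
    users_by_product = defaultdict(set)
    for user_id, product_id in reviews:
        users_by_product[product_id].add(user_id)

    edges = set()
    for users in users_by_product.values():
        # Every pair of users sharing this product gets one edge.
        for user_a, user_b in combinations(sorted(users), 2):
            edges.add((user_a, user_b))
    return edges
```

Storing each edge as a sorted tuple in a set keeps the relation undirected and avoids duplicate edges when two users share several products.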
To download the dataset, please visit this Github repo. For any other questions, please email ytongdou(AT)gmail.com for inquiry.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by sriya reddy
Released under Apache 2.0
https://brightdata.com/license
Gain extensive insights with our Amazon datasets, encompassing detailed product information including pricing, reviews, ratings, brand names, product categories, sellers, ASINs, images, and much more. Ideal for market researchers, data analysts, and eCommerce professionals looking to excel in the competitive online marketplace. Over 425M records are available. Price starts at $250/100K records. Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet. 100% ethical and compliant data collection. Included datapoints:
Title Asin Main Image Brand Name Description Availability Subcategory Categories Parent Asin Type Product Type Name Model Number Manufacturer Color Size Date First Available Released Model Year Item Model Number Part Number Price Total Reviews Total Ratings Average Rating Features Best Sellers Rank Subcategory Buybox Buybox Seller Id Buybox Is Amazon Images Product URL And more
This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: - We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
The number of households in the United Kingdom with an Amazon Prime Video subscription increased to **** million in the first quarter of 2025. However, the service faces tough competition from the subscription video-on-demand (SVOD) giant Netflix, which reported **** million subscribers in the UK.
Who has more movies and TV shows, Netflix or Amazon?
Although Netflix is outperforming Amazon Prime Video in terms of household subscriptions, the latter does have a greater number of titles available. Amazon Prime Video had a total of ****** titles available in the UK as of late 2024, whereas Netflix had *****. The majority of Amazon Prime Video titles in the UK are movies.
Amazon’s content spend
Amazon Prime Video is an SVOD service owned and run by the online retailer Amazon. In 2023, Amazon’s video and music content budget amounted to an estimated **** billion U.S. dollars, an increase of over ** billion U.S. dollars compared with five years earlier.
In March 2022, Amazon Prime Video had over *** million unique visitors in India, a significant increase from the previous year. The global player had partnerships with Yash Raj Films, Dharma Productions, and T-Series.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Sümeyra
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
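Because the rows in data_rt.csv are ordered by label, a split taken without shuffling would contain only one class. The following is a minimal sketch of one way to shuffle reproducibly with the standard library; the helper names `load_rows` and `shuffled` and the default seed are assumptions made for illustration, not part of the dataset.

```python
import csv
import random


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def shuffled(rows, seed=0):
    """Return a shuffled copy of `rows`; the input list is left untouched.

    A fixed seed (an arbitrary choice here) makes the order reproducible
    across runs.
    """
    rng = random.Random(seed)
    result = list(rows)
    rng.shuffle(result)
    return result
```

A typical call would be `rows = shuffled(load_rows("data_rt.csv"))`, after which a train/test split can be taken by slicing the list.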
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
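The description above does not say how the polarity score was turned into the division label, so the following is only a hypothetical sketch of such a mapping. TextBlob polarity scores lie in the range [-1.0, 1.0]; the label names, the `neutral_band` parameter, and the cut-off at zero are assumptions and may differ from what the dataset author used.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Scores above `neutral_band` are "positive", scores below
    `-neutral_band` are "negative", and anything in between is
    "neutral". Labels and thresholds are illustrative assumptions.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"
```

Widening `neutral_band` (for example to 0.1) treats weakly polarized reviews as neutral instead of forcing them into one of the two sentiment classes.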
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
This dataset was created by Gautam Kumar
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recommender systems based on matrix factorization act as black-box models and are unable to explain the recommended items. After adding the neighborhood algorithm, the explainability is measured by the user's neighborhood recommendation, but the subjective explicit preference of the target user is ignored. To better combine the latent factors from matrix factorization and the target user's explicit preferences, an explainable recommender system based on reconstructed explanatory factors and multi-modal matrix factorization (ERS-REFMMF) is proposed. ERS-REFMMF is a two-layer model, and the underlying model decomposes the multi-modal scoring matrix to get the rich latent features of the user and the item based on the method of Funk-SVD, in which the multi-modal scoring matrix consists of the original matrix and the preference features and sentiment scores exhibited by users in the reviews corresponding to the ratings. The set of candidate items is obtained based on the latent features, and the explainability is reconstructed based on the subjective preference of the target user and the real recognition level of the neighbors. The upper layer is the multi-objective high-performance recommendation stage, in which the candidate set is optimized by a multi-objective evolutionary algorithm to bring the user a final recommendation list that is accurate, recallable, diverse, and interpretable, in which the accuracy and recall are represented by F1-measure. Experimental results on three real datasets from Amazon show that the proposed model is competitive compared to existing recommendation methods in both stages.
Browse Amazon.com Inc (AMZN) market data. Get instant pricing estimates and make batch downloads of binary, CSV, and JSON flat files.
Consolidated last sale, exchange BBO and national BBO across all US equity options exchanges. Includes single name stock options (e.g. TSLA), options on ETFs (e.g. SPY, QQQ), index options (e.g. VIX), and some indices (e.g. SPIKE and VSPKE). This dataset is based on the newer, binary OPRA feed after the migration to SIAC's OPRA Pillar SIP in 2021. OPRA is notable for the size of its data and we recommend users to anticipate several TBs of data per day for the full dataset in its highest granularity (MBP-1).
Origin: Options Price Reporting Authority
Supported data encodings: DBN, JSON, CSV
Supported market data schemas: MBP-1, OHLCV-1s, OHLCV-1m, OHLCV-1h, OHLCV-1d, TBBO, Trades, Statistics, Definition
Resolution: Immediate publication, nanosecond-resolution timestamps
This dataset is a collection of customer reviews obtained from Amazon.com. It is designed for multilingual sentiment analysis and opinion mining, containing reviews in five different languages: Italian, German, Spanish, French, and English. The dataset is valuable for natural language processing tasks, sentiment analysis algorithms, and various machine learning applications that require diverse language data for training and evaluation. It can be used to train and fine-tune models to automatically classify sentiments, predict customer satisfaction, and extract key information from customer reviews.
The dataset is typically provided in a CSV file format. While specific total row counts are not available, examples of column value distributions are present, such as 675 total values for user names and 640 total values for star ratings, with 92% being 5/5 reviews. The dataset is structured to support various text and NLP applications.
This dataset is ideal for a range of applications, including: * Multilingual sentiment analysis. * Opinion mining studies. * Developing and testing natural language processing tasks. * Building sentiment analysis algorithms. * Training machine learning models to classify sentiments. * Predicting customer satisfaction from review data. * Extracting key insights and information from customer feedback.
The dataset's coverage is global, drawing reviews from Amazon.com. It includes content in Italian, German, Spanish, French, and English, indicating its relevance to regions where these languages are spoken. The dataset contains a 'date' column for each review; however, a specific time range for the reviews themselves is not provided.
CC-BY-NC
This dataset is suitable for: * Data Scientists and Researchers: For developing and testing machine learning models for sentiment analysis, NLP, and text classification across multiple languages. * E-commerce Analysts: To understand customer satisfaction, product performance, and market sentiment from user reviews. * Language Model Developers: To fine-tune large language models with diverse text data for improved natural language understanding. * Businesses: To gain insights into customer feedback and improve product or service offerings.
Original Data Source: Amazon Review Dataset LLM
This dataset provides comprehensive real-time data from Amazon's global marketplaces. It includes detailed product information, reviews, seller profiles, best sellers, deals, influencers, and more across all Amazon domains worldwide. The data covers product attributes like pricing, availability, specifications, reviews and ratings, as well as seller information including profiles, contact details, and performance metrics. Users can leverage this dataset for price monitoring, competitive analysis, market research, and building e-commerce applications. The API enables real-time access to Amazon's vast product catalog and marketplace data, helping businesses make data-driven decisions about pricing, inventory, and market positioning. Whether you're conducting market analysis, tracking competitors, or building e-commerce tools, this dataset provides current and reliable Amazon marketplace data. The dataset is delivered in a JSON format via REST API.
SUMMARY:
Vumonic provides its clients email receipt datasets on weekly, monthly, or quarterly subscriptions, for any online consumer vertical. We gain consent-based access to our users' email inboxes through our own proprietary apps, from which we gather and extract all the email receipts and put them into a structured format for consumption of our clients. We currently have over 1M users in our India panel.
If you are not familiar with email receipt data, it provides item and user-level transaction information (all PII-wiped), which allows for deep granular analysis of things like marketshare, growth, competitive intelligence, and more.
VERTICALS:
PRICING/QUOTE:
Our email receipt data is priced market-rate based on the requirement. To give a quote, all we need to know is:
Send us over this info and we can answer any questions you have, provide sample, and more.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data set provides de-identified population data for diabetes and hyperlipidemia comorbidity prevalence. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and 2016 calendar years.
Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
HBO originally launched Max at a time when almost every cable TV conglomerate was releasing their own streaming service, to compete with Netflix and Amazon Prime Video. In Warner Bros case, it had...
This dataset was created by pothula swaraj
This dataset contains longitudinal purchases data from 5027 Amazon.com users in the US, spanning 2018 through 2022: amazon-purchases.csv. It also includes demographic data and other consumer level variables for each user with data in the dataset. These consumer level variables were collected through an online survey and are included in survey.csv. fields.csv describes the columns in the survey.csv file, where fields/survey columns correspond to survey questions. The dataset also contains the survey instrument used to collect the data. More details about the survey questions and possible responses, and the format in which they were presented, can be found by viewing the survey instrument.
A 'Survey ResponseID' column is present in both the amazon-purchases.csv and survey.csv files. It links a user's survey responses to their Amazon.com purchases. The 'Survey ResponseID' was randomly generated at the time of data collection.
amazon-purchases.csv
Each row in this file corresponds to an Amazon order. Each such row has the following columns:
- Survey ResponseID
- Order date
- Shipping address state
- Purchase price per unit
- Quantity
- ASIN/ISBN (Product Code)
- Title
- Category
The data were exported by the Amazon users from Amazon.com and shared by users with their informed consent. PII and other information not listed above were stripped from the data. This processing occurred on users' machines before sharing with researchers.
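Since 'Survey ResponseID' appears in both amazon-purchases.csv and survey.csv, it can serve as a join key between the two files. Below is a minimal sketch using only the standard library; the function names are hypothetical, and the code assumes both files have a header row containing the column name exactly as written above.

```python
import csv
from collections import defaultdict

KEY = "Survey ResponseID"  # column shared by both files, per the description


def read_csv(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def group_purchases_by_respondent(purchase_rows):
    """Collect all purchase rows under the respondent who made them."""
    grouped = defaultdict(list)
    for row in purchase_rows:
        grouped[row[KEY]].append(row)
    return grouped


def join_survey_with_purchases(survey_rows, purchase_rows):
    """Pair each survey response with that respondent's purchases.

    Returns a list of (survey_row, purchases) tuples; respondents with
    no matching purchase rows get an empty list.
    """
    grouped = group_purchases_by_respondent(purchase_rows)
    return [(survey, grouped.get(survey[KEY], [])) for survey in survey_rows]
```

A typical call would be `join_survey_with_purchases(read_csv("survey.csv"), read_csv("amazon-purchases.csv"))`. Grouping the purchases once up front keeps the join linear in the number of rows, which matters because one respondent can have many orders.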