These datasets contain 1.48 million question and answer pairs about products from Amazon.
Metadata includes
question and answer text
is the question binary (yes/no), and if so does it have a yes/no answer?
timestamps
product ID (to reference the review dataset)
Basic Statistics:
Questions: 1.48 million
Answers: 4,019,744
Labeled yes/no questions: 309,419
Number of unique products with questions: 191,185
https://creativecommons.org/publicdomain/zero/1.0/
By Hugging Face Hub
The Amazon Reviews Polarity Dataset covers eighteen years of customers' ratings and reviews from Amazon.com. Drawing from a pool of over 35 million customer reviews, it presents a broad spectrum of customer opinions on products they have bought or used. The data is useful for improving products and services because it captures customers' experiences with a product, including ratings, titles, and plaintext content. The dataset contains both customer-specific data and product information, which supports deeper analytics and more tailored solutions for customers. Has your product been favored by the majority? Are there any aspects that need extra care? Amazon Reviews Polarity can help answer these questions about what your customers want.
1. Analyze customer ratings to identify trends: Look at how many customers gave the same product or service the same score (e.g., 4 stars). Examining the sentiment common to those reviews shows what customers like or don't like about it. These patterns can guide decisions about which features of your products or services to emphasize in order to boost sales and satisfaction rates.
2. Review content analysis: Analyzing review content is one of the best ways to gauge customer sentiment toward specific features or aspects of a product/service. Natural language processing tools such as Word2Vec, Latent Dirichlet Allocation (LDA), or even simple keyword search can quickly reveal the general topics discussed in relation to your product/service across multiple reviews, allowing you to quickly pinpoint areas that may need improvement for particular items within your lines of business.
3. Track associated scores over time: By tracking customer ratings over time, you may be able to better understand when there has been an issue with something specific to your product/service, such as a negative response to a feature that was introduced, proved unpopular with customers, and was removed shortly afterward. This can save time and money by identifying issues before they become widespread concerns among larger groups of consumers who spend money on your company's item(s).
4. Visualize sentiment data over time: Visualizations such as bar graphs can reveal trends across categories more quickly than raw numbers alone. Combining numeric values with color differences between scores makes anomalies easier to spot, which speeds up the work of figuring out why certain spikes occurred while other values stayed stable (or vice versa) when comparing similar data points in time-series visualizations.
- Developing a customer sentiment analysis system that can be used to quickly analyze the sentiment of reviews and identify any potential areas of improvement.
- Building a product recommendation service that takes into account the ratings and reviews of customers when recommending similar products they may be interested in purchasing.
- Training a machine learning model to accurately predict customers' ratings on new products they have not yet tried, and leveraging this for further product development and optimization initiatives.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:--------------|:-------------------------------------------------------------------|
| label | The sentiment of the review, either positive or negative. (String) |
| title | The title of the review. (String) |
...
This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
https://choosealicense.com/licenses/other/
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.
More than 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in the AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).
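The tab-delimited layout described above (no quote or escape characters) can be read with Python's standard `csv` module by turning quote handling off. The following is a minimal sketch, assuming each file starts with a header row; the column names in the sample (`review_id`, `star_rating`, `review_body`) are hypothetical placeholders, not taken from the release notes.

```python
import csv
import io


def read_reviews_tsv(handle):
    """Yield one dict per review from a tab-delimited file with a header row."""
    # QUOTE_NONE makes the reader treat quote characters as ordinary text,
    # matching the "no quote and escape characters" note in the description.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Tiny in-memory sample; the column names are hypothetical.
sample = io.StringIO(
    'review_id\tstar_rating\treview_body\n'
    'R1\t5\tSaid "great" and meant it\n'
    'R2\t2\tBroke after a week\n'
)
rows = list(read_reviews_tsv(sample))
```

Without `QUOTE_NONE`, a stray double quote inside a review body could make the default reader merge several lines into one field, so disabling quoting is the safer choice for files of this kind.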
Each Dataset contains the following columns:
• 200K+ Seller Leads
• Seller Type: Brand/PL Seller, 1P/Amazon Vendor Central and 3P Sellers
• Selling Platforms: Amazon USA, UK, EU, CA, AU
• C-Suite/Marketing/Sales Contacts
• FBA/FBM Sellers
• Filter your leads by revenue, categories, location, SKUs and more
• 100% manually researched and verified.
For over a decade, we have been manually collecting Amazon seller data from various data sources such as Amazon, LinkedIn, Google, and others. We specialize in getting valid data so you can run ads and begin selling without hesitation.
We designed our data packages for all types of organizations, thus they are reasonably priced. We are always trying to reduce our prices to better suit all of your requirements.
So, if you're looking to reach out to your targeted Amazon sellers, now is the greatest time to do so and offer your goods, services, and promotions. You can get your targeted Amazon Sellers List with seller contact information.
Alternatively, if you provide Amazon Seller Names or IDs, we will conduct Custom Research and deliver the customized list to you.
Data Points Available:
Full Name, LinkedIn URL, Direct Email, Generic Phone Number, Business Name and Address, Company Website, Seller IDs and URLs, Revenue, Seller Review Count, Niche, FBA/Non-FBA, Country, and More
DES is publishing the Amazon spend for state agencies collected through the Washington State Amazon Business account. The data set only includes closed orders. Any orders that are still in process or have been cancelled are not included. This data is for Fiscal Year 20 (July 1, 2019 to June 30, 2020)
https://creativecommons.org/publicdomain/zero/1.0/
During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such web services, recommender systems have taken up more and more space in our lives. From e-commerce (suggesting to buyers articles that could interest them) to online advertisement (suggesting to users the right content, matching their preferences), recommender systems are today unavoidable in our daily online journeys. In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy or anything else depending on the industry).
Recommender systems are critical in some industries, as they can generate a huge amount of income when they are efficient and can also be a way to stand out significantly from competitors. As proof of the importance of recommender systems, a few years ago Netflix organised a challenge (the "Netflix Prize") where the goal was to produce a recommender system that performed better than its own algorithm, with a prize of 1 million dollars to win.
These datasets contain attributes about products sold on ModCloth and Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data includes user/item interactions.
Apply different paradigms, methods and algorithms to recommend the right product to the right users at the right time.
In 2024, Amazon's net revenue from its subscription services segment amounted to 44.37 billion U.S. dollars. Subscription services include Amazon Prime, for which Amazon reported 200 million paying members worldwide at the end of 2020. The AWS category generated 107.56 billion U.S. dollars in annual sales. During the most recently reported fiscal year, the company's net revenue amounted to 638 billion U.S. dollars.
Amazon revenue segments
Amazon is one of the biggest online companies worldwide. In 2019, the company's revenue increased by 21 percent, compared to Google's revenue growth during the same fiscal period, which was just 18 percent. The majority of Amazon's net sales are generated through its North American business segment, which accounted for 236.3 billion U.S. dollars in 2020. The United States is the company's leading market, followed by Germany and the United Kingdom.
Business segment: Amazon Web Services
Amazon Web Services, commonly referred to as AWS, is one of the strongest-growing business segments of Amazon. AWS is a cloud computing service that provides individuals, companies and governments with a wide range of computing, networking, storage, database, analytics and application services, among many others. As of the third quarter of 2020, AWS accounted for approximately 32 percent of the global cloud infrastructure services vendor market.
Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.
Metadata includes
product IDs
bounding boxes
Basic Statistics:
Scenes: 47,739
Products: 38,111
Scene-Product Pairs: 93,274
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Motivation: Entity Matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, as well as of correspondence sets including both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. In an effort to enhance the reproducibility and comparability of the experiments, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.
Dataset Description: An augmented version of the amazon-google products dataset for benchmarking entity matching/record linkage methods found at: https://dbs.uni-leipzig.de/research/projects/object_matching/benchmark_datasets_for_entity_resolutio... The augmented version adds a fixed set of non-matching pairs to the original dataset. In addition, fixed splits for training, validation and testing as well as their corresponding feature vectors are provided. The feature vectors are built using data type specific similarity metrics. The dataset contains 1,363 records describing products deriving from amazon which are matched against 3,226 product records from google. The gold standards have manual annotations for 1,298 matching and 6,306 non-matching pairs. The total number of attributes used to describe the product records is 4, while the attribute density is 0.75. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository, which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
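To illustrate what "feature vectors built using data type specific similarity metrics" can look like, here is a minimal sketch. It is not the benchmark's actual feature pipeline: the attribute names (`title`, `manufacturer`, `price`), the choice of token Jaccard for strings and relative difference for numbers, and the `-1.0` marker for missing values are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity for two strings; 0.0 if both are empty."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def relative_diff_similarity(x, y):
    """1 minus the relative difference of two non-negative numbers."""
    if x == y:
        return 1.0
    return 1.0 - abs(x - y) / max(abs(x), abs(y))


# Hypothetical attribute -> similarity metric assignment, chosen by data type.
METRICS = (
    ("title", jaccard),
    ("manufacturer", jaccard),
    ("price", relative_diff_similarity),
)


def feature_vector(rec_a, rec_b):
    """One similarity score per attribute; -1.0 marks a missing value."""
    features = []
    for attr, sim in METRICS:
        va, vb = rec_a.get(attr), rec_b.get(attr)
        features.append(-1.0 if va is None or vb is None else sim(va, vb))
    return features


# Made-up records for illustration only.
amazon_rec = {"title": "Adobe Photoshop Elements 5", "manufacturer": "Adobe", "price": 99.99}
google_rec = {"title": "adobe photoshop elements 5 win", "manufacturer": None, "price": 89.99}
vec = feature_vector(amazon_rec, google_rec)
```

A classifier trained on such vectors, labeled with the fixed matching and non-matching pairs, is one common way to evaluate a matching method on splits like the ones described.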
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes
peer-to-peer trades
have and want lists
image data (tradesy)
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset comprises customer reviews for Amazon, an online retail giant, featuring insights into customer experiences, including ratings, review titles, texts, and metadata. It is valuable for analyzing customer satisfaction, sentiment, and trends.
Column Descriptions:
Reviewer Name: Identifies the reviewer.
Profile Link: Links to the reviewer's profile for additional insights.
Country: Indicates the reviewer's location.
Review Count: Number of reviews by the same user, showing engagement level.
Review Date: When the review was posted, useful for time analysis.
Rating: Numerical satisfaction measure.
Review Title: Summarizes the review sentiment.
Review Text: Detailed customer feedback.
Date of Experience: When the service/product was experienced.
Prospective applications:
Sentiment Analysis: Analyze review texts and titles to assess overall customer sentiment toward products, enabling the identification of strengths and weaknesses.
Customer Satisfaction Tracking: Track and visualize rating trends over time to understand fluctuations in customer satisfaction.
Product Improvement: Identify common themes in reviews to highlight areas for product enhancement or development.
Market Segmentation: Use country and demographic information to customize marketing strategies and gain insights into regional preferences.
Competitor Analysis: Evaluate customer feedback on Amazon products in comparison to competitors to determine market positioning.
Recommendation Systems: Leverage review data to enhance recommendation algorithms, improving personalized shopping experiences.
Trend Analysis: Investigate temporal patterns in reviews to link sentiment changes with marketing efforts or product launches.
This extensive dataset serves as a valuable asset for various analyses focused on enhancing customer engagement and refining business strategies.
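As a rough illustration of the customer satisfaction tracking idea, the sketch below averages ratings per calendar month. It uses the `Review Date` and `Rating` columns listed in the column descriptions; the ISO date format (`%Y-%m-%d`) is an assumption, since the description does not say how dates are stored, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime


def monthly_mean_rating(rows, date_format="%Y-%m-%d"):
    """Return {"YYYY-MM": mean rating} from rows with 'Review Date' and 'Rating'."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum of ratings, count]
    for row in rows:
        month = datetime.strptime(row["Review Date"], date_format).strftime("%Y-%m")
        totals[month][0] += float(row["Rating"])
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}


# Made-up rows; the date format is assumed, not documented.
sample_rows = [
    {"Review Date": "2024-01-03", "Rating": "5"},
    {"Review Date": "2024-01-20", "Rating": "1"},
    {"Review Date": "2024-02-11", "Rating": "4"},
]
trend = monthly_mean_rating(sample_rows)
```

The resulting dictionary is ordered by month, so it can be passed directly to a plotting library to draw the trend line.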
Global MODIS vegetation indices are designed to provide consistent spatial and temporal comparisons of vegetation conditions. Blue, red, and near-infrared reflectances, centered at 469-nanometers, 645-nanometers, and 858-nanometers, respectively, are used to determine the MODIS daily vegetation indices. The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA's Advanced Very High Resolution Radiometer (AVHRR) NDVI products and provides continuity for time series historical applications. MODIS also includes a new Enhanced Vegetation Index (EVI) that minimizes canopy background variations and maintains sensitivity over dense vegetation conditions. The EVI also uses the blue band to remove residual atmospheric contamination caused by smoke and sub-pixel thin clouds. The MODIS NDVI and EVI products are computed from atmospherically corrected bi-directional surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows. Global MOD13Q1 data are provided every 16 days at 250-meter spatial resolution as a gridded level-3 product in the Sinusoidal projection. Lacking a 250m blue band, the EVI algorithm uses the 500m blue band to correct for residual atmospheric effects, with negligible spatial artifacts. Vegetation indices are used for global monitoring of vegetation conditions and are used in products displaying land cover and land cover changes. These data may be used as input for modeling global biogeochemical and hydrologic processes and global and regional climate. These data also may be used for characterizing land surface biophysical properties and processes, including primary production and land cover conversion.
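For readers unfamiliar with the two indices, the sketch below computes them from red, near-infrared, and blue surface reflectances. The description does not give the formulas; the code uses the standard NDVI definition and the commonly cited MODIS EVI coefficients (gain 2.5, aerosol terms 6 and 7.5, canopy adjustment 1), which are stated here as assumptions. Reflectances are taken as unitless fractions between 0 and 1, and the sample values are made up.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectances."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Enhanced Vegetation Index; defaults are the commonly cited MODIS values."""
    # The blue band enters only through the aerosol-resistance term c2 * blue.
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_l)


# Made-up reflectances for a moderately vegetated pixel.
nir, red, blue = 0.5, 0.1, 0.05
pixel_ndvi = ndvi(nir, red)
pixel_evi = evi(nir, red, blue)
```

The blue-band term in the denominator is what lets EVI compensate for residual atmospheric effects, which is why the description notes the use of the 500m blue band at 250m resolution.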
From 2004 to 2024, the net revenue of Amazon e-commerce and service sales has increased tremendously. In the fiscal year ending December 31, the multinational e-commerce company's net revenue was almost *** billion U.S. dollars, up from *** billion U.S. dollars in 2023. Amazon.com, a U.S. e-commerce company originally founded in 1994, is the world's largest online retailer of books, clothing, electronics, music, and many more goods. As of 2024, the company generates the majority of its net revenues through online retail product sales, followed by third-party retail seller services, cloud computing services, and retail subscription services including Amazon Prime.
From seller to digital environment
Through Amazon, consumers are able to purchase goods at a rather discounted price from both small and large companies as well as from other users. Both new and used goods are sold on the website. Due to the wide variety of goods available at prices which often undercut local brick-and-mortar retail offerings, Amazon has dominated the retailer market. As of 2024, Amazon's brand worth amounts to over *** billion U.S. dollars, topping the likes of companies such as Walmart, Ikea, as well as digital competitors Alibaba and eBay. One of Amazon's first forays into the world of hardware was its e-reader Kindle, one of the most popular e-book readers worldwide. More recently, Amazon has also released several series of own-branded products and a voice-controlled virtual assistant, Alexa.
Headquartered in North America
Due to its location, Amazon offers more services in North America than worldwide. As a result, the majority of the company's net revenue in 2023 was actually earned in the United States, Canada, and Mexico. In 2023, approximately *** billion U.S. dollars was earned in North America compared to only roughly *** billion U.S. dollars internationally.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created from scraped reviews of products on Amazon for the purpose of text classification. There are three classes:
- Negative Reviews
- Neutral Reviews
- Positive Reviews
The data columns are:
- Sentiments
- Cleaned Review
- Cleaned Review Length
- Review Score
This dataset presents a multiclass classification problem that can be approached with ML algorithms as well as deep learning algorithms. Moreover, there is a class imbalance: negative reviews are the smallest class compared to positive and neutral reviews.
For ML algorithms, use the mapping negative --> -1, neutral --> 0, positive --> 1.
For deep learning algorithms, use the mapping negative --> 0, neutral --> 1, positive --> 2.
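The two label mappings can be applied with a small helper like the one below. This is a minimal sketch: the helper name `encode` and the case and whitespace normalization are illustrative assumptions, and the sample sentiment strings are made up rather than read from the dataset.

```python
# Mappings as given in the dataset description.
ML_LABELS = {"negative": -1, "neutral": 0, "positive": 1}
DL_LABELS = {"negative": 0, "neutral": 1, "positive": 2}


def encode(sentiments, mapping):
    """Map sentiment strings to integer labels, ignoring case and whitespace."""
    return [mapping[s.strip().lower()] for s in sentiments]


# Made-up values standing in for the "Sentiments" column.
sentiments = ["positive", "Neutral", "negative "]
ml_y = encode(sentiments, ML_LABELS)
dl_y = encode(sentiments, DL_LABELS)
```

The 0-based mapping is likely suggested for deep learning because most frameworks expect class indices starting at 0 for cross-entropy losses, while the signed mapping keeps the ordinal meaning of the classes visible.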
Looking forward to your model discoveries on this dataset.
DES is publishing the Amazon spend for state agencies collected through the Washington State Amazon Business account. The data set only includes closed orders. Any orders that are still in process or have been cancelled are not included. This data is for Fiscal Year 18 (July 1, 2017 to June 30, 2018).
These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, ranging from adding to a shelf to rating and reading.
Metadata includes
reviews
add-to-shelf, read, review actions
book attributes: title, isbn
graph of similar books
Basic Statistics:
Items: 1,561,465
Users: 808,749
Interactions: 225,394,930
Comprehensive dataset covering Amazon Prime availability across 27 countries, including launch dates, pricing, and regional benefit differences
This data set provides three related land cover products for four study areas across the Brazilian Amazon: Manaus, Amazonas; Tapajos National Forest, Para Western (Santarem); Rio Branco, Acre; and Rondonia, Rondonia. Products include (1) orthorectified JERS-1 and RadarSat images, (2) land cover classifications derived from the SAR data, and (3) biomass estimates in tons per hectare based on the land cover classification. There are 12 image files (.tif) with this data set. Orthorectified JERS-1 and RadarSat images are provided as GeoTIFF images, one file for each study area. For the Manaus and Tapajos sites: The images are orthorectified at 12.5-meter resolution and then re-sampled at 25-meter resolution. For the Rondonia and Rio Branco sites: The images are orthorectified at 25-meter resolution and then re-sampled at 90-meter resolution. Each GeoTIFF file contains 3 image channels: 2 L-band JERS-1 data in Fall and Spring seasons and 1 C-band RadarSat data. Land cover classifications are based on two JERS-1 images and one RadarSat image and provided as GeoTIFFs, one file for each study area. Four major land cover classes are distinguished: (1) Flat surface; (2) Regrowth area; (3) Short vegetation; and (4) Tall vegetation. The biomass estimates in tons per hectare are based on the land cover classification results and are reported in one GeoTIFF file for each study area. DATA QUALITY STATEMENT: The Data Center has determined that there are questions about the quality of the data reported in this data set. The data set has missing or incomplete data, metadata, or other documentation that diminishes the usability of the products. KNOWN PROBLEMS: The data providers note that due to limited resources, these data have been neither validated nor quality-assured for general use. For that reason, extreme caution is advised when considering the use of these data. Any use of the derived data is not recommended because the results have not been validated.
However, the DEM and vectors (related data set), and orthorectified SAR data can be used if the user understands how these were produced and accepts the limitations.