https://brightdata.com/license
Buy Amazon datasets and get access to over 300 million records from any Amazon domain. Get insights on Amazon products, sellers, and reviews.
https://choosealicense.com/licenses/cdla-permissive-2.0/
AmazonQAC: A Large-Scale, Naturalistic Query Autocomplete Dataset
Train Dataset Size: 395 million samples
Test Dataset Size: 20k samples
Source: Amazon Search Logs
File Format: Parquet
Compression: Snappy
If you use this dataset, please cite our EMNLP 2024 paper: @inproceedings{everaert-etal-2024-amazonqac, title = "{A}mazon{QAC}: A Large-Scale, Naturalistic Query Autocomplete Dataset", author = "Everaert, Dante and Patki, Rohit and Zheng, Tianqi and Potts… See the full description on the dataset page: https://huggingface.co/datasets/amazon/AmazonQAC.
Amazon Review 2023 is an updated version of the Amazon Review 2018 dataset. This dataset mainly includes reviews (ratings, text) and item metadata (descriptions, category information, price, brand, and images). Compared to the previous versions, the 2023 version features larger size, newer reviews (up to Sep 2023), richer and cleaner metadata, and finer-grained timestamps (from day to millisecond).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides detailed sales data from Amazon, offering a comprehensive look at various product categories and their performance over time. It includes information on sales figures, order details, product categories, and customer demographics.
Description: A unique identifier for each order placed on Amazon. This field helps to track individual orders and link related records.
Description: The date when the order was placed. This field is crucial for analyzing sales trends over time and identifying seasonal patterns.
Description: The current status of the order (e.g., Shipped, Delivered, Pending). This field provides insight into the order fulfillment process and helps monitor order processing efficiency.
Description: Indicates the method used to fulfill the order (e.g., Fulfilled by Amazon, Fulfilled by Seller). This feature helps in analyzing the performance of different fulfillment methods and their impact on customer satisfaction.
Description: The channel through which the sale was made (e.g., Amazon Website, Mobile App). This field is useful for evaluating the effectiveness of different sales channels and understanding customer preferences.
Description: The product category to which the purchased item belongs (e.g., Electronics, Clothing, Home Goods). This feature aids in analyzing sales performance across various product categories.
Description: The shipping service level selected for the order (e.g., Standard Shipping, Two-Day Shipping). This field helps to assess the impact of shipping options on delivery times and customer satisfaction.
Description: The size of the product ordered (e.g., Small, Medium, Large). This feature is relevant for analyzing sales performance based on product size and understanding inventory requirements.
Description: The status of the shipment with the carrier (e.g., In Transit, Delivered, Returned). This field provides insights into the shipping process and helps in monitoring delivery performance and handling returns.
Examine trends in sales over time, identify peak periods, and analyze performance by product category.
Explore customer demographics to understand purchasing behavior and preferences.
Assess which products are performing well and which are not, aiding in inventory and supply chain management.
Develop targeted marketing campaigns based on sales trends and customer profiles.
This dataset is a simulated collection of Amazon sales data and is intended for educational and analytical purposes.
This dataset was created to facilitate data analysis and machine learning projects. It is ideal for practicing data manipulation, statistical analysis, and predictive modeling.
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.
Over 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in the AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).
Each dataset contains the following columns:
marketplace - 2-letter country code of the marketplace where the review was written.
customer_id - Random identifier that can be used to aggregate reviews written by a single author.
review_id - The unique ID of the review.
product_id - The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.
product_parent - Random identifier that can be used to aggregate reviews for the same product.
product_title - Title of the product.
product_category - Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
star_rating - The 1-5 star rating of the review.
helpful_votes - Number of helpful votes.
total_votes - Number of total votes the review received.
vine - Review was written as part of the Vine program.
verified_purchase - The review is on a verified purchase.
review_headline - The title of the review.
review_body - The review text.
review_date - The date the review was written.
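As a rough illustration of the file format described above (tab delimited, no quote or escape characters), the following sketch parses such a file with Python's standard csv module. It assumes the file starts with a header row naming the columns in the order listed, which the description does not state; the sample row and the read_reviews helper are made up for the example.

```python
import csv
import io

# Column order as listed in the dataset description.
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id",
    "product_parent", "product_title", "product_category",
    "star_rating", "helpful_votes", "total_votes", "vine",
    "verified_purchase", "review_headline", "review_body", "review_date",
]


def read_reviews(handle):
    """Yield one dict per review from a tab-delimited file.

    Assumption: the first line is a header row with the column names.
    """
    # QUOTE_NONE: the files use no quote or escape characters, so a literal
    # double quote inside a review must be kept as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        row["star_rating"] = int(row["star_rating"])
        yield row


# Hypothetical sample: header plus one invented review containing quotes.
sample = "\t".join(COLUMNS) + "\n" + "\t".join([
    "US", "123", "R1", "B000", "456", 'A 6" ruler', "Office",
    "5", "2", "3", "N", "Y", "Great", 'Exactly 6" long', "2015-08-31",
]) + "\n"

reviews = list(read_reviews(io.StringIO(sample)))
print(reviews[0]["product_title"], reviews[0]["star_rating"])
```

Disabling quote handling matters here: with the csv module's default settings, a stray double quote in a review body could swallow the following tabs and merge several fields into one.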
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('amazon_us_reviews', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
https://crawlfeeds.com/privacy_policy
Random Amazon products data, last extracted on 20 Sept 2022.
Download similar products data for the months of Aug and Sept
1. https://crawlfeeds.com/datasets/amazon-products-dataset-sept-2022
2. https://crawlfeeds.com/datasets/amazon-products-dataset-aug-2022
The dataset contains reviews of Amazon products that were web scraped with the Python library BeautifulSoup.
The columns of the dataset:
How did I label my dataset, or rather, how did I label the reviews as inconsistent (1) or consistent (0)?
To begin, the VADER Sentiment tool was utilized to extract the compound sentiment value for each text review. Subsequently, the polarity of the review's text was assigned by labeling it as 'Positive' if the review's compound value exceeded 0.05, 'Negative' if the compound value was below -0.05, and 'Neutral' otherwise. Once the text polarity had been extracted for all reviews, the star polarity for each review was determined based on the number of stars assigned. Specifically, reviews that contained a star rating of 1 or 2 were labeled as 'Negative', reviews with a rating of 3 were labeled as 'Neutral', and those with 4 or 5 stars were labeled as 'Positive'.
In order to identify inconsistencies or mismatches within a review, a comparison was made between the review's text polarity and star polarity. Reviews that had matching polarities were labeled as 'Consistent' (represented by 0 in binary). Conversely, if there was a mismatch between the two polarities, the review was labeled as 'Inconsistent' (represented by 1 in binary). This binary value was then recorded in the 'inconsistentStatus' column.
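The labelling rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's actual code: the function names are hypothetical, and the compound score is taken as an input rather than computed, so the sketch does not depend on any sentiment library.

```python
def text_polarity(compound: float) -> str:
    """Map a VADER compound score to a polarity label."""
    if compound > 0.05:
        return "Positive"
    if compound < -0.05:
        return "Negative"
    return "Neutral"


def star_polarity(stars: int) -> str:
    """Map a 1-5 star rating to a polarity label."""
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


def inconsistent_status(compound: float, stars: int) -> int:
    """Return 1 if text and star polarity disagree, 0 if they match."""
    return int(text_polarity(compound) != star_polarity(stars))


# Hypothetical scores: an enthusiastic text paired with 5 stars and with 1 star.
print(inconsistent_status(0.8, 5))  # matching polarities
print(inconsistent_status(0.8, 1))  # mismatched polarities
```

Note that a compound value of exactly 0.05 or -0.05 falls into 'Neutral' under this reading, since the description says the score must exceed 0.05 or be below -0.05.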
FYI: You can drop the 'inconsistentStatus' column and use your own logic for labelling the rows as consistent or inconsistent.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Amazon Food Products Dataset is a large-scale collection of product listings, reviews, and metadata sourced from Amazon. This dataset is valuable for understanding consumer behaviour, analyzing product trends, and training machine learning models for recommendation systems and sentiment analysis. It includes various categories, providing insights into customer preferences, product ratings, and review sentiments.
Each record in the dataset contains the following key fields:
This dataset is ideal for a variety of applications:
CC0
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Amazon Bin Image Dataset contains 50,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. The bin images in this dataset are captured as robot units carry pods as part of normal Amazon Fulfillment Center operations. This dataset can be used for research in a variety of areas like computer vision, counting generic items, and learning from weakly-tagged data.
For each image, there is a corresponding metadata entry in JSON format stored in metadata.sqlite.
For example, for image 01290.jpg, there is a corresponding JSON object in the data field of the metadata file, which can be retrieved with the query SELECT data FROM metadata WHERE img_id = 01290;
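A minimal sketch of that lookup from Python, using the standard sqlite3 module. To stay runnable without the real file, it builds an in-memory stand-in for metadata.sqlite; the column types and the JSON payload are assumptions made for illustration, and load_metadata is a hypothetical helper. In particular, img_id is assumed to be stored as text with its leading zero. If the real table stores it as an integer, pass 1290 instead.

```python
import json
import sqlite3

# In-memory stand-in for metadata.sqlite (schema and payload are assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (img_id TEXT PRIMARY KEY, data TEXT)")
conn.execute(
    "INSERT INTO metadata VALUES (?, ?)",
    ("01290", json.dumps({"quantity": 3})),  # made-up JSON object
)


def load_metadata(connection, img_id):
    """Return the parsed JSON metadata for one image, or None if absent."""
    row = connection.execute(
        "SELECT data FROM metadata WHERE img_id = ?", (img_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


print(load_metadata(conn, "01290"))
```

With the real dataset, the connection would instead be opened with sqlite3.connect("metadata.sqlite"). Binding the id as a parameter avoids ambiguity about whether 01290 is read as a number or a string.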
Refer to the Starter Notebook to see how to work with the dataset.
Amazon uses a random storage scheme where items are placed into accessible bins with available space, so the contents of each bin are random, rather than organized by specific product types. Thus, each bin image may show only one type of product or a diverse range of products. Occasionally, items are misplaced while being handled, so the contents of some bin images may not match the recorded inventory of that bin.
These are some typical images in the dataset. A bin contains multiple object categories and a varying number of instances. Corresponding metadata exists for each bin image and includes the object category identification (ASIN - Amazon Standard Identification Number), quantity, and dimensions of the objects. Bin sizes vary depending on the size of the objects they hold. The tape in front of the bins prevents items from falling out and can sometimes obscure the objects. Objects are sometimes heavily occluded by other objects or by the limited viewpoint of the images.
Image Credits: Unsplash - helloimnik
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Amazon Products Sales Dataset 2023 is a large e-commerce dataset that summarizes various product information in a tabular format, including product name, price, rating, discount information, images, and links, across 142 major categories collected from Amazon's website.
2) Data Utilization
(1) Amazon Products Sales Dataset 2023 has the following characteristics:
• Each row contains 10 key attributes, including product name, main/subcategory, image, Amazon link, rating, number of ratings, discount price, and actual price.
• The data encompasses a wide range of products and is structured to enable multi-faceted analysis such as price policy, customer evaluation, and trend by category.
(2) Amazon Products Sales Dataset 2023 can be used for:
• Product Recommendation and Marketing Strategy: Use rating, price, and category data to develop a customized recommendation system, analyze popular products, and establish a category-specific marketing strategy.
• Price and Discount Policy Analysis: Based on discounted prices and actual prices, ratings, reviews, etc., it can be applied to effective pricing policies, promotion strategies, market competitiveness analyses, and more.
This is both the original .tfrecords and a Parquet representation of the YouTube 8 Million dataset. YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This dataset also includes the YouTube-8M Segments data from June 2019. This dataset is 'Lakehouse Ready', meaning you can query this data in place, straight out of the Registry of Open Data S3 bucket. Deploy this dataset's corresponding CloudFormation template to create the AWS Glue Catalog entries in your account in about 30 seconds. That one step will enable you to interact with the data with AWS Athena, AWS SageMaker, AWS EMR, or join it into your AWS Redshift clusters. More detail is in the documentation: https://github.com/aws-samples/data-lake-as-code/blob/roda-ml/README.md
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about companies. It has 19 rows and is filtered where the company is Amazon. It features 5 columns including employee type, CEO, CEO gender, and CEO approval.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about news. It has 6,974 rows and is filtered where the keywords include Amazon. It features 10 columns including source, publication date, section, and news link.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about companies. It has 19 rows and is filtered where the company is Amazon. It features 5 columns including employees, CEO, CEO gender, and CEO approval.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset: Amazon Customer Review Data for sentiment analysis
Size: 60,889 (approx.)
Format: .CSV
Period: 2013 to 2019
Categories: 5 (Mobiles, Smart TV, Books, Mobile Accessories, Refrigerator)
Unique_ID: Customized (Primary Key)
Review_Header: User's comment in a few words
Review_Text: User's comment in detail (3-4 lines)
Rating: 1 - Very Low, 2 - Low, 3 - Avg, 4 - Good, 5 - Excellent
Posting Period: 2013 to 2019
Own_Rating: 1-2 -> Negative, 3 -> Neutral, 4-5 -> Positive
Amazon Customer Reviews Dataset is a dataset of user-generated product reviews on the shopping website Amazon. It contains over 130 million product reviews.
This dataset contains a tiny fraction of that dataset processed and prepared specifically for language generation.
To see how the dataset was prepared, please check the GitHub repository for this dataset: https://github.com/imdeepmind/AmazonReview-LanguageGenerationDataset
The dataset is stored in an SQLite database. The database contains one table called reviews. This table contains two columns: sequence and next.
The sequence column contains sequences of characters. In this dataset, each sequence is 40 characters long.
The next column contains the next character after the sequence.
There are about 200 million samples in the dataset.
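To make the (sequence, next) layout concrete, here is a small sketch of how such pairs could be produced from raw review text with a sliding window. It is an assumption-laden illustration: the window step of one character and the make_samples helper are hypothetical, and the linked repository may prepare the data differently.

```python
def make_samples(text: str, seq_len: int = 40) -> list[tuple[str, str]]:
    """Split text into (sequence, next) pairs.

    Each sequence is seq_len characters long, and next is the single
    character that follows it. The window advances one character at a
    time (an assumed step size).
    """
    return [
        (text[i:i + seq_len], text[i + seq_len])
        for i in range(len(text) - seq_len)
    ]


# Made-up review text, long enough to yield a few samples.
review = "This phone case fits perfectly and feels sturdy in the hand."
samples = make_samples(review)
sequence, next_char = samples[0]
print(repr(sequence), "->", repr(next_char))
```

Text that is 40 characters or shorter yields no samples under this scheme, because there is no following character to predict.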
Thanks to Amazon for making this awesome dataset. Here is the link for the dataset: https://s3.amazonaws.com/amazon-reviews-pds/readme.html
This dataset can be used for Language Generation. As it contains 200 million samples, complex Deep Learning models can be trained on this data.
https://crawlfeeds.com/privacy_policy
Structured dataset of Apple mobile phone reviews. This small dataset is ideal for NLP and for testing machine learning algorithms.
Get large datasets from our resources.
Extracted from Amazon.
Data is included only for Apple mobile phones.
Reach out to us for large datasets.
This statistic shows the importance of big data access methods worldwide as of 2019. Amazon S3 was seen as the most important big data access method, with around ** percent of respondents stating that it was critical or very important to their organization.
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.
Metadata includes
product IDs
bounding boxes
Basic Statistics:
Scenes: 47,739
Products: 38,111
Scene-Product Pairs: 93,274
These datasets contain peer-to-peer trades from various recommendation platforms.
Metadata includes
peer-to-peer trades
have and want lists
image data (tradesy)