93 datasets found
  1. UK Optimal Product Price Prediction Dataset

    • kaggle.com
    zip
    Updated Nov 7, 2023
    Cite
    asaniczka (2023). UK Optimal Product Price Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/asaniczka/uk-optimal-product-price-prediction/data
    Explore at:
    Available download formats: zip (122747870 bytes)
    Dataset updated
    Nov 7, 2023
    Authors
    asaniczka
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    This dataset contains product prices from Amazon UK, with a focus on price prediction. With a good amount of data on what price points sell the most, you can train machine learning models to predict the optimal price for a product based on its features and product name.

    If you find this dataset useful, make sure to show your appreciation by upvoting! ❤️✨

    Inspirations

    This dataset is a superset of my Amazon UK product price dataset. Another inspiration is this competition, which awarded 100K in prize money.

    What To Do?

    • Your objective is to create a prediction model that will assist sellers in pricing their products within the optimal price range to generate the most sales.
    • The dataset includes various data points, such as the number of reviews, rating, best seller status, and items sold last month.
    • You can select specific factors (e.g., over 100 reviews = optimal price for the product) and then divide the dataset into products priced optimally vs. products priced suboptimally.
    • By utilizing techniques like vectorizing product names and features, you can train a model to provide the optimal price for a product, which sellers or businesses might find valuable.

    How to know if a product sells?

    • I would prefer to use the number of reviews as a metric to determine if a product sells. More reviews = more sales, right?
    • According to one source, only 1-2% of buyers leave a review.
    • So if we multiply the number of reviews for a product by 50, we get a rough estimate of how many units have been sold.
    • If we then multiply the product price by the number of units sold, we get the total revenue generated by the product (see the sketch below).
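
    A minimal sketch of this back-of-the-envelope heuristic in Python with pandas (the column names here are hypothetical; adjust them to the actual dataset schema):

    ```python
    import pandas as pd

    # Toy rows standing in for the real dataset; column names are assumptions.
    df = pd.DataFrame({
        "product": ["wireless mouse", "usb-c hub"],
        "price": [12.99, 24.50],
        "reviews": [1840, 312],
    })

    REVIEWS_TO_SALES = 50  # if ~2% of buyers review, units ~= reviews * 50

    df["units_sold_est"] = df["reviews"] * REVIEWS_TO_SALES
    df["revenue_est"] = df["units_sold_est"] * df["price"]
    print(df[["product", "units_sold_est", "revenue_est"]])
    ```

    The 50x multiplier is only as good as the 1-2% review-rate assumption, so treat the resulting revenue figures as rough ordinal signals rather than point estimates.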

    How is this useful?

    • Sellers and businesses can leverage your model to determine the optimal price for their products, thereby maximizing sales.
    • Businesses can assess the profitability of a product and plan their supply chain accordingly.
  2. Global net revenue of Amazon 2014-2024, by product group

    • statista.com
    • abripper.com
    Updated Feb 15, 2025
    Cite
    Statista (2025). Global net revenue of Amazon 2014-2024, by product group [Dataset]. https://www.statista.com/statistics/672747/amazons-consolidated-net-revenue-by-segment/
    Explore at:
    Dataset updated
    Feb 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, Amazon's net revenue from the subscription services segment amounted to 44.37 billion U.S. dollars. Subscription services include Amazon Prime, for which Amazon reported 200 million paying members worldwide at the end of 2020. The AWS category generated 107.56 billion U.S. dollars in annual sales. During the most recently reported fiscal year, the company's net revenue amounted to 638 billion U.S. dollars.

    Amazon revenue segments

    Amazon is one of the biggest online companies worldwide. In 2019, the company's revenue increased by 21 percent, compared to Google's revenue growth during the same fiscal period, which was just 18 percent. The majority of Amazon's net sales are generated through its North American business segment, which accounted for 236.3 billion U.S. dollars in 2020. The United States is the company's leading market, followed by Germany and the United Kingdom.

    Business segment: Amazon Web Services

    Amazon Web Services, commonly referred to as AWS, is one of the strongest-growing business segments of Amazon. AWS is a cloud computing service that provides individuals, companies and governments with a wide range of computing, networking, storage, database, analytics and application services, among many others. As of the third quarter of 2020, AWS accounted for approximately 32 percent of the global cloud infrastructure services vendor market.

  3. Amazon Question and Answer Data

    • cseweb.ucsd.edu
    json
    Cite
    UCSD CSE Research Project, Amazon Question and Answer Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain 1.48 million question and answer pairs about products from Amazon.

    Metadata includes

    • question and answer text

    • is the question binary (yes/no), and if so does it have a yes/no answer?

    • timestamps

    • product ID (to reference the review dataset)

    Basic Statistics:

    • Questions: 1.48 million

    • Answers: 4,019,744

    • Labeled yes/no questions: 309,419

    • Number of unique products with questions: 191,185
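
    These files are typically distributed on the UCSD page as gzipped text with one Python-dict literal per line (not strict JSON). A minimal reader in that spirit, with a hypothetical category file name and ast.literal_eval in place of eval for safety:

    ```python
    import ast
    import gzip

    def parse(path):
        """Yield one Q&A record per line; each line is a Python dict literal."""
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield ast.literal_eval(line)

    # Hypothetical file name; pick an actual category file from the dataset page.
    for qa in parse("qa_Electronics.json.gz"):
        print(qa["question"], "->", qa.get("answer"))
        break
    ```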

  4. USA Optimal Product Price Prediction Dataset

    • kaggle.com
    zip
    Updated Nov 7, 2023
    Cite
    asaniczka (2023). USA Optimal Product Price Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/asaniczka/usa-optimal-product-price-prediction
    Explore at:
    Available download formats: zip (106250140 bytes)
    Dataset updated
    Nov 7, 2023
    Authors
    asaniczka
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset contains product prices from Amazon USA, with a focus on price prediction. With a good amount of data on what price points sell the most, you can train machine learning models to predict the optimal price for a product based on its features and product name.

    If you find this dataset useful, make sure to show your appreciation by upvoting! ❤️✨

    Inspirations

    This dataset is a superset of my Amazon USA product price dataset. Another inspiration is this competition, which awarded 100K in prize money.

    What To Do?

    • Your objective is to create a prediction model that will assist sellers in pricing their products within the optimal price range to generate the most sales.
    • The dataset includes various data points, such as the number of reviews, rating, best seller status, and items sold last month.
    • You can select specific factors (e.g., over 100 reviews = optimal price for the product) and then divide the dataset into products priced optimally vs. products priced suboptimally.
    • By utilizing techniques like vectorizing product names and features, you can train a model to provide the optimal price for a product, which sellers or businesses might find valuable.

    How to know if a product sells?

    • I would prefer to use the number of reviews as a metric to determine if a product sells. More reviews = more sales, right?
    • According to one source, only 1-2% of buyers leave a review.
    • So if we multiply the number of reviews for a product by 50, we get a rough estimate of how many units have been sold.
    • If we then multiply the product price by the number of units sold, we get the total revenue generated by the product.

    How is this useful?

    • Sellers and businesses can leverage your model to determine the optimal price for their products, thereby maximizing sales.
    • Businesses can assess the profitability of a product and plan their supply chain accordingly.
  5. Amazon review data 2018

    • cseweb.ucsd.edu
    • nijianmo.github.io
    Cite
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
    Explore at:
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:

      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:

      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata:

      • We have added transaction metadata for each review shown on the review page.
      • Added more detailed metadata of the product landing page.
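
    The review files for this corpus are commonly distributed as gzipped JSON lines (one review object per line). A minimal loader sketch, assuming that format and a hypothetical per-category file name:

    ```python
    import gzip
    import json

    def load_reviews(path):
        """Yield one review dict per line from a gzipped JSON-lines file."""
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)

    # Hypothetical file name; see the dataset page for the real category files.
    for review in load_reviews("Electronics_5.json.gz"):
        print(review["overall"], review.get("reviewText", "")[:80])
        break
    ```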

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
  6. Amazon dataset for ERS-REFMMF

    • figshare.com
    txt
    Updated Feb 1, 2024
    Cite
    Teng Chang (2024). Amazon dataset for ERS-REFMMF [Dataset]. http://doi.org/10.6084/m9.figshare.25126313.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 1, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teng Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recommender systems based on matrix factorization act as black-box models and are unable to explain the recommended items. After adding the neighborhood algorithm, the explainability is measured by the user's neighborhood recommendation, but the subjective explicit preference of the target user is ignored. To better combine the latent factors from matrix factorization and the target user's explicit preferences, an explainable recommender system based on reconstructed explanatory factors and multi-modal matrix factorization (ERS-REFMMF) is proposed. ERS-REFMMF is a two-layer model, and the underlying model decomposes the multi-modal scoring matrix to get the rich latent features of the user and the item based on the method of Funk-SVD, in which the multi-modal scoring matrix consists of the original matrix and the preference features and sentiment scores exhibited by users in the reviews corresponding to the ratings. The set of candidate items is obtained based on the latent features, and the explainability is reconstructed based on the subjective preference of the target user and the real recognition level of the neighbors. The upper layer is the multi-objective high-performance recommendation stage, in which the candidate set is optimized by a multi-objective evolutionary algorithm to bring the user a final recommendation list that is accurate, recallable, diverse, and interpretable, in which the accuracy and recall are represented by F1-measure. Experimental results on three real datasets from Amazon show that the proposed model is competitive compared to existing recommendation methods in both stages.

  7. Satellite US Supply Chain Dataset Package (Amazon, Fedex, Walmart) +...

    • datarade.ai
    .csv
    Updated Jan 18, 2023
    Cite
    Space Know (2023). Satellite US Supply Chain Dataset Package (Amazon, Fedex, Walmart) + Research Report Available [Dataset]. https://datarade.ai/data-products/satellite-us-supply-chain-dataset-package-amazon-fedex-wal-space-know
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jan 18, 2023
    Dataset authored and provided by
    Space Know
    Area covered
    United States
    Description

    SpaceKnow USA Supply Chain Premium Dataset gives you data (by location and company) on US supply chain choke points in near-real-time, as seen from satellite images. The uniqueness of this dataset lies in its granularity.

    About dataset: We apply proprietary algorithms to SAR satellite imagery of key industrial, transportation, storage, and logistics locations to create daily indices of industry activity. Data was collected from more than 5,000 locations across the USA. Thanks to the use of SAR satellite technology, the quality of the SpaceKnow dataset is not influenced by weather fluctuations.

    In total, the SpaceKnow USA Supply Chain dataset offers 50+ specific indices with real-time insights. The premium dataset includes company-focused indices. This type of data can be used by investors to gain insight into important KPIs such as revenue.

    This dataset offers:

    • Daily frequency
    • History from Jan 2017 to present

    Within one package we provide you with real-time insights into:

    • Port Container country-level indices (a container port or container terminal is a facility where cargo containers are transshipped between different transport vehicles for onward transportation), plus Port Container indices for the major US ports: Port of Los Angeles, Port of Long Beach, Port of New York & New Jersey, Port of Savannah, Port of Houston, Port of Virginia, Port of Oakland in California, Port of South Carolina, and Port of Miami

    • Trucking Stop indices for the most important locations in the supply chain, such as Iowa, Nevada, South Carolina, Oregon, and North Carolina

    • Inland Containers index on a country level

    • Logistics Center index on a country level (logistics centers are distribution hubs for finished goods that need to be transported to another location; we include logistics centers from companies like Amazon, Walmart, Fedex, and others)

    • Logistics Center indices for states like California, New York, Illinois, Indiana, South Carolina, and many more…

    • Logistics Center indices for companies: Amazon, Walmart, Fedex

    Research Reports

    Don't have the capacity to analyze the data? Let SpaceKnow's in-house economists do the heavy lifting so that you can focus on what's important. SpaceKnow writes research reports based on what the data from the US Supply Chain dataset package is showing. The document includes a detailed explanation of what is happening, with supporting charts and tables. The reports are published on a monthly basis.

    Delivery Mechanisms

    All of the delivery mechanisms detailed below are available as part of this package. Data is distributed only in the flat-table CSV format. Ways to access the data:

    • Dashboard, an option that also offers data visualization within the webpage
    • Automatic email delivery
    • API access to our dataset
    • Research reports, provided via email in PDF format

    Client Support

    Each client is assigned an account representative who will reach out periodically to make sure that the data packages are meeting your needs. Here are some other ways to contact SpaceKnow in case you have a specific question.

    For delivery questions and issues: Please reach out to support@spaceknow.com

    For data questions: Please reach out to product@spaceknow.com

    For pricing/sales support: Please reach out to info@spaceknow.com or sales@spaceknow.com

  8. Canada Optimal Product Price Prediction Dataset

    • kaggle.com
    zip
    Updated Nov 7, 2023
    Cite
    asaniczka (2023). Canada Optimal Product Price Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/asaniczka/canada-optimal-product-price-prediction
    Explore at:
    Available download formats: zip (152593255 bytes)
    Dataset updated
    Nov 7, 2023
    Authors
    asaniczka
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    This dataset contains product prices from Amazon Canada, with a focus on price prediction. With a good amount of data on what price points sell the most, you can train machine learning models to predict the optimal price for a product based on its features and product name.

    If you find this dataset useful, make sure to show your appreciation by upvoting! ❤️✨

    Inspirations

    This dataset is a superset of my Amazon Canada product price dataset. Another inspiration is this competition, which awarded 100K in prize money.

    What To Do?

    • Your objective is to create a prediction model that will assist sellers in pricing their products within the optimal price range to generate the most sales.
    • The dataset includes various data points, such as the number of reviews, rating, best seller status, and items sold last month.
    • You can select specific factors (e.g., over 100 reviews = optimal price for the product) and then divide the dataset into products priced optimally vs. products priced suboptimally.
    • By utilizing techniques like vectorizing product names and features, you can train a model to provide the optimal price for a product, which sellers or businesses might find valuable.

    How to know if a product sells?

    • I would prefer to use the number of reviews as a metric to determine if a product sells. More reviews = more sales, right?
    • According to one source, only 1-2% of buyers leave a review.
    • So if we multiply the number of reviews for a product by 50, we get a rough estimate of how many units have been sold.
    • If we then multiply the product price by the number of units sold, we get the total revenue generated by the product.

    How is this useful?

    • Sellers and businesses can leverage your model to determine the optimal price for their products, thereby maximizing sales.
    • Businesses can assess the profitability of a product and plan their supply chain accordingly.
  9. amazon_us_reviews

    • huggingface.co
    • tensorflow.org
    Updated Jun 30, 2023
    Cite
    Polina Kazakova (2023). amazon_us_reviews [Dataset]. https://huggingface.co/datasets/polinaeterna/amazon_us_reviews
    Explore at:
    Dataset updated
    Jun 30, 2023
    Authors
    Polina Kazakova
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

    Over 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

    Each Dataset contains the following columns:

    • marketplace: 2 letter country code of the marketplace where the review was written.
    • customer_id: Random identifier that can be used to aggregate reviews written by a single author.
    • review_id: The unique ID of the review.
    • product_id: The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.
    • product_parent: Random identifier that can be used to aggregate reviews for the same product.
    • product_title: Title of the product.
    • product_category: Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
    • star_rating: The 1-5 star rating of the review.
    • helpful_votes: Number of helpful votes.
    • total_votes: Number of total votes the review received.
    • vine: Review was written as part of the Vine program.
    • verified_purchase: The review is on a verified purchase.
    • review_headline: The title of the review.
    • review_body: The review text.
    • review_date: The date the review was written.
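
    Given the tab-delimited layout described above (no quote or escape characters), a minimal pandas read might look like this; the local file name is illustrative, standing in for one of the per-category TSVs from the amazon-reviews-pds S3 bucket:

    ```python
    import csv
    import pandas as pd

    # Illustrative file name for one per-category TSV from the S3 bucket.
    df = pd.read_csv(
        "amazon_reviews_us_Books_v1_02.tsv.gz",
        sep="\t",
        quoting=csv.QUOTE_NONE,   # the files use no quote and escape characters
        on_bad_lines="skip",      # tolerate occasional malformed rows
    )
    print(df[["star_rating", "helpful_votes", "total_votes"]].describe())
    ```
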
  10. Reddit Sentiment VS Stock Price

    • zenodo.org
    bin, csv, json, png +2
    Updated May 8, 2025
    Cite
    Will Baysingar (2025). Reddit Sentiment VS Stock Price [Dataset]. http://doi.org/10.5281/zenodo.15367306
    Explore at:
    Available download formats: csv, bin, png, text/x-python, txt, json
    Dataset updated
    May 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Will Baysingar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overall, this project was meant to test the relationship between social media posts and their short-term effect on stock prices. We decided to use Reddit posts from finance-focused subreddit communities like r/wallstreetbets, r/investing, and r/stocks to see the changes in the market associated with a variety of posts made by users. This idea came to light because of the GameStop short squeeze, which showed the power of social media in the market. In theory, stock prices should purely reflect the present value of the company's expected future value, but the question we are asking is whether social media can impact that intrinsic value. Our research question was known from the start: do Reddit posts for or against a certain stock provide insight into how the market will move in a short window? To study this, we selected five large tech companies: Apple, Tesla, Amazon, Microsoft, and Google. These companies would likely give us more data in the subreddits and have less day-to-day volatility, making the experiment easier to run. They trade at very high values, so a change driven by a Reddit post would have to be significant, giving us stronger evidence of an effect.

    Next, we had to choose our data sources. First, we tried to collect the Reddit data using a Reddit API, but because Reddit now requires approval to use its data, we switched to a Kaggle dataset that contained metadata from Reddit. For our second data set we had planned to use Yahoo Finance through yfinance, but due to the large amount of data we were pulling from this public API, our IP address was temporarily blocked. This caused us to switch our second data source to Alpha Vantage. While this was a large switch in the public API used, it was a minor roadblock, and fixing the finance-pulling section allowed everything else to continue working in succession. Once we had both of our datasets programmatically pulled into our local environment, we implemented a pipeline to clean, merge, and analyze all the data, and at the end we added a Snakemake workflow to ensure the project was easily reproducible. We then utilized TextBlob to label our Reddit posts with a sentiment value of positive, negative, or neutral. We matched the time frame of each post with the stock data, computed any price changes, found a correlation coefficient, and graphed our findings.
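
    A hedged sketch of the TextBlob labeling step described above (the polarity thresholds are assumptions, not the authors' exact values):

    ```python
    from textblob import TextBlob

    def label_sentiment(text: str, eps: float = 0.05) -> str:
        """Map TextBlob polarity in [-1, 1] to positive/negative/neutral."""
        polarity = TextBlob(text).sentiment.polarity
        if polarity > eps:
            return "positive"
        if polarity < -eps:
            return "negative"
        return "neutral"

    print(label_sentiment("Loading up on more TSLA, earnings will be huge"))
    ```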

    Concluding the data analysis, we found relatively small or no correlation across the companies in aggregate, but Microsoft and Google do show stronger correlations when analyzed on their own. However, this may be due to other circumstances, such as why a post was made or whether the market already had other trends on those dates. A larger analysis with more data from other social media platforms would be needed to confirm our hypothesis that there is a strong correlation.

  11. Data from: LBA-ECO LC-03 SAR Images, Land Cover, and Biomass, Four Areas...

    • data.nasa.gov
    • search.dataone.org
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). LBA-ECO LC-03 SAR Images, Land Cover, and Biomass, Four Areas across Brazilian Amazon [Dataset]. https://data.nasa.gov/dataset/lba-eco-lc-03-sar-images-land-cover-and-biomass-four-areas-across-brazilian-amazon-9b36f
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Amazon Rainforest
    Description

    This data set provides three related land cover products for four study areas across the Brazilian Amazon: Manaus, Amazonas; Tapajos National Forest, Para Western (Santarem); Rio Branco, Acre; and Rondonia, Rondonia. Products include (1) orthorectified JERS-1 and RadarSat images, (2) land cover classifications derived from the SAR data, and (3) biomass estimates in tons per hectare based on the land cover classification. There are 12 image files (.tif) with this data set.

    Orthorectified JERS-1 and RadarSat images are provided as GeoTIFF images, one file for each study area. For the Manaus and Tapajos sites, the images are orthorectified at 12.5-meter resolution and then re-sampled at 25-meter resolution. For the Rondonia and Rio Branco sites, the images from 1978 are orthorectified at 25-meter resolution and then re-sampled at 90-meter resolution. Each GeoTIFF file contains 3 image channels: 2 L-band JERS-1 data in Fall and Spring seasons, and 1 C-band RadarSat data.

    Land cover classifications are based on two JERS-1 images and one RadarSat image and are provided as GeoTIFFs, one file for each study area. Four major land cover classes are distinguished: (1) flat surface; (2) regrowth area; (3) short vegetation; and (4) tall vegetation. The biomass estimates in tons per hectare are based on the land cover classification results and are reported in one GeoTIFF file for each study area.

    DATA QUALITY STATEMENT: The Data Center has determined that there are questions about the quality of the data reported in this data set. The data set has missing or incomplete data, metadata, or other documentation that diminishes the usability of the products.

    KNOWN PROBLEMS: The data providers note that due to limited resources, these data have been neither validated nor quality-assured for general use. For that reason, extreme caution is advised when considering the use of these data. Any use of the derived data is not recommended because the results have not been validated. However, the DEM and vectors (related data set), and orthorectified SAR data can be used if the user understands how these were produced and accepts the limitations.

  12. Amazon Email Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM,...

    • datarade.ai
    .json, .xml, .csv
    Updated Oct 12, 2023
    Cite
    Measurable AI (2023). Amazon Email Receipt Data | Consumer Transaction Data | Asia, EMEA, LATAM, MENA, India | Granular & Aggregate Data available [Dataset]. https://datarade.ai/data-products/amazon-email-receipt-data-consumer-transaction-data-asia-measurable-ai
    Explore at:
    Available download formats: .json, .xml, .csv
    Dataset updated
    Oct 12, 2023
    Dataset authored and provided by
    Measurable AI
    Area covered
    Asia, Latin America, Malaysia, Japan, Mexico, Colombia, United States of America, Thailand, Brazil, Chile, Argentina, Pakistan
    Description

    The Measurable AI Amazon Consumer Transaction Dataset is a leading source of email receipts and consumer transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.

    We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.

    Use Cases

    Our clients leverage our datasets to produce actionable consumer insights such as:

    • Market share analysis
    • User behavioral traits (e.g. retention rates)
    • Average order values
    • Promotional strategies used by the key players

    Several of our clients also use our datasets for forecasting and understanding industry trends better.

    Coverage

    • Asia (Japan)
    • EMEA (Spain, United Arab Emirates)

    Granular Data

    Itemized, high-definition data per transaction level with metrics such as:

    • Order value
    • Items ordered
    • No. of orders per user
    • Delivery fee
    • Service fee
    • Promotions used
    • Geolocation data and more

    Aggregate Data

    • Weekly/monthly order volume
    • Revenue

    These are delivered in aggregate form, with historical data dating back to 2018. All the transactional e-receipts are sent from the app to users' registered accounts.

    Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.

    Our dataset is GDPR compliant, contains no PII information and is aggregated & anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.

  13. Amazon Reviews

    • kaggle.com
    zip
    Updated Jul 10, 2024
    Cite
    Bengisu yılmaz (2024). Amazon Reviews [Dataset]. https://www.kaggle.com/datasets/bengisuylmaz/amazon-reviews
    Explore at:
    Available download formats: zip (721803 bytes)
    Dataset updated
    Jul 10, 2024
    Authors
    Bengisu yılmaz
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset belongs to a single product on Amazon and contains detailed information about its reviews. Each entry in the dataset represents a review written by a customer.

    • reviewerid: A unique identifier for the reviewer. Each reviewer has a distinct ID that helps differentiate their reviews from others.
    • asin: Amazon Standard Identification Number, a unique identifier assigned to each product on Amazon.
    • reviewername: The name or username of the reviewer. This is the display name of the person who wrote the review.
    • helpful: A measure indicating how many people found the review helpful, often shown as a ratio, e.g., [2,3] where 2 people found it helpful out of 3 total votes.
    • reviewtext: The actual text of the review. This is the content of what the reviewer wrote about the product.
    • overall: The rating given by the reviewer, usually on a scale of 1 to 5 stars.
    • summary: A short summary or title of the review, often a brief highlight of the reviewer's opinion.
    • unixreviewtime: The time the review was written, represented as a Unix timestamp (the number of seconds elapsed since January 1, 1970, midnight UTC/GMT).
    • reviewtime: The human-readable date when the review was written, typically in the format "MM DD, YYYY".
    • day_diff: The difference in days between the review date and some reference date (often the current date or the date the dataset was compiled). This helps to understand how recent the review is.
    • helpful_yes: The number of people who found the review helpful. This is the first number in the "helpful" ratio.
    • total_vote: The total number of votes the review received. This is the second number in the "helpful" ratio (a small example using these two fields follows below).
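
    As a quick illustration of the vote fields above, a hedged pandas sketch that ranks reviews by helpfulness (the CSV file name is hypothetical; column names come from the description):

    ```python
    import pandas as pd

    df = pd.read_csv("amazon_reviews.csv")  # hypothetical file name

    # helpful_yes / total_vote; 0/0 yields NaN, which we treat as "no signal"
    df["helpful_ratio"] = (df["helpful_yes"] / df["total_vote"]).fillna(0)

    # Break ties by total_vote so well-voted reviews rank above one-vote wonders
    top = df.sort_values(["helpful_ratio", "total_vote"], ascending=False)
    print(top[["summary", "overall", "helpful_ratio", "total_vote"]].head())
    ```

    A plain ratio over-rewards reviews with very few votes; for serious ranking one would typically use a smoothed score such as a Wilson lower bound.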

  14. Data for Implementing Deep Soil and Dynamic Root Uptake in Noah-MP (v4.5):...

    • databank.illinois.edu
    • investigacion.usc.gal
    Updated Mar 19, 2025
    Cite
    Carolina A. Bieri; Francina Dominguez; Gonzalo Miguez-Macho; Ying Fan (2025). Data for Implementing Deep Soil and Dynamic Root Uptake in Noah-MP (v4.5): Impact on Amazon Dry-Season Transpiration [Dataset]. http://doi.org/10.13012/B2IDB-8777292_V1
    Explore at:
    Dataset updated
    Mar 19, 2025
    Authors
    Carolina A. Bieri; Francina Dominguez; Gonzalo Miguez-Macho; Ying Fan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Science Foundation (NSF)
    Description

    This repository includes HRLDAS Noah-MP model output generated as part of Bieri et al. (2025) - Implementing deep soil and dynamic root uptake in Noah-MP (v4.5): Impact on Amazon dry-season transpiration. These data are distributed in two different formats: raw model output files and subsetted files that include data for a specific variable. All files are .nc format (NetCDF) and aggregated into .tar files to facilitate download. Given the size of these datasets, Globus transfer is the best way to download them.

    Raw model output for four model experiments is available: FD (control), GW, SOIL, and ROOT. See the associated publication for information on the different experiments. These data span an approximately 20 year period from 01 Jun 2000 to 31 Dec 2019. The data have a spatial resolution of 4 km and a temporal frequency of 3 hours, and cover a domain in the southern Amazon basin (see Figure 1 in the associated publication). Data for each experiment is available as a .tar file which includes 3-hourly NetCDF files. All default Noah-MP output variables are included in each file. As a result, the .tar files are quite large and may take many hours or even days to transfer depending on your network speed and local configurations. These files are named 'noahmp_output_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT).

    Subsetted model output at a daily temporal resolution for all four model experiments is also available. These .tar files include the following variables: water table depth (ZWT), latent heat flux (LH), sensible heat flux (HFX), soil moisture (SOIL_M), canopy evaporation (ECAN), ground evaporation (EDIR), transpiration (ETRAN), rainfall rate at the surface (QRAIN), and two variables that are specific to the ROOT experiment: ROOTACTIVITY (root activity function) and GWRD (active root water uptake depth). There is one file for each variable within the tarred files. These files are named 'noahmp_output_subset_2000_2019_EXP.tar', where EXP is the name of the experiment (FD, GW, SOIL, or ROOT).

    Finally, there is a sample dataset with raw 3-hourly output from the ROOT experiment for one day. The purpose of this sample dataset is to allow users to confirm that these data meet their needs before initiating a full transfer via Globus. This file is named 'noahmp_output_sample_ROOT.tar'.

    The README.txt file provides information on the Noah-MP output variables in these datasets, among other specifications. Information on HRLDAS Noah-MP and names/definitions of model output variables that are useful in working with these data are available here: http://dx.doi.org/10.5065/ew8g-yr95. Note that some output variables may be listed in this document under a different variable name, so searching for the long name (e.g. 'baseflow' instead of 'QRF') is recommended. Information on additional output variables that were added to the model as part of this study is available here: https://github.com/bieri2/bieri-et-al-2025-EGU-GMD/tree/DynaRoot. Model code, configuration files, and forcing data used to carry out the model simulations are linked in the related resources section.
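
    A hedged sketch of a first look at the subsetted output after download, assuming Python with xarray plus a NetCDF backend installed (the tar name follows the pattern above; the per-variable member names are printed rather than guessed):

    ```python
    import tarfile
    import xarray as xr

    TAR = "noahmp_output_subset_2000_2019_ROOT.tar"  # name pattern from the description

    with tarfile.open(TAR) as tar:
        members = tar.getnames()
        tar.extractall("noahmp_root")

    print(members)  # one NetCDF file per variable (e.g., ETRAN, ZWT, SOIL_M)

    # Open the first extracted file and inspect its dimensions and attributes.
    ds = xr.open_dataset(f"noahmp_root/{members[0]}")
    print(ds)
    ```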

  15. Data from: Amazon Rainforest Wildfires Rumor Detection

    • data.mendeley.com
    Updated Dec 6, 2022
    Cite
    Bram Janssens (2022). Amazon Rainforest Wildfires Rumor Detection [Dataset]. http://doi.org/10.17632/m7k4gsffry.1
    Explore at:
    Dataset updated
    Dec 6, 2022
    Authors
    Bram Janssens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Amazon Rainforest
    Description

    The data set contains information about the Amazon rainforest wildfires that took place in 2019. Twitter data has been collected between August 21, 2019 and September 27, 2019 based on the following hashtags: #PrayforAmazonas, #AmazonRainforest, and #AmazonFire.

    The goal of this data set is to detect whether a tweet is identified as a rumor or not (given by the 'label' column). A tweet that is identified as a rumor is labeled as 1, and 0 otherwise. The tweets were labeled by two independent annotators using the following guidelines. Whether a tweet is a rumor or not depends on 3 important aspects: (1) A rumor is a piece of information that is unverified or not confirmed by official instances. In other words, it does not matter whether the information turns out to be true or false in the future. (2) More specifically, a tweet is a rumor if the information is unverified at the time of posting. (3) For a tweet to be a rumor, it should contain an assertion, meaning the author of the tweet commits to the truth of the message.

    In sum, the annotators indicated that a tweet is a rumor if it consisted of an assertion giving information that is unverifiable at the time of posting. Practically, to check whether the information in a tweet was verified or confirmed by official instances at the moment of tweeting, the annotators used BBC News and Reuters. After all the tweets were labeled, the annotators re-iterated over the tweets they disagreed on to produce the final tweet label.

    Besides the label indicating whether a tweet is a rumor or not (i.e., ‘label’), the data set contains the tweet itself (i.e., ‘full_text’), and additional metadata (e.g., ‘created_at’, ‘favorite_count’). In total, the data set contains 1,392 observations of which 184 (13%) are identified as rumors.

    This data set can be used by researchers to make rumor detection models (i.e., statistical, machine learning and deep learning models) using both unstructured (i.e., textual) and structured data.
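
    As a starting point for such models, here is a hedged baseline sketch using the described 'full_text' and 'label' columns (the CSV file name is hypothetical; with only 13% positives, class weighting matters):

    ```python
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("amazon_wildfires_rumors.csv")  # hypothetical file name

    X_train, X_test, y_train, y_test = train_test_split(
        df["full_text"], df["label"], test_size=0.2, stratify=df["label"], random_state=0
    )

    vec = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(vec.fit_transform(X_train), y_train)
    print(classification_report(y_test, clf.predict(vec.transform(X_test))))
    ```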

  16. RBC-SatImg: Sentinel-2 Imagery and WatData Labels for Water Mapping

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Aug 19, 2024
    Cite
    Helena Calatrava; Bhavya Duvvuri; Haoqing Li; Ricardo Borsoi; Tales Imbiriba; Edward Beighley; Deniz Erdogmus; Pau Closas (2024). RBC-SatImg: Sentinel-2 Imagery and WatData Labels for Water Mapping [Dataset]. http://doi.org/10.5281/zenodo.13345343
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Helena Calatrava; Bhavya Duvvuri; Haoqing Li; Ricardo Borsoi; Tales Imbiriba; Edward Beighley; Deniz Erdogmus; Pau Closas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Description

    This dataset is linked to the publication "Recursive classification of satellite imaging time-series: An application to land cover mapping". In this paper, we introduce the recursive Bayesian classifier (RBC), which converts any instantaneous classifier into a robust online method through a probabilistic framework that is resilient to non-informative image variations. To reproduce the results presented in the paper, the RBC-SatImg folder and the code in the GitHub repository RBC-SatImg are required.

    The RBC-SatImg folder contains:

    • Sentinel-2 time-series imagery from three key regions: Oroville Dam (CA, USA) and Charles River (Boston, MA, USA) for water mapping, and the Amazon Rainforest (Brazil) for deforestation detection.
    • The RBC-WatData dataset with manually generated water mapping labels for the Oroville Dam and Charles River regions. This dataset is well-suited for multitemporal land cover and water mapping research, as it accounts for the dynamic evolution of true class labels over time.
    • Pickle files with output to reproduce the results in the paper, including:
      • Instantaneous classification results for GMM, LR, SIC, WN, DWM
      • Posterior results obtained with the RBC framework

    The Sentinel-2 images and forest labels used in the deforestation detection experiment for the Amazon Rainforest have been obtained from the MultiEarth Challenge dataset.

    Folder Structure

    The following paths can be changed in the configuration file from the GitHub repository as desired. The RBC-SatImg folder is organized as follows:

    • `./log/` (EMPTY): Default path for storing log files generated during code execution.
    • `./evaluation_results/`: Contains the results to reproduce the findings in the paper, including two sub-folders:
      • `./classification/`: For each test site, four sub-folders are included as:
        • `./accuracy/`: Each sub-folder corresponding to an experimental configuration contains pickle files with balanced classification accuracy results and information about the models. The default configuration used in the paper is "conf_00."
        • `./figures/`: Includes result figures from the manuscript in SVG format.
        • `./likelihoods/`: Contains pickle files with instantaneous classification results.
        • `./posteriors/`: Contains pickle files with posterior results generated by the RBC framework.
      • `./sensitivity_analysis/`: Contains sensitivity analysis results, organized by different test sites and epsilon values.
    • `./Sentinel2_data/`: Contains Sentinel-2 images used for training and evaluation, organized by scenarios (Oroville Dam, Charles River, Amazon Rainforest). Selected images have been filtered and processed as explained in the manuscript. The Amazon Rainforest images and labels have been obtained from the MultiEarth dataset, and consequently, the labels are included in this folder instead of the RBC-WatData folder.
    • `./RBC-WatData/`: Contains the water labels that we manually generated with the LabelStudio tool.
  17. Amazon Prime Member Annual Spending Data 2019-2024

    • redstagfulfillment.com
    html
    Updated May 19, 2025
    Cite
    Red Stag Fulfillment (2025). Amazon Prime Member Annual Spending Data 2019-2024 [Dataset]. https://redstagfulfillment.com/average-annual-spend-of-an-amazon-prime-member/
    Explore at:
    Available download formats: html
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Red Stag Fulfillment
    Time period covered
    2019 - 2024
    Area covered
    United States
    Variables measured
    Prime Day spending averages, Annual Prime member spending, Demographic spending patterns, Annual non-Prime customer spending, Prime membership penetration rates
    Description

    Comprehensive dataset tracking Amazon Prime member spending patterns from 2019-2024, including comparisons with non-Prime customers and demographic breakdowns.

  18. PIPr: A Dataset of Public Infrastructure as Code Programs

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 28, 2023
    Cite
    Sokolowski, Daniel; Spielmann, David; Salvaneschi, Guido (2023). PIPr: A Dataset of Public Infrastructure as Code Programs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8262770
    Explore at:
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    University of St. Gallen
    Authors
    Sokolowski, Daniel; Spielmann, David; Salvaneschi, Guido
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0. Contents:

    • metadata.zip: The dataset metadata and analysis results as CSV files.
    • scripts-and-logs.zip: Scripts and logs of the dataset creation.
    • LICENSE: The Open Data Commons Attribution License (ODC-By) v1.0 text.
    • README.md: This document.
    • redistributable-repositiories.zip: Shallow copies of the head state of all redistributable repositories with an IaC program.

    This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io.

    Metadata

    The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.

    repositories.csv:

    • ID (integer): GitHub repository ID
    • url (string): GitHub repository URL
    • downloaded (boolean): Whether cloning the repository succeeded
    • name (string): Repository name
    • description (string): Repository description
    • licenses (string, list of strings): Repository licenses
    • redistributable (boolean): Whether the repository's licenses permit redistribution
    • created (string, date & time): Time of the repository's creation
    • updated (string, date & time): Time of the last update to the repository
    • pushed (string, date & time): Time of the last push to the repository
    • fork (boolean): Whether the repository is a fork
    • forks (integer): Number of forks
    • archive (boolean): Whether the repository is archived
    • programs (string, list of strings): Project file path of each IaC program in the repository

    programs.csv:

    • ID (string): Project file path of the IaC program
    • repository (integer): GitHub repository ID of the repository containing the IaC program
    • directory (string): Path of the directory containing the IaC program's project file
    • solution (string, enum): PL-IaC solution of the IaC program ("AWS CDK", "CDKTF", "Pulumi")
    • language (string, enum): Programming language of the IaC program (enum values: "csharp", "go", "haskell", "java", "javascript", "python", "typescript", "yaml")
    • name (string): IaC program name
    • description (string): IaC program description
    • runtime (string): Runtime string of the IaC program
    • testing (string, list of enum): Testing techniques of the IaC program (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
    • tests (string, list of strings): File paths of IaC program's tests

    testing-files.csv:

    • file (string): Testing file path
    • language (string, enum): Programming language of the testing file (enum values: "csharp", "go", "java", "javascript", "python", "typescript")
    • techniques (string, list of enum): Testing techniques used in the testing file (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
    • keywords (string, list of enum): Keywords found in the testing file (enum values: "/go/auto", "/testing/integration", "@AfterAll", "@BeforeAll", "@Test", "@aws-cdk", "@aws-cdk/assert", "@pulumi.runtime.test", "@pulumi/", "@pulumi/policy", "@pulumi/pulumi/automation", "Amazon.CDK", "Amazon.CDK.Assertions", "Assertions_", "HashiCorp.Cdktf", "IMocks", "Moq", "NUnit", "PolicyPack(", "ProgramTest", "Pulumi", "Pulumi.Automation", "PulumiTest", "ResourceValidationArgs", "ResourceValidationPolicy", "SnapshotTest()", "StackValidationPolicy", "Testing", "Testing_ToBeValidTerraform(", "ToBeValidTerraform(", "Verifier.Verify(", "WithMocks(", "[Fact]", "[TestClass]", "[TestFixture]", "[TestMethod]", "[Test]", "afterAll(", "assertions", "automation", "aws-cdk-lib", "aws-cdk-lib/assert", "aws_cdk", "aws_cdk.assertions", "awscdk", "beforeAll(", "cdktf", "com.pulumi", "def test_", "describe(", "github.com/aws/aws-cdk-go/awscdk", "github.com/hashicorp/terraform-cdk-go/cdktf", "github.com/pulumi/pulumi", "integration", "junit", "pulumi", "pulumi.runtime.setMocks(", "pulumi.runtime.set_mocks(", "pulumi_policy", "pytest", "setMocks(", "set_mocks(", "snapshot", "software.amazon.awscdk.assertions", "stretchr", "test(", "testing", "toBeValidTerraform(", "toMatchInlineSnapshot(", "toMatchSnapshot(", "to_be_valid_terraform(", "unittest", "withMocks(")
    • program (string): Project file path of the testing file's IaC program

    Dataset Creation

    scripts-and-logs.zip contains all scripts and logs of the creation of this dataset. In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:

    1. A list of all repositories with a PL-IaC program configuration file was created using search-repositories.py (documented below). The execution took two weeks due to the non-deterministic nature of GitHub's REST API, causing excessive retries.
    2. A shallow copy of the head of all repositories was downloaded using download-repositories.py (documented below).
    3. Using analysis.ipynb, the repositories were analyzed for the programs' metadata, including the used programming languages and licenses.
    4. Based on the analysis, all repositories with at least one IaC program and a redistributable license were packaged into redistributable-repositiories.zip, excluding any node_modules and .git directories.

    Searching Repositories

    The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:

    1. GitHub access token.
    2. Name of the CSV output file.
    3. Filename to search for.
    4. File extensions to search for, separated by commas.
    5. Min file size for the search (for all files: 0).
    6. Max file size for the search, or * for unlimited (for all files: *).

    Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml (https://www.pulumi.com/docs/intro/concepts/project/). AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json (https://docs.aws.amazon.com/cdk/v2/guide/cli.html). CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json (https://www.terraform.io/cdktf/create-and-deploy/project-setup).

    Limitations

    The script uses the GitHub code search API and inherits its limitations:

    • Only forks with more stars than the parent repository are included.
    • Only the repositories' default branches are considered.
    • Only files smaller than 384 KB are searchable.
    • Only repositories with fewer than 500,000 files are considered.
    • Only repositories that have had activity or have been returned in search results in the last year are considered.

    More details: https://docs.github.com/en/search-github/searching-on-github/searching-code

    The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api

    Downloading Repositories

    download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:

    1. Name of the repositories CSV files generated through search-repositories.py, separated by commas.
    2. Output directory to download the repositories to.
    3. Name of the CSV output file.

    The script only downloads a shallow recursive copy of the HEAD of each repository, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
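
    For orientation, a hedged sketch of exploring the metadata with pandas (the CSV paths inside the unzipped metadata.zip are assumptions; column names follow the descriptions above):

    ```python
    import pandas as pd

    # Paths inside the unzipped metadata.zip are assumptions; adjust as needed.
    repos = pd.read_csv("metadata/repositories.csv")
    programs = pd.read_csv("metadata/programs.csv")

    # Fraction of analyzed repositories whose licenses permit redistribution
    print(repos["redistributable"].mean())

    # Number of IaC programs per PL-IaC solution and programming language
    print(programs.groupby(["solution", "language"]).size().sort_values(ascending=False))
    ```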

  19. Related Data for COMET: Convolutional Dimension Interaction for...

    • researchdata.ntu.edu.sg
    bin, rar
    Updated Apr 23, 2022
    Cite
    Zhuoyi Lin (2022). Related Data for COMET: Convolutional Dimension Interaction for Collaborative Filtering [Dataset]. http://doi.org/10.21979/N9/TO2HBX
    Explore at:
    Available download formats: rar (17557), bin (2840027), bin (21443604), bin (130949)
    Dataset updated
    Apr 23, 2022
    Dataset provided by
    DR-NTU (Data)
    Authors
    Zhuoyi Lin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Latent factor models play a dominant role among recommendation techniques. However, most of the existing latent factor models assume both historical interactions and embedding dimensions are independent of each other, and thus regrettably ignore the high-order interaction information among historical interactions and embedding dimensions. In this paper, we propose a novel latent factor model called COMET (COnvolutional diMEnsion inTeraction), which simultaneously models the high-order interaction patterns among historical interactions and embedding dimensions. To be specific, COMET stacks the embeddings of historical interactions horizontally at first, which results in two "embedding maps". In this way, internal interactions and dimensional interactions can be exploited by convolutional neural networks with kernels of different sizes simultaneously. A fully-connected multi-layer perceptron is then applied to obtain two interaction vectors. Lastly, the representations of users and items are enriched by the learnt interaction vectors, which can further be used to produce the final prediction. Extensive experiments and ablation studies on various public implicit feedback datasets clearly demonstrate the effectiveness and the rationality of our proposed method.
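
    The description above lends itself to a compact sketch. The PyTorch module below loosely illustrates one branch of the idea: convolving an "embedding map" of stacked historical-interaction embeddings with kernels of different sizes and summarizing the result with an MLP. Every size and layer choice here is an assumption for illustration, not the authors' implementation:

    ```python
    import torch
    import torch.nn as nn

    class InteractionEncoder(nn.Module):
        """Sketch of one COMET-style branch: convolve an embedding map built by
        stacking historical-interaction embeddings, then summarize via an MLP."""

        def __init__(self, dim=64, n_kernels=8, kernel_sizes=(2, 3, 4)):
            super().__init__()
            self.convs = nn.ModuleList(
                nn.Conv2d(1, n_kernels, kernel_size=k, padding=k // 2)
                for k in kernel_sizes
            )
            self.pool = nn.AdaptiveAvgPool2d(1)  # one summary scalar per channel
            self.mlp = nn.Sequential(
                nn.Linear(n_kernels * len(kernel_sizes), dim),
                nn.ReLU(),
                nn.Linear(dim, dim),
            )

        def forward(self, history):             # history: (batch, n_history, dim)
            fmap = history.unsqueeze(1)          # treat the map as a 1-channel image
            feats = [self.pool(c(fmap)).flatten(1) for c in self.convs]
            return self.mlp(torch.cat(feats, dim=1))  # interaction vector

    enc = InteractionEncoder()
    user_emb = torch.randn(32, 64)               # base user embeddings
    history = torch.randn(32, 20, 64)            # 20 historical item embeddings each
    enriched_user = user_emb + enc(history)      # enrich the representation
    ```

    In the full model, an analogous branch would operate on the dimension-interaction map, and both learnt interaction vectors would enrich the user and item representations before the final prediction.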

  20. LBA-ECO LC-09 Soil Composition and Structure in the Brazilian Amazon:...

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). LBA-ECO LC-09 Soil Composition and Structure in the Brazilian Amazon: 1992-1995 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/lba-eco-lc-09-soil-composition-and-structure-in-the-brazilian-amazon-1992-1995-14188
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Amazon Rainforest
    Description

    This data set reports basic soil structure and composition information for five Amazonian research sites: Altamira, Bragantina, Tome-Acu, and Ponta de Pedras, all four in the state of Para, Brazil; and one site in Yapu, Colombia. Soil characteristics reported for all five study sites include cation information (e.g., H, Al, Mg, K, Na, S), percent of soil C, N, and organic matter, soil texture/composition and color, pH, and land use history. Soil bulk density and tons of carbon/ha are reported for only three of the study sites: Altamira, Bragantina, and Tome-Acu. All of the data are provided in one comma-separated data file.

    The five study areas represent characteristic differences in soil fertility and a range of land uses typical of the Amazon region. One of these areas, Altamira, is characterized by above average pH, nutrients, and texture. The other four areas are more typical of the 75 percent of the Amazon that is characterized by Oxisols and Ultisols, with well-drained but low pH and low levels of nutrients. Ponta de Pedras in Marajo Island, located in the estuary, is composed of upland Oxisols and floodplain alluvial soils. Igarape-Acu in the Bragantina region is characterized by both nutrient-poor Spodosols and Oxisols. Tome-Acu, south of Igarape-Acu, represents a mosaic of Oxisols and Ultisols. Yapu, in the Colombian Vaupes, is composed of patches of Spodosols and Oxisols.

    Three of the areas are colonization regions at various degrees of development: Altamira is a colonization front that opened up in 1971, whereas Tome-Acu was settled by a Japanese population in the 1930s, and Bragantina was settled in the early part of the twentieth century. Marajo (Ponta de Pedras) is the home of caboclos, whereas Yapu is home to Tukanoan Native American populations. In these study areas, slash-and-burn cultivation as well as plantation agriculture and mechanized agriculture are employed. Lengths of fallow vary in these communities. The two indigenous areas leave their land in longer fallow than do the three colonization areas, and the proportion of land prepared from secondary forests increases with length of settlement as the stock of mature forest declines over time.
