8 datasets found

wine reviews_small.csv
kaggle.com
Updated Aug 12, 2021
Cite
Sailesh S (2021). wine reviews_small.csv [Dataset]. https://www.kaggle.com/datasets/sailesh07/wine-reviews-smallcsv/suggestions?status=pending&yourSuggestions=true
Dataset updated
Aug 12, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sailesh S
Description
Dataset

This dataset was created by Sailesh S

Contents
Data from: Red wine DataSet
kaggle.com
Updated Aug 21, 2023
+ more versions
Cite
Suraj_kumar_Gupta (2023). Red wine DataSet [Dataset]. https://www.kaggle.com/datasets/soorajgupta7/red-wine-dataset
Dataset updated
Aug 21, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Suraj_kumar_Gupta
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
Datasets Description:

The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.

Classification and Regression Tasks: One can interpret these datasets as being suitable for both classification and regression analyses. The classes are ordered, albeit imbalanced. For instance, the dataset contains a more significant number of normal wines compared to excellent or poor ones.

Dataset Contents: For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, include:
1. Fixed acidity
2. Volatile acidity
3. Citric acid
4. Residual sugar
5. Chlorides
6. Free sulfur dioxide
7. Total sulfur dioxide
8. Density
9. pH
10. Sulphates
11. Alcohol

The output variable, based on sensory data, is denoted by:
12. Quality (score ranging from 0 to 10)

Usage Tips: A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
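The thresholding tip above can be sketched in a few lines of pandas. This is only an illustration: the rows below are made up, and the column name `good` and the constant `GOOD_THRESHOLD` are hypothetical names chosen for this example, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample rows; the values are made up for illustration only.
df = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.8, 13.0],
    "quality": [5, 5, 6, 7, 8, 4],
})

GOOD_THRESHOLD = 7  # from the usage tip: quality >= 7 counts as "good"

# 1 = good wine, 0 = not good
df["good"] = (df["quality"] >= GOOD_THRESHOLD).astype(int)

# Share of each class: a quick way to see how imbalanced the target is.
class_share = df["good"].value_counts(normalize=True).sort_index()
print(class_share)
```

Checking the class shares right after binarizing is useful because the description notes that normal wines far outnumber excellent or poor ones.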

Operational Workflow: To efficiently utilize the dataset, the following steps are recommended:
1. Connect a File Reader node (for CSV) to a Linear Correlation node and an Interactive Histogram node for basic Exploratory Data Analysis (EDA).
2. Connect a File Reader node to a Rule Engine node to transform the 10-point scale into a dichotomous variable indicating 'good wine' and 'rest.'
3. Connect the Rule Engine node output to a Column Filter node to filter out the original 10-point feature, thus preventing data leakage.
4. Connect the Column Filter node output to a Partitioning node to execute a standard train/test split (e.g., 75%/25%, choosing 'random' or 'stratified').
5. Feed the training split from the Partitioning node into a Decision Tree Learner node.
6. Connect the test split from the Partitioning node to a Decision Tree Predictor node.
7. Link the Decision Tree Learner node output (the trained model) to the Decision Tree Predictor node.
8. Finally, connect the Decision Tree Predictor output to a ROC node for model evaluation based on the AUC value.
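For readers who prefer code to a graphical workflow, the same steps can be approximated with scikit-learn. This is a rough sketch, not the author's workflow: the data are synthetic stand-ins (not real wine measurements), the relationship between features and quality is invented, and `max_depth=3` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the real wine measurements).
rng = np.random.default_rng(0)
n = 400
alcohol = rng.uniform(8.0, 14.0, n)
volatile_acidity = rng.uniform(0.1, 1.2, n)
# Invented relationship, only so the tree has something to learn.
score = 5 + 0.6 * (alcohol - 11) - 2.0 * (volatile_acidity - 0.6) + rng.normal(0, 0.7, n)
quality = np.clip(np.round(score), 0, 10).astype(int)
df = pd.DataFrame(
    {"alcohol": alcohol, "volatile_acidity": volatile_acidity, "quality": quality}
)

# Step 2: turn the quality score into a binary 'good wine' target.
df["good"] = (df["quality"] >= 7).astype(int)

# Step 3: drop the original score from the features to avoid leakage.
X = df.drop(columns=["quality", "good"])
y = df["good"]

# Step 4: stratified 75%/25% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Steps 5-7: fit the tree on the training split, predict on the test split.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
proba_good = tree.predict_proba(X_test)[:, 1]

# Step 8: evaluate with the area under the ROC curve.
auc = roc_auc_score(y_test, proba_good)
print(f"test AUC = {auc:.3f}")
```

Dropping `quality` before the split mirrors the Column Filter step: if the original score stayed among the features, the tree could recover the target exactly.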

Tools and Acknowledgments: For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.
wine_quality
tensorflow.org
beta.dataverse.org
Updated Nov 23, 2022
Cite
(2022). wine_quality [Dataset]. https://www.tensorflow.org/datasets/catalog/wine_quality
Dataset updated
Nov 23, 2022
Description
Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).
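The two metrics named above can be illustrated with a short NumPy sketch. It assumes that MAD means the mean absolute deviation between predicted and true scores, and that a prediction counts as correct for tolerance T when its absolute error is at most T; the description does not spell out either definition, so both are assumptions here. The function names and the sample numbers are made up.

```python
import numpy as np

def mean_absolute_deviation(y_true, y_pred):
    """Average absolute error between true and predicted quality scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def accuracy_within_tolerance(y_true, y_pred, tolerance):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) <= tolerance))

# Made-up true scores and regression outputs, for illustration only.
y_true = [5, 6, 7, 5, 8]
y_pred = [5.2, 6.9, 6.4, 4.4, 6.5]

mad = mean_absolute_deviation(y_true, y_pred)
acc_half = accuracy_within_tolerance(y_true, y_pred, tolerance=0.5)
acc_one = accuracy_within_tolerance(y_true, y_pred, tolerance=1.0)
print(mad, acc_half, acc_one)
```

A larger tolerance accepts more predictions as correct, which is why results under a regression approach are usually reported for more than one value of T.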

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

Number of Instances: red wine - 1599; white wine - 4898

Input variables (based on physicochemical tests):

fixed acidity

volatile acidity

citric acid

residual sugar

chlorides

free sulfur dioxide

total sulfur dioxide

density

pH

sulphates

alcohol

Output variable (based on sensory data):

quality (score between 0 and 10)

To use this dataset:

import tensorflow_datasets as tfds
ds = tfds.load('wine_quality', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
White Wine Quality Dataset
kaggle.com
Updated Apr 1, 2025
Cite
Sumit R Washimkar (2025). White Wine Quality Dataset [Dataset]. https://www.kaggle.com/datasets/sumit17125/wine-quality-dataset
Dataset updated
Apr 1, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sumit R Washimkar
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
White Wine Quality Dataset

Introduction

This dataset contains physicochemical properties of white wine samples. The goal is to analyze how these features influence the quality of wine. It can be used for exploratory data analysis, statistical modeling, and machine learning tasks such as regression and classification.

Dataset Information

The dataset consists of multiple white wine samples with their respective chemical compositions. Each row represents a different wine sample, and the columns correspond to specific properties that impact its taste and quality.

Columns Description

Fixed Acidity: Concentration of non-volatile acids (e.g., tartaric acid) in g/dm³.

Volatile Acidity: Amount of acetic acid in g/dm³, which can affect the wine’s aroma and taste. High levels can lead to an unpleasant vinegar-like taste.

Citric Acid: Presence of citric acid in g/dm³, which adds freshness and flavor to the wine.

Residual Sugar: The amount of sugar remaining after fermentation, measured in g/dm³. Affects the wine's sweetness.

Chlorides: Amount of salt (sodium chloride) in the wine, measured in g/dm³. Higher values can negatively affect taste.

Free Sulfur: The level of free sulfur dioxide (SO₂), which acts as an antioxidant and antimicrobial agent, helping preserve the wine’s freshness.

Possible Use Cases

Exploratory Data Analysis (EDA): Understanding the distribution and correlation between wine features.

Wine Quality Prediction: Using machine learning models to predict wine quality based on physicochemical attributes.

Feature Importance Analysis: Identifying which features have the most impact on wine quality.

Acknowledgments

This dataset is inspired by wine composition studies and can be used for educational and research purposes.
Wine Dataset Classification Results.csv
kaggle.com
Updated Feb 14, 2024
+ more versions
Cite
Leila Carey (2024). Wine Dataset Classification Results.csv [Dataset]. https://www.kaggle.com/datasets/leilacarey/wine1-csv/versions/1
Dataset updated
Feb 14, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Leila Carey
Description
Dataset

This dataset was created by Leila Carey

Contents
White Wine Data
kaggle.com
Updated Feb 24, 2023
Cite
Sudheesh R (2023). White Wine Data [Dataset]. https://www.kaggle.com/datasets/sudheeshr/white-wine-data
Dataset updated
Feb 24, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sudheesh R
Description
The data consist of variants of the Portuguese Vinho Verde wine and have 1599 observations of red wine and 4898 observations of white wine. For each, we have the wine quality (scored between 0 and 10) and eleven quantitative chemical attributes: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol.

Fixed acidity - Most acids involved with wine are fixed or nonvolatile.

Volatile acidity - The amount of acetic acid in wine, which at too high a level can lead to an unpleasant, vinegar taste.

Citric acid - Found in small quantities, it can add freshness and flavor to the wine.

Residual sugar - The amount of sugar remaining after fermentation stops. It's rare to find wines with less than 1 g/L, and wines with more than 45 g/L are considered sweet.

Chlorides - The amount of salt in the wine.

Free sulfur dioxide - The free form of sulfur dioxide that is not bound to other molecules, and is used to calculate molecular sulfur dioxide.

Total sulfur dioxide - The amount of free and bound forms of sulfur dioxide.

Density - The density of wine is close to that of water, depending on the percent alcohol and sugar content.

pH - Describes how acidic or basic a wine is on a scale from 0 to 14.

Sulphates - A wine additive which can contribute to sulfur dioxide gas levels, which acts as an antimicrobial and antioxidant.

Alcohol - The percent alcohol content of the wine.
Iris Species
kaggle.com
zip
Updated Sep 27, 2016
+ more versions
Cite
UCI Machine Learning (2016). Iris Species [Dataset]. https://www.kaggle.com/datasets/uciml/iris
Available download formats: zip (3687 bytes)
Dataset updated
Sep 27, 2016
Dataset authored and provided by
UCI Machine Learning
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.

It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
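The separability claim can be checked partially with a quick sketch. It assumes the copy of the Iris data bundled with scikit-learn (`sklearn.datasets.load_iris`) matches the Kaggle CSV, which is not verified here. A single threshold on one feature is a special case of a linear boundary, so a clean gap in petal length is enough to show that setosa is linearly separable from the other two species.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
names = list(iris.target_names)

# Column order in load_iris: sepal length, sepal width, petal length, petal width.
petal_length = X[:, 2]

setosa = petal_length[y == names.index("setosa")]
others = petal_length[y != names.index("setosa")]

# If every setosa petal is shorter than every other petal,
# any threshold inside the gap separates the classes.
gap_exists = setosa.max() < others.min()
print(f"setosa max = {setosa.max()}, others min = {others.min()}, gap: {gap_exists}")

# For contrast: versicolor and virginica overlap on this feature.
# Overlap on one feature alone does not prove non-separability;
# it only shows that this simple threshold test fails for them.
versicolor = petal_length[y == names.index("versicolor")]
virginica = petal_length[y == names.index("virginica")]
overlap_on_petal_length = versicolor.max() >= virginica.min()
print(f"versicolor/virginica overlap on petal length: {overlap_on_petal_length}")
```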

The columns in this dataset are:

Id

SepalLengthCm

SepalWidthCm

PetalLengthCm

PetalWidthCm

Species
Customer Segmentation : Clustering
kaggle.com
Updated Jan 13, 2024
Cite
Vishakh Patel (2024). Customer Segmentation : Clustering [Dataset]. https://www.kaggle.com/datasets/vishakhdapat/customer-segmentation-clustering
Dataset updated
Jan 13, 2024
Dataset provided by
Kaggle
Authors
Vishakh Patel
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Customer Personality Analysis involves a thorough examination of a company's optimal customer profiles. This analysis facilitates a deeper understanding of customers, enabling businesses to tailor products to meet the distinct needs, behaviors, and concerns of various customer types.

By conducting a Customer Personality Analysis, businesses can refine their products based on the preferences of specific customer segments. Rather than allocating resources to market a new product to the entire customer database, companies can identify the segments most likely to be interested in the product. Subsequently, targeted marketing efforts can be directed toward those particular segments, optimizing resource utilization and increasing the likelihood of successful product adoption.

Details of Features are as below:

Id: Unique identifier for each individual in the dataset.

Year_Birth: The birth year of the individual.

Education: The highest level of education attained by the individual.

Marital_Status: The marital status of the individual.

Income: The annual income of the individual.

Kidhome: The number of young children in the household.

Teenhome: The number of teenagers in the household.

Dt_Customer: The date when the customer was first enrolled or became a part of the company's database.

Recency: The number of days since the last purchase or interaction.

MntWines: The amount spent on wines.

MntFruits: The amount spent on fruits.

MntMeatProducts: The amount spent on meat products.

MntFishProducts: The amount spent on fish products.

MntSweetProducts: The amount spent on sweet products.

MntGoldProds: The amount spent on gold products.

NumDealsPurchases: The number of purchases made with a discount or as part of a deal.

NumWebPurchases: The number of purchases made through the company's website.

NumCatalogPurchases: The number of purchases made through catalogs.

NumStorePurchases: The number of purchases made in physical stores.

NumWebVisitsMonth: The number of visits to the company's website in a month.

AcceptedCmp3: Binary indicator (1 or 0) whether the individual accepted the third marketing campaign.

AcceptedCmp4: Binary indicator (1 or 0) whether the individual accepted the fourth marketing campaign.

AcceptedCmp5: Binary indicator (1 or 0) whether the individual accepted the fifth marketing campaign.

AcceptedCmp1: Binary indicator (1 or 0) whether the individual accepted the first marketing campaign.

AcceptedCmp2: Binary indicator (1 or 0) whether the individual accepted the second marketing campaign.

Complain: Binary indicator (1 or 0) whether the individual has made a complaint.

Z_CostContact: A constant cost associated with contacting a customer.

Z_Revenue: A constant revenue associated with a successful campaign response.

Response: Binary indicator (1 or 0) whether the individual responded to the marketing campaign.
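As a small illustration of how the listed features might be combined before clustering, the sketch below derives two summary columns with pandas. The rows are made up, only a subset of the columns is included, and the derived names `TotalSpend` and `CampaignsAccepted` are hypothetical; they are not part of the dataset.

```python
import pandas as pd

# Made-up rows using a subset of the column names from the feature list.
df = pd.DataFrame({
    "Id": [1, 2],
    "MntWines": [100, 10],
    "MntFruits": [5, 0],
    "MntMeatProducts": [50, 20],
    "MntFishProducts": [10, 0],
    "MntSweetProducts": [5, 1],
    "MntGoldProds": [20, 4],
    "AcceptedCmp1": [1, 0],
    "AcceptedCmp2": [0, 0],
    "AcceptedCmp3": [1, 0],
    "AcceptedCmp4": [0, 0],
    "AcceptedCmp5": [0, 0],
})

# Group columns by their shared prefixes.
spend_cols = [c for c in df.columns if c.startswith("Mnt")]
campaign_cols = [c for c in df.columns if c.startswith("AcceptedCmp")]

# Hypothetical derived features: total amount spent and campaigns accepted.
df["TotalSpend"] = df[spend_cols].sum(axis=1)
df["CampaignsAccepted"] = df[campaign_cols].sum(axis=1)
print(df[["Id", "TotalSpend", "CampaignsAccepted"]])
```

Aggregates like these reduce the number of dimensions a clustering algorithm has to handle, at the cost of hiding per-category differences.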