100+ datasets found

Spanish Wine Quality Dataset
kaggle.com
Updated Apr 26, 2022
Cite
fedesoriano (2022). Spanish Wine Quality Dataset [Dataset]. https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset
Dataset updated
Apr 26, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
fedesoriano
Description
Similar Datasets

CERN Proton Collision Dataset: LINK

Airfoil Self-Noise Dataset: LINK

CERN Electron Collision Data: LINK

Wind Speed Prediction Dataset: LINK

Stellar Classification Dataset - SDSS17: LINK

Context

This dataset is related to red variants of Spanish wines. It describes several popularity and description metrics and their effect on the wine's quality. The dataset can be used for classification or regression tasks. The classes are ordered and not balanced (i.e. the quality goes from almost 5 to 4 points). The task is to predict either the quality of the wine or its price using the given data.

Content

The dataset contains 7500 different types of red wines from Spain with 11 features that describe their price, rating, and even some flavor description. The data was collected by me using web scraping from different sources (from specialized wine pages to supermarkets). Please acknowledge the hard work that went into obtaining and creating this dataset; you can upvote it if you find it useful for your projects :)

If the dataset becomes popular I will probably try to create a bigger version with wines from other countries and a wider spectrum of ratings.

Attribute Information

winery: Winery name

wine: Name of the wine

year: Year in which the grapes were harvested

rating: Average rating given to the wine by the users [from 1-5]

num_reviews: Number of users that reviewed the wine

country: Country of origin [Spain]

region: Region of the wine

price: Price in euros [€]

type: Wine variety

body: Body score, defined as the richness and weight of the wine in your mouth [from 1-5]

acidity: Acidity score, defined as wine's “pucker” or tartness; it's what makes a wine refreshing and your tongue salivate and want another sip [from 1-5]

Citation Request

If you want to cite this data:

fedesoriano. (April 2022). Spanish Wine Quality Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset
wine_quality
tensorflow.org
beta.dataverse.org
+1more
Updated Nov 23, 2022
Cite
(2022). wine_quality [Dataset]. https://www.tensorflow.org/datasets/catalog/wine_quality
Dataset updated
Nov 23, 2022
Description
Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

Number of Instances: red wine - 1599; white wine - 4898

Input variables (based on physicochemical tests):

fixed acidity

volatile acidity

citric acid

residual sugar

chlorides

free sulfur dioxide

total sulfur dioxide

density

pH

sulphates

alcohol

Output variable (based on sensory data):

quality (score between 0 and 10)

To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('wine_quality', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
‘Red and White Wine Quality Analysis’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Red and White Wine Quality Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-red-and-white-wine-quality-analysis-0938/d129fe93/?iid=005-800&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Red and White Wine Quality Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/saigeethac/red-and-white-wine-quality-datasets on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Wine Quality Data Set

This data set is available in UCI at https://archive.ics.uci.edu/ml/datasets/Wine+Quality.

Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

Data Set Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Attribute Information:

Input variables (based on physicochemical tests):

fixed acidity

volatile acidity

citric acid

residual sugar

chlorides

free sulfur dioxide

total sulfur dioxide

density

pH

sulphates

alcohol

Output variable (based on sensory data):

quality (score between 0 and 10)

These columns have been described in the Kaggle Data Explorer.

Context

The authors state "we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods." We have briefly explored this aspect and see that Red wine quality prediction on the test and training datasets is almost the same (~88%) with just three features. Likewise White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors.
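One simple way to explore the feature-selection question is to rank the inputs by the absolute Pearson correlation with quality. The sketch below is an illustration only and is not the method used in the analysis described above; the rows are made-up values, and `pearson` and `rank_features` are hypothetical helper names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(rows, target="quality"):
    """Return (feature, |r|) pairs sorted from strongest to weakest."""
    features = [name for name in rows[0] if name != target]
    ys = [row[target] for row in rows]
    scored = [
        (name, abs(pearson([row[name] for row in rows], ys)))
        for name in features
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up rows for illustration only (not taken from the dataset).
rows = [
    {"alcohol": 9.4, "volatile acidity": 0.70, "pH": 3.51, "quality": 5},
    {"alcohol": 9.8, "volatile acidity": 0.88, "pH": 3.20, "quality": 5},
    {"alcohol": 11.2, "volatile acidity": 0.28, "pH": 3.16, "quality": 6},
    {"alcohol": 12.5, "volatile acidity": 0.30, "pH": 3.30, "quality": 7},
]
ranking = rank_features(rows)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A correlation ranking only captures linear, one-feature-at-a-time relationships, so it is a starting point rather than a substitute for a proper feature-selection procedure.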

Content

Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. Both these datasets are analyzed and linear regression models are developed in Python 3. The github link provided for the source code also includes a Flask web application for deployment on the local machine or on Heroku.

Acknowledgements

Datasets: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Banner Image: Photo by Roberta Sorge on Unsplash

Github Link

Complete code has been uploaded onto github at https://github.com/saigeethachandrashekar/wine_quality.

Please clone the repo - this contains both the datasets, the code required for building and saving the model on to your local system. Code for a Flask app is provided for deploying the models on your local machine. The app can also be deployed on Heroku - the requirements.txt and Procfile are also provided for this.

Next Steps

White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors (e.g. there is no data about grape types, wine brand, wine selling price, etc.) or it may be due to other factors that are not clear. This is an area that might be worth exploring further.

Other ML techniques may be applied to improve the accuracy.

--- Original source retains full ownership of the source dataset ---
Data from: Red wine DataSet
kaggle.com
Updated Aug 21, 2023
Cite
Suraj_kumar_Gupta (2023). Red wine DataSet [Dataset]. https://www.kaggle.com/datasets/soorajgupta7/red-wine-dataset
Dataset updated
Aug 21, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Suraj_kumar_Gupta
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
Datasets Description:

The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.

Classification and Regression Tasks: One can interpret these datasets as being suitable for both classification and regression analyses. The classes are ordered, albeit imbalanced. For instance, the dataset contains a more significant number of normal wines compared to excellent or poor ones.

Dataset Contents: For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, include:
1. Fixed acidity
2. Volatile acidity
3. Citric acid
4. Residual sugar
5. Chlorides
6. Free sulfur dioxide
7. Total sulfur dioxide
8. Density
9. pH
10. Sulphates
11. Alcohol

The output variable, based on sensory data, is denoted by: 12. Quality (score ranging from 0 to 10)

Usage Tips: A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
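The thresholding tip above can be sketched in a few lines of Python. This is a minimal illustration, assuming the quality scores are already available as integers; `to_binary_label` is a hypothetical helper name and the scores are made up.

```python
def to_binary_label(quality, threshold=7):
    """Return 1 ('good') if quality >= threshold, else 0 ('not good')."""
    return 1 if quality >= threshold else 0

# Hypothetical quality scores, for illustration only.
scores = [5, 6, 7, 4, 8, 6]
labels = [to_binary_label(q) for q in scores]
print(labels)  # [0, 0, 1, 0, 1, 0]
```

Once the binary label exists, the original 10-point column should be dropped from the model inputs, since it determines the label exactly.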

Operational Workflow: To efficiently utilize the dataset, the following steps are recommended:
1. Connect a File Reader (for CSV) to a Linear Correlation node and an interactive histogram for basic Exploratory Data Analysis (EDA).
2. Connect a File Reader to a Rule Engine node to transform the 10-point scale into a dichotomous variable indicating 'good wine' and 'rest'.
3. Connect the Rule Engine node output to the input of a Column Filter node to filter out the original 10-point feature, thus preventing data leakage.
4. Connect the Column Filter node output to the input of a Partitioning node to execute a standard train/test split (e.g., 75%/25%, choosing 'random' or 'stratified').
5. Feed the train split output of the Partitioning node into the input of a Decision Tree Learner node.
6. Connect the test split output of the Partitioning node to the input of a Decision Tree Predictor node.
7. Link the Decision Tree Learner node output to the model input of the Decision Tree Predictor node.
8. Finally, connect the Decision Tree Predictor output to the input of an ROC node for model evaluation based on the AUC value.
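For readers working outside KNIME, the stratified 75%/25% split (step 4) and the AUC evaluation (step 8) can be sketched in plain Python. This is an illustration under stated assumptions, not a port of the KNIME nodes: `stratified_split` and `roc_auc` are hypothetical helpers, the labels and scores are made up, and the decision tree itself is left out.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split row indices into train/test, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def roc_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up binary labels: 12 'rest' wines and 4 'good' wines.
labels = [0] * 12 + [1] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 12 4

# Made-up predicted scores for four test wines.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The pairwise AUC is quadratic in the number of rows, which is fine for a dataset of this size; a rank-based formula would be preferable for much larger data.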

Tools and Acknowledgments: For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.
Wine Quality Test
figshare.com
txt
Updated Jul 4, 2022
Cite
Deepchecks Data (2022). Wine Quality Test [Dataset]. http://doi.org/10.6084/m9.figshare.20223318.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.20223318.v1
Dataset updated
Jul 4, 2022
Dataset provided by
figshare
Authors
Deepchecks Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Part of the dataset supplied in https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009 and https://archive.ics.uci.edu/ml/datasets/wine+quality
combined wine data
kaggle.com
Updated Nov 25, 2017
Cite
Siyuan H (2017). combined wine data [Dataset]. https://www.kaggle.com/datasets/siyuanh/combined-wine-data/code
Dataset updated
Nov 25, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Siyuan H
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

Title: Wine Quality

Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Number of Instances: red wine - 1599; white wine - 4898.

Number of Attributes: 11 + output attribute

Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.

Attribute information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)

Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Missing Attribute Values: None

Description of attributes:

1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)

2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste

3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines

4 - residual sugar: the amount of sugar remaining after fermentation stops, it's rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet

5 - chlorides: the amount of salt in the wine

6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine

7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine

8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content

9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant

11 - alcohol: the percent alcohol content of the wine

Output variable (based on sensory data): 12 - quality (score between 0 and 10)
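A minimal sketch of reading a file with the twelve columns listed above, using only the standard library. The semicolon delimiter is an assumption (some copies of the file use commas), `load_rows` is a hypothetical helper, and the two sample rows are included only to make the example self-contained.

```python
import csv
import io

# Two illustrative rows in the column order listed above. The ";" delimiter
# is an assumption and may be "," in some copies of the file.
SAMPLE = """fixed acidity;volatile acidity;citric acid;residual sugar;chlorides;free sulfur dioxide;total sulfur dioxide;density;pH;sulphates;alcohol;quality
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
"""

def load_rows(text_stream, delimiter=";"):
    """Parse rows into dicts with float inputs and an integer quality score."""
    rows = []
    for record in csv.DictReader(text_stream, delimiter=delimiter):
        row = {name: float(value) for name, value in record.items()}
        row["quality"] = int(row["quality"])
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["pH"], rows[0]["quality"])
```

To read a real file instead of the inline sample, pass an open file object, for example one obtained with `open(path, newline="")`, where `path` points to your local copy.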
‘Wine Quality Classification’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Wine Quality Classification’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-quality-classification-e4ab/cddb1083/
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Wine Quality Classification’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/nareshbhat/wine-quality-binary-classification on 28 January 2022.

--- Dataset description provided by original source is as follows ---

This data set contains information related to red wine and the various factors affecting its quality. It was preprocessed and downloaded from the UCI Machine Learning Repository. It is a simple, cleaned practice data set for classification modelling. Source of this dataset: https://archive.ics.uci.edu/ml/datasets/wine+quality

Attribute Information:

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol

Output variable (based on sensory data):
12 - quality ('good' and 'bad' based on score >5 and <5)

--- Original source retains full ownership of the source dataset ---
Wine_Test Prediction | 1600 data | yashaswi
kaggle.com
Updated May 19, 2025
Cite
Ayushman Yashaswi (2025). Wine_Test Prediction | 1600 data | yashaswi [Dataset]. https://www.kaggle.com/datasets/ayushmanyashaswi/wine-test-prediction-1600-data-yashaswi
Dataset updated
May 19, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ayushman Yashaswi
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
📊 Wine Quality - Red Wine Dataset

This dataset contains physicochemical attributes of red variants of Portuguese "Vinho Verde" wine, along with their quality score (rated from 0 to 10). The goal is to predict wine quality using various classification models based on the chemical properties of the wine.

🧪 Features Overview (12 columns):

fixed acidity: most acids involved with wine are fixed/nonvolatile

volatile acidity: amount of acetic acid (can affect taste)

citric acid: adds freshness and flavor

residual sugar: sugar left after fermentation

chlorides: salt content

free sulfur dioxide: protects wine from microbes

total sulfur dioxide: total SO₂ content

density: wine density

pH: acidity level

sulphates: preservative and antimicrobial

alcohol: alcohol percentage

quality (target): wine quality score (0–10)

🤖 Model Performance Summary:

Multiple machine learning models were trained to predict wine quality. The following accuracy scores were observed:

Model | Training Accuracy | Testing Accuracy
Logistic Regression | 87.91% | 87.0%
Random Forest | 100% | 94.0%
Decision Tree | 100% | 88.5%
Support Vector Machine (SVM) | 86.41% | 86.5%

📈 Data Visualization:

A comparison plot of model performance was created to visually represent the accuracy of each algorithm. This helps in understanding which models generalized well and which ones may have overfit to the training data.
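The gap between training and testing accuracy gives a quick, rough indication of overfitting. The sketch below computes it from the figures reported in the table above; `accuracy_gaps` is a hypothetical helper, and a large gap is a hint rather than a formal test.

```python
# (training accuracy %, testing accuracy %) copied from the table above.
results = {
    "Logistic Regression": (87.91, 87.0),
    "Random Forest": (100.0, 94.0),
    "Decision Tree": (100.0, 88.5),
    "Support Vector Machine (SVM)": (86.41, 86.5),
}

def accuracy_gaps(results):
    """Return {model: train - test} in percentage points, rounded to 2 places."""
    return {
        model: round(train - test, 2)
        for model, (train, test) in results.items()
    }

gaps = accuracy_gaps(results)
# List models from the largest gap (most likely overfit) to the smallest.
for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {gap:+.2f} points")
```

On these numbers the Decision Tree shows the largest gap (11.5 points), followed by the Random Forest (6.0 points), while Logistic Regression and the SVM score almost the same on both splits.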

📁 File Info:

Filename: winequality-red.csv

Size: ~100 KB

Rows: 1,599

Columns: 12

📌 Ideal For:

Classification model evaluation

Feature correlation analysis

EDA and visualization

ML model tuning and comparison
‘White Wine Quality’ analyzed by Analyst-2
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘White Wine Quality’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-white-wine-quality-9dda/1b0598e7/?iid=002-239&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘White Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/piyushagni5/white-wine-quality on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, refer to [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Content

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol

Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Acknowledgements

This dataset is also available from the UCI machine learning repository, https://archive.ics.uci.edu/ml/datasets/wine+quality. To get both datasets, i.e. the red and white vinho verde wine samples from the north of Portugal, please visit the above link.

Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Inspiration

We Kagglers can apply several machine-learning algorithms to determine which physicochemical properties make a wine 'good'!

Relevant papers

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

--- Original source retains full ownership of the source dataset ---
Wine Quality - Dataset - U-M Biostat Datastore
ckan-demo.bio.sph.umich.edu
Updated Jan 6, 2025
Cite
ckan-demo.bio.sph.umich.edu (2025). Wine Quality - Dataset - U-M Biostat Datastore [Dataset]. https://ckan-demo.bio.sph.umich.edu/gl_ES/dataset/wine-quality
Dataset updated
Jan 6, 2025
Dataset provided by
CKAN (https://ckan.org/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/).
‘Wine Quality’ analyzed by Analyst-2
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Wine Quality’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-quality-0ce9/d646f556/?iid=004-175&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rajyellow46/wine-quality on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Data Set Information:

The dataset was downloaded from the UCI Machine Learning Repository.

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, see the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

The two datasets were combined and a few values were randomly removed.

Attribute Information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol

Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Acknowledgements:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

--- Original source retains full ownership of the source dataset ---
Wine Quality Model
figshare.com
bin
Updated Jul 4, 2022
Cite
Deepchecks Data (2022). Wine Quality Model [Dataset]. http://doi.org/10.6084/m9.figshare.20223369.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.20223369.v1
Dataset updated
Jul 4, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Deepchecks Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
A random forest model trained on part of the dataset supplied in https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009 and https://archive.ics.uci.edu/ml/datasets/wine+quality
Wine Quality - red or white?
kaggle.com
Updated Feb 3, 2018
Cite
ZiheTonyXu (2018). Wine Quality - red or white? [Dataset]. https://www.kaggle.com/xuzihe2010/wine-quality-red/discussion
Dataset updated
Feb 3, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ZiheTonyXu
Description
Feature introduction:

Fixed acidity: acids are major wine properties and contribute greatly to the wine’s taste. Usually, the total acidity is divided into two groups: the volatile acids and the nonvolatile or fixed acids. Among the fixed acids that you can find in wines are the following: tartaric, malic, citric, and succinic. This variable is expressed in g(tartaric acid)/dm^3 in the data sets.

Volatile acidity: the volatile acidity is basically the process of wine turning into vinegar. In the U.S., the legal limits of volatile acidity are 1.2 g/L for red table wine and 1.1 g/L for white table wine. In these data sets, the volatile acidity is expressed in g(acetic acid)/dm^3.

Citric acid is one of the fixed acids that you’ll find in wines. It’s expressed in g/dm^3 in the two data sets.

Residual sugar typically refers to the sugar remaining after fermentation stops, or is stopped. It’s expressed in g/dm^3 in the red and white data.

Chlorides can be a major contributor to saltiness in wine. Here, you’ll see that it’s expressed in g(sodium chloride)/dm^3.

Free sulfur dioxide: the part of the sulphur dioxide that is added to a wine and that is lost into it is said to be bound, while the active part is said to be free. Winemakers will always try to get the highest proportion of free sulphur to bind. This variable is expressed in mg/dm^3 in the data.

Total sulfur dioxide is the sum of the bound and the free sulfur dioxide (SO2). Here, it’s expressed in mg/dm^3. There are legal limits for sulfur levels in wines: in the EU, red wines can only have 160 mg/L, while white and rosé wines can have about 210 mg/L. Sweet wines are allowed to have 400 mg/L. For the US, the legal limit is set at 350 mg/L and for Australia, it is 250 mg/L.
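Because mg/dm^3 and mg/L are the same unit, the total sulfur dioxide column can be compared directly with the limits quoted above. The sketch below does this for the EU figures; the limits are taken from the text as written (including the approximate 210 mg/L value), and `exceeds_eu_limit` and the wine-type keys are hypothetical names used only for illustration.

```python
# Limits in mg/L as quoted in the text above; 1 mg/dm^3 equals 1 mg/L.
EU_LIMITS_MG_PER_L = {"red": 160, "white": 210, "rose": 210, "sweet": 400}

def exceeds_eu_limit(total_so2_mg_per_dm3, wine_type):
    """Return True if the total SO2 value is above the quoted EU limit."""
    return total_so2_mg_per_dm3 > EU_LIMITS_MG_PER_L[wine_type]

print(exceeds_eu_limit(34, "red"))     # False
print(exceeds_eu_limit(170, "red"))    # True
print(exceeds_eu_limit(170, "white"))  # False
```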

Density is generally used as a measure of the conversion of sugar to alcohol. Here, it’s expressed in g/cm3cm3. pH or the potential of hydrogen is a numeric scale to specify the acidity or basicity the wine. As you might know, solutions with a pH less than 7 are acidic, while solutions with a pH greater than 7 are basic. With a pH of 7, pure water is neutral. Most wines have a pH between 2.9 and 3.9 and are therefore acidic.

Sulphates are to wine as gluten is to food. You might already know sulphites from the headaches that they can cause. They are a regular part of the winemaking around the world and are considered necessary. In this case, they are expressed in g(potassiumsulphatepotassiumsulphate)/dm3dm3.

Alcohol: wine is an alcoholic beverage and as you know, the percentage of alcohol can vary from wine to wine. It shouldn’t be surprising that this variable is included in the data sets, where it’s expressed in % vol.

Quality: wine experts graded the wine quality between 0 (very bad) and 10 (very excellent). The eventual number is the median of at least three evaluations made by those same wine experts.
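
The scoring rule described above (the median of at least three expert grades, each between 0 and 10) can be sketched in a few lines of Python. The function name `quality_score` and the input checks are assumptions made for illustration; the source does not say how the original authors computed or validated the score.

```python
from statistics import median


def quality_score(evaluations: list[int]) -> float:
    """Median of expert grades; each grade is expected to lie between 0 and 10."""
    if len(evaluations) < 3:
        raise ValueError("at least three evaluations are required")
    if any(not 0 <= grade <= 10 for grade in evaluations):
        raise ValueError("grades must lie between 0 and 10")
    return median(evaluations)


if __name__ == "__main__":
    # Three hypothetical expert grades for one wine.
    print(quality_score([5, 6, 7]))
```

With an even number of grades, `statistics.median` averages the two middle values, so the result may not be a whole number.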
‘Wine Quality’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Wine Quality’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-quality-cdf0/latest
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danielpanizzo/wine-quality on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

Title: Wine Quality

Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are much more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.
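
One quick way to see the class imbalance mentioned above is to tabulate the share of each quality score. The sketch below uses a small made-up list of labels (`toy_labels`), not the real data, and the helper name `class_shares` is hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples per quality score, ordered by score."""
    counts = Counter(labels)
    total = len(labels)
    return {score: counts[score] / total for score in sorted(counts)}


# Made-up labels for illustration only: many mid-range wines, few extremes.
toy_labels = [3] + [5] * 8 + [6] * 8 + [7] * 2 + [8]

if __name__ == "__main__":
    for score, share in class_shares(toy_labels).items():
        print(f"quality {score}: {share:.0%}")
```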

Number of Instances: red wine - 1599; white wine - 4898.

Number of Attributes: 11 + output attribute

Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.
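
To make the note about correlated attributes concrete, here is a minimal Pearson correlation sketch in plain Python. The two columns are invented values standing in for a pair of inputs such as free and total sulfur dioxide; they are not taken from the dataset, and `pearson` is a hypothetical helper rather than anything the original authors describe.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented values for illustration only.
    free_so2 = [10, 15, 20, 30, 45]
    total_so2 = [30, 50, 60, 95, 130]
    print(round(pearson(free_so2, total_so2), 3))
```

A coefficient close to 1 or -1 for a pair of inputs suggests that one of them adds little information, which is one simple starting point for feature selection.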

Attribute information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
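
The eleven inputs and one output listed above map naturally onto a 12-column table. The sketch below parses such a table with Python's standard `csv` module. The column names, the semicolon delimiter, and the sample row are assumptions for illustration (the row is made up, and the actual file layout should be checked against whichever copy of the data is downloaded).

```python
import csv
import io

# Assumed column names, following the attribute list in the description.
COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

# One made-up row, used only to exercise the parser.
SAMPLE = ";".join(COLUMNS) + "\n7.0;0.5;0.1;2.0;0.08;15;40;0.997;3.3;0.6;10.0;6\n"


def load_rows(text, delimiter=";"):
    """Parse delimited text into dicts: 11 float inputs plus an integer quality."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for raw in reader:
        row = {name: float(raw[name]) for name in COLUMNS[:-1]}
        row["quality"] = int(raw["quality"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in load_rows(SAMPLE):
        print(row["pH"], row["alcohol"], row["quality"])
```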

Missing Attribute Values: None

Description of attributes:

1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)

2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste

3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines

4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet

5 - chlorides: the amount of salt in the wine

6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine

7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine

8 - density: the density of wine is close to that of water depending on the percent alcohol and sugar content

9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant

11 - alcohol: the percent alcohol content of the wine

Output variable (based on sensory data): 12 - quality (score between 0 and 10)

--- Original source retains full ownership of the source dataset ---
Data from: Red Wine Quality
kaggle.com
Updated Apr 1, 2021
Cite
Naveed Noor (2021). Red Wine Quality [Dataset]. https://www.kaggle.com/datasets/naveedpy1/red-wine-quality/discussion
Dataset updated
Apr 1, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Naveed Noor
Description
Dataset

This dataset was created by Naveed Noor

Contents
Data from: Red Wine Quality
kaggle.com
Updated Jan 2, 2024
Cite
Benjamin Varghese (2024). Red Wine Quality [Dataset]. https://www.kaggle.com/datasets/benvarghese/red-wine-quality/code
Dataset updated
Jan 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Benjamin Varghese
Description
Dataset

This dataset was created by Benjamin Varghese

Contents
China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml:...
ceicdata.com
Updated Dec 15, 2020
Cite
CEICdata.com (2019). China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: High-quality [Dataset]. https://www.ceicdata.com/en/china/price-monitoring-center-ndrc-36-city-monthly-avg-retail-price-consumer-goods/cn-retail-price-36-city-avg-dry-red-wine-12-degree-750ml-highquality
Dataset updated
Dec 15, 2020
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
China
Variables measured
Domestic Trade Price
Description
China Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: High-quality data was reported at 287.690 RMB/Bottle in Mar 2025, an increase from the previous figure of 286.150 RMB/Bottle for Feb 2025. The series is updated monthly, with a median of 360.000 RMB/Bottle from Jan 2012 to Mar 2025 across 159 observations. It reached an all-time high of 630.000 RMB/Bottle in Apr 2012 and a record low of 280.970 RMB/Bottle in Nov 2023. The series remains in active status in CEIC and is reported by the Price Monitoring Center, NDRC. It is categorized under China Premium Database’s Price – Table CN.PA: Price Monitoring Center, NDRC: 36 City Monthly Avg: Retail Price: Consumer Goods.
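
For readers who want the month-over-month move as a percentage, the two High-quality figures quoted above (286.150 RMB/Bottle for Feb 2025 and 287.690 RMB/Bottle for Mar 2025) can be plugged into a standard relative-change formula. The helper name `percent_change` is hypothetical; CEIC does not publish this calculation in the description.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


if __name__ == "__main__":
    # Figures taken from the description: Feb 2025 and Mar 2025, RMB/Bottle.
    change = percent_change(286.150, 287.690)
    print(f"{change:.2f}%")
```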
Red Wine Quality Prediction pjt
kaggle.com
zip
Updated Jun 5, 2021
Cite
HamnaKhalid (2021). Red Wine Quality Prediction pjt [Dataset]. https://www.kaggle.com/hkhamnakhalid/red-wine-quality-prediction-pjt
Available download formats: zip (26176 bytes)
Dataset updated
Jun 5, 2021
Authors
HamnaKhalid
Description
Dataset

This dataset was created by HamnaKhalid

Contents
China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml:...
ceicdata.com
Updated Dec 15, 2020
Cite
CEICdata.com (2020). China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: Low-quality [Dataset]. https://www.ceicdata.com/en/china/price-monitoring-center-ndrc-36-city-monthly-avg-retail-price-consumer-goods/cn-retail-price-36-city-avg-dry-red-wine-12-degree-750ml-lowquality
Dataset updated
Dec 15, 2020
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
China
Variables measured
Domestic Trade Price
Description
China Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: Low-quality data was reported at 77.540 RMB/Bottle in Mar 2025, a decrease from the previous figure of 79.180 RMB/Bottle for Feb 2025. The series is updated monthly, with a median of 71.760 RMB/Bottle from Jan 2012 to Mar 2025 across 159 observations. It reached an all-time high of 90.100 RMB/Bottle in Apr 2020 and a record low of 59.610 RMB/Bottle in Aug 2014. The series remains in active status in CEIC and is reported by the Price Monitoring Center, NDRC. It is categorized under China Premium Database’s Price – Table CN.PA: Price Monitoring Center, NDRC: 36 City Monthly Avg: Retail Price: Consumer Goods.
WineQualityDataset
cubig.ai
Updated Jun 22, 2025
Cite
CUBIG (2025). WineQualityDataset [Dataset]. https://cubig.ai/store/products/490/winequalitydataset
Dataset updated
Jun 22, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Wine_Quality_Data Dataset is a structured dataset that includes various chemical properties of wine such as acidity, sugar content, pH level, and alcohol concentration, along with a quality score (ranging from 3 to 9) and color information (red or white).

2) Data Utilization
(1) Characteristics of the Wine_Quality_Data Dataset:
• This dataset is designed for developing models that assess and classify wine quality, making it suitable for analyzing chemical composition and solving classification problems related to product quality.
• Each sample contains chemical measurements of the wine (e.g., acidity, sugar, pH, alcohol), and the quality column provides a multi-class label representing the wine's quality score on a scale from 3 to 9.

(2) Applications of the Wine_Quality_Data Dataset:
• Wine quality prediction model training: The dataset can be used to train classification or regression models that predict wine quality scores based on various chemical attributes.

Model	Training Accuracy	Testing Accuracy
Logistic Regression	87.91%	87.0%
Random Forest	100%	94.0%
Decision Tree	100%	88.5%
Support Vector Machine (SVM)	86.41%	86.5%
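
A common way to read a table like the one above is to look at the gap between training and testing accuracy, since a large gap can hint at overfitting. The sketch below only re-uses the four rows from the table; the name `accuracy_gaps` is hypothetical and nothing here reproduces how those models were actually trained.

```python
# (training accuracy %, testing accuracy %) copied from the table above.
RESULTS = {
    "Logistic Regression": (87.91, 87.0),
    "Random Forest": (100.0, 94.0),
    "Decision Tree": (100.0, 88.5),
    "Support Vector Machine (SVM)": (86.41, 86.5),
}


def accuracy_gaps(results):
    """Training minus testing accuracy per model, in percentage points."""
    return {
        model: round(train - test, 2)
        for model, (train, test) in results.items()
    }


if __name__ == "__main__":
    gaps = accuracy_gaps(RESULTS)
    # List models from the largest gap to the smallest.
    for model in sorted(gaps, key=gaps.get, reverse=True):
        print(f"{model}: {gaps[model]:+.2f} points")
```

On these figures the two tree-based models show the widest gaps, while logistic regression and the SVM score almost the same on both splits.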


Cite

fedesoriano (2022). Spanish Wine Quality Dataset [Dataset]. https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset

Spanish Wine Quality Dataset

Model wine quality based on reviews and description


Dataset updated

Apr 26, 2022

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

fedesoriano

Description

Similar Datasets

CERN Proton Collision Dataset: LINK
Airfoil Self-Noise Dataset: LINK
CERN Electron Collision Data: LINK
Wind Speed Prediction Dataset: LINK
Stellar Classification Dataset - SDSS17: LINK

Context

This dataset is related to red variants of Spanish wines. The dataset describes several popularity and description metrics and their effect on its quality. The dataset can be used for classification or regression tasks. The classes are ordered and not balanced (i.e. the quality goes from almost 5 to 4 points). The task is to predict either the quality of wine or the prices using the given data.

Content

The dataset contains 7500 different types of red wines from Spain with 11 features that describe their price, rating, and even some flavor description. The data was collected by me using web scraping from different sources (from specialized wine pages to supermarkets). Please acknowledge the hard work to obtain and create this dataset; you can upvote it if you find it useful for your projects :)

If the dataset becomes popular I will probably try to create a bigger version with wines from other countries and a wider spectrum of ratings.

Attribute Information

winery: Winery name
wine: Name of the wine
year: Year in which the grapes were harvested
rating: Average rating given to the wine by the users [from 1-5]
num_reviews: Number of users that reviewed the wine
country: Country of origin [Spain]
region: Region of the wine
price: Price in euros [€]
type: Wine variety
body: Body score, defined as the richness and weight of the wine in your mouth [from 1-5]
acidity: Acidity score, defined as wine's “pucker” or tartness; it's what makes a wine refreshing and your tongue salivate and want another sip [from 1-5]

Citation Request

If you want to cite this data:

fedesoriano. (April 2022). Spanish Wine Quality Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset
