Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importance of the input variables (as measured by a sensitivity analysis procedure).
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
Number of Instances: red wine - 1599; white wine - 4898
Input variables (based on physicochemical tests):
Output variable (based on sensory data):
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('wine_quality', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Italy and France are historically among the countries that produce the most prestigious wines worldwide. In Europe, these two countries together produce more than half of the wines classified under the Protected Designation of Origin (PDO) label, the strictest quality mark for food and wines in the European Union. Due to their long tradition in wine protection, Italy and France include highly detailed regulatory information in their wine PDO regulatory documents that is usually not available for other countries, such as specific information about the main cultivars that must be used to make each wine product or the required planting density in the vineyards. However, this information is scattered throughout the documents of each wine production area and has never been extracted and homogenised into a single dataset. Here, we present the first dataset that characterizes the PDO wines produced in Italy and France in very high detail, based on the documents from the official EU geographical indication register. It includes, for each country, a standardized list of the PDO wine names, linked with their specific regulatory requirements, including the wine colour, type, cultivars used and maximum allowed yields. The unprecedented level of detail of this dataset allows, for the first time, the analysis of more than 5000 traditional wines and their legal and agronomic specifications. This gives insights into the interplay between the European Union quality regulation policy, the wine sector and agronomic practices, enabling researchers and practitioners to analyze wine production in the context of specific regulations or economic scenarios.
https://choosealicense.com/licenses/ecl-2.0/
Wine Quality 6k4
Contains the original (raw) and cleaned (processed) versions of the Wine Quality datasets (red and white). The raw files are the original semicolon-delimited CSVs and the processed files are cleaned, comma-delimited CSVs suitable for standard data tools and for uploading as a single Hugging Face dataset repository.
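As a rough sketch of the raw-to-processed conversion described above, the snippet below re-delimits semicolon-separated CSV text as comma-separated text using only the Python standard library. The function name `semicolon_to_comma` and the sample rows are illustrative assumptions, not part of the dataset repository, and the repository's actual cleaning steps may do more than this.

```python
import csv
import io


def semicolon_to_comma(raw_text: str) -> str:
    """Re-delimit semicolon-separated CSV text as comma-separated CSV text."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Illustrative sample only: these values are made up, not taken from the dataset.
raw_sample = '"fixed acidity";"pH";"quality"\n7.0;3.2;5\n6.5;3.4;6\n'
processed_sample = semicolon_to_comma(raw_sample)
print(processed_sample)
```

Going through the `csv` module rather than a plain string replace keeps quoted fields intact if a value ever contains a semicolon or comma.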
Columns (both red and white): fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH… See the full description on the dataset page: https://huggingface.co/datasets/mnemoraorg/wine-quality-6k4.
The Wine Quality data combines two benchmark data sets from UCI related to red and white wines.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
codesignal/wine-quality dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part of the dataset supplied at https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009 and https://archive.ics.uci.edu/ml/datasets/wine+quality
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Wine Dataset is derived from a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The dataset includes 13 attributes such as alcohol, malic acid, ash, and color intensity, providing a comprehensive overview for understanding wine characteristics and aiding in classification tasks.
2) Data Utilization
(1) Characteristics of the wine data:
• It includes detailed measurements of wine attributes, allowing for analysis of chemical composition, comparison between different wine types, and identification of patterns in wine quality and flavor profiles.
(2) Uses of the wine data:
• Wine Industry: Assists winemakers and analysts in understanding the chemical properties that influence wine quality, helping to improve production processes and quality control.
• Research: Supports academic studies and the development of classification models for wine quality prediction and analysis.
This dataset was created by Vinod_0990
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danielpanizzo/wine-quality on 30 September 2021.
--- Dataset description provided by original source is as follows ---
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
Title: Wine Quality
Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
Past Usage:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importance of the input variables (as measured by a sensitivity analysis procedure).
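To make the metrics mentioned above more concrete, here is a minimal sketch of the mean absolute deviation (MAD) and of an accuracy computed for a fixed error tolerance T. It assumes that a prediction counts as correct when it lies within T of the expert score, which is one reading of the description and not necessarily the exact procedure in [Cortez et al., 2009]. The function names and the sample scores are made up for illustration.

```python
def mean_absolute_deviation(y_true, y_pred):
    """Average absolute difference between expert scores and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def tolerance_accuracy(y_true, y_pred, tolerance):
    """Share of predictions within `tolerance` of the expert score (assumed rule)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Illustrative values only: these are not results from the paper or the dataset.
y_true = [5, 6, 7, 5]
y_pred = [5.4, 6.8, 6.0, 4.75]

mad = mean_absolute_deviation(y_true, y_pred)
acc_half = tolerance_accuracy(y_true, y_pred, tolerance=0.5)
acc_one = tolerance_accuracy(y_true, y_pred, tolerance=1.0)
print(mad, acc_half, acc_one)
```

A larger T accepts more predictions as correct, so the tolerance accuracy can only stay the same or rise as T grows.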
Relevant Information:
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.
Number of Instances: red wine - 1599; white wine - 4898.
Number of Attributes: 11 + output attribute
Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.
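One simple way to look for correlated attributes before feature selection is the Pearson correlation coefficient between pairs of columns. The sketch below computes it with the standard library. The function name `pearson`, the 0.9 threshold, and the sample values are assumptions chosen for illustration; they are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values in which the second column is an exact linear function of the first.
free_so2 = [10.0, 20.0, 30.0, 40.0]
total_so2 = [30.0, 50.0, 70.0, 90.0]

r = pearson(free_so2, total_so2)
# Hypothetical rule of thumb: flag pairs with |r| above 0.9 as candidates to drop.
looks_redundant = abs(r) > 0.9
print(r, looks_redundant)
```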
Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Missing Attribute Values: None
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at levels that are too high can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
Output variable (based on sensory data): 12 - quality (score between 0 and 10)
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This is an enhanced version of the Red Wine Quality dataset.
Modifications and additions include:
- ✅ A binary column is_high_quality (1 if quality ≥ 6, else 0)
- ✅ A calculated column total_acidity (sum of fixed, volatile, and citric acid)
- ✅ A new column user_comment with static text
- ✅ One synthetic custom data row manually added
Perfect for experimenting with binary classification, feature engineering, and data enrichment.
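The two derived columns listed above can be reproduced with a few lines of Python. This is a minimal sketch that assumes each row is a dict keyed by the usual column names (`fixed acidity`, `volatile acidity`, `citric acid`, `quality`). The helper name `enrich` and the sample row are hypothetical and are not taken from the dataset.

```python
def enrich(row):
    """Return a copy of `row` with is_high_quality and total_acidity added."""
    enriched = dict(row)
    # Binary target: 1 if quality >= 6, else 0.
    enriched["is_high_quality"] = 1 if row["quality"] >= 6 else 0
    # Sum of fixed, volatile, and citric acid.
    enriched["total_acidity"] = (
        row["fixed acidity"] + row["volatile acidity"] + row["citric acid"]
    )
    return enriched


# Made-up sample row for illustration only.
sample = {"fixed acidity": 7.0, "volatile acidity": 0.5, "citric acid": 0.25, "quality": 6}
enriched = enrich(sample)
print(enriched)
```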
Data Set Information:
The dataset was downloaded from the UCI Machine Learning Repository.
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.
The two datasets were combined and a few values were randomly removed.
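Because a few values were removed at random in this version, it can help to check how many cells are missing in each column before modelling. The sketch below counts empty cells per column with the standard library. It assumes missing values appear as empty CSV fields, since the source does not say how they are encoded. The function name `count_missing` and the sample rows are made up for illustration.

```python
import csv
import io


def count_missing(csv_text):
    """Count empty cells per column in comma-separated text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader yields None for cells missing from a short row.
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


# Made-up sample with some empty cells; not taken from the dataset.
sample_csv = "pH,alcohol,quality\n3.2,,5\n,9.8,6\n3.4,10.1,\n3.3,,7\n"
missing = count_missing(sample_csv)
print(missing)
```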
Attribute Information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Acknowledgements:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by rathi001
Released under CC0: Public Domain
Around **** million hectoliters of wine were tested in Rhenisch Hesse, making it the winegrowing region with the highest quality control output in 2023. These tests are official procedures conducted to determine whether the wine can be sold as a "quality" wine.
http://opendatacommons.org/licenses/dbcl/1.0/
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
This dataset is also available from the UCI machine learning repository, https://archive.ics.uci.edu/ml/datasets/wine+quality; I just shared it to Kaggle for convenience. (If I am mistaken and the public license type disallows me from doing so, I will take this down if requested.)
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Aside from regression modelling, an interesting exercise is to set an arbitrary cutoff for the dependent variable (wine quality), e.g. classifying wines scoring 7 or higher as 'good/1' and the remainder as 'not good/0'. This allows you to practice hyperparameter tuning on e.g. decision tree algorithms, looking at the ROC curve and the AUC value. Without doing any kind of feature engineering or overfitting, you should be able to get an AUC of .88 (without even using a random forest algorithm).
KNIME is a great tool (GUI) that can be used for this.
1- File Reader (for csv) to Linear Correlation node and to Interactive Histogram for basic EDA.
2- File Reader to Rule Engine node to turn the 10-point scale into a dichotomous variable (good wine and rest); the code to put in the Rule Engine is something like this:
- $quality$ > 6.5 => "good"
- TRUE => "bad"
3- Rule Engine node output to input of Column Filter node to filter out your original 10-point feature (this prevents leaking).
4- Column Filter node output to input of Partitioning node (your standard train/test split, e.g. 75%/25%; choose 'random' or 'stratified').
5- Partitioning node train data split output to input of Decision Tree Learner node.
6- Partitioning node test data split output to input of Decision Tree Predictor node.
7- Decision Tree Learner node output to input of Decision Tree Predictor node.
8- Decision Tree Predictor node output to input of ROC node (here you can evaluate your model based on the AUC value).
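For readers without KNIME, the labelling and evaluation steps of the workflow above can be sketched in plain Python. The snippet binarizes quality with the same `> 6.5` rule and computes the AUC from its rank interpretation (the probability that a randomly chosen 'good' wine gets a higher score than a randomly chosen other wine). Note the swap: instead of a trained decision tree, it uses a single feature (alcohol) as the score, purely to keep the example short. The function names and sample values are made up for illustration.

```python
def binarize_quality(quality, cutoff=6.5):
    """Mirror the rule engine: 1 ('good') if quality > cutoff, else 0."""
    return 1 if quality > cutoff else 0


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up sample values; not taken from the dataset.
qualities = [5, 7, 6, 8, 4, 7]
alcohol = [9.4, 12.0, 10.0, 11.5, 9.0, 9.2]

labels = [binarize_quality(q) for q in qualities]
auc = roc_auc(labels, alcohol)
print(labels, auc)
```

In a real run the scores would be the predicted probabilities from a model fitted on the training split only, with the original quality column excluded from the inputs.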
Use machine learning to determine which physicochemical properties make a wine 'good'!
This dataset is also available from the UCI machine learning repository, https://archive.ics.uci.edu/ml/datasets/wine+quality; I just shared it to Kaggle for convenience. (If I am mistaken and the public license type disallows me from doing so, I will take this down at first request. I am not the owner of this dataset.)
Please include this citation if you plan to use this database: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
This statistic shows the results of a survey on Brazilian wine quality perceptions among consumers in Brazil as of July 2018. At that point in time, a total of ** percent of Brazilian respondents perceived national wine as having either high or very high quality, while only **** percent considered it low or very low quality.
Around ***** million hectoliters of dry wine were tested in the Palatinate region, making it the wine growing area with the largest quality control output in 2023. These tests are official procedures conducted to determine whether the wine can be sold as a "quality" wine.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wine production is a complex process from the vineyard to the winery. On this journey, microbes play a decisive role. From the environment where the vines grow, encompassing soil, topography, weather and climate, through to management practices in vineyards, the microbes present can potentially change the composition of wine. Introduction of grapes into the winery and the start of winemaking processes modify microbial communities further. Recent advances in next-generation sequencing (NGS) technology have progressed our understanding of microbial communities associated with grapes and fermentations. We now have a finer appreciation of microbial diversity across wine producing regions to begin to understand how diversity can contribute to wine quality and style characteristics. In this review, we highlight literature surrounding wine-related microorganisms and how these factors interact with and shape microbial communities and contribute to wine quality. By discussing the geography, climate and soil of environments and viticulture and winemaking practices, we claim microbial biogeography as a new perspective to impact wine quality and regionality. Depending on geospatial scales, habitats, and taxa, the microbial community responds to local conditions. We discuss the effect of a changing climate on local conditions and how this may alter microbial diversity and thus wine style. With increasing understanding of microbial diversity and its effects on wine fermentation, wine production can be optimised by enhancing the expression of regional characteristics through understanding and managing the microbes present.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
wine csv data with columns
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
European Quality Wine Production by Country, 2023