100+ datasets found
  1. Spanish Wine Quality Dataset

    • kaggle.com
    Updated Apr 26, 2022
    Cite
    fedesoriano (2022). Spanish Wine Quality Dataset [Dataset]. https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset
    Dataset updated
    Apr 26, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    fedesoriano
    Description

    Similar Datasets

    • CERN Proton Collision Dataset
    • Airfoil Self-Noise Dataset
    • CERN Electron Collision Data
    • Wind Speed Prediction Dataset
    • Stellar Classification Dataset - SDSS17

    Context

    This dataset is related to red variants of Spanish wines. It describes several popularity and descriptive metrics and their effect on the wine's quality. The data can be used for classification or regression tasks. The classes are ordered and not balanced (i.e. the ratings span only a narrow range, from about 4 to almost 5 points). The task is to predict either the quality of the wine or its price from the given data.

    Content

    The dataset contains 7500 different types of red wines from Spain, with 11 features that describe their price, rating, and even some flavor description. The data was collected by me using web scraping from different sources (from wine-specialized pages to supermarkets). Please acknowledge the hard work that went into obtaining and creating this dataset; you can upvote it if you find it useful for your projects :)

    If the dataset becomes popular I will probably try to create a bigger version with wines from other countries and a wider spectrum of ratings.

    Attribute Information

    1. winery: Winery name
    2. wine: Name of the wine
    3. year: Year in which the grapes were harvested
    4. rating: Average rating given to the wine by the users [from 1-5]
    5. num_reviews: Number of users that reviewed the wine
    6. country: Country of origin [Spain]
    7. region: Region of the wine
    8. price: Price in euros [€]
    9. type: Wine variety
    10. body: Body score, defined as the richness and weight of the wine in your mouth [from 1-5]
    11. acidity: Acidity score, defined as wine's “pucker” or tartness; it's what makes a wine refreshing and your tongue salivate and want another sip [from 1-5]
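    A minimal sketch, assuming the data arrives as a CSV using the eleven column names listed above, of parsing one row into typed fields and checking the documented 1-5 scales. Treating empty or non-numeric cells (for example a non-vintage year) as missing is an assumption about the raw file, and the sample row is made up for illustration.

```python
import csv
import io
from typing import Callable, Optional

# Numeric columns from the attribute list; all other columns stay strings.
NUMERIC_FIELDS = {
    "year": int,
    "rating": float,
    "num_reviews": int,
    "price": float,
    "body": float,
    "acidity": float,
}
# Columns documented as scores on a 1-5 scale.
SCORED_FIELDS = ("rating", "body", "acidity")


def to_number(raw: Optional[str], kind: Callable) -> Optional[float]:
    """Convert a raw cell; empty, absent or non-numeric cells become None (an assumption)."""
    try:
        return kind(raw)
    except (TypeError, ValueError):
        return None


def parse_row(row: dict) -> dict:
    """Return a copy of `row` with numeric fields converted and 1-5 scores range-checked."""
    parsed = dict(row)
    for name, kind in NUMERIC_FIELDS.items():
        parsed[name] = to_number(row.get(name), kind)
    for name in SCORED_FIELDS:
        value = parsed[name]
        if value is not None and not 1 <= value <= 5:
            raise ValueError(f"{name}={value} is outside the documented 1-5 range")
    return parsed


# A made-up row that uses the documented column names.
sample = io.StringIO(
    "winery,wine,year,rating,num_reviews,country,region,price,type,body,acidity\n"
    "Example Winery,Example Tinto,2016,4.3,120,Spain,Rioja,25.5,Rioja Red,4,3\n"
)
rows = [parse_row(record) for record in csv.DictReader(sample)]
```

    Keeping missing values as None rather than guessing a default leaves the imputation decision to the modelling step.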

    Citation Request

    If you want to cite this data:

    fedesoriano. (April 2022). Spanish Wine Quality Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset

  2. wine_quality

    • tensorflow.org
    • beta.dataverse.org
    Updated Nov 23, 2022
    Cite
    (2022). wine_quality [Dataset]. https://www.tensorflow.org/datasets/catalog/wine_quality
    Dataset updated
    Nov 23, 2022
    Description

    Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).
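    To make the two metrics mentioned above concrete, here is a minimal sketch (not the authors' code) of the mean absolute deviation (MAD) and of the share of predictions whose absolute error is within a tolerance T, which is what a tolerance-based confusion matrix counts as a hit. The function names and the toy scores are illustrative assumptions.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mad(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute deviation between observed and predicted quality scores."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_within_tolerance(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float
) -> float:
    """Share of predictions with |observed - predicted| <= tolerance (T)."""
    _check(y_true, y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


observed = [5, 6, 7, 5]           # toy expert scores
predicted = [5.4, 5.8, 6.1, 5.0]  # toy regression outputs
```

    With T = 0.5, three of the four toy predictions count as hits (0.75); widening T to 1.0 counts all four, which shows how strongly the reported accuracy depends on the chosen tolerance.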

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    Number of Instances: red wine - 1599; white wine - 4898

    Input variables (based on physicochemical tests):

    1. fixed acidity
    2. volatile acidity
    3. citric acid
    4. residual sugar
    5. chlorides
    6. free sulfur dioxide
    7. total sulfur dioxide
    8. density
    9. pH
    10. sulphates
    11. alcohol

    Output variable (based on sensory data):

    1. quality (score between 0 and 10)

    To use this dataset:

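    import tensorflow_datasets as tfds

    # 'wine_quality' loads the builder's default config; a specific one can be
    # requested by name, e.g. 'wine_quality/red'.
    ds = tfds.load('wine_quality', split='train')

    # Print the first four examples (feature dict plus quality label).
    for ex in ds.take(4):
        print(ex)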
    

    See the guide for more information on tensorflow_datasets.

  3. ‘Red and White Wine Quality Analysis’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Red and White Wine Quality Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-red-and-white-wine-quality-analysis-0938/d129fe93/?iid=005-800&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Red and White Wine Quality Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/saigeethac/red-and-white-wine-quality-datasets on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Wine Quality Data Set

    This data set is available in UCI at https://archive.ics.uci.edu/ml/datasets/Wine+Quality.

    Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

    Data Set Information:

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.
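    A quick way to see the imbalance described above is to tabulate each quality score's share of the samples. The sketch below uses only the standard library; the 5% cut-off for calling a class rare and the toy labels are assumptions for illustration, not properties of the real data.

```python
from collections import Counter
from typing import Dict, Iterable, List


def class_shares(labels: Iterable[int]) -> Dict[int, float]:
    """Return each quality score's share of all samples, ordered by score."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return {score: counts[score] / total for score in sorted(counts)}


def rare_classes(labels: Iterable[int], min_share: float = 0.05) -> List[int]:
    """Scores whose share falls below `min_share` (a hypothetical threshold)."""
    return [score for score, share in class_shares(labels).items() if share < min_share]


# Toy labels: mostly 'normal' wines (5 and 6), few poor or excellent ones.
labels = [4] * 3 + [5] * 40 + [6] * 45 + [7] * 10 + [8] * 2
```

    Here `rare_classes(labels)` returns the sparse tails, scores 4 and 8, which are exactly the few poor or excellent wines that the outlier-detection suggestion targets.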

    Attribute Information:

    Input variables (based on physicochemical tests):

    1. fixed acidity
    2. volatile acidity
    3. citric acid
    4. residual sugar
    5. chlorides
    6. free sulfur dioxide
    7. total sulfur dioxide
    8. density
    9. pH
    10. sulphates
    11. alcohol

    Output variable (based on sensory data):

    1. quality (score between 0 and 10)

    These columns have been described in the Kaggle Data Explorer.

    Context

    The authors state "we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods." We have briefly explored this aspect and found that red wine quality prediction accuracy on the test and training datasets is almost the same (~88%) with just three features. Likewise, white wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors.

    Content

    Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. Both these datasets are analyzed and linear regression models are developed in Python 3. The github link provided for the source code also includes a Flask web application for deployment on the local machine or on Heroku.

    Acknowledgements

    Datasets: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Banner Image: Photo by Roberta Sorge on Unsplash

    Github Link

    Complete code has been uploaded onto github at https://github.com/saigeethachandrashekar/wine_quality.

    Please clone the repo; it contains both datasets and the code required for building the models and saving them to your local system. Code for a Flask app is provided for deploying the models on your local machine. The app can also be deployed on Heroku; the requirements.txt and Procfile for this are also provided.

    Next Steps

    1. White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors (e.g. there is no data about grape types, wine brand, wine selling price, etc.) or it may be due to other factors that are not clear. This is an area that might be worth exploring further.

    2. Other ML techniques may be applied to improve the accuracy.

    --- Original source retains full ownership of the source dataset ---

  4. Data from: Red wine DataSet

    • kaggle.com
    Updated Aug 21, 2023
    Cite
    Suraj_kumar_Gupta (2023). Red wine DataSet [Dataset]. https://www.kaggle.com/datasets/soorajgupta7/red-wine-dataset
    Dataset updated
    Aug 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Suraj_kumar_Gupta
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Datasets Description:

    The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.

    Classification and Regression Tasks: These datasets are suitable for both classification and regression analyses. The classes are ordered but imbalanced; for instance, the dataset contains many more normal wines than excellent or poor ones.

    Dataset Contents: For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, include:

    1. Fixed acidity
    2. Volatile acidity
    3. Citric acid
    4. Residual sugar
    5. Chlorides
    6. Free sulfur dioxide
    7. Total sulfur dioxide
    8. Density
    9. pH
    10. Sulphates
    11. Alcohol

    The output variable, based on sensory data, is:

    12. Quality (score ranging from 0 to 10)

    Usage Tips: A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
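    The thresholding rule and the AUC metric mentioned above can be sketched in plain Python. `roc_auc` uses the rank (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen good wine receives a higher score than a randomly chosen other wine, with ties counted as half. The scores would come from whichever classifier is trained; the numbers below are made up.

```python
from typing import List, Sequence

GOOD_THRESHOLD = 7  # quality >= 7 -> 'good' (1), otherwise 'not good' (0)


def binarize_quality(scores: Sequence[int], threshold: int = GOOD_THRESHOLD) -> List[int]:
    """Turn 0-10 quality scores into a dichotomous good/not-good label."""
    return [1 if score >= threshold else 0 for score in scores]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


quality = [5, 7, 6, 8, 4, 7]
labels = binarize_quality(quality)          # [0, 1, 0, 1, 0, 1]
predicted = [0.2, 0.8, 0.4, 0.6, 0.1, 0.4]  # made-up classifier scores
```

    The pairwise loop is quadratic and chosen for clarity; on the full red (1,599) or white (4,898) tables, a sorting-based computation or a library routine would be faster.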

    Operational Workflow: To use the dataset efficiently, the following steps are recommended:

    1. Connect a File Reader (for CSV) to a Linear Correlation node and an Interactive Histogram for basic Exploratory Data Analysis (EDA).
    2. Connect a File Reader to a Rule Engine node that transforms the 10-point scale into a dichotomous variable indicating 'good wine' and 'rest'.
    3. Connect the Rule Engine node output to the input of a Column Filter node that filters out the original 10-point feature, thus preventing data leakage.
    4. Connect the Column Filter node output to the input of a Partitioning node to execute a standard train/test split (e.g., 75%/25%, choosing 'random' or 'stratified').
    5. Feed the Partitioning node's training-split output into the input of a Decision Tree Learner node.
    6. Connect the Partitioning node's test-split output to the input of a Decision Tree Predictor node.
    7. Link the Decision Tree Learner node's model output to the model input of the Decision Tree Predictor node.
    8. Finally, connect the Decision Tree Predictor output to the input of a ROC node for model evaluation based on the AUC value.
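    Steps 2-4 of this workflow (derive the binary label, drop the original 10-point score, then split) translate directly into code. The sketch below is a plain-Python stand-in for the Rule Engine, Column Filter and Partitioning nodes, assuming rows are dicts keyed by column name; it is not KNIME's implementation, and the toy rows are synthetic.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, float]


def add_label_and_drop_score(rows: List[Row], threshold: int = 7) -> List[Row]:
    """Add a binary 'good' column and remove 'quality' so it cannot leak into features."""
    labelled = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "quality"}
        new_row["good"] = 1 if row["quality"] >= threshold else 0
        labelled.append(new_row)
    return labelled


def stratified_split(
    rows: List[Row], label_key: str, test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Shuffle within each label group so train and test keep similar class shares."""
    rng = random.Random(seed)
    groups: Dict[float, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    train: List[Row] = []
    test: List[Row] = []
    for group in groups.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic rows: quality cycles 5, 6, 7, 8, so half of them are 'good'.
toy_rows = [{"alcohol": 9.0 + 0.1 * i, "quality": 5 + i % 4} for i in range(40)]
train, test = stratified_split(add_label_and_drop_score(toy_rows), "good")
```

    Dropping `quality` before the split mirrors the Column Filter step: any feature derived from the target must be removed before a model sees it, otherwise the evaluation is optimistic.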

    Tools and Acknowledgments: For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.

  5. Wine Quality Test

    • figshare.com
    Updated Jul 4, 2022
    Cite
    Deepchecks Data (2022). Wine Quality Test [Dataset]. http://doi.org/10.6084/m9.figshare.20223318.v1
    Available download formats: txt
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    figshare
    Authors
    Deepchecks Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  6. combined wine data

    • kaggle.com
    Updated Nov 25, 2017
    Cite
    Siyuan H (2017). combined wine data [Dataset]. https://www.kaggle.com/datasets/siyuanh/combined-wine-data/code
    Dataset updated
    Nov 25, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Siyuan H
    License

    Open Database License (ODbL) v1.0, https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

    Available at:
    • Elsevier: http://dx.doi.org/10.1016/j.dss.2009.05.016
    • Pre-press (pdf): http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
    • bib: http://www3.dsi.uminho.pt/pcortez/dss09.bib

    Title: Wine Quality

    Sources: Created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV), 2009

    Past Usage:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

    In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

    Relevant Information:

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    Number of Instances: red wine - 1599; white wine - 4898

    Number of Attributes: 11 + output attribute

    Note: several of the attributes may be correlated, so it makes sense to apply some sort of feature selection.

    Attribute information:

    For more information, read [Cortez et al., 2009].

    Input variables (based on physicochemical tests):

    1 - fixed acidity (tartaric acid - g / dm^3)
    2 - volatile acidity (acetic acid - g / dm^3)
    3 - citric acid (g / dm^3)
    4 - residual sugar (g / dm^3)
    5 - chlorides (sodium chloride - g / dm^3)
    6 - free sulfur dioxide (mg / dm^3)
    7 - total sulfur dioxide (mg / dm^3)
    8 - density (g / cm^3)
    9 - pH
    10 - sulphates (potassium sulphate - g / dm^3)
    11 - alcohol (% by volume)

    Output variable (based on sensory data):

    12 - quality (score between 0 and 10)

    Missing Attribute Values: None

    Description of attributes:

    1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)

    2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste

    3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines

    4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet

    5 - chlorides: the amount of salt in the wine

    6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine

    7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine

    8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content

    9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

    10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant

    11 - alcohol: the percent alcohol content of the wine
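    The thresholds quoted in these attribute descriptions can be turned into simple row annotations. A minimal sketch, assuming values in the dataset's units: g/dm^3 equals g/L, and mg/dm^3 equals mg/L, which is treated here as approximately ppm because wine's density is close to that of water. The function name and the note labels are hypothetical.

```python
from typing import List

SWEET_SUGAR_G_PER_L = 45.0      # above this, wines are considered sweet
RARE_LOW_SUGAR_G_PER_L = 1.0    # wines below this are rare
NOTICEABLE_FREE_SO2_PPM = 50.0  # above this, free SO2 becomes evident in nose and taste


def describe_sample(residual_sugar: float, free_sulfur_dioxide: float) -> List[str]:
    """Annotate one sample using the thresholds quoted in the attribute descriptions."""
    notes = []
    if residual_sugar > SWEET_SUGAR_G_PER_L:
        notes.append("sweet")
    elif residual_sugar < RARE_LOW_SUGAR_G_PER_L:
        notes.append("unusually low sugar")
    # mg/dm^3 == mg/L, taken as ~ppm since wine's density is close to water's.
    if free_sulfur_dioxide > NOTICEABLE_FREE_SO2_PPM:
        notes.append("free SO2 likely noticeable")
    return notes
```

    Rule-based annotations like these are descriptive only; whether they help predict the expert quality score is an empirical question for the modelling step.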

    Output variable (based on sensory data): 12 - quality (score between 0 and 10)

  7. ‘Wine Quality Classification’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Wine Quality Classification’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-quality-classification-e4ab/cddb1083/
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Wine Quality Classification’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/nareshbhat/wine-quality-binary-classification on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset contains information about red wine and the various factors affecting its quality. It was preprocessed and downloaded from the UCI Machine Learning Repository, and is a simple, cleaned practice dataset for classification modelling. Source of this dataset: https://archive.ics.uci.edu/ml/datasets/wine+quality

    Attribute Information:

    Input variables (based on physicochemical tests):
    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol

    Output variable (based on sensory data):
    12 - quality ('good' and 'bad' based on score >5 and <5)

    --- Original source retains full ownership of the source dataset ---

  8. Wine_Test Prediction | 1600 data | yashaswi

    • kaggle.com
    Updated May 19, 2025
    Cite
    Ayushman Yashaswi (2025). Wine_Test Prediction | 1600 data | yashaswi [Dataset]. https://www.kaggle.com/datasets/ayushmanyashaswi/wine-test-prediction-1600-data-yashaswi
    Dataset updated
    May 19, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ayushman Yashaswi
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description


    📊 Wine Quality - Red Wine Dataset

    This dataset contains physicochemical attributes of red variants of Portuguese "Vinho Verde" wine, along with their quality score (rated from 0 to 10). The goal is to predict wine quality using various classification models based on the chemical properties of the wine.

    🧪 Features Overview (12 columns):

    • fixed acidity: most acids involved with wine are fixed/nonvolatile
    • volatile acidity: amount of acetic acid (can affect taste)
    • citric acid: adds freshness and flavor
    • residual sugar: sugar left after fermentation
    • chlorides: salt content
    • free sulfur dioxide: protects wine from microbes
    • total sulfur dioxide: total SO₂ content
    • density: wine density
    • pH: acidity level
    • sulphates: preservative and antimicrobial
    • alcohol: alcohol percentage
    • quality (target): wine quality score (0–10)

    🤖 Model Performance Summary:

    Multiple machine learning models were trained to predict wine quality. The following accuracy scores were observed:

    Model                          Training Accuracy   Testing Accuracy
    Logistic Regression            87.91%              87.0%
    Random Forest                  100%                94.0%
    Decision Tree                  100%                88.5%
    Support Vector Machine (SVM)   86.41%              86.5%

    📈 Data Visualization:

    A comparison plot of model performance was created to visually represent the accuracy of each algorithm. This helps in understanding which models generalized well and which ones may have overfit to the training data.
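    The same comparison can be read off numerically as the gap between training and testing accuracy. The sketch below copies the percentages from the table above; the 5-point gap used to flag a model is an arbitrary, hypothetical cut-off, and a large gap is a warning sign rather than proof of overfitting.

```python
from typing import Dict, List, Tuple

# (training accuracy %, testing accuracy %) as reported in the table above.
REPORTED: Dict[str, Tuple[float, float]] = {
    "Logistic Regression": (87.91, 87.0),
    "Random Forest": (100.0, 94.0),
    "Decision Tree": (100.0, 88.5),
    "Support Vector Machine (SVM)": (86.41, 86.5),
}


def generalization_gaps(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Training minus testing accuracy, in percentage points."""
    return {model: round(train - test, 2) for model, (train, test) in results.items()}


def likely_overfit(results: Dict[str, Tuple[float, float]], max_gap: float = 5.0) -> List[str]:
    """Models whose train/test gap exceeds `max_gap` points (hypothetical threshold)."""
    return [model for model, gap in generalization_gaps(results).items() if gap > max_gap]
```

    With these numbers, the two tree-based models (gaps of 6.0 and 11.5 points) are flagged, while logistic regression and the SVM stay within about one point.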

    📁 File Info:

    • Filename: winequality-red.csv
    • Size: ~100 KB
    • Rows: 1,599
    • Columns: 12
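    A minimal loader for winequality-red.csv using only the standard library. The delimiter check is a convenience: the UCI distribution of this file is semicolon-separated, while some re-uploads use commas. The in-memory sample is made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_wine_csv(stream: TextIO) -> List[Dict[str, float]]:
    """Parse a wine-quality CSV, accepting either ';' or ',' as the delimiter."""
    content = stream.read()
    lines = content.splitlines()
    if not lines:
        return []
    header = lines[0]
    delimiter = ";" if header.count(";") > header.count(",") else ","
    rows = []
    for record in csv.DictReader(io.StringIO(content), delimiter=delimiter):
        row = {name: float(value) for name, value in record.items()}
        row["quality"] = int(row["quality"])  # the target is an integer score
        rows.append(row)
    return rows


sample = io.StringIO('"fixed acidity";"volatile acidity";"quality"\n7.0;0.5;6\n8.1;0.3;5\n')
rows = read_wine_csv(sample)
```

    With the real file, `open("winequality-red.csv", newline="")` can be passed in, and the result checked against the File Info figures above (1,599 rows, 12 columns).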

    📌 Ideal For:

    • Classification model evaluation
    • Feature correlation analysis
    • EDA and visualization
    • ML model tuning and comparison
  9. ‘White Wine Quality’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘White Wine Quality’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-white-wine-quality-9dda/1b0598e7/?iid=002-239&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘White Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/piyushagni5/white-wine-quality on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, refer to [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    Content

    For more information, read [Cortez et al., 2009].

    Input variables (based on physicochemical tests):
    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol

    Output variable (based on sensory data):
    12 - quality (score between 0 and 10)

    Acknowledgements

    This dataset is also available from the UCI machine learning repository, https://archive.ics.uci.edu/ml/datasets/wine+quality. To get both datasets, i.e. the red and white vinho verde wine samples from the north of Portugal, please visit the link above.

    Please include this citation if you plan to use this database:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Inspiration

    We Kagglers can apply several machine-learning algorithms to determine which physicochemical properties make a wine 'good'!

    Relevant papers

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    --- Original source retains full ownership of the source dataset ---

  10. Wine Quality - Dataset - U-M Biostat Datastore

    • ckan-demo.bio.sph.umich.edu
    Updated Jan 6, 2025
    Cite
    ckan-demo.bio.sph.umich.edu (2025). Wine Quality - Dataset - U-M Biostat Datastore [Dataset]. https://ckan-demo.bio.sph.umich.edu/gl_ES/dataset/wine-quality
    Dataset updated
    Jan 6, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009], http://www3.dsi.uminho.pt/pcortez/wine/).

  11. ‘Wine Quality’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Wine Quality’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-quality-0ce9/d646f556/?iid=004-175&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rajyellow46/wine-quality on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Data Set Information:

    The dataset was downloaded from the UCI Machine Learning Repository.

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, see the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    The two datasets were combined and a few values were randomly removed.
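    Because some values were removed at random, the combined table needs cleaning before modelling. Median imputation is one common choice, sketched below under the assumption that missing cells have already been parsed as None; it is not something the dataset description prescribes. The fit and apply steps are kept separate so the medians can be computed on the training split only, keeping test rows out of the fill values.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


def column_medians(rows: Sequence[Row], columns: Sequence[str]) -> Dict[str, float]:
    """Median of the observed (non-None) values in each column."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        if not observed:
            raise ValueError(f"column {col!r} has no observed values")
        medians[col] = median(observed)
    return medians


def impute(rows: Sequence[Row], medians: Dict[str, float]) -> List[Row]:
    """Return new rows with None cells replaced by the supplied medians."""
    return [
        {k: (medians[k] if v is None and k in medians else v) for k, v in row.items()}
        for row in rows
    ]


toy_rows = [
    {"pH": 3.0, "alcohol": None},
    {"pH": None, "alcohol": 10.0},
    {"pH": 3.5, "alcohol": 12.0},
]
filled = impute(toy_rows, column_medians(toy_rows, ["pH", "alcohol"]))
```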

    Attribute Information:

    For more information, read [Cortez et al., 2009].

    Input variables (based on physicochemical tests):
    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol

    Output variable (based on sensory data):
    12 - quality (score between 0 and 10)

    Acknowledgements:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    --- Original source retains full ownership of the source dataset ---

  12. Wine Quality Model

    • figshare.com
    Updated Jul 4, 2022
    Cite
    Deepchecks Data (2022). Wine Quality Model [Dataset]. http://doi.org/10.6084/m9.figshare.20223369.v1
    Available download formats: bin
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Deepchecks Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  13. Wine Quality - red or white?

    • kaggle.com
    Updated Feb 3, 2018
    Cite
    ZiheTonyXu (2018). Wine Quality - red or white? [Dataset]. https://www.kaggle.com/xuzihe2010/wine-quality-red/discussion
    Dataset updated
    Feb 3, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ZiheTonyXu
    Description

    Feature introduction:

    Fixed acidity: acids are major wine properties and contribute greatly to the wine’s taste. Usually, the total acidity is divided into two groups: the volatile acids and the nonvolatile or fixed acids. Among the fixed acids that you can find in wines are the following: tartaric, malic, citric, and succinic. This variable is expressed in g(tartaric acid)/dm^3 in the data sets.

    Volatile acidity: the volatile acidity is basically the process of wine turning into vinegar. In the U.S., the legal limits of volatile acidity are 1.2 g/L for red table wine and 1.1 g/L for white table wine. In these data sets, the volatile acidity is expressed in g(acetic acid)/dm^3.

    Citric acid is one of the fixed acids that you’ll find in wines. It’s expressed in g/dm^3 in the two data sets.

    Residual sugar typically refers to the sugar remaining after fermentation stops, or is stopped. It’s expressed in g/dm^3 in the red and white data.

    Chlorides can be a major contributor to saltiness in wine. Here, you’ll see that it’s expressed in g(sodium chloride)/dm^3.

    Free sulfur dioxide: the part of the sulphur dioxide added to a wine that is lost into it is said to be bound, while the active part is said to be free. Winemakers will always try to get the highest proportion of free sulphur to bind. This variable is expressed in mg/dm^3 in the data.

    Total sulfur dioxide is the sum of the bound and the free sulfur dioxide (SO2). Here, it’s expressed in mg/dm^3. There are legal limits for sulfur levels in wines: in the EU, red wines can contain at most 160 mg/L, while white and rosé wines can contain about 210 mg/L. Sweet wines are allowed up to 400 mg/L. In the US the legal limit is 350 mg/L, and in Australia it is 250 mg/L.
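    Since total SO2 is the sum of free and bound SO2, the bound part can be recovered by subtraction, and mg/dm^3 equals mg/L, so the totals compare directly with the limits listed above. A sketch under those assumptions; the region/style keys and helper names are hypothetical, and the EU white/rosé figure is only "about 210 mg/L" in the text.

```python
# Total SO2 limits as quoted in the description above (mg/L == mg/dm^3).
TOTAL_SO2_LIMITS_MG_PER_L = {
    ("EU", "red"): 160.0,
    ("EU", "white"): 210.0,  # quoted as "about 210 mg/L"
    ("EU", "rose"): 210.0,   # quoted as "about 210 mg/L"
    ("EU", "sweet"): 400.0,
    ("US", "any"): 350.0,
    ("AU", "any"): 250.0,
}


def bound_so2(total_mg_per_dm3: float, free_mg_per_dm3: float) -> float:
    """Bound SO2 = total SO2 - free SO2, in mg/dm^3."""
    if free_mg_per_dm3 > total_mg_per_dm3:
        raise ValueError("free SO2 cannot exceed total SO2")
    return total_mg_per_dm3 - free_mg_per_dm3


def within_limit(total_mg_per_dm3: float, region: str, style: str = "any") -> bool:
    """Check a total SO2 value against the quoted limit for a region/style pair."""
    return total_mg_per_dm3 <= TOTAL_SO2_LIMITS_MG_PER_L[(region, style)]


print(bound_so2(46.0, 11.0))             # 35.0
print(within_limit(170.0, "EU", "red"))  # False
print(within_limit(170.0, "US"))         # True
```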

    Density is generally used as a measure of the conversion of sugar to alcohol. Here, it’s expressed in g/cm^3.

    pH, or the potential of hydrogen, is a numeric scale that specifies the acidity or basicity of the wine. As you might know, solutions with a pH less than 7 are acidic, while solutions with a pH greater than 7 are basic. With a pH of 7, pure water is neutral. Most wines have a pH between 2.9 and 3.9 and are therefore acidic.
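    The pH scale is logarithmic: by the standard definition pH = -log10([H+]), so a wine at pH 3.0 has ten times the hydrogen-ion concentration of one at pH 4.0. The sketch below shows that relationship together with the acidic/neutral/basic rule described above; the helper names are hypothetical.

```python
def hydrogen_ion_concentration(ph: float) -> float:
    """[H+] in mol/L, from the standard definition pH = -log10([H+])."""
    return 10.0 ** (-ph)


def classify_ph(ph: float) -> str:
    """Acidic below 7, basic above 7, neutral at exactly 7."""
    if ph < 7.0:
        return "acidic"
    if ph > 7.0:
        return "basic"
    return "neutral"


ratio = hydrogen_ion_concentration(3.0) / hydrogen_ion_concentration(4.0)
print(round(ratio, 6))   # 10.0
print(classify_ph(3.4))  # acidic
```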

    Sulphates are to wine as gluten is to food. You might already know sulphites from the headaches that they can cause. They are a regular part of the winemaking around the world and are considered necessary. In this case, they are expressed in g(potassium sulphate)/dm^3.

    Alcohol: wine is an alcoholic beverage and, as you know, the percentage of alcohol can vary from wine to wine. It shouldn’t be surprising that this variable is included in the data sets, where it’s expressed in % vol.

    Quality: wine experts graded the wine quality between 0 (very bad) and 10 (very excellent). The eventual number is the median of at least three evaluations made by those same wine experts.
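    The quality target is described as the median of at least three expert grades on a 0 to 10 scale. A minimal sketch of that aggregation using Python's `statistics.median`; the function name is hypothetical, and with an even number of grades the median can fall between two integers, a case the description does not say how the authors handled.

```python
from statistics import median


def quality_label(expert_scores: list[int]) -> float:
    """Median of expert grades (each 0-10); at least three grades are required."""
    if len(expert_scores) < 3:
        raise ValueError("need at least three evaluations")
    if any(not 0 <= s <= 10 for s in expert_scores):
        raise ValueError("grades must be between 0 and 10")
    return median(expert_scores)


print(quality_label([5, 6, 6]))     # 6
print(quality_label([4, 5, 6, 7]))  # 5.5
```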

  14. ‘Wine Quality’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Wine Quality’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-quality-cdf0/latest
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Wine Quality’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danielpanizzo/wine-quality on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

    Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

    1. Title: Wine Quality

    2. Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

    3. Past Usage:

      P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

      In the above reference, two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

    4. Relevant Information:

      The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

      These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.

    5. Number of Instances: red wine - 1599; white wine - 4898.

    6. Number of Attributes: 11 + output attribute

      Note: several of the attributes may be correlated, thus it makes sense to apply some sort of feature selection.

    7. Attribute information:

      For more information, read [Cortez et al., 2009].

      Input variables (based on physicochemical tests):
      1 - fixed acidity (tartaric acid - g / dm^3)
      2 - volatile acidity (acetic acid - g / dm^3)
      3 - citric acid (g / dm^3)
      4 - residual sugar (g / dm^3)
      5 - chlorides (sodium chloride - g / dm^3)
      6 - free sulfur dioxide (mg / dm^3)
      7 - total sulfur dioxide (mg / dm^3)
      8 - density (g / cm^3)
      9 - pH
      10 - sulphates (potassium sulphate - g / dm^3)
      11 - alcohol (% by volume)

      Output variable (based on sensory data):
      12 - quality (score between 0 and 10)

    8. Missing Attribute Values: None

    9. Description of attributes:

      1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)

      2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste

      3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines

      4 - residual sugar: the amount of sugar remaining after fermentation stops; it's rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet

      5 - chlorides: the amount of salt in the wine

      6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine

      7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine

      8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content

      9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale

      10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant

      11 - alcohol: the percent alcohol content of the wine

      Output variable (based on sensory data): 12 - quality (score between 0 and 10)
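    Several of the attribute descriptions above carry rule-of-thumb thresholds: residual sugar above 45 g/L reads as sweet and below 1 g/L is rare, free SO2 above 50 ppm becomes perceptible, and most wines sit between pH 3 and 4. A sketch that turns those thresholds into notes for one sample; the helper name is hypothetical, and it assumes 1 mg/L is approximately 1 ppm, which holds only roughly because wine's density is close to that of water.

```python
def describe_sample(residual_sugar_g_per_l: float,
                    free_so2_mg_per_l: float,
                    ph: float) -> list[str]:
    """Return notes derived from the rule-of-thumb thresholds in the attribute list.

    Assumes 1 mg/L ~ 1 ppm (approximately true since wine density ~ water density).
    """
    notes = []
    if residual_sugar_g_per_l > 45:
        notes.append("sweet")
    elif residual_sugar_g_per_l < 1:
        notes.append("unusually low residual sugar")
    if free_so2_mg_per_l > 50:
        notes.append("SO2 likely perceptible")
    if not 3.0 <= ph <= 4.0:
        notes.append("pH outside the typical 3-4 range")
    return notes


print(describe_sample(2.0, 15.0, 3.3))   # []
print(describe_sample(50.0, 56.0, 2.9))  # ['sweet', 'SO2 likely perceptible', 'pH outside the typical 3-4 range']
```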

    --- Original source retains full ownership of the source dataset ---
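    The Past Usage notes mention MAD and a confusion matrix for a fixed error tolerance T. A minimal sketch of two closely related quantities, assuming MAD means the mean absolute deviation between observed and predicted quality, and treating a prediction as correct when its absolute error is at most T; consult [Cortez et al., 2009] for the exact definitions used there. Function names and sample values are hypothetical.

```python
def mean_absolute_deviation(y_true: list[float], y_pred: list[float]) -> float:
    """Mean of |observed - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_within_tolerance(y_true: list[float], y_pred: list[float], tolerance: float) -> float:
    """Fraction of predictions whose absolute error is at most `tolerance` (T)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


y_true = [5, 6, 7, 5]
y_pred = [5.4, 5.2, 6.6, 6.1]
print(round(mean_absolute_deviation(y_true, y_pred), 3))  # 0.675
print(accuracy_within_tolerance(y_true, y_pred, 0.5))     # 0.5
```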

  15. Data from: Red Wine Quality

    • kaggle.com
    Updated Apr 1, 2021
    Cite
    Naveed Noor (2021). Red Wine Quality [Dataset]. https://www.kaggle.com/datasets/naveedpy1/red-wine-quality/discussion
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Naveed Noor
    Description

    Dataset

    This dataset was created by Naveed Noor

    Contents

  16. Data from: Red Wine Quality

    • kaggle.com
    Updated Jan 2, 2024
    Cite
    Benjamin Varghese (2024). Red Wine Quality [Dataset]. https://www.kaggle.com/datasets/benvarghese/red-wine-quality/code
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 2, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Benjamin Varghese
    Description

    Dataset

    This dataset was created by Benjamin Varghese

    Contents

  17. China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml:...

    • ceicdata.com
    Updated Dec 15, 2020
    Cite
    CEICdata.com (2019). China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: High-quality [Dataset]. https://www.ceicdata.com/en/china/price-monitoring-center-ndrc-36-city-monthly-avg-retail-price-consumer-goods/cn-retail-price-36-city-avg-dry-red-wine-12-degree-750ml-highquality
    Explore at:
    Dataset updated
    Dec 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2024 - Jan 1, 2025
    Area covered
    China
    Variables measured
    Domestic Trade Price
    Description

    China Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: High-quality data was reported at 287.690 RMB/Bottle in Mar 2025. This records an increase from the previous number of 286.150 RMB/Bottle for Feb 2025. The series is updated monthly and has a median of 360.000 RMB/Bottle over Jan 2012 to Mar 2025, with 159 observations. The data reached an all-time high of 630.000 RMB/Bottle in Apr 2012 and a record low of 280.970 RMB/Bottle in Nov 2023. The series remains in active status in CEIC and is reported by the Price Monitoring Center, NDRC. The data is categorized under China Premium Database’s Price – Table CN.PA: Price Monitoring Center, NDRC: 36 City Monthly Avg: Retail Price: Consumer Goods.
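    The month-over-month movement and the distance from the all-time high follow from the figures quoted above by simple relative-change arithmetic. A short sketch; the helper name is hypothetical and the inputs are exactly the values in this description.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# High-quality series figures quoted above (RMB/bottle).
mar_2025, feb_2025, all_time_high = 287.690, 286.150, 630.000

print(f"{pct_change(mar_2025, feb_2025):.2f}%")       # 0.54%
print(f"{pct_change(mar_2025, all_time_high):.1f}%")  # -54.3%
```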

  18. Red Wine Quality Prediction pjt

    • kaggle.com
    zip
    Updated Jun 5, 2021
    Cite
    HamnaKhalid (2021). Red Wine Quality Prediction pjt [Dataset]. https://www.kaggle.com/hkhamnakhalid/red-wine-quality-prediction-pjt
    Explore at:
    Available download formats: zip (26176 bytes)
    Dataset updated
    Jun 5, 2021
    Authors
    HamnaKhalid
    Description

    Dataset

    This dataset was created by HamnaKhalid

    Contents

  19. China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml:...

    • ceicdata.com
    Updated Dec 15, 2020
    Cite
    CEICdata.com (2020). China CN: Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: Low-quality [Dataset]. https://www.ceicdata.com/en/china/price-monitoring-center-ndrc-36-city-monthly-avg-retail-price-consumer-goods/cn-retail-price-36-city-avg-dry-red-wine-12-degree-750ml-lowquality
    Explore at:
    Dataset updated
    Dec 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2024 - Jan 1, 2025
    Area covered
    China
    Variables measured
    Domestic Trade Price
    Description

    China Retail Price: 36 City Avg: Dry Red Wine: 12 Degree: 750ml: Low-quality data was reported at 77.540 RMB/Bottle in Mar 2025. This records a decrease from the previous number of 79.180 RMB/Bottle for Feb 2025. The series is updated monthly and has a median of 71.760 RMB/Bottle over Jan 2012 to Mar 2025, with 159 observations. The data reached an all-time high of 90.100 RMB/Bottle in Apr 2020 and a record low of 59.610 RMB/Bottle in Aug 2014. The series remains in active status in CEIC and is reported by the Price Monitoring Center, NDRC. The data is categorized under China Premium Database’s Price – Table CN.PA: Price Monitoring Center, NDRC: 36 City Monthly Avg: Retail Price: Consumer Goods.
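    Set against the high-quality series in entry 17, these figures give a rough premium-to-low-quality price ratio. A sketch using only the values quoted in the two descriptions; note that the ratio of two medians is not the same as the median of the monthly ratios, which would need the full series.

```python
# Figures quoted in the two CEIC descriptions (RMB/bottle).
high_quality = {"mar_2025": 287.690, "median_2012_2025": 360.000}
low_quality = {"mar_2025": 77.540, "median_2012_2025": 71.760}

ratios = {key: high_quality[key] / low_quality[key] for key in high_quality}
for key, ratio in ratios.items():
    print(f"{key}: high/low price ratio = {ratio:.2f}")
# mar_2025: high/low price ratio = 3.71
# median_2012_2025: high/low price ratio = 5.02
```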

  20. WineQualityDataset

    • cubig.ai
    Updated Jun 22, 2025
    Cite
    CUBIG (2025). WineQualityDataset [Dataset]. https://cubig.ai/store/products/490/winequalitydataset
    Explore at:
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • The Wine_Quality_Data Dataset is a structured dataset that includes various chemical properties of wine such as acidity, sugar content, pH level, and alcohol concentration, along with a quality score (ranging from 3 to 9) and color information (red or white).

    2) Data Utilization

    (1) Characteristics of the Wine_Quality_Data Dataset:
    • This dataset is designed for developing models that assess and classify wine quality, making it suitable for analyzing chemical composition and solving classification problems related to product quality.
    • Each sample contains chemical measurements of the wine (e.g., acidity, sugar, pH, alcohol), and the quality column provides a multi-class label representing the wine's quality score on a scale from 3 to 9.

    (2) Applications of the Wine_Quality_Data Dataset:
    • Wine quality prediction model training: the dataset can be used to train classification or regression models that predict wine quality scores based on various chemical attributes.

Cite
fedesoriano (2022). Spanish Wine Quality Dataset [Dataset]. https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset

Spanish Wine Quality Dataset

Model wine quality based on reviews and description

Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 26, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
fedesoriano
Description

Similar Datasets

  • CERN Proton Collision Dataset: LINK
  • Airfoil Self-Noise Dataset: LINK
  • CERN Electron Collision Data: LINK
  • Wind Speed Prediction Dataset: LINK
  • Stellar Classification Dataset - SDSS17: LINK

Context

This dataset is related to red variants of Spanish wines. It describes several popularity and description metrics and their effect on the wine's quality. The data can be used for classification or regression tasks. The classes are ordered and not balanced (e.g., the ratings only range from about 4 to almost 5 points). The task is to predict either the quality of the wine or its price using the given data.

Content

The dataset contains 7500 different types of red wines from Spain with 11 features that describe their price, rating, and even some flavor description. The data was collected by me using web scraping from different sources (from wine specialized pages to supermarkets). Please acknowledge the hard work to obtain and create this dataset; you can upvote it if you find it useful for your projects :)

If the dataset becomes popular I will probably try to create a bigger version with wines from other countries and a wider spectrum of ratings.

Attribute Information

  1. winery: Winery name
  2. wine: Name of the wine
  3. year: Year in which the grapes were harvested
  4. rating: Average rating given to the wine by the users [from 1-5]
  5. num_reviews: Number of users that reviewed the wine
  6. country: Country of origin [Spain]
  7. region: Region of the wine
  8. price: Price in euros [€]
  9. type: Wine variety
  10. body: Body score, defined as the richness and weight of the wine in your mouth [from 1-5]
  11. acidity: Acidity score, defined as wine's “pucker” or tartness; it's what makes a wine refreshing and your tongue salivate and want another sip [from 1-5]
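A minimal loading sketch for rows with the eleven attributes listed above, using the standard-library `csv.DictReader`. The header names are assumed to match the attribute names exactly, and the actual file name and layout of the Kaggle download are not stated here. Cells that are blank or non-numeric (for example a non-vintage year) are converted to `None` rather than raising; the sample rows are fictional.

```python
import csv
import io

# Numeric columns from the attribute list above; header names are an assumption.
FLOAT_COLUMNS = ("rating", "price", "body", "acidity")
INT_COLUMNS = ("year", "num_reviews")


def _to_number(value, cast):
    """Convert a CSV cell with `cast`, returning None for blank or non-numeric cells."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def parse_wines(text: str) -> list[dict]:
    """Parse CSV text into dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in FLOAT_COLUMNS:
            row[col] = _to_number(row.get(col), float)
        for col in INT_COLUMNS:
            row[col] = _to_number(row.get(col), int)
        rows.append(row)
    return rows


# Fictional sample rows, for illustration only.
sample = """winery,wine,year,rating,num_reviews,country,region,price,type,body,acidity
Example Winery A,Example Reserva,2016,4.3,120,Spain,Rioja,25.5,Rioja Red,4,3
Example Winery B,Example Joven,N.V.,4.2,58,Spain,Toro,12.0,Toro Red,,
"""

wines = parse_wines(sample)
print(wines[0]["year"], wines[0]["price"])  # 2016 25.5
print(wines[1]["year"], wines[1]["body"])   # None None
```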

Citation Request

If you want to cite this data:

fedesoriano. (April 2022). Spanish Wine Quality Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset
