8 datasets found
  1. wine reviews_small.csv

    • kaggle.com
    Updated Aug 12, 2021
    Cite
    Sailesh S (2021). wine reviews_small.csv [Dataset]. https://www.kaggle.com/datasets/sailesh07/wine-reviews-smallcsv/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    Aug 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sailesh S
    Description

    Dataset

    This dataset was created by Sailesh S


  2. Data from: Red wine DataSet

    • kaggle.com
    Updated Aug 21, 2023
    + more versions
    Cite
    Suraj_kumar_Gupta (2023). Red wine DataSet [Dataset]. https://www.kaggle.com/datasets/soorajgupta7/red-wine-dataset
    Dataset updated
    Aug 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Suraj_kumar_Gupta
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Description

    Datasets Description:

    The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.

    Classification and Regression Tasks: These datasets are suitable for both classification and regression analyses. The classes are ordered but imbalanced: for instance, there are many more normal wines than excellent or poor ones.

    Dataset Contents: For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, are:

    1. Fixed acidity
    2. Volatile acidity
    3. Citric acid
    4. Residual sugar
    5. Chlorides
    6. Free sulfur dioxide
    7. Total sulfur dioxide
    8. Density
    9. pH
    10. Sulphates
    11. Alcohol

    The output variable, based on sensory data, is:

    12. Quality (score ranging from 0 to 10)

    Usage Tips: A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
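    The thresholding described in the usage tip can be sketched in a few lines of Python. This is only an illustration: the function name `to_binary_label`, the constant `GOOD_THRESHOLD`, and the sample scores are assumptions made for the example and do not come from the dataset.

```python
from collections import Counter

GOOD_THRESHOLD = 7  # quality >= 7 counts as 'good', per the usage tip


def to_binary_label(quality: int, threshold: int = GOOD_THRESHOLD) -> int:
    """Return 1 ('good') if quality meets the threshold, else 0 ('not good')."""
    return 1 if quality >= threshold else 0


# Hypothetical quality scores for illustration only; not taken from the dataset.
sample_qualities = [5, 6, 7, 4, 8, 5, 6, 6, 3, 7]
labels = [to_binary_label(q) for q in sample_qualities]

# Counting the labels makes any class imbalance visible before modelling.
label_counts = Counter(labels)
print(labels)
print(label_counts)
```

    Counting the labels right after binarizing is a cheap way to check how imbalanced the 'good' class is before choosing a split strategy or an evaluation metric.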

    Operational Workflow: To use the dataset efficiently, the following steps are recommended:

    1. Connect a File Reader node (for the CSV file) to a Linear Correlation node and an interactive histogram for basic Exploratory Data Analysis (EDA).
    2. Connect the File Reader to a Rule Engine node to transform the 10-point scale into a dichotomous variable indicating 'good wine' and 'rest'.
    3. Connect the Rule Engine output to a Column Filter node to remove the original 10-point feature, thus preventing data leakage.
    4. Connect the Column Filter output to a Partitioning node to perform a standard train/test split (e.g., 75%/25%, choosing 'random' or 'stratified').
    5. Feed the Partitioning node's training split into a Decision Tree Learner node.
    6. Connect the Partitioning node's test split to the input of a Decision Tree Predictor node.
    7. Link the Decision Tree Learner output to the model input of the Decision Tree Predictor node.
    8. Finally, connect the Decision Tree Predictor output to a ROC node to evaluate the model by its AUC value.
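    For readers working outside a graphical tool, the data-preparation and evaluation steps of this workflow (steps 2 to 4 and step 8) might look roughly like the pure-Python sketch below. It deliberately does not train a decision tree, so steps 5 to 7 are omitted; `roc_auc` simply scores whatever predictions a model would produce. All function names, the `good` column name, and the sample rows are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def add_label_and_drop_quality(rows, threshold=7):
    """Steps 2-3: derive the binary target, then drop 'quality' to avoid leakage."""
    out = []
    for row in rows:
        features = {k: v for k, v in row.items() if k != "quality"}
        features["good"] = 1 if row["quality"] >= threshold else 0
        out.append(features)
    return out


def stratified_split(rows, label_key="good", train_fraction=0.75, seed=0):
    """Step 4: split each class separately so both parts keep the class ratio."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list order is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def roc_auc(labels, scores):
    """Step 8: AUC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical rows for illustration only; real rows carry all eleven inputs.
rows = [{"alcohol": 9.0 + 0.1 * i, "quality": 5 + (i % 4)} for i in range(40)]
prepared = add_label_and_drop_quality(rows)
train, test = stratified_split(prepared)
print(len(train), len(test))
```

    Splitting per class is what the 'stratified' option refers to: with an imbalanced target, a purely random split can leave the test set with very few 'good' wines, which makes the AUC estimate noisy.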

    Tools and Acknowledgments: For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.

  3. wine_quality

    • tensorflow.org
    • beta.dataverse.org
    Updated Nov 23, 2022
    Cite
    (2022). wine_quality [Dataset]. https://www.tensorflow.org/datasets/catalog/wine_quality
    Dataset updated
    Nov 23, 2022
    Description

    Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    Number of Instances: red wine - 1599; white wine - 4898

    Input variables (based on physicochemical tests):

    1. fixed acidity
    2. volatile acidity
    3. citric acid
    4. residual sugar
    5. chlorides
    6. free sulfur dioxide
    7. total sulfur dioxide
    8. density
    9. pH
    10. sulphates
    11. alcohol

    Output variable (based on sensory data):

    1. quality (score between 0 and 10)

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the wine_quality dataset.
    ds = tfds.load('wine_quality', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  4. White Wine Quality Dataset

    • kaggle.com
    Updated Apr 1, 2025
    Cite
    Sumit R Washimkar (2025). White Wine Quality Dataset [Dataset]. https://www.kaggle.com/datasets/sumit17125/wine-quality-dataset
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sumit R Washimkar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    White Wine Quality Dataset

    Introduction

    This dataset contains physicochemical properties of white wine samples. The goal is to analyze how these features influence the quality of wine. It can be used for exploratory data analysis, statistical modeling, and machine learning tasks such as regression and classification.

    Dataset Information

    The dataset consists of multiple white wine samples with their respective chemical compositions. Each row represents a different wine sample, and the columns correspond to specific properties that impact its taste and quality.

    Columns Description

    • Fixed Acidity: Concentration of non-volatile acids (e.g., tartaric acid) in g/dm³.
    • Volatile Acidity: Amount of acetic acid in g/dm³, which can affect the wine’s aroma and taste. High levels can lead to an unpleasant vinegar-like taste.
    • Citric Acid: Presence of citric acid in g/dm³, which adds freshness and flavor to the wine.
    • Residual Sugar: The amount of sugar remaining after fermentation, measured in g/dm³. Affects the wine's sweetness.
    • Chlorides: Amount of salt (sodium chloride) in the wine, measured in g/dm³. Higher values can negatively affect taste.
    • Free Sulfur: The level of free sulfur dioxide (SO₂), which acts as an antioxidant and antimicrobial agent, helping preserve the wine’s freshness.

    Possible Use Cases

    • Exploratory Data Analysis (EDA): Understanding the distribution and correlation between wine features.
    • Wine Quality Prediction: Using machine learning models to predict wine quality based on physicochemical attributes.
    • Feature Importance Analysis: Identifying which features have the most impact on wine quality.

    Acknowledgments

    This dataset is inspired by wine composition studies and can be used for educational and research purposes.

  5. Wine Dataset Classification Results.csv

    • kaggle.com
    Updated Feb 14, 2024
    + more versions
    Cite
    Leila Carey (2024). Wine Dataset Classification Results.csv [Dataset]. https://www.kaggle.com/datasets/leilacarey/wine1-csv/versions/1
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Leila Carey
    Description

    Dataset

    This dataset was created by Leila Carey


  6. White Wine Data

    • kaggle.com
    Updated Feb 24, 2023
    Cite
    Sudheesh R (2023). White Wine Data [Dataset]. https://www.kaggle.com/datasets/sudheeshr/white-wine-data
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sudheesh R
    Description

    The data consist of variants of the Portuguese Vinho-Verde wine, with 1599 observations of red wine and 4898 observations of white wine. For each, we have the wine quality (scored between 0 and 10) and eleven quantitative chemical attributes: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol.

    • Fixed acidity: Most acids involved with wine are fixed, or nonvolatile.
    • Volatile acidity: The amount of acetic acid in wine, which at too high a level can lead to an unpleasant vinegar taste.
    • Citric acid: Found in small quantities; can add freshness and flavor to the wine.
    • Residual sugar: The amount of sugar remaining after fermentation stops. It is rare to find wines with less than 1 g/L, and wines with more than 45 g/L are considered sweet.
    • Chlorides: The amount of salt in the wine.
    • Free sulfur dioxide: The free form of sulfur dioxide that is not bound to other molecules; it is used to calculate molecular sulfur dioxide.
    • Total sulfur dioxide: The amount of free and bound forms of sulfur dioxide.
    • Density: The density of wine is close to that of water, depending on the percent alcohol and sugar content.
    • pH: Describes how acidic or basic a wine is on a scale from 0 to 14.
    • Sulphates: A wine additive that can contribute to sulfur dioxide gas levels, which acts as an antimicrobial and antioxidant.
    • Alcohol: The percent alcohol content of the wine.

  7. Iris Species

    • kaggle.com
    zip
    Updated Sep 27, 2016
    + more versions
    Cite
    UCI Machine Learning (2016). Iris Species [Dataset]. https://www.kaggle.com/datasets/uciml/iris
    Available download formats: zip (3687 bytes)
    Dataset updated
    Sep 27, 2016
    Dataset authored and provided by
    UCI Machine Learning
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.

    It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

    The columns in this dataset are:

    • Id
    • SepalLengthCm
    • SepalWidthCm
    • PetalLengthCm
    • PetalWidthCm
    • Species

    Sepal Width vs. Sepal Length

  8. Customer Segmentation : Clustering

    • kaggle.com
    Updated Jan 13, 2024
    Cite
    Vishakh Patel (2024). Customer Segmentation : Clustering [Dataset]. https://www.kaggle.com/datasets/vishakhdapat/customer-segmentation-clustering
    Dataset updated
    Jan 13, 2024
    Dataset provided by
    Kaggle
    Authors
    Vishakh Patel
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Customer Personality Analysis involves a thorough examination of a company's optimal customer profiles. This analysis facilitates a deeper understanding of customers, enabling businesses to tailor products to meet the distinct needs, behaviors, and concerns of various customer types.

    By conducting a Customer Personality Analysis, businesses can refine their products based on the preferences of specific customer segments. Rather than allocating resources to market a new product to the entire customer database, companies can identify the segments most likely to be interested in the product. Subsequently, targeted marketing efforts can be directed toward those particular segments, optimizing resource utilization and increasing the likelihood of successful product adoption.

    Details of Features are as below:

    • Id: Unique identifier for each individual in the dataset.
    • Year_Birth: The birth year of the individual.
    • Education: The highest level of education attained by the individual.
    • Marital_Status: The marital status of the individual.
    • Income: The annual income of the individual.
    • Kidhome: The number of young children in the household.
    • Teenhome: The number of teenagers in the household.
    • Dt_Customer: The date when the customer was first enrolled or became a part of the company's database.
    • Recency: The number of days since the last purchase or interaction.
    • MntWines: The amount spent on wines.
    • MntFruits: The amount spent on fruits.
    • MntMeatProducts: The amount spent on meat products.
    • MntFishProducts: The amount spent on fish products.
    • MntSweetProducts: The amount spent on sweet products.
    • MntGoldProds: The amount spent on gold products.
    • NumDealsPurchases: The number of purchases made with a discount or as part of a deal.
    • NumWebPurchases: The number of purchases made through the company's website.
    • NumCatalogPurchases: The number of purchases made through catalogs.
    • NumStorePurchases: The number of purchases made in physical stores.
    • NumWebVisitsMonth: The number of visits to the company's website in a month.
    • AcceptedCmp3: Binary indicator (1 or 0) whether the individual accepted the third marketing campaign.
    • AcceptedCmp4: Binary indicator (1 or 0) whether the individual accepted the fourth marketing campaign.
    • AcceptedCmp5: Binary indicator (1 or 0) whether the individual accepted the fifth marketing campaign.
    • AcceptedCmp1: Binary indicator (1 or 0) whether the individual accepted the first marketing campaign.
    • AcceptedCmp2: Binary indicator (1 or 0) whether the individual accepted the second marketing campaign.
    • Complain: Binary indicator (1 or 0) whether the individual has made a complaint.
    • Z_CostContact: A constant cost associated with contacting a customer.
    • Z_Revenue: A constant revenue associated with a successful campaign response.
    • Response: Binary indicator (1 or 0) whether the individual responded to the marketing campaign.