This dataset was created by Sailesh S
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Datasets Description:
The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.
Classification and Regression Tasks: These datasets are suitable for both classification and regression analyses. The classes are ordered but imbalanced: the dataset contains far more normal wines than excellent or poor ones.
Dataset Contents: For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, include:
1. Fixed acidity
2. Volatile acidity
3. Citric acid
4. Residual sugar
5. Chlorides
6. Free sulfur dioxide
7. Total sulfur dioxide
8. Density
9. pH
10. Sulphates
11. Alcohol
The output variable, based on sensory data, is denoted by:
12. Quality (score ranging from 0 to 10)
Usage Tips: A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
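The thresholding suggested above can be sketched in a few lines of plain Python. This is only an illustration: the function name binarize_quality and the sample scores are made up for the example and are not part of the dataset or of any library.

```python
def binarize_quality(scores, threshold=7):
    """Map each quality score to 1 ('good') if it is >= threshold, else 0 ('not good')."""
    return [1 if score >= threshold else 0 for score in scores]


# Made-up quality scores, used only to show the mapping.
sample_scores = [5, 6, 7, 4, 8, 6]
labels = binarize_quality(sample_scores)
print(labels)  # [0, 0, 1, 0, 1, 0]
```

Because only scores of 7 or higher map to 1, the resulting binary target is imbalanced, which is one reason to evaluate with ROC curves and AUC rather than raw accuracy.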
Operational Workflow: To efficiently utilize the dataset, the following steps are recommended:
1. Connect a File Reader node (for the CSV file) to a Linear Correlation node and an Interactive Histogram node for basic Exploratory Data Analysis (EDA).
2. Connect a File Reader node to a Rule Engine node to transform the 10-point scale into a dichotomous variable indicating 'good wine' versus 'rest.'
3. Connect the Rule Engine node output to a Column Filter node to remove the original 10-point quality column, thus preventing data leakage.
4. Connect the Column Filter node output to a Partitioning node to perform a standard train/test split (e.g., 75%/25%, choosing 'random' or 'stratified' sampling).
5. Feed the Partitioning node's training split into the input of a Decision Tree Learner node.
6. Connect the Partitioning node's test split to the data input of a Decision Tree Predictor node.
7. Link the Decision Tree Learner node's model output to the model input of the Decision Tree Predictor node.
8. Finally, connect the Decision Tree Predictor output to the input of a ROC node to evaluate the model based on its AUC value.
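For readers working outside KNIME, two of the steps above (the stratified 75%/25% partition and the AUC evaluation) can be sketched in plain Python. This is a rough illustration, not the KNIME implementation: the helper names stratified_split and roc_auc, the labels, and the scores are all made up for the example, and the decision tree itself is left out.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels: 8 'not good' (0) and 4 'good' (1) wines.
labels = [0] * 8 + [1] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 9 3

# Made-up scores standing in for a predictor's 'good' probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Splitting each class separately keeps the share of 'good' wines similar in both partitions, which matters here because the classes are imbalanced.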
Tools and Acknowledgments: For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.
Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).
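As a minimal sketch of the metrics mentioned above, the following plain-Python example computes the mean absolute deviation (MAD) and an accuracy under a fixed error tolerance T. It assumes that a regression prediction counts as correct when it lies within T of the true score; that reading, the function names, and the sample values are assumptions made for illustration, and the full confusion matrix is not built here.

```python
def mean_absolute_deviation(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_within_tolerance(y_true, y_pred, tolerance):
    """Share of predictions whose absolute error is at most `tolerance` (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Made-up true quality scores and regression outputs.
y_true = [5, 6, 7, 5]
y_pred = [5.4, 6.8, 6.1, 4.9]

print(round(mean_absolute_deviation(y_true, y_pred), 2))  # 0.55
print(accuracy_within_tolerance(y_true, y_pred, 0.5))     # 0.5
print(accuracy_within_tolerance(y_true, y_pred, 1.0))     # 1.0
```

Raising T makes the accuracy more forgiving, which is why it is reported for a fixed, stated tolerance.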
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
Number of Instances: red wine - 1599; white wine - 4898
Input variables (based on physicochemical tests):
Output variable (based on sensory data):
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('wine_quality', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains physicochemical properties of white wine samples. The goal is to analyze how these features influence the quality of wine. It can be used for exploratory data analysis, statistical modeling, and machine learning tasks such as regression and classification.
The dataset consists of multiple white wine samples with their respective chemical compositions. Each row represents a different wine sample, and the columns correspond to specific properties that impact its taste and quality.
This dataset is inspired by wine composition studies and can be used for educational and research purposes.
This dataset was created by Leila Carey
The data consist of variants of the Portuguese Vinho Verde wine, with 1599 observations of red wine and 4898 observations of white wine. For each, we have the wine quality (scored between 0 and 10) and eleven quantitative chemical attributes: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol.
Fixed acidity - Most acids involved with wine are fixed or nonvolatile.
Volatile acidity - The amount of acetic acid in wine, which at too high a level can lead to an unpleasant, vinegar taste.
Citric acid - Found in small quantities; can add freshness and flavor to the wine.
Residual sugar - The amount of sugar remaining after fermentation stops. It is rare to find wines with less than 1 g/L, and wines with more than 45 g/L are considered sweet.
Chlorides - The amount of salt in the wine.
Free sulfur dioxide - The free form of sulfur dioxide that is not bound to other molecules; it is used to calculate molecular sulfur dioxide.
Total sulfur dioxide - The amount of free and bound forms of sulfur dioxide.
Density - The density of wine is close to that of water, depending on the percent alcohol and sugar content.
pH - Describes how acidic or basic a wine is on a scale from 0 to 14.
Sulphates - A wine additive that can contribute to sulfur dioxide gas levels, which acts as an antimicrobial and antioxidant.
Alcohol - The percent alcohol content of the wine.
https://creativecommons.org/publicdomain/zero/1.0/
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Customer Personality Analysis involves a thorough examination of a company's optimal customer profiles. This analysis facilitates a deeper understanding of customers, enabling businesses to tailor products to meet the distinct needs, behaviors, and concerns of various customer types.
By conducting a Customer Personality Analysis, businesses can refine their products based on the preferences of specific customer segments. Rather than allocating resources to market a new product to the entire customer database, companies can identify the segments most likely to be interested in the product. Subsequently, targeted marketing efforts can be directed toward those particular segments, optimizing resource utilization and increasing the likelihood of successful product adoption.
Details of Features are as below: