Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains physicochemical attributes of red variants of the Portuguese "Vinho Verde" wine, along with their quality scores (rated between 0 and 10). The goal is to predict wine quality from the chemical properties of the wine using several classification models.
Multiple machine learning models were trained to predict wine quality. The following accuracy scores were observed:
Model | Training Accuracy | Testing Accuracy |
---|---|---|
Logistic Regression | 87.91% | 87.0% |
Random Forest | 100% | 94.0% |
Decision Tree | 100% | 88.5% |
Support Vector Machine (SVM) | 86.41% | 86.5% |
A comparison plot of model performance was created to visually represent the accuracy of each algorithm. This helps in understanding which models generalized well and which ones may have overfit to the training data.
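As a rough illustration of how the table can be read for overfitting, the sketch below computes the gap between training and testing accuracy for each model. The accuracy figures are copied from the table; the helper names (`generalization_gaps`, `rank_by_test_accuracy`) are hypothetical and not part of the original work, and a large gap is only a hint of overfitting, not proof.

```python
# Accuracy figures (percent) copied from the table above:
# model name -> (training accuracy, testing accuracy).
RESULTS = {
    "Logistic Regression": (87.91, 87.0),
    "Random Forest": (100.0, 94.0),
    "Decision Tree": (100.0, 88.5),
    "Support Vector Machine (SVM)": (86.41, 86.5),
}


def generalization_gaps(results):
    """Return {model: training minus testing accuracy}, in percentage points."""
    return {name: round(train - test, 2) for name, (train, test) in results.items()}


def rank_by_test_accuracy(results):
    """Return model names sorted from highest to lowest testing accuracy."""
    return sorted(results, key=lambda name: results[name][1], reverse=True)


gaps = generalization_gaps(RESULTS)
for name in rank_by_test_accuracy(RESULTS):
    train, test = RESULTS[name]
    print(f"{name:30s} train={train:6.2f}  test={test:5.2f}  gap={gaps[name]:+.2f}")
```

Read this way, the two tree-based models fit the training set perfectly and show the widest gaps, while Logistic Regression and SVM score almost the same on both splits.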
winequality-red.csv
http://opendatacommons.org/licenses/dbcl/1.0/
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, refer to [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.
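The class imbalance and the question of feature relevance can both be checked with a few lines of code. The sketch below counts samples per quality score and ranks features by the absolute Pearson correlation with quality. Correlation ranking is only one simple filter-style choice made here for illustration, not the method of Cortez et al., and the toy values are made up to show the call shape.

```python
from collections import Counter
from math import sqrt


def class_distribution(qualities):
    """Count how many samples fall into each quality score, sorted by score."""
    return dict(sorted(Counter(qualities).items()))


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(columns, qualities):
    """Rank feature names by |correlation| with quality, strongest first."""
    scores = {name: pearson(values, qualities) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up toy values, only to show the call shape; not taken from the dataset.
toy_quality = [5, 5, 6, 5, 7, 6, 5, 3]
toy_columns = {
    "alcohol": [9.4, 9.8, 10.5, 9.5, 12.0, 11.0, 9.6, 9.0],
    "pH": [3.5, 3.2, 3.3, 3.4, 3.3, 3.2, 3.5, 3.3],
}
print(class_distribution(toy_quality))
print(rank_features(toy_columns, toy_quality))
```

Pearson correlation only captures linear relationships, so a feature that ranks low here may still matter to a non-linear model.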
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
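The variable list maps directly onto a feature matrix and a target vector. The following is a minimal sketch, assuming the CSV header uses exactly the variable names listed above and that the file is comma-delimited; both are assumptions, and some distributions of this file use semicolons, so the delimiter is a parameter. The function name `read_wine_rows` and the sample row are hypothetical.

```python
import csv
import io

# Column names as given in the variable list (assumed to match the CSV header).
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def read_wine_rows(handle, delimiter=","):
    """Parse an open CSV handle into (X, y): feature rows and quality scores."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = set(FEATURES + [TARGET]) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]))
    return X, y


# One made-up row, only to show the call shape; not taken from the dataset.
sample = io.StringIO(
    ",".join(FEATURES + [TARGET]) + "\n"
    "7.0,0.5,0.1,2.0,0.08,15,40,0.997,3.3,0.6,10.0,6\n"
)
X, y = read_wine_rows(sample)
print(len(X), len(X[0]), y)
```

For the real file, the same function can be called on `open("winequality-red.csv", newline="")`, passing `delimiter=";"` if the copy at hand is semicolon-separated.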
This dataset is also available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality. To get both datasets, i.e. the red and white Vinho Verde wine samples from the north of Portugal, please visit that link.
Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
We kagglers can apply several machine-learning algorithms to determine which physicochemical properties make a wine 'good'!
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Tabular Datasets
The datasets are used in this project: Feature Factory
Index | Dataset Name | File Name | Data Type | Rows | Format | Source |
---|---|---|---|---|---|---|
1 | Wine Quality (Red Wine) | winequality-red.csv | Tabular | 1,599 | CSV | Link |
2 | NYC Yellow Taxi Trip (Jan 2019) | yellow_tripdata_2019.parquet | Taxi Trip Data | ~7M | Parquet | Link |
3 | NYC Green Taxi Trip (Jan 2019) | green_tripdata_2019.parquet | Taxi Trip Data | ~1M | Parquet | Link |
4 | California Housing Prices | california_housing.csv | Real Estate Prices… |

See the full description on the dataset page: https://huggingface.co/datasets/habedi/feature-factory-datasets.
https://creativecommons.org/publicdomain/zero/1.0/
About Wine
Wine is an alcoholic drink typically made from fermented grapes. Yeast consumes the sugar in the grapes and converts it to ethanol, carbon dioxide, and heat.
White wine is primarily made with white grapes, and the skins are separated from the juice before the fermentation process. Red wine is made with darker red or black grapes, and the skins remain on the grapes during the fermentation process.
Objective
“Wine is bottled poetry.” The wine connoisseurs at a wine factory in Portugal are debating the quality of red and white wines. They decided to turn to data science for help and hired you, the best data scientist in the world. Can you help them out?
Data Description
Input variables (based on physicochemical tests):
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
Output variable (based on sensory data):
quality (score between 0 and 10)