https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Wine Dataset is derived from a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The dataset includes 13 attributes such as alcohol, malic acid, ash, and color intensity, providing a comprehensive overview for understanding wine characteristics and aiding in classification tasks.
2) Data Utilization
(1) Wine data has characteristics that:
• It includes detailed measurements of wine attributes, allowing for analysis of chemical composition, comparison between different wine types, and identification of patterns in wine quality and flavor profiles.
(2) Wine data can be used to:
• Wine Industry: Assists winemakers and analysts in understanding the chemical properties that influence wine quality, helping to improve production processes and quality control.
• Research: Supports academic studies and the development of classification models for wine quality prediction and analysis.
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Datasets Description:
The datasets under discussion pertain to the red and white variants of Portuguese "Vinho Verde" wine. Detailed information is available in the reference by Cortez et al. (2009). These datasets encompass physicochemical variables as inputs and sensory variables as outputs. Notably, specifics regarding grape types, wine brand, and selling prices are absent due to privacy and logistical concerns.
Classification and Regression Tasks: One can interpret these datasets as being suitable for both classification and regression analyses. The classes are ordered, albeit imbalanced. For instance, the dataset contains a more significant number of normal wines compared to excellent or poor ones.
Dataset Contents: For a comprehensive understanding, readers are encouraged to review the work by Cortez et al. (2009). The input variables, derived from physicochemical tests, include:
1. Fixed acidity
2. Volatile acidity
3. Citric acid
4. Residual sugar
5. Chlorides
6. Free sulfur dioxide
7. Total sulfur dioxide
8. Density
9. pH
10. Sulphates
11. Alcohol
The output variable, based on sensory data, is denoted by:
12. Quality (score ranging from 0 to 10)
Usage Tips: A practical suggestion involves setting a threshold for the dependent variable, defining wines with a quality score of 7 or higher as 'good/1' and the rest as 'not good/0.' This facilitates meaningful experimentation with hyperparameter tuning using decision tree algorithms and analyzing ROC curves and AUC values.
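The thresholding suggested above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `binarize_quality` and the sample scores are made up for the example and do not come from the dataset.

```python
from collections import Counter


def binarize_quality(scores, threshold=7):
    """Map each quality score to 1 ('good') if it is >= threshold, else 0."""
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical quality scores, for illustration only (not real dataset rows).
scores = [5, 6, 7, 5, 8, 4, 6, 7, 5, 6]
labels = binarize_quality(scores)

# Counting the labels shows how imbalanced the two classes are.
counts = Counter(labels)
print(labels)
print(counts)
```

Checking the class counts right after thresholding is useful because the 'good' class is typically the minority, which affects how ROC curves and AUC values should be read.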
Operational Workflow: To efficiently utilize the dataset, the following steps are recommended:
1. Connect a File Reader node (for csv) to a Linear Correlation node and an interactive histogram for basic Exploratory Data Analysis (EDA).
2. Connect a File Reader node to a Rule Engine node to transform the 10-point scale into a dichotomous variable indicating 'good wine' and 'rest.'
3. Connect the Rule Engine node output to the input of a Column Filter node to filter out the original 10-point feature, thus preventing data leakage.
4. Connect the Column Filter node output to the input of a Partitioning node to execute a standard train/test split (e.g., 75%/25%, choosing 'random' or 'stratified').
5. Feed the Partitioning node's train split output into the input of a Decision Tree Learner node.
6. Connect the Partitioning node's test split output to the input of a Decision Tree Predictor node.
7. Link the Decision Tree Learner node output to the input of the Decision Tree Predictor node.
8. Finally, connect the Decision Tree Predictor output to the input of a ROC node for model evaluation based on the AUC value.
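For readers working outside a graphical tool, the data-preparation and evaluation steps of the workflow above can be approximated in plain Python. This is a rough sketch under stated assumptions: the rows are synthetic, the helper names `stratified_split` and `auc` are hypothetical, and the decision tree training itself (steps 5 to 7) is deliberately left out.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.75, seed=0):
    """Split rows into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, test = [], []
    for members in by_class.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic rows for illustration; only the column names mirror the dataset.
rows = [{"alcohol": 9.0 + 0.1 * i, "quality": 4 + i % 5} for i in range(40)]

# Steps 2-3: derive the binary target, then drop the original score
# so the model cannot see it (this is the data-leakage guard).
prepared = [{"alcohol": r["alcohol"], "good": int(r["quality"] >= 7)} for r in rows]

# Step 4: 75%/25% stratified partition.
train, test = stratified_split(prepared, "good")
print(len(train), len(test))

# Step 8: score hypothetical predicted probabilities against true labels.
demo_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(demo_auc)
```

A stratified split is used here because the 'good' class is a minority; a purely random split can leave very few positive examples in the test partition.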
Tools and Acknowledgments: For an efficient analysis, consider using KNIME, a valuable graphical user interface (GUI) tool. Additionally, the dataset is available on the UCI machine learning repository, and proper acknowledgment and citation of the dataset source by Cortez et al. (2009) are essential for use.
This dataset is related to red variants of Spanish wines. The dataset describes several popularity and description metrics and their effect on the wine's quality. The datasets can be used for classification or regression tasks. The classes are ordered and not balanced (i.e. the quality goes from almost 5 to 4 points). The task is to predict either the quality of wine or the prices using the given data.
The dataset contains 7500 different types of red wines from Spain with 11 features that describe their price, rating, and even some flavor description. The data was collected by me using web scraping from different sources (from wine specialized pages to supermarkets). Please acknowledge the hard work to obtain and create this dataset; you can upvote it if you find it useful for your projects :)
If the dataset becomes popular I will probably try to create a bigger version with wines from other countries and a wider spectrum of ratings.
If you want to cite this data:
fedesoriano. (April 2022). Spanish Wine Quality Dataset. Retrieved [Date Retrieved] from https://www.kaggle.com/datasets/fedesoriano/spanish-wine-quality-dataset
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
part of the dataset supplied in https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009 https://archive.ics.uci.edu/ml/datasets/wine+quality
Two datasets were created, using red and white wine samples. The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (excellent). Several data mining methods were applied to model these datasets under a regression approach. The support vector machine model achieved the best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T), etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity analysis procedure).
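The two metrics named above can be illustrated with a short Python sketch. It assumes that MAD stands for mean absolute deviation and that a prediction counts as correct under tolerance T when it lies within T of the true score; both readings are assumptions, and the sample values are invented.

```python
def mean_absolute_deviation(actual, predicted):
    """Average absolute difference between true and predicted scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_within_tolerance(actual, predicted, tolerance):
    """Share of predictions whose absolute error is at most `tolerance`."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Invented true scores and regression outputs, for illustration only.
actual = [5, 6, 7, 5, 8]
predicted = [5.2, 5.4, 6.8, 6.1, 7.0]

mad = mean_absolute_deviation(actual, predicted)
acc_half = accuracy_within_tolerance(actual, predicted, tolerance=0.5)
acc_one = accuracy_within_tolerance(actual, predicted, tolerance=1.0)
print(mad, acc_half, acc_one)
```

Widening the tolerance can only raise the accuracy, which is why results for such a metric are usually reported for more than one value of T.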
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
Number of Instances: red wine - 1599; white wine - 4898
Input variables (based on physicochemical tests):
Output variable (based on sensory data):
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('wine_quality', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Data Set Information:
The dataset was downloaded from the UCI Machine Learning Repository.
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.
The two datasets were combined and a few values were randomly removed.
Attribute Information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Acknowledgements:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains physicochemical attributes of red variants of Portuguese "Vinho Verde" wine, along with their quality score (rated between 0 to 10). The goal is to predict wine quality using various classification models based on the chemical properties of the wine.
Multiple machine learning models were trained to predict wine quality. The following accuracy scores were observed:
Model | Training Accuracy | Testing Accuracy |
---|---|---|
Logistic Regression | 87.91% | 87.0% |
Random Forest | 100% | 94.0% |
Decision Tree | 100% | 88.5% |
Support Vector Machine (SVM) | 86.41% | 86.5% |
A comparison plot of model performance was created to visually represent the accuracy of each algorithm. This helps in understanding which models generalized well and which ones may have overfit to the training data.
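One simple way to read the table above for overfitting is to compute the gap between training and testing accuracy for each model. The sketch below uses the figures reported in the table; the gap itself is an illustrative heuristic added here, not something the original analysis reports.

```python
# (training accuracy %, testing accuracy %) as reported in the table above.
results = {
    "Logistic Regression": (87.91, 87.0),
    "Random Forest": (100.0, 94.0),
    "Decision Tree": (100.0, 88.5),
    "Support Vector Machine (SVM)": (86.41, 86.5),
}

# A large positive gap suggests the model fits the training data
# much better than it generalizes to unseen data.
gaps = {name: round(train - test, 2) for name, (train, test) in results.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:+.2f} points")

largest_gap_model = max(gaps, key=gaps.get)
```

On these numbers the Decision Tree shows the widest gap, while the Random Forest keeps the highest testing accuracy despite also fitting the training set perfectly.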
winequality-red.csv
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Wine Dataset for Clustering’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harrywang/wine-dataset-for-clustering on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This dataset is adapted from the Wine Data Set from https://archive.ics.uci.edu/ml/datasets/wine by removing the information about the types of wine for unsupervised learning.
The following descriptions are adapted from the UCI webpage:
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
The attributes are:
--- Original source retains full ownership of the source dataset ---
This data is taken from UCI's Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Wine For a slightly edited csv version (added column names): https://www.kaggle.com/aarontanjaya/uci-wine-dataset-edit
the data is donated by:
Original Owners:
Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
Donor:
Stefan Aeberhard, email: stefan '@' coral.cs.jcu.edu.au
Data Set Information:
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
I think that the initial data set had around 30 variables, but for some reason I only have the 13 dimensional version. I had a list of what the 30 or so variables were, but a.) I lost it, and b.), I would not know which 13 variables are included in the set.
The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline
In a classification context, this is a well posed problem with "well behaved" class structures. A good data set for first testing of a new classifier, but not very challenging.
https://choosealicense.com/licenses/cc/
Dataset Card for wine-labels
**The original COCO dataset is stored at dataset.tar.gz**
Dataset Summary
wine-labels
Supported Tasks and Leaderboards
object-detection: The dataset can be used to train a model for Object Detection.
Languages
English
Dataset Structure
Data Instances
A data point comprises an image and its object annotations. { 'image_id': 15, 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB… See the full description on the dataset page: https://huggingface.co/datasets/Francesco/wine-labels.
Wine originating from the state of Oregon had the highest average price for a 750ml bottle in the United States in 2023 at 17.37 U.S. dollars. In comparison, wine from California averaged 8.48 dollars per bottle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wines: Approved for Circulation: AOC & VDQS: Dept: Sevres (Deux) data was reported at 465.000 hl in Apr 2018. This records a decrease from the previous number of 511.000 hl for Mar 2018. Wines: Approved for Circulation: AOC & VDQS: Dept: Sevres (Deux) data is updated monthly, averaging 18,398.000 hl from Aug 2002 (Median) to Apr 2018, with 188 observations. The data reached an all-time high of 120,078.000 hl in Feb 2013 and a record low of 154.000 hl in Jul 2013. Wines: Approved for Circulation: AOC & VDQS: Dept: Sevres (Deux) data remains active status in CEIC and is reported by General Directorate of Customs and Excise. The data is categorized under Global Database’s France – Table FR.B013: Wine Statistics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A compendium of data on wine and grape production in winegrape bearing regions in Australia. There are four sheets that record data by region: 1) wine variables; 2) yield for 2006 and 2008; 3) time series data from 1999-2008; and 4) data on water usage by state. The data include, for example, statistics on grape and wine employment and value of grape and wine output. Dataset to be attributed to The University of Adelaide.
https://www.nist.gov/open/copyright-fair-use-and-licensing-statements-srd-data-software-and-technical-series-publications#SRD
This page, "Wine lactone", is part of the NIST Chemistry WebBook. This site and its contents are part of the NIST Standard Reference Data Program.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains data that were collected on 2 sets of 8 French red wines from two grape varieties, Pinot Noir (PN) and Cabernet Franc (CF). It provides, for the 16 wines, (i) sensory descriptive data obtained with a trained panel, (ii) volatile organic compounds (VOC) quantification data obtained by Gas Chromatography–Mass Spectrometry (GC-MS) and (iii) odorant composition obtained by Gas Chromatography–Mass Spectrometry–Olfactometry (GC-MS-O).
The dataset is a Microsoft Excel Worksheet containing 8 sheets.
Gives information about the sheets contained in this .xlsx file
Each row represents a wine
Each column corresponds to an experimental factor of the wines (Grape variety, Vintage and Protected Designation of Origin)
Lists the 33 sensory descriptors used for the sensory descriptive analysis of the wines
Each row represents a wine
Each column corresponds to a condition (2640 columns)
Senso_(ortho or retro)_(Panelist1 to Panelist 16)_(1 to 33 Sensory descriptors)_(1 to 3 repetitions for ortho and 1 to 2 repetitions for retro)
For the ortho (orthonasal) measurements, there are 16 panelists, 33 sensory descriptors and 3 repetitions = 1584 columns
For the retro (retronasal) measurements, there are 16 panelists, 33 sensory descriptors and 2 repetitions = 1056 columns
Each cell contains a sensory measurement for the corresponding condition in the corresponding wine
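The column counts above follow directly from the panel design, which can be checked with a short Python sketch. The exact column-name format is an assumption based on the pattern described above (`Senso_<mode>_<panelist>_<descriptor>_<repetition>`); the real sheet may spell the names differently.

```python
PANELISTS = 16
DESCRIPTORS = 33
REPETITIONS = {"ortho": 3, "retro": 2}  # repetitions per measurement mode


def sensory_columns():
    """Yield one assumed column name per mode/panelist/descriptor/repetition."""
    for mode, reps in REPETITIONS.items():
        for panelist in range(1, PANELISTS + 1):
            for descriptor in range(1, DESCRIPTORS + 1):
                for rep in range(1, reps + 1):
                    yield f"Senso_{mode}_Panelist{panelist}_{descriptor}_{rep}"


columns = list(sensory_columns())

# Count the generated names per mode by reading the second name component.
counts = {
    mode: sum(1 for name in columns if name.split("_")[1] == mode)
    for mode in REPETITIONS
}
print(counts, len(columns))
```

The totals reproduce the 1584 orthonasal and 1056 retronasal columns, which add up to the 2640 columns stated for the sheet.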
Lists the 45 VOC quantified in the wines with their corresponding CAS number
VOC: Volatile Organic Compounds
Each row represents a wine
Each column corresponds to a VOC (45 columns)
Each cell contains the quantification of the corresponding VOC in the corresponding wine
Lists the 49 odor-active compounds identified with their corresponding CAS number and the 34 compounds identified by their apex index
Each row represents a wine
Each column corresponds to an odor-active compound identified by its CAS number or by its apex index if the compound was not identified (81 odor-active compounds) + the number of judges who smelled the compound and its description (by 8 judges) = 9 columns per odor-active compound for a total of 729 columns
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the research paper "Wine quality rapid detection using a compact electronic nose system: application focused on spoilage thresholds by acetic acid" published in LWT journal (available online from April 1, 2019, https://doi.org/10.1016/j.lwt.2019.03.074), and the data paper "Electronic nose dataset for detection of wine spoilage thresholds" submitted to Data in Brief journal. For more details, read the mentioned articles, and please cite our work if you find it useful.
The recorded time series was acquired at the sampling frequency of 18.5Hz during 180 seconds, resulting in 3330 data points per sensor.
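The number of data points follows from the sampling rate and the recording length, as the short Python sketch below shows. The time axis assumes uniformly spaced samples starting at t = 0, which the description does not state explicitly.

```python
SAMPLING_HZ = 18.5   # sampling frequency stated above
DURATION_S = 180     # recording length in seconds

# Number of samples per sensor: rate multiplied by duration.
n_points = round(SAMPLING_HZ * DURATION_S)

# Assumed time axis in seconds: evenly spaced, first sample at t = 0.
timestamps = [i / SAMPLING_HZ for i in range(n_points)]

print(n_points)
print(timestamps[0], timestamps[-1])
```

Such a reconstructed time axis is handy for plotting the sensor resistances, since the files are described in terms of sensor readings rather than timestamps.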
Each file in the dataset has eight columns: relative humidity (%), temperature (°C), and the resistance readings in kΩ of the six gas sensors: MQ-3, MQ-4, MQ-6, MQ-3, MQ-4, MQ-6.
We organized the database in three folders for the wines: AQ_Wines, HQ_Wines, LQ_Wines; and one folder for the ethanol: Ethanol. Each folder contains text files that correspond to different measurements.
The filename identifies the wine measurement as follows: the first 2 characters of the filename are an identifier of the spoilage wine threshold (AQ: average-quality, HQ: high-quality, LQ: low-quality); characters 4-9 indicate the wine brand; characters 11-13 indicate the bottle, and the last 3 characters indicate the repetition (another sample of the same bottle). For example, file LQ_Wine01-B01_R01 contains the time series recorded when low-quality wine of the brand 01, bottle 01, sample 01 was measured.
The filenames in the Ethanol folder identify the measurements at different concentrations: the first 2 characters of the filename are an identifier of Ethanol (Ea); characters 4-5 indicate the concentration in v/v (C1: 1%, C2: 2.5%, C3: 5%, C4: 10%, C5: 15%, C6: 20%); and the last 3 characters indicate the repetition. For example, file Ea-C1_R01 contains the time series acquired when Ethanol at 1% v/v of concentration, sample 01 was measured.
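The two naming schemes described above lend themselves to a small parser. The Python sketch below is one possible reading of the convention, built only from the two example names given; the function name `parse_name` and the returned keys are hypothetical, and the names are assumed to be passed without a file extension.

```python
import re

QUALITY = {"AQ": "average-quality", "HQ": "high-quality", "LQ": "low-quality"}
ETHANOL_PERCENT = {"C1": 1.0, "C2": 2.5, "C3": 5.0, "C4": 10.0, "C5": 15.0, "C6": 20.0}

# Patterns inferred from the examples LQ_Wine01-B01_R01 and Ea-C1_R01.
WINE_RE = re.compile(r"(AQ|HQ|LQ)_(Wine\d{2})-(B\d{2})_(R\d{2})")
ETHANOL_RE = re.compile(r"Ea-(C[1-6])_(R\d{2})")


def parse_name(stem):
    """Decode a measurement file name (without extension) into its parts."""
    match = WINE_RE.fullmatch(stem)
    if match:
        code, brand, bottle, repetition = match.groups()
        return {
            "kind": "wine",
            "quality": QUALITY[code],
            "brand": brand,
            "bottle": bottle,
            "repetition": repetition,
        }
    match = ETHANOL_RE.fullmatch(stem)
    if match:
        level, repetition = match.groups()
        return {
            "kind": "ethanol",
            "percent_vv": ETHANOL_PERCENT[level],
            "repetition": repetition,
        }
    raise ValueError(f"unrecognised file name: {stem!r}")


wine = parse_name("LQ_Wine01-B01_R01")
ethanol = parse_name("Ea-C1_R01")
print(wine)
print(ethanol)
```

Raising an error on unknown names, rather than returning a partial result, makes it easier to spot files that do not follow the documented convention.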
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database is built from open data as described in the paper entitled ‘French wine: Combination of multiple open data sources to mapping the expected harvest value’ (2024).
Variable | Description |
---|---|
CODE_CULTU | Crop code of the graphic land registry database |
CodeCdC | Crop code in Multi Perils Crop Insurance specification |
Harvest Value B | Harvest value (€/ha organic wine) |
Harvest Value C | Harvest value (€/ha no-organic wine) |
IDA | ID of geographical areas of INAO |
Insee_Com | County code (INSEE) |
Label_CdC | Crop label in Multi Perils Crop Insurance specification |
Label_Dpt | Department |
Label_Insee_com | County |
Label_RA | Agricultural Region (AGRESTE) |
Label_appellation | Appellation (INAO) |
Label_code3 | Crop (FADN) |
Label_cvi | Wine name (vineyard register of customs services) |
Label_idGeo | Geographical ID of Quality Sign (INAO) |
PxBaremAOP | Price listed in Multi Perils Crop Insurance specification (€/hl no-organic) |
PxBaremAOPBio | Price listed in Multi Perils Crop Insurance specification (€/hl organic) |
RdtMOAOP | Harvest wine yield (hl/ha) |
SurfaceModel | Surface of wine as fitted by model |
code3 | Crop code (FADN) |
code_dept | Department code |
code_regag | Code of Agricultural Region (AGRESTE) |
cvi | Wine code (vineyard register of customs services) |
id_appellation | Appellation code (INAO) |
id_denomination_geo | Geographical ID of Quality Sign (INAO) |
The related research paper is available here:
https://univ-lemans.hal.science/hal-04627672
The sites where the data used can be found are listed below (last viewed on June 26, 2024).
https://agreste.agriculture.gouv.fr/agreste-web/methodon/Z.1/!searchurl/listeTypeMethodon/
https://www.casd.eu/source/reseau-dinformation-comptable-agricole/?tab=16
https://www.douane.gouv.fr/la-douane/opendata?f%5B0%5D=categorie_opendata_facet%3A467
https://www.data.gouv.fr/fr/datasets/?q=inao
https://maisons-champagne.com/fr/appellation/aire-geographique/
The dataset compiles 1,945 files, each an individual image of a glass containing red wine. The photographs were acquired by researchers using different smartphones equipped with high-resolution cameras (12 or 48 MP), and were planned in advance with the usual photographic parameters in mind (see DATA-SPECIFIC INFORMATION for details). Each file name is unique and encodes the parameters under which the photograph was taken. For example, the file Rea_Rio_C_Bor_175_nd_nd_fr10_nd_nd_ar2 corresponds to an almost real image (Rea), taken in La Rioja (Rio), of a "crianza" wine (C), in a Bourgogne wine glass (Bor), with a volume of 175 mL (175), taken at an undefined time (nd) and under undefined lighting (nd), with a real background (fr10), without reference (nd) or distance (nd) considerations, and from an upper angle (ar2). The data have not been processed. This study was supported by MCIN (Ministerio de Ciencia e Innovación)/AEI (Agencia Estatal de Investigación)/10.13039/501100011033 through the projects PID2019-108851RB-C21 and PID2019-108851RB-C22. The authors would like to thank the CSIC Interdisciplinary Thematic Platform (PTI+) Digital Science and Innovation. Peer reviewed
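Because each image name encodes its acquisition parameters, the names can be split into labelled fields. The Python sketch below is based only on the single example name given above; the field names in `FIELDS` are hypothetical labels chosen for illustration, and `nd` is treated as "not defined".

```python
# Hypothetical field labels, in the order suggested by the example file name.
FIELDS = (
    "image_type", "location", "wine_type", "glass", "volume_ml",
    "time", "lighting", "background", "reference", "distance", "angle",
)


def parse_image_name(stem):
    """Split an image file name (without extension) into labelled fields."""
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {stem!r}")
    # 'nd' marks a parameter that was not defined for this photograph.
    record = {field: (None if value == "nd" else value) for field, value in zip(FIELDS, parts)}
    if record["volume_ml"] is not None:
        record["volume_ml"] = int(record["volume_ml"])
    return record


record = parse_image_name("Rea_Rio_C_Bor_175_nd_nd_fr10_nd_nd_ar2")
print(record)
```

Mapping `nd` to `None` keeps undefined parameters distinguishable from real codes when the records are later filtered or grouped.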
Liquor Authority quarterly list of all active licensees in NYS filtered by Winery and Brewery specific License Types.