26 datasets found

Cardiovascular-Disease-Dataset
kaggle.com
zip
Updated Dec 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AKSHAT (2023). Cardiovascular-Disease-Dataset [Dataset]. https://www.kaggle.com/datasets/akshatshaw7/cardiovascular-disease-dataset
Explore at:
zip(1021713 bytes)Available download formats
Dataset updated
Dec 14, 2023
Authors
AKSHAT
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
PLEASE UPVOTE Try some basic EDA on this dataset and try simpler model on this dataset and post your work.

Features:

Age | Objective Feature | age | int (days)

Height | Objective Feature | height | int (cm) |

Weight | Objective Feature | weight | float (kg) |

Gender | Objective Feature | gender | categorical code |

Systolic blood pressure | Examination Feature | ap_hi | int |

Diastolic blood pressure | Examination Feature | ap_lo | int |

Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |

Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |

Smoking | Subjective Feature | smoke | binary |

Alcohol intake | Subjective Feature | alco | binary |

Physical activity | Subjective Feature | active | binary |

Presence or absence of cardiovascular disease | Target Variable | cardio | binary |
Comparison results with the state of the art.
plos.figshare.com
xls
Updated Jun 21, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fahad Khan; Xiaojun Yu; Zhaohui Yuan; Atiq ur Rehman (2023). Comparison results with the state of the art. [Dataset]. http://doi.org/10.1371/journal.pone.0284791.t006
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0284791.t006
Dataset updated
Jun 21, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Fahad Khan; Xiaojun Yu; Zhaohui Yuan; Atiq ur Rehman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An electrocardiograph (ECG) is widely used in diagnosis and prediction of cardiovascular diseases (CVDs). The traditional ECG classification methods have complex signal processing phases that leads to expensive designs. This paper provides a deep learning (DL) based system that employs the convolutional neural networks (CNNs) for classification of ECG signals present in PhysioNet MIT-BIH Arrhythmia database. The proposed system implements 1-D convolutional deep residual neural network (ResNet) model that performs feature extraction by directly using the input heartbeats. We have used synthetic minority oversampling technique (SMOTE) that process class-imbalance problem in the training dataset and effectively classifies the five heartbeat types in the test dataset. The classifier’s performance is evaluated with ten-fold cross validation (CV) using accuracy, precision, sensitivity, F1-score, and kappa. We have obtained an average accuracy of 98.63%, precision of 92.86%, sensitivity of 92.41%, and specificity of 99.06%. The average F1-score and Kappa obtained were 92.63% and 95.5% respectively. The study shows that proposed ResNet performs well with deep layers compared to other 1-D CNNs.
Data_Sheet_1_Rough-set based learning: Assessing patterns and predictability...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sheela Ramanna; Negin Ashrafi; Evan Loster; Karen Debroni; Shelley Turner (2023). Data_Sheet_1_Rough-set based learning: Assessing patterns and predictability of anxiety, depression, and sleep scores associated with the use of cannabinoid-based medicine during COVID-19.pdf [Dataset]. http://doi.org/10.3389/frai.2023.981953.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/frai.2023.981953.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Sheela Ramanna; Negin Ashrafi; Evan Loster; Karen Debroni; Shelley Turner
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recently, research is emerging highlighting the potential of cannabinoids' beneficial effects related to anxiety, mood, and sleep disorders as well as pointing to an increased use of cannabinoid-based medicines since COVID-19 was declared a pandemic. The objective of this research is 3 fold: i) to evaluate the relationship of the clinical delivery of cannabinoid-based medicine for anxiety, depression and sleep scores by utilizing machine learning specifically rough set methods; ii) to discover patterns based on patient features such as specific cannabinoid recommendations, diagnosis information, decreasing/increasing levels of clinical assessment tools (CAT) scores over a period of time; and iii) to predict whether new patients could potentially experience either an increase or decrease in CAT scores. The dataset for this study was derived from patient visits to Ekosi Health Centres, Canada over a 2 year period including the COVID timeline. Extensive pre-processing and feature engineering was performed. A class feature indicative of their progress or lack thereof due to the treatment received was introduced. Six Rough/Fuzzy-Rough classifiers as well as Random Forest and RIPPER classifiers were trained on the patient dataset using a 10-fold stratified CV method. The highest overall accuracy, sensitivity and specificity measures of over 99% was obtained using the rule-based rough-set learning model. In this study, we have identified rough-set based machine learning model with high accuracy that could be utilized for future studies regarding cannabinoids and precision medicine.
Z
MSCardio Seismocardiography (SCG) Dataset
data-staging.niaid.nih.gov
Updated Mar 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Taebi, Amirtahà; Rahman, Mohammad Muntasir (2025). MSCardio Seismocardiography (SCG) Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14975877
Explore at:
Dataset updated
Mar 5, 2025
Dataset provided by
Mississippi State University
Authors
Taebi, Amirtahà; Rahman, Mohammad Muntasir
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Overview

The MSCardio Seismocardiography Dataset is an open-access dataset collected as part of the Mississippi State Remote Cardiovascular Monitoring (MSCardio) study. This dataset includes seismocardiogram (SCG) signals recorded from participants using smartphone sensors, enabling scalable, real-world cardiovascular monitoring without requiring specialized equipment. The dataset aims to support research in SCG signal processing, machine learning applications in health monitoring, and cardiovascular assessment.

See the GitHub repository of this dataset for the latest updates: https://github.com/TaebiLab/MSCardio

Background

Cardiovascular diseases remain the leading cause of morbidity and mortality worldwide. SCG is a non-invasive technique that captures chest vibrations induced by cardiac activity and respiration, providing valuable insights into cardiac function. However, the scarcity of open-access SCG datasets has been a significant limitation for research in this field. The MSCardio dataset addresses this gap by providing real-world SCG signals collected via smartphone sensors from a diverse population.

Data Description

Study Population

Total participants enrolled: 123

Participants who uploaded data: 108 (46 males, 61 females, 1 unspecified)

Age range: 18 to 62 years

Total recordings uploaded: 515

Unique recordings after duplicate removal: 502

Platforms used: iOS and Android smartphones

Signal Data

Axial vibrations in three directions (SCG) recorded using smartphone sensors

Sampling frequency varies depending on the device capabilities

Data synchronization is ensured for temporal accuracy

Missing SCG data identified in certain recordings, addressed through preprocessing

Metadata

Each recording includes:

Device model (e.g., iPhone Pro Max)

Recording time (UTC) and time zone

Platform (iOS or Android)

General demographic details (gender, race, age, height, weight)

File Structure

The dataset is organized as follows:

MSCardio_SCG_Dataset/│── info/│ └── all_subject_data.csv # Consolidated metadata for all subjects│── MSCardio/│ ├── Subject_XXXX/ # Subject-specific folder│ │ ├── general_metadata.json # Demographic and device information│ │ ├── Recording_XXX/ # Individual recordings│ │ │ ├── scg.csv # SCG signal data│ │ │ ├── recording_metadata.json # Timestamp and device details

Data Collection Protocol

Participants placed their smartphone on their chest while lying in a supine position.

The app recorded SCG signals for approximately two minutes.

Self-reported demographic data were collected.

Data were uploaded to the study's cloud storage.

Usage and Applications

This dataset is intended for research in:

SCG signal processing and feature extraction

Machine learning applications in cardiovascular monitoring

Investigating inter- and intra-subject variability in SCG signals

Remote cardiovascular health assessment

The Data_visualization.py script is provided for data visualization

Citation

If you use this dataset in your research, please cite:

@article{rahman2025MSCardio, author = {Taebi, Amirtah{`a} and Rahman, Mohammad Muntasir}, title = {MSCardio: Initial insights from remote monitoring of cardiovascular-induced chest vibrations via smartphones}, journal = {Data in Brief}, year = {2025}, publisher = {Elsevier}}

Contact

For any questions regarding the dataset, please contact:

Amirtahà Taebi and Mohammad Muntasir Rahman

E-mail: ataebi@abe.msstate.edu, mmr510@msstate.edu

Biomedical Engineering Program, Mississippi State University

This dataset is provided under an open-access license. Please ensure ethical and responsible use when utilizing this dataset for research.
Heart Disease Risk Prediction Dataset
kaggle.com
zip
Updated Feb 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mahatir Ahmed Tusher (2025). Heart Disease Risk Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/mahatiratusher/heart-disease-risk-prediction-dataset
Explore at:
zip(1448235 bytes)Available download formats
Dataset updated
Feb 7, 2025
Authors
Mahatir Ahmed Tusher
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Heart Disease Risk Prediction Dataset

Overview

This synthetic dataset is designed to predict the risk of heart disease based on a combination of symptoms, lifestyle factors, and medical history. Each row in the dataset represents a patient, with binary (Yes/No) indicators for symptoms and risk factors, along with a computed risk label indicating whether the patient is at high or low risk of developing heart disease.

The dataset contains 70,000 samples, making it suitable for training machine learning models for classification tasks. The goal is to provide researchers, data scientists, and healthcare professionals with a clean and structured dataset to explore predictive modeling for cardiovascular health.

This dataset is a side project of EarlyMed, developed by students of Vellore Institute of Technology (VIT-AP). EarlyMed aims to leverage data science and machine learning for early detection and prevention of chronic diseases.

Dataset Features

Input Features

Symptoms (Binary - Yes/No)

Chest Pain (chest_pain): Presence of chest pain, a common symptom of heart disease.

Shortness of Breath (shortness_of_breath): Difficulty breathing, often associated with heart conditions.

Unexplained Fatigue (fatigue): Persistent tiredness without an obvious cause.

Palpitations (palpitations): Irregular or rapid heartbeat.

Dizziness/Fainting (dizziness): Episodes of lightheadedness or fainting.

Swelling in Legs/Ankles (swelling): Swelling due to fluid retention, often linked to heart failure.

Pain in Arm/Jaw/Neck/Back (radiating_pain): Radiating pain, a hallmark of angina or heart attacks.

Cold Sweats & Nausea (cold_sweats): Symptoms commonly associated with acute cardiac events.

Risk Factors (Binary - Yes/No or Continuous)

Age (age): Patient's age in years (continuous variable).

High Blood Pressure (hypertension): History of hypertension (Yes/No).

High Cholesterol (cholesterol_high): Elevated cholesterol levels (Yes/No).

Diabetes (diabetes): Diagnosis of diabetes (Yes/No).

Smoking History (smoker): Whether the patient is a smoker (Yes/No).

Obesity (obesity): Obesity status (Yes/No).

Family History of Heart Disease (family_history): Family history of cardiovascular conditions (Yes/No).

Output Label

Heart Disease Risk (risk_label): Binary label indicating the risk of heart disease:

0: Low risk

1: High risk

Data Generation Process

This dataset was synthetically generated using Python libraries such as numpy and pandas. The generation process ensured a balanced distribution of high-risk and low-risk cases while maintaining realistic correlations between features. For example: - Patients with multiple risk factors (e.g., smoking, hypertension, and diabetes) were more likely to be labeled as high risk. - Symptom patterns were modeled after clinical guidelines and research studies on heart disease.

Sources of Inspiration

The design of this dataset was inspired by the following resources:

Books

"Harrison's Principles of Internal Medicine" by J. Larry Jameson et al.: A comprehensive resource on cardiovascular diseases and their symptoms.

"Mayo Clinic Cardiology" by Joseph G. Murphy et al.: Provides insights into heart disease risk factors and diagnostic criteria.

Research Papers

Framingham Heart Study: A landmark study identifying key risk factors for cardiovascular disease.

American Heart Association (AHA) Guidelines: Recommendations for diagnosing and managing heart disease.

Existing Datasets

UCI Heart Disease Dataset: A widely used dataset for heart disease prediction.

Kaggle’s Heart Disease datasets: Various datasets contributed by the community.

Clinical Guidelines

Centers for Disease Control and Prevention (CDC): Information on heart disease symptoms and risk factors.

World Health Organization (WHO): Global statistics and risk factor analysis for cardiovascular diseases.

Applications

This dataset can be used for a variety of purposes:

Machine Learning Research:

Train classification models (e.g., Logistic Regression, Random Forest, XGBoost) to predict heart disease risk.

Experiment with feature engineering, model tuning, and evaluation metrics like Accuracy, Precision, Recall, and ROC-AUC.

Healthcare Analytics:

Identify key risk factors contributing to heart disease.

Develop decision support systems for early detection of cardiovascular risks.

Educational Purposes:

Teach students and practitioners about predictive modeling in healthcare.

Demonstrate the importance of feature selection...
f
Evaluation results of cross validation on DS2-CV for all the ML models.
plos.figshare.com
xls
Updated Jul 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jen-Chieh Yu; Kuan Ni; Ching-Tai Chen (2024). Evaluation results of cross validation on DS2-CV for all the ML models. [Dataset]. http://doi.org/10.1371/journal.pone.0307176.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0307176.t002
Dataset updated
Jul 18, 2024
Dataset provided by
PLOS ONE
Authors
Jen-Chieh Yu; Kuan Ni; Ching-Tai Chen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bald face indicates the highest value among all the methods.
Raisin Dataset
kaggle.com
zip
Updated Apr 3, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Murat KOKLU (2022). Raisin Dataset [Dataset]. https://www.kaggle.com/datasets/muratkokludataset/raisin-dataset/versions/1
Explore at:
zip(115045 bytes)Available download formats
Dataset updated
Apr 3, 2022
Authors
Murat KOKLU
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
CV:https://www.muratkoklu.com/en/publications/ DATASET: https://www.muratkoklu.com/datasets/

Citation Request : CINAR I., KOKLU M. and TASDEMIR S., (2020). Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods, Gazi Journal of Engineering Sciences, vol. 6, no. 3, pp. 200-209, December, 2020, DOI: https://doi.org/10.30855/gmbd.2020.03.03

Article Download (PDF): https://dergipark.org.tr/tr/download/article-file/1227592

ABSTRACT: In this study, machine vision system was developed in order to distinguish between two different variety of raisins (Kecimen and Besni) grown in Turkey. Firstly, a total of 900 pieces raisin grains were obtained, from an equal number of both varieties. These images were subjected to various preprocessing steps and 7 morphological feature extraction operations were performed using image processing techniques. In addition, minimum, mean, maximum and standard deviation statistical information was calculated for each feature. The distributions of both raisin varieties on the features were examined and these distributions were shown on the graphs. Later, models were created using LR, MLP, and SVM machine learning techniques and performance measurements were performed. The classification achieved 85.22% with LR, 86.33% with MLP and 86.44% with the highest classification accuracy obtained in the study with SVM. Considering the number of data available, it is possible to say that the study was successful.
Pistachio Species Classification
kaggle.com
zip
Updated May 20, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gaurav Dutta (2023). Pistachio Species Classification [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/pistachio-species-classification
Explore at:
zip(46130452 bytes)Available download formats
Dataset updated
May 20, 2023
Authors
Gaurav Dutta
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
DATASET: https://www.muratkoklu.com/datasets/ CV:https://www.muratkoklu.com/en/publications/

Pistachio Image Dataset Citation Request :

OZKAN IA., KOKLU M. and SARACOGLU R. (2021). Classification of Pistachio Species Using Improved K-NN Classifier. Progress in Nutrition, Vol. 23, N. 2, pp. DOI:10.23751/pn.v23i2.9686. (Open Access) https://www.mattioli1885journals.com/index.php/progressinnutrition/article/view/9686/9178

SINGH D, TASPINAR YS, KURSUN R, CINAR I, KOKLU M, OZKAN IA, LEE H-N., (2022). Classification and Analysis of Pistachio Species with Pre-Trained Deep Learning Models, Electronics, 11 (7), 981. https://doi.org/10.3390/electronics11070981. (Open Access)

Article Download (PDF): 1: https://www.mattioli1885journals.com/index.php/progressinnutrition/article/view/9686/9178 2: https://doi.org/10.3390/electronics11070981

DATASET: https://www.muratkoklu.com/datasets/

ABSTRACT: To keep the economic value of pistachio nuts which have an important place in the agricultural economy, the efficiency of post-harvest industrial processes is very important. To provide this efficiency, new methods and technologies are needed for the separation and classification of pistachios. Different pistachio species address different markets, which increases the need for the classification of pistachio species. This study, it is aimed to develop a classification model different from traditional separation methods, based on image processing and artificial intelligence which are capable to provide the required classification. A computer vision system has been developed to distinguish two different species of pistachios with different characteristics that address different market types. 2148 sample images for these two kinds of pistachios were taken with a high-resolution camera. The image processing techniques, segmentation, and feature extraction were applied to the obtained images of the pistachio samples. A pistachio dataset that has sixteen attributes was created. An advanced classifier based on the k-NN method, which is a simple and successful classifier, and principal component analysis was designed on the obtained dataset. In this study; a multi-level system including feature extraction, dimension reduction, and dimension weighting stages has been proposed. Experimental results showed that the proposed approach achieved a classification success of 94.18%. The presented high-performance classification model provides an important need for the separation of pistachio species and increases the economic value of species. In addition, the developed model is important in terms of its application to similar studies.
f
Demographic features of the study population.
figshare.com
xls
Updated Jan 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nahida Akter; Jack Gordon; Sherry Li; Mikki Poon; Stuart Perry; John Fletcher; Thomas Chan; Andrew White; Maitreyee Roy (2025). Demographic features of the study population. [Dataset]. http://doi.org/10.1371/journal.pone.0316919.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0316919.t001
Dataset updated
Jan 17, 2025
Dataset provided by
PLOS ONE
Authors
Nahida Akter; Jack Gordon; Sherry Li; Mikki Poon; Stuart Perry; John Fletcher; Thomas Chan; Andrew White; Maitreyee Roy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PurposeIn this study, we investigated the performance of deep learning (DL) models to differentiate between normal and glaucomatous visual fields (VFs) and classify glaucoma from early to the advanced stage to observe if the DL model can stage glaucoma as Mills criteria using only the pattern deviation (PD) plots. The DL model results were compared with a machine learning (ML) classifier trained on conventional VF parameters.MethodsA total of 265 PD plots and 265 numerical datasets of Humphrey 24–2 VF images were collected from 119 normal and 146 glaucomatous eyes to train the DL models to classify the images into four groups: normal, early glaucoma, moderate glaucoma, and advanced glaucoma. The two popular pre-trained DL models: ResNet18 and VGG16, were used to train the PD images using five-fold cross-validation (CV) and observed the performance using balanced, pre-augmented data (n = 476 images), imbalanced original data (n = 265) and feature extraction. The trained images were further investigated using the Grad-CAM visualization technique. Moreover, four ML models were trained from the global indices: mean deviation (MD), pattern standard deviation (PSD) and visual field index (VFI), using five-fold CV to compare the classification performance with the DL model’s result.ResultsThe DL model, ResNet18 trained from balanced, pre-augmented PD images, achieved high accuracy in classifying the groups with an overall F1-score: 96.8%, precision: 97.0%, recall: 96.9%, and specificity: 99.0%. The highest F1 score was 87.8% for ResNet18 with the original dataset and 88.7% for VGG16 with feature extraction. The DL models successfully localized the affected VF loss in PD plots. Among the ML models, the random forest (RF) classifier performed best with an F1 score of 96%.ConclusionThe DL model trained from PD plots was promising in differentiating normal and glaucomatous groups and performed similarly to conventional global indices. Hence, the evidence-based DL model trained from PD images demonstrated that the DL model could stage glaucoma using only PD plots like Mills criteria. This automated DL model will assist clinicians in precision glaucoma detection and progression management during extensive glaucoma screening.
Results of 5-fold CV generated by AC model and FFT model on circR2Disease...
plos.figshare.com
xls
Updated Jun 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lei Wang; Zhu-Hong You; Yang-Ming Li; Kai Zheng; Yu-An Huang (2023). Results of 5-fold CV generated by AC model and FFT model on circR2Disease dataset. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007568.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1007568.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Lei Wang; Zhu-Hong You; Yang-Ming Li; Kai Zheng; Yu-An Huang
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Results of 5-fold CV generated by AC model and FFT model on circR2Disease dataset.
Deep Learning-EV Battery Pack Diagnostics (SDG 7)
kaggle.com
zip
Updated Oct 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dr.Tawfikr Rahman (2025). Deep Learning-EV Battery Pack Diagnostics (SDG 7) [Dataset]. https://www.kaggle.com/datasets/drtawfikrrahman/deep-learning-ev-battery-pack-diagnostics-sdg-7
Explore at:
zip(41921744 bytes)Available download formats
Dataset updated
Oct 25, 2025
Authors
Dr.Tawfikr Rahman
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
** Deep Learning for EV Battery Pack Diagnostics (SDG~7)** This study employs two complementary datasets to develop and validate the proposed deep learning-driven Cell-to-Vehicle (C2V) diagnostic framework: the NASA Cell Degradation Dataset and a real-world EV Fleet Dataset. The combination of laboratory-grade and field-level data enables the proposed CNN–LSTM–ViT model to learn robust degradation patterns at the cell level and transfer this knowledge effectively to vehicle-level operation, thereby achieving scalable, sustainable diagnostics aligned with Sustainable Development Goal~7 (SDG~7) for affordable and clean energy.

NASA Cell-Level Dataset The NASA battery aging dataset, released by the Prognostics Center of Excellence (PCoE)~\cite{nasa_dataset}, contains long-term degradation measurements of lithium-ion cells (LiCoO$_2$ chemistry) subjected to controlled cycling at constant current (CC) and constant voltage (CV) conditions. The dataset includes voltage, current, capacity, and temperature readings for 12 individual cells sampled at 1~Hz over approximately 300 complete charge–discharge cycles. Each cell was cycled under distinct loading conditions ranging from 1 °C to 2 °C to induce diverse degradation trajectories.

From this dataset, differential voltage ($dV/dQ$) and incremental capacity ($dQ/dV$) curves were derived to reveal phase transitions and variations in internal resistance. The extracted DVA and ICA features were normalized to the range [0, 1], filtered using a median window ($n=15$) to suppress noise, and segmented into temporal sequences of 256 samples per cycle. This preprocessing ensures consistent feature dimensionality for convolutional and transformer-based feature extraction. The cell-level dataset forms the basis for pretraining the deep learning model, capturing intrinsic electrochemical degradation signatures under well-controlled laboratory conditions.

EV Fleet Pack-Level Dataset The second dataset was collected from a commercial EV fleet operating in real-world conditions across mixed driving cycles. It comprises battery telemetry from both Volkswagen (VW) and Tesla vehicles, each using distinct chemistries: NMC (Nickel Manganese Cobalt) and LFP (Lithium Iron Phosphate). Measurements were recorded by the onboard Battery Management System (BMS) at a sampling rate of 0.2–1 Hz and include pack voltage, current, temperature, and cumulative ampere-hour throughput for approximately 120 cycles per vehicle.

Unlike the NASA dataset, the EV Fleet data are inherently noisy and influenced by environmental and operational variability (e.g., ambient temperature 5–35°C, dynamic current rates between 0.2C–1.2C). Differential voltage and incremental capacity profiles were extracted using post-filtered data (Savitzky–Golay smoothing, order 3, window 25) to maintain diagnostic feature fidelity. This dataset was used to fine-tune the pretrained CNN–LSTM–ViT model during Cell-to-Vehicle transfer, enabling domain adaptation from the laboratory to real-world operation. Approximately 4,200 feature sequences were obtained, divided into 80\% for training and 20\% for validation.

Data Integration and Relevance to SDG~7 The integration of NASA and EV Fleet datasets enables the model to learn both controlled electrochemical dynamics and stochastic real-world degradation effects. This dual-domain learning strategy enhances the model’s generalization while reducing dependence on extensive in-vehicle data acquisition—thereby minimizing experimental energy use and data-collection costs.

From a sustainability standpoint, this approach directly supports SDG~7 by promoting efficient energy use, extending battery life, and reducing waste in EV ecosystems. By reusing knowledge from cell-level experiments, the C2V transfer learning method eliminates redundant testing cycles and lowers laboratory energy consumption by approximately 30\%. Moreover, improved battery health monitoring extends usable lifespan by 35\%, enhances charging efficiency by 6\%, and reduces lifecycle CO$_2$ emissions by 10\%. These outcomes demonstrate that deep learning, when combined with physically grounded diagnostics, can serve as a powerful enabler of clean, affordable energy systems in the transportation sector.

Summary In summary, the dataset framework combines high-resolution laboratory cell data from NASA and field-level telemetry from EV fleets to provide a realistic, sustainable foundation for deep learning-based battery diagnostics. The synergy between these datasets allows the proposed CNN–LSTM–ViT model to bridge the gap between controlled electrochemical understanding and scalable, real-world EV applications. Through efficient data utilization and reduced retraining requirements, the dataset design exemplifies data-driven sustainability and directly contributes to the implementation of SDG~7 targets in next-generation electric mobility.
VGG16 ImageNet Weights: Boost Your CV Models
kaggle.com
zip
Updated Aug 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Evil Spirit05 (2024). VGG16 ImageNet Weights: Boost Your CV Models [Dataset]. https://www.kaggle.com/datasets/evilspirit05/vgg16-title/code
Explore at:
zip(54730430 bytes)Available download formats
Dataset updated
Aug 31, 2024
Authors
Evil Spirit05
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
The file vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 contains pre-trained weights for the VGG16 convolutional neural network architecture, specifically designed for TensorFlow and Keras frameworks. This file is a crucial resource for researchers and practitioners in the field of deep learning, particularly those working on computer vision tasks.

What is VGG16?

VGG16 is a convolutional neural network architecture proposed by Karen Simonyan and Andrew Zisserman from the University of Oxford in their 2014 paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". This network achieved top results in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, demonstrating exceptional performance in image classification tasks.

Contents of the Weights File

The vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 file contains:

Pre-trained weights for all convolutional layers of the VGG16 network.

Weights for the max-pooling layers.

The file does NOT include weights for the top (fully connected) layers, as indicated by "notop" in the filename.

Key Features

TensorFlow Compatibility: The weights are specifically formatted for use with TensorFlow and Keras, as indicated by "tf_dim_ordering" in the filename. Transfer Learning Ready: By excluding the top layers, this file is ideal for transfer learning applications where you want to use VGG16 as a feature extractor or fine-tune it for your specific task.

Keras Integration: The .h5 format allows for easy loading into Keras models using the load_weights() function.

Pretrained on ImageNet: These weights are the result of training on the vast ImageNet dataset, capturing a rich set of features useful for a wide range of computer vision tasks.

Use Cases

Feature Extraction: Use the pre-trained layers as a fixed feature extractor for your own image datasets.

Transfer Learning: Fine-tune the model on your specific dataset, potentially achieving high performance with less training data.

Baseline Model: Utilize as a strong baseline for computer vision tasks such as image classification, object detection, or semantic segmentation.

Comparative Studies: Use in research to compare against newer architectures or as part of ensemble models.

How to Use

Here's a basic example of how to use these weights in a Keras model:

from tensorflow.keras.applications import VGG16 from tensorflow.keras.models import Model # Load the VGG16 model without top layers base_model = VGG16(weights='path/to/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5', include_top=False, input_shape=(224, 224, 3)) # Add your own top layers x = base_model.output x = GlobalAveragePooling2D()(x) x = Dense(1024, activation='relu')(x) predictions = Dense(num_classes, activation='softmax')(x) # Create your new model model = Model(inputs=base_model.input, outputs=predictions)

Benefits for Your Projects

Reduced Training Time: Start with pre-learned features, significantly reducing the time needed to train your models.

Improved Generalization: Leverage features learned from a diverse and large-scale dataset (ImageNet), potentially improving your model's ability to generalize.

Resource Efficiency: Achieve high performance even with limited computational resources or smaller datasets.

Flexibility: Easily adapt the VGG16 architecture to various image-related tasks beyond simple classification.

File Details

Size: Approximately 58.89 MB

Format: HDF5 (.h5)

Compatibility: TensorFlow 2.x, Keras

Source: Usually downloaded from official Keras repositories

Ethical Considerations

When using these weights, be aware of potential biases inherent in the ImageNet dataset. Consider the ethical implications and potential biases in your specific application. By incorporating this weights file into your projects, you're building upon years of research and development in deep learning for computer vision. It's an excellent starting point for many image-related tasks and can significantly boost the performance of your models.
Cryptocurrency extra data - Cardano
kaggle.com
zip
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - Cardano [Dataset]. https://www.kaggle.com/datasets/yamqwe/cryptocurrency-extra-data-cardano/code
Explore at:
zip(1254179058 bytes)Available download formats
Dataset updated
Jan 20, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.

KobolRSF_VCO

kaggle.com

zip

Updated Jul 1, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Sofia Vallejo (2024). KobolRSF_VCO [Dataset]. https://www.kaggle.com/datasets/bringmethetxcos/kobolrsf-vco

Explore at:

zip(460882131 bytes)Available download formats

Dataset updated

Jul 1, 2024

Authors

Sofia Vallejo

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

RSF Kobol Synthesizer VCO Dataset

The RSF Kobol Synthesizer VCO dataset focuses on the digital emulation of the analog Voltage-Controlled Oscillator (VCO) to capture and replicate the distinctive qualities of analog sound in a digital format. Analog signals, known for their continuous nature and inherent imperfections, contribute to the warmth and depth of sound, characteristics that are highly prized and challenging to mimic digitally. The dataset aims to model the oscillators of the VCO module accurately.

This work was developed as part of a Master's thesis at Universitat Pompeu Fabra in Barcelona, in academic collaboration with the Music Technology Group (MTG).

Files in the Dataset

audio_samples: A directory containing 437 audio samples, each lasting 1.5 seconds, recorded at various control voltage settings on both waveform and frequency.
metadata.csv: Contains metadata for each audio sample.
- sample_id: Unique identifier for each audio sample.
- cv_frequency: Control voltage for frequency adjustment.
- cv_waveform: Control voltage for waveform adjustment.
- waveform_id: Identifier for the type of waveform generated.
- frequency: Estimated frequency of the note.
- angular_frequency: Angular frequency of the waveform.
- waveform_data: Raw audio data converted to numerical format.

The dataset is organised into the following main sections:

Control Voltage and Waveform Settings

The control voltage (CV) is a crucial parameter in voltage-controlled oscillators (VCOs). For this project, steps of 0.33 volts were used for frequency adjustment and 0.5 volts for waveform selection.

Waveform Data Extraction

Waveform data was extracted to include details such as the number of audio channels, sample width, sample rate, and the total number of frames from an audio file.

Waveform Types

Waveforms were categorized using IDs based on their type. The table below lists the waveform types and their corresponding IDs:

ID	Waveform Type
1	Triangular +
2	Triangular + Sawtooth
3	Triangular + Sawtooth +
4	Triangular + Sawtooth ++
5	Sawtooth
6	Sawtooth +
7	Sawtooth ++
8	Sawtooth + Square
9	Sawtooth + Square +
10	Sawtooth + Square ++
11	Square
12	Square +
13	Square ++
14	Square + Pulse
15	Square + Pulse +
16	Square + Pulse ++
17	Pulse
18	Pulse +

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F21440355%2Fb84c18efc08e69cdc07567629b1b93cc%2Fwaveforms.png?generation=1719485538599603&alt=media" alt="Waveform Types">

Feature Extraction

Features were extracted and saved into a CSV file for use in model training and evaluation. These features include:

cv_frequency: Control voltage value for frequency adjustment.
cv_waveform: Control voltage value for waveform selection.
waveform_id: Identifier for the waveform type.
frequency: Estimated frequency of the note using Fast Fourier Transform (FFT).
angular_frequency: Angular frequency calculated from the control voltage.
waveform_data: Raw audio data converted to numerical format.

Methodology for Feature Extraction

Control Voltage of Frequency and Waveform Knobs:
- The control voltage (CV) is an essential parameter in voltage-controlled oscillators (VCOs).
- For this project, frequency adjustments were made in steps of 0.33 volts and waveform selections in steps of 0.5 volts.
Waveform Data:
- Waveform data was extracted using the wave library.
- This data includes details such as the number of audio channels, sample width, sample rate, and the total number of frames from an audio file.
Pitch, Loudness, and Angular Frequency:
- Pitch: Extracted using an algorithm that converts time-domain audio data to its corresponding frequency-domain representation, identifying the fundamental frequency.
- Loudness: Computed by analyzing the amplitude of the audio signal, often using a perceptual loudness model that reflects human hearing sensitivity.
- Angular Frequency: Derived from the fundamental frequency, it is calculated as ( \omega = 2\pi f ), where ( f ) is the fundamental frequency.

Methodology for Dataset Recording

The dataset was recorded using a Focusrite Scarlett 18i20 audio interface, connected to the Doepfer MCV24 MIDI to CV converter. The setup included:

Doepfer MCV24 MIDI to CV Converter: Used to convert MIDI data into control voltage signals, allowing precise control over the RSF Kobol synthesizer's frequency and waveform parameters....

Cryptocurrency extra data - Maker
kaggle.com
zip
Updated Jan 20, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - Maker [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-maker
Explore at:
zip(1150531041 bytes)Available download formats
Dataset updated
Jan 20, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.
Cryptocurrency extra data - TRON
kaggle.com
zip
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - TRON [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-tron
Explore at:
zip(1253566627 bytes)Available download formats
Dataset updated
Jan 20, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.
Cryptocurrency extra data - IOTA
kaggle.com
zip
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - IOTA [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-iota
Explore at:
zip(1196411839 bytes)Available download formats
Dataset updated
Jan 20, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.
Cryptocurrency extra data - Monero
kaggle.com
zip
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - Monero [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-monero
Explore at:
zip(1204684577 bytes)Available download formats
Dataset updated
Jan 20, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.
Cryptocurrency extra data - Binance Coin
kaggle.com
zip
Updated Jan 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - Binance Coin [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-binance-coin
Explore at:
zip(1246039618 bytes)Available download formats
Dataset updated
Jan 19, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.
Cryptocurrency extra data - Ethereum Classic
kaggle.com
zip
Updated Jan 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yam Peleg (2022). Cryptocurrency extra data - Ethereum Classic [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-ethereum-classic
Explore at:
zip(1259913408 bytes)Available download formats
Dataset updated
Jan 19, 2022
Authors
Yam Peleg
Description
Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

1. **timestamp** - A timestamp for the minute covered by the row. 2. **Asset_ID** - An ID code for the cryptoasset. 3. **Count** - The number of trades that took place this minute. 4. **Open** - The USD price at the beginning of the minute. 5. **High** - The highest USD price during the minute. 6. **Low** - The lowest USD price during the minute. 7. **Close** - The USD price at the end of the minute. 8. **Volume** - The number of cryptoasset u units traded during the minute. 9. **VWAP** - The volume-weighted average price for the minute. 10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated. 11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) 12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

Neural Network Starter

LightGBM Starter

Catboost Starter

XGBoost Starter

TabNet Starter

Reinforcement Learning (PPO) Starter

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]

Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]

Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.

Facebook

Twitter

Click to copy link

Link copied

Cite

AKSHAT (2023). Cardiovascular-Disease-Dataset [Dataset]. https://www.kaggle.com/datasets/akshatshaw7/cardiovascular-disease-dataset

Cardiovascular-Disease-Dataset

Health dataset, Simple dataset for EDA and classification models

Explore at:

zip(1021713 bytes)Available download formats

Dataset updated

Dec 14, 2023

Authors

AKSHAT

License

http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

Description

PLEASE UPVOTE Try some basic EDA on this dataset and try simpler model on this dataset and post your work.

Features:

Age | Objective Feature | age | int (days)

Height | Objective Feature | height | int (cm) |

Weight | Objective Feature | weight | float (kg) |

Gender | Objective Feature | gender | categorical code |

Systolic blood pressure | Examination Feature | ap_hi | int |

Diastolic blood pressure | Examination Feature | ap_lo | int |

Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |

Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |

Smoking | Subjective Feature | smoke | binary |

Alcohol intake | Subjective Feature | alco | binary |

Physical activity | Subjective Feature | active | binary |

Presence or absence of cardiovascular disease | Target Variable | cardio | binary |

Clear search

Close search

Google apps

Main menu

Cardiovascular-Disease-Dataset

Comparison results with the state of the art.

Data_Sheet_1_Rough-set based learning: Assessing patterns and predictability...

MSCardio Seismocardiography (SCG) Dataset

Heart Disease Risk Prediction Dataset

Heart Disease Risk Prediction Dataset

Overview

Dataset Features

Input Features

Symptoms (Binary - Yes/No)

Risk Factors (Binary - Yes/No or Continuous)

Output Label

Data Generation Process

Sources of Inspiration

Books

Research Papers

Existing Datasets

Clinical Guidelines

Applications

Evaluation results of cross validation on DS2-CV for all the ML models.

Raisin Dataset

Pistachio Species Classification

Demographic features of the study population.

Results of 5-fold CV generated by AC model and FFT model on circR2Disease...

Deep Learning-EV Battery Pack Diagnostics (SDG 7)

VGG16 ImageNet Weights: Boost Your CV Models

What is VGG16?

Contents of the Weights File

The vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 file contains:

Key Features

Use Cases

How to Use

Here's a basic example of how to use these weights in a Keras model:

Benefits for Your Projects

File Details

Ethical Considerations

Cryptocurrency extra data - Cardano

Context:

Introduction

The Data

Indexing

Usage Example

Baseline Example Notebooks:

Loose-ends:

Example Visualisations

License

KobolRSF_VCO

RSF Kobol Synthesizer VCO Dataset

Files in the Dataset

Control Voltage and Waveform Settings

Waveform Data Extraction

Waveform Types

Feature Extraction

Methodology for Feature Extraction

Methodology for Dataset Recording

Cryptocurrency extra data - Maker

Context:

Introduction

The Data

Indexing

Usage Example

Baseline Example Notebooks:

Loose-ends:

Example Visualisations

License

Cryptocurrency extra data - TRON

Context:

Introduction

The Data

Indexing

Usage Example

Baseline Example Notebooks:

Loose-ends:

Example Visualisations

License

Cryptocurrency extra data - IOTA

Context:

Introduction

The Data

Indexing