26 datasets found
  1. Cardiovascular-Disease-Dataset

    • kaggle.com
    zip
    Updated Dec 14, 2023
    Cite
    AKSHAT (2023). Cardiovascular-Disease-Dataset [Dataset]. https://www.kaggle.com/datasets/akshatshaw7/cardiovascular-disease-dataset
    Explore at:
    Available download formats: zip (1021713 bytes)
    Dataset updated
    Dec 14, 2023
    Authors
    AKSHAT
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    PLEASE UPVOTE. Try some basic EDA and simple models on this dataset, and post your work.

    Features:

    Age | Objective Feature | age | int (days)

    Height | Objective Feature | height | int (cm) |

    Weight | Objective Feature | weight | float (kg) |

    Gender | Objective Feature | gender | categorical code |

    Systolic blood pressure | Examination Feature | ap_hi | int |

    Diastolic blood pressure | Examination Feature | ap_lo | int |

    Cholesterol | Examination Feature | cholesterol | 1: normal, 2: above normal, 3: well above normal |

    Glucose | Examination Feature | gluc | 1: normal, 2: above normal, 3: well above normal |

    Smoking | Subjective Feature | smoke | binary |

    Alcohol intake | Subjective Feature | alco | binary |

    Physical activity | Subjective Feature | active | binary |

    Presence or absence of cardiovascular disease | Target Variable | cardio | binary |
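A minimal sketch of working with this schema in pandas; the rows below are made-up illustrations, not taken from the actual Kaggle file. Age arrives in days, and a common sanity check is that diastolic pressure (ap_lo) should not exceed systolic (ap_hi):

```python
import pandas as pd

# Invented sample rows that follow the feature table above.
df = pd.DataFrame({
    "age":         [18393, 20228],  # int, days
    "height":      [168, 156],      # cm
    "weight":      [62.0, 85.0],    # kg
    "gender":      [2, 1],          # categorical code
    "ap_hi":       [110, 140],      # systolic blood pressure
    "ap_lo":       [80, 90],        # diastolic blood pressure
    "cholesterol": [1, 3],
    "gluc":        [1, 1],
    "smoke":       [0, 0],
    "alco":        [0, 0],
    "active":      [1, 1],
    "cardio":      [0, 1],          # target
})

# Age is stored in days; convert to years for readability.
df["age_years"] = (df["age"] / 365.25).round(1)

# Sanity check: diastolic pressure should not exceed systolic.
clean = df[df["ap_lo"] <= df["ap_hi"]]
print(clean[["age_years", "ap_hi", "ap_lo", "cardio"]])
```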

  2. Comparison results with the state of the art.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    + more versions
    Cite
    Fahad Khan; Xiaojun Yu; Zhaohui Yuan; Atiq ur Rehman (2023). Comparison results with the state of the art. [Dataset]. http://doi.org/10.1371/journal.pone.0284791.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Fahad Khan; Xiaojun Yu; Zhaohui Yuan; Atiq ur Rehman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An electrocardiograph (ECG) is widely used in the diagnosis and prediction of cardiovascular diseases (CVDs). Traditional ECG classification methods have complex signal-processing phases that lead to expensive designs. This paper provides a deep learning (DL) based system that employs convolutional neural networks (CNNs) to classify the ECG signals in the PhysioNet MIT-BIH Arrhythmia database. The proposed system implements a 1-D convolutional deep residual neural network (ResNet) model that performs feature extraction directly on the input heartbeats. The synthetic minority oversampling technique (SMOTE) is used to address the class-imbalance problem in the training dataset, and the system effectively classifies the five heartbeat types in the test dataset. The classifier's performance is evaluated with ten-fold cross-validation (CV) using accuracy, precision, sensitivity, F1-score, and kappa. We obtained an average accuracy of 98.63%, precision of 92.86%, sensitivity of 92.41%, and specificity of 99.06%. The average F1-score and kappa were 92.63% and 95.5%, respectively. The study shows that the proposed ResNet performs well with deep layers compared to other 1-D CNNs.
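The protocol described above (oversample only the training folds, then evaluate with stratified CV) can be sketched as follows. This is an illustration, not a reproduction: it uses plain random oversampling as a stand-in for SMOTE, a logistic regression in place of the paper's ResNet, and synthetic two-class data instead of the five MIT-BIH heartbeat types.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Toy imbalanced two-class stand-in for the heartbeat classes.
X = rng.normal(size=(300, 8))
y = np.r_[np.zeros(270, dtype=int), np.ones(30, dtype=int)]
X[y == 1] += 1.5  # make the minority class separable

def oversample(X_tr, y_tr, rng):
    # Naive random oversampling of the minority class, applied only to
    # the training fold (SMOTE would interpolate synthetic samples here).
    minority = np.flatnonzero(y_tr == 1)
    extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size)
    idx = np.r_[np.arange(y_tr.size), extra]
    return X_tr[idx], y_tr[idx]

accs = []
for tr, te in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = oversample(X[tr], y[tr], rng)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accs.append(accuracy_score(y[te], clf.predict(X[te])))

print(f"10-fold CV accuracy: {np.mean(accs):.3f}")
```

Keeping the oversampling inside each training fold is the important detail: resampling before the split would leak test information.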

  3. Data_Sheet_1_Rough-set based learning: Assessing patterns and predictability...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Sheela Ramanna; Negin Ashrafi; Evan Loster; Karen Debroni; Shelley Turner (2023). Data_Sheet_1_Rough-set based learning: Assessing patterns and predictability of anxiety, depression, and sleep scores associated with the use of cannabinoid-based medicine during COVID-19.pdf [Dataset]. http://doi.org/10.3389/frai.2023.981953.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Sheela Ramanna; Negin Ashrafi; Evan Loster; Karen Debroni; Shelley Turner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research is emerging that highlights the potential beneficial effects of cannabinoids on anxiety, mood, and sleep disorders, and points to increased use of cannabinoid-based medicines since COVID-19 was declared a pandemic. The objective of this research is threefold: i) to evaluate the relationship between the clinical delivery of cannabinoid-based medicine and anxiety, depression, and sleep scores using machine learning, specifically rough-set methods; ii) to discover patterns based on patient features such as specific cannabinoid recommendations, diagnosis information, and decreasing/increasing clinical assessment tool (CAT) scores over time; and iii) to predict whether new patients could potentially experience an increase or decrease in CAT scores. The dataset for this study was derived from patient visits to Ekosi Health Centres, Canada, over a two-year period including the COVID timeline. Extensive pre-processing and feature engineering were performed, and a class feature indicating patients' progress, or lack thereof, due to the treatment received was introduced. Six rough/fuzzy-rough classifiers, as well as Random Forest and RIPPER classifiers, were trained on the patient dataset using 10-fold stratified CV. The highest overall accuracy, sensitivity, and specificity measures, of over 99%, were obtained with the rule-based rough-set learning model. In this study, we have identified a rough-set-based machine learning model with high accuracy that could be utilized in future studies on cannabinoids and precision medicine.

  4. MSCardio Seismocardiography (SCG) Dataset

    • data-staging.niaid.nih.gov
    Updated Mar 5, 2025
    Cite
    Taebi, Amirtahà; Rahman, Mohammad Muntasir (2025). MSCardio Seismocardiography (SCG) Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14975877
    Explore at:
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    Mississippi State University
    Authors
    Taebi, Amirtahà; Rahman, Mohammad Muntasir
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    The MSCardio Seismocardiography Dataset is an open-access dataset collected as part of the Mississippi State Remote Cardiovascular Monitoring (MSCardio) study. This dataset includes seismocardiogram (SCG) signals recorded from participants using smartphone sensors, enabling scalable, real-world cardiovascular monitoring without requiring specialized equipment. The dataset aims to support research in SCG signal processing, machine learning applications in health monitoring, and cardiovascular assessment.

    See the GitHub repository of this dataset for the latest updates: https://github.com/TaebiLab/MSCardio

    Background

    Cardiovascular diseases remain the leading cause of morbidity and mortality worldwide. SCG is a non-invasive technique that captures chest vibrations induced by cardiac activity and respiration, providing valuable insights into cardiac function. However, the scarcity of open-access SCG datasets has been a significant limitation for research in this field. The MSCardio dataset addresses this gap by providing real-world SCG signals collected via smartphone sensors from a diverse population.

    Data Description

    Study Population

    Total participants enrolled: 123

    Participants who uploaded data: 108 (46 males, 61 females, 1 unspecified)

    Age range: 18 to 62 years

    Total recordings uploaded: 515

    Unique recordings after duplicate removal: 502

    Platforms used: iOS and Android smartphones

    Signal Data

    Axial vibrations in three directions (SCG) recorded using smartphone sensors

    Sampling frequency varies depending on the device capabilities

    Data synchronization is ensured for temporal accuracy

    Missing SCG data identified in certain recordings, addressed through preprocessing

    Metadata

    Each recording includes:

    Device model (e.g., iPhone Pro Max)

    Recording time (UTC) and time zone

    Platform (iOS or Android)

    General demographic details (gender, race, age, height, weight)

    File Structure

    The dataset is organized as follows:

    MSCardio_SCG_Dataset/
    │── info/
    │   └── all_subject_data.csv            # Consolidated metadata for all subjects
    │── MSCardio/
    │   ├── Subject_XXXX/                   # Subject-specific folder
    │   │   ├── general_metadata.json       # Demographic and device information
    │   │   ├── Recording_XXX/              # Individual recordings
    │   │   │   ├── scg.csv                 # SCG signal data
    │   │   │   ├── recording_metadata.json # Timestamp and device details
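A sketch of reading one recording laid out as described above. Since the real dataset is not bundled with this listing, the snippet first creates a tiny mock copy of the structure in a temporary directory; the file names follow the listing, but the contents are invented.

```python
import csv
import json
import tempfile
from pathlib import Path

# Build a mock Subject_XXXX/Recording_XXX folder (contents invented).
root = Path(tempfile.mkdtemp()) / "MSCardio" / "Subject_0001" / "Recording_001"
root.mkdir(parents=True)
(root / "recording_metadata.json").write_text(
    json.dumps({"platform": "iOS", "device": "iPhone Pro Max"}))
(root / "scg.csv").write_text(
    "t,x,y,z\n0.00,0.01,-0.02,0.98\n0.01,0.02,-0.01,0.99\n")

# Read the recording back: metadata as JSON, triaxial SCG samples as CSV.
meta = json.loads((root / "recording_metadata.json").read_text())
with (root / "scg.csv").open() as f:
    rows = list(csv.DictReader(f))

print(meta["platform"], len(rows), "samples")
```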

    Data Collection Protocol

    Participants placed their smartphone on their chest while lying in a supine position.

    The app recorded SCG signals for approximately two minutes.

    Self-reported demographic data were collected.

    Data were uploaded to the study's cloud storage.

    Usage and Applications

    This dataset is intended for research in:

    SCG signal processing and feature extraction

    Machine learning applications in cardiovascular monitoring

    Investigating inter- and intra-subject variability in SCG signals

    Remote cardiovascular health assessment

    The Data_visualization.py script is provided for data visualization

    Citation

    If you use this dataset in your research, please cite:

    @article{rahman2025MSCardio,
      author    = {Taebi, Amirtah{\`a} and Rahman, Mohammad Muntasir},
      title     = {MSCardio: Initial insights from remote monitoring of cardiovascular-induced chest vibrations via smartphones},
      journal   = {Data in Brief},
      year      = {2025},
      publisher = {Elsevier}
    }

    Contact

    For any questions regarding the dataset, please contact:

    Amirtahà Taebi and Mohammad Muntasir Rahman

    E-mail: ataebi@abe.msstate.edu, mmr510@msstate.edu

    Biomedical Engineering Program, Mississippi State University

    This dataset is provided under an open-access license. Please ensure ethical and responsible use when utilizing this dataset for research.

  5. Heart Disease Risk Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 7, 2025
    Cite
    Mahatir Ahmed Tusher (2025). Heart Disease Risk Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/mahatiratusher/heart-disease-risk-prediction-dataset
    Explore at:
    Available download formats: zip (1448235 bytes)
    Dataset updated
    Feb 7, 2025
    Authors
    Mahatir Ahmed Tusher
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Heart Disease Risk Prediction Dataset

    Overview

    This synthetic dataset is designed to predict the risk of heart disease based on a combination of symptoms, lifestyle factors, and medical history. Each row in the dataset represents a patient, with binary (Yes/No) indicators for symptoms and risk factors, along with a computed risk label indicating whether the patient is at high or low risk of developing heart disease.

    The dataset contains 70,000 samples, making it suitable for training machine learning models for classification tasks. The goal is to provide researchers, data scientists, and healthcare professionals with a clean and structured dataset to explore predictive modeling for cardiovascular health.

    This dataset is a side project of EarlyMed, developed by students of Vellore Institute of Technology (VIT-AP). EarlyMed aims to leverage data science and machine learning for early detection and prevention of chronic diseases.

    Dataset Features

    Input Features

    Symptoms (Binary - Yes/No)

    1. Chest Pain (chest_pain): Presence of chest pain, a common symptom of heart disease.
    2. Shortness of Breath (shortness_of_breath): Difficulty breathing, often associated with heart conditions.
    3. Unexplained Fatigue (fatigue): Persistent tiredness without an obvious cause.
    4. Palpitations (palpitations): Irregular or rapid heartbeat.
    5. Dizziness/Fainting (dizziness): Episodes of lightheadedness or fainting.
    6. Swelling in Legs/Ankles (swelling): Swelling due to fluid retention, often linked to heart failure.
    7. Pain in Arm/Jaw/Neck/Back (radiating_pain): Radiating pain, a hallmark of angina or heart attacks.
    8. Cold Sweats & Nausea (cold_sweats): Symptoms commonly associated with acute cardiac events.

    Risk Factors (Binary - Yes/No or Continuous)

    1. Age (age): Patient's age in years (continuous variable).
    2. High Blood Pressure (hypertension): History of hypertension (Yes/No).
    3. High Cholesterol (cholesterol_high): Elevated cholesterol levels (Yes/No).
    4. Diabetes (diabetes): Diagnosis of diabetes (Yes/No).
    5. Smoking History (smoker): Whether the patient is a smoker (Yes/No).
    6. Obesity (obesity): Obesity status (Yes/No).
    7. Family History of Heart Disease (family_history): Family history of cardiovascular conditions (Yes/No).

    Output Label

    • Heart Disease Risk (risk_label): Binary label indicating the risk of heart disease:
      • 0: Low risk
      • 1: High risk

    Data Generation Process

    This dataset was synthetically generated using Python libraries such as numpy and pandas. The generation process ensured a balanced distribution of high-risk and low-risk cases while maintaining realistic correlations between features. For example:

    • Patients with multiple risk factors (e.g., smoking, hypertension, and diabetes) were more likely to be labeled as high risk.
    • Symptom patterns were modeled after clinical guidelines and research studies on heart disease.
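That generation idea, with more concurrent risk factors raising the chance of the high-risk label, can be sketched roughly as below. The column names follow the listing, but the scoring rule, threshold, and noise are invented for illustration; this is not the dataset's actual generator.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000  # the real dataset has 70,000 rows; kept small here

# Binary risk factors drawn at random (hypothetical generator).
cols = ["hypertension", "cholesterol_high", "diabetes", "smoker",
        "obesity", "family_history"]
df = pd.DataFrame({c: rng.integers(0, 2, n) for c in cols})
df["age"] = rng.integers(25, 80, n)

# More concurrent risk factors -> higher chance of the high-risk label;
# a little noise keeps the label from being a deterministic function.
score = df[cols].sum(axis=1) + (df["age"] > 55).astype(int)
df["risk_label"] = (score + rng.normal(0, 1, n) > 3.5).astype(int)

print(df["risk_label"].value_counts())
```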

    Sources of Inspiration

    The design of this dataset was inspired by the following resources:

    Books

    • "Harrison's Principles of Internal Medicine" by J. Larry Jameson et al.: A comprehensive resource on cardiovascular diseases and their symptoms.
    • "Mayo Clinic Cardiology" by Joseph G. Murphy et al.: Provides insights into heart disease risk factors and diagnostic criteria.

    Research Papers

    • Framingham Heart Study: A landmark study identifying key risk factors for cardiovascular disease.
    • American Heart Association (AHA) Guidelines: Recommendations for diagnosing and managing heart disease.

    Existing Datasets

    • UCI Heart Disease Dataset: A widely used dataset for heart disease prediction.
    • Kaggle’s Heart Disease datasets: Various datasets contributed by the community.

    Clinical Guidelines

    • Centers for Disease Control and Prevention (CDC): Information on heart disease symptoms and risk factors.
    • World Health Organization (WHO): Global statistics and risk factor analysis for cardiovascular diseases.

    Applications

    This dataset can be used for a variety of purposes:

    1. Machine Learning Research:

      • Train classification models (e.g., Logistic Regression, Random Forest, XGBoost) to predict heart disease risk.
      • Experiment with feature engineering, model tuning, and evaluation metrics like Accuracy, Precision, Recall, and ROC-AUC.
    2. Healthcare Analytics:

      • Identify key risk factors contributing to heart disease.
      • Develop decision support systems for early detection of cardiovascular risks.
    3. Educational Purposes:

      • Teach students and practitioners about predictive modeling in healthcare.
      • Demonstrate the importance of feature selection...
  6. Evaluation results of cross validation on DS2-CV for all the ML models.

    • plos.figshare.com
    xls
    Updated Jul 18, 2024
    Cite
    Jen-Chieh Yu; Kuan Ni; Ching-Tai Chen (2024). Evaluation results of cross validation on DS2-CV for all the ML models. [Dataset]. http://doi.org/10.1371/journal.pone.0307176.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jen-Chieh Yu; Kuan Ni; Ching-Tai Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bold face indicates the highest value among all the methods.

  7. Raisin Dataset

    • kaggle.com
    zip
    Updated Apr 3, 2022
    Cite
    Murat KOKLU (2022). Raisin Dataset [Dataset]. https://www.kaggle.com/datasets/muratkokludataset/raisin-dataset/versions/1
    Explore at:
    Available download formats: zip (115045 bytes)
    Dataset updated
    Apr 3, 2022
    Authors
    Murat KOKLU
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    CV: https://www.muratkoklu.com/en/publications/
    DATASET: https://www.muratkoklu.com/datasets/

    Citation Request : CINAR I., KOKLU M. and TASDEMIR S., (2020). Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods, Gazi Journal of Engineering Sciences, vol. 6, no. 3, pp. 200-209, December, 2020, DOI: https://doi.org/10.30855/gmbd.2020.03.03

    Article Download (PDF): https://dergipark.org.tr/tr/download/article-file/1227592

    ABSTRACT: In this study, a machine vision system was developed to distinguish between two varieties of raisins (Kecimen and Besni) grown in Turkey. First, a total of 900 raisin grains were obtained, in equal numbers from both varieties. Images of these grains were subjected to various preprocessing steps, and 7 morphological features were extracted using image processing techniques. In addition, minimum, mean, maximum, and standard deviation statistics were calculated for each feature. The distributions of both raisin varieties across the features were examined and shown in graphs. Models were then created using LR, MLP, and SVM machine learning techniques, and their performance was measured. Classification accuracy reached 85.22% with LR, 86.33% with MLP, and 86.44% with SVM, the highest obtained in the study. Considering the amount of data available, the study can be considered successful.

  8. Pistachio Species Classification

    • kaggle.com
    zip
    Updated May 20, 2023
    + more versions
    Cite
    Gaurav Dutta (2023). Pistachio Species Classification [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/pistachio-species-classification
    Explore at:
    Available download formats: zip (46130452 bytes)
    Dataset updated
    May 20, 2023
    Authors
    Gaurav Dutta
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DATASET: https://www.muratkoklu.com/datasets/
    CV: https://www.muratkoklu.com/en/publications/

    Pistachio Image Dataset Citation Request :

    OZKAN IA., KOKLU M. and SARACOGLU R. (2021). Classification of Pistachio Species Using Improved K-NN Classifier. Progress in Nutrition, Vol. 23, N. 2, pp. DOI:10.23751/pn.v23i2.9686. (Open Access) https://www.mattioli1885journals.com/index.php/progressinnutrition/article/view/9686/9178

    SINGH D, TASPINAR YS, KURSUN R, CINAR I, KOKLU M, OZKAN IA, LEE H-N., (2022). Classification and Analysis of Pistachio Species with Pre-Trained Deep Learning Models, Electronics, 11 (7), 981. https://doi.org/10.3390/electronics11070981. (Open Access)

    Article Download (PDF):
    1. https://www.mattioli1885journals.com/index.php/progressinnutrition/article/view/9686/9178
    2. https://doi.org/10.3390/electronics11070981

    ABSTRACT: To preserve the economic value of pistachio nuts, which have an important place in the agricultural economy, the efficiency of post-harvest industrial processes is very important. To achieve this efficiency, new methods and technologies are needed for the separation and classification of pistachios. Different pistachio species address different markets, which increases the need for classification of pistachio species. This study aims to develop a classification model, different from traditional separation methods, based on image processing and artificial intelligence, capable of providing the required classification. A computer vision system has been developed to distinguish two species of pistachios with different characteristics that address different market types. 2148 sample images of these two kinds of pistachios were taken with a high-resolution camera. Image processing techniques, segmentation, and feature extraction were applied to the obtained images of the pistachio samples, and a pistachio dataset with sixteen attributes was created. An advanced classifier combining the k-NN method, a simple and successful classifier, with principal component analysis was designed on the resulting dataset. This study proposes a multi-level system including feature extraction, dimension reduction, and dimension-weighting stages. Experimental results showed that the proposed approach achieved a classification success of 94.18%. The presented high-performance classification model meets an important need for the separation of pistachio species and increases the economic value of the species. In addition, the developed model is important in terms of its applicability to similar studies.
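The multi-level idea in the abstract, dimension reduction feeding a k-NN classifier, can be sketched on synthetic stand-in data (2148 samples with 16 attributes, matching the paper's shape). The dimension-weighting stage and the real morphological features are omitted, so the accuracy of this sketch says nothing about the reported 94.18%.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Synthetic stand-in for the 2148-sample, 16-attribute pistachio table.
X = rng.normal(size=(2148, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # two species (invented rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA for dimension reduction, then k-NN on the reduced features.
clf = make_pipeline(PCA(n_components=8), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"hold-out accuracy: {acc:.3f}")
```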

  9. Demographic features of the study population.

    • figshare.com
    xls
    Updated Jan 17, 2025
    Cite
    Nahida Akter; Jack Gordon; Sherry Li; Mikki Poon; Stuart Perry; John Fletcher; Thomas Chan; Andrew White; Maitreyee Roy (2025). Demographic features of the study population. [Dataset]. http://doi.org/10.1371/journal.pone.0316919.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Nahida Akter; Jack Gordon; Sherry Li; Mikki Poon; Stuart Perry; John Fletcher; Thomas Chan; Andrew White; Maitreyee Roy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: In this study, we investigated the performance of deep learning (DL) models in differentiating between normal and glaucomatous visual fields (VFs) and in classifying glaucoma from the early to the advanced stage, to observe whether a DL model can stage glaucoma per the Mills criteria using only the pattern deviation (PD) plots. The DL model results were compared with a machine learning (ML) classifier trained on conventional VF parameters.

    Methods: A total of 265 PD plots and 265 numerical datasets of Humphrey 24–2 VF images were collected from 119 normal and 146 glaucomatous eyes to train the DL models to classify the images into four groups: normal, early glaucoma, moderate glaucoma, and advanced glaucoma. Two popular pre-trained DL models, ResNet18 and VGG16, were trained on the PD images using five-fold cross-validation (CV), and performance was observed using balanced, pre-augmented data (n = 476 images), the imbalanced original data (n = 265), and feature extraction. The trained images were further investigated using the Grad-CAM visualization technique. Moreover, four ML models were trained on the global indices, mean deviation (MD), pattern standard deviation (PSD), and visual field index (VFI), using five-fold CV to compare their classification performance with the DL models' results.

    Results: The DL model ResNet18, trained on balanced, pre-augmented PD images, achieved high accuracy in classifying the groups, with an overall F1-score of 96.8%, precision of 97.0%, recall of 96.9%, and specificity of 99.0%. The highest F1-score was 87.8% for ResNet18 with the original dataset and 88.7% for VGG16 with feature extraction. The DL models successfully localized the affected VF loss in PD plots. Among the ML models, the random forest (RF) classifier performed best, with an F1-score of 96%.

    Conclusion: The DL model trained on PD plots was promising in differentiating normal and glaucomatous groups and performed similarly to conventional global indices. Hence, the DL model trained on PD images demonstrated that it could stage glaucoma using only PD plots, as with the Mills criteria. This automated DL model will assist clinicians in precision glaucoma detection and progression management during extensive glaucoma screening.

  10. Results of 5-fold CV generated by AC model and FFT model on circR2Disease...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    + more versions
    Cite
    Lei Wang; Zhu-Hong You; Yang-Ming Li; Kai Zheng; Yu-An Huang (2023). Results of 5-fold CV generated by AC model and FFT model on circR2Disease dataset. [Dataset]. http://doi.org/10.1371/journal.pcbi.1007568.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lei Wang; Zhu-Hong You; Yang-Ming Li; Kai Zheng; Yu-An Huang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of 5-fold CV generated by AC model and FFT model on circR2Disease dataset.

  11. Deep Learning-EV Battery Pack Diagnostics (SDG 7)

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Dr.Tawfikr Rahman (2025). Deep Learning-EV Battery Pack Diagnostics (SDG 7) [Dataset]. https://www.kaggle.com/datasets/drtawfikrrahman/deep-learning-ev-battery-pack-diagnostics-sdg-7
    Explore at:
    Available download formats: zip (41921744 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Dr.Tawfikr Rahman
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    **Deep Learning for EV Battery Pack Diagnostics (SDG 7)**

    This study employs two complementary datasets to develop and validate the proposed deep learning-driven Cell-to-Vehicle (C2V) diagnostic framework: the NASA Cell Degradation Dataset and a real-world EV Fleet Dataset. The combination of laboratory-grade and field-level data enables the proposed CNN–LSTM–ViT model to learn robust degradation patterns at the cell level and transfer this knowledge effectively to vehicle-level operation, thereby achieving scalable, sustainable diagnostics aligned with Sustainable Development Goal 7 (SDG 7) for affordable and clean energy.

    NASA Cell-Level Dataset

    The NASA battery aging dataset, released by the Prognostics Center of Excellence (PCoE), contains long-term degradation measurements of lithium-ion cells (LiCoO2 chemistry) subjected to controlled cycling under constant current (CC) and constant voltage (CV) conditions. The dataset includes voltage, current, capacity, and temperature readings for 12 individual cells sampled at 1 Hz over approximately 300 complete charge–discharge cycles. Each cell was cycled under distinct loading conditions, ranging from 1C to 2C, to induce diverse degradation trajectories.

    From this dataset, differential voltage (dV/dQ) and incremental capacity (dQ/dV) curves were derived to reveal phase transitions and variations in internal resistance. The extracted DVA and ICA features were normalized to the range [0, 1], filtered with a median window (n = 15) to suppress noise, and segmented into temporal sequences of 256 samples per cycle. This preprocessing ensures consistent feature dimensionality for convolutional and transformer-based feature extraction. The cell-level dataset forms the basis for pretraining the deep learning model, capturing intrinsic electrochemical degradation signatures under well-controlled laboratory conditions.
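The DVA preprocessing chain described above (dV/dQ, min-max normalization, a 15-point median filter, 256-sample sequences) can be sketched on a synthetic discharge curve. Nothing here is NASA data; the voltage model and noise level are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy discharge curve (invented): capacity vs. voltage.
Q = np.linspace(0.01, 2.0, 1200)  # capacity, Ah
V = 4.2 - 0.5 * Q - 0.05 * np.log(Q) + rng.normal(0, 2e-3, Q.size)

# Differential voltage curve dV/dQ, as used for DVA.
dVdQ = np.gradient(V, Q)

# Min-max normalize to [0, 1].
dVdQ = (dVdQ - dVdQ.min()) / (dVdQ.max() - dVdQ.min())

def medfilt(x, n=15):
    # Median filter with window n (edges handled by reflection).
    xp = np.pad(x, n // 2, mode="reflect")
    return np.median(np.lib.stride_tricks.sliding_window_view(xp, n), axis=1)

smooth = medfilt(dVdQ, 15)

# Resample to a fixed 256-sample sequence per cycle for the network input.
seq = np.interp(np.linspace(Q[0], Q[-1], 256), Q, smooth)
print(seq.shape)
```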

    EV Fleet Pack-Level Dataset

    The second dataset was collected from a commercial EV fleet operating in real-world conditions across mixed driving cycles. It comprises battery telemetry from both Volkswagen (VW) and Tesla vehicles, each using a distinct chemistry: NMC (Nickel Manganese Cobalt) and LFP (Lithium Iron Phosphate), respectively. Measurements were recorded by the onboard Battery Management System (BMS) at a sampling rate of 0.2–1 Hz and include pack voltage, current, temperature, and cumulative ampere-hour throughput for approximately 120 cycles per vehicle.

    Unlike the NASA dataset, the EV Fleet data are inherently noisy and influenced by environmental and operational variability (e.g., ambient temperatures of 5–35 °C, dynamic current rates between 0.2C and 1.2C). Differential voltage and incremental capacity profiles were extracted from post-filtered data (Savitzky–Golay smoothing, order 3, window 25) to maintain diagnostic feature fidelity. This dataset was used to fine-tune the pretrained CNN–LSTM–ViT model during Cell-to-Vehicle transfer, enabling domain adaptation from the laboratory to real-world operation. Approximately 4,200 feature sequences were obtained, divided into 80% for training and 20% for validation.

    Data Integration and Relevance to SDG 7

    The integration of the NASA and EV Fleet datasets enables the model to learn both controlled electrochemical dynamics and stochastic real-world degradation effects. This dual-domain learning strategy enhances the model's generalization while reducing dependence on extensive in-vehicle data acquisition, thereby minimizing experimental energy use and data-collection costs.

    From a sustainability standpoint, this approach directly supports SDG 7 by promoting efficient energy use, extending battery life, and reducing waste in EV ecosystems. By reusing knowledge from cell-level experiments, the C2V transfer learning method eliminates redundant testing cycles and lowers laboratory energy consumption by approximately 30%. Moreover, improved battery health monitoring extends usable lifespan by 35%, enhances charging efficiency by 6%, and reduces lifecycle CO2 emissions by 10%. These outcomes demonstrate that deep learning, when combined with physically grounded diagnostics, can serve as a powerful enabler of clean, affordable energy systems in the transportation sector.

    Summary

    The dataset framework combines high-resolution laboratory cell data from NASA with field-level telemetry from EV fleets to provide a realistic, sustainable foundation for deep learning-based battery diagnostics. The synergy between these datasets allows the proposed CNN–LSTM–ViT model to bridge the gap between controlled electrochemical understanding and scalable, real-world EV applications. Through efficient data utilization and reduced retraining requirements, the dataset design exemplifies data-driven sustainability and directly contributes to the implementation of SDG 7 targets in next-generation electric mobility.

  12. VGG16 ImageNet Weights: Boost Your CV Models

    • kaggle.com
    zip
    Updated Aug 31, 2024
    Cite
    Evil Spirit05 (2024). VGG16 ImageNet Weights: Boost Your CV Models [Dataset]. https://www.kaggle.com/datasets/evilspirit05/vgg16-title/code
    Explore at:
    zip (54730430 bytes)
    Dataset updated
    Aug 31, 2024
    Authors
    Evil Spirit05
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    The file vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 contains pre-trained weights for the VGG16 convolutional neural network architecture, specifically designed for TensorFlow and Keras frameworks. This file is a crucial resource for researchers and practitioners in the field of deep learning, particularly those working on computer vision tasks.
    

    What is VGG16?

    VGG16 is a convolutional neural network architecture proposed by Karen Simonyan and Andrew Zisserman from the University of Oxford in their 2014 paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". This network achieved top results in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, demonstrating exceptional performance in image classification tasks.
    

    Contents of the Weights File

    The vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5 file contains:

    • Pre-trained weights for all convolutional layers of the VGG16 network.
    • Weights for the max-pooling layers.
    • The file does NOT include weights for the top (fully connected) layers, as indicated by "notop" in the filename.

    Key Features

    • TensorFlow Compatibility: The weights are specifically formatted for use with TensorFlow and Keras, as indicated by "tf_dim_ordering" in the filename.
    • Transfer Learning Ready: By excluding the top layers, this file is ideal for transfer learning applications where you want to use VGG16 as a feature extractor or fine-tune it for your specific task.
    • Keras Integration: The .h5 format allows for easy loading into Keras models using the load_weights() function.
    • Pretrained on ImageNet: These weights are the result of training on the vast ImageNet dataset, capturing a rich set of features useful for a wide range of computer vision tasks.

    Use Cases

    • Feature Extraction: Use the pre-trained layers as a fixed feature extractor for your own image datasets.
    • Transfer Learning: Fine-tune the model on your specific dataset, potentially achieving high performance with less training data.
    • Baseline Model: Utilize as a strong baseline for computer vision tasks such as image classification, object detection, or semantic segmentation.
    • Comparative Studies: Use in research to compare against newer architectures or as part of ensemble models.

    How to Use

    Here's a basic example of how to use these weights in a Keras model:

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
    from tensorflow.keras.models import Model

    # Number of output classes for your task (example value)
    num_classes = 10

    # Load the VGG16 model without top layers
    base_model = VGG16(weights='path/to/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                       include_top=False,
                       input_shape=(224, 224, 3))

    # Add your own top layers
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu')(x)
    predictions = Dense(num_classes, activation='softmax')(x)

    # Create your new model
    model = Model(inputs=base_model.input, outputs=predictions)
    

    Benefits for Your Projects

    • Reduced Training Time: Start with pre-learned features, significantly reducing the time needed to train your models.
    • Improved Generalization: Leverage features learned from a diverse and large-scale dataset (ImageNet), potentially improving your model's ability to generalize.
    • Resource Efficiency: Achieve high performance even with limited computational resources or smaller datasets.
    • Flexibility: Easily adapt the VGG16 architecture to various image-related tasks beyond simple classification.

    File Details

    • Size: Approximately 58.89 MB
    • Format: HDF5 (.h5)
    • Compatibility: TensorFlow 2.x, Keras
    • Source: Usually downloaded from official Keras repositories

    Ethical Considerations

    When using these weights, be aware of potential biases inherent in the ImageNet dataset. Consider the ethical implications and potential biases in your specific application.
    By incorporating this weights file into your projects, you're building upon years of research and development in deep learning for computer vision. It's an excellent starting point for many image-related tasks and can significantly boost the performance of your models.
    
  13. Cryptocurrency extra data - Cardano

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Cardano [Dataset]. https://www.kaggle.com/datasets/yamqwe/cryptocurrency-extra-data-cardano/code
    Explore at:
    zip (1254179058 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is a supplementary, continuously updated dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This dataset is updated daily, automatically collecting market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
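    The indexing convention above can be reproduced in pandas; the small frame here is a hypothetical stand-in, not real exchange data:

```python
import pandas as pd

# A tiny stand-in frame with a few of the documented columns;
# the real files contain the full field list above.
df = pd.DataFrame({
    "timestamp": [1514764860, 1514764800, 1514764920],  # Unix seconds
    "Asset_ID": [3, 3, 3],
    "Close": [0.71, 0.70, 0.72],
})

# Index by timestamp and sort oldest to newest, matching the dataset layout
df = df.set_index("timestamp").sort_index()

# Convert the Unix-second index to datetimes for readability
df.index = pd.to_datetime(df.index, unit="s")
```

    After sorting, the earliest minute comes first, so rolling and purged time-series splits behave as expected.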

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's excellent notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There exist some mismatches with the original target provided by the hosts at some time intervals; at all other intervals it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.
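    One common OHLCV-based proxy for VWAP is the "typical price" (High + Low + Close) / 3. The sketch below uses that proxy as an assumption; the dataset's actual approximation formula is not published and may differ:

```python
import pandas as pd

def approx_vwap(candles: pd.DataFrame) -> pd.Series:
    """Approximate per-minute VWAP from OHLC candles.

    Uses the 'typical price' (H + L + C) / 3, one common proxy; the
    dataset's own approximation may differ.
    """
    return (candles["High"] + candles["Low"] + candles["Close"]) / 3.0

# Hypothetical two-minute example, not real exchange data
candles = pd.DataFrame({
    "Open":   [1.00, 1.02],
    "High":   [1.05, 1.04],
    "Low":    [0.99, 1.00],
    "Close":  [1.02, 1.01],
    "Volume": [1500.0, 900.0],
})
vwap = approx_vwap(candles)
```

    A true VWAP would weight each trade's price by its volume within the minute; candle-level data only permits proxies like this one.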

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  14. KobolRSF_VCO

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Sofia Vallejo (2024). KobolRSF_VCO [Dataset]. https://www.kaggle.com/datasets/bringmethetxcos/kobolrsf-vco
    Explore at:
    zip (460882131 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Sofia Vallejo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    RSF Kobol Synthesizer VCO Dataset

    The RSF Kobol Synthesizer VCO dataset focuses on the digital emulation of the analog Voltage-Controlled Oscillator (VCO) to capture and replicate the distinctive qualities of analog sound in a digital format. Analog signals, known for their continuous nature and inherent imperfections, contribute to the warmth and depth of sound, characteristics that are highly prized and challenging to mimic digitally. The dataset aims to model the oscillators of the VCO module accurately.

    This work was developed as part of a Master's thesis at Universitat Pompeu Fabra in Barcelona, in academic collaboration with the Music Technology Group (MTG).

    Files in the Dataset

    • audio_samples: A directory containing 437 audio samples, each lasting 1.5 seconds, recorded at various control voltage settings on both waveform and frequency.
    • metadata.csv: Contains metadata for each audio sample.
      • sample_id: Unique identifier for each audio sample.
      • cv_frequency: Control voltage for frequency adjustment.
      • cv_waveform: Control voltage for waveform adjustment.
      • waveform_id: Identifier for the type of waveform generated.
      • frequency: Estimated frequency of the note.
      • angular_frequency: Angular frequency of the waveform.
      • waveform_data: Raw audio data converted to numerical format.

    The dataset is organised into the following main sections:

    Control Voltage and Waveform Settings

    The control voltage (CV) is a crucial parameter in voltage-controlled oscillators (VCOs). For this project, steps of 0.33 volts were used for frequency adjustment and 0.5 volts for waveform selection.

    Waveform Data Extraction

    Waveform data was extracted to include details such as the number of audio channels, sample width, sample rate, and the total number of frames from an audio file.

    Waveform Types

    Waveforms were categorized using IDs based on their type. The table below lists the waveform types and their corresponding IDs:

    ID | Waveform Type
     1 | Triangular +
     2 | Triangular + Sawtooth
     3 | Triangular + Sawtooth +
     4 | Triangular + Sawtooth ++
     5 | Sawtooth
     6 | Sawtooth +
     7 | Sawtooth ++
     8 | Sawtooth + Square
     9 | Sawtooth + Square +
    10 | Sawtooth + Square ++
    11 | Square
    12 | Square +
    13 | Square ++
    14 | Square + Pulse
    15 | Square + Pulse +
    16 | Square + Pulse ++
    17 | Pulse
    18 | Pulse +

    Waveform types: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F21440355%2Fb84c18efc08e69cdc07567629b1b93cc%2Fwaveforms.png?generation=1719485538599603&alt=media

    Feature Extraction

    Features were extracted and saved into a CSV file for use in model training and evaluation. These features include:

    • cv_frequency: Control voltage value for frequency adjustment.
    • cv_waveform: Control voltage value for waveform selection.
    • waveform_id: Identifier for the waveform type.
    • frequency: Estimated frequency of the note using Fast Fourier Transform (FFT).
    • angular_frequency: Angular frequency calculated from the control voltage.
    • waveform_data: Raw audio data converted to numerical format.

    Methodology for Feature Extraction

    • Control Voltage of Frequency and Waveform Knobs:

      • The control voltage (CV) is an essential parameter in voltage-controlled oscillators (VCOs).
      • For this project, frequency adjustments were made in steps of 0.33 volts and waveform selections in steps of 0.5 volts.
    • Waveform Data:

      • Waveform data was extracted using the wave library.
      • This data includes details such as the number of audio channels, sample width, sample rate, and the total number of frames from an audio file.
    • Pitch, Loudness, and Angular Frequency:

      • Pitch: Extracted using an algorithm that converts time-domain audio data to its corresponding frequency-domain representation, identifying the fundamental frequency.
      • Loudness: Computed by analyzing the amplitude of the audio signal, often using a perceptual loudness model that reflects human hearing sensitivity.
      • Angular Frequency: Derived from the fundamental frequency, it is calculated as ω = 2πf, where f is the fundamental frequency.
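    The pitch and angular-frequency steps above can be sketched with NumPy's FFT. The 440 Hz tone below is a synthetic stand-in for a recorded sample, and a single spectral peak is a simplification of real pitch trackers:

```python
import numpy as np

def fundamental_frequency(signal: np.ndarray, sample_rate: int) -> float:
    """Estimate the fundamental frequency via the FFT magnitude peak.

    A simple sketch of the pitch-extraction step; production pitch
    trackers are more robust than a single spectral peak.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore the DC component
    return float(freqs[np.argmax(spectrum)])

# Synthetic 440 Hz tone, 1.5 s long (matching the sample duration above)
sr = 44100
t = np.arange(int(1.5 * sr)) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

f0 = fundamental_frequency(tone, sr)
omega = 2 * np.pi * f0  # angular frequency, as stored in the metadata
```

    For a 1.5 s clip at 44.1 kHz the FFT bin spacing is about 0.67 Hz, which is why the peak lands cleanly on 440 Hz here.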

    Methodology for Dataset Recording

    The dataset was recorded using a Focusrite Scarlett 18i20 audio interface, connected to the Doepfer MCV24 MIDI to CV converter. The setup included:

    • Doepfer MCV24 MIDI to CV Converter: Used to convert MIDI data into control voltage signals, allowing precise control over the RSF Kobol synthesizer's frequency and waveform parameters....
  15. Cryptocurrency extra data - Maker

    • kaggle.com
    zip
    Updated Jan 20, 2022
    + more versions
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Maker [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-maker
    Explore at:
    zip (1150531041 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is a supplementary, continuously updated dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This dataset is updated daily, automatically collecting market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's excellent notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There exist some mismatches with the original target provided by the hosts at some time intervals; at all other intervals it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  16. Cryptocurrency extra data - TRON

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - TRON [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-tron
    Explore at:
    zip (1253566627 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is a supplementary, continuously updated dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This dataset is updated daily, automatically collecting market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's excellent notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There exist some mismatches with the original target provided by the hosts at some time intervals; at all other intervals it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  17. Cryptocurrency extra data - IOTA

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - IOTA [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-iota
    Explore at:
    zip (1196411839 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is a supplementary, continuously updated dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This dataset is updated daily, automatically collecting market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's excellent notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There exist some mismatches with the original target provided by the hosts at some time intervals; at all other intervals it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  18. Cryptocurrency extra data - Monero

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Monero [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-monero
    Explore at:
    zip (1204684577 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is a supplementary, continuously updated dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This dataset is updated daily, automatically collecting market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's excellent notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There exist some mismatches with the original target provided by the hosts at some time intervals; at all other intervals it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  19. Cryptocurrency extra data - Binance Coin

    • kaggle.com
    zip
    Updated Jan 19, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Binance Coin [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-binance-coin
    Explore at:
    zip(1246039618 bytes)Available download formats
    Dataset updated
    Jan 19, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset that automatically collects market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets, and both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human-readable asset name.
    
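    The "residualized" in the target means the asset's own 15-minute forward return with the market-wide component removed. A rough sketch of that idea using a rolling-beta estimate (function name, window length, and details are illustrative; the authoritative definition is in the competition tutorial linked above):

    ```python
    import numpy as np

    def residualized_target(asset_ret, market_ret, window=60):
        """target_t = r_t - beta_t * m_t, where beta_t = Cov(r, m) / Var(m)
        is estimated over the trailing `window` observations."""
        r = np.asarray(asset_ret, dtype=float)
        m = np.asarray(market_ret, dtype=float)
        target = np.full(len(r), np.nan)  # undefined until a full window exists
        for t in range(window, len(r)):
            rw, mw = r[t - window:t], m[t - window:t]
            beta = np.cov(rw, mw)[0, 1] / np.var(mw, ddof=1)
            target[t] = r[t] - beta * m[t]
        return target
    ```

    If an asset's returns were exactly twice the market's, the estimated beta would be 2 and the residualized target would be identically zero.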

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
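    Since the data ships one candlestick per minute, the intra-minute trades needed for an exact VWAP are unavailable; one common stand-in is the volume-free "typical price" (High + Low + Close) / 3. A hedged sketch of working with this schema (the sample values below are made up; in practice you would `pd.read_csv` one of the per-asset files):

    ```python
    import pandas as pd

    # Tiny in-memory sample in the dataset's schema (values are invented).
    df = pd.DataFrame({
        "timestamp": [1514764860, 1514764800, 1514764920],
        "Open":   [13700.0, 13715.0, 13680.0],
        "High":   [13720.0, 13730.0, 13700.0],
        "Low":    [13690.0, 13700.0, 13660.0],
        "Close":  [13715.0, 13705.0, 13690.0],
        "Volume": [31.2, 28.4, 40.1],
    })

    # Index by timestamp and sort oldest-to-newest, as described above.
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
    df = df.set_index("timestamp").sort_index()

    # Per-minute VWAP approximation from candlesticks alone: typical price.
    df["vwap_approx"] = (df["High"] + df["Low"] + df["Close"]) / 3.0
    ```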

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great notebook series on the SIIM-ISIC melanoma detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. The dataset currently uses an approximation computed from the Open, High, Low, Close, and Volume candlestick fields. [Waiting for competition hosts' input]
    • Target Labeling: The computed target mismatches the original target provided by the hosts at some time intervals; at all others it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: Zero-volume rows are not filtered out yet.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  20. Cryptocurrency extra data - Ethereum Classic

    • kaggle.com
    zip
    Updated Jan 19, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Ethereum Classic [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-ethereum-classic
    Explore at:
    zip(1259913408 bytes)Available download formats
    Dataset updated
    Jan 19, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset that automatically collects market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets, and both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human-readable asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great notebook series on the SIIM-ISIC melanoma detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. The dataset currently uses an approximation computed from the Open, High, Low, Close, and Volume candlestick fields. [Waiting for competition hosts' input]
    • Target Labeling: The computed target mismatches the original target provided by the hosts at some time intervals; at all others it is identical. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: Zero-volume rows are not filtered out yet.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.
