100+ datasets found
  1. PTB-XL, a large publicly available electrocardiography dataset

    • physionet.org
    • maplerate.net
    Updated Nov 9, 2022
    + more versions
    Cite
    Patrick Wagner; Nils Strodthoff; Ralf-Dieter Bousseljot; Wojciech Samek; Tobias Schaeffter (2022). PTB-XL, a large publicly available electrocardiography dataset [Dataset]. http://doi.org/10.13026/kfzx-aw45
    Explore at:
    Dataset updated
    Nov 9, 2022
    Authors
    Patrick Wagner; Nils Strodthoff; Ralf-Dieter Bousseljot; Wojciech Samek; Tobias Schaeffter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms as diagnosis support systems promise large reliefs for the medical personnel - only on the basis of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

    The PTB-XL ECG dataset is a large dataset of 21799 clinical 12-lead ECGs from 18869 patients of 10 second length. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. The in total 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.
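
    The PhysioNet distribution is commonly loaded with the wfdb Python package. The sketch below is a minimal, hedged example: it assumes the file layout and column names documented on the project page (ptbxl_database.csv with ecg_id, scp_codes, strat_fold and filename_lr columns), and the fold convention (folds 1-8 train, 9 validation, 10 test) is the one used with the recommended splits mentioned above.

    ```python
    # Minimal loading sketch for PTB-XL (assumes the documented file layout).
    import ast

    import pandas as pd
    import wfdb

    PTBXL_ROOT = "path/to/ptb-xl/"  # placeholder path to the extracted dataset

    # Metadata: one row per ECG; scp_codes is a stringified dict of statement -> likelihood.
    meta = pd.read_csv(PTBXL_ROOT + "ptbxl_database.csv", index_col="ecg_id")
    meta["scp_codes"] = meta["scp_codes"].apply(ast.literal_eval)

    # Recommended split via the strat_fold column (1-8 train, 9 validation, 10 test).
    train = meta[meta.strat_fold <= 8]
    test = meta[meta.strat_fold == 10]

    # Read one 100 Hz waveform; filename_lr points into records100/.
    signal, fields = wfdb.rdsamp(PTBXL_ROOT + meta.iloc[0]["filename_lr"])
    print(signal.shape, fields["fs"], fields["sig_name"])
    ```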

  2. ECG Images dataset of Cardiac Patients

    • data.mendeley.com
    Updated Mar 19, 2021
    Cite
    Ali Haider Khan (2021). ECG Images dataset of Cardiac Patients [Dataset]. http://doi.org/10.17632/gwbz3fsgp8.2
    Explore at:
    Dataset updated
    Mar 19, 2021
    Authors
    Ali Haider Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ECG images dataset of Cardiac Patients created under the auspices of Ch. Pervaiz Elahi Institute of Cardiology Multan, Pakistan that aims to help the scientific community for conducting the research for Cardiovascular diseases.

  3. CODE-test: An annotated 12-lead ECG dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 7, 2021
    + more versions
    Cite
    Antonio H Ribeiro; Antonio H Ribeiro; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Gabriela M. Paixão; Derick M. Oliveira; Derick M. Oliveira; Paulo R. Gomes; Paulo R. Gomes; Jéssica A. Canazart; Jéssica A. Canazart; Milton P. Ferreira; Milton P. Ferreira; Carl R. Andersson; Carl R. Andersson; Peter W. Macfarlane; Peter W. Macfarlane; Wagner Meira Jr.; Wagner Meira Jr.; Thomas B. Schön; Thomas B. Schön; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro (2021). CODE-test: An annotated 12-lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3765780
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonio H Ribeiro; Antonio H Ribeiro; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Gabriela M. Paixão; Derick M. Oliveira; Derick M. Oliveira; Paulo R. Gomes; Paulo R. Gomes; Jéssica A. Canazart; Jéssica A. Canazart; Milton P. Ferreira; Milton P. Ferreira; Carl R. Andersson; Carl R. Andersson; Peter W. Macfarlane; Peter W. Macfarlane; Wagner Meira Jr.; Wagner Meira Jr.; Thomas B. Schön; Thomas B. Schön; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # Annotated 12 lead ECG dataset
    
    Contains 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students. It is used as the test set in the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network": https://www.nature.com/articles/s41467-020-15432-4.
    
    It contains annotations for 6 different ECG abnormalities:
    - 1st degree AV block (1dAVb);
    - right bundle branch block (RBBB);
    - left bundle branch block (LBBB);
    - sinus bradycardia (SB);
    - atrial fibrillation (AF); and,
    - sinus tachycardia (ST).
    
    Companion python scripts are available in:
    https://github.com/antonior92/automatic-ecg-diagnosis
    
    --------
    
    Citation
    ```
    Ribeiro, A.H., Ribeiro, M.H., Paixão, G.M.M. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11, 1760 (2020). https://doi.org/10.1038/s41467-020-15432-4
    ```
    
    Bibtex:
    ```
    @article{ribeiro_automatic_2020,
     title = {Automatic Diagnosis of the 12-Lead {{ECG}} Using a Deep Neural Network},
     author = {Ribeiro, Ant{\^o}nio H. and Ribeiro, Manoel Horta and Paix{\~a}o, Gabriela M. M. and Oliveira, Derick M. and Gomes, Paulo R. and Canazart, J{\'e}ssica A. and Ferreira, Milton P. S. and Andersson, Carl R. and Macfarlane, Peter W. and Meira Jr., Wagner and Sch{\"o}n, Thomas B. and Ribeiro, Antonio Luiz P.},
     year = {2020},
     volume = {11},
     pages = {1760},
     doi = {https://doi.org/10.1038/s41467-020-15432-4},
     journal = {Nature Communications},
     number = {1}
    }
    ```
    -----
    
    
    ## Folder content:
    
    - `ecg_tracings.hdf5`: The HDF5 file containing a single dataset named `tracings`. This dataset is a `(827, 4096, 12)` tensor. The first dimension corresponds to the 827 different exams from different patients; the second dimension corresponds to the 4096 signal samples; the third dimension to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`.
    
    The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples), we fill them with zeros on both sides. For instance, for a 7 seconds ECG signal with 2800 samples we include 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset. All signals are represented as floating point numbers at the scale 1e-4V: so they should be multiplied by 1000 in order to obtain the signals in V.
    
    In Python, the file can be read as follows:
    ```python
    import h5py
    import numpy as np

    # "ecg_tracings.hdf5" is the file distributed with this dataset.
    with h5py.File("ecg_tracings.hdf5", "r") as f:
        x = np.array(f["tracings"])  # shape (827, 4096, 12)
    ```
    
    - The file `attributes.csv` contains basic patient attributes: sex (M or F) and age. It
    contains 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` corresponds to the i-th line.
    - `annotations/`: folder containing annotations in csv format. Each csv file contains 827 lines (plus the header); the i-th line corresponds to the i-th tracing in `ecg_tracings.hdf5`, consistently across all csv files. The csv files all have 6 columns `1dAVb, RBBB, LBBB, SB, AF, ST`
    corresponding to whether the annotator detected the abnormality in the ECG (`=1`) or not (`=0`).
     1. `cardiologist[1,2].csv` contain annotations from two different cardiologists.
     2. `gold_standard.csv` gold standard annotation for this test dataset. When cardiologist 1 and cardiologist 2 agreed, the common diagnosis was considered as the gold standard. In cases where there was any disagreement, a third senior specialist, aware of the annotations from the other two, decided the diagnosis.
     3. `dnn.csv` predictions from the deep neural network described in the paper. The threshold is set in such a way that it maximizes the F1 score (a small comparison sketch follows this list).
     4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
     5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
     6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).
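
    As a hedged sketch (only the file names and the six label columns listed above are assumed), the annotation CSVs can be compared against the gold standard, for example to recompute the per-class F1 score of the DNN predictions:

    ```python
    # Sketch: per-class agreement of dnn.csv with gold_standard.csv.
    import pandas as pd
    from sklearn.metrics import f1_score

    CLASSES = ["1dAVb", "RBBB", "LBBB", "SB", "AF", "ST"]

    gold = pd.read_csv("annotations/gold_standard.csv")
    dnn = pd.read_csv("annotations/dnn.csv")

    for c in CLASSES:
        print(c, round(f1_score(gold[c], dnn[c]), 3))
    ```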
    
  4. PTB Diagnostic ECG Database

    • physionet.org
    Updated Sep 25, 2004
    Cite
    Ralf-Dieter Bousseljot (2004). PTB Diagnostic ECG Database [Dataset]. http://doi.org/10.13026/C28C71
    Explore at:
    Dataset updated
    Sep 25, 2004
    Authors
    Ralf-Dieter Bousseljot
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Physikalisch-Technische Bundesanstalt (PTB), the National Metrology Institute of Germany, has provided this compilation of digitized ECGs for research, algorithmic benchmarking or teaching purposes to the users of PhysioNet. The ECGs were collected from healthy volunteers and patients with different heart diseases by Professor Michael Oeff, M.D.

  5. Harvard-Emory ECG Database

    • bdsp.io
    Updated Nov 6, 2024
    + more versions
    Cite
    Zuzana Koscova; Valdery Moura Junior; Matthew Reyna; Shenda Hong; Aditya Gupta; Manohar Ghanta; Reza Sameni; Jonathan Rosand; Aaron Aguirre; Qiao Li; Gari Clifford; M Brandon Westover (2024). Harvard-Emory ECG Database [Dataset]. http://doi.org/10.60508/13rj-5d45
    Explore at:
    Dataset updated
    Nov 6, 2024
    Authors
    Zuzana Koscova; Valdery Moura Junior; Matthew Reyna; Shenda Hong; Aditya Gupta; Manohar Ghanta; Reza Sameni; Jonathan Rosand; Aaron Aguirre; Qiao Li; Gari Clifford; M Brandon Westover
    License

    https://github.com/bdsp-core/bdsp-license-and-dua

    Description

    The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.

    In version 1.0 of the database, these ECGs were provided without labels or metadata, to enable pre-training of ECG analysis models.

    In version 2.0, labels and metadata are included.

    HEEDB is published as part of the Human Sleep Project (HSP), funded by a grant (R01HL161253) from the National Heart Lung and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.

  6. PTB-XL ECG dataset

    • kaggle.com
    • opendatalab.com
    Updated Feb 3, 2021
    Cite
    khyeh (2021). PTB-XL ECG dataset [Dataset]. https://www.kaggle.com/khyeh0719/ptb-xl-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 3, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    khyeh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Source: https://physionet.org/content/ptb-xl/1.0.1/

    Abstract

    Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms as diagnosis support systems promise large reliefs for the medical personnel - only on the basis of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

    The PTB-XL ECG dataset is a large dataset of 21837 clinical 12-lead ECGs from 18885 patients of 10 second length. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. The in total 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.

    Background

    The waveform data underlying the PTB-XL ECG dataset was collected with devices from Schiller AG over the course of nearly seven years between October 1989 and June 1996. With the acquisition of the original database from Schiller AG, the full usage rights were transferred to the PTB. The records were curated and converted into a structured database within a long-term project at the Physikalisch-Technische Bundesanstalt (PTB). The database was used in a number of publications, see e.g. [1,2], but the access remained restricted until now. The Institutional Ethics Committee approved the publication of the anonymous data in an open-access database (PTB-2020-1). During the public release process in 2019, the existing database was streamlined with particular regard to usability and accessibility for the machine learning community. Waveform and metadata were converted to open data formats that can easily be processed by standard software.

    Methods

    Data Acquisition

    1. Raw signal data was recorded and stored in a proprietary compressed format. For all signals, we provide the standard set of 12 leads (I, II, III, AVL, AVR, AVF, V1, ..., V6) with reference electrodes on the right arm.
    2. The corresponding general metadata (such as age, sex, weight and height) was collected in a database.
    3. Each record was annotated with a report string (generated by a cardiologist or by automatic interpretation by the ECG device), which was converted into a standardized set of SCP-ECG statements (scp_codes). For most records, the heart’s axis (heart_axis) and infarction stadium (infarction_stadium1 and infarction_stadium2, if present) were also extracted.
    4. A large fraction of the records was validated by a second cardiologist.
    5. All records were validated by a technical expert focusing mainly on signal characteristics.

    Data Preprocessing

    ECGs and patients are identified by unique identifiers (ecg_id and patient_id). Personal information in the metadata, such as names of validating cardiologists, nurses and recording site (hospital etc.) of the recording, was pseudonymized. The date of birth is given only as age at the time of the ECG recording; in compliance with HIPAA standards, ages of more than 89 years appear in the range of 300 years. Furthermore, all ECG recording dates were shifted by a random offset for each patient. The ECG statements used for annotating the records follow the SCP-ECG standard [3].

    Data Description

    In general, the dataset is organized as follows:

    ptbxl
    ├── ptbxl_database.csv
    ├── scp_statements.csv
    ├── records100
    │   ├── 00000
    │   │   ├── 00001_lr.dat
    │   │   ├── 00001_lr.hea
    │   │   ├── ...
    │   │   ├── 00999_lr.dat
    │   │   └── 00999_lr.hea
    │   ├── ...
    │   └── 21000
    │       ├── 21001_lr.dat
    │       ├── 21001_lr.hea
    │       ├── ...
    │       ├── 21837_lr.dat
    │       └── 21837_lr.hea
    └── records500
        ├── 00000
        │   ├── 00001_hr.dat
        │   ├── 00001_hr.hea
        │   ├── ...
        │   ├── 00999_hr.dat
        │   └── 00999_hr.hea
        ├── ...
        └── 21000
            ├── 21001_hr.dat
            ├── 21001_hr.hea
            ├── ...
            ├── 21837_hr.dat
            └── 21837_hr.hea

    The dataset comprises 21837 clinical 12-lead ECG records of 10 seconds length from 18885 patients, where 52% are male and 48% are female, with ages covering the whole range from 0 to 95 years (median 62 and interquartile range of 22). The value of the dataset results from the comprehensive collection of many different co-occurring path...
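
    A common first step is to aggregate the per-record SCP codes into diagnostic superclasses using scp_statements.csv. The sketch below follows the approach of the official PTB-XL example code; the diagnostic and diagnostic_class column names are taken from that code and should be treated as assumptions here.

    ```python
    # Sketch: map SCP codes to diagnostic superclasses via scp_statements.csv.
    import ast

    import pandas as pd

    root = "path/to/ptb-xl/"  # placeholder
    meta = pd.read_csv(root + "ptbxl_database.csv", index_col="ecg_id")
    meta["scp_codes"] = meta["scp_codes"].apply(ast.literal_eval)

    scp = pd.read_csv(root + "scp_statements.csv", index_col=0)
    diagnostic = scp[scp.diagnostic == 1]  # keep only diagnostic statements

    def to_superclasses(codes):
        # Collect the superclass of every diagnostic statement attached to a record.
        return sorted({diagnostic.loc[c, "diagnostic_class"] for c in codes if c in diagnostic.index})

    meta["diagnostic_superclass"] = meta["scp_codes"].apply(to_superclasses)
    print(meta["diagnostic_superclass"].head())
    ```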

  7. ECG in High Intensity Exercise Dataset

    • zenodo.org
    • opendatalab.com
    • +2more
    zip
    Updated Dec 26, 2021
    + more versions
    Cite
    Elisabetta De Giovanni; Elisabetta De Giovanni; Tomas Teijeiro; Tomas Teijeiro; David Meier; Grégoire Millet; Grégoire Millet; David Atienza; David Atienza; David Meier (2021). ECG in High Intensity Exercise Dataset [Dataset]. http://doi.org/10.5281/zenodo.5727800
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 26, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Elisabetta De Giovanni; Elisabetta De Giovanni; Tomas Teijeiro; Tomas Teijeiro; David Meier; Grégoire Millet; Grégoire Millet; David Atienza; David Atienza; David Meier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R peak detection algorithms during high intensity exercise.

    Protocol of the experiments
    The protocol of the experiment was the following.

    • 22 subjects performing a cardio-pulmonary maximal exercise test on a cycle ergometer, using a gas mask. A single-lead electrocardiogram (ECG) was measured using the BIOPAC system.
    • An initial 3 min of rest were recorded.
    • After this baseline, the subjects started cycling at a power of 60W or 90W depending on their fitness level.
    • Then, the power of the cycle ergometer was increased by 30W every 3 min till exhaustion (in terms of maximum oxygen uptake or VO2max).
    • Finally, physiology experts assessed the so-called ventilatory thresholds and the VO2max based on the pulmonary data (volume of oxygen and CO2).

    Description of the extracted dataset

    The characteristics of the dataset are the following:

    • We report only 20 out of 22 subjects that were used for the analysis, because for two subjects the signals were too corrupted or not complete. Specifically, subjects 5 and 12 were discarded.
    • The ECG signal was sampled at 500 Hz and then downsampled to 250 Hz. The original ECG signals were measured at a maximum of 10 mV. They were then scaled down by a factor of 1000, hence the data is represented in uV.
    • For each subject, 5 segments of 20 s were extracted from the ECG recordings and chosen based on different phases of the maximal exercise test (i.e., before and after the so-called second ventilatory threshold or VT2, before and in the middle of VO2max, and during the recovery after exhaustion) to represent different intensities of physical activity.

    seg1 --> [VT2-50,VT2-30]
    seg2 --> [VT2+60,VT2+80]
    seg3 --> [VO2max-50,VO2max-30]
    seg4 --> [VO2max-10,VO2max+10]
    seg5 --> [VO2max+60,VO2max+80]

    • The R peak locations were manually annotated in all segments and reviewed by a physician of the Lausanne University Hospital, CHUV. Only segment 5 of subject 9 could not be annotated since there was a problem with the input signal. So, the total number of segments extracted was 20 * 5 - 1 = 99.

    Format of the extracted dataset

    The dataset is divided in two main folders:

    • The folder `ecg_segments/` contains the ECG signals saved in two formats, `.csv` and `.mat`. This folder includes both raw (`ecg_raw`) and processed (`ecg`) signals. The processing consists of a morphological filtering and a relative energy non filtering method to enhance the R peaks. The `.csv` files contain only the signal, while the `.mat` files include the signal, the time vector within the maximal stress test, the sampling frequency and the unit of the signal amplitude (uV, as we mentioned before).
    • The folder `manual_annotations/` contains the sample indices of the annotated R peaks in `.csv` format. The annotation was done on the processed signals (a loading sketch follows this list).
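
    A hedged loading sketch follows; only the folder names, the 250 Hz sampling rate and the uV unit are taken from the description above, while the individual file names are hypothetical placeholders.

    ```python
    # Sketch: load one processed 20 s segment and its annotated R-peak indices.
    import numpy as np

    FS = 250  # Hz after downsampling, as stated above

    ecg_uv = np.loadtxt("ecg_segments/ecg_subject01_seg1.csv", delimiter=",")            # hypothetical file name
    r_peaks = np.loadtxt("manual_annotations/subject01_seg1.csv", delimiter=",").astype(int)

    # Mean heart rate over the segment from the annotated R-R intervals.
    rr_seconds = np.diff(r_peaks) / FS
    print("mean HR: %.1f bpm" % (60.0 / rr_seconds.mean()))
    ```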
  8. Data from: TELE ECG Database: 250 telehealth ECG records (collected using...

    • dataverse.harvard.edu
    Updated Sep 6, 2016
    Cite
    Heba Khamis; Robert Weiss; Yang Xie; Chan-Wei Chang; Nigel H. Lovell; Stephen J. Redmond (2016). TELE ECG Database: 250 telehealth ECG records (collected using dry metal electrodes) with annotated QRS and artifact masks, and MATLAB code for the UNSW artifact detection and UNSW QRS detection algorithms [Dataset]. http://doi.org/10.7910/DVN/QTG0EP
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 6, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Heba Khamis; Robert Weiss; Yang Xie; Chan-Wei Chang; Nigel H. Lovell; Stephen J. Redmond
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    Australian Research Council
    Description

    CITATION

    Please cite this data and code as: H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond, "QRS detection algorithm for telehealth electrocardiogram recordings," IEEE Transactions on Biomedical Engineering, vol. 63(7), pp. 1377-1388, 2016.

    DATABASE DESCRIPTION

    The following description of the TELE database is from Khamis et al. (2016): "In Redmond et al (2012), 300 ECG single lead-I signals recorded in a telehealth environment are described. The data was recorded using the TeleMedCare Health Monitor (TeleMedCare Pty. Ltd. Sydney, Australia). This ECG is sampled at a rate of 500 Hz using dry metal Ag/AgCl plate electrodes which the patient holds with each hand; a reference electrode plate is also positioned under the pad of the right hand. Of the 300 recordings, 250 were selected randomly from 120 patients, and the remaining 50 were manually selected from 168 patients to obtain a larger representation of poor quality data. Three independent scorers annotated the data by identifying sections of artifact and QRS complexes. All scorers then annotated the signals as a group, to reconcile the individual annotations. Sections of the ECG signal which were less than 5 s in duration were considered to be part of the neighboring artifact sections and were subsequently masked. QRS annotations in the masked regions were discarded prior to the artifact mask and QRS locations being saved. Of the 300 telehealth ECG records in Redmond et al. (2012), 50 records (including 29 of the 250 randomly selected records and 21 of the 50 manually selected records) were discarded as all annotated RR intervals within these records overlap with the annotated artifact mask and therefore, no heart rate can be calculated, which is required for measuring algorithm performance. The remaining 250 records will be referred to as the TELE database."

    For all 250 recordings in the TELE database, the mains frequency was 50 Hz, the sampling frequency was 500 Hz and the top and bottom rail voltages were 5.556912223578890 and -5.554198887532222 mV respectively.

    DATA FILE DESCRIPTION

    Each record in the TELE database is stored as an X_Y.dat file, where X indicates the index of the record in the TELE database (containing a total of 250 records) and Y indicates the index of the record in the original dataset containing 300 records (see Redmond et al. 2012). The .dat file is a comma separated values file. Each line contains:
    • the ECG sample value (mV)
    • a boolean indicating the locations of the annotated QRS complexes
    • a boolean indicating the visually determined mask
    • a boolean indicating the software determined mask (see Khamis et al. 2016)

    CONVERTING DATA TO MATLAB STRUCTURE

    A MATLAB function (readFromCSV_TELE.m) has been provided to read the .dat files into a MATLAB structure:

    % [DB,fm,fs,rail_mv] = readFromCSV_TELE(DATA_PATH)
    %
    % Extracts the data for each of the 250 telehealth ECG records of the TELE database [1]
    % and returns a structure containing all data, annotations and masks.
    %
    % IN:  DATA_PATH - String. The path containing the .hdr and .dat files
    %
    % OUT: DB - 1xM Structure. Contains the extracted data from the M (250) data files.
    %      The structure has fields:
    %        * data_orig_ind - 1x1 double. The index of the data file in the original dataset of 300 records (see [1]) - for tracking purposes.
    %        * ecg_mv - 1xN double. The ecg samples (mV). N is the number of samples for the data file.
    %        * qrs_annotations - 1xN double. The qrs complexes - value of 1 where a qrs is located and 0 otherwise.
    %        * visual_mask - 1xN double. The visually determined artifact mask - value of 1 where the data is masked and 0 otherwise.
    %        * software_mask - 1xN double. The software artifact mask - value of 1 where the data is masked and 0 otherwise.
    %      fm - 1x1 double. The mains frequency (Hz)
    %      fs - 1x1 double. The sampling frequency (Hz)
    %      rail_mv - 1x2 double. The bottom and top rail voltages (mV)
    %
    % If you use this code or data, please cite as follows:
    %
    % [1] H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond,
    %     "QRS detection algorithm...
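
    For users outside MATLAB, a minimal Python sketch is shown below; it assumes only the four-column .dat layout and the constants stated above, and the folder path is a placeholder.

    ```python
    # Sketch: load all TELE .dat records into Python dictionaries.
    import glob

    import numpy as np

    FS = 500  # sampling frequency (Hz)
    FM = 50   # mains frequency (Hz)

    records = []
    for path in sorted(glob.glob("tele/*_*.dat")):  # placeholder folder
        data = np.loadtxt(path, delimiter=",")
        records.append({
            "ecg_mv": data[:, 0],           # ECG sample values (mV)
            "qrs_annotations": data[:, 1],  # 1 at annotated QRS locations
            "visual_mask": data[:, 2],      # visually determined artifact mask
            "software_mask": data[:, 3],    # software-determined artifact mask
        })

    print(len(records), "records loaded")
    ```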

  9. ECG Dataset for Heart Condition Classification

    • data.mendeley.com
    Updated Dec 12, 2024
    Cite
    Ankur Ray (2024). ECG Dataset for Heart Condition Classification [Dataset]. http://doi.org/10.17632/xw9sd3btcs.2
    Explore at:
    Dataset updated
    Dec 12, 2024
    Authors
    Ankur Ray
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This ECG dataset comprises three distinct classes: normal, abnormal, and disease-specific cardiac signals. Collected from both healthy individuals and patients with heart conditions, the dataset provides labeled ECG recordings suitable for training machine learning models aimed at real-time health monitoring and cardiac disease prediction. Each class contains a balanced number of high-quality ECG images, offering a valuable resource for developing and evaluating AI-based diagnostic tools in healthcare.

  10. PTB Diagnostic ECG Database Dataset

    • paperswithcode.com
    Updated Sep 26, 2004
    + more versions
    Cite
    (2004). PTB Diagnostic ECG Database Dataset [Dataset]. https://paperswithcode.com/dataset/ptb
    Explore at:
    Dataset updated
    Sep 26, 2004
    Description

    The ECGs in this collection were obtained using a non-commercial, PTB prototype recorder with the following specifications:

    • 16 input channels (14 for ECGs, 1 for respiration, 1 for line voltage)
    • Input voltage: ±16 mV, compensated offset voltage up to ±300 mV
    • Input resistance: 100 Ω (DC)
    • Resolution: 16 bit with 0.5 μV/LSB (2000 A/D units per mV)
    • Bandwidth: 0 - 1 kHz (synchronous sampling of all channels)
    • Noise voltage: max. 10 μV (pp), respectively 3 μV (RMS) with input short circuit
    • Online recording of skin resistance
    • Noise level recording during signal collection

    The database contains 549 records from 290 subjects (aged 17 to 87, mean 57.2; 209 men, mean age 55.5, and 81 women, mean age 61.6; ages were not recorded for 1 female and 14 male subjects). Each subject is represented by one to five records. There are no subjects numbered 124, 132, 134, or 161. Each record includes 15 simultaneously measured signals: the conventional 12 leads (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, v6) together with the 3 Frank lead ECGs (vx, vy, vz). Each signal is digitized at 1000 samples per second, with 16 bit resolution over a range of ±16.384 mV. On special request to the contributors of the database, recordings may be available at sampling rates up to 10 kHz.

    Within the header (.hea) file of most of these ECG records is a detailed clinical summary, including age, gender, diagnosis, and where applicable, data on medical history, medication and interventions, coronary artery pathology, ventriculography, echocardiography, and hemodynamics. The clinical summary is not available for 22 subjects.
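
    As a hedged sketch, records can be read directly from PhysioNet with the wfdb Python package; the record name below is just an example, and the clinical summary is read from the header comments.

    ```python
    # Sketch: read one PTB Diagnostic ECG record and its clinical summary.
    import wfdb

    # Example record; other patientXXX/sYYYY records follow the same pattern.
    signal, fields = wfdb.rdsamp("s0010_re", pn_dir="ptbdb/patient001")
    print(signal.shape)        # (n_samples, 15): 12 standard leads + 3 Frank leads
    print(fields["fs"])        # 1000 samples per second
    print(fields["sig_name"])  # lead names

    header = wfdb.rdheader("s0010_re", pn_dir="ptbdb/patient001")
    print("\n".join(header.comments))  # age, sex, diagnosis, history, ...
    ```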

  11. Data from: ECG data for deep transfer learning

    • ieee-dataport.org
    • produccioncientifica.ucm.es
    • +1more
    Updated Feb 3, 2021
    + more versions
    Cite
    Joerg Schaefer (2021). ECG data for deep transfer learning [Dataset]. https://ieee-dataport.org/open-access/ecg-data-deep-transfer-learning
    Explore at:
    Dataset updated
    Feb 3, 2021
    Authors
    Joerg Schaefer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fall is a prominent issue due to its severe consequences both physically and mentally. Fall detection and prevention is a critical area of research because it can help elderly people to depend less on caregivers and allow them to live and move more independently. Using electrocardiograms (ECG) signals independently for fall detection and activity classification is a novel approach used in this paper.

  12. MIMIC-IV-ECG Dataset

    • paperswithcode.com
    Updated Dec 24, 2022
    Cite
    (2022). MIMIC-IV-ECG Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-ecg
    Explore at:
    Dataset updated
    Dec 24, 2022
    Description

    The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
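
    A hedged linking sketch is shown below, assuming a local copy of the PhysioNet module with a record_list.csv index; the file and column names are assumptions and should be verified against the distributed files.

    ```python
    # Sketch: join MIMIC-IV-ECG waveforms to MIMIC-IV clinical data on subject_id.
    # record_list.csv and its columns are assumptions; verify against the module files.
    import pandas as pd
    import wfdb

    ECG_ROOT = "path/to/mimic-iv-ecg/"   # placeholder local paths
    HOSP_ROOT = "path/to/mimic-iv/hosp/"

    records = pd.read_csv(ECG_ROOT + "record_list.csv")    # one row per ECG study
    patients = pd.read_csv(HOSP_ROOT + "patients.csv.gz")  # MIMIC-IV patients table

    linked = records.merge(patients, on="subject_id", how="left")

    # Read the first waveform: 12 leads, 10 s at 500 Hz.
    signal, fields = wfdb.rdsamp(ECG_ROOT + linked.iloc[0]["path"])
    print(signal.shape, fields["fs"])
    ```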

  13. CODE-15%: a large scale annotated dataset of 12-lead ECGs

    • zenodo.org
    csv, zip
    Updated Jan 8, 2025
    + more versions
    Cite
    Antônio H. Ribeiro; Antônio H. Ribeiro; Gabriela M.M. Paixao; Gabriela M.M. Paixao; Emilly M. Lima; Emilly M. Lima; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Marcelo M. Pinto Filho; Marcelo M. Pinto Filho; Paulo R. Gomes; Paulo R. Gomes; Derick M. Oliveira; Derick M. Oliveira; Wagner Meira Jr; Wagner Meira Jr; Thömas B Schon; Thömas B Schon; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro (2025). CODE-15%: a large scale annotated dataset of 12-lead ECGs [Dataset]. http://doi.org/10.5281/zenodo.4916206
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antônio H. Ribeiro; Antônio H. Ribeiro; Gabriela M.M. Paixao; Gabriela M.M. Paixao; Emilly M. Lima; Emilly M. Lima; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Marcelo M. Pinto Filho; Marcelo M. Pinto Filho; Paulo R. Gomes; Paulo R. Gomes; Derick M. Oliveira; Derick M. Oliveira; Wagner Meira Jr; Wagner Meira Jr; Thömas B Schon; Thömas B Schon; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of 12-lead ECGs with annotations. The dataset contains 345 779 exams from 233 770 patients. It was obtained through stratified sampling from the CODE dataset (15% of the patients). The data was collected by the Telehealth Network of Minas Gerais in the period between 2010 and 2016.

    This repository contains the files `exams.csv` and the files `exams_part{i}.zip` for i = 0, 1, 2, ... 17.

    • "exams.csv": is a comma-separated values (csv) file containing the columns
      • "exam_id": id used for identifying the exam;
      • "age": patient age in years at the moment of the exam;
      • "is_male": true if the patient is male;
      • "nn_predicted_age": age predicted for the patient by a neural network, as described in the paper "Deep neural network estimated electrocardiographic-age as a mortality predictor" below;
      • "1dAVb": Whether or not the patient has 1st degree AV block;
      • "RBBB": Whether or not the patient has right bundle branch block;
      • "LBBB": Whether or not the patient has left bundle branch block;
      • "SB": Whether or not the patient has sinus bradycardia;
      • "AF": Whether or not the patient has atrial fibrillation;
      • "ST": Whether or not the patient has sinus tachycardia;
      • "patient_id": id used for identifying the patient;
      • "normal_ecg": True if the automatic annotation system says it is a normal ECG;
      • "death": true if the patient dies in the follow-up time. This data is available only in the first exam of the patient. Other exams will have this as an empty field;
      • "timey": if the patient dies it is the time to the death of the patient. If not, it is the follow-up time. This data is available only in the first exam of the patient. Other exams will have this as an empty field;
      • "trace_file": identifies the hdf5 file in which the tracing corresponding to this exam is located.
    • "exams_part{i}.hdf5": The HDF5 file containing two datasets, one named `tracings` and the other named `exam_id`. The `exam_id` is a tensor of dimension `(N,)` containing the exam id (the same as in the csv file) and the dataset `tracings` is a `(N, 4096, 12)` tensor containing the ECG tracings in the same order. The first dimension corresponds to the different exams; the second dimension corresponds to the 4096 signal samples; the third dimension to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`. The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples), we fill them with zeros on both sides. For instance, for a 7 seconds ECG signal with 2800 samples we include 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset.

      In Python, one can read this file using h5py.
      ```python
      import h5py
      import numpy as np

      # path_to_file points at one of the exams_part{i}.hdf5 files.
      f = h5py.File(path_to_file, 'r')
      # Get ids
      exam_ids = np.array(f['exam_id'])
      x = f['tracings']
      ```
      The `tracings` dataset is too large to fit in memory, so don't convert it to a numpy array all at once.
      It is possible to access a chunk of it using: ``x[start:end, :, :]``.
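
      A hedged sketch tying the two parts together, using only the columns and dataset names described above (the trace_file value is assumed to name the hdf5 file directly):

      ```python
      # Sketch: fetch the tracing that belongs to one row of exams.csv.
      import h5py
      import numpy as np
      import pandas as pd

      exams = pd.read_csv("exams.csv")
      row = exams.iloc[0]

      with h5py.File(row["trace_file"], "r") as f:  # e.g. one of the exams_part{i}.hdf5 files
          exam_ids = np.array(f["exam_id"])
          idx = int(np.flatnonzero(exam_ids == row["exam_id"])[0])
          tracing = f["tracings"][idx]              # (4096, 12), zero padded, 400 Hz

      print(row["exam_id"], tracing.shape)
      ```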

    The CODE dataset was collected by the Telehealth Network of Minas Gerais (TNMG) in the period between 2010 and 2016. TNMG is a public telehealth system assisting 811 out of the 853 municipalities in the state of Minas Gerais, Brazil. The dataset is described in:

    Ribeiro, Antônio H., Manoel Horta Ribeiro, Gabriela M. M. Paixão, Derick M. Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton P. S. Ferreira, et al. “Automatic Diagnosis of the 12-Lead ECG Using a Deep Neural Network.” Nature Communications 11, no. 1 (2020): 1760. https://doi.org/10.1038/s41467-020-15432-4

    The CODE 15% dataset is obtained from stratified sampling from the CODE dataset. This subset of the CODE dataset is described in, and used for assessing model performance in:
    "Deep neural network estimated electrocardiographic-age as a mortality predictor"
    Emilly M Lima, Antônio H Ribeiro, Gabriela MM Paixão, Manoel Horta Ribeiro, Marcelo M Pinto Filho, Paulo R Gomes, Derick M Oliveira, Ester C Sabino, Bruce B Duncan, Luana Giatti, Sandhi M Barreto, Wagner Meira Jr, Thomas B Schön, Antonio Luiz P Ribeiro. medRxiv (2021). https://www.doi.org/10.1101/2021.02.19.21251232

    The companion code for reproducing the experiments in the two papers described above can be found, respectively, in:
    - https://github.com/antonior92/automatic-ecg-diagnosis; and in,
    - https://github.com/antonior92/ecg-age-prediction.

    Note about authorship: Antônio H. Ribeiro, Emilly M. Lima and Gabriela M.M. Paixão contributed equally to this work.

  14. ECG signals (744 fragments)

    • figshare.com
    • ieee-dataport.org
    • +1more
    zip
    Updated May 31, 2023
    + more versions
    Cite
    Paweł Pławiak (2023). ECG signals (744 fragments) [Dataset]. http://doi.org/10.6084/m9.figshare.5601664.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Paweł Pławiak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For research purposes, the ECG signals were obtained from the PhysioNet service (http://www.physionet.org) from the MIT-BIH Arrhythmia database. The created database with ECG signals is described below.
    1) The ECG signals were from 29 patients: 15 female (age: 23-89) and 14 male (age: 32-89).
    2) The ECG signals contained 17 classes: normal sinus rhythm, pacemaker rhythm, and 15 types of cardiac dysfunctions (for each of which at least 10 signal fragments were collected).
    3) All ECG signals were recorded at a sampling frequency of 360 [Hz] and a gain of 200 [adu/mV].
    4) For the analysis, 744 non-overlapping 10-second (3600-sample) fragments of the ECG signal were randomly selected.
    5) Only signals derived from one lead, the MLII, were used.
    6) Data are in mat format (Matlab).
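
    A minimal loading sketch, assuming only that each fragment is stored as a single array in a MATLAB .mat file (the internal variable name is not documented above, so it is discovered at run time rather than assumed):

    ```python
    # Sketch: load one 10 s MLII fragment (3600 samples at 360 Hz) from a .mat file.
    from scipy.io import loadmat

    mat = loadmat("path/to/fragment.mat")              # placeholder path
    keys = [k for k in mat if not k.startswith("__")]  # skip MATLAB bookkeeping entries
    print("variables in file:", keys)

    ecg = mat[keys[0]].squeeze()
    print(ecg.shape)                                   # expected: (3600,)
    ```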

  15. NCKU CBIC ECG Database

    • figshare.com
    zip
    Updated Sep 26, 2023
    Cite
    Tseng Wei-Cheng; Zhong Tai-Siang; Lee Shuenn-Yuh; Chen Ju-Yi (2023). NCKU CBIC ECG Database [Dataset]. http://doi.org/10.6084/m9.figshare.23807286.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Tseng Wei-Cheng; Zhong Tai-Siang; Lee Shuenn-Yuh; Chen Ju-Yi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The NCKU CBIC ECG database collects ECG data from 6 different patients. The patient information has been anonymized, and each patient has signed a consent form to ensure the legitimacy of data usage. Four hours of lead II ECG per day were collected for each patient to capture the different physiological states at different times of the day, and the database provides labels for motion artifact and baseline wandering, which are invalid signals for diagnosis; these labels prevent physicians from using the noisy signal for diagnosis. These data were collected using Patch [1] at Ministry of Health and Welfare Tainan Hospital, and the included data have been approved by the Institutional Review Board (IRB).

    Background

    Technology and medical treatment are highly developed in the 21st century, and people have more irregular daily routines and greater life pressure. Cardiovascular disease has become a tough nut to crack as the change in lifestyle is coupled with the aging of society, and the age distribution of patients is wider than ever. A wealth of health information can be obtained through electrocardiogram (ECG) measurement, including cardiac arrhythmias. Severe arrhythmias lead to many problems, including palpitations, chest tightness, dizziness, shock, and even life-threatening conditions. Therefore, the monitoring of the ECG signal is quite essential. To do our part in the study of arrhythmia, our team started patient enrollment after gaining the permission of the National Cheng Kung University Hospital Institutional Review Board (NCKUH IRB No. B-ER-104-379) in 2018. We have selected a total of 128 patients' 24-hour ECG data so far. The arrhythmia labels are confirmed by the cardiologist Ju-Yi Chen at NCKUH. Finally, we selected 6 patients from the received signals and made them into a database for researchers to access.

    Methods

    The NCKU CBIC ECG database contains the ECG recordings from 6 subjects. The signals were collected in Tainan Hospital (Ministry of Health and Welfare) via an ECG acquisition device [1] developed by Your health technology Co., Ltd. The sampling frequency is 400 Hz, and the ADC resolution is 12 bits. The age distribution of subjects was from 24 to 76 years old, and each patient was measured on lead II for 24 hours. After the signal is recorded, four cleaner segments in the morning, noon, evening, and midnight are selected, and each segment is one hour long. The heartbeat of the human body differs between sleep and wakefulness, and some arrhythmia types often occur during sleep; since it is hard to detect some arrhythmias at a specific time of day, we choose signal segments from different time periods for each patient, which is more representative of the daily heartbeat condition. It is worth mentioning that the ECG signals from the 6th subject contain too much noise in the daytime due to his occupation, so the segments from 22:00 to 02:00 were selected. We have collected a total of 128 patients from Tainan Hospital since 2018. Since most of the patients' ECG data are normal beats, we finally selected the ECG data of six patients which contain clinically significant arrhythmia. The database provides two particular label types for motion artifact and baseline wandering, which are caused by body movement during ECG acquisition. In actual situations, cardiologists do not use noisy signals as a basis for diagnosis; therefore, these two specific labels prevent physicians from using noise to make a diagnosis. The original data are first compared with the Holter report, and the R peak positions and beat labels are manually marked. The data were then given to a professional cardiologist, Ju-Yi Chen, for verification. The cardiologist checked the correctness and position of the beat labels and chose acceptable signal segments of high quality.

    Introduction of Ju-Yi Chen: Ju-Yi Chen was born in Tainan, Taiwan, in 1974. He received the M.S. degree from Chang Gung University, Taoyuan City, Taiwan, in 1999 and the Ph.D. degree from National Cheng Kung University, Tainan, in 2013. Since 2021, he has been a Professor at the Department of Internal Medicine, National Cheng Kung University. His current research interests include cardiovascular diseases, including arrhythmias, hypertension, arterial stiffness, and cardiac implantable electric devices.

    Data Description

    The file structure and naming rule are described as follows (see the loading sketch below):
    • [The subject number]_[The measurement time]: the directory name
    • OUTPUT_ECG_data.csv: the one-hour ECG signals (unit: 0.1 V)
    • OUTPUT_peak_label.csv: the arrhythmia type label of each R peak
    • OUTPUT_peak_position.csv: the position of each R peak
    e.g., the 1_0100 directory contains subject No. 1's data measured at 01:00.

    Arrhythmia diseases and the corresponding label codes:
    0: Normal
    1: Atrial Fibrillation
    2: Supraventricular Tachycardia
    3: Premature Ventricular Contraction
    4: Atrial Premature Contraction
    5: Motion Artifact
    6: Wandering
    7: First degree AV block
    8: Atrial Flutter
    PS: Wandering represents a baseline drifted by 1 mV.

    Patient information:
    Subject 1: Male, 61 years
    Subject 2: Female, 77 years
    Subject 3: Male, 63 years
    Subject 4: Male, 64 years
    Subject 5: Male, 24 years
    Subject 6: Male, 64 years

    Usage Notes

    Few public ECG databases provide long-term ECG. Our goal in creating the database is to help understand what a person's ECG looks like over a day, and this database is particularly valuable for obtaining long-term ECG.

    Ethics

    Our team has cooperated with National Cheng Kung University Hospital and Tainan Hospital. All the patients enrolled gave their informed consent to participate in the study. The certification of safety-related IEC standards and human study approval have all been acquired.

    Conflicts of Interest

    The authors declare that there are no known conflicts of interest.

    References

    [1] S.-Y. Lee, P.-W. Huang, M.-C. Liang, J.-H. Hong, and J.-Y. Chen, "Development of an arrhythmia monitoring system and human study," IEEE Transactions on Consumer Electronics, vol. 64, no. 4, pp. 442-451, 2018.
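
    A hedged loading sketch, assuming each OUTPUT_*.csv file holds a single unnamed column (the label-code table above supplies the class names):

    ```python
    # Sketch: read one subject/segment directory (e.g. 1_0100/) and attach the
    # label-code names from the table above to each annotated R peak.
    import pandas as pd

    LABELS = {
        0: "Normal", 1: "Atrial Fibrillation", 2: "Supraventricular Tachycardia",
        3: "Premature Ventricular Contraction", 4: "Atrial Premature Contraction",
        5: "Motion Artifact", 6: "Wandering", 7: "First degree AV block",
        8: "Atrial Flutter",
    }

    seg_dir = "1_0100/"  # subject 1, measured at 01:00

    ecg = pd.read_csv(seg_dir + "OUTPUT_ECG_data.csv", header=None).iloc[:, 0]
    peaks = pd.read_csv(seg_dir + "OUTPUT_peak_position.csv", header=None).iloc[:, 0]
    codes = pd.read_csv(seg_dir + "OUTPUT_peak_label.csv", header=None).iloc[:, 0]

    beats = pd.DataFrame({"sample": peaks, "label": codes.map(LABELS)})
    print(len(ecg), "samples at 400 Hz")
    print(beats["label"].value_counts())
    ```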

  16. Data from: MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset

    • registry.opendata.aws
    • physionet.org
    Updated Dec 19, 2024
    + more versions
    Cite
    PhysioNet (2024). MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset [Dataset]. https://registry.opendata.aws/mimic-iv-ecg/
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    PhysioNet (https://physionet.org/)
    Description

    The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.

  17. ECG dataset 2023 by National Heart Foundation Bangladesh (NHFB)

    • zenodo.org
    zip
    Updated Sep 22, 2024
    Cite
    NHFB (2024). ECG dataset 2023 by National Heart Foundation Bangladesh (NHFB) [Dataset]. http://doi.org/10.5281/zenodo.13825810
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 22, 2024
    Dataset provided by
    NHFB
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 12, 2023
    Area covered
    Bangladesh
    Description

    Dataset Description: ECG Data Categorized by Cardiac Condition

    This dataset comprises electrocardiogram (ECG) data organized into three distinct categories based on patient cardiac health. The data was collected by the National Heart Foundation Bangladesh (NHFB) from June 2023 to December 2023.

    1. Arrhythmia Patients: This category contains ECG data from individuals diagnosed with cardiac arrhythmias, characterized by irregular heart rhythms. The data within this category may encompass various types of arrhythmias, requiring further sub-classification depending on the specific research objectives.

    2. Myocardial Patients: This category encompasses ECG data from patients experiencing myocardial issues, most likely referring to myocardial infarction (heart attack) or other diseases affecting the myocardium (heart muscle). The specific myocardial conditions represented within this category may require further specification depending on the dataset's scope and purpose.

    3. Normal Patients: This category serves as a control group and includes ECG data from individuals deemed to have healthy cardiac function. These individuals exhibit no clinically significant ECG abnormalities or diagnosed cardiac conditions.

    Dataset Structure:

    The dataset is structured into three folders, each corresponding to a specific patient category: "Arrhythmia Patient," "Myocardial Patient," and "Normal Patient."

    Potential Applications:

    This dataset can be utilized for various research and educational purposes, including:

    • Developing and evaluating algorithms for automated arrhythmia detection and classification.

    • Investigating the ECG characteristics associated with different myocardial conditions.

    • Training machine learning models for cardiac disease diagnosis and risk stratification.

    • Educating students and healthcare professionals on ECG interpretation and cardiac pathologies.

    Further Information:

    Detailed information regarding the data acquisition protocol, ECG recording parameters, patient demographics, and data annotation procedures is essential for comprehensive dataset utilization. Accessing relevant documentation accompanying the dataset is crucial for ensuring appropriate data interpretation and analysis.


  18. PMcardio ECG Image Database (PM-ECG-ID): A Diverse ECG Database for...

    • zenodo.org
    zip
    Updated Aug 31, 2024
    Cite
    Andrej Iring; Viera Krešňáková; Viera Krešňáková; Michal Hojcka; Vladimir Boza; Adam Rafajdus; Boris Vavrik; Andrej Iring; Michal Hojcka; Vladimir Boza; Adam Rafajdus; Boris Vavrik (2024). PMcardio ECG Image Database (PM-ECG-ID): A Diverse ECG Database for Evaluating Digitization Solutions [Dataset]. http://doi.org/10.5281/zenodo.13617673
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrej Iring; Viera Krešňáková; Viera Krešňáková; Michal Hojcka; Vladimir Boza; Adam Rafajdus; Boris Vavrik; Andrej Iring; Michal Hojcka; Vladimir Boza; Adam Rafajdus; Boris Vavrik
    License

    https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Description

    The dataset presents the collection of a diverse electrocardiogram (ECG) database for testing and evaluating ECG digitization solutions. The Powerful Medical ECG image database was curated using 100 ECG waveforms selected from the PTB-XL Digital Waveform Database, together with various images generated from the base waveforms with varying lead visibility and real-world paper deformations, including the use of different mobile phones, bends, crumples, scans, and photos of computer screens with ECGs. The ECG waveforms were augmented using various techniques, including changes in contrast, brightness, perspective transformation, rotation, image blur, JPEG compression, and resolution change. This extensive approach yielded 6,000 unique entries, providing a wide range of data variance and extreme cases to probe the limitations of ECG digitization solutions, improve their performance, and serve as a benchmark for evaluating them.

    PM-ECG-ID database contains electrocardiogram (ECG) images and their corresponding ECG information. The data records are organized in a hierarchical folder structure, which includes metadata, waveform data, and visual data folders. The contents of each folder are described below:

    • metadata.csv:
      This file serves as a key-to-key bridge between the image data and the corresponding ECG information (see the loading sketch after this list). It contains the following columns:
      • Image name: image name with extension,
      • ECG ID: this ID corresponds to the specific ECG identifier from the original PTB-XL dataset. Under this ID you can find a cutout array in the leads.npz and rhythms.npz,
      • Image relative path: relative path to the image in question,
      • Image page: page number of the particular image (starting from 0),
      • ECG number of pages: number of pages in the whole ECG,
      • ECG number of columns per page: number of columns per page in the ECG,
      • ECG number of rows per page: number of rows in the ECG,
      • ECG number of rhythm leads: number of rhythms in the ECG,
      • ECG format: short version of the ECG format.
    • data folder:
      • leads.npz: NPZ file containing all underlying cutout lead signals; each signal is there under its ECG ID.
      • rhythms.npz: NPZ file containing all underlying rhythm signals; each signal is there under its ECG ID. If no rhythm lead is in the ECG, you will find an empty array in the NPZ.
    • visual_data folder:
      This folder contains subfolders for various image data, including augmented photos and visualization and different types of photos of ECG printouts. The subfolders are organized based on the specific augmentation or type of photograph. These folders contain images with various augmentation settings, such as different levels of blur, brightness, contrast, padding, perspective transformation, resolution scaling, and rotation. The database is organized in a way that allows for easy navigation and understanding of the different augmentations applied to the image data. Each of these subfolders contains images relevant to the specific augmentation or type of photograph. The metadata.csv file provides a direct link to each image and its associated ECG information.
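
    A hedged sketch for pairing one image with its source signals via metadata.csv; the column names follow the list above, and the exact key format inside the NPZ files is an assumption.

    ```python
    # Sketch: look up the lead and rhythm signals behind one ECG image.
    import numpy as np
    import pandas as pd

    meta = pd.read_csv("metadata.csv")
    row = meta.iloc[0]

    leads = np.load("data/leads.npz", allow_pickle=True)
    rhythms = np.load("data/rhythms.npz", allow_pickle=True)

    key = str(row["ECG ID"])           # signals are stored under the ECG ID
    print(row["Image relative path"])  # image generated from this waveform
    print(leads[key].shape)            # cutout lead signals
    print(rhythms[key].shape if key in rhythms.files else "no rhythm lead")
    ```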
  19. PTB-XL Dataset

    • paperswithcode.com
    Updated Jul 12, 2023
    Cite
    (2024). PTB-XL Dataset [Dataset]. https://paperswithcode.com/dataset/ptb-xl
    Explore at:
    Dataset updated
    Jul 12, 2023
    Description

    Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms as diagnosis support systems promise large reliefs for the medical personnel - only on the basis of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

    The PTB-XL ECG dataset is a large dataset of 21799 clinical 12-lead ECGs from 18869 patients of 10 second length. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. The in total 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.

  20. ECG Heartbeat Categorization Dataset

    • cubig.ai
    Updated May 2, 2025
    + more versions
    Cite
    CUBIG (2025). ECG Heartbeat Categorization Dataset [Dataset]. https://cubig.ai/store/products/233/ecg-heartbeat-categorization-dataset
    Explore at:
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The ECG Heartbeat Categorization Dataset contains segmented and preprocessed ECG signals, specifically designed for heartbeat classification. It includes two collections of heartbeat signals from the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Database. The dataset is intended for exploring heartbeat classification using deep neural networks and transfer learning. It offers a large number of samples for both normal heartbeats and those affected by various arrhythmias and myocardial infarction.

    2) Data Utilization (1) ECG Heartbeat data has characteristics that: • This dataset includes ECG signals for normal heartbeats and those affected by various arrhythmias and myocardial infarction, providing segmented signals corresponding to each heartbeat, making it suitable for heartbeat classification research. (2) ECG Heartbeat data can be used to: • Predictive Modeling: Useful for developing deep neural network models to predict arrhythmias and myocardial infarction based on heartbeat signals. • Medical Research: Contributes to understanding and researching patterns related to various heart diseases through ECG signal analysis. • Healthcare Planning: Supports early diagnosis and personalized treatment planning to help manage the health of heart disease patients.
