100+ datasets found
  1. PTB-XL, a large publicly available electrocardiography dataset

    • physionet.org
    • maplerate.net
    Updated Nov 9, 2022
    Cite
    Patrick Wagner; Nils Strodthoff; Ralf-Dieter Bousseljot; Wojciech Samek; Tobias Schaeffter (2022). PTB-XL, a large publicly available electrocardiography dataset [Dataset]. http://doi.org/10.13026/kfzx-aw45
    Dataset updated
    Nov 9, 2022
    Authors
    Patrick Wagner; Nils Strodthoff; Ralf-Dieter Bousseljot; Wojciech Samek; Tobias Schaeffter
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms, used as diagnosis support systems, promise substantial relief for medical personnel, if only because of the sheer number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

    The PTB-XL ECG dataset is a large dataset of 21799 clinical 12-lead ECGs of 10 seconds length from 18869 patients. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. The 71 distinct ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.

  2. ECG Images dataset of Cardiac Patients

    • data.mendeley.com
    Updated Mar 19, 2021
    Cite
    Ali Haider Khan (2021). ECG Images dataset of Cardiac Patients [Dataset]. http://doi.org/10.17632/gwbz3fsgp8.2
    Dataset updated
    Mar 19, 2021
    Authors
    Ali Haider Khan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ECG Images dataset of Cardiac Patients was created under the auspices of the Ch. Pervaiz Elahi Institute of Cardiology, Multan, Pakistan. It aims to help the scientific community conduct research on cardiovascular diseases.

  3. CODE-test: An annotated 12-lead ECG dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 7, 2021
    Cite
    Antonio H Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro (2021). CODE-test: An annotated 12-lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3765780
    Available download formats: zip
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Antonio H Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # Annotated 12 lead ECG dataset
    
    Contains 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students. It is used as the test set in the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network": https://www.nature.com/articles/s41467-020-15432-4.

    It contains annotations for 6 different ECG abnormalities:
    - 1st degree AV block (1dAVb);
    - right bundle branch block (RBBB);
    - left bundle branch block (LBBB);
    - sinus bradycardia (SB);
    - atrial fibrillation (AF); and,
    - sinus tachycardia (ST).
    
    Companion Python scripts are available at:
    https://github.com/antonior92/automatic-ecg-diagnosis
    
    --------
    
    Citation
    ```
    Ribeiro, A.H., Ribeiro, M.H., Paixão, G.M.M. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11, 1760 (2020). https://doi.org/10.1038/s41467-020-15432-4
    ```
    
    Bibtex:
    ```
    @article{ribeiro_automatic_2020,
     title = {Automatic Diagnosis of the 12-Lead {{ECG}} Using a Deep Neural Network},
     author = {Ribeiro, Ant{\^o}nio H. and Ribeiro, Manoel Horta and Paix{\~a}o, Gabriela M. M. and Oliveira, Derick M. and Gomes, Paulo R. and Canazart, J{\'e}ssica A. and Ferreira, Milton P. S. and Andersson, Carl R. and Macfarlane, Peter W. and Meira Jr., Wagner and Sch{\"o}n, Thomas B. and Ribeiro, Antonio Luiz P.},
     year = {2020},
     volume = {11},
     pages = {1760},
     doi = {https://doi.org/10.1038/s41467-020-15432-4},
     journal = {Nature Communications},
     number = {1}
    }
    ```
    -----
    
    
    ## Folder content:
    
    - `ecg_tracings.hdf5`: The HDF5 file containing a single dataset named `tracings`. This dataset is a `(827, 4096, 12)` tensor. The first dimension corresponds to the 827 different exams from different patients; the second dimension corresponds to the 4096 signal samples; the third dimension corresponds to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`.

    The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). To make them all the same size (4096 samples), they are padded with zeros on both sides. For instance, a 7-second ECG signal with 2800 samples gets 648 zero samples at the beginning and 648 zero samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset. All signals are represented as floating-point numbers at the scale 1e-4V: so they should be multiplied by 1000 in order to obtain the signals in V.
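
The padding arithmetic above can be illustrated with a small sketch. This is not the authors' code: the helper names `pad_symmetric` and `strip_padding` are hypothetical, and the sketch assumes the padding is split evenly between the two sides, as in the 648 + 648 example.

```python
import numpy as np

TARGET_LEN = 4096  # samples per tracing in the HDF5 tensor (from the description)
FS_HZ = 400        # sampling rate stated in the description


def pad_symmetric(signal: np.ndarray, target_len: int = TARGET_LEN) -> np.ndarray:
    """Zero-pad a (n_samples, n_leads) signal equally on both sides (hypothetical helper)."""
    total = target_len - signal.shape[0]
    if total < 0:
        raise ValueError("signal is longer than the target length")
    before = total // 2
    after = total - before
    # Pad only along the time axis; the lead axis is left untouched.
    return np.pad(signal, ((before, after), (0, 0)), mode="constant")


def strip_padding(padded: np.ndarray, duration_s: int) -> np.ndarray:
    """Recover the original samples, assuming the padding was symmetric."""
    n_samples = duration_s * FS_HZ
    before = (padded.shape[0] - n_samples) // 2
    return padded[before:before + n_samples]


# Synthetic 7-second, 12-lead signal of ones, used only for illustration.
seven_sec = np.ones((7 * FS_HZ, 12), dtype=np.float32)
padded = pad_symmetric(seven_sec)
print(padded.shape)  # (4096, 12)
```

For a 10-second signal the same function would add 48 zero samples on each side (4096 - 4000 = 96). `strip_padding` requires the original duration to be known, which the description does not say is stored alongside the tracings.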
    
    In Python, one can read this file as follows:
    ```python
    import h5py
    import numpy as np

    # Adjust the path to wherever ecg_tracings.hdf5 was extracted.
    with h5py.File("ecg_tracings.hdf5", "r") as f:
        x = np.array(f["tracings"])  # shape (827, 4096, 12)
    ```
    
    - The file `attributes.csv` contains basic patient attributes: sex (M or F) and age. It contains 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` corresponds to the i-th line.
    - `annotations/`: folder containing annotations in csv format. Each csv file contains 827 lines (plus the header). In all csv files, the i-th line corresponds to the i-th tracing in `ecg_tracings.hdf5`. The csv files all have 6 columns, `1dAVb, RBBB, LBBB, SB, AF, ST`, indicating whether the annotator detected the abnormality in the ECG (`=1`) or not (`=0`).
     1. `cardiologist[1,2].csv` contain annotations from two different cardiologists.
     2. `gold_standard.csv` gold standard annotation for this test dataset. When cardiologist 1 and cardiologist 2 agree, the common diagnosis was considered the gold standard. In cases where there was any disagreement, a third senior specialist, aware of the annotations from the other two, decided the diagnosis.
     3. `dnn.csv` prediction from the deep neural network described in the paper. The threshold is set in such a way that it maximizes the F1 score.
     4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
     5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
     6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).
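
Because every annotation file shares the same six binary columns, any annotator can be compared with the gold standard column by column. The sketch below is one possible way to do that, not the evaluation code from the paper. The function names are hypothetical and the arrays are tiny synthetic stand-ins for the real 827-row files.

```python
import numpy as np

LABELS = ["1dAVb", "RBBB", "LBBB", "SB", "AF", "ST"]  # column order from the description


def per_class_f1(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """F1 score per abnormality for two (n_exams, 6) binary arrays (hypothetical helper)."""
    y_true = y_true.astype(bool)
    y_pred = y_pred.astype(bool)
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0 when a class has no positives in either array.
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return dict(zip(LABELS, f1.tolist()))


def disagreements(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Indices of exams on which two annotators differ for at least one abnormality."""
    return np.flatnonzero((a != b).any(axis=1))


# Synthetic example with 4 exams (made-up values, not taken from the dataset).
gold = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0]])
pred = np.array([[1, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 1]])
scores = per_class_f1(gold, pred)
print(scores["1dAVb"], scores["RBBB"])  # 1.0 0.0
print(disagreements(gold, pred))        # [1 3]
```

With the real files, the arrays could be obtained by reading `annotations/gold_standard.csv` and, for example, `annotations/dnn.csv` with a csv reader and selecting the six label columns. `disagreements` applied to the two cardiologist files would list the exams that, per the description, went to the third senior specialist.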
    
  4. Harvard-Emory ECG Database

    • bdsp.io
    Updated Nov 6, 2024
    Cite
    Zuzana Koscova; Valdery Moura Junior; Matthew Reyna; Shenda Hong; Aditya Gupta; Manohar Ghanta; Reza Sameni; Jonathan Rosand; Aaron Aguirre; Qiao Li; Gari Clifford; M Brandon Westover (2024). Harvard-Emory ECG Database [Dataset]. http://doi.org/10.60508/13rj-5d45
    Dataset updated
    Nov 6, 2024
    Authors
    Zuzana Koscova; Valdery Moura Junior; Matthew Reyna; Shenda Hong; Aditya Gupta; Manohar Ghanta; Reza Sameni; Jonathan Rosand; Aaron Aguirre; Qiao Li; Gari Clifford; M Brandon Westover
    License

    https://github.com/bdsp-core/bdsp-license-and-dua

    Description

    The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.

    In version 1.0 of the database, these ECGs were provided without labels or metadata, to enable pre-training of ECG analysis models.

    In version 2.0, labels and metadata are included.

    HEEDB is published as part of the Human Sleep Project (HSP), funded by a grant (R01HL161253) from the National Heart Lung and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.

  5. PTB Diagnostic ECG Database

    • physionet.org
    Updated Sep 25, 2004
    Cite
    Ralf-Dieter Bousseljot (2004). PTB Diagnostic ECG Database [Dataset]. http://doi.org/10.13026/C28C71
    Dataset updated
    Sep 25, 2004
    Authors
    Ralf-Dieter Bousseljot
    License

    Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Physikalisch-Technische Bundesanstalt (PTB), the National Metrology Institute of Germany, has provided this compilation of digitized ECGs for research, algorithmic benchmarking or teaching purposes to the users of PhysioNet. The ECGs were collected from healthy volunteers and patients with different heart diseases by Professor Michael Oeff, M.D.

  6. PTB-XL ECG dataset

    • kaggle.com
    • opendatalab.com
    Updated Feb 3, 2021
    Cite
    khyeh (2021). PTB-XL ECG dataset [Dataset]. https://www.kaggle.com/khyeh0719/ptb-xl-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 3, 2021
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    khyeh
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Source: https://physionet.org/content/ptb-xl/1.0.1/

    Abstract

    Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms, used as diagnosis support systems, promise substantial relief for medical personnel, if only because of the sheer number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

    The PTB-XL ECG dataset is a large dataset of 21837 clinical 12-lead ECGs of 10 seconds length from 18885 patients. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. The 71 distinct ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.

    Background

    The waveform data underlying the PTB-XL ECG dataset was collected with devices from Schiller AG over the course of nearly seven years between October 1989 and June 1996. With the acquisition of the original database from Schiller AG, the full usage rights were transferred to the PTB. The records were curated and converted into a structured database within a long-term project at the Physikalisch-Technische Bundesanstalt (PTB). The database was used in a number of publications, see e.g. [1,2], but the access remained restricted until now. The Institutional Ethics Committee approved the publication of the anonymous data in an open-access database (PTB-2020-1). During the public release process in 2019, the existing database was streamlined with particular regard to usability and accessibility for the machine learning community. Waveform and metadata were converted to open data formats that can easily be processed by standard software.

    Methods

    Data Acquisition

    1. Raw signal data was recorded and stored in a proprietary compressed format. For all signals, we provide the standard set of 12 leads (I, II, III, AVL, AVR, AVF, V1, ..., V6) with reference electrodes on the right arm.
    2. The corresponding general metadata (such as age, sex, weight and height) was collected in a database.
    3. Each record was annotated with a report string (generated by cardiologist or automatic interpretation by ECG-device) which was converted into a standardized set of SCP-ECG statements (scp_codes). For most records also the heart’s axis (heart_axis) and infarction stadium (infarction_stadium1 and infarction_stadium2, if present) were extracted.
    4. A large fraction of the records was validated by a second cardiologist.
    5. All records were validated by a technical expert focusing mainly on signal characteristics.

    Data Preprocessing

    ECGs and patients are identified by unique identifiers (ecg_id and patient_id). Personal information in the metadata, such as names of validating cardiologists, nurses and recording site (hospital etc.) of the recording was pseudonymized. The date of birth is given only as age at the time of the ECG recording, where ages of more than 89 years appear in the range of 300 years in compliance with HIPAA standards. Furthermore, all ECG recording dates were shifted by a random offset for each patient. The ECG statements used for annotating the records follow the SCP-ECG standard [3].

    Data Description

    In general, the dataset is organized as follows:

    ptbxl
    ├── ptbxl_database.csv
    ├── scp_statements.csv
    ├── records100
    │   ├── 00000
    │   │   ├── 00001_lr.dat
    │   │   ├── 00001_lr.hea
    │   │   ├── ...
    │   │   ├── 00999_lr.dat
    │   │   └── 00999_lr.hea
    │   ├── ...
    │   └── 21000
    │       ├── 21001_lr.dat
    │       ├── 21001_lr.hea
    │       ├── ...
    │       ├── 21837_lr.dat
    │       └── 21837_lr.hea
    └── records500
        ├── 00000
        │   ├── 00001_hr.dat
        │   ├── 00001_hr.hea
        │   ├── ...
        │   ├── 00999_hr.dat
        │   └── 00999_hr.hea
        ├── ...
        └── 21000
            ├── 21001_hr.dat
            ├── 21001_hr.hea
            ├── ...
            ├── 21837_hr.dat
            └── 21837_hr.hea

    The dataset comprises 21837 clinical 12-lead ECG records of 10 seconds length from 18885 patients, where 52% are male and 48% are female with ages covering the whole range from 0 to 95 years (median 62 and interquartile range of 22). The value of the dataset results from the comprehensive collection of many different co-occurring path...
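
The folder layout above suggests that a record's location can be derived from its `ecg_id`. The following sketch shows that mapping under two assumptions that the excerpt does not state explicitly: records are grouped into subfolders in blocks of 1000 IDs, and `records100`/`records500` (suffixes `_lr`/`_hr`) correspond to 100 Hz and 500 Hz versions. `record_path` is a hypothetical helper, not part of the dataset's tooling.

```python
from pathlib import PurePosixPath

# Folder name and file suffix per sampling rate. Reading "100"/"500" as the
# sampling rate in Hz is an assumption based on the folder names.
_LAYOUT = {100: ("records100", "lr"), 500: ("records500", "hr")}


def record_path(ecg_id: int, sampling_rate: int = 100) -> str:
    """Return the extension-free relative path of a record (hypothetical helper)."""
    if sampling_rate not in _LAYOUT:
        raise ValueError("sampling_rate must be 100 or 500")
    folder, suffix = _LAYOUT[sampling_rate]
    # Records appear to be grouped in blocks of 1000 IDs: 00000, ..., 21000 (assumed).
    subdir = f"{(ecg_id // 1000) * 1000:05d}"
    return str(PurePosixPath(folder) / subdir / f"{ecg_id:05d}_{suffix}")


print(record_path(1))           # records100/00000/00001_lr
print(record_path(21837, 500))  # records500/21000/21837_hr
```

The path is returned without an extension because each record consists of a `.dat` and a `.hea` file sharing the same stem.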

  7. ECG in High Intensity Exercise Dataset

    • zenodo.org
    • opendatalab.com
    • +2more
    zip
    Updated Dec 26, 2021
    Cite
    Elisabetta De Giovanni; Tomas Teijeiro; David Meier; Grégoire Millet; David Atienza (2021). ECG in High Intensity Exercise Dataset [Dataset]. http://doi.org/10.5281/zenodo.5727800
    Available download formats: zip
    Dataset updated
    Dec 26, 2021
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Elisabetta De Giovanni; Tomas Teijeiro; David Meier; Grégoire Millet; David Atienza
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R peak detection algorithms during high intensity exercise.

    Protocol of the experiments
    The protocol of the experiment was the following.

    • 22 subjects performed a cardio-pulmonary maximal exercise test on a cycle ergometer, using a gas mask. A single-lead electrocardiogram (ECG) was measured using the BIOPAC system.
    • An initial 3 min of rest were recorded.
    • After this baseline, the subjects started cycling at a power of 60W or 90W depending on their fitness level.
    • Then, the power of the cycle ergometer was increased by 30W every 3 min till exhaustion (in terms of maximum oxygen uptake or VO2max).
    • Finally, physiology experts assessed the so-called ventilatory thresholds and the VO2max based on the pulmonary data (volume of oxygen and CO2).

    Description of the extracted dataset

    The characteristics of the dataset are the following:

    • We report only 20 out of 22 subjects that were used for the analysis, because for two subjects the signals were too corrupted or not complete. Specifically, subjects 5 and 12 were discarded.
    • The ECG signal was sampled at 500 Hz and then downsampled to 250 Hz. The original ECG signals were measured at a maximum of 10 mV. They were then scaled down by a factor of 1000, hence the data is represented in uV.
    • For each subject, 5 segments of 20 s were extracted from the ECG recordings and chosen based on different phases of the maximal exercise test (i.e., before and after the so-called second ventilatory threshold or VT2, before and in the middle of VO2max, and during the recovery after exhaustion) to represent different intensities of physical activity.

    seg1 --> [VT2-50,VT2-30]
    seg2 --> [VT2+60,VT2+80]
    seg3 --> [VO2max-50,VO2max-30]
    seg4 --> [VO2max-10,VO2max+10]
    seg5 --> [VO2max+60,VO2max+80]

    • The R peak locations were manually annotated in all segments and reviewed by a physician of the Lausanne University Hospital, CHUV. Only segment 5 of subject 9 could not be annotated since there was a problem with the input signal. So, the total number of segments extracted was 20 * 5 - 1 = 99.
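
The five segment definitions above can be turned into concrete time windows once the VT2 and VO2max instants are known. This is a minimal sketch, assuming the offsets are in seconds (consistent with each segment being 20 s long) and that both instants are measured from the start of the recording. `segment_windows` and the example times are hypothetical.

```python
FS_HZ = 250  # sampling rate after downsampling (from the description)

# (anchor, start offset, end offset); offsets assumed to be in seconds.
SEGMENT_OFFSETS_S = {
    "seg1": ("VT2", -50, -30),
    "seg2": ("VT2", 60, 80),
    "seg3": ("VO2max", -50, -30),
    "seg4": ("VO2max", -10, 10),
    "seg5": ("VO2max", 60, 80),
}


def segment_windows(vt2_s: float, vo2max_s: float) -> dict:
    """Map each segment to (start_s, end_s, start_sample, end_sample) (hypothetical helper)."""
    anchors = {"VT2": vt2_s, "VO2max": vo2max_s}
    windows = {}
    for name, (anchor, start, end) in SEGMENT_OFFSETS_S.items():
        t0 = anchors[anchor] + start
        t1 = anchors[anchor] + end
        windows[name] = (t0, t1, int(round(t0 * FS_HZ)), int(round(t1 * FS_HZ)))
    return windows


# Made-up instants: VT2 at 600 s and VO2max at 900 s into the test.
for name, window in segment_windows(600.0, 900.0).items():
    print(name, window)
```

At 250 Hz each 20 s window spans 5000 samples, so the difference between the two sample indices offers a quick consistency check against the segment files.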

    Format of the extracted dataset

    The dataset is divided into two main folders:

    • The folder `ecg_segments/` contains the ECG signals saved in two formats, `.csv` and `.mat`. This folder includes both raw (`ecg_raw`) and processed (`ecg`) signals. The processing consists of a morphological filtering and a relative energy non filtering method to enhance the R peaks. The `.csv` files contain only the signal, while the `.mat` files include the signal, the time vector within the maximal stress test, the sampling frequency and the unit of the signal amplitude (uV, as we mentioned before).
    • The folder `manual_annotations/` contains the sample indices of the annotated R peaks in `.csv` format. The annotation was done on the processed signals.

  8. Data from: TELE ECG Database: 250 telehealth ECG records (collected using...

    • dataverse.harvard.edu
    Updated Sep 6, 2016
    Cite
    Heba Khamis; Robert Weiss; Yang Xie; Chan-Wei Chang; Nigel H. Lovell; Stephen J. Redmond (2016). TELE ECG Database: 250 telehealth ECG records (collected using dry metal electrodes) with annotated QRS and artifact masks, and MATLAB code for the UNSW artifact detection and UNSW QRS detection algorithms [Dataset]. http://doi.org/10.7910/DVN/QTG0EP
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 6, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Heba Khamis; Robert Weiss; Yang Xie; Chan-Wei Chang; Nigel H. Lovell; Stephen J. Redmond
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    Australian Research Council
    Description

    CITATION

    Please cite this data and code as: H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond, "QRS detection algorithm for telehealth electrocardiogram recordings," IEEE Transactions on Biomedical Engineering, vol. 63(7), p. 1377-1388, 2016.

    DATABASE DESCRIPTION

    The following description of the TELE database is from Khamis et al (2016):

    "In Redmond et al (2012), 300 ECG single lead-I signals recorded in a telehealth environment are described. The data was recorded using the TeleMedCare Health Monitor (TeleMedCare Pty. Ltd. Sydney, Australia). This ECG is sampled at a rate of 500 Hz using dry metal Ag/AgCl plate electrodes which the patient holds with each hand; a reference electrode plate is also positioned under the pad of the right hand. Of the 300 recordings, 250 were selected randomly from 120 patients, and the remaining 50 were manually selected from 168 patients to obtain a larger representation of poor quality data. Three independent scorers annotated the data by identifying sections of artifact and QRS complexes. All scorers then annotated the signals as a group, to reconcile the individual annotations. Sections of the ECG signal which were less than 5 s in duration were considered to be part of the neighboring artifact sections and were subsequently masked. QRS annotations in the masked regions were discarded prior to the artifact mask and QRS locations being saved. Of the 300 telehealth ECG records in Redmond et al. (2012), 50 records (including 29 of the 250 randomly selected records and 21 of the 50 manually selected records) were discarded as all annotated RR intervals within these records overlap with the annotated artifact mask and therefore, no heart rate can be calculated, which is required for measuring algorithm performance. The remaining 250 records will be referred to as the TELE database."

    For all 250 recordings in the TELE database, the mains frequency was 50 Hz, the sampling frequency was 500 Hz and the top and bottom rail voltages were 5.556912223578890 and -5.554198887532222 mV respectively.

    DATA FILE DESCRIPTION

    Each record in the TELE database is stored as a X_Y.dat file where X indicates the index of the record in the TELE database (containing a total of 250 records) and Y indicates the index of the record in the original dataset containing 300 records (see Redmond et al. 2012). The .dat file is a comma separated values file. Each line contains:
    - the ECG sample value (mV)
    - a boolean indicating the locations of the annotated qrs complexes
    - a boolean indicating the visually determined mask
    - a boolean indicating the software determined mask (see Khamis et al. 2016)

    CONVERTING DATA TO MATLAB STRUCTURE

    A matlab function (readFromCSV_TELE.m) has been provided to read the .dat files into a matlab structure:

    %%
    % [DB,fm,fs,rail_mv] = readFromCSV_TELE(DATA_PATH)
    %
    % Extracts the data for each of the 250 telehealth ECG records of the TELE database [1]
    % and returns a structure containing all data, annotations and masks.
    %
    % IN:  DATA_PATH - String. The path containing the .hdr and .dat files
    %
    % OUT: DB - 1xM Structure. Contains the extracted data from the M (250) data files.
    %           The structure has fields:
    %           * data_orig_ind - 1x1 double. The index of the data file in the original dataset of 300 records (see [1]) - for tracking purposes.
    %           * ecg_mv - 1xN double. The ecg samples (mV). N is the number of samples for the data file.
    %           * qrs_annotations - 1xN double. The qrs complexes - value of 1 where a qrs is located and 0 otherwise.
    %           * visual_mask - 1xN double. The visually determined artifact mask - value of 1 where the data is masked and 0 otherwise.
    %           * software_mask - 1xN double. The software artifact mask - value of 1 where the data is masked and 0 otherwise.
    %      fm - 1x1 double. The mains frequency (Hz)
    %      fs - 1x1 double. The sampling frequency (Hz)
    %      rail_mv - 1x2 double. The bottom and top rail voltages (mV)
    %
    % If you use this code or data, please cite as follows:
    %
    % [1] H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond,
    % "QRS detection algorithm...
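
For readers working outside MATLAB, the four-column `.dat` layout described above can also be parsed with the Python standard library. The sketch below is an illustration and not a port of `readFromCSV_TELE.m`. It assumes the files have no header line and that the booleans are written as 0/1. The function names and the sample rows are hypothetical.

```python
import csv
import io
import re


def parse_tele_name(filename: str) -> tuple:
    """Split 'X_Y.dat' into (index in TELE, index in the original 300 records)."""
    match = re.fullmatch(r"(\d+)_(\d+)\.dat", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def read_tele_record(lines) -> dict:
    """Parse comma-separated rows: ECG (mV), QRS flag, visual mask, software mask."""
    record = {"ecg_mv": [], "qrs": [], "visual_mask": [], "software_mask": []}
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines
        record["ecg_mv"].append(float(row[0]))
        # Flags are assumed to be numeric 0/1 values.
        record["qrs"].append(float(row[1]) != 0)
        record["visual_mask"].append(float(row[2]) != 0)
        record["software_mask"].append(float(row[3]) != 0)
    return record


# Three made-up rows standing in for the contents of a .dat file.
sample = io.StringIO("0.12,0,0,0\n1.05,1,0,0\n-0.30,0,1,1\n")
rec = read_tele_record(sample)
print(rec["qrs"])                  # [False, True, False]
print(parse_tele_name("1_3.dat"))  # (1, 3)
```

With a real file, an open file handle can be passed in place of the `io.StringIO` object, since `csv.reader` accepts any iterable of lines.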

  9. ECG Dataset for Heart Condition Classification

    • data.mendeley.com
    Updated Dec 12, 2024
    Cite
    Ankur Ray (2024). ECG Dataset for Heart Condition Classification [Dataset]. http://doi.org/10.17632/xw9sd3btcs.2
    Dataset updated
    Dec 12, 2024
    Authors
    Ankur Ray
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This ECG dataset comprises three distinct classes: normal, abnormal, and disease-specific cardiac signals. Collected from both healthy individuals and patients with heart conditions, the dataset provides labeled ECG recordings suitable for training machine learning models aimed at real-time health monitoring and cardiac disease prediction. Each class contains a balanced number of high-quality ECG images, offering a valuable resource for developing and evaluating AI-based diagnostic tools in healthcare.

  10. Data from: ECG data for deep transfer learning

    • ieee-dataport.org
    • produccioncientifica.ucm.es
    • +1more
    Updated Feb 3, 2021
    Cite
    Joerg Schaefer (2021). ECG data for deep transfer learning [Dataset]. https://ieee-dataport.org/open-access/ecg-data-deep-transfer-learning
    Dataset updated
    Feb 3, 2021
    Authors
    Joerg Schaefer
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Falls are a prominent issue due to their severe physical and mental consequences. Fall detection and prevention is a critical area of research because it can help elderly people depend less on caregivers and live and move more independently. Using electrocardiogram (ECG) signals on their own for fall detection and activity classification is a novel approach used in this paper.

  11. MIMIC-IV-ECG Dataset

    • paperswithcode.com
    Updated Dec 24, 2022
    Cite
    (2022). MIMIC-IV-ECG Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-ecg
    Dataset updated
    Dec 24, 2022
    Description

    The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.

  12. ECG signals (744 fragments)

    • figshare.com
    • ieee-dataport.org
    • +1more
    zip
    Updated May 31, 2023
    Cite
    Paweł Pławiak (2023). ECG signals (744 fragments) [Dataset]. http://doi.org/10.6084/m9.figshare.5601664.v1
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Paweł Pławiak
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For research purposes, the ECG signals were obtained from the PhysioNet service (http://www.physionet.org) from the MIT-BIH Arrhythmia database. The created database with ECG signals is described below. 1) The ECG signals were from 29 patients: 15 female (age: 23-89) and 14 male (age: 32-89). 2) The ECG signals contained 17 classes: normal sinus rhythm, pacemaker rhythm, and 15 types of cardiac dysfunctions (for each of which at least 10 signal fragments were collected). 3) All ECG signals were recorded at a sampling frequency of 360 [Hz] and a gain of 200 [adu / mV]. 4) For the analysis, 744, 10-second (3600 samples) fragments of the ECG signal (not overlapping) were randomly selected. 5) Only signals derived from one lead, the MLII, were used. 6) Data are in mat format (Matlab).
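
The stated gain and sampling frequency are enough to convert raw sample values to millivolts and to check the fragment length. The sketch below is a minimal illustration. The description does not give a baseline (zero-level) value, so `baseline_adu` is a hypothetical parameter that defaults to 0 and should be set from the source record's header if one applies.

```python
import numpy as np

FS_HZ = 360            # sampling frequency from the description
GAIN_ADU_PER_MV = 200  # gain from the description
FRAGMENT_SECONDS = 10
FRAGMENT_SAMPLES = FS_HZ * FRAGMENT_SECONDS  # 3600 samples per fragment


def adu_to_mv(samples_adu, baseline_adu: float = 0.0) -> np.ndarray:
    """Convert raw ADC units to millivolts (baseline is an assumption, see lead-in)."""
    samples = np.asarray(samples_adu, dtype=np.float64)
    return (samples - baseline_adu) / GAIN_ADU_PER_MV


# Made-up sample values, for illustration only.
print(adu_to_mv([200, 400, -100]))  # values in mV: 1.0, 2.0, -0.5
print(FRAGMENT_SAMPLES)             # 3600
```

If the `.mat` files already store amplitudes in physical units, this conversion is unnecessary. The description does not say which is the case.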

  13. Data from: MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset

    • registry.opendata.aws
    • physionet.org
    Updated Dec 19, 2024
    Cite
    PhysioNet (2024). MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset [Dataset]. https://registry.opendata.aws/mimic-iv-ecg/
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    PhysioNet (https://physionet.org/)
    Description

    The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.

  14. ECG dataset 2023 by National Heart Foundation Bangladesh (NHFB)

    • zenodo.org
    zip
    Updated Sep 22, 2024
    Cite
    NHFB (2024). ECG dataset 2023 by National Heart Foundation Bangladesh (NHFB) [Dataset]. http://doi.org/10.5281/zenodo.13825810
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 22, 2024
    Dataset provided by
    NHFB
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 12, 2023
    Area covered
    Bangladesh
    Description

    Dataset Description: ECG Data Categorized by Cardiac Condition

    This dataset comprises electrocardiogram (ECG) data organized into three distinct categories based on patient cardiac health. The data were collected by the National Heart Foundation Bangladesh (NHFB) from June 2023 to December 2023.

    1. Arrhythmia Patients: This category contains ECG data from individuals diagnosed with cardiac arrhythmias, characterized by irregular heart rhythms. The data within this category may encompass various types of arrhythmias, requiring further sub-classification depending on the specific research objectives.

    2. Myocardial Patients: This category encompasses ECG data from patients experiencing myocardial issues, most likely referring to myocardial infarction (heart attack) or other diseases affecting the myocardium (heart muscle). The specific myocardial conditions represented within this category may require further specification depending on the dataset's scope and purpose.

    3. Normal Patients: This category serves as a control group and includes ECG data from individuals deemed to have healthy cardiac function. These individuals exhibit no clinically significant ECG abnormalities or diagnosed cardiac conditions.

    Dataset Structure:

    The dataset is structured into three folders, each corresponding to a specific patient category: "Arrhythmia Patient," "Myocardial Patient," and "Normal Patient."
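    Because the class is encoded only by the folder name, a loader can derive labels from the directory layout. The sketch below is one possible way to do this. The integer codes, the function name, and the file name `example.dat` are assumptions made for illustration; the description does not say which file formats the folders contain.

```python
import tempfile
from pathlib import Path

# Folder names as quoted in the description. The integer codes are an
# arbitrary choice for this sketch, not something the dataset defines.
LABELS = {"Arrhythmia Patient": 0, "Myocardial Patient": 1, "Normal Patient": 2}


def index_dataset(root):
    """Return (file path, label) pairs for every file under the class folders."""
    root = Path(root)
    pairs = []
    for folder, label in LABELS.items():
        class_dir = root / folder
        if not class_dir.is_dir():
            continue  # tolerate a missing folder instead of failing
        for path in sorted(class_dir.rglob("*")):
            if path.is_file():
                pairs.append((path, label))
    return pairs


# Demonstration on a throwaway directory with one placeholder file per class.
with tempfile.TemporaryDirectory() as tmp:
    for folder in LABELS:
        (Path(tmp) / folder).mkdir()
        (Path(tmp) / folder / "example.dat").touch()  # hypothetical file name
    demo_labels = [label for _, label in index_dataset(tmp)]

print(demo_labels)  # [0, 1, 2]
```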

    Potential Applications:

    This dataset can be utilized for various research and educational purposes, including:

    • Developing and evaluating algorithms for automated arrhythmia detection and classification.

    • Investigating the ECG characteristics associated with different myocardial conditions.

    • Training machine learning models for cardiac disease diagnosis and risk stratification.

    • Educating students and healthcare professionals on ECG interpretation and cardiac pathologies.

    Further Information:

    Detailed information regarding the data acquisition protocol, ECG recording parameters, patient demographics, and data annotation procedures is essential for comprehensive dataset utilization. Accessing relevant documentation accompanying the dataset is crucial for ensuring appropriate data interpretation and analysis.


  15. NCKU CBIC ECG Database

    • figshare.com
    zip
    Updated Sep 26, 2023
    Cite
    Tseng Wei-Cheng; Zhong Tai-Siang; Lee Shuenn-Yuh; Chen Ju-Yi (2023). NCKU CBIC ECG Database [Dataset]. http://doi.org/10.6084/m9.figshare.23807286.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tseng Wei-Cheng; Zhong Tai-Siang; Lee Shuenn-Yuh; Chen Ju-Yi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The NCKU CBIC ECG database collects ECG data from 6 different patients. The patients' information has been anonymized, and each patient signed a consent form to ensure the legitimacy of data usage. Lead II ECG is provided for four hours a day per patient to capture how the signal differs at different times of the day. The database also provides labels for motion artifact and baseline wandering, which are invalid signals for diagnosis; these labels prevent physicians from using noisy signal to diagnose. The data were collected using Patch[1] at Ministry of Health and Welfare Tainan Hospital, and the included data have been approved by the Institutional Review Board (IRB).

    Background

    Technology and medical treatment are highly developed in the 21st century, and people have more irregular daily routines and greater life pressure. Cardiovascular disease has become a tough nut to crack as lifestyle changes combine with the aging of society. The age distribution of patients is wider than ever. A wealth of health information can be obtained through electrocardiogram (ECG) measurement, including cardiac arrhythmias. Severe arrhythmias lead to many problems, including palpitations, chest tightness, dizziness, shock, and even life-threatening conditions. Monitoring the ECG signal is therefore essential.

    To do our part in the study of arrhythmia, our team started patient enrollment in 2018 after gaining the permission of the National Cheng Kung University Hospital Institutional Review Board (NCKUH IRB No. B-ER-104-379). So far we have collected 24-hour ECG data from 128 patients. The arrhythmia labels were confirmed by the cardiologist Ju-Yi Chen at NCKUH. Finally, we selected 6 patients from the received signals and made them into a database for researchers to access.

    Methods

    The NCKU CBIC ECG database contains the ECG recordings from 6 subjects. The signals were collected in Tainan Hospital (Ministry of Health and Welfare) via an ECG acquisition device[1] developed by Your health technology Co., Ltd. The sampling frequency is 400 Hz, and the ADC resolution is 12 bits.

    The age of the subjects ranged from 24 to 76 years, and each patient was measured at lead II for 24 hours. After the signal is recorded, four cleaner segments in the morning, noon, evening, and midnight are selected, and each segment is one hour long. The heartbeat differs between sleep and wakefulness, and some arrhythmia types occur mostly during sleep. Because some arrhythmias are hard to detect at specific times of day, we chose signal segments from different time periods for each patient, which is more representative of the daily heartbeat condition. The ECG signals from the 6th subject contain too much noise in the daytime because of his occupation, so the segments from 22:00 to 02:00 were selected.

    We have collected data from a total of 128 patients at Tainan Hospital since 2018. Since most of the patients' ECG data are normal beats, we finally selected the ECG data of six patients which contain clinically significant arrhythmia. The database provides two particular label types for motion artifact and baseline wandering, which are caused by body movement during ECG acquisition. In practice, cardiologists do not use noisy signals as a basis for diagnosis, so these two labels prevent physicians from using noise to make a diagnosis.

    The original data were first compared with the Holter report, and the R-peak positions and beat labels were marked manually. The data were then given to a professional cardiologist, Ju-Yi Chen, for verification. The cardiologist checked the correctness and position of the beat labels and chose acceptable signal segments of high quality.

    Introduction of Ju-Yi Chen

    Ju-Yi Chen was born in Tainan, Taiwan, in 1974. He received the M.S. degree from Chang Gung University, Taoyuan City, Taiwan, in 1999 and the Ph.D. degree from the National Cheng Kung University, Tainan, in 2013. Since 2021, he has been a Professor at the Department of Internal Medicine, National Cheng Kung University. His current research interests include cardiovascular diseases, including arrhythmias, hypertension, arterial stiffness, and cardiac implantable electric devices.

    Data Description

    The file structure and naming rules are as follows:

    • [The subject number]_[The measurement time]: the directory name
    • OUTPUT_ECG_data.csv: the one-hour ECG signals (unit: 0.1 V)
    • OUTPUT_peak_label.csv: the arrhythmia type label of each R-peak
    • OUTPUT_peak_position.csv: the position of each R-peak

    Example: the 1_0100 directory contains subject No. 1's data measured at 01:00.

    Arrhythmia diseases and the corresponding label codes:

    Code   Arrhythmia Disease
    0      Normal
    1      Atrial Fibrillation
    2      Supraventricular Tachycardia
    3      Premature Ventricular Contraction
    4      Atrial Premature Contraction
    5      Motion Artifact
    6      Wandering
    7      First degree AV block
    8      Atrial Flutter

    Note: Wandering denotes a baseline drift of 1 mV.

    Patient information:

    • Subject 1: Male, 61 years
    • Subject 2: Female, 77 years
    • Subject 3: Male, 63 years
    • Subject 4: Male, 64 years
    • Subject 5: Male, 24 years
    • Subject 6: Male, 64 years

    Usage Notes

    Few public ECG databases provide long-term ECG. Our goal in creating the database is to help understand what a person's ECG looks like over a day, which makes it valuable as a source of long-term ECG.

    Ethics

    Our team has cooperated with National Cheng Kung University Hospital and Tainan Hospital. All the patients enrolled gave their informed consent to participate in the study. Certification to safety-related IEC standards and human study approval have both been acquired.

    Conflicts of Interest

    The authors declare that there are no known conflicts of interest.

    References

    [1] S.-Y. Lee, P.-W. Huang, M.-C. Liang, J.-H. Hong, and J.-Y. Chen, "Development of an arrhythmia monitoring system and human study," IEEE Transactions on Consumer Electronics, vol. 64, no. 4, pp. 442-451, 2018.
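    The naming rule and the label codes above lend themselves to a small lookup. The sketch below parses a directory name of the form [subject]_[HHMM] and drops beats labelled as noise (codes 5 and 6). The function names are illustrative, and it is an assumption that the R-peak positions are sample indices at the stated 400 Hz; the description does not specify their unit.

```python
# Label codes copied from the description.
LABEL_NAMES = {
    0: "Normal",
    1: "Atrial Fibrillation",
    2: "Supraventricular Tachycardia",
    3: "Premature Ventricular Contraction",
    4: "Atrial Premature Contraction",
    5: "Motion Artifact",
    6: "Wandering",
    7: "First degree AV block",
    8: "Atrial Flutter",
}
NOISE_CODES = {5, 6}      # the two labels the description calls invalid for diagnosis
SAMPLING_RATE_HZ = 400    # stated in the description


def parse_directory_name(name):
    """Split '[subject]_[HHMM]' into (subject number, hour, minute)."""
    subject, time_part = name.split("_")
    return int(subject), int(time_part[:2]), int(time_part[2:])


def usable_beats(positions, labels):
    """Return (time in seconds, label name) for every beat not marked as noise.

    Assumption: positions are sample indices at SAMPLING_RATE_HZ.
    """
    return [
        (position / SAMPLING_RATE_HZ, LABEL_NAMES[code])
        for position, code in zip(positions, labels)
        if code not in NOISE_CODES
    ]


print(parse_directory_name("1_0100"))
# (1, 1, 0)
print(usable_beats([400, 800, 1200], [0, 5, 3]))
# [(1.0, 'Normal'), (3.0, 'Premature Ventricular Contraction')]
```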

  16. ECG Heartbeat Categorization Dataset

    • cubig.ai
    Updated May 2, 2025
    Cite
    CUBIG (2025). ECG Heartbeat Categorization Dataset [Dataset]. https://cubig.ai/store/products/233/ecg-heartbeat-categorization-dataset
    Explore at:
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The ECG Heartbeat Categorization Dataset contains segmented and preprocessed ECG signals designed for heartbeat classification. It includes two collections of heartbeat signals, one from the MIT-BIH Arrhythmia Dataset and one from the PTB Diagnostic ECG Database. The dataset is intended for exploring heartbeat classification using deep neural networks and transfer learning. It offers a large number of samples for both normal heartbeats and those affected by various arrhythmias and myocardial infarction.

    2) Data Utilization
    (1) Characteristics of the ECG Heartbeat data:
    • The dataset includes ECG signals for normal heartbeats and those affected by various arrhythmias and myocardial infarction, segmented so that each signal corresponds to one heartbeat, which makes it suitable for heartbeat classification research.
    (2) Uses of the ECG Heartbeat data:
    • Predictive Modeling: Useful for developing deep neural network models to predict arrhythmias and myocardial infarction based on heartbeat signals.
    • Medical Research: Contributes to understanding and researching patterns related to various heart diseases through ECG signal analysis.
    • Healthcare Planning: Supports early diagnosis and personalized treatment planning to help manage the health of heart disease patients.

  17. PTB Diagnostic ECG Database Dataset

    • paperswithcode.com
    Updated Sep 26, 2004
    Cite
    (2004). PTB Diagnostic ECG Database Dataset [Dataset]. https://paperswithcode.com/dataset/ptb
    Explore at:
    Dataset updated
    Sep 26, 2004
    Description

    The ECGs in this collection were obtained using a non-commercial, PTB prototype recorder with the following specifications:

    • 16 input channels (14 for ECGs, 1 for respiration, 1 for line voltage)
    • Input voltage: ±16 mV, compensated offset voltage up to ±300 mV
    • Input resistance: 100 Ω (DC)
    • Resolution: 16 bit with 0.5 μV/LSB (2000 A/D units per mV)
    • Bandwidth: 0 - 1 kHz (synchronous sampling of all channels)
    • Noise voltage: max. 10 μV (pp), respectively 3 μV (RMS) with input short circuit
    • Online recording of skin resistance
    • Noise level recording during signal collection

    The database contains 549 records from 290 subjects (aged 17 to 87, mean 57.2; 209 men, mean age 55.5, and 81 women, mean age 61.6; ages were not recorded for 1 female and 14 male subjects). Each subject is represented by one to five records. There are no subjects numbered 124, 132, 134, or 161. Each record includes 15 simultaneously measured signals: the conventional 12 leads (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, v6) together with the 3 Frank lead ECGs (vx, vy, vz). Each signal is digitized at 1000 samples per second, with 16 bit resolution over a range of ±16.384 mV. On special request to the contributors of the database, recordings may be available at sampling rates up to 10 kHz.

    Within the header (.hea) file of most of these ECG records is a detailed clinical summary, including age, gender, diagnosis, and where applicable, data on medical history, medication and interventions, coronary artery pathology, ventriculography, echocardiography, and hemodynamics. The clinical summary is not available for 22 subjects.

  18. The Apnea ECG Database v1.0.0

    • kaggle.com
    Updated May 24, 2023
    Cite
    Paulo Pinheiro (2023). The Apnea ECG Database v1.0.0 [Dataset]. https://www.kaggle.com/datasets/paulopinheiro/the-apnea-ecg-database-v100
    Explore at:
    Dataset updated
    May 24, 2023
    Dataset provided by
    Kaggle
    Authors
    Paulo Pinheiro
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Data Description

    The data consist of 70 records, divided into a learning set of 35 records (a01 through a20, b01 through b05, and c01 through c10), and a test set of 35 records (x01 through x35), all of which may be downloaded from this page. Recordings vary in length from slightly less than 7 hours to nearly 10 hours each. Each recording includes a continuous digitized ECG signal, a set of apnea annotations (derived by human experts on the basis of simultaneously recorded respiration and related signals), and a set of machine-generated QRS annotations (in which all beats regardless of type have been labeled normal). In addition, eight recordings (a01 through a04, b01, and c01 through c03) are accompanied by four additional signals (Resp C and Resp A, chest and abdominal respiratory effort signals obtained using inductance plethysmography; Resp N, oronasal airflow measured using nasal thermistors; and SpO2, oxygen saturation).

    Several files are associated with each recording. The files with names of the form rnn.dat contain the digitized ECGs (16 bits per sample, least significant byte first in each pair, 100 samples per second, nominally 200 A/D units per millivolt). The .hea files are (text) header files that specify the names and formats of the associated signal files; these header files are needed by the software available from this site. The .apn files are (binary) annotation files, containing an annotation for each minute of each recording indicating the presence or absence of apnea at that time; these are available for the 35 learning set recordings only. The .qrs files are machine-generated (binary) annotation files, made using sqrs125, and provided for the convenience of those who do not wish to use their own QRS detectors.
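    The byte layout described for the .dat files (16 bits per sample, least significant byte first) can be decoded with the standard library alone. The sketch below illustrates only that layout. It assumes the samples are signed two's-complement integers with no baseline offset, and the function names are illustrative; for real work the header-aware software mentioned above is the safer route because it reads the format from the .hea file.

```python
import struct

SAMPLING_RATE_HZ = 100    # stated in the description
GAIN_ADU_PER_MV = 200     # nominal value stated in the description


def decode_samples(raw_bytes):
    """Decode 16-bit little-endian samples (assumed signed) from raw bytes."""
    count = len(raw_bytes) // 2          # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % count, raw_bytes[: count * 2]))


def to_millivolts(samples, gain=GAIN_ADU_PER_MV):
    """Scale A/D units to millivolts using the nominal gain."""
    return [sample / gain for sample in samples]


# Round-trip demonstration on three synthetic samples instead of a real file.
demo_bytes = struct.pack("<3h", 200, -100, 0)
demo_samples = decode_samples(demo_bytes)

print(demo_samples)                 # [200, -100, 0]
print(to_millivolts(demo_samples))  # [1.0, -0.5, 0.0]
```

    To read an actual recording, the same function could be applied to the bytes returned by opening the file in binary mode.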

    When using this resource, please cite the original publication: T Penzel, GB Moody, RG Mark, AL Goldberger, JH Peter. The Apnea-ECG Database. Computers in Cardiology 2000;27:255-258.

    Please include the standard citation for PhysioNet: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

  19. MIT‑Physio AFib ECG Database

    • ieee-dataport.org
    Updated Mar 20, 2025
    Cite
    Rachit Asthana (2025). MIT‑Physio AFib ECG Database [Dataset]. https://ieee-dataport.org/documents/mit-physio-afib-ecg-database
    Explore at:
    Dataset updated
    Mar 20, 2025
    Authors
    Rachit Asthana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MIT-Physio AFib ECG Database is a comprehensive integrated resource that combines two of the most frequently used datasets for atrial fibrillation research: the MIT‑BIH AFib Database and the PhysioNet/Computing in Cardiology Challenge 2017 dataset. This resource includes 25 long-term 10‑hour recordings with dual-channel ECG signals (recorded at 250 Hz with 12‑bit resolution over ±10 mV) as well as short single‑lead ECG recordings (ranging from 30 to 60 seconds at 300 Hz).

  20. A large scale 12-lead electrocardiogram database for arrhythmia study

    • physionet.org
    • opendatalab.com
    Updated Aug 24, 2022
    Cite
    Jianwei Zheng; Hangyuan Guo; Huimin Chu (2022). A large scale 12-lead electrocardiogram database for arrhythmia study [Dataset]. http://doi.org/10.13026/wgex-er52
    Explore at:
    Dataset updated
    Aug 24, 2022
    Authors
    Jianwei Zheng; Hangyuan Guo; Huimin Chu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This newly inaugurated research database for 12-lead electrocardiogram (ECG) signals was created under the auspices of Chapman University, Shaoxing People’s Hospital (Shaoxing Hospital Zhejiang University School of Medicine), and Ningbo First Hospital. It aims to enable the scientific community to conduct new studies on arrhythmia and other cardiovascular conditions. Certain types of arrhythmias, such as atrial fibrillation, have a pronounced negative impact on public health, quality of life, and medical expenditures. As a non-invasive test, ECG is a major and vital diagnostic tool for detecting these conditions. This practice, however, generates large amounts of data, the analysis of which requires considerable time and effort by human experts. Modern machine learning and statistical tools can be trained on large, high-quality data to achieve exceptional levels of automated diagnostic accuracy. Thus, we collected and disseminated this novel database, which contains 12-lead ECGs of 45,152 patients sampled at 500 Hz and features multiple common rhythms and additional cardiovascular conditions, all labeled by professional experts. The dataset can be used to design, compare, and fine-tune new and classical statistical and machine learning techniques in studies focused on arrhythmia and other cardiovascular conditions.
