CODE dataset
figshare.scilifelab.se
researchdata.se
Updated Feb 27, 2025
Cite
Antonio H. Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro (2025). CODE dataset [Dataset]. http://doi.org/10.17044/scilifelab.15169716.v1
Explore at:
Unique identifier
https://doi.org/10.17044/scilifelab.15169716.v1
Dataset updated
Feb 27, 2025
Dataset provided by
Uppsala University & UFMG
Authors
Antonio H. Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro
License
https://www.scilifelab.se/data/restricted-access/
Description
Dataset with annotated 12-lead ECG records. The exams were taken in 811 counties in the state of Minas Gerais, Brazil, by the Telehealth Network of Minas Gerais (TNMG) between 2010 and 2016, and organized by the CODE (Clinical Outcomes in Digital Electrocardiography) group.

Requesting access

Researchers affiliated with educational or research institutions may request access to this dataset. Requests will be analyzed on an individual basis and should contain: the name of the PI and host organisation; contact details (including your name and email); and the scientific purpose of the data access request.

If approved, a data user agreement will be forwarded to the researcher who made the request (through the email address provided). After the agreement has been signed (by the researcher or by the research institution), access to the dataset will be granted.

Openly available subset

A subset of this dataset (with 15% of the patients) is openly available. See: "CODE-15%: a large scale annotated dataset of 12-lead ECGs" https://doi.org/10.5281/zenodo.4916206.

Content

The folder contains:
- A column separated file containing basic patient attributes.
- The ECG waveforms in the wfdb format.

Additional references

The dataset is described in the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network". https://www.nature.com/articles/s41467-020-15432-4. Related publications also using this dataset are:
- [1] G. Paixao et al., “Validation of a Deep Neural Network Electrocardiographic-Age as a Mortality Predictor: The CODE Study,” Circulation, vol. 142, no. Suppl_3, pp. A16883–A16883, Nov. 2020, doi: 10.1161/circ.142.suppl_3.16883.
- [2] A. L. P. Ribeiro et al., “Tele-electrocardiography and bigdata: The CODE (Clinical Outcomes in Digital Electrocardiography) study,” Journal of Electrocardiology, Sep. 2019, doi: 10/gf7pwg.
- [3] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. P. Ribeiro, and W. Meira Jr, “Explaining end-to-end ECG automated diagnosis using contextual features,” in Machine Learning and Knowledge Discovery in Databases. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Ghent, Belgium, Sep. 2020, vol. 12461, pp. 204-219. doi: 10.1007/978-3-030-67670-4_13.
- [4] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. Ribeiro, and W. M. Jr, “Explaining black-box automated electrocardiogram classification to cardiologists,” in 2020 Computing in Cardiology (CinC), 2020, vol. 47. doi: 10.22489/CinC.2020.452.
- [5] G. M. M. Paixão et al., “Evaluation of mortality in bundle branch block patients from an electronic cohort: Clinical Outcomes in Digital Electrocardiography (CODE) study,” Journal of Electrocardiology, Sep. 2019, doi: 10/dcgk.
- [6] G. M. M. Paixão et al., “Evaluation of Mortality in Atrial Fibrillation: Clinical Outcomes in Digital Electrocardiography (CODE) Study,” Global Heart, vol. 15, no. 1, p. 48, Jul. 2020, doi: 10.5334/gh.772.
- [7] G. M. M. Paixão et al., “Electrocardiographic Predictors of Mortality: Data from a Primary Care Tele-Electrocardiography Cohort of Brazilian Patients,” Hearts, vol. 2, no. 4, Art. no. 4, Dec. 2021, doi: 10.3390/hearts2040035.
- [8] G. M. Paixão et al., “ECG-AGE FROM ARTIFICIAL INTELLIGENCE: A NEW PREDICTOR FOR MORTALITY? THE CODE (CLINICAL OUTCOMES IN DIGITAL ELECTROCARDIOGRAPHY) STUDY,” Journal of the American College of Cardiology, vol. 75, no. 11 Supplement 1, p. 3672, 2020, doi: 10.1016/S0735-1097(20)34299-6.
- [9] E. M. Lima et al., “Deep neural network estimated electrocardiographic-age as a mortality predictor,” Nature Communications, vol. 12, 2021, doi: 10.1038/s41467-021-25351-7.
- [10] W. Meira Jr, A. L. P. Ribeiro, D. M. Oliveira, and A. H. Ribeiro, “Contextualized Interpretable Machine Learning for Medical Diagnosis,” Communications of the ACM, 2020, doi: 10.1145/3416965.
- [11] A. H. Ribeiro et al., “Automatic diagnosis of the 12-lead ECG using a deep neural network,” Nature Communications, vol. 11, no. 1, p. 1760, 2020, doi: 10/drkd.
- [12] A. H. Ribeiro et al., “Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network,” Machine Learning for Health (ML4H) Workshop at NeurIPS, 2018.
- [13] A. H. Ribeiro et al., “Automatic 12-lead ECG classification using a convolutional network ensemble,” 2020. doi: 10.22489/CinC.2020.130.
- [14] V. Sangha et al., “Automated Multilabel Diagnosis on Electrocardiographic Images and Signals,” medRxiv, Sep. 2021, doi: 10.1101/2021.09.22.21263926.
- [15] S. Biton et al., “Atrial fibrillation risk prediction from the 12-lead ECG using digital biomarkers and deep representation learning,” European Heart Journal - Digital Health, 2021, doi: 10.1093/ehjdh/ztab071.

Code

The following GitHub repositories perform analyses that use this dataset:
- https://github.com/antonior92/automatic-ecg-diagnosis
- https://github.com/antonior92/ecg-age-prediction

Related Datasets

- CODE-test: An annotated 12-lead ECG dataset (https://doi.org/10.5281/zenodo.3765780)
- CODE-15%: a large scale annotated dataset of 12-lead ECGs (https://doi.org/10.5281/zenodo.4916206)
- Sami-Trop: 12-lead ECG traces with age and mortality annotations (https://doi.org/10.5281/zenodo.4905618)

Ethics declarations

The CODE Study was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais, protocol 49368496317.7.0000.5149.

CODE-15%: a large scale annotated dataset of 12-lead ECGs
zenodo.org
csv, zip
Updated Jan 8, 2025
Cite
Antônio H. Ribeiro; Gabriela M.M. Paixao; Emilly M. Lima; Manoel Horta Ribeiro; Marcelo M. Pinto Filho; Paulo R. Gomes; Derick M. Oliveira; Wagner Meira Jr; Thomas B Schön; Antonio Luiz P. Ribeiro (2025). CODE-15%: a large scale annotated dataset of 12-lead ECGs [Dataset]. http://doi.org/10.5281/zenodo.4916206
Explore at:
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.4916206
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Antônio H. Ribeiro; Gabriela M.M. Paixao; Emilly M. Lima; Manoel Horta Ribeiro; Marcelo M. Pinto Filho; Paulo R. Gomes; Derick M. Oliveira; Wagner Meira Jr; Thomas B Schön; Antonio Luiz P. Ribeiro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A dataset of 12-lead ECGs with annotations. The dataset contains 345 779 exams from 233 770 patients. It was obtained through stratified sampling from the CODE dataset (15% of the patients). The data was collected by the Telehealth Network of Minas Gerais in the period between 2010 and 2016.

This repository contains the file `exams.csv` and the files `exams_part{i}.zip` for i = 0, 1, 2, ..., 17.

"exams.csv": a comma-separated values (csv) file containing the following columns:

"exam_id": id used for identifying the exam;

"age": patient age in years at the moment of the exam;

"is_male": true if the patient is male;

"nn_predicted_age": age predicted for the patient by a neural network, as described in the paper "Deep neural network estimated electrocardiographic-age as a mortality predictor" cited below;

"1dAVb": Whether or not the patient has 1st degree AV block;

"RBBB": Whether or not the patient has right bundle branch block;

"LBBB": Whether or not the patient has left bundle branch block;

"SB": Whether or not the patient has sinus bradycardia;

"AF": Whether or not the patient has atrial fibrillation;

"ST": Whether or not the patient has sinus tachycardia;

"patient_id": id used for identifying the patient;

"normal_ecg": true if the automatic annotation system says it is a normal ECG;

"death": true if the patient dies in the follow-up time. This data is available only in the first exam of the patient. Other exams will have this as an empty field;

"timey": the time to death if the patient dies; otherwise, the follow-up time. This data is available only in the first exam of the patient. Other exams will have this as an empty field;

"trace_file": identifies the hdf5 file in which the tracing corresponding to this exam is located.
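
Because `death` and `timey` are filled in only on a patient's first exam, follow-up analyses need to skip the rows where these fields are empty. The following is a minimal sketch of that filtering using only the standard library. The sample rows are hypothetical, only a few of the documented columns are shown, and the `True`/`False` spelling of booleans is an assumption about how the values are serialized in `exams.csv`.

```python
import csv
import io

# Hypothetical excerpt of exams.csv with only a few of the documented columns.
# Assumption: booleans are written as "True"/"False" and missing values as empty fields.
SAMPLE = """exam_id,patient_id,age,is_male,death,timey,trace_file
10,1,54,True,False,3.2,exams_part0.hdf5
11,1,55,True,,,exams_part0.hdf5
12,2,70,False,True,1.5,exams_part1.hdf5
"""


def first_exam_outcomes(csv_text):
    """Return {patient_id: (death, timey)} from the rows that carry follow-up data."""
    outcomes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["death"] == "":  # later exams of a patient leave these fields empty
            continue
        outcomes[row["patient_id"]] = (row["death"] == "True", float(row["timey"]))
    return outcomes


outcomes = first_exam_outcomes(SAMPLE)
print(outcomes)
```

With the real file, the same function could be fed the contents of `exams.csv` read from disk.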

"exams_part{i}.hdf5": The HDF5 file containing two datasets, one named `tracings` and the other named `exam_id`. The `exam_id` dataset is a tensor of dimension `(N,)` containing the exam ids (the same as in the csv file), and the `tracings` dataset is a `(N, 4096, 12)` tensor containing the ECG tracings in the same order. The first dimension corresponds to the different exams; the second dimension corresponds to the 4096 signal samples; the third dimension corresponds to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`. The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples), we fill them with zeros on both sides. For instance, for a 7-second ECG signal with 2800 samples we include 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset.
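
The padding arithmetic above can be checked with a short sketch. It assumes the zeros are split evenly between both sides (as in the 7-second example) and that the caller already knows the original recording length, since the description does not say where that length is stored. `padding_per_side` and `strip_padding` are illustrative names, not part of the dataset's tooling.

```python
TARGET_LEN = 4096  # samples per stored tracing, as stated in the description
FS_HZ = 400        # sampling frequency stated in the description


def padding_per_side(n_samples, target_len=TARGET_LEN):
    """Zeros added on each side, assuming the padding is split evenly."""
    total = target_len - n_samples
    if total < 0 or total % 2 != 0:
        raise ValueError("cannot pad symmetrically to the target length")
    return total // 2


def strip_padding(lead, n_samples):
    """Return the n_samples central values of one padded lead (a 1-D sequence)."""
    pad = padding_per_side(n_samples, len(lead))
    return lead[pad:pad + n_samples]


pad_7s = padding_per_side(7 * FS_HZ)    # 2800 samples -> 648 zeros per side
pad_10s = padding_per_side(10 * FS_HZ)  # 4000 samples -> 48 zeros per side

# Toy check on a plain list standing in for one lead of a 7-second exam.
toy_lead = [0.0] * pad_7s + [1.0] * 2800 + [0.0] * pad_7s
recovered = strip_padding(toy_lead, 2800)
```

The same slicing works on one lead of a NumPy or h5py array, for example the second axis of a single exam.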
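In Python, one can read an `exams_part{i}.hdf5` file using h5py.
```python
import h5py
import numpy as np

path_to_file = 'exams_part0.hdf5'  # any of the exams_part{i}.hdf5 files
f = h5py.File(path_to_file, 'r')
# Get the exam ids (small enough to load into memory)
traces_ids = np.array(f['exam_id'])
# Keep the tracings as a lazy h5py dataset; slices are read on demand
x = f['tracings']
```
The `tracings` dataset is too large to fit in memory, so don't convert it to a numpy array all at once.
It is possible to access a chunk of it using: ``x[start:end, :, :]``.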
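
One way to walk through the whole dataset with such slices is to generate consecutive `(start, end)` pairs. The helper below is a minimal sketch; the name `chunk_ranges` and the chunk size are illustrative choices, not part of the dataset or of h5py.

```python
def chunk_ranges(n_exams, chunk_size):
    """Yield (start, end) index pairs that cover range(n_exams) without overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_exams, chunk_size):
        yield start, min(start + chunk_size, n_exams)


# Ten exams read four at a time; the last chunk is shorter.
ranges = list(chunk_ranges(10, 4))
print(ranges)
```

With the file opened as above, a loop such as `for start, end in chunk_ranges(x.shape[0], 512): batch = x[start:end, :, :]` would hold only one chunk in memory at a time.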

The CODE dataset was collected by the Telehealth Network of Minas Gerais (TNMG) in the period between 2010 and 2016. TNMG is a public telehealth system assisting 811 out of the 853 municipalities in the state of Minas Gerais, Brazil. The dataset is described in:

Ribeiro, Antônio H., Manoel Horta Ribeiro, Gabriela M. M. Paixão, Derick M. Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton P. S. Ferreira, et al. “Automatic Diagnosis of the 12-Lead ECG Using a Deep Neural Network.” Nature Communications 11, no. 1 (2020): 1760. https://doi.org/10.1038/s41467-020-15432-4

The CODE-15% dataset is obtained by stratified sampling from the CODE dataset. This subset of the CODE dataset is described in the following paper, where it is also used for assessing model performance:
"Deep neural network estimated electrocardiographic-age as a mortality predictor"
Emilly M Lima, Antônio H Ribeiro, Gabriela MM Paixão, Manoel Horta Ribeiro, Marcelo M Pinto Filho, Paulo R Gomes, Derick M Oliveira, Ester C Sabino, Bruce B Duncan, Luana Giatti, Sandhi M Barreto, Wagner Meira Jr, Thomas B Schön, Antonio Luiz P Ribeiro. MedRXiv (2021) https://www.doi.org/10.1101/2021.02.19.21251232

The companion code for reproducing the experiments in the two papers described above can be found, respectively, in:
- https://github.com/antonior92/automatic-ecg-diagnosis; and in,
- https://github.com/antonior92/ecg-age-prediction.

Note about authorship: Antônio H. Ribeiro, Emilly M. Lima and Gabriela M.M. Paixão contributed equally to this work.

CODE-test: An annotated 12-lead ECG dataset

zenodo.org
data.niaid.nih.gov

zip

Updated Jun 7, 2021

Cite

Antonio H Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Derick M. Oliveira; Paulo R. Gomes; Jéssica A. Canazart; Milton P. Ferreira; Carl R. Andersson; Peter W. Macfarlane; Wagner Meira Jr.; Thomas B. Schön; Antonio Luiz P. Ribeiro (2021). CODE-test: An annotated 12-lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3765780

Explore at:

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.3765780

Dataset updated

Jun 7, 2021

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

# Annotated 12 lead ECG dataset

Contains 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students. It is used as the test set in the paper: "Automatic diagnosis of the 12-lead ECG using a deep neural network". https://www.nature.com/articles/s41467-020-15432-4.

It contains annotations for 6 different ECG abnormalities:
- 1st degree AV block (1dAVb);
- right bundle branch block (RBBB);
- left bundle branch block (LBBB);
- sinus bradycardia (SB);
- atrial fibrillation (AF); and,
- sinus tachycardia (ST).

Companion python scripts are available in:
https://github.com/antonior92/automatic-ecg-diagnosis

--------

Citation
```
Ribeiro, A.H., Ribeiro, M.H., Paixão, G.M.M. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11, 1760 (2020). https://doi.org/10.1038/s41467-020-15432-4
```

Bibtex:
```
@article{ribeiro_automatic_2020,
 title = {Automatic Diagnosis of the 12-Lead {{ECG}} Using a Deep Neural Network},
 author = {Ribeiro, Ant{\^o}nio H. and Ribeiro, Manoel Horta and Paix{\~a}o, Gabriela M. M. and Oliveira, Derick M. and Gomes, Paulo R. and Canazart, J{\'e}ssica A. and Ferreira, Milton P. S. and Andersson, Carl R. and Macfarlane, Peter W. and Meira Jr., Wagner and Sch{\"o}n, Thomas B. and Ribeiro, Antonio Luiz P.},
 year = {2020},
 volume = {11},
 pages = {1760},
 doi = {10.1038/s41467-020-15432-4},
 journal = {Nature Communications},
 number = {1}
}
```
-----


## Folder content:

- `ecg_tracings.hdf5`: The HDF5 file containing a single dataset named `tracings`. This dataset is a `(827, 4096, 12)` tensor. The first dimension corresponds to the 827 different exams from different patients; the second dimension corresponds to the 4096 signal samples; the third dimension corresponds to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`.

The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples), we fill them with zeros on both sides. For instance, for a 7-second ECG signal with 2800 samples we include 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset. All signals are represented as floating point numbers at the scale 1e-4V: so they should be multiplied by 1000 in order to obtain the signals in V.

In Python, one can read this file using the following sequence:
```python
import h5py
import numpy as np

# Path to the tracings file in this folder
with h5py.File("ecg_tracings.hdf5", "r") as f:
    x = np.array(f['tracings'])  # shape (827, 4096, 12)
```

- The file `attributes.csv` contains basic patient attributes: sex (M or F) and age. It contains 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` corresponds to the i-th line.
- `annotations/`: folder containing annotations in csv format. Each csv file contains 827 lines (plus the header). In all csv files, the i-th line corresponds to the i-th tracing in `ecg_tracings.hdf5`. The csv files all have 6 columns, `1dAVb, RBBB, LBBB, SB, AF, ST`, indicating whether the annotator detected the abnormality in the ECG (`=1`) or not (`=0`).
 1. `cardiologist[1,2].csv` contain annotations from two different cardiologists.
 2. `gold_standard.csv` gold standard annotation for this test dataset. When cardiologist 1 and cardiologist 2 agree, the common diagnosis was considered as the gold standard. In cases where there was any disagreement, a third senior specialist, aware of the annotations from the other two, decided the diagnosis.
 3. `dnn.csv` predictions from the deep neural network described in the paper. The threshold is set in such a way that it maximizes the F1 score.
 4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
 5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
 6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).
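
The gold-standard rule and the F1 score mentioned in the list above can be illustrated on a single abnormality column. This is a minimal sketch on made-up 0/1 labels. The senior specialist's individual decisions are not distributed as a separate file, so the `senior` list here is purely hypothetical, and `gold_standard` and `f1_score` are illustrative helper names rather than functions from the companion scripts.

```python
def gold_standard(card1, card2, senior):
    """Keep the label where both cardiologists agree; otherwise use the senior's decision."""
    return [a if a == b else s for a, b, s in zip(card1, card2, senior)]


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels for four exams and one abnormality (for example AF).
card1 = [1, 0, 1, 0]
card2 = [1, 1, 0, 0]
senior = [None, 1, 0, None]  # only consulted where the two cardiologists disagree
gold = gold_standard(card1, card2, senior)

dnn = [1, 0, 0, 0]  # hypothetical thresholded network output
score = f1_score(gold, dnn)
print(gold, round(score, 3))
```

On the real files, the same comparison would be repeated for each of the six columns of `gold_standard.csv` and `dnn.csv`.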

Data from: TELE ECG Database: 250 telehealth ECG records (collected using...
dataverse.harvard.edu
Updated Sep 6, 2016
Cite
Heba Khamis; Robert Weiss; Yang Xie; Chan-Wei Chang; Nigel H. Lovell; Stephen J. Redmond (2016). TELE ECG Database: 250 telehealth ECG records (collected using dry metal electrodes) with annotated QRS and artifact masks, and MATLAB code for the UNSW artifact detection and UNSW QRS detection algorithms [Dataset]. http://doi.org/10.7910/DVN/QTG0EP
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/QTG0EP
Dataset updated
Sep 6, 2016
Dataset provided by
Harvard Dataverse
Authors
Heba Khamis; Robert Weiss; Yang Xie; Chan-Wei Chang; Nigel H. Lovell; Stephen J. Redmond
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset funded by
Australian Research Council
Description
CITATION

Please cite this data and code as: H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond, "QRS detection algorithm for telehealth electrocardiogram recordings," IEEE Transactions on Biomedical Engineering, vol. 63(7), p. 1377-1388, 2016.

DATABASE DESCRIPTION

The following description of the TELE database is from Khamis et al (2016):

"In Redmond et al (2012), 300 ECG single lead-I signals recorded in a telehealth environment are described. The data was recorded using the TeleMedCare Health Monitor (TeleMedCare Pty. Ltd. Sydney, Australia). This ECG is sampled at a rate of 500 Hz using dry metal Ag/AgCl plate electrodes which the patient holds with each hand; a reference electrode plate is also positioned under the pad of the right hand. Of the 300 recordings, 250 were selected randomly from 120 patients, and the remaining 50 were manually selected from 168 patients to obtain a larger representation of poor quality data. Three independent scorers annotated the data by identifying sections of artifact and QRS complexes. All scorers then annotated the signals as a group, to reconcile the individual annotations. Sections of the ECG signal which were less than 5 s in duration were considered to be part of the neighboring artifact sections and were subsequently masked. QRS annotations in the masked regions were discarded prior to the artifact mask and QRS locations being saved. Of the 300 telehealth ECG records in Redmond et al. (2012), 50 records (including 29 of the 250 randomly selected records and 21 of the 50 manually selected records) were discarded as all annotated RR intervals within these records overlap with the annotated artifact mask and therefore, no heart rate can be calculated, which is required for measuring algorithm performance. The remaining 250 records will be referred to as the TELE database."

For all 250 recordings in the TELE database, the mains frequency was 50 Hz, the sampling frequency was 500 Hz and the top and bottom rail voltages were 5.556912223578890 and -5.554198887532222 mV respectively.

DATA FILE DESCRIPTION

Each record in the TELE database is stored as a X_Y.dat file where X indicates the index of the record in the TELE database (containing a total of 250 records) and Y indicates the index of the record in the original dataset containing 300 records (see Redmond et al. 2012). The .dat file is a comma separated values file. Each line contains:
- the ECG sample value (mV)
- a boolean indicating the locations of the annotated qrs complexes
- a boolean indicating the visually determined mask
- a boolean indicating the software determined mask (see Khamis et al. 2016)

CONVERTING DATA TO MATLAB STRUCTURE

A matlab function (readFromCSV_TELE.m) has been provided to read the .dat files into a matlab structure:

%%
% [DB,fm,fs,rail_mv] = readFromCSV_TELE(DATA_PATH)
%
% Extracts the data for each of the 250 telehealth ECG records of the TELE database [1]
% and returns a structure containing all data, annotations and masks.
%
% IN: DATA_PATH - String. The path containing the .hdr and .dat files
%
% OUT: DB - 1xM Structure. Contains the extracted data from the M (250) data files.
%      The structure has fields:
%      * data_orig_ind - 1x1 double. The index of the data file in the original dataset of 300 records (see [1]) - for tracking purposes.
%      * ecg_mv - 1xN double. The ecg samples (mV). N is the number of samples for the data file.
%      * qrs_annotations - 1xN double. The qrs complexes - value of 1 where a qrs is located and 0 otherwise.
%      * visual_mask - 1xN double. The visually determined artifact mask - value of 1 where the data is masked and 0 otherwise.
%      * software_mask - 1xN double. The software artifact mask - value of 1 where the data is masked and 0 otherwise.
%      fm - 1x1 double. The mains frequency (Hz)
%      fs - 1x1 double. The sampling frequency (Hz)
%      rail_mv - 1x2 double. The bottom and top rail voltages (mV)
%
% If you use this code or data, please cite as follows:
%
% [1] H. Khamis, R. Weiss, Y. Xie, C-W. Chang, N. H. Lovell, S. J. Redmond,
% "QRS detection algorithm...
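
For readers who do not use MATLAB, the four-column layout of the .dat files described above can also be parsed with a few lines of Python. This is a minimal sketch, not a port of `readFromCSV_TELE.m`. The sample text is hypothetical, and it assumes the three boolean columns are written as 0/1, which the description implies but does not state outright. The sample's QRS flags are placed unrealistically close together purely to keep the example short.

```python
FS_HZ = 500  # sampling frequency stated in the TELE description

# Hypothetical excerpt of an X_Y.dat file. Assumption: booleans are stored as 0/1.
SAMPLE = """0.10,0,0,0
0.90,1,0,0
0.20,0,0,0
0.85,1,0,0
0.05,0,1,1
"""


def parse_record(text):
    """Split a record into four parallel lists: ECG (mV), QRS flags and both masks."""
    ecg_mv, qrs, visual_mask, software_mask = [], [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        value, q, vis, soft = line.split(",")
        ecg_mv.append(float(value))
        qrs.append(int(q) == 1)
        visual_mask.append(int(vis) == 1)
        software_mask.append(int(soft) == 1)
    return ecg_mv, qrs, visual_mask, software_mask


def rr_intervals_s(qrs, fs_hz=FS_HZ):
    """Seconds between consecutive annotated QRS complexes."""
    idx = [i for i, flag in enumerate(qrs) if flag]
    return [(b - a) / fs_hz for a, b in zip(idx, idx[1:])]


ecg_mv, qrs, visual_mask, software_mask = parse_record(SAMPLE)
rr = rr_intervals_s(qrs)
```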
PTB-XL ECG dataset
kaggle.com
opendatalab.com
Updated Feb 3, 2021
Cite
khyeh (2021). PTB-XL ECG dataset [Dataset]. https://www.kaggle.com/khyeh0719/ptb-xl-dataset/code
Explore at:
Dataset updated
Feb 3, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
khyeh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Source: https://physionet.org/content/ptb-xl/1.0.1/

Abstract

Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms as diagnosis support systems promise large reliefs for the medical personnel - only on the basis of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

The PTB-XL ECG dataset is a large dataset of 21837 clinical 12-lead ECGs from 18885 patients of 10 second length. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. In total, the 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.

Background

The waveform data underlying the PTB-XL ECG dataset was collected with devices from Schiller AG over the course of nearly seven years between October 1989 and June 1996. With the acquisition of the original database from Schiller AG, the full usage rights were transferred to the PTB. The records were curated and converted into a structured database within a long-term project at the Physikalisch-Technische Bundesanstalt (PTB). The database was used in a number of publications, see e.g. [1,2], but the access remained restricted until now. The Institutional Ethics Committee approved the publication of the anonymous data in an open-access database (PTB-2020-1). During the public release process in 2019, the existing database was streamlined with particular regard to usability and accessibility for the machine learning community. Waveform and metadata were converted to open data formats that can easily be processed by standard software.

Methods

Data Acquisition

Raw signal data was recorded and stored in a proprietary compressed format. For all signals, we provide the standard set of 12 leads (I, II, III, AVL, AVR, AVF, V1, ..., V6) with reference electrodes on the right arm.

The corresponding general metadata (such as age, sex, weight and height) was collected in a database.

Each record was annotated with a report string (generated by cardiologist or automatic interpretation by ECG-device) which was converted into a standardized set of SCP-ECG statements (scp_codes). For most records also the heart’s axis (heart_axis) and infarction stadium (infarction_stadium1 and infarction_stadium2, if present) were extracted.

A large fraction of the records was validated by a second cardiologist.

All records were validated by a technical expert focusing mainly on signal characteristics.

Data Preprocessing

ECGs and patients are identified by unique identifiers (ecg_id and patient_id). Personal information in the metadata, such as names of validating cardiologists, nurses and recording site (hospital etc.) of the recording, was pseudonymized. The date of birth is provided only as age at the time of the ECG recording, where ages of more than 89 years appear in the range of 300 years in compliance with HIPAA standards. Furthermore, all ECG recording dates were shifted by a random offset for each patient. The ECG statements used for annotating the records follow the SCP-ECG standard [3].

Data Description

In general, the dataset is organized as follows:

ptbxl
├── ptbxl_database.csv
├── scp_statements.csv
├── records100
│   ├── 00000
│   │   ├── 00001_lr.dat
│   │   ├── 00001_lr.hea
│   │   ├── ...
│   │   ├── 00999_lr.dat
│   │   └── 00999_lr.hea
│   ├── ...
│   └── 21000
│       ├── 21001_lr.dat
│       ├── 21001_lr.hea
│       ├── ...
│       ├── 21837_lr.dat
│       └── 21837_lr.hea
└── records500
    ├── 00000
    │   ├── 00001_hr.dat
    │   ├── 00001_hr.hea
    │   ├── ...
    │   ├── 00999_hr.dat
    │   └── 00999_hr.hea
    ├── ...
    └── 21000
        ├── 21001_hr.dat
        ├── 21001_hr.hea
        ├── ...
        ├── 21837_hr.dat
        └── 21837_hr.hea

The dataset comprises 21837 clinical 12-lead ECG records of 10 seconds length from 18885 patients, where 52% are male and 48% are female with ages covering the whole range from 0 to 95 years (median 62 and interquartile range of 22). The value of the dataset results from the comprehensive collection of many different co-occurring path...
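
The folder layout shown in the tree suggests that each record's location can be derived from its `ecg_id`: records appear to be grouped into folders of 1000, named after the lower bound of the group. The helper below is a sketch based only on that inferred pattern. `record_path` is a hypothetical name, and the grouping rule is an assumption read off the tree rather than something the description states.

```python
def record_path(ecg_id, sampling_rate=100):
    """Build the relative path (without extension) of a record, following the tree layout."""
    if sampling_rate == 100:
        top, suffix = "records100", "lr"
    elif sampling_rate == 500:
        top, suffix = "records500", "hr"
    else:
        raise ValueError("only the records100 and records500 folders appear in the tree")
    folder = (ecg_id // 1000) * 1000  # assumed grouping: 00000, 01000, ..., 21000
    return f"{top}/{folder:05d}/{ecg_id:05d}_{suffix}"


low_res = record_path(1)
high_res = record_path(21837, sampling_rate=500)
print(low_res, high_res)
```

Appending `.dat` or `.hea` to the returned string gives the two files listed for each record.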
ECG Heartbeat Categorization Dataset Dataset
paperswithcode.com
Updated Aug 16, 2018
Cite
Mohammad Kachuee; Shayan Fazeli; Majid Sarrafzadeh (2018). ECG Heartbeat Categorization Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/ecg-heartbeat-categorization-dataset
Explore at:
Dataset updated
Aug 16, 2018
Authors
Mohammad Kachuee; Shayan Fazeli; Majid Sarrafzadeh
Description
This dataset is composed of two collections of heartbeat signals derived from two famous PhysioNet datasets in heartbeat classification, the MIT-BIH Arrhythmia Dataset and the PTB Diagnostic ECG Database. The number of samples in both collections is large enough for training a deep neural network.

This dataset has been used in exploring heartbeat classification using deep neural network architectures, and observing some of the capabilities of transfer learning on it. The signals correspond to electrocardiogram (ECG) shapes of heartbeats for the normal case and the cases affected by different arrhythmias and myocardial infarction. These signals are preprocessed and segmented, with each segment corresponding to a heartbeat.
Surface electrocardiogram (ECG) dataset recorded during relaxation in 70...
data.niaid.nih.gov
zenodo.org
Updated Jul 17, 2024
Cite
Miljković, Nadica (2024). Surface electrocardiogram (ECG) dataset recorded during relaxation in 70 healthy subjects [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5599238
Explore at:
Dataset updated
Jul 17, 2024
Dataset provided by
Milašinović, Goran
Boljanić, Tanja
Miljković, Nadica
Knežević, Goran
Lazarević B., Ljiljana
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Study Sample and Ethics Statement

The sample consisted of 71 university students, average age 20.38 years (SD = 2.96), 78.8% female. Subjects with previous cardio-vascular disorders and irregular ECG were excluded. The study has been approved by the Institutional Review Board of the Department of Psychology, University of Belgrade No. 2018-19. All participants signed Informed Consents in accordance with the Declaration of Helsinki.

In the course of visual examination, it was decided to discard the ECG from one subject due to the presence of bigeminal arrhythmia, so further analysis was performed on 70 subjects instead of 71.

Measurement Setup

BIOPAC sensors (Biopac Systems Inc., Camino Goleta, CA, USA) were used for recording biosignals in another study (Bjegojević et al., 2020). Here, we used only ECG signals recorded in a relaxed sitting position from standard bipolar Lead I using the BIOPAC MP150 unit with AcqKnowledge software and ECG 100C module with surface H135SG Ag/AgCl electrodes (Kendall/Covidien, Dublin, Ireland). To reduce skin-electrode impedance, the skin was cleaned with Nuprep gel (Weaver & Co., Aurora, USA). The sampling frequency was set at 2000 Hz and the gain was set to 1000.

ECG signals were recorded during relaxation in a sitting position and data were recorded during 2 min long intervals. More information is available in the article [1].
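As a quick sanity check, the stated sampling frequency and recording length imply how many samples each recording should hold. The sketch below assumes exactly 120 s per subject; the constant and function names are illustrative, and the actual layout of ecg_70.txt is not described here.

```python
# Values taken from the description: 2000 Hz sampling, 2 min recordings, 70 subjects.
SAMPLING_HZ = 2000
DURATION_S = 2 * 60   # assumption: exactly 120 s per recording
N_SUBJECTS = 70

samples_per_recording = SAMPLING_HZ * DURATION_S
total_samples = samples_per_recording * N_SUBJECTS


def time_axis(n_samples, fs=SAMPLING_HZ):
    """Return the time stamp (in seconds) of each sample index."""
    return [i / fs for i in range(n_samples)]


print(samples_per_recording)  # 240000
print(total_samples)          # 16800000
```

A recording whose length differs noticeably from 240000 samples would suggest a different duration or sampling rate than the one assumed here.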

Dataset, Code, and Feature Extraction Instructions

analysisECG.R, function with analysis procedures written in R programming language

anec12919-sup-0001-supinfo.pdf, detailed ECG processing and feature extraction procedure (also available as supplementary material for article [1])

ecg_70.txt, .txt data file, text format

mainECG.R, a main program written in R programming language

R-studio-version-info.txt, the version of R Studio where the code was tested

R-version-info.txt, the version of R programming language where the code was tested

For ECG-based feature extraction, we used the following R packages:

signal - Signal Processing Functions (signal developers (2014). signal: Signal processing. http://r-forge.r-project.org/projects/signal/)

pracma - Practical Numerical Math Functions (Borchers, H. W. (2019). Package ‘pracma’: Practical numerical math functions. R package version, 2(1). https://CRAN.R-project.org/package=pracma)

Please note that the results of personality trait tests are not available in the current dataset. We are planning to open them in our future research. For more information and planned availability in open access, please contact the corresponding author of [1] by e-mail (nadica.miljkovic@etf.bg.ac.rs).

Citing Instruction

If you find these signals and code useful for your own research or teaching class, please cite the relevant dataset and supporting publications:

Boljanić, T., Miljković, N., Lazarević, L. B., Knežević, G., & Milašinović, G. (2021). Relationship between electrocardiogram-based features and personality traits: Machine learning approach. Annals of Noninvasive Electrocardiology, 00, e12919. https://doi.org/10.1111/anec.12919

Bjegojević, B., Milosavljević, N., Dubljević, O., Purić, D., & Knežević, G. (2020). In pursuit of objectivity: Physiological measures as a means of emotion induction procedure validation. XXIVI Scientific Conference on Empirical Studies in Psychology, p. 17-19.

Boljanić, T., Miljković, N., Lazarević B. Lj., Knežević, G., & Milašinović, G. (2021). Surface electrocardiogram (ECG) dataset recorded during relaxation in 70 healthy subjects (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5599239
ECG in High Intensity Exercise Dataset
data.niaid.nih.gov
opendatalab.com
Updated Dec 26, 2021
Cite
Atienza, David (2021). ECG in High Intensity Exercise Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5727799
Explore at:
Dataset updated
Dec 26, 2021
Dataset provided by
David Meier
Atienza, David
Millet, Grégoire
Teijeiro, Tomas
De Giovanni, Elisabetta
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R peak detection algorithms during high intensity exercise.

Protocol of the experiments

The protocol of the experiment was the following.

22 subjects performed a cardiopulmonary maximal exercise test on a cycle ergometer, using a gas mask. A single-lead electrocardiogram (ECG) was measured using the BIOPAC system.

An initial 3 min of rest were recorded.

After this baseline, the subjects started cycling at a power of 60W or 90W depending on their fitness level.

Then, the power of the cycle ergometer was increased by 30W every 3 min till exhaustion (in terms of maximum oxygen uptake or VO2max).

Finally, physiology experts assessed the so-called ventilatory thresholds and the VO2max based on the pulmonary data (volume of oxygen and CO2).

Description of the extracted dataset

The characteristics of the dataset are the following:

We report only 20 out of 22 subjects that were used for the analysis, because for two subjects the signals were too corrupted or not complete. Specifically, subjects 5 and 12 were discarded.

The ECG signal was sampled at 500 Hz and then downsampled to 250 Hz. The original ECG signals were measured at a maximum of 10 mV. Then, they were scaled down by a factor of 1000, hence the data is represented in uV.
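Going from 500 Hz to 250 Hz corresponds to a downsampling factor of 2. The sketch below shows one simple way to halve a sampling rate by averaging neighbouring sample pairs; this is an illustrative choice, since the description does not say which resampling method the authors used, and the function name is hypothetical.

```python
FS_ORIGINAL = 500  # Hz, from the description
FS_TARGET = 250    # Hz, from the description
factor = FS_ORIGINAL // FS_TARGET


def downsample_by_two(signal):
    """Average neighbouring sample pairs, halving the sampling rate.

    Pair averaging is a crude anti-aliasing step chosen for illustration;
    it is not claimed to be the method used to build the dataset.
    """
    usable = len(signal) - len(signal) % 2  # drop a trailing unpaired sample
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, usable, 2)]


raw = [0.0, 2.0, 4.0, 6.0, 8.0]  # toy values, not real ECG
print(factor)                  # 2
print(downsample_by_two(raw))  # [1.0, 5.0]
```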

For each subject, 5 segments of 20 s were extracted from the ECG recordings and chosen based on different phases of the maximal exercise test (i.e., before and after the so-called second ventilatory threshold or VT2, before and in the middle of VO2max, and during the recovery after exhaustion) to represent different intensities of physical activity.

seg1 --> [VT2-50,VT2-30]
seg2 --> [VT2+60,VT2+80]
seg3 --> [VO2max-50,VO2max-30]
seg4 --> [VO2max-10,VO2max+10]
seg5 --> [VO2max+60,VO2max+80]

The R peak locations were manually annotated in all segments and reviewed by a physician of the Lausanne University Hospital, CHUV. Only segment 5 of subject 9 could not be annotated since there was a problem with the input signal. So, the total number of segments extracted was 20 * 5 - 1 = 99.
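The segment definitions can be turned into concrete time windows once the VT2 and VO2max instants are known for a subject. The sketch below assumes the offsets are expressed in seconds (each window then spans the stated 20 s); the event times are made up for illustration and the names are hypothetical.

```python
FS = 250        # Hz, sampling rate after downsampling (from the description)
SEGMENT_S = 20  # s, segment length (from the description)

# Offsets relative to each reference event, assumed to be in seconds.
SEGMENT_OFFSETS = {
    "seg1": ("VT2", -50, -30),
    "seg2": ("VT2", 60, 80),
    "seg3": ("VO2max", -50, -30),
    "seg4": ("VO2max", -10, 10),
    "seg5": ("VO2max", 60, 80),
}


def segment_windows(event_times_s):
    """Map each segment name to (start_s, end_s) given event times in seconds."""
    windows = {}
    for name, (event, start_off, end_off) in SEGMENT_OFFSETS.items():
        t0 = event_times_s[event]
        windows[name] = (t0 + start_off, t0 + end_off)
    return windows


# Hypothetical event times for one subject (not taken from the dataset).
windows = segment_windows({"VT2": 600, "VO2max": 900})
print(windows["seg4"])  # (890, 910)
print(SEGMENT_S * FS)   # 5000 samples per segment
print(20 * 5 - 1)       # 99 segments in total
```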

Format of the extracted dataset

The dataset is divided into two main folders:

The folder ecg_segments/ contains the ECG signals saved in two formats, .csv and .mat. This folder includes both raw (ecg_raw) and processed (ecg) signals. The processing consists of a morphological filtering and a relative energy non filtering method to enhance the R peaks. The .csv files contain only the signal, while the .mat files include the signal, the time vector within the maximal stress test, the sampling frequency and the unit of the signal amplitude (uV, as we mentioned before).

The folder manual_annotations/ contains the sample indices of the annotated R peaks in .csv format. The annotation was done on the processed signals.
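Because the manual annotations are stored as sample indices, the output of an R peak detector can be scored against them by index matching. The sketch below is one common way to do this, not the evaluation procedure of the dataset authors; the 50 ms tolerance and all names are assumptions.

```python
def match_peaks(annotated, detected, tolerance):
    """Greedily pair each annotated peak with the closest unused detection.

    All positions are sample indices; a pair counts as a true positive when
    the two indices differ by at most `tolerance` samples.
    """
    unused = sorted(detected)
    true_pos = 0
    for ref in sorted(annotated):
        candidates = [d for d in unused if abs(d - ref) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - ref))
            unused.remove(best)
            true_pos += 1
    false_neg = len(annotated) - true_pos
    false_pos = len(unused)
    return true_pos, false_pos, false_neg


FS = 250                    # Hz, from the description
TOLERANCE = int(0.05 * FS)  # about 50 ms, an assumed tolerance

annotated = [100, 300, 500, 700]      # toy indices, not real annotations
detected = [102, 298, 520, 705, 900]  # toy detector output
tp, fp, fn = match_peaks(annotated, detected, TOLERANCE)
sensitivity = tp / (tp + fn)
ppv = tp / (tp + fp)
print(tp, fp, fn)  # 3 2 1
```

Sensitivity and positive predictive value follow directly from the three counts; a stricter or looser tolerance changes which detections count as matches.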
ECG SIGNALS
kaggle.com
Updated Apr 8, 2025
Cite
Radwa Kandeel (2025). ECG SIGNALS [Dataset]. https://www.kaggle.com/datasets/radwakandeel/ecg-signals/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 8, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
Radwa Kandeel
Description
Dataset

This dataset was created by Radwa Kandeel

Contents
MedalCare-XL Dataset
paperswithcode.com
zenodo.org
Updated Nov 28, 2022
Cite
(2022). MedalCare-XL Dataset [Dataset]. https://paperswithcode.com/dataset/medalcare-xl
Explore at:
Dataset updated
Nov 28, 2022
Description
Mechanistic cardiac electrophysiology models allow for personalized simulations of the electrical activity in the heart and the ensuing electrocardiogram (ECG) on the body surface. As such, synthetic signals possess precisely known ground truth labels of the underlying disease (model parameterization) and can be employed for validation of machine learning ECG analysis tools in addition to clinical signals. Recently, synthetic ECG signals were used to enrich sparse clinical data for machine learning or even replace them completely during training leading to good performance on real-world clinical test data.

We thus generated a large synthetic database comprising a total of 16,900 12-lead ECGs based on multi-scale electrophysiological simulations equally distributed into 1 normal healthy control and 7 pathology classes. The pathological case of myocardial infarction had 6 sub-classes. A comparison of extracted timing and amplitude features between the virtual cohort and a large publicly available clinical ECG database demonstrated that the synthetic signals represent clinical ECGs for healthy and pathological subpopulations with high fidelity. The novel dataset of simulated ECG signals is split into training, validation and test data folds for development of novel machine learning algorithms and their objective assessment.

This folder WP2_largeDataset_Noise contains the 12-lead ECGs of 10 seconds length. Each ECG is stored in a separate CSV file with one row per lead (lead order: I, II, III, aVR, aVL, aVF, V1-V6) and one sample per column (sampling rate: 500 Hz). Data are split by pathologies (avblock = AV block, lbbb = left bundle branch block, rbbb = right bundle branch block, sinus = normal sinus rhythm, lae = left atrial enlargement, fam = fibrotic atrial cardiomyopathy, iab = interatrial conduction block, mi = myocardial infarction). MI data are further split into subclasses depending on the occlusion site (LAD, LCX, RCA) and transmurality (0.3 or 1.0). Each pathology subclass contains training, validation and testing data (~ 70/15/15 split). Training, validation and testing datasets were defined according to the model with which QRST complexes were simulated, i.e., ECGs calculated with the same anatomical model but different electrophysiological parameters are only present in one of the test, validation and training datasets but never in multiple. Each subfolder also contains a "siginfo.csv" file specifying the respective simulation run for the P wave and the QRST segment that was used to synthesize the 10 second ECG segment. Each signal is available in three variations:

run_raw.csv contains the synthesized ECG without added noise and without filtering

runnoise.csv contains the synthesized ECG (unfiltered) with superimposed noise

run*_filtered.csv contains the filtered synthesized ECG (filter settings: highpass cutoff frequency 0.5 Hz, lowpass cutoff frequency 150 Hz, Butterworth filters of order 3).
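The row-per-lead layout described above can be read with the standard library alone. The sketch below assumes comma-separated values and no header row, neither of which is stated in the description; the function name is hypothetical and the toy input stands in for a real file.

```python
import csv
import io

# Lead order from the description.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]
FS = 500  # Hz, from the description


def read_ecg_csv(handle):
    """Read one ECG (one row per lead, one sample per column) into a dict."""
    rows = [[float(x) for x in row] for row in csv.reader(handle) if row]
    if len(rows) != len(LEADS):
        raise ValueError(f"expected {len(LEADS)} rows, got {len(rows)}")
    return dict(zip(LEADS, rows))


# Toy stand-in for a real file: 12 rows of 4 samples instead of 5000.
toy = "\n".join(",".join(str(lead_idx + s) for s in range(4))
                for lead_idx in range(12))
ecg = read_ecg_csv(io.StringIO(toy))
print(ecg["II"])  # [1.0, 2.0, 3.0, 4.0]
print(10 * FS)    # 5000 samples expected per lead in a real file
```

With a real file, the same function could be called on a handle opened with `open(path, newline="")`.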

The folder WP2_largeDataset_ParameterFiles contains the parameter files used to simulate the 12-lead ECGs. Parameters are split for atrial and ventricular simulations, which were run independently from one another. See Gillette, Gsell, Nagel* et al. "MedalCare-XL: 16,900 healthy and pathological electrocardiograms obtained through multi-scale electrophysiological models" for a description of the model parameters.
ECG-DATASET
kaggle.com
Updated Apr 7, 2021
Cite
EL-Sara (2021). ECG-DATASET [Dataset]. https://www.kaggle.com/elomarysara/ecgdataset/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 7, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
EL-Sara
Description
Dataset

This dataset was created by EL-Sara

Contents
Ecg Signal Classification Dataset
universe.roboflow.com
zip
Updated Aug 11, 2022
Cite
Muhammad Faizan (2022). Ecg Signal Classification Dataset [Dataset]. https://universe.roboflow.com/muhammad-faizan-b3pid/ecg-signal-classification/model/1
Explore at:
Available download formats: zip
Dataset updated
Aug 11, 2022
Dataset authored and provided by
Muhammad Faizan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Heart Signal Analysis
Description
ECG Signal Classification

Overview

ECG Signal Classification is a dataset for classification tasks - it contains Heart Signal Analysis annotations for 375 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
PTB XL Dataset - Reformatted
kaggle.com
zip
Updated Feb 23, 2021
Cite
Kun Hao Yeh (2021). PTB XL Dataset - Reformatted [Dataset]. https://www.kaggle.com/khyeh0719/ptb-xl-dataset-reformatted
Explore at:
Available download formats: zip (502936797 bytes)
Dataset updated
Feb 23, 2021
Authors
Kun Hao Yeh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms used as diagnosis support systems promise substantial relief for medical personnel, if only because of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

The PTB-XL ECG dataset is a large dataset of 21837 clinical 12-lead ECGs of 10 second length from 18885 patients. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. In total, 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.

Background

The waveform data underlying the PTB-XL ECG dataset was collected with devices from Schiller AG over the course of nearly seven years between October 1989 and June 1996. With the acquisition of the original database from Schiller AG, the full usage rights were transferred to the PTB. The records were curated and converted into a structured database within a long-term project at the Physikalisch-Technische Bundesanstalt (PTB). The database was used in a number of publications, see e.g. [1,2], but the access remained restricted until now. The Institutional Ethics Committee approved the publication of the anonymous data in an open-access database (PTB-2020-1). During the public release process in 2019, the existing database was streamlined with particular regard to usability and accessibility for the machine learning community. Waveform and metadata were converted to open data formats that can easily be processed by standard software.

Method

This dataset is generated by processing the raw dataset with this notebook.

Dataset Details

Files

train_12_lead_ecgs.pkl - ECG signals in pickled NumPy format for the train set.

valid_12_lead_ecgs.pkl - ECG signals in pickled NumPy format for the valid set.

test_12_lead_ecgs.pkl - ECG signals in pickled NumPy format for the test set.

train_table.csv - patient's meta features and ECG diagnosis for the train set.

valid_table.csv - patient's meta features and ECG diagnosis for the valid set.

test_table.csv - patient's meta features and ECG diagnosis for the test set.

How to work with pickle files:

import pandas as pd
train_ecgs = pd.read_pickle('train_12_lead_ecgs.pkl')
# train_ecgs is of shape (number of ECG records, 1000, 12)
# 1000 is signal data points for each ECG record
# 12 stands for 12-channel from 12-lead

Columns

ecg_id - ID used in the raw data from https://www.kaggle.com/khyeh0719/ptb-xl-dataset and the paper

strat_fold - stratified fold as suggested in the paper

age, sex, height, weight, nurse, site, device - patient's information

NORM - Diagnosis for normal ECG

MI - Diagnosis for Myocardial Infarction. A myocardial infarction (MI), commonly known as a heart attack, occurs when blood flow decreases or stops to a part of the heart, causing damage to the heart muscle.

STTC - Diagnosis for ST/T Change. ST and T wave changes may represent cardiac pathology or be a normal variant. Interpretation of the findings, therefore, depends on the clinical context and presence of similar findings on prior electrocardiograms.

CD - Diagnosis for Conduction Disturbance. Your heart rhythm is the way your heart beats. Conduction is how electrical impulses travel through your heart, which causes it to beat. Some conduction disorders can cause arrhythmias or irregular heartbeats.

HYP - Diagnosis for Hypertrophy. Hypertrophic cardiomyopathy (HCM) is a disease in which the heart muscle becomes abnormally thick (hypertrophied). The thickened heart muscle can make it harder for the heart to pump blood.

sub_ - Columns with the 'sub_' prefix are more detailed diagnoses for ECG.
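The strat_fold column can be mapped back to a split name. The PTB-XL authors recommend folds 1 to 8 for training, fold 9 for validation, and fold 10 for testing; whether the train, valid, and test files of this reformatted dataset follow exactly that mapping is an assumption. The sketch below uses a made-up table with column names from the list above, and the function name is hypothetical.

```python
import csv
import io


def split_for_fold(fold):
    """Map a PTB-XL stratified fold number (1-10) to a split name."""
    if 1 <= fold <= 8:
        return "train"
    if fold == 9:
        return "valid"
    if fold == 10:
        return "test"
    raise ValueError(f"unexpected fold: {fold}")


# Toy rows; the values are invented for illustration only.
toy_table = io.StringIO(
    "ecg_id,strat_fold,NORM,MI\n"
    "1,3,1,0\n"
    "2,9,0,1\n"
    "3,10,1,0\n"
)
assignments = {
    # int(float(...)) tolerates folds stored as "3" or "3.0".
    row["ecg_id"]: split_for_fold(int(float(row["strat_fold"])))
    for row in csv.DictReader(toy_table)
}
print(assignments)  # {'1': 'train', '2': 'valid', '3': 'test'}
```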
deepfake-ecg-small
huggingface.co
Updated Apr 24, 2023
Cite
DeepSynthBody (2023). deepfake-ecg-small [Dataset]. https://huggingface.co/datasets/deepsynthbody/deepfake-ecg-small
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 24, 2023
Dataset authored and provided by
DeepSynthBody
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ECG Dataset

This repository contains a small version of the ECG dataset: https://huggingface.co/datasets/deepsynthbody/deepfake_ecg, split into training, validation, and test sets. The dataset is provided as CSV files and corresponding ECG data files in .asc format. The ECG data files are organized into separate folders for the train, validation, and test sets.

Folder Structure

.
├── train.csv
├── validate.csv
├── test.csv
├── train
│   ├── file_1.asc
│   ├── file_2.asc…

See the full description on the dataset page: https://huggingface.co/datasets/deepsynthbody/deepfake-ecg-small.
Non-Contact ECG
figshare.com
txt
Updated Mar 8, 2025
Cite
sphere nature (2025). Non-Contact ECG [Dataset]. http://doi.org/10.6084/m9.figshare.28425452.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.28425452.v2
Dataset updated
Mar 8, 2025
Dataset provided by
figshare
Authors
sphere nature
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Thank you very much for your interest in our work! This repository contains data and code for the AI Radar ECG. Below are the key components:

Demo.m: Tests the trained models using the test dataset.

TrainingPerformance.m: Displays the training performance of the trained models.

TrainedModels.m: Trains the models using the training and validation datasets.

PQRST_Detection.m: Detects P, Q, R, S, and T waves in radar ECGs.

Thank you! Wishing you happiness every day.
MIMIC-IV-ECG Dataset
paperswithcode.com
Updated Dec 24, 2022
Cite
(2022). MIMIC-IV-ECG Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-ecg
Explore at:
Dataset updated
Dec 24, 2022
Description
The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
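The figures above give a rough idea of the scale of the waveform data. The sketch below works through the arithmetic; the 2 bytes per sample is an assumption about storage, not a statement about the actual file format, and the ECG count is the approximate one quoted in the description.

```python
N_ECGS = 800_000      # approximate count from the description
N_LEADS = 12          # from the description
DURATION_S = 10       # seconds, from the description
FS = 500              # Hz, from the description
BYTES_PER_SAMPLE = 2  # assumption: 16-bit integer storage

samples_per_lead = DURATION_S * FS
samples_per_ecg = samples_per_lead * N_LEADS
approx_bytes = N_ECGS * samples_per_ecg * BYTES_PER_SAMPLE

print(samples_per_lead)    # 5000
print(samples_per_ecg)     # 60000
print(approx_bytes / 1e9)  # 96.0 (gigabytes, before any compression)
```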
FSL_ECG_QA_Dataset
huggingface.co
Updated Jul 2, 2025
Cite
jialu tang (2025). FSL_ECG_QA_Dataset [Dataset]. https://huggingface.co/datasets/jialucode/FSL_ECG_QA_Dataset
Explore at:
Dataset updated
Jul 2, 2025
Authors
jialu tang
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
FSL ECG QA Dataset

FSL ECG QA Dataset is a benchmark dataset used in the paper "Electrocardiogram–Language Model for Few-Shot Question Answering with Meta Learning". It supports research in combining electrocardiogram (ECG) signals with natural language question answering (QA), particularly in few-shot and meta-learning scenarios.

Dataset Highlights

🧠 Task Diversification: Restructured ECG-QA tasks promote rapid few-shot adaptation.… See the full description on the dataset page: https://huggingface.co/datasets/jialucode/FSL_ECG_QA_Dataset.
Ecg Classification Dataset
universe.roboflow.com
zip
Updated Apr 28, 2024
Cite
Individual Project (2024). Ecg Classification Dataset [Dataset]. https://universe.roboflow.com/individual-project-kt6if/ecg-classification-ygs4v/model/1
Explore at:
Available download formats: zip
Dataset updated
Apr 28, 2024
Dataset authored and provided by
Individual Project
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
ECG Normal Bounding Boxes
Description
ECG Classification

Overview

ECG Classification is a dataset for object detection tasks - it contains ECG Normal annotations for 491 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).
PTB Diagnostic ECG Database Dataset
paperswithcode.com
Updated Sep 26, 2004
Cite
(2004). PTB Diagnostic ECG Database Dataset [Dataset]. https://paperswithcode.com/dataset/ptb
Explore at:
Dataset updated
Sep 26, 2004
Description
The ECGs in this collection were obtained using a non-commercial, PTB prototype recorder with the following specifications:

16 input channels (14 for ECGs, 1 for respiration, 1 for line voltage)

Input voltage: ±16 mV, compensated offset voltage up to ±300 mV

Input resistance: 100 Ω (DC)

Resolution: 16 bit with 0.5 μV/LSB (2000 A/D units per mV)

Bandwidth: 0 - 1 kHz (synchronous sampling of all channels)

Noise voltage: max. 10 μV (pp), respectively 3 μV (RMS) with input short circuit

Online recording of skin resistance

Noise level recording during signal collection

The database contains 549 records from 290 subjects (aged 17 to 87, mean 57.2; 209 men, mean age 55.5, and 81 women, mean age 61.6; ages were not recorded for 1 female and 14 male subjects). Each subject is represented by one to five records. There are no subjects numbered 124, 132, 134, or 161. Each record includes 15 simultaneously measured signals: the conventional 12 leads (i, ii, iii, avr, avl, avf, v1, v2, v3, v4, v5, v6) together with the 3 Frank lead ECGs (vx, vy, vz). Each signal is digitized at 1000 samples per second, with 16 bit resolution over a range of ±16.384 mV. On special request to the contributors of the database, recordings may be available at sampling rates up to 10 kHz.
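The stated resolution and range are consistent with each other: at 2000 A/D units per mV, a signed 16-bit count spans roughly ±16.384 mV. The sketch below works through that conversion; it assumes raw samples are signed two's-complement integers, and the function name is hypothetical.

```python
ADC_UNITS_PER_MV = 2000  # from the description: 0.5 µV per least significant bit
BITS = 16                # from the description


def adc_to_mv(raw):
    """Convert a signed 16-bit ADC count to millivolts."""
    return raw / ADC_UNITS_PER_MV


min_count = -(2 ** (BITS - 1))   # -32768, assuming signed storage
max_count = 2 ** (BITS - 1) - 1  # 32767

print(adc_to_mv(min_count))  # -16.384
print(adc_to_mv(max_count))  # 16.3835
print(adc_to_mv(2000))       # 1.0
```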

Within the header (.hea) file of most of these ECG records is a detailed clinical summary, including age, gender, diagnosis, and where applicable, data on medical history, medication and interventions, coronary artery pathology, ventriculography, echocardiography, and hemodynamics. The clinical summary is not available for 22 subjects.
Heart Ecg Images Dataset
universe.roboflow.com
zip
Updated Dec 23, 2023
Cite
Devanshu (2023). Heart Ecg Images Dataset [Dataset]. https://universe.roboflow.com/devanshu-y1oux/heart-ecg-images
Explore at:
Available download formats: zip
Dataset updated
Dec 23, 2023
Dataset authored and provided by
Devanshu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Ecg Images
Description
Heart Ecg Images

Overview

Heart Ecg Images is a dataset for classification tasks - it contains Ecg Images annotations for 491 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).


