3 datasets found
  1.

    PhysioNet Challenge 2020 Dataset

    • paperswithcode.com
    Updated Dec 30, 2020
    Cite
    Erick A. Perez Alday; Annie Gu; Amit Shah; Chad Robichaux; An-Kwok Ian Wong; Chengyu Liu; Feifei Liu; Ali Bahrami Rad; Andoni Elola; Salman Seyedi; Qiao Li; ASHISH SHARMA; Gari D. Clifford; Matthew A. Reyna (2020). PhysioNet Challenge 2020 Dataset [Dataset]. https://paperswithcode.com/dataset/physionet-challenge-2020
    Authors
    Erick A. Perez Alday; Annie Gu; Amit Shah; Chad Robichaux; An-Kwok Ian Wong; Chengyu Liu; Feifei Liu; Ali Bahrami Rad; Andoni Elola; Salman Seyedi; Qiao Li; ASHISH SHARMA; Gari D. Clifford; Matthew A. Reyna
    Description

    Data

    The data for this Challenge come from multiple sources:

    • CPSC Database and CPSC-Extra Database
    • INCART Database
    • PTB and PTB-XL Database
    • The Georgia 12-lead ECG Challenge (G12EC) Database
    • Undisclosed Database

    The first source is the public data (CPSC Database) and unused data (CPSC-Extra Database) from the China Physiological Signal Challenge 2018 (CPSC2018), held during the 7th International Conference on Biomedical Engineering and Biotechnology in Nanjing, China. The unused data from CPSC2018 are NOT the CPSC2018 test data; the CPSC2018 test data are included in the final private database, which has been sequestered. This training set consists of two sets of 12-lead ECG recordings, one of 6,877 recordings (male: 3,699; female: 3,178) and one of 3,453 recordings (male: 1,843; female: 1,610), lasting from 6 seconds to 60 seconds. Each recording was sampled at 500 Hz.

    The second source is the public St Petersburg INCART 12-lead Arrhythmia Database. This database consists of 74 annotated recordings extracted from 32 Holter records. Each recording is 30 minutes long and contains 12 standard leads, each sampled at 257 Hz.

    The third source, from the Physikalisch-Technische Bundesanstalt (PTB), comprises two public databases: the PTB Diagnostic ECG Database and PTB-XL, a large publicly available electrocardiography dataset. The first PTB database contains 516 records (male: 377; female: 139), each sampled at 1000 Hz. PTB-XL contains 21,837 clinical 12-lead ECGs (male: 11,379; female: 10,458), each 10 seconds long and sampled at 500 Hz.

    The fourth source is a Georgia database that represents a unique demographic of the Southeastern United States. This training set contains 10,344 12-lead ECGs (male: 5,551; female: 4,793), each 10 seconds long and sampled at 500 Hz.

    The fifth source is an undisclosed American database that is geographically distinct from the Georgia database. This source contains 10,000 ECGs (all retained as test data).
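    The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys are informal labels chosen here, and it assumes that all six publicly described sets are counted together.

```python
# Recording counts as stated in the description: (total, male, female).
# Keys are informal labels for this sketch, not official identifiers.
PUBLIC_SOURCES = {
    "CPSC": (6877, 3699, 3178),
    "CPSC-Extra": (3453, 1843, 1610),
    "INCART": (74, None, None),  # no male/female breakdown is given
    "PTB": (516, 377, 139),
    "PTB-XL": (21837, 11379, 10458),
    "G12EC": (10344, 5551, 4793),
}


def check_sex_breakdown(sources):
    """Return names of sources whose male + female counts differ from the total."""
    mismatched = []
    for name, (total, male, female) in sources.items():
        if male is None or female is None:
            continue  # nothing to check for this source
        if male + female != total:
            mismatched.append(name)
    return mismatched


def total_recordings(sources):
    """Sum the stated recording counts over all sources."""
    return sum(total for total, _, _ in sources.values())


print(check_sex_breakdown(PUBLIC_SOURCES))  # empty list means all breakdowns add up
print(total_recordings(PUBLIC_SOURCES))
```

    The undisclosed fifth source (10,000 ECGs) is left out because it is retained entirely as test data.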

    All data are provided in WFDB format. Each ECG recording has a binary MATLAB v4 file for the ECG signal data and a text file in WFDB header format describing the recording and patient attributes, including the diagnosis (the labels for the recording). The binary files can be read using the load function in MATLAB and the scipy.io.loadmat function in Python; please see our baseline models for examples of loading the data. The first line of the header gives the total number of leads and the total number of samples (points) per lead. The following lines describe how each lead was saved, and the last lines provide information on demographics and diagnosis. Below is an example header file A0001.hea:

    A0001 12 500 7500 05-Feb-2020 11:39:16
    A0001.mat 16+24 1000/mV 16 0 28 -1716 0 I
    A0001.mat 16+24 1000/mV 16 0 7 2029 0 II
    A0001.mat 16+24 1000/mV 16 0 -21 3745 0 III
    A0001.mat 16+24 1000/mV 16 0 -17 3680 0 aVR
    A0001.mat 16+24 1000/mV 16 0 24 -2664 0 aVL
    A0001.mat 16+24 1000/mV 16 0 -7 -1499 0 aVF
    A0001.mat 16+24 1000/mV 16 0 -290 390 0 V1
    A0001.mat 16+24 1000/mV 16 0 -204 157 0 V2
    A0001.mat 16+24 1000/mV 16 0 -96 -2555 0 V3
    A0001.mat 16+24 1000/mV 16 0 -112 49 0 V4
    A0001.mat 16+24 1000/mV 16 0 -596 -321 0 V5
    A0001.mat 16+24 1000/mV 16 0 -16 -3112 0 V6
    
    Age: 74
    Sex: Male
    Dx: 426783006
    Rx: Unknown
    Hx: Unknown
    Sx: Unknown
    

    From the first line, we see that the recording number is A0001, and the recording file is A0001.mat. The recording has 12 leads, each sampled at 500 Hz, and contains 7500 samples per lead (15 seconds). From the next 12 lines, we see that each signal was written in 16-bit format with an offset of 24 bytes, the gain is 1000 ADC units per mV, the resolution of the analog-to-digital converter (ADC) used to digitize the signal is 16 bits, and the baseline value corresponding to 0 physical units is 0. The initial value of the signal, the checksum, the block size (0), and the lead name are included for each signal. From the final 6 lines, we see that the patient is a 74-year-old male with a diagnosis (Dx) of 426783006. The medical prescription (Rx), history (Hx), and symptom or surgery (Sx) are unknown.
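    The field-by-field reading above can be sketched as a small parser. This is a simplified illustration that handles only the layout of the example header, not the full WFDB header specification; the function names `parse_header` and `to_millivolts` are hypothetical, and tolerating a leading "#" on the metadata lines is an assumption about how files may be stored.

```python
EXAMPLE_HEADER = """\
A0001 12 500 7500 05-Feb-2020 11:39:16
A0001.mat 16+24 1000/mV 16 0 28 -1716 0 I
A0001.mat 16+24 1000/mV 16 0 7 2029 0 II
A0001.mat 16+24 1000/mV 16 0 -21 3745 0 III
A0001.mat 16+24 1000/mV 16 0 -17 3680 0 aVR
A0001.mat 16+24 1000/mV 16 0 24 -2664 0 aVL
A0001.mat 16+24 1000/mV 16 0 -7 -1499 0 aVF
A0001.mat 16+24 1000/mV 16 0 -290 390 0 V1
A0001.mat 16+24 1000/mV 16 0 -204 157 0 V2
A0001.mat 16+24 1000/mV 16 0 -96 -2555 0 V3
A0001.mat 16+24 1000/mV 16 0 -112 49 0 V4
A0001.mat 16+24 1000/mV 16 0 -596 -321 0 V5
A0001.mat 16+24 1000/mV 16 0 -16 -3112 0 V6
Age: 74
Sex: Male
Dx: 426783006
Rx: Unknown
Hx: Unknown
Sx: Unknown
"""


def parse_header(text):
    """Parse a header laid out like the example above (simplified sketch)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Record line: name, number of leads, sampling frequency, samples per lead.
    record, n_leads, fs, n_samples = lines[0].split()[:4]
    n_leads = int(n_leads)
    leads = []
    for ln in lines[1:1 + n_leads]:
        fields = ln.split()
        gain, _, units = fields[2].partition("/")  # e.g. "1000/mV"
        leads.append({
            "file": fields[0],
            "gain": float(gain),          # ADC units per physical unit
            "units": units,
            "adc_zero": int(fields[4]),
            "first_value": int(fields[5]),
            "checksum": int(fields[6]),
            "name": fields[-1],
        })
    info = {}
    for ln in lines[1 + n_leads:]:
        # Assumption: metadata lines may carry a leading "#" in stored files.
        key, _, value = ln.lstrip("# ").partition(":")
        info[key.strip()] = value.strip()
    return {"record": record, "n_leads": n_leads, "fs": float(fs),
            "n_samples": int(n_samples), "leads": leads, "info": info}


def to_millivolts(raw_value, lead):
    """Convert a raw ADC value to physical units using the lead's gain and zero."""
    return (raw_value - lead["adc_zero"]) / lead["gain"]


header = parse_header(EXAMPLE_HEADER)
duration_s = header["n_samples"] / header["fs"]
print(header["record"], header["n_leads"], duration_s, header["info"]["Dx"])
```

    The signal matrix itself would be read separately with scipy.io.loadmat, as the description notes; the name of the variable stored inside the .mat file is not given here, so the baseline models remain the reference for that detail.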

    Each ECG recording has one or more labels, drawn from different types of abnormalities and given as SNOMED-CT codes. The full list of diagnoses for the challenge has been posted here as a 3-column CSV file: long-form description, corresponding SNOMED-CT code, and abbreviation. Although these descriptions apply to all training data, there may be fewer classes in the test data, and in different proportions. However, every class in the test data will be represented in the training data.
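    Because a recording can carry more than one label, a common way to prepare the Dx field for a classifier is a multi-hot vector over a fixed class list. The sketch below assumes that multiple codes in one Dx field are separated by commas, which the description does not state. Only 426783006 comes from the example header; the other two codes are placeholders for illustration, and the function names are hypothetical.

```python
def parse_dx(dx_field):
    """Split a Dx field into SNOMED-CT code strings (assumes comma separation)."""
    return [code.strip() for code in dx_field.split(",") if code.strip()]


def multi_hot(codes, classes):
    """Encode a list of codes as a 0/1 vector aligned with `classes`."""
    present = set(codes)
    return [1 if c in present else 0 for c in classes]


# Placeholder class list for illustration; a real one would come from the
# 3-column CSV of diagnoses mentioned in the description.
CLASSES = ["164889003", "270492004", "426783006"]

single = multi_hot(parse_dx("426783006"), CLASSES)
double = multi_hot(parse_dx("164889003, 270492004"), CLASSES)
print(single, double)
```

    Codes that are absent from the class list are silently ignored in this sketch; whether to ignore or flag them is a design choice left to the user.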

  2.

    Data from: Simulated Arterial Pulse Waves Database (preliminary version)

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Vennin, Samuel (2020). Simulated Arterial Pulse Waves Database (preliminary version) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3296510
    Dataset provided by
    Vennin, Samuel
    Mariscal Harana, Jorge
    Alastruey, Jordi
    Peter H Charlton
    Chowienczyk, Phil
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Description

    This provides a brief overview of the database. Further details are provided at: https://peterhcharlton.github.io/pwdb/ppwdb.html

    Background: The shape of the arterial pulse wave (PW) is a rich source of information on cardiovascular (CV) health, since it is influenced by both the heart and the vasculature. Consequently, many algorithms have been proposed to estimate clinical parameters from PWs. However, it is difficult and costly to acquire comprehensive datasets with which to assess their performance. We are aiming to address this difficulty by creating a database of simulated PWs under a range of CV conditions, representative of a healthy population. The database provided here is an initial version which has already been used to gain some novel insights into haemodynamics.

    Methods: Baseline PWs were simulated using 1D computational modelling. CV model parameters were varied across normal healthy ranges to simulate a sample of subjects for each age decade from 25 to 75 years. The model was extended to simulate photoplethysmographic (PPG) PWs at common measurement sites, in addition to the pressure (ABP), flow rate (Q), flow velocity (U) and diameter (D) PWs produced by the model.

    Validation: The database was verified by comparing simulated PWs with in vivo PWs. Good agreement was observed, with age-related changes in blood pressure and wave morphology well reproduced.

    Conclusion: This database is a valuable resource for development and pre-clinical assessment of PW analysis algorithms. It is particularly useful because it contains several types of PWs at multiple measurement sites, and the exact CV conditions which generated each PW are known.

    Future work: This initial version has two limitations: (i) the database does not exhibit the wide variation in cardiovascular properties observed across a population sample; and (ii) the methods used to model changes with age have been improved since this initial version was created. We are therefore creating a more comprehensive database that addresses these limitations.

    Accompanying Presentation: This database was originally presented at the BioMedEng18 Conference. The presentation describing the methods for creating the database, and providing an introduction to the database, is available at: https://www.youtube.com/watch?v=X8aPZFs8c08 . The accompanying abstract is available here.

    Accompanying Manual: Further information on how to use the PWDB datasets, including this preliminary dataset, is provided in the user manual. Further details on the contents of the dataset files are available here.

    Citation: When using this dataset please cite this publication:

    Charlton P.H. et al. Modelling arterial pulse wave propagation during healthy ageing, In World Congress of Biomechanics 2018, Dublin, Ireland, 2018.

    Version History:

    • v.1.0: Originally uploaded to PhysioNet. This is the version which was used in the accompanying presentation.

    • v.2.0: The initial upload to this DOI. The database was curated using the PWDB Algorithms v.0.1.1. It differs slightly from the originally reported version in that the augmentation pressure and index were calculated at the aortic root rather than the carotid artery.

    Text adapted from: Charlton P.H. et al., 'A database for the development of pulse wave analysis algorithms', BioMedEng18, London, 2018.

  3.

    CPSC2021 Dataset

    • paperswithcode.com
    Updated Mar 21, 2021
    Cite
    (2021). CPSC2021 Dataset [Dataset]. https://paperswithcode.com/dataset/cpsc2021
    Description

    Introduction

    The 4th China Physiological Signal Challenge 2021 (CPSC 2021) aims to encourage the development of algorithms for detecting paroxysmal atrial fibrillation (PAF) events in dynamic ECG recordings.

    The ECG signal plays an important role in non-invasive monitoring and clinical diagnosis of cardiovascular disease (CVD). AF is the most frequent arrhythmia, but PAF often remains unrecognized [1, 2]. Early screening and early detection of paroxysmal AF are particularly important, and are of great value for AF surgery options, drug intervention, and the diagnosis and treatment of various clinical complications [3].

    Although accurate detection of paroxysmal AF is very important, there is currently no algorithm that can efficiently measure the onsets and offsets of AF events in dynamic or wearable ECGs [4]. Previous AF detection algorithms usually focus on the classification of AF rhythm, such as entropy feature-based [5, 6] or machine learning-based methods [7, 8], without locating the onsets and offsets of AF events. Their clinical significance for the personalized treatment and management of AF patients is therefore limited. In clinical applications, other abnormal rhythms can also significantly affect the accurate identification of AF rhythm. In this year's challenge, we focus on the detection of paroxysmal AF events from dynamic ECGs. A new dynamic ECG database, containing episodes with entirely AF rhythm, partly AF rhythm, or non-AF rhythm, was constructed to encourage the development of more efficient and robust algorithms for paroxysmal AF event detection.

    Challenge Data

    Data are recorded from 12-lead Holter or 3-lead wearable ECG monitoring devices. The challenge data consist of variable-length ECG record fragments extracted from lead I and lead II of the long-term dynamic ECGs, each sampled at 200 Hz. To avoid ambiguity in annotation, an AF event must contain at least 5 heart beats. The training set in the 1st stage consists of 730 records, extracted from the Holter records of 10 AF patients (5 PAF patients) and 39 non-AF patients (usually including other abnormal and normal rhythms). The training set in the 2nd stage consists of 706 records from 37 AF patients (18 PAF patients) and 14 non-AF patients. The test set comprises data from the same source as the training set as well as from a different data source; at least one test subset was collected by a different ECG monitoring system from the one used for the training set. As in previous years, we are not planning to release the test set at any point.

    All data are provided in WFDB format, and the annotations are standardized according to PhysioBank Annotations (link: https://archive.physionet.org/physiobank/annotations.shtml). The annotations include the beat annotations (R peak location and beat type), the rhythm annotations (rhythm change flag and rhythm type), and the diagnosis of the global rhythm. Please refer to the example code entry of the challenge (link: https://github.com/CPSC-Committee/cpsc2021-python-entry ) for specific data and label loading functions. Note that the flags for atrial fibrillation and atrial flutter ('AFIB' and 'AFL') in the annotations are treated as the same type when scoring a method. Please download the training data from here (Training Set I and Training Set II).
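    The event rules described above (AFIB and AFL scored as one type, and an AF event spanning at least 5 heart beats) can be sketched as a simple run-finding routine. This is an illustration only, not the official scoring code: the per-beat representation (one rhythm label per R peak), the label "N" for non-AF beats, and the function names are all assumptions made for the example.

```python
AF_LABELS = {"AFIB", "AFL"}  # treated as the same type, per the description


def find_af_events(r_peaks, beat_rhythms, min_beats=5):
    """Return (onset_sample, offset_sample) pairs for AF runs of >= min_beats beats.

    r_peaks      -- R-peak sample indices, one per beat (hypothetical input layout)
    beat_rhythms -- rhythm label active at each beat, same length as r_peaks
    """
    events = []
    start = None  # index of the first beat of the current AF run, if any
    for i, label in enumerate(beat_rhythms):
        in_af = label in AF_LABELS
        if in_af and start is None:
            start = i
        elif not in_af and start is not None:
            if i - start >= min_beats:
                events.append((r_peaks[start], r_peaks[i - 1]))
            start = None
    # Close a run that continues to the end of the record.
    if start is not None and len(beat_rhythms) - start >= min_beats:
        events.append((r_peaks[start], r_peaks[-1]))
    return events


def to_seconds(sample_index, fs=200):
    """Convert a sample index to seconds at the stated 200 Hz sampling rate."""
    return sample_index / fs


# Synthetic example: 12 beats, one every 100 samples.
r_peaks = [100 * k for k in range(12)]
rhythms = ["N"] * 2 + ["AFIB"] * 3 + ["AFL"] * 3 + ["N"] + ["AFIB"] * 3
events = find_af_events(r_peaks, rhythms)
print(events, [(to_seconds(a), to_seconds(b)) for a, b in events])
```

    In the synthetic example, the AFIB and AFL beats merge into one six-beat event, while the trailing three-beat run is discarded for being shorter than 5 beats.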


PhysioNet Challenge 2020 Dataset

34 scholarly articles cite this dataset (View in Google Scholar)