Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electrocardiography (ECG) is a key diagnostic tool for assessing the cardiac condition of a patient. Automatic ECG interpretation algorithms, used as diagnostic support systems, promise significant relief for medical personnel, if only because of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures, and in our opinion neither aspect is covered satisfactorily by existing freely accessible ECG datasets.
The PTB-XL ECG dataset is a large dataset of 21837 clinical 12-lead ECGs of 10 seconds length from 18885 patients. The raw waveform data was annotated by up to two cardiologists, who could assign multiple ECG statements to each record. The 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure the comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this makes the dataset a rich resource for training and evaluating automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements, and annotated signal properties.
The waveform data underlying the PTB-XL ECG dataset was collected with devices from Schiller AG over the course of nearly seven years, between October 1989 and June 1996. With the acquisition of the original database from Schiller AG, the full usage rights were transferred to the PTB. The records were curated and converted into a structured database within a long-term project at the Physikalisch-Technische Bundesanstalt (PTB). The database was used in a number of publications, see e.g. [1,2], but access remained restricted until now. The Institutional Ethics Committee approved the publication of the anonymized data in an open-access database (PTB-2020-1). During the public release process in 2019, the existing database was streamlined with particular regard to usability and accessibility for the machine learning community. Waveform and metadata were converted to open data formats that can easily be processed by standard software.
This dataset is generated by processing the raw dataset with this notebook.
train_12_lead_ecgs.pkl - ECG signals in pickled NumPy format for the train set.
valid_12_lead_ecgs.pkl - ECG signals in pickled NumPy format for the validation set.
test_12_lead_ecgs.pkl - ECG signals in pickled NumPy format for the test set.
train_table.csv - patient meta features and ECG diagnoses for the train set.
valid_table.csv - patient meta features and ECG diagnoses for the validation set.
test_table.csv - patient meta features and ECG diagnoses for the test set.
import pandas as pd
train_ecgs = pd.read_pickle('train_12_lead_ecgs.pkl')
# train_ecgs has shape (number of ECG records, 1000, 12)
# 1000 signal data points per ECG record (10 s sampled at 100 Hz)
# 12 channels, one per lead of the 12-lead ECG
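The metadata tables can be loaded alongside the signals for a quick sanity check; a minimal sketch, assuming the diagnosis columns described below (NORM, MI, STTC, CD, HYP) are binary indicator columns:
import pandas as pd

# Signals and the matching metadata/label table for the train split.
train_ecgs = pd.read_pickle('train_12_lead_ecgs.pkl')  # (n_records, 1000, 12)
train_table = pd.read_csv('train_table.csv')

# One table row per ECG record.
assert len(train_table) == train_ecgs.shape[0]

# Fraction of records carrying each coarse diagnostic label.
for label in ['NORM', 'MI', 'STTC', 'CD', 'HYP']:
    print(label, train_table[label].mean())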
ecg_id - ID used in the raw data from https://www.kaggle.com/khyeh0719/ptb-xl-dataset and the paper.
strat_fold - stratified fold as suggested by the paper.
age, sex, height, weight, nurse, site, device - patient and recording information.
NORM - diagnosis label for a normal ECG.
MI - diagnosis label for myocardial infarction. A myocardial infarction (MI), commonly known as a heart attack, occurs when blood flow to a part of the heart decreases or stops, causing damage to the heart muscle.
STTC - diagnosis label for ST/T change. ST and T wave changes may represent cardiac pathology or be a normal variant; interpretation therefore depends on the clinical context and the presence of similar findings on prior electrocardiograms.
CD - diagnosis label for conduction disturbance. Conduction is how electrical impulses travel through the heart, causing it to beat; some conduction disorders can cause arrhythmias (irregular heartbeats).
HYP - diagnosis label for hypertrophy. Hypertrophic cardiomyopathy (HCM) is a disease in which the heart muscle becomes abnormally thick (hypertrophied); the thickened heart muscle can make it harder for the heart to pump blood.
sub_ - columns with the 'sub_' prefix contain more detailed diagnoses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source: https://physionet.org/content/ptb-xl/1.0.1/
Electrocardiography (ECG) is a key diagnostic tool for assessing the cardiac condition of a patient. Automatic ECG interpretation algorithms, used as diagnostic support systems, promise significant relief for medical personnel, if only because of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures, and in our opinion neither aspect is covered satisfactorily by existing freely accessible ECG datasets.
The PTB-XL ECG dataset is a large dataset of 21837 clinical 12-lead ECGs of 10 seconds length from 18885 patients. The raw waveform data was annotated by up to two cardiologists, who could assign multiple ECG statements to each record. The 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure the comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this makes the dataset a rich resource for training and evaluating automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements, and annotated signal properties.
The waveform data underlying the PTB-XL ECG dataset was collected with devices from Schiller AG over the course of nearly seven years, between October 1989 and June 1996. With the acquisition of the original database from Schiller AG, the full usage rights were transferred to the PTB. The records were curated and converted into a structured database within a long-term project at the Physikalisch-Technische Bundesanstalt (PTB). The database was used in a number of publications, see e.g. [1,2], but access remained restricted until now. The Institutional Ethics Committee approved the publication of the anonymized data in an open-access database (PTB-2020-1). During the public release process in 2019, the existing database was streamlined with particular regard to usability and accessibility for the machine learning community. Waveform and metadata were converted to open data formats that can easily be processed by standard software.
The heart axis (heart_axis) and infarction stadium (infarction_stadium1 and infarction_stadium2, if present) were extracted. ECGs and patients are identified by unique identifiers (ecg_id and patient_id). Personal information in the metadata, such as the names of validating cardiologists, nurses, and the recording site (hospital etc.), was pseudonymized. The date of birth is only available as age at the time of the ECG recording, where ages of more than 89 years appear in the range of 300 years in compliance with HIPAA standards. Furthermore, all ECG recording dates were shifted by a random offset for each patient. The ECG statements used for annotating the records follow the SCP-ECG standard [3].
In general, the dataset is organized as follows:
ptbxl
├── ptbxl_database.csv
├── scp_statements.csv
├── records100
│   ├── 00000
│   │   ├── 00001_lr.dat
│   │   ├── 00001_lr.hea
│   │   ├── ...
│   │   ├── 00999_lr.dat
│   │   └── 00999_lr.hea
│   ├── ...
│   └── 21000
│       ├── 21001_lr.dat
│       ├── 21001_lr.hea
│       ├── ...
│       ├── 21837_lr.dat
│       └── 21837_lr.hea
└── records500
    ├── 00000
    │   ├── 00001_hr.dat
    │   ├── 00001_hr.hea
    │   ├── ...
    │   ├── 00999_hr.dat
    │   └── 00999_hr.hea
    ├── ...
    └── 21000
        ├── 21001_hr.dat
        ├── 21001_hr.hea
        ├── ...
        ├── 21837_hr.dat
        └── 21837_hr.hea
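For orientation, records in this layout can be read with the wfdb Python package (a minimal sketch; the path points at the first low-rate record in the tree above, and the low-rate and high-rate folders are sampled at 100 Hz and 500 Hz respectively):
import wfdb

# Read record 1 at 100 Hz: returns the signal array and a dict of
# metadata parsed from the .hea header.
signal, meta = wfdb.rdsamp('ptbxl/records100/00000/00001_lr')
print(signal.shape)      # (1000, 12): 10 s at 100 Hz, 12 leads
print(meta['sig_name'])  # lead names, e.g. ['I', 'II', ..., 'V6']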
The dataset comprises 21837 clinical 12-lead ECG records of 10 seconds length from 18885 patients, where 52% are male and 48% are female, with ages covering the whole range from 0 to 95 years (median 62 and interquartile range of 22). The value of the dataset results from the comprehensive collection of many different co-occurring path...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electrocardiography (ECG) is a key diagnostic tool for assessing the cardiac condition of a patient. Automatic ECG interpretation algorithms, used as diagnostic support systems, promise significant relief for medical personnel, if only because of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures, and in our opinion neither aspect is covered satisfactorily by existing freely accessible ECG datasets.
The PTB-XL ECG dataset is a large dataset of 21799 clinical 12-lead ECGs of 10 seconds length from 18869 patients. The raw waveform data was annotated by up to two cardiologists, who could assign multiple ECG statements to each record. The 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure the comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this makes the dataset a rich resource for training and evaluating automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements, and annotated signal properties.
Mechanistic cardiac electrophysiology models allow for personalized simulations of the electrical activity in the heart and the ensuing electrocardiogram (ECG) on the body surface. As such, synthetic signals possess precisely known ground truth labels of the underlying disease (model parameterization) and can be employed for validation of machine learning ECG analysis tools in addition to clinical signals. Recently, synthetic ECG signals were used to enrich sparse clinical data for machine learning, or even to replace them completely during training, leading to good performance on real-world clinical test data.
We thus generated a large synthetic database comprising a total of 16,900 12-lead ECGs based on multi-scale electrophysiological simulations, equally distributed over 1 normal healthy control class and 7 pathology classes. The pathological case of myocardial infarction had 6 sub-classes. A comparison of extracted timing and amplitude features between the virtual cohort and a large publicly available clinical ECG database demonstrated that the synthetic signals represent clinical ECGs for healthy and pathological subpopulations with high fidelity. The novel dataset of simulated ECG signals is split into training, validation, and test data folds for the development of novel machine learning algorithms and their objective assessment.
This folder WP2_largeDataset_Noise contains the 12-lead ECGs of 10 seconds length. Each ECG is stored in a separate CSV file with one row per lead (lead order: I, II, III, aVR, aVL, aVF, V1-V6) and one sample per column (sampling rate: 500 Hz). Data are split by pathology (avblock = AV block, lbbb = left bundle branch block, rbbb = right bundle branch block, sinus = normal sinus rhythm, lae = left atrial enlargement, fam = fibrotic atrial cardiomyopathy, iab = interatrial conduction block, mi = myocardial infarction). MI data are further split into subclasses depending on the occlusion site (LAD, LCX, RCA) and transmurality (0.3 or 1.0). Each pathology subclass contains training, validation, and testing data (~70/15/15 split). The training, validation, and testing datasets were defined according to the model with which QRST complexes were simulated, i.e., ECGs calculated with the same anatomical model but different electrophysiological parameters are only present in one of the training, validation, and testing datasets, never in multiple. Each subfolder also contains a "siginfo.csv" file specifying the respective simulation run for the P wave and the QRST segment that was used to synthesize the 10 second ECG segment. Each signal is available in three variations:
run*_raw.csv contains the synthesized ECG without added noise and without filtering
run*_noise.csv contains the synthesized ECG (unfiltered) with superimposed noise
run*_filtered.csv contains the filtered synthesized ECG (filter settings: highpass cutoff frequency 0.5 Hz, lowpass cutoff frequency 150 Hz, Butterworth filters of order 3)
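The stated filter settings can be approximated with SciPy; a minimal sketch, where the file name is a hypothetical example following the naming scheme above, and the single band-pass stage stands in for separate high-pass and low-pass filters:
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 500.0  # sampling rate stated above

# Load one ECG: rows are the 12 leads, columns are samples.
ecg = np.loadtxt('run1_noise.csv', delimiter=',')  # hypothetical file name

# Order-3 Butterworth band-pass, 0.5-150 Hz, mirroring the stated cutoffs.
sos = butter(3, [0.5, 150.0], btype='bandpass', fs=fs, output='sos')
ecg_filtered = sosfiltfilt(sos, ecg, axis=1)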
The folder WP2_largeDataset_ParameterFiles contains the parameter files used to simulate the 12-lead ECGs. Parameters are split for atrial and ventricular simulations, which were run independently from one another. See Gillette, Gsell, Nagel* et al. "MedalCare-XL: 16,900 healthy and pathological electrocardiograms obtained through multi-scale electrophysiological models" for a description of the model parameters.
📈 Daily Historical Stock Price Data for Destination XL Group, Inc. (1987–2025)
A clean, ready-to-use dataset containing daily stock prices for Destination XL Group, Inc. from 1987-06-02 to 2025-05-28. This dataset is ideal for use in financial analysis, algorithmic trading, machine learning, and academic research.
🗂️ Dataset Overview
Company: Destination XL Group, Inc.
Ticker Symbol: DXLG
Date Range: 1987-06-02 to 2025-05-28
Frequency: Daily
Total Records: 9571 rows… See the full description on the dataset page: https://huggingface.co/datasets/khaledxbenali/daily-historical-stock-price-data-for-destination-xl-group-inc-19872025.
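If the dataset is hosted as a standard Hugging Face dataset, it can presumably be loaded with the datasets library; a minimal sketch, where the split name and the 'close' column are assumptions, not confirmed by the description above:
from datasets import load_dataset

# Repo id taken from the dataset page URL above.
ds = load_dataset('khaledxbenali/daily-historical-stock-price-data-for-destination-xl-group-inc-19872025')
df = ds['train'].to_pandas()  # assumes a single 'train' split

# Example: daily simple returns from a 'close' column (hypothetical column name).
df['return'] = df['close'].pct_change()
print(df[['close', 'return']].head())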
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training data used to train fragment ion intensity Prosit-XL models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CPSC 2018
The first dataset is a preprocessed version of the CPSC 2018 dataset, which contains 6877 ECG recordings. We preprocessed the dataset by resampling the ECG signals to 250 Hz and equalizing the ECG signal length to 60 seconds, yielding a signal length of T=15,000 data points per recording. For the hyperparameter study, we employed a fixed train-valid-test split with ratio 60-20-20, while for the final evaluations, including the comparison with the state-of-the-art methods and ablation studies, we used a 10-fold cross-validation strategy. The raw CPSC 2018 dataset can be downloaded from the website of the PhysioNet/Computing in Cardiology Challenge 2020. (License: Creative Commons Attribution 4.0 International Public License.)
PTB-XL (Super-Diag.)
The second dataset is a preprocessed version of PTB-XL, a large multi-label dataset of 21,799 clinical 12-lead ECG records of 10 seconds each. PTB-XL contains 71 ECG statements, categorized into 44 diagnostic, 19 form, and 12 rhythm classes. In addition, the diagnostic category can be divided into 24 sub- and 5 coarse-grained super-classes. In our preprocessed version, we utilize the super-diagnostic labels for classification and the recommended train-valid-test splits, sampled at 100 Hz. We select only samples with at least one label in the super-diagnostic category, without applying any further preprocessing. The raw PTB-XL dataset can be downloaded from the PhysioNet/PTB-XL website. (License: Creative Commons Attribution 4.0 International Public License.)
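The resampling and length equalization described for CPSC 2018 can be sketched with SciPy; the zero-padding of short recordings below is an assumption, since the exact equalization method is not specified here:
import numpy as np
from scipy.signal import resample

def preprocess(sig, fs_in, fs_out=250, target_sec=60):
    """Resample a (leads, samples) recording to fs_out and fix its length."""
    n_out = int(round(sig.shape[1] * fs_out / fs_in))
    sig = resample(sig, n_out, axis=1)  # Fourier-based resampling
    T = fs_out * target_sec             # 15,000 samples at 250 Hz
    if sig.shape[1] >= T:
        return sig[:, :T]               # truncate long recordings
    return np.pad(sig, ((0, 0), (0, T - sig.shape[1])))  # zero-pad short ones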
It has been shown that integrating peptide property predictions, such as fragment intensity, into the scoring of peptide-spectrum matches can greatly increase the number of confidently identified peptides compared to traditional scoring methods. Here, we introduce Prosit-XL, a robust and accurate fragment intensity predictor covering the cleavable (DSSO/DSBU) and non-cleavable (DSS/BS3) cross-linkers, achieving high accuracy on various holdout sets with consistent performance on external datasets without fine-tuning. Due to the complex nature of false positives in XL-MS, a novel approach to data-driven rescoring was developed that benefits from Prosit-XL's predictions while limiting the overestimation of the false discovery rate (FDR). We first evaluated this approach using two ground-truth datasets (PXD029252, PXD042173), demonstrating accurate and precise FDR estimation. Second, we applied Prosit-XL to a proteome-scale dataset (JPST000845, PXD017711), demonstrating an up to ~3.4-fold improvement in PPI discovery compared to classic approaches. Finally, Prosit-XL was used to increase the coverage and depth of a spatially resolved interactome map of intact human cytomegalovirus virions (PXD031911), leading to the discovery of previously unobserved interactions between human and cytomegalovirus proteins.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EyeFi Dataset
This dataset was collected as a part of the EyeFi project at the Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground truth location information captured through a camera. This dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop describing the details of data collection. Please check it out for more information on the dataset.
Clarification/Bug report: Please note that the order of antennas and subcarriers in the .h5 files is not written clearly in the README.md file. The order of antennas and subcarriers for the 90 csi_real and csi_imag values is as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, … subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. Please see the description below. The newer version of the dataset contains this information in README.md. We are sorry for the inconvenience.
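Given this ordering (subcarrier-major, antenna varying fastest), a 90-value CSI vector can be reshaped into a (subcarrier, antenna) matrix; a minimal sketch:
import numpy as np

csi = np.arange(90, dtype=float)  # stand-in for one 90-value csi_real (or csi_imag) row

# Row-major reshape matches the stated ordering:
# entry [s, a] is subcarrier s+1 on antenna a+1.
csi_matrix = csi.reshape(30, 3)   # 30 subcarriers x 3 antennas
print(csi_matrix[0])              # values for subcarrier 1 on antennas 1-3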
Data Collection Setup
In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and the Linux CSI tools [1] to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 panoramic camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from the (x,y) coordinates. Both the WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is located around 2.85 m above the ground and the WiFi antennas around 1.12 m above the ground.
The data collection environment consists of two areas: the first is a rectangular space measuring 11.8 m x 8.74 m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74 m and 14.24 m between two walls. The kitchen also has numerous obstacles and different materials that pose different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.
To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone were used in both the lab and kitchen areas.
List of Files
Here is a list of files included in the dataset:
|- 1_person
   |- 1_person_1.h5
   |- 1_person_2.h5
|- 2_people
   |- 2_people_1.h5
   |- 2_people_2.h5
   |- 2_people_3.h5
|- 3_people
   |- 3_people_1.h5
   |- 3_people_2.h5
   |- 3_people_3.h5
|- 5_people
   |- 5_people_1.h5
   |- 5_people_2.h5
   |- 5_people_3.h5
   |- 5_people_4.h5
|- 10_people
   |- 10_people_1.h5
   |- 10_people_2.h5
   |- 10_people_3.h5
|- Kitchen
   |- 1_person
      |- kitchen_1_person_1.h5
      |- kitchen_1_person_2.h5
      |- kitchen_1_person_3.h5
   |- 3_people
      |- kitchen_3_people_1.h5
|- training
   |- shuffuled_train.h5
   |- shuffuled_valid.h5
   |- shuffuled_test.h5
|- View-Dataset-Example.ipynb
|- README.md
In this dataset, the folders 1_person/, 2_people/, 3_people/, 5_people/, and 10_people/ contain data collected from the lab area, whereas the Kitchen/ folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.
The training folder contains the training dataset we used to train the neural network discussed in our paper. It was generated by shuffling all the data from the 1_person/ folder collected in the lab area (1_person_1.h5 and 1_person_2.h5).
Why multiple files in one folder?
Each folder contains multiple files. For example, the 1_person folder has two files: 1_person_1.h5 and 1_person_2.h5. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person holding the phone can differ, the data may have been collected on different days, and the data collection system occasionally needed to be rebooted due to stability issues. As a result, we provide separate files (such as 1_person_1.h5 and 1_person_2.h5) to distinguish different people holding the phone and possible system reboots, which introduce different phase offsets (see below) in the system.
Special note:
1_person_1.h5 was generated with the same person holding the phone throughout, whereas 1_person_2.h5 contains different people holding the phone, with only one person present in the area at a time. Both files were also collected on different days.
Access the data
To access the data, an HDF5 library is needed to open the files. There are free HDF5 viewers available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide example Python code, View-Dataset-Example.ipynb, to demonstrate how to access the data.
Each file (except the files under the training/ folder) is structured as follows:
|- csi_imag
|- csi_real
|- nPaths_1
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_2
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_3
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_4
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- num_obj
|- obj_0
   |- cam_aoa
   |- coordinates
|- obj_1
   |- cam_aoa
   |- coordinates
...
|- timestamp
The csi_real and csi_imag are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers for the 90 csi_real and csi_imag values is as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, … subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The nPaths_x groups contain SpotFi [2] calculated WiFi Angle of Arrival (AoA) values, with x the number of multiple paths specified during calculation. Under each nPaths_x group are offset_xx subgroups, where xx stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:
Antennas | Offset 1 (rad) | Offset 2 (rad) |
---|---|---|
1 & 2 | 1.1899 | -2.0071 |
1 & 3 | 1.3883 | -1.8129 |
The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combination of offsets is used for the offset_xx naming. For example, offset_12 means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.
The num_obj field stores the number of human subjects present in the scene. obj_0 is always the subject holding the phone. In each file, there are num_obj groups obj_x. For each obj_x, we have the coordinates reported from the camera and cam_aoa, the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except in the files in the training folder). They reflect the way the person carrying the phone moved in the space (for obj_0) and everyone else walked (for the other obj_y, where y > 0).
The timestamp is provided as a time reference for each WiFi packet.
To access the data (Python):
import h5py
data = h5py.File('3_people_3.h5', 'r')
csi_real = data['csi_real'][()]
csi_imag = data['csi_imag'][()]
cam_aoa = data['obj_0/cam_aoa'][()]
cam_loc = data['obj_0/coordinates'][()]
Files inside the training/ folder have a different data structure:
|- nPath-1
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-2
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-3
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-4
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
The group nPath-x corresponds to the number of multiple paths specified during the SpotFi calculation. aoa is the camera-generated angle of arrival (AoA), which can be considered ground truth; csi_imag and csi_real are the imaginary and real components of the CSI values; and spotfi contains the SpotFi-calculated AoA values. The SpotFi values were chosen based on the lowest median and mean error across 1_person_1.h5 and 1_person_2.h5. All the rows under the same nPath-x group are aligned (i.e., the first row of aoa corresponds to the first row of csi_imag, csi_real, and spotfi). There is no timestamp recorded, and the sequence of the data is not chronological, as the rows were randomly shuffled from the 1_person_1.h5 and 1_person_2.h5 files.
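Analogously to the per-capture files, the training files can be read group by group; a minimal sketch following the structure above (file and group names as documented):
import h5py

with h5py.File('shuffuled_train.h5', 'r') as f:
    # Single-path SpotFi results; rows are aligned across the four datasets.
    aoa = f['nPath-1/aoa'][()]           # camera-derived AoA (ground truth)
    csi_real = f['nPath-1/csi_real'][()]
    csi_imag = f['nPath-1/csi_imag'][()]
    spotfi = f['nPath-1/spotfi'][()]     # SpotFi-estimated AoA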
Citation
If you use the dataset, please cite our paper:
@inproceedings{eyefi2020,
  title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
  author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
  booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
  year={2020}
}
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Abstract: The Visibility Region (VR) information can be used to reduce the complexity of transmission design in eXtremely Large-scale massive Multiple-Input Multiple-Output (XL-MIMO) systems. Existing theoretical analysis and transmission design are mostly based on simplified VR models. In order to evaluate and analyze the performance of XL-MIMO in realistic propagation scenarios, this paper discloses a VR spatial distribution dataset for XL-MIMO systems, which is constructed by steps including environmental parameter setting, ray tracing simulation, field strength data preprocessing, and VR determination. For typical urban scenarios, the dataset establishes the connections between user locations, field strength data, and VR data, with a total of hundreds of millions of data entries. Furthermore, the VR distribution is visualized and analyzed, and a VR-based XL-MIMO user access protocol is taken as an example use case, with its performance evaluated on the proposed VR dataset.
Data
The data for this Challenge are from multiple sources:
CPSC Database and CPSC-Extra Database
INCART Database
PTB and PTB-XL Database
The Georgia 12-lead ECG Challenge (G12EC) Database
Undisclosed Database
The first source is the public (CPSC Database) and unused data (CPSC-Extra Database) from the China Physiological Signal Challenge in 2018 (CPSC2018), held during the 7th International Conference on Biomedical Engineering and Biotechnology in Nanjing, China. The unused data from the CPSC2018 is NOT the test data from the CPSC2018; the test data of the CPSC2018 is included in the final private database that has been sequestered. This training set consists of two sets of 6,877 (male: 3,699; female: 3,178) and 3,453 (male: 1,843; female: 1,610) 12-lead ECG recordings lasting from 6 seconds to 60 seconds. Each recording was sampled at 500 Hz.
The second source set is the public dataset from St Petersburg INCART 12-lead Arrhythmia Database. This database consists of 74 annotated recordings extracted from 32 Holter records. Each record is 30 minutes long and contains 12 standard leads, each sampled at 257 Hz.
The third source, from the Physikalisch-Technische Bundesanstalt (PTB), comprises two public databases: the PTB Diagnostic ECG Database and PTB-XL, a large publicly available electrocardiography dataset. The first PTB database contains 516 records (male: 377, female: 139). Each recording was sampled at 1000 Hz. PTB-XL contains 21,837 clinical 12-lead ECGs (male: 11,379 and female: 10,458) of 10 second length with a sampling frequency of 500 Hz.
The fourth source is a Georgia database which represents a unique demographic of the Southeastern United States. This training set contains 10,344 12-lead ECGs (male: 5,551, female: 4,793) of 10 second length with a sampling frequency of 500 Hz.
The fifth source is an undisclosed American database that is geographically distinct from the Georgia database. This source contains 10,000 ECGs (all retained as test data).
All data is provided in WFDB format. Each ECG recording has a binary MATLAB v4 file (see page 27) for the ECG signal data and a text file in WFDB header format describing the recording and patient attributes, including the diagnosis (the labels for the recording). The binary files can be read using the load function in MATLAB and the scipy.io.loadmat function in Python; please see our baseline models for examples of loading the data. The first line of the header provides information about the total number of leads and the total number of samples or points per lead. The following lines describe how each lead was saved, and the last lines provide information on demographics and diagnosis. Below is an example header file A0001.hea:
A0001 12 500 7500 05-Feb-2020 11:39:16
A0001.mat 16+24 1000/mV 16 0 28 -1716 0 I
A0001.mat 16+24 1000/mV 16 0 7 2029 0 II
A0001.mat 16+24 1000/mV 16 0 -21 3745 0 III
A0001.mat 16+24 1000/mV 16 0 -17 3680 0 aVR
A0001.mat 16+24 1000/mV 16 0 24 -2664 0 aVL
A0001.mat 16+24 1000/mV 16 0 -7 -1499 0 aVF
A0001.mat 16+24 1000/mV 16 0 -290 390 0 V1
A0001.mat 16+24 1000/mV 16 0 -204 157 0 V2
A0001.mat 16+24 1000/mV 16 0 -96 -2555 0 V3
A0001.mat 16+24 1000/mV 16 0 -112 49 0 V4
A0001.mat 16+24 1000/mV 16 0 -596 -321 0 V5
A0001.mat 16+24 1000/mV 16 0 -16 -3112 0 V6
Age: 74
Sex: Male
Dx: 426783006
Rx: Unknown
Hx: Unknown
Sx: Unknown
From the first line, we see that the recording number is A0001, and the recording file is A0001.mat. The recording has 12 leads, each recorded at 500 Hz sample frequency, and contains 7500 samples. From the next 12 lines, we see that each signal was written at 16 bits with an offset of 24 bits, the amplitude resolution is 1000 with units in mV, the resolution of the analog-to-digital converter (ADC) used to digitize the signal is 16 bits, and the baseline value corresponding to 0 physical units is 0. The first value of the signal, the checksum, and the lead name are included for each signal. From the final 6 lines, we see that the patient is a 74-year-old male with a diagnosis (Dx) of 426783006. The medical prescription (Rx), history (Hx), and symptom or surgery (Sx) are unknown.
Each ECG recording has one or more labels from different types of abnormalities in SNOMED-CT codes. The full list of diagnoses for the challenge has been posted here as a 3-column CSV file: long-form description, corresponding SNOMED-CT code, abbreviation. Although these descriptions apply to all training data, there may be fewer classes in the test data, and in different proportions. However, every class in the test data will be represented in the training data.
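As a sketch of reading one recording in Python, following the loading route described above (the 'val' key matches the challenge's example loaders but is treated as an assumption here; the header parsing matches the example header shown above, noting that some WFDB headers prefix the demographic lines with '#'):
import numpy as np
from scipy.io import loadmat

# Signal matrix: one row per lead, one column per sample.
mat = loadmat('A0001.mat')
signal = np.asarray(mat['val'], dtype=np.float64)  # shape (12, 7500) for A0001

# Header: first line gives record name, lead count, sampling rate, samples.
with open('A0001.hea') as f:
    lines = f.read().splitlines()
num_leads, fs, num_samples = (int(x) for x in lines[0].split()[1:4])
dx = [l.split(': ')[1] for l in lines if l.startswith('Dx:')]
print(num_leads, fs, num_samples, dx)  # 12 500 7500 ['426783006']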
Web application and database designed for sharing, visualizing, and analyzing protein cross-linking mass spectrometry data, with an emphasis on structural analysis and quality control. Includes public and private data sharing capabilities and a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. Used for private collaboration and public data dissemination.
https://www.marketreportanalytics.com/privacy-policy
The Indonesian telecommunications market, valued at $17.13 billion in 2025, exhibits robust growth potential, driven by increasing smartphone penetration, rising internet usage, and the expanding adoption of digital services. A Compound Annual Growth Rate (CAGR) of 5.76% is projected from 2025 to 2033, indicating a significant market expansion. Key growth drivers include the increasing demand for high-speed mobile broadband, the proliferation of over-the-top (OTT) platforms and pay-TV services, and government initiatives promoting digital infrastructure development. The market is segmented into Voice Services (wired and wireless), Data services, and OTT/Pay TV services, with significant competition among major players such as Telkom Indonesia, Indosat Ooredoo, and XL Axiata. These companies are strategically investing in network infrastructure upgrades, 5G deployment, and the development of innovative digital solutions to cater to the evolving consumer needs. While challenges such as infrastructure limitations in remote areas and regulatory hurdles exist, the overall market outlook remains positive, fueled by Indonesia's burgeoning digital economy and its large and young population. The competitive landscape is intense, with both established players and new entrants vying for market share. Differentiation strategies involve offering bundled packages, competitive pricing, and improving network quality and coverage. The increasing adoption of cloud-based services and the growing demand for enhanced cybersecurity solutions will also shape market dynamics in the coming years. The strong focus on digital transformation across various sectors will continue to fuel demand for advanced telecommunication services. Geographic expansion within Indonesia, particularly reaching underserved areas, and strategic partnerships will be crucial for sustained growth in the sector. Market penetration of 5G technology will be a significant factor influencing future market growth. The expansion of e-commerce and the government's focus on digitalization are predicted to boost the demand for data services considerably throughout the forecast period. Recent developments include: March 2024: NEC Indonesia announced that it had signed a Memorandum of Understanding (MoU) with Telkom Indonesia to collaborate on developing smart cities in the new capital city of Ibu Kota Nusantara (IKN) and other cities in Indonesia. Under the MoU, Telkom Indonesia and NEC agreed to formulate a strategy, create a roadmap, design the architecture, and develop an implementation plan for smart city projects in Nusantara.January 2024: Aviat Networks Inc., a provider of wireless transport and access solutions, announced a strategic collaboration with PT Smartfren Telecom Tbk. This partnership was established to offer high-speed, ultra-reliable wireless connectivity, including private wireless networks for indoor and outdoor environments. Additionally, the collaboration aims to deliver industry digitalization and automation services to private network clients throughout Indonesia.. Key drivers for this market are: Increased Pace of 5G Roll Out, Digital Transformation Boosting Telecom. Potential restraints include: Increased Pace of 5G Roll Out, Digital Transformation Boosting Telecom. Notable trends are: Increased Pace of 5G Roll-out Driving the Market.
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR indicator, Bollinger Bands indicators, Fibonacci levels, Williams Percent Range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
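To illustrate the kind of technical-indicator feature engineering listed above, a minimal pandas sketch; the DataFrame and its 'close' column are hypothetical and not tied to this dataset's actual schema:
import pandas as pd

def add_basic_indicators(df: pd.DataFrame) -> pd.DataFrame:
    # 20-day simple moving average.
    df['sma_20'] = df['close'].rolling(20).mean()
    # 14-day RSI (Wilder's smoothing approximated by a simple rolling mean).
    delta = df['close'].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df['rsi_14'] = 100 - 100 / (1 + gain / loss)
    return df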
According to a survey conducted in Indonesia in April 2019, ** percent of respondents stated that they used Telkomsel as their mobile internet provider to browse the internet. Indosat and XL were also popular mobile internet providers in Indonesia among the respondents.
Indonesia is one of the biggest online markets worldwide. As of March 2017, online penetration in the country stood at only slightly over ** percent. Popular online activities include mobile messaging and social media.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Depression is the most common cause of disability in the world, affecting 350 million people. University students struggle to cope with stressors typical of higher education institutions as well as anxiety related to their education. Although evidence indicates that they have a high prevalence of depression, no comprehensive reviews have determined the prevalence of depression among students at Ethiopian universities.
Methods
Without regard to time constraints, PubMed, Scopus, and EMBASE were searched. A manual search of article reference lists was also conducted. The Meta XL software was used to extract relevant data, and the Stata-11 meta-prop package was used to analyze it. The Higgins I2 test was used to test for heterogeneity.
Results
The electronic and manual searches yielded 940 articles. Data were extracted from ten studies included in this review, involving a total of 5207 university students. The pooled prevalence of depression was 28.13% (95% CI: 22.67, 33.59). In the sub-group analysis, the average prevalence was higher in studies with a smaller sample size (28.42%) than in studies with a larger sample (27.70%), and higher in studies that used other instruments (PHQ-9, HADS; 30.67%) than in studies that used the BDI-II (26.07%). Being female (pooled AOR = 5.56; 95% CI: 1.51, 9.61), being a first-year student (pooled AOR = 4.78; 95% CI: 2.21, 7.36), chewing khat (pooled AOR = 2.83; 95% CI: 2.32, 3.33), alcohol use (pooled AOR = 3.12; 95% CI: 3.12, 4.01), and a family history of mental illness (pooled AOR = 2.57; 95% CI: 2.00, 3.15) were factors significantly associated with depression.
Conclusion
This systematic review and meta-analysis revealed that more than one-fourth of students at Ethiopian universities had depression. More effort is needed to provide better mental healthcare to university students in Ethiopia.
https://www.gnu.org/licenses/gpl-3.0-standalone.html
The dataset presents the collection of a diverse electrocardiogram (ECG) database for testing and evaluating ECG digitization solutions. The Powerful Medical ECG image database was curated using 100 ECG waveforms selected from the PTB-XL Digital Waveform Database and various images generated from the base waveforms with varying lead visibility and real-world paper deformations, including the use of different mobile phones, bends, crumples, scans, and photos of computer screens with ECGs. The ECG waveforms were augmented using various techniques, including changes in contrast, brightness, perspective transformation, rotation, image blur, JPEG compression, and resolution change. This extensive approach yielded 6,000 unique entries, providing a wide range of data variance and extreme cases to evaluate the limitations of ECG digitization solutions and improve their performance, and serving as a benchmark for evaluating ECG digitization solutions.
The PM-ECG-ID database contains electrocardiogram (ECG) images and their corresponding ECG information. The data records are organized in a hierarchical folder structure, which includes metadata, waveform data, and visual data folders. The contents of each folder are described below: