Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by LIU Meiting Avril
Released under Apache 2.0
This dataset was created by Srikanth Chitteti
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was acquired from one of the multispecialty hospitals in India. It has over 14 common features, which makes it one of the heart disease datasets available so far for research purposes. This dataset consists of 1000 subjects with 12 features. This dataset will be useful for building early-stage heart disease detection as well as for generating predictive machine learning models.
2018 to 2020, 3-year average. Rates are age-standardized. County rates are spatially smoothed. The data can be viewed by sex and race/ethnicity. Data source: National Vital Statistics System. Additional data, maps, and methodology can be viewed on the Interactive Atlas of Heart Disease and Stroke https://www.cdc.gov/heart-disease-stroke-atlas/about/index.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set CHDdata.csv contains cases of coronary heart disease (CHD) and variables associated with the patient's condition: systolic blood pressure, yearly tobacco use (in kg), low density lipoprotein (ldl), adiposity, family history (0 or 1), type A personality score (typea), obesity (body mass index), alcohol use, age, and the diagnosis of CHD (0 or 1).
This dataset was created by Rischan Mafrur
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muneer Iqbal24
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cardiovascular disease accounts for millions of deaths each year and is currently the leading cause of mortality worldwide. The aging process is clearly linked to cardiovascular disease; however, the exact relationship between aging and heart function is not fully understood. Furthermore, a holistic view of cardiac aging, linking features of early life development to changes observed in old age, has not been synthesized. Here, we re-purpose RNA-sequencing data previously collected by our group, investigating gene expression differences between wild-type mice of different age groups that represent key developmental milestones in the murine lifespan. DESeq2's generalized linear model was applied with two hypothesis testing approaches to identify differentially expressed (DE) genes, both between pairs of age groups and across mice of all ages. Pairwise comparisons identified genes associated with specific age transitions, while comparisons across all age groups identified a large set of genes associated with the aging process more broadly. An unsupervised machine learning approach was then applied to extract common expression patterns from this set of age-associated genes. Sets of genes with both linear and non-linear expression trajectories were identified, suggesting that aging not only involves the activation of gene expression programs unique to different age groups, but also the re-activation of gene expression programs from earlier ages. Overall, we present a comprehensive transcriptomic analysis of cardiac gene expression patterns across the entirety of the murine lifespan.
This dataset documents rates and trends in heart disease and stroke mortality. Specifically, this report presents county (or county equivalent) estimates of heart disease and stroke death rates in 2000-2019 and trends during two intervals (2000-2010, 2010-2019) by age group (ages 35–64 years, ages 65 years and older), race/ethnicity (non-Hispanic American Indian/Alaska Native, non-Hispanic Asian/Pacific Islander, non-Hispanic Black, Hispanic, non-Hispanic White), and sex (women, men). The rates and trends were estimated using a Bayesian spatiotemporal model and are smoothed over space, time, and demographic group. Rates are age-standardized in 10-year age groups using the 2010 US population. Data source: National Vital Statistics System.
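Age standardization, as mentioned above, is commonly done by the direct method: each age-specific rate is weighted by the share of a standard population in that age group. The following is a minimal sketch of that arithmetic only. It does not reproduce the Bayesian spatiotemporal model, and the function name, age groups, rates, and population counts are made-up illustrative values, not figures from the dataset or from the 2010 US population.

```python
def age_standardized_rate(rates_per_100k, standard_pop):
    """Direct standardization: population-weighted mean of age-specific rates."""
    if len(rates_per_100k) != len(standard_pop):
        raise ValueError("need one standard population count per age group")
    total = sum(standard_pop)
    return sum(r * p for r, p in zip(rates_per_100k, standard_pop)) / total


# Hypothetical age-specific death rates (per 100,000) for three age groups
rates = [50.0, 150.0, 400.0]
# Hypothetical standard population counts for the same three age groups
std_pop = [500, 300, 200]

asr = age_standardized_rate(rates, std_pop)
print(asr)  # (50*500 + 150*300 + 400*200) / 1000
```

Because the same standard weights are applied to every county, differences in the resulting rates are not driven by differences in local age structure.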
This dataset was created by Anna Tshngryan
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Deesya Lovely Susanto
Released under Apache 2.0
This dataset was created by Esha Asif005
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chronic heart failure (HF) is a syndrome of heterogeneous etiology associated with multiple co-morbidities. Inflammation is increasingly recognized as a key contributor to the pathophysiology of HF. Heterogeneity and lack of data on the immune mechanism(s) contributing to HF may partially underlie the failure of clinical trials targeting inflammatory mediators. We studied the Immunome in an HF cohort using mass cytometry and used a data-driven systems immunology approach to discover and characterize modulated immune cell subsets from peripheral blood. We showed that cytotoxic and inflammatory innate lymphoid and myeloid cells were expanded in HF patients compared to healthy controls. Network analysis showed a highly modular and centralized immune cell architecture in the healthy control immune cell network. In contrast, the HF immune cell network showed greater inter-cellular communication and a less modular structure. Furthermore, we found, as an immune mechanism specific to HF with preserved ejection fraction (HFpEF), an increase in inflammatory MAIT and CD4 T cell subsets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R peak detection algorithms during high intensity exercise.
Protocol of the experiments
The protocol of the experiment was the following.
Description of the extracted dataset
The characteristics of the dataset are the following:
seg1 --> [VT2-50,VT2-30]
seg2 --> [VT2+60,VT2+80]
seg3 --> [VO2max-50,VO2max-30]
seg4 --> [VO2max-10,VO2max+10]
seg5 --> [VO2max+60,VO2max+80]
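The five segment definitions above are offsets around two reference instants, VT2 and VO2max. A small sketch of how the windows could be computed for one recording follows. It assumes the offsets are in seconds (the unit is not stated here) and that the VT2 and VO2max instants are already known; the function name and the example times are hypothetical.

```python
# Offsets copied from the segment definitions; the unit (seconds) is an assumption.
SEGMENT_OFFSETS = {
    "seg1": ("VT2", -50, -30),
    "seg2": ("VT2", 60, 80),
    "seg3": ("VO2max", -50, -30),
    "seg4": ("VO2max", -10, 10),
    "seg5": ("VO2max", 60, 80),
}


def segment_windows(vt2_time, vo2max_time):
    """Return {segment name: (start, end)} in the same time unit as the inputs."""
    anchors = {"VT2": vt2_time, "VO2max": vo2max_time}
    return {
        name: (anchors[ref] + start, anchors[ref] + end)
        for name, (ref, start, end) in SEGMENT_OFFSETS.items()
    }


# Hypothetical reference instants for one recording
windows = segment_windows(vt2_time=600.0, vo2max_time=900.0)
for name, (start, end) in windows.items():
    print(name, start, end)
```

Every window spans 20 time units, so segments from different recordings are directly comparable in length.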
Format of the extracted dataset
The dataset is divided in two main folders:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description: We present a database of four-chamber heart models derived from a statistical shape model (SSM) suitable for electro-mechanical (EM) simulations. Our database consists of 1000 four-chamber heart models generated from end-diastolic CT-derived meshes (available in the repository called "Virtual cohort of adult healthy four-chamber heart meshes from CT images"). These meshes were used for EM simulations. The weights of the SSM are also provided.
Cardiac meshes: To build the SSM, we rigidly aligned the CT cohort and extracted the surfaces, representing them as deRham currents. The registration between meshes and computation of the average shape was done using a Large Deformation Diffeomorphic Metric Mapping method. The deformation functions depend on a set of uniformly distributed control points in which the shapes are embedded, and on the deformation vectors attached to these points. The Principal Component Analysis (PCA) is applied to this spatial field of deformation vectors (one per control point). Case #20 of the CT cohort was not included. More information on the details can be found in Supplement 3 of the reference paper. We created this cohort by modifying the weight of the modes explaining 90% of the variance in shape (corresponding to modes 1 to 9) within 2 standard deviations (SD) of each mode added to the average mesh. The elements of all the meshes are labelled as follows:
Left ventricle myocardium
Right ventricle myocardium
Left atrium myocardium
Right atrium myocardium
Aorta wall
Pulmonary artery wall
Mitral valve plane
Tricuspid valve plane
Aortic valve plane
Pulmonary valve plane
Left atrium appendage "inlet"
Left superior pulmonary vein inlet
Left inferior pulmonary vein inlet
Right inferior pulmonary vein inlet
Right superior pulmonary vein inlet
Superior vena cava inlet
Inferior vena cava inlet
Left atrial appendage border
Right inferior pulmonary vein border
Left inferior pulmonary vein border
Left superior pulmonary vein border
Right superior pulmonary vein border
Superior vena cava border
Inferior vena cava border
Each zipped folder contains 25 meshes and, for each mesh, the weights of the modes used to construct it. A VTK file for each mesh (in ASCII) contains an UNSTRUCTURED GRID with the following fields:
POINTS, with the coordinates of the points in mm.
CELL_TYPES, with every cell having the value 10, since all cells are tetrahedra.
CELLS, with the indices of the vertices of every element.
CELL_DATA, corresponding to the meshing tags.
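To illustrate how these fields fit together, here is a minimal pure-Python sketch that reads POINTS, CELLS, and CELL_TYPES from an ASCII VTK string. It assumes the classic legacy layout (`CELLS <n> <size>` followed by `<k> i1 ... ik` per cell), skips CELL_DATA, and uses a toy one-tetrahedron file written for this example; the function name is hypothetical, and a dedicated VTK reader is the safer choice for real files.

```python
def parse_vtk_unstructured_grid(text):
    """Parse POINTS, CELLS and CELL_TYPES from a legacy ASCII VTK string."""
    tokens = text.split()
    points, cells, cell_types = [], [], []
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if key == "POINTS":
            n = int(tokens[i + 1])
            start = i + 3  # skip keyword, point count and data type
            values = [float(t) for t in tokens[start:start + 3 * n]]
            points = [tuple(values[j:j + 3]) for j in range(0, 3 * n, 3)]
            i = start + 3 * n
        elif key == "CELLS":
            size = int(tokens[i + 2])  # total number of integers in the list
            j = i + 3
            end = j + size
            while j < end:
                k = int(tokens[j])  # number of vertices of this cell
                cells.append(tuple(int(t) for t in tokens[j + 1:j + 1 + k]))
                j += 1 + k
            i = end
        elif key == "CELL_TYPES":
            n = int(tokens[i + 1])
            cell_types = [int(t) for t in tokens[i + 2:i + 2 + n]]
            i += 2 + n
        else:
            i += 1  # header words and sections not handled by this sketch
    return points, cells, cell_types


# Toy file with a single tetrahedron (not taken from the dataset)
SAMPLE = """# vtk DataFile Version 3.0
toy tetrahedron
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 4 float
0 0 0
1 0 0
0 1 0
0 0 1
CELLS 1 5
4 0 1 2 3
CELL_TYPES 1
10
CELL_DATA 1
SCALARS tag int 1
LOOKUP_TABLE default
1
"""

points, cells, cell_types = parse_vtk_unstructured_grid(SAMPLE)
# In VTK, cell type 10 denotes a tetrahedron
print(len(points), cells, all(t == 10 for t in cell_types))
```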
In addition, three descriptive files are included:
Normalized_explained_variance.csv contains the percentages of variance explained by each of the 18 modes generated from PCA.
Mode_standard_deviation.csv contains absolute standard deviations of each of the 18 modes.
Eigenvectors.csv contains the directions of maximum shape variability within the shape population.
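The two ideas behind these files can be sketched briefly: counting how many leading modes are needed to reach a variance threshold, and forming a linear combination of modes scaled by their standard deviations. The numbers below are invented for illustration and the function names are hypothetical. The sketch assumes weights are expressed in units of SD and that eigenvectors are stored one per mode. Note also that in this SSM the PCA acts on deformation vectors at control points, so the combined vector is a deformation field; turning it into a mesh requires the diffeomorphic mapping step, which is not shown.

```python
def modes_for_variance(explained, threshold):
    """Number of leading modes whose cumulative explained variance reaches threshold."""
    cumulative = 0.0
    for count, fraction in enumerate(explained, start=1):
        cumulative += fraction
        if cumulative >= threshold:
            return count
    raise ValueError("threshold is never reached")


def combine_modes(mean, eigenvectors, sds, weights):
    """Return mean + sum_k weights[k] * sds[k] * eigenvectors[k], element-wise."""
    result = list(mean)
    for vec, sd, w in zip(eigenvectors, sds, weights):
        if abs(w) > 2:
            raise ValueError("this sketch expects weights within +/- 2 SD")
        for j, component in enumerate(vec):
            result[j] += w * sd * component
    return result


# Invented explained-variance fractions (not the dataset's values)
explained = [0.5, 0.3, 0.15, 0.05]
n_modes = modes_for_variance(explained, threshold=0.9)

# Invented two-component example with two modes
field = combine_modes(
    mean=[0.0, 1.0],
    eigenvectors=[[1.0, 0.0], [0.0, 1.0]],
    sds=[2.0, 0.5],
    weights=[1.0, -2.0],
)
print(n_modes, field)
```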
This dataset was created by Alien
https://creativecommons.org/publicdomain/zero/1.0/
Heart Disease is among the most prevalent chronic diseases in the United States, impacting millions of Americans each year and exerting a significant financial burden on the economy. In the United States alone, heart disease claims roughly 647,000 lives each year — making it the leading cause of death. The buildup of plaques inside larger coronary arteries, molecular changes associated with aging, chronic inflammation, high blood pressure, and diabetes are all causes of and risk factors for heart disease.
While there are different types of coronary heart disease, the majority of individuals only learn they have the disease following symptoms such as chest pain, a heart attack, or sudden cardiac arrest. This fact highlights the importance of preventative measures and tests that can accurately predict heart disease in the population prior to negative outcomes like myocardial infarctions (heart attacks) taking place.
The Centers for Disease Control and Prevention has identified high blood pressure, high blood cholesterol, and smoking as three key risk factors for heart disease. Roughly half of Americans have at least one of these three risk factors. The National Heart, Lung, and Blood Institute highlights a wider array of factors such as Age, Environment and Occupation, Family History and Genetics, Lifestyle Habits, Other Medical Conditions, Race or Ethnicity, and Sex for clinicians to use in diagnosing coronary heart disease. Diagnosis tends to be driven by an initial survey of these common risk factors followed by bloodwork and other tests.
The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, I downloaded a csv of the dataset available on Kaggle for the year 2015. This original dataset contains responses from 441,455 individuals and has 330 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
This dataset contains 253,680 survey responses from cleaned BRFSS 2015 to be used primarily for the binary classification of heart disease. Note that there is a strong class imbalance in this dataset: 229,787 respondents do not have/have not had heart disease, while 23,893 have had heart disease. The question to be explored is:
and
It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2015 dataset already on Kaggle. That dataset can be found here and the notebook I used for the data cleaning can be found here.
Let's build some predictive models for heart disease.
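Before modelling, it helps to quantify the class imbalance described above. The sketch below uses only the two counts stated in the description. The weighting formula, total / (number of classes × class count), is the common "balanced" heuristic; it is one possible choice, not something this dataset prescribes, and the variable names are illustrative.

```python
# Counts as stated in the dataset description
counts = {"no_heart_disease": 229_787, "heart_disease": 23_893}

total = sum(counts.values())
positive_rate = counts["heart_disease"] / total

# "Balanced" heuristic: rarer classes receive proportionally larger weights
class_weights = {
    label: total / (len(counts) * n) for label, n in counts.items()
}

# Accuracy of a trivial model that always predicts the majority class
majority_baseline_accuracy = max(counts.values()) / total

print(total)
print(round(positive_rate, 4))
print({k: round(v, 2) for k, v in class_weights.items()})
print(round(majority_baseline_accuracy, 4))
```

Since always predicting "no heart disease" already scores above 90% accuracy, metrics such as recall, precision, or area under the ROC curve are more informative than raw accuracy here.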
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data in this Zenodo entry corresponds to the data used to produce the results in https://www.jacc.org/doi/full/10.1016/j.jacadv.2022.100169. The zipped folder contains three files
Metabolite Data.csv - The metabolite measurements for all the samples
Clinical Data.csv - Values for the clinical variables
Clinical Data Descriptions.csv - More in depth explanation of clinical variables as well as possible values of the variables
This study was supported by the American Heart Association (AHA20CDA35310498 and AHA18IPA34170070) and the National Institutes of Health (NIH/NCATS Colorado CTSA, No. UL1 TR001082 and NIH/NHLBI K23HL12363
2016 to 2018, 3-year average. Rates are age-standardized. County rates are spatially smoothed. The data can be viewed by sex and race/ethnicity. Data source: National Vital Statistics System. Additional data, maps, and methodology can be viewed on the Interactive Atlas of Heart Disease and Stroke https://www.cdc.gov/heart-disease-stroke-atlas/about/index.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
The issue of diagnosing psychotic diseases, including schizophrenia and bipolar disorder, in particular, the objectification of symptom severity assessment, is still a problem requiring the attention of researchers. Two measures that can be helpful in patient diagnosis are heart rate variability calculated based on electrocardiographic signal and accelerometer mobility data. The following dataset contains data from 30 psychiatric ward patients having schizophrenia or bipolar disorder and 30 healthy persons. The duration of the measurements for individuals was usually between 1.5 and 2 hours. R-R intervals necessary for heart rate variability calculation were collected simultaneously with accelerometer data using a wearable Polar H10 device. The Positive and Negative Syndrome Scale (PANSS) test was performed for each patient participating in the experiment, and its results were attached to the dataset. Furthermore, the code for loading and preprocessing data, as well as for statistical analysis, was included on the corresponding GitHub repository.
BACKGROUND
Heart rate variability (HRV), calculated based on electrocardiographic (ECG) recordings of R-R intervals stemming from the heart's electrical activity, may be used as a biomarker of mental illnesses, including schizophrenia and bipolar disorder (BD) [Benjamin et al]. The variations of R-R interval values correspond to the heart's autonomic regulation changes [Berntson et al, Stogios et al]. Moreover, the HRV measure reflects the activity of the sympathetic and parasympathetic parts of the autonomic nervous system (ANS) [Task Force of the European Society of Cardiology and the North American Society of Pacing Electrophysiology, Matusik et al]. Patients with psychotic mental disorders show a tendency for a change in the centrally regulated ANS balance in the direction of less dynamic changes in the ANS activity in response to different environmental conditions [Stogios et al]. Larger sympathetic activity relative to the parasympathetic one leads to lower HRV, while, on the other hand, higher parasympathetic activity translates to higher HRV. This loss of dynamic response may be an indicator of mental health. Additional benefits may come from measuring the daily activity of patients using accelerometry. This may be used to register periods of physical activity and inactivity or withdrawal for further correlation with HRV values recorded at the same time.
EXPERIMENTS
In our experiment, the participants were 30 psychiatric ward patients with schizophrenia or BD and 30 healthy people. All measurements were performed using a Polar H10 wearable device. The sensor collects ECG recordings and accelerometer data and, additionally, detects R wave peaks. Participants had to wear the sensor for a given time, typically between 1.5 and 2 hours; the shortest recording was 70 minutes. During this time, participants could perform any activity, starting a few minutes after the beginning of the measurement. Participants were encouraged to undertake physical activity and, more specifically, to take a walk. Because the patients were in the medical ward, they were instructed to take a walk in the corridors at the beginning of the experiment and to repeat the walk 30 minutes and 1 hour after the first walk. The subsequent walks were to be slightly longer (about 3, 5 and 7 minutes, respectively). We did not remind participants of this instruction or supervise compliance during the experiment, in either the treatment or the control group. Seven persons from the control group did not receive this instruction, and their measurements correspond to freely selected activities with rest periods, but at least three of them performed physical activities during this time. Nevertheless, at the start of the experiment, all participants were requested to rest in a sitting position for 5 minutes. Moreover, for each patient, the disease severity was assessed using the PANSS test and its scores are attached to the dataset.
The data from sensors were collected using Polar Sensor Logger application [Happonen]. Such extracted measurements were then preprocessed and analyzed using the code prepared by the authors of the experiment. It is publicly available on the GitHub repository [Książek et al].
Firstly, we performed a manual artifact detection to remove abnormal heartbeats due to non-sinus beats and technical issues of the device (e.g. temporary disconnections and inappropriate electrode readings). We also performed anomaly detection using Daubechies wavelet transform. Nevertheless, the dataset includes raw data, while a full code necessary to reproduce our anomaly detection approach is available in the repository. Optionally, it is also possible to perform cubic spline data interpolation. After that step, rolling windows of a particular size and time intervals between them are created. Then, a statistical analysis is prepared, e.g. mean HRV calculation using the RMSSD (Root Mean Square of Successive Differences) approach, measuring a relationship between mean HRV and PANSS scores, mobility coefficient calculation based on accelerometer data and verification of dependencies between HRV and mobility scores.
DATA DESCRIPTION
The structure of the dataset is as follows. One folder, called HRV_anonymized_data, contains values of R-R intervals together with timestamps for each experiment participant. The data was properly anonymized, i.e. the day of the measurement was removed to prevent person identification. Files concerning patients are named treatment_X.csv, where X is the number of the person, while files related to the healthy controls are named control_Y.csv, where Y is the identification number of the person. Furthermore, for visualization purposes, an image of the raw RR intervals for each participant is provided. Its name is raw_RR_{control,treatment}_N.png, where N is the number of the person from the control/treatment group. The collected data are raw, i.e. before the anomaly removal. The code for reproducing the anomaly detection stage and removing suspicious heartbeats is publicly available in the repository [Książek et al]. The structure of the files containing R-R intervals is as follows:
Phone timestamp | RR-interval [ms] |
12:43:26.538000 | 651 |
12:43:27.189000 | 632 |
12:43:27.821000 | 618 |
12:43:28.439000 | 621 |
12:43:29.060000 | 661 |
... | ... |
The first column contains the timestamp for which the distance between two consecutive R peaks was registered. The corresponding R-R interval is presented in the second column of the file and is expressed in milliseconds.
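The RMSSD (Root Mean Square of Successive Differences) measure mentioned in the Experiments section can be computed directly from a list of R-R intervals. The sketch below applies the standard RMSSD formula to the five sample intervals shown in the table above. It is not the authors' code: it performs no artifact removal, interpolation, or rolling-window handling, and the function name is illustrative.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences of R-R intervals (in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("at least two R-R intervals are required")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# The five R-R intervals from the sample rows above, in milliseconds
sample_rr = [651, 632, 618, 621, 661]
value = rmssd(sample_rr)
print(round(value, 2))  # successive differences are -19, -14, 3, 40
```

In practice the function would be applied per rolling window after anomalous beats have been removed, since a single spurious interval can dominate the squared differences.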
The second folder, called accelerometer_anonymized_data contains values of accelerometer data collected at the same time as R-R intervals. The naming convention is similar to that of the R-R interval data: treatment_X.csv and control_X.csv represent the data coming from the persons from the treatment and control group, respectively, while X is the identification number of the selected participant. The numbers are exactly the same as for R-R intervals. The structure of the files with accelerometer recordings is as follows:
Phone timestamp | X [mg] | Y [mg] | Z [mg] |
13:00:17.196000 | -961 | -23 | 182 |
13:00:17.205000 | -965 | -21 | 181 |
13:00:17.215000 | -966 | -22 | 187 |
13:00:17.225000 | -967 | -26 | 193 |
13:00:17.235000 | -965 | -27 | 191 |
... | ... | ... | ... |
The first column contains a timestamp, while the next three columns correspond to the currently registered acceleration in three axes: X, Y and Z, in milli-g unit.
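As a simple illustration of working with the three axes, the sketch below computes the acceleration vector magnitude for the five sample rows above and its mean absolute deviation from 1000 mg (1 g, the value expected from gravity alone when the sensor is still). This deviation is only a crude, hypothetical activity proxy chosen for the example; it is not the mobility coefficient used by the authors, whose definition is given in their repository.

```python
import math


def magnitude_mg(x, y, z):
    """Euclidean norm of one accelerometer sample, in milli-g."""
    return math.sqrt(x * x + y * y + z * z)


# The five sample rows from the table above (X, Y, Z in mg)
samples = [
    (-961, -23, 182),
    (-965, -21, 181),
    (-966, -22, 187),
    (-967, -26, 193),
    (-965, -27, 191),
]

magnitudes = [magnitude_mg(*s) for s in samples]

# Hypothetical proxy: mean absolute deviation of the magnitude from 1 g
mean_deviation = sum(abs(m - 1000.0) for m in magnitudes) / len(magnitudes)

print([round(m, 1) for m in magnitudes])
print(round(mean_deviation, 1))
```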
We also attached a file with the PANSS test scores (PANSS.csv) for all patients participating in the measurement. The structure of this file is as follows:
no_of_person | PANSS_P | PANSS_N | PANSS_G | PANSS_total |
1 | 8 | 13 | 22 | 43 |
2 | 11 | 7 | 18 | 36 |
3 | 14 | 30 | 44 | 88 |
4 | 18 | 13 | 27 | 58 |
... | ... | ... | ... | ... |
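The first column contains the identification number of the patient, while the three following columns refer to the PANSS scores related to positive, negative and general symptoms, respectively. The last column, PANSS_total, contains the total score, which in the rows shown equals the sum of the three preceding columns.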
USAGE NOTES
All the files necessary to run the HRV and/or accelerometer data analysis are available on the GitHub repository [Książek et al]. HRV data loading, preprocessing (i.e. anomaly detection and removal), as well as the