Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset created from the knowledge base at https://people.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/ with transformations and edits applied to make it more usable.
"This table below is a knowledge database of disease-symptom associations generated by an automated method based on information in textual discharge summaries of patients at New York Presbyterian Hospital admitted during 2004. The first column shows the disease, the second the number of discharge summaries containing a positive and current mention of the disease, and the associated symptom. Associations for the 150 most frequent diseases based on these notes were computed and the symptoms are shown ranked based on the strength of association. The method used the MedLEE natural language processing system to obtain UMLS codes for diseases and symptoms from the notes; then statistical methods based on frequencies and co-occurrences were used to obtain the associations. A more detailed description of the automated method can be found in Wang X, Chused A, Elhadad N, Friedman C, Markatou M. Automated knowledge acquisition from clinical reports. AMIA Annu Symp Proc. 2008. p. 783-7. PMCID: PMC2656103."
These data contain case counts and rates for selected communicable diseases (listed in the data dictionary) that met the surveillance case definition for that disease and were reported for California residents, by disease, county, year, and sex. The data represent cases with an estimated illness onset date from 2001 through the last year indicated, drawn from California Confidential Morbidity Reports and/or Laboratory Reports. Data captured represent reportable case counts as of the date indicated in the “Temporal Coverage” section below, so the data presented may differ from previous publications due to delays inherent to case reporting, laboratory reporting, and epidemiologic investigation.
By Oklahoma [source]
This dataset contains an overview of historical heart disease death rates in Oklahoma from 2000 to 2018. It consists of yearly figures and target figures for the number of deaths due to heart disease, allowing a comparison between the expected rate and the actual rate over time. The data can be used to analyze trends in heart disease death rates, helping inform public health initiatives and policy decisions.
This dataset reports the number of deaths due to heart disease in Oklahoma. It provides a single data set covering historical heart disease deaths in the state and can be used for research or analytical purposes such as epidemiological studies or health services planning.
To use this dataset, note that it contains three main pieces: the year of the reported deaths, the actual number of deaths related to heart disease in that year, and a target figure for expected heart disease deaths per year, which serves as a reference point when analyzing other years. The Years column identifies the reporting year, while the Historical Data column holds the recorded figures for those who died of heart-related conditions.
With this data set, users can find how many people died of cardiac-related diseases in each year and compare those figures with the reference targets at any given time point. Where suitable breakdowns are available, trends can also be examined across groups such as males versus females or rural versus urban locations, giving more robust insight into factors associated with mortality from cardiac conditions across different demographics.
- Identifying which geographic areas in Oklahoma are at highest risk for heart disease and creating targeted public health initiatives to reduce its incidence.
- Determining correlations between changes in vital health indicators (e.g., increase of physical activity) with changes in heart disease death rates to better inform policy and research direction.
- Analyzing overall mortality rates compared to other counties or states with comparable demographics to assess the effectiveness of existing public health interventions over time
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: res_heart_disease_deaths_kdjx-hayj.csv

| Column name | Description |
|:---|:---|
| Years | The year associated with the data. (Integer) |
| Historical Data | The number of deaths due to heart disease in Oklahoma in that particular year from 2000-2018. (Integer) |
| Target | A value generated based on Historical Data indicating what should be targeted as a baseline performance measure going forward. (Integer) |

File: res_heart_disease_deaths_-_column_chart_3a28-gndr.csv

| Column name | Description |
|:---|:---|
| Years | The year associated with the data. (Integer) |
| Historical Data | The number of deaths due to heart disease in Oklahoma in that particular year from 2000-2018. (Integer) |
| Target | A value generated based on Historical Data indicating what should be targeted as a baseline performance measure going forward. (Integer) |
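The comparison between actual deaths and the target can be sketched with the standard library alone. This is a minimal illustration, assuming the CSV header matches the column names listed for these files (`Years`, `Historical Data`, `Target`); the sample figures are invented placeholders, not values from the dataset, and `gaps_to_target` is a hypothetical helper name.

```python
import csv
import io

# Placeholder rows for illustration only; these are NOT real Oklahoma figures.
SAMPLE = """Years,Historical Data,Target
2016,10000,9500
2017,9800,9500
2018,9400,9500
"""

def gaps_to_target(csv_text):
    """Return (year, actual - target) pairs; a positive gap means more deaths than targeted."""
    gaps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = int(row["Historical Data"])
        target = int(row["Target"])
        gaps.append((int(row["Years"]), actual - target))
    return gaps

for year, gap in gaps_to_target(SAMPLE):
    status = "above target" if gap > 0 else "at or below target"
    print(year, gap, status)
```

To run it on the real file, pass the file's text instead of `SAMPLE`, after checking that the header spelling matches.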
...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents the principal causes of death in the State of Qatar, classified according to ICD-10 chapters. It includes annual death counts for various disease categories over a ten-year period. The dataset is structured by cause of death and provides a time series that enables trend analysis and comparison across years. This information is valuable for health policymakers, researchers, and public health professionals to monitor disease burdens, design interventions, and evaluate national health outcomes. It supports health planning, epidemic tracking, and resource allocation in line with international classification standards.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heart Disease is among the most prevalent chronic diseases in the United States, impacting millions of Americans each year and exerting a significant financial burden on the economy. In the United States alone, heart disease claims roughly 647,000 lives each year — making it the leading cause of death. The buildup of plaques inside larger coronary arteries, molecular changes associated with aging, chronic inflammation, high blood pressure, and diabetes are all causes of and risk factors for heart disease. While there are different types of coronary heart disease, the majority of individuals only learn they have the disease following symptoms such as chest pain, a heart attack, or sudden cardiac arrest. This fact highlights the importance of preventative measures and tests that can accurately predict heart disease in the population prior to negative outcomes like myocardial infarctions (heart attacks) taking place
https://creativecommons.org/publicdomain/zero/1.0/
Context
Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.
Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.
The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.
Content
The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey conducted annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the dataset available on the CDC website for the year 2023 was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants or calculated variables based on individual participant responses.
I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
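The remark about duplicate cases can be checked directly. The following is a minimal sketch using only the standard library; the column names and rows in the sample are hypothetical placeholders, not taken from the published CSV, and `count_duplicate_rows` is an illustrative helper rather than part of the dataset's tooling.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with made-up column names and values.
SAMPLE = """Age,Sex,BMI,Diabetes
9,1,27.1,0
9,1,27.1,0
5,2,31.4,1
7,1,24.0,0
"""

def count_duplicate_rows(csv_text):
    """Return (header, number of rows that repeat an earlier row exactly)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counts = Counter(tuple(row) for row in reader)
    extra = sum(c - 1 for c in counts.values() if c > 1)
    return header, extra

header, extra = count_duplicate_rows(SAMPLE)
print(f"{extra} repeated row(s) across columns {header}")
```

With only a subset of columns retained, identical rows are expected whenever two respondents gave the same answers to every selected question, which is consistent with the explanation above.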
Explore some of the following research questions:
- Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
- What risk factors are most predictive of diabetes risk?
- Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
- Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
Acknowledgements
It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.
Inspiration
This dataset, and the exploration of the BRFSS in general, was inspired by Zidian Xie et al., who wrote Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques using the 2014 BRFSS, and by Alex Teboul, who built the Diabetes Health Indicators dataset based on BRFSS 2015.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Originally, the dataset comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked of respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".
To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes included administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction—such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predicting variables could be divided into 4 broad categories:
Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)
Diseases: whether the respondent ever had such diseases as asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)
Unhealthy habits:
General Health:
Below is a description of the features collected for each patient:
| # | Feature | Coded Variable Name | Description |
|---|---|---|---|
| 1 | HeartDisease | CVDINFR4 | Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI) |
| 2 | BMI | _BMI5CAT | Body Mass Index (BMI) |
| 3 | Smoking | _SMOKER3 | Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes] |
| 4 | AlcoholDrinking | _RFDRHV7 | Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week) |
| 5 | Stroke | CVDSTRK3 | (Ever told) (you had) a stroke? |
| 6 | PhysicalHealth | PHYSHLTH | Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good? |
| 7 | MentalHealth | MENTHLTH | Thinking about your mental health, for how many days during the past 30 days was your mental health not good? |
| 8 | DiffWalking | DIFFWALK | Do you have serious difficulty walking or climbing stairs? |
| 9 | Sex | SEXVAR | Are you male or female? |
| 10 | AgeCategory | _AGE_G | Fourteen-level age category |
| 11 | Race | _IMPRACE | Imputed race/ethnicity value |
| 12 | Diabetic | DIABETE4 | (Ever told) (you had) diabetes? |
| 13 | PhysicalActivity | EXERANY2 | Adults who reported doing physical activity or exercise during the past 30 days other than their regular job |
| 14 | GenHealth | GENHLTH | Would you say that in general your health is... |
| 15 | SleepTime | SLEPTIM1 | On average, how many hours of sleep do you get in a 24-hour period? |
| 16 | Asthma | CHASTHMA | (Ever told) (you had) asthma? |
| 17 | KidneyDisease | CHCKDNY2 | Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease? |
| 18 | SkinCancer | CHCSCNCR | (Ever told) (you had) skin cancer? |
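The reduction from the raw BRFSS columns to the named features can be sketched as a select-and-rename step. This is an illustration under assumptions, not the authors' actual preprocessing: the mapping below copies only four rows of the table above, the input record is invented, and `SEQNO` merely stands in for an administrative column that gets dropped.

```python
# Subset of the feature table above: coded BRFSS variable -> feature name.
RENAME = {
    "CVDSTRK3": "Stroke",
    "DIFFWALK": "DiffWalking",
    "GENHLTH": "GenHealth",
    "SLEPTIM1": "SleepTime",
}

def select_and_rename(records, rename=RENAME):
    """Keep only the mapped columns of each record and rename them; drop the rest."""
    return [
        {new: rec[old] for old, new in rename.items() if old in rec}
        for rec in records
    ]

# Invented example record; values are placeholders.
raw = [{"CVDSTRK3": 2, "SLEPTIM1": 7, "SEQNO": 1}]
print(select_and_rename(raw))
```

Extending `RENAME` to all 18 rows of the table would give the full subset described in the text.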
By Health [source]
This dataset provides comprehensive information on the number and rate of infectious diseases in California. Focusing on counties, sexes, and various diseases between 2001-2014, it offers powerful insights into the health status of its citizens. Its data also reveals trends in the spread of common illnesses in this state. Whether you are an epidemiologist looking to inform public health policy or a researcher seeking to investigate particular illnesses within certain populations, this dataset contains all the necessary information to answer your questions. Explore it today and discover hidden stories waiting to be uncovered!
This dataset contains counts and rates of infectious diseases in California by county, disease, sex, and year. This dataset can be used to generate trends to understand the changes in incidence of different types of diseases over time and across counties or between sexes.
To use this dataset:
- Select the columns you are interested in exploring, such as Disease, County, Sex, or Year.
- Filter out the rows that do not relate to your question, for example by restricting to a specific county or disease.
- Examine the average rate per 100,000 people for each group you selected, together with its lower and upper confidence intervals (CI).
- Use Rate as your dependent variable for analysis; Population is likely to be an important determining factor as well. Check whether any rates carry an 'unstable' flag.
- Visualise or statistically analyse your data using suitable methods, such as descriptive statistics (mean, median, mode) for comparisons between two or more groups, or correlation/regression-based models when comparing one variable to another over time.
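The steps above can be sketched as follows. This is a minimal example under assumptions: the column names follow the file description (`Disease`, `County`, `Year`, `Sex`, `Population`, `Rate`) plus the `Unstable` column mentioned in the use cases, any non-empty `Unstable` value is treated as a flag (the real encoding may differ), and the sample rows are invented placeholders.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented rows for illustration; not real surveillance figures.
SAMPLE = """Disease,County,Year,Sex,Population,Rate,Unstable
Pertussis,Alameda,2013,Female,800000,4.0,
Pertussis,Alameda,2013,Male,780000,2.0,
Pertussis,Alameda,2014,Female,810000,6.0,
Pertussis,Alameda,2014,Male,790000,1.0,*
Salmonellosis,Alameda,2014,Male,790000,9.0,
"""

def mean_rate_by_year(csv_text, disease, county):
    """Unweighted mean Rate per year for one disease and county, skipping flagged rows."""
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Disease"] != disease or row["County"] != county:
            continue  # filter out rows unrelated to the question
        if row["Unstable"].strip():
            continue  # assumed convention: non-empty means the rate is unstable
        by_year[int(row["Year"])].append(float(row["Rate"]))
    return {year: mean(rates) for year, rates in sorted(by_year.items())}

print(mean_rate_by_year(SAMPLE, "Pertussis", "Alameda"))
```

The mean here is unweighted; weighting each rate by `Population` would be a reasonable refinement when the groups differ much in size.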
- Analyzing the geographic spread of infectious diseases over time to identify areas in need of increased education, resources, and care.
- Comparing rates of disease by sex to identify and understand any gender-based differences in infectious disease cases.
- Using the Unstable column to determine whether a particular county or region needs further study of a certain type of infectious disease due to unusual spikes or drops in rate or count during a specific year
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: you must distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.
File: Infectious_Disease_Cases_by_County_Year_and_Sex_2001-2014.csv

| Column name | Description |
|:---|:---|
| Disease | The type of infectious disease reported. (String) |
| County | The county in California where the cases were reported. (String) |
| Year | The year in which the cases were reported. (Integer) |
| Sex | The gender of the individuals who contracted the disease. (String) |
| Population | The population size of the county in which the cases were reported. (Integer) |
| Rate | The rate of infection per 100 thousand people living in the county. (Float) |
| CI.lower | The lower confidence interval associated with the rate of infection. (Float) |
| CI.upper | The upper confidence interval associated with the rate of infection. (Float) |

...
This dataset of U.S. mortality trends since 1900 highlights trends in age-adjusted death rates for five selected major causes of death. Age-adjusted death rates (deaths per 100,000) after 1998 are calculated based on the 2000 U.S. standard population. Populations used for computing death rates for 2011–2017 are postcensal estimates based on the 2010 census, estimated as of July 1, 2010. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for noncensus years between 2000 and 2010 are revised using updated intercensal population estimates and may differ from rates previously published. Data on age-adjusted death rates prior to 1999 are taken from historical data (see References below). Revisions to the International Classification of Diseases (ICD) over time may result in discontinuities in cause-of-death trends.

SOURCES
CDC/NCHS, National Vital Statistics System, historical data, 1900-1998 (see https://www.cdc.gov/nchs/nvss/mortality_historical_data.htm); CDC/NCHS, National Vital Statistics System, mortality data (see http://www.cdc.gov/nchs/deaths.htm); and CDC WONDER (see http://wonder.cdc.gov).

REFERENCES
- National Center for Health Statistics, Data Warehouse. Comparability of cause-of-death between ICD revisions. 2008. Available from: http://www.cdc.gov/nchs/nvss/mortality/comparability_icd.htm.
- National Center for Health Statistics. Vital statistics data available. Mortality multiple cause files. Hyattsville, MD: National Center for Health Statistics. Available from: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm.
- Kochanek KD, Murphy SL, Xu JQ, Arias E. Deaths: Final data for 2017. National Vital Statistics Reports; vol 68 no 9. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_09-508.pdf.
- Arias E, Xu JQ. United States life tables, 2017. National Vital Statistics Reports; vol 68 no 7. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_07-508.pdf.
- National Center for Health Statistics. Historical Data, 1900-1998. 2009. Available from: https://www.cdc.gov/nchs/nvss/mortality_historical_data.htm.
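The age adjustment described for this dataset is conventionally done by direct standardization: each age group's death rate is weighted by that group's share of a standard population. The sketch below shows the arithmetic only; the inputs are invented, the function name is hypothetical, and the real 2000 U.S. standard population weights are not reproduced here.

```python
def age_adjusted_rate(deaths, population, standard_population, per=100_000):
    """Directly standardized death rate.

    deaths, population, standard_population: sequences aligned by age group.
    Returns the weighted sum of age-specific rates, scaled to `per` people.
    """
    total_standard = sum(standard_population)
    adjusted = 0.0
    for d, p, s in zip(deaths, population, standard_population):
        age_specific_rate = d / p          # deaths per person in this age group
        weight = s / total_standard        # share of the standard population
        adjusted += age_specific_rate * weight
    return adjusted * per

# Two made-up age groups: a younger one with a low rate, an older one with a high rate.
print(age_adjusted_rate(deaths=[10, 50],
                        population=[100_000, 50_000],
                        standard_population=[3, 1]))
```

Because the weights come from a fixed standard population, rates from different years can be compared without being distorted by changes in the age structure.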
Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.

Coronary heart disease is a type of heart disease in which the arteries of the heart cannot deliver enough oxygen-rich blood to the heart muscles. Over time, this can weaken the heart muscle and may lead to heart attack or heart failure. It is the most common type of heart disease in the US and has been the leading cause of death in Los Angeles County for the last two decades. Poor diet, sedentary lifestyle, tobacco exposure, and chronic stress are all important risk factors for coronary heart disease. Cities and communities can mitigate these risks by improving local food environments and encouraging physical activity by making communities safer and more walkable.

For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Estimating the burden of healthcare-associated infections (HAIs) compared to other communicable diseases is an ongoing challenge given the need for good quality data on the incidence of these infections and the involved comorbidities. Based on the methodology of the Burden of Communicable Diseases in Europe (BCoDE) project and 2011–2012 data from the European Centre for Disease Prevention and Control (ECDC) point prevalence survey (PPS) of HAIs and antimicrobial use in European acute care hospitals, we estimated the burden of six common HAIs.

Methods and Findings
The included HAIs were healthcare-associated pneumonia (HAP), healthcare-associated urinary tract infection (HA UTI), surgical site infection (SSI), healthcare-associated Clostridium difficile infection (HA CDI), healthcare-associated neonatal sepsis, and healthcare-associated primary bloodstream infection (HA primary BSI). The burden of these HAIs was measured in disability-adjusted life years (DALYs). Evidence relating to the disease progression pathway of each type of HAI was collected through systematic literature reviews, in order to estimate the risks attributable to HAIs. For each of the six HAIs, gender and age group prevalence from the ECDC PPS was converted into incidence rates by applying the Rhame and Sudderth formula. We adjusted for reduced life expectancy within the hospital population using three severity groups based on McCabe score data from the ECDC PPS. We estimated that 2,609,911 new cases of HAI occur every year in the European Union and European Economic Area (EU/EEA). The cumulative burden of the six HAIs was estimated at 501 DALYs per 100,000 general population each year in EU/EEA. HAP and HA primary BSI were associated with the highest burden and represented more than 60% of the total burden, with 169 and 145 DALYs per 100,000 total population, respectively. HA UTI, SSI, HA CDI, and healthcare-associated neonatal sepsis ranked as the third to sixth syndromes in terms of burden of disease. HAP and HA primary BSI were associated with the highest burden because of their high severity. The cumulative burden of the six HAIs was higher than the total burden of all other 32 communicable diseases included in the BCoDE 2009–2013 study. The main limitations of the study are the variability in the parameter estimates, in particular the disease models’ case fatalities, and the use of the Rhame and Sudderth formula for estimating incident number of cases from prevalence data.

Conclusions
We estimated the EU/EEA burden of HAIs in DALYs in 2011–2012 using a transparent and evidence-based approach that allows for combining estimates of morbidity and of mortality in order to compare with other diseases and to inform a comprehensive ranking suitable for prioritization. Our results highlight the high burden of HAIs and the need for increased efforts for their prevention and control. Furthermore, our model should allow for estimations of the potential benefit of preventive measures on the burden of HAIs in the EU/EEA.
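The prevalence-to-incidence conversion named in the abstract is usually written as I = P × LA / (LN − INT), where P is the point prevalence, LA the mean length of stay of all patients, LN the mean length of stay of patients who acquire an infection, and INT the mean interval from admission to infection onset. The sketch below encodes that commonly stated form; the parameter names and the example values are illustrative assumptions, and the study's actual inputs are not reproduced.

```python
def rhame_sudderth_incidence(prevalence, mean_los_all, mean_los_infected,
                             mean_days_to_onset):
    """Estimate incidence from point prevalence: I = P * LA / (LN - INT).

    prevalence          P,   proportion of patients infected on the survey day
    mean_los_all        LA,  mean length of stay of all patients (days)
    mean_los_infected   LN,  mean length of stay of infected patients (days)
    mean_days_to_onset  INT, mean days from admission to infection onset
    """
    infected_days = mean_los_infected - mean_days_to_onset
    if infected_days <= 0:
        raise ValueError("LN must exceed INT for the formula to be defined")
    return prevalence * mean_los_all / infected_days

# Made-up inputs: 6% prevalence, 8-day average stay, 20-day stay for
# infected patients, onset on average 8 days after admission.
print(rhame_sudderth_incidence(0.06, 8, 20, 8))
```

The result is sensitive to the length-of-stay estimates, which matches the limitation the authors note about using this formula.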
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A dataset providing GP recorded coronary heart disease. Coronary heart disease (CHD) is the leading cause of death both in the UK and worldwide. It's responsible for more than 73,000 deaths in the UK each year. About 1 in 6 men and 1 in 10 women die from CHD. In the UK, there are an estimated 2.3 million people living with CHD and around 2 million people affected by angina (the most common symptom of coronary heart disease). CHD generally affects more men than women, although from the age of 50 the chances of developing the condition are similar for both sexes. As well as angina (chest pain), the main symptoms of CHD are heart attacks and heart failure. However, not everyone has the same symptoms and some people may not have any before CHD is diagnosed. CHD is sometimes called ischaemic heart disease.
The dataset is tabular data containing electronic health records from individuals who have heart failure, have had a prior myocardial infarction, or represent a random sample of the UNC population. The data contains demographic data, dates of diagnoses, comorbidities, all-cause mortality dates, and proximity to major roadways (as a proxy for exposure to traffic related air pollution). This dataset is not publicly accessible because it contains PII in the form of electronic health records and cannot be uploaded to ScienceHub. Access requires an approved IRB application; with one, the data can be requested by emailing ward-caviness.cavin@epa.gov. Format: tabular data, as described above. This dataset is associated with the following publication: Raab, H., M. Breen, A. Weaver, J. Moyer, W. Cascio, D. Diazsanchez, and C. Ward-Caviness. Comparison of associations between proximity to major roads and all-cause mortality across a spectrum of cardiovascular diseases. Environmental Epidemiology. Wolters Kluwer, Alphen aan den Rijn, NETHERLANDS, 8(6): e351, (2024).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive statistics on global health, focusing on various diseases, treatments, and outcomes. The data spans multiple countries and years, offering valuable insights for health research, epidemiology studies, and machine learning applications. The dataset includes information on the prevalence, incidence, and mortality rates of major diseases, as well as the effectiveness of treatments and healthcare infrastructure.
This dataset can be used for:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality by heart failure. Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies. People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management, where a machine learning model can be of great help.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the leading cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel conditions. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, and one-third of these deaths occur before the age of 70.
A comprehensive database of factors that contribute to a heart attack has been constructed. The main purpose is to collect the characteristics of heart attacks and the factors that contribute to them. A form was created in Microsoft Excel for this purpose. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. The input fields are age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin. The output field records the presence of a heart attack and takes one of two categories: negative (no heart attack) or positive (heart attack). Table 1 shows detailed information, including the minimum and maximum attribute values, for the 1319 cases in the whole database.
To confirm the validity of the data, we compared the patient files in the hospital archive with the data stored in the laboratory system. We also interviewed the patients and specialist doctors. Table 2 shows a sample of 44 cases from the 1319 in the whole database, together with the factors that lead to a heart attack.
After collecting the data, we checked it for null (invalid) values and for errors made during data collection. A value is null if it is unknown, and null values require special treatment. A null value indicates that the target is not a valid data element. When trying to retrieve data that is not present, you can come across the keyword null in Processing. An arithmetic operation on a numeric column that contains one or more null values yields null. An example of null value processing is shown in Figure 2.
The data used in this investigation were scaled between 0 and 1 so that all inputs and outputs received equal attention and so that differences in their units and ranges were removed. Normalizing the data before applying AI models has two major advantages. The first is to prevent attributes with larger numeric ranges from overshadowing those with smaller numeric ranges. The second is to avoid numerical problems during processing. After normalization, we split the dataset into training and test sets, using 1060 cases for training and 259 for testing. Modeling was then implemented using the input and output variables.
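The preprocessing described above (null handling, scaling to the range 0 to 1, and a 1060/259 split) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the sample heart-rate values, the use of `None` to represent null, and the random shuffling before the split are all assumptions, since the description does not say how the cases were assigned to each set.

```python
import random


def min_max_scale(values):
    """Scale numbers to [0, 1]; None (null) entries are passed through unchanged."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)          # null stays null, it is not treated as 0
        elif span == 0:
            scaled.append(0.0)           # constant column: avoid division by zero
        else:
            scaled.append((v - lo) / span)
    return scaled


def train_test_split(rows, n_train, seed=0):
    """Shuffle a copy of rows (assumed random split) and cut it at n_train."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical heart-rate column with one null value.
heart_rate = [66, 94, None, 70, 120]
scaled = min_max_scale(heart_rate)

# Stand-in for the 1319 cases, identified by row index only.
rows = list(range(1319))
train, test = train_test_split(rows, n_train=1060)
print(len(train), len(test))
```

The smallest observed value maps to 0 and the largest to 1, while the null entry is left for separate treatment instead of silently entering the arithmetic.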
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: Systemic arterial hypertension (HT) is a major modifiable risk factor for cardiovascular diseases (CVDs) and is associated with all-cause death (ACD). Understanding its progression from the early state to late complications should lead to more timely intensification of treatment. This study aimed to construct a real-world cohort profile of HT and to estimate transition probabilities from the uncomplicated state to any of these long-term complications: chronic kidney disease (CKD), coronary artery disease (CAD), stroke, and ACD.
Methods: This real-world cohort study used routine clinical practice data for all adult patients diagnosed with HT at Ramathibodi Hospital, Thailand, from 2010 to 2022. A multi-state model was developed based on the following states: 1, uncomplicated HT; 2, CKD; 3, CAD; 4, stroke; and 5, ACD. Transition probabilities were estimated using the Kaplan-Meier method.
Results: A total of 144,149 patients were initially classified as having uncomplicated HT. The 10-year transition probabilities (95% CI) from the initial state to CKD, CAD, stroke, and ACD were 19.6% (19.3%, 20.0%), 18.2% (17.9%, 18.6%), 7.4% (7.1%, 7.6%), and 1.7% (1.5%, 1.8%), respectively. From the intermediate states of CKD, CAD, and stroke, the 10-year transition probabilities to death were 7.5% (6.8%, 8.4%), 9.0% (8.2%, 9.9%), and 10.8% (9.3%, 12.5%), respectively.
Conclusions: In this 13-year cohort, CKD was the most common complication, followed by CAD and stroke. Among these, stroke carried the highest risk of ACD, followed by CAD and CKD. These findings provide an improved understanding of disease progression to guide appropriate prevention measures. Further investigations of prognostic factors and treatment effectiveness are warranted.
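To illustrate how a transition probability can be read off a Kaplan-Meier curve, here is a minimal Python sketch for a single transition. It is not the study's implementation: the function names and the toy follow-up data are hypothetical, and it assumes that patients who leave the initial state for any other reason are simply treated as censored.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed transition.

    times  -- follow-up time per patient (e.g. years)
    events -- 1 if the transition was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0    # transitions observed exactly at time t
        n_leaving = 0   # everyone leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def transition_probability(curve, horizon):
    """Probability of having made the transition by `horizon`: 1 - S(horizon)."""
    surv = 1.0
    for t, s in curve:
        if t <= horizon:
            surv = s
    return 1 - surv


# Toy data for eight hypothetical patients (years of follow-up, event flag).
times = [2, 3, 3, 5, 8, 10, 10, 12]
events = [1, 0, 1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
p10 = transition_probability(curve, horizon=10)
print(round(p10, 3))
```

With this toy input the survival estimate falls to 0.4 by year 10, so the estimated 10-year transition probability is 0.6. A full multi-state analysis would repeat this for each pair of states and would normally rely on an established survival-analysis library.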
The dataset contains counts for the Top Five inpatient diagnosis groups based on Major Diagnostic Categories (MDCs) from the Patient Discharge Data (PDD) for each California hospital. Each MDC corresponds to a major organ system (e.g., Respiratory System, Circulatory System, Digestive System) rather than a specific disease (e.g., cancer, sepsis). The MDCs are also generally associated with a particular medical specialty. Therefore, the MDCs can be used to help identify what types of health care specialists are needed at each facility. For instance, a facility with “Circulatory System, Disease and Disorders” as one of their Top Five MDC diagnosis groups is more likely to have a greater need for cardiac specialists. The data will be updated on an annual basis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There is a strong and continuously growing interest in using large electronic healthcare databases to study health outcomes and the effects of pharmaceutical products. However, concerns regarding disease misclassification (i.e. classification errors of the disease status) and its impact on the study results are legitimate. Validation is therefore increasingly recognized as an essential component of database research. In this work, we elucidate the interrelations between the true prevalence of a disease in a database population (i.e. prevalence assuming no disease misclassification), the observed prevalence subject to disease misclassification, and the most common validity indices: sensitivity, specificity, positive and negative predictive value. Based on this, we obtained analytical expressions to derive all the validity indices and true prevalence from the observed prevalence and any combination of two other parameters. The analytical expressions can be used for various purposes. Most notably, they can be used to obtain an estimate of the observed prevalence adjusted for outcome misclassification from any combination of two validity indices and to derive validity indices from each other which would otherwise be difficult to obtain. To allow researchers to easily use the analytical expressions, we additionally developed a user-friendly and freely available web-application.
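The kind of relationship described here can be illustrated with the standard textbook identities that link prevalence, sensitivity, and specificity. The Python sketch below is an assumption-laden illustration, not the expressions or the web application from this work: the function names and the example values (10% true prevalence, 80% sensitivity, 95% specificity) are hypothetical.

```python
def observed_prevalence(true_prev, se, sp):
    """Observed prevalence = true positives + false positives."""
    return se * true_prev + (1 - sp) * (1 - true_prev)


def true_prevalence(obs_prev, se, sp):
    """Invert the relation above to adjust observed prevalence for misclassification."""
    denom = se + sp - 1
    if denom == 0:
        raise ValueError("sensitivity + specificity must differ from 1")
    return (obs_prev + sp - 1) / denom


def ppv(true_prev, se, sp):
    """Positive predictive value: true positives among all classified positive."""
    return se * true_prev / observed_prevalence(true_prev, se, sp)


def npv(true_prev, se, sp):
    """Negative predictive value: true negatives among all classified negative."""
    return sp * (1 - true_prev) / (1 - observed_prevalence(true_prev, se, sp))


# Hypothetical example values.
obs = observed_prevalence(0.10, se=0.80, sp=0.95)
adjusted = true_prevalence(obs, se=0.80, sp=0.95)
print(round(obs, 4), round(adjusted, 4))
print(round(ppv(0.10, 0.80, 0.95), 4), round(npv(0.10, 0.80, 0.95), 4))
```

In this example a true prevalence of 10% appears as an observed prevalence of 12.5%, and inverting the relation recovers the 10%. The positive predictive value is 0.64, which shows how even a fairly specific algorithm produces many false positives when the disease is uncommon.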