https://creativecommons.org/publicdomain/zero/1.0/
CDC's Division of Population Health provides a cross-cutting set of 124 indicators that were developed by consensus and that allow states, territories, and large metropolitan areas to uniformly define, collect, and report chronic disease data that are important to public health practice and available for states, territories, and large metropolitan areas. In addition to providing access to state-specific indicator data, the CDI web site serves as a gateway to additional information and data resources.
A variety of health-related questions were assessed at various times and places across the US over the past 15 years. Data is provided with confidence intervals and demographic stratification.
Data was compiled by the CDC.
https://creativecommons.org/publicdomain/zero/1.0/
Health care in the United States is provided by many distinct organizations. Health care facilities are largely owned and operated by private sector businesses: 58% of US community hospitals are non-profit, 21% are government owned, and 21% are for-profit. According to the World Health Organization (WHO), the United States spent more on health care per capita ($9,403), and more on health care as a percentage of its GDP (17.1%), than any other nation in 2014. Many different datasets are needed to portray different aspects of health care in the US, such as disease prevalence, pharmaceuticals and drugs, and nutritional data for food products available in the US. Such data is collected through surveys (or otherwise) conducted by the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), the Centers for Medicare & Medicaid Services (CMS), and the Agency for Healthcare Research and Quality (AHRQ). These datasets can be used to review demographics and diseases, determine star ratings of healthcare providers, and look up drugs and their compositions, package information for different diseases, and food quality. Such information is often needed, but finding and scraping it can be a huge hurdle. This collection is therefore an attempt to make all US healthcare data available in one place, downloadable as CSV files.
These data contain case counts and rates for selected communicable diseases (listed in the data dictionary) that met the surveillance case definition for that disease and were reported for California residents, by disease, county, year, and sex. The data represent cases with an estimated illness onset date from 2001 through the last year indicated from California Confidential Morbidity Reports and/or Laboratory Reports. Data captured represent reportable case counts as of the date indicated in the “Temporal Coverage” section below, so the data presented may differ from previous publications due to delays inherent to case reporting, laboratory reporting, and epidemiologic investigation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. There may even be one single symptom contributing to a disease in a row or sample. This is an indicator of a very high correlation between the symptom and that particular disease. A larger number of rows for a particular disease corresponds to its higher probability of occurrence in the real world. Similarly, in a row, if the feature vector has the occurrence of a single symptom, it implies that this symptom has more correlation to classify the disease than any one symptom of a feature vector with multiple symptoms in another sample.
This dataset contains electronic health records used to study associations between PFAS occurrence and multimorbidity in a random sample of UNC Healthcare system patients. The dataset contains the medical record number to uniquely identify each individual as well as information on PFAS occurrence at the zip code level, the zip code of residence for each individual, chronic disease diagnoses, patient demographics, and neighborhood socioeconomic information from the 2010 US Census.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means: Because this data has PII from electronic health records, the data can only be accessed with an approved IRB application. Project analytic code is available at L:/PRIV/EPHD_CRB/Cavin/CARES/Project Analytic Code/Cavin Ward/PFAS Chronic Disease and Multimorbidity.
Format: This data is formatted as an R dataframe and associated comma-delimited flat text file. The data has the medical record number to uniquely identify each individual (which also serves as the primary key for the dataset), as well as information on the occurrence of PFAS contamination at the zip code level, socioeconomic data at the census tract level from the 2010 US Census, demographics, and the presence of chronic disease as well as multimorbidity (the presence of two or more chronic diseases).
This dataset is associated with the following publication: Ward-Caviness, C., J. Moyer, A. Weaver, R. Devlin, and D. Diazsanchez. Associations between PFAS occurrence and multimorbidity as observed in an electronic health record cohort. Environmental Epidemiology. Wolters Kluwer, Alphen aan den Rijn, NETHERLANDS, 6(4): p e217, (2022).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was acquired from one of the multispecialty hospitals in India. It has over 14 common features, which makes it one of the heart disease datasets available so far for research purposes. This dataset consists of 1000 subjects with 12 features. It will be useful for building early-stage heart disease detection systems as well as for generating predictive machine learning models.
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of coronary heart disease (in persons of all ages). Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.
ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to coronary heart disease (in persons of all ages).
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
The percentage of each MSOA’s population (all ages) with coronary heart disease was estimated. This was achieved by calculating a weighted average based on:
• The percentage of the MSOA area that was covered by each GP practice’s catchment area
• Of the GPs that covered part of that MSOA: the percentage of registered patients that have that illness
The estimated percentage of each MSOA’s population with coronary heart disease was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with coronary heart disease, within the relevant age range.
Each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have coronary heart disease
B) the NUMBER of people within that MSOA who are estimated to have coronary heart disease
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA that are estimated to have coronary heart disease, compared to other MSOAs. In other words, those are areas where it’s estimated a large number of people suffer from coronary heart disease, and where those people make up a large percentage of the population, indicating there is a real issue with coronary heart disease within the population and the investment of resources to address that issue could have the greatest benefits.
LIMITATIONS
1. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
2. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
3. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of coronary heart disease, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of coronary heart disease.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
• Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
• Levels of obesity, inactivity and associated illnesses (England): Missing data
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
DATA SOURCES
This dataset was produced using:
• Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
• GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
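The weighted-average and scoring steps described under the analysis methodology can be sketched in Python as follows. This is an illustration only, not the Ribble Rivers Trust implementation: the function names, the input layout, the normalisation by total coverage share, and the use of min-max scaling to obtain the relative 0 to 1 score are all assumptions, since the description does not state how these details were handled.

```python
def msoa_prevalence(coverage):
    """Estimate the % of an MSOA's population with the illness.

    coverage: list of (share_of_msoa_area, practice_prevalence_pct) pairs,
    one per GP practice whose catchment overlaps the MSOA (assumed layout).
    """
    total_share = sum(share for share, _ in coverage)
    if total_share == 0:
        return None  # no GP catchment covers this MSOA
    # Weighted average of practice-level prevalence, weighted by area share.
    return sum(share * pct for share, pct in coverage) / total_share


def min_max(values):
    """Rescale values to the range 0..1 (assumed scaling; 1 = worst)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def relative_scores(prevalence_pct, population):
    """Combine score A (percentage) and score B (count) into one 0..1 score."""
    counts = [p / 100 * n for p, n in zip(prevalence_pct, population)]
    score_a = min_max(prevalence_pct)
    score_b = min_max(counts)
    averaged = [(a + b) / 2 for a, b in zip(score_a, score_b)]
    return min_max(averaged)
```

For example, an MSOA split evenly between two practices with 4% and 2% prevalence would be estimated at 3% under these assumptions.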
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present two urban road disease datasets: DURDD for road disease detection and CURDD for road disease classification. DURDD includes four main types of underground road diseases: cavity, detachment, water-rich, and looseness. It also contains disease detection datasets in three base formats: COCO, Pascal VOC, and YOLO. In CURDD, the dataset is divided into two levels: level 0 and level 1, corresponding to the "Cls0" and "Cls1" catalogs, respectively. Level 1 includes cavity, detachment, water-rich, looseness, and background. Level 0 categories combine the four main disease types mentioned earlier into a single "diseases" category, with the other category being "background." This dataset was jointly published by Hebei University and the 519 Team of North China Geological Exploration Bureau. We support individuals or teams using the data for research purposes. We also welcome collaboration for commercial use. For commercial inquiries, please contact us for authorization.
https://creativecommons.org/publicdomain/zero/1.0/
According to the CDC, heart disease is a leading cause of death for people of most races in the U.S. (African Americans, American Indians and Alaska Natives, and whites). About half of all Americans (47%) have at least 1 of 3 major risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetes status, obesity (high BMI), not getting enough physical activity, or drinking too much alcohol. Identifying and preventing the factors that have the greatest impact on heart disease is very important in healthcare. In turn, developments in computing allow the application of machine learning methods to detect "patterns" in the data that can predict a patient's condition.
The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to collect data on the health status of U.S. residents. As described by the CDC: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset includes data from 2023. In this dataset, I noticed many factors (questions) that directly or indirectly influence heart disease, so I decided to select the most relevant variables from it. I also decided to share with you two versions of the most recent dataset: with NaNs and without them.
As described above, the original dataset of nearly 300 variables was reduced to 40 variables. In addition to classical EDA, this dataset can be used to apply a number of machine learning methods, especially classifier models (logistic regression, SVM, random forest, etc.). You should treat the variable "HadHeartAttack" as binary ("Yes" - respondent had heart disease; "No" - respondent did not have heart disease). Note, however, that the classes are unbalanced, so applying a model without accounting for the imbalance is not advisable. Adjusting the class weights or undersampling should yield much better results. Based on the dataset, I built a logistic regression model and embedded it in an application that might inspire you: https://share.streamlit.io/kamilpytlak/heart-condition-checker/main/app.py. Can you indicate which variables have a significant effect on the likelihood of heart disease?
Check out this notebook in my GitHub repository: https://github.com/kamilpytlak/data-science-projects/blob/main/heart-disease-prediction/2022/notebooks/data_processing.ipynb
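The class imbalance mentioned above can be handled by reweighting or by undersampling. A minimal sketch using only the Python standard library follows. The weight formula n_samples / (n_classes * class_count) is a common "balanced" heuristic and is an assumption about what adjusting the weights means here, not the author's stated method; the function names are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            sampled.append((row, label))
    rng.shuffle(sampled)  # avoid returning the classes in contiguous blocks
    return sampled
```

With nine "No" labels and one "Yes" label, the rare class receives a weight of 5.0 and the common class a weight of about 0.56, so errors on the rare class count for more during training.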
The tracking system is for patients identified in the Creutzfeldt-Jakob Disease (CJD) lookback notification initiative established in January 1995 as part of the lookback notification of all Department of Veterans Affairs (VA) patients who may have received certain lot numbers of blood derivatives or blood components produced from donors with CJD. Even though the Centers for Disease Control and Prevention characterized the risk of transmission of CJD from blood derivative products as 'small and immeasurable' and 'theoretical', VA believed it had an ethical obligation to inform patients of the exposure to potentially contaminated blood components or plasma derivative products while under VA's care. The patients were notified. The Veterans Health Administration (VHA) established a tracking system for individuals who received these products to determine if there was an increase in VA CJD cases. Every two years, the VHA National Infectious Diseases Service updates the status of patients who had previously been identified through the VA CJD lookback notification initiative. The Creutzfeldt-Jakob Disease Lookback Dataset (CJDLD) is a prospective collection of data; requests for individual reports are not accepted.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an update of a prior dataset publication containing baseline and 5-year follow-up data from the PERU MIGRANT Study (PEru's Rural to Urban MIGRANTs Study).
The PERU MIGRANT Study was designed to investigate the magnitude of differences between rural-to-urban migrant and non-migrant groups in specific cardiovascular risk factors. Three groups were selected: i) Rural, people who have always lived in a rural environment; ii) Rural-urban, people who migrated from rural to urban areas; and iii) Urban, people who have always lived in an urban environment.
PERU MIGRANT Study protocol, instruments and variables are described in full in:
Miranda JJ, Gilman RH, García HH, Smeeth L. The effect on cardiovascular risk factors of migration from rural to urban areas in Peru: PERU MIGRANT Study. BMC Cardiovasc Disord 2009;9:23.
PERU MIGRANT Study baseline dataset is available at: https://figshare.com/articles/PERU_MIGRANT_Study_Baseline_dataset/3125005
Main findings of the baseline study:
Miranda JJ, Gilman RH, Smeeth L. Differences in cardiovascular risk factors in rural, urban and rural-to-urban migrants in Peru. Heart 2011;97(10):787-96.
Main findings of the 5-yr follow-up study:
Carrillo-Larco RM, Bernabé-Ortiz A, Pillay TD, Gilman RH, Sanchez JF, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Obesity risk in rural, urban and rural-to-urban migrants: prospective results of the PERU MIGRANT study. Int J Obes (Lond) 2016;40(1):181-5.
Bernabe-Ortiz A, Sanchez JF, Carrillo-Larco RM, Gilman RH, Poterico JA, Quispe R, Smeeth L, Miranda JJ. Rural-to-urban migration and risk of hypertension: longitudinal results of the PERU MIGRANT study. J Hum Hypertens 2017;31(1):22-28.
Lazo-Porras M, Bernabe-Ortiz A, Málaga G, Gilman RH, Acuña-Villaorduña A, Cardenas-Montero D, Smeeth L, Miranda JJ. Low HDL cholesterol as a cardiovascular risk factor in rural, urban, and rural-urban migrants: PERU MIGRANT cohort study. Atherosclerosis 2016;246:36-43.
Burroughs Pena MS, Bernabé-Ortiz A, Carrillo-Larco RM, Sánchez JF, Quispe R, Pillay TD, Málaga G, Gilman RH, Smeeth L, Miranda JJ. Migration, urbanisation and mortality: 5-year longitudinal analysis of the PERU MIGRANT study. J Epidemiol Community Health 2015;69(7):715-8.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2
Abstract
The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description
The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
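Before hydration, the Tweet IDs in the .txt files have to be read and grouped into batches. The sketch below assumes one Tweet ID per line in each file and a batch size of 100; both are assumptions that should be checked against the files themselves and against the limits of whichever hydration tool or API is used. The function names are hypothetical, and the code performs no network requests.

```python
from pathlib import Path


def read_tweet_ids(path):
    """Return the Tweet IDs in a file, assuming one ID per line."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:
            ids.append(line)  # keep IDs as strings; they are identifiers
    return ids


def batches(ids, size=100):
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each batch can then be passed to the hydration step of choice, for example `for batch in batches(read_tweet_ids("TweetIDs_Part1.txt")): ...`.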
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Zoonotic Disease Dataset
Zoonotic diseases are infections that spread between people and animals. This dataset contains information to investigate the correlation between climate variables (temperature, precipitation) and zoonotic disease in different countries across different years. The data is clean and does not have missing values.
Dataset Variables:
• Country: region from where data was collected
• Year: year when data was collected
• Temperature: collected in degrees Celsius
• Precipitation: collected in millimeters (mm)
• Zoonotic Cases: number of zoonotic infections
• Population Density: number of people per square kilometer of the country
• Urbanization Rate: percentage of the country's population living in urban areas
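One simple way to examine the relationship between a climate variable and case counts is the Pearson correlation coefficient. The sketch below is a plain-Python illustration; the sample values are invented, and the choice of Pearson correlation is an assumption, as the dataset description does not prescribe a method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only, not taken from the dataset.
temperature = [20.0, 22.0, 24.0, 26.0]
zoonotic_cases = [100, 120, 140, 160]
r = pearson(temperature, zoonotic_cases)
```

A value of r near 1 or -1 indicates a strong linear association, while a value near 0 indicates little linear association; correlation alone does not establish a causal link.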
Mortality Rates for Lake County, Illinois. Explanation of field attributes:
Average Age of Death – The average age at which people in the given zip code die.
Cancer Deaths – Cancer deaths refers to individuals who have died of cancer as the underlying cause. This is a rate per 100,000.
Heart Disease Related Deaths – Heart Disease Related Deaths refers to individuals who have died of heart disease as the underlying cause. This is a rate per 100,000.
COPD Related Deaths – COPD Related Deaths refers to individuals who have died of chronic obstructive pulmonary disease (COPD) as the underlying cause. This is a rate per 100,000.
Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.
Coronary heart disease is a type of heart disease in which the arteries of the heart cannot deliver enough oxygen-rich blood to the heart muscles. Over time, this can weaken the heart muscle and may lead to heart attack or heart failure. It is the most common type of heart disease in the US and has been the leading cause of death in Los Angeles County for the last two decades. Poor diet, sedentary lifestyle, tobacco exposure, and chronic stress are all important risk factors for coronary heart disease. Cities and communities can mitigate these risks by improving local food environments and encouraging physical activity by making communities safer and more walkable.
For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
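Age adjustment by the direct method weights each age group's death rate by that group's share of a standard population. The sketch below illustrates the arithmetic only; the function name is hypothetical, and any age groups, counts, and weights passed to it in an example are placeholders rather than the 2000 U.S. standard population values. Whether this dataset used exactly this procedure is an assumption.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Directly standardised rate per `per` people.

    deaths, population: counts per age group in the study area.
    standard_weights: each age group's share of the standard population
    (must sum to 1).
    """
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must cover the same age groups")
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    # Sum of age-specific rates, each weighted by the standard population share.
    rate = sum(
        (d / n) * w for d, n, w in zip(deaths, population, standard_weights)
    )
    return rate * per
```

Because every area is weighted by the same standard population, rates computed this way can be compared across areas with different age structures.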
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This dataset comprises information collected from 450 females and is utilized for research in the field of Data Security and Privacy in 3D Modeling for Healthcare. The dataset is focused on the prevalence and analysis of sexually transmitted diseases (STDs) in females. It provides valuable insights into the demographics, behaviors, and health status of the study participants.
Data Fields: The dataset includes the following key fields for each of the 450 females:
• S.no: A unique identifier assigned to each participant.
• Age: The age of the female participant, ranging from young adults to middle-aged individuals.
• Intimate Partners: The number of intimate partners the female has had, indicating their level of sexual activity.
• Protection Usage: A three-level categorical variable (0: Never, 1: Sometimes, 2: Always) representing the usage of protection during sexual activity.
• Symptoms: A binary variable (0: No symptoms, 1: Symptoms) indicating the presence or absence of symptoms related to STDs.
• Location: The location of the participant, categorized into general city/district areas, which can provide geographical context.
• Education: A binary variable (0: Low education, 1: High education) representing the education level of the participant.
• STD Testing History: A binary variable (0: No, 1: Yes) indicating whether the participant has a history of undergoing anonymous STD testing.
• STD Status: A binary variable (0: Uninfected, 1: Infected) reflecting the STD status of the female participants.
Usage: This dataset serves as a valuable resource for researchers in the fields of healthcare, data security, and 3D modeling. Researchers can leverage this dataset to explore the relationship between demographic factors, behaviors, and STD prevalence among females. It is particularly relevant for studies that aim to enhance data security and privacy while utilizing 3D modeling techniques for healthcare applications.
Data Privacy and Ethics: The collection of this dataset adheres to ethical and privacy considerations, with a focus on ensuring the anonymity and confidentiality of the study participants. Personal identifiers have been removed to protect the privacy of the individuals.
Citation: If you intend to use this dataset in your research, please consider citing the source and acknowledging the data collection process. Proper citation helps maintain transparency and credit the researchers and institutions involved in data collection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality by heart failure. Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity, and harmful use of alcohol using population-wide strategies. People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia, or already established disease) need early detection and management, wherein a machine learning model can be of great help.
https://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.
All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
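The suppression rule described above can be illustrated with a minimal sketch. It is not CDC's actual procedure: the field names (`sex`, `age_group`, `race_ethnicity`), the sample values, and the helper `suppress_rare_combinations` are hypothetical. The sketch only assumes that a demographic combination seen fewer than 5 times is recoded to "NA" while the record itself is kept.

```python
from collections import Counter

# Hypothetical column names; the real dataset's field names may differ.
DEMOGRAPHIC_FIELDS = ("sex", "age_group", "race_ethnicity")
THRESHOLD = 5  # combinations seen fewer than 5 times are treated as rare


def suppress_rare_combinations(records, fields=DEMOGRAPHIC_FIELDS, threshold=THRESHOLD):
    """Return copies of the records with rare demographic combinations recoded to "NA".

    Records are never dropped; only the identifying fields are recoded.
    """
    # Count how often each combination of demographic values occurs.
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    result = []
    for r in records:
        row = dict(r)  # copy so the input is left unchanged
        if counts[tuple(r[f] for f in fields)] < threshold:
            for f in fields:
                row[f] = "NA"
        result.append(row)
    return result


# Made-up sample: one common combination (6 records) and one rare one (2 records).
records = (
    [{"sex": "Female", "age_group": "20-29", "race_ethnicity": "Group A"}] * 6
    + [{"sex": "Male", "age_group": "80+", "race_ethnicity": "Group B"}] * 2
)
cleaned = suppress_rare_combinations(records)
print(len(cleaned), cleaned[0]["sex"], cleaned[-1]["sex"])
```

Recoding rather than deleting keeps total case counts intact, which matches the statement that records with data suppression are never removed.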
For questions, please contact Ask SRRG (eocevent394@cdc.gov).
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
Organized by zipcode: rates of Alzheimer's disease, percent of landcover types, modelled PM2.5, and socioeconomic variables. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Lucas Neas (CPHEA/PHESD/EB) is the owner of the copy of this dataset that was used. Format: Medicare database. This dataset is associated with the following publication: Wu, J., and L. Jackson. Greenspace inversely associated with the risk of Alzheimer’s disease in the mid-Atlantic United States. Earth. MDPI AG, Basel, SWITZERLAND, 2(1): 140-150, (2021).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
CDC's Division of Population Health provides a cross-cutting set of 115 indicators developed by consensus among CDC, the Council of State and Territorial Epidemiologists, and the National Association of Chronic Disease Directors. These indicators allow states and territories to uniformly define, collect, and report chronic disease data that are important to public health practice in their area.
This dataset is extremely useful for public health data science as it enables the study of prevalence and distribution of chronic diseases across different demographics and geographical areas. Analysts can assess health outcomes, identify risk factors, and measure the impact of public health interventions.
Some analyses that can be performed include:
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
CDC's Division of Population Health provides a cross-cutting set of 124 indicators that were developed by consensus and that allow states, territories, and large metropolitan areas to uniformly define, collect, and report chronic disease data that are important to public health practice and available for states, territories, and large metropolitan areas. In addition to providing access to state-specific indicator data, the CDI web site serves as a gateway to additional information and data resources.
A variety of health-related questions were assessed at various times and places across the US over the past 15 years. Data are provided with confidence intervals and demographic stratification.
Data were compiled by the CDC.