100+ datasets found

Pandemics in World
kaggle.com
zip
Updated Jan 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohamadreza Momeni (2024). Pandemics in World [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/pandemics-in-world
Explore at:
zip(1428134 bytes)Available download formats
Dataset updated
Jan 9, 2024
Authors
Mohamadreza Momeni
Area covered
World
Description
By Saloni Dattani, Lucas Rodés-Guirao, Edouard Mathieu, Hannah Ritchie and Max Roser.

Data description:

Disease outbreaks may be inevitable, but large-scale pandemics are not. The world can respond swiftly and effectively to pandemic risks in the future with better understanding, resources, and effort.

To avoid suffering through another large pandemic, we have to take the risk of pandemics seriously. Despite warnings that another one was likely, the COVID-19 pandemic killed more than 27 million people.1

We must build the capacity to test for pathogens and understand them: which pathogens put us at the greatest risk, how they spread, and how to tackle them.

We know it is possible to greatly reduce the risk of infectious disease. We’ve learned over history how to reduce their impact with vaccines, public health efforts, and medicine.

In addition to the old risks, we face new threats from factory farming, genetic modification, climate change, and antimicrobial resistance. With more attention and effort, we can reduce their risks too.

Good luck in your analysis.
H
Global Health Observatory (GHO)
data.niaid.nih.gov
dataverse.harvard.edu
Updated May 5, 2011
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2011). Global Health Observatory (GHO) [Dataset]. http://doi.org/10.7910/DVN/JILCZW
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/JILCZW
Dataset updated
May 5, 2011
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Users can find data on a range of global health topics like mortality, the burden of disease, infectious diseases, risk factors and health expenditures. Background The Global Health Observatory (GHO) database is the World Health Organization's main health statistics repository. Data is available for 193 World Health Organization member states on topics including but not limited to: Health related millennium goals, mortality, immunization, nutrition, infectious disease, non- communicable disease, tobacco control, violence, injuries, alcohol, HIV/AIDS, tuberculosis, malaria, water and sanitation, maternal and reproductive health, cho lera, child health, child nutrition, and road safety. User FunctionalityUsers can generate tables and charts according to country or region, health indicator, and time period. Data can also be compared across countries. Data can be filtered, tabulated, charted, and downloaded into Excel statistical software. These data are also published in statistical reports covering topics including: Alcohol and health, Child health, Cholera, HIV/AIDS, Malaria, Maternal and reproductive heal th, Non-communicable diseases, Public health and environment, Road safety, Tuberculosis, Tobacco control. Data Notes Data are derived from surveillance and household surveys. Years in which data were collected is indicated with these health statistics. Information is available for each WHO member country and international region. The most recent data is available from 2009.
Data from: A global dataset of pandemic- and epidemic-prone disease...
figshare.com
7z
Updated Oct 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Juan Armando Torres Munguía (2025). A global dataset of pandemic- and epidemic-prone disease outbreaks [Dataset]. http://doi.org/10.6084/m9.figshare.17207183.v6
Explore at:
7zAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.17207183.v6
Dataset updated
Oct 8, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
Juan Armando Torres Munguía
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IMPORTANT NOTE #####From October 2024, this project is being updated by Dr. Juan Armando Torres Munguía. In case of questions, requests, or collaborations, you can contact me via GitHub or here. Updated data can be found in data-monthly-updated-1996-2025.zip. You can also access the updated data here: https://github.com/jatorresmunguia/disease_outbreak_news.
California Infectious Disease Cases
kaggle.com
zip
Updated Jan 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). California Infectious Disease Cases [Dataset]. https://www.kaggle.com/datasets/thedevastator/california-infectious-disease-cases
Explore at:
zip(2093378 bytes)Available download formats
Dataset updated
Jan 24, 2023
Authors
The Devastator
Area covered
California
Description
California Infectious Disease Cases

Rates and Counts By County, Disease, Sex, and Year (2001-2014)

By Health [source]

About this dataset

This dataset provides comprehensive information on the number and rate of infectious diseases in California. Focusing on counties, sexes, and various diseases between 2001-2014, it offers powerful insights into the health status of its citizens. Its data also reveals trends in the spread of common illnesses in this state. Whether you are an epidemiologist looking to inform public health policy or a researcher seeking to investigate particular illnesses within certain populations, this dataset contains all the necessary information to answer your questions. Explore it today and discover hidden stories waiting to be uncovered!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset contains counts and rates of infectious diseases in California by county, disease, sex, and year. This dataset can be used to generate trends to understand the changes in incidence of different types of diseases over time and across counties or between sexes.

To use this dataset: - Select the columns you are interested in exploring - these could include Disease, County, Sex or Year. - Filter out the rows that do not relate to your question - for example filtering by a specific county or disease. - Examine the average rate per 100000 people for each group you selected as well as its lower and upper confidence intervals (CI). - Use Rate as your dependent variable for analysis; Population is likely also important determining factors. Make sure to check if any Rates have 'unstable' flags.
- Visualise or statistically analyse your data using suitable methods such as descriptive statistics (means/medians/mode etc.)for comparison between 2+ groups or correlation/regression based models when comparing one variable to another over time etc.

Research Ideas

Analyzing the geographic spread of infectious diseases over time to identify areas in need of increased education, resources, and care.

Comparing rates of disease by sex to identify and understand any gender-based differences in infectious disease cases.

Using the Unstable column to determine whether a particular county or region needs further study of a certain type of infectious disease due to unusual spikes or drops in rate or count during a specific year

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: Infectious_Disease_Cases_by_County_Year_and_Sex_2001-2014.csv | Column name | Description | |:---------------|:---------------------------------------------------------------------------------------------------------------| | Disease | The type of infectious disease reported. (String) | | County | The county in California where the cases were reported. (String) | | Year | The year in which the cases were reported. (Integer) | | Sex | The gender of the individuals who contracted the disease. (String) | | Population | The population size of the county in which the cases were reported. (Integer) | | Rate | The rate of infection per 100 thousand people living in the county. (Float) | | CI.lower | The lower confidence interval associated with the rate of infection. (Float) | | CI.upper | The upper confidence interval associated with the rate of infection. (Float) ...
m
Global Burden of Disease analysis dataset of noncommunicable disease...
data.mendeley.com
Updated Apr 6, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
Explore at:
Unique identifier
https://doi.org/10.17632/g6b39zxck4.10
Dataset updated
Apr 6, 2023
Authors
David Cundiff
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

These Global Burden of disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following: 1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc). 2. A text file to import the analysis database into SAS 3. The SAS code to format the analysis database to be used for analytics 4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6 5. SAS code for deriving the multiple regression formula in Table 4. 6. SAS code for deriving the multiple regression formula in Table 5 7. SAS code for deriving the multiple regression formula in Supplementary Table 7
8. SAS code for deriving the multiple regression formula in Supplementary Table 8 9. The Excel files that accompanied the above SAS code to produce the tables

For questions, please email davidkcundiff@gmail.com. Thanks.
World Health Organization Estimates of the Global and Regional Disease...
plos.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martyn D. Kirk; Sara M. Pires; Robert E. Black; Marisa Caipo; John A. Crump; Brecht Devleesschauwer; Dörte Döpfer; Aamir Fazil; Christa L. Fischer-Walker; Tine Hald; Aron J. Hall; Karen H. Keddy; Robin J. Lake; Claudio F. Lanata; Paul R. Torgerson; Arie H. Havelaar; Frederick J. Angulo (2023). World Health Organization Estimates of the Global and Regional Disease Burden of 22 Foodborne Bacterial, Protozoal, and Viral Diseases, 2010: A Data Synthesis [Dataset]. http://doi.org/10.1371/journal.pmed.1001921
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pmed.1001921
Dataset updated
Jun 3, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Martyn D. Kirk; Sara M. Pires; Robert E. Black; Marisa Caipo; John A. Crump; Brecht Devleesschauwer; Dörte Döpfer; Aamir Fazil; Christa L. Fischer-Walker; Tine Hald; Aron J. Hall; Karen H. Keddy; Robin J. Lake; Claudio F. Lanata; Paul R. Torgerson; Arie H. Havelaar; Frederick J. Angulo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BackgroundFoodborne diseases are important worldwide, resulting in considerable morbidity and mortality. To our knowledge, we present the first global and regional estimates of the disease burden of the most important foodborne bacterial, protozoal, and viral diseases.Methods and FindingsWe synthesized data on the number of foodborne illnesses, sequelae, deaths, and Disability Adjusted Life Years (DALYs), for all diseases with sufficient data to support global and regional estimates, by age and region. The data sources included varied by pathogen and included systematic reviews, cohort studies, surveillance studies and other burden of disease assessments. We sought relevant data circa 2010, and included sources from 1990–2012. The number of studies per pathogen ranged from as few as 5 studies for bacterial intoxications through to 494 studies for diarrheal pathogens. To estimate mortality for Mycobacterium bovis infections and morbidity and mortality for invasive non-typhoidal Salmonella enterica infections, we excluded cases attributed to HIV infection. We excluded stillbirths in our estimates. We estimate that the 22 diseases included in our study resulted in two billion (95% uncertainty interval [UI] 1.5–2.9 billion) cases, over one million (95% UI 0.89–1.4 million) deaths, and 78.7 million (95% UI 65.0–97.7 million) DALYs in 2010. To estimate the burden due to contaminated food, we then applied proportions of infections that were estimated to be foodborne from a global expert elicitation. Waterborne transmission of disease was not included. We estimate that 29% (95% UI 23–36%) of cases caused by diseases in our study, or 582 million (95% UI 401–922 million), were transmitted by contaminated food, resulting in 25.2 million (95% UI 17.5–37.0 million) DALYs. Norovirus was the leading cause of foodborne illness causing 125 million (95% UI 70–251 million) cases, while Campylobacter spp. caused 96 million (95% UI 52–177 million) foodborne illnesses. Of all foodborne diseases, diarrheal and invasive infections due to non-typhoidal S. enterica infections resulted in the highest burden, causing 4.07 million (95% UI 2.49–6.27 million) DALYs. Regionally, DALYs per 100,000 population were highest in the African region followed by the South East Asian region. Considerable burden of foodborne disease is borne by children less than five years of age. Major limitations of our study include data gaps, particularly in middle- and high-mortality countries, and uncertainty around the proportion of diseases that were foodborne.ConclusionsFoodborne diseases result in a large disease burden, particularly in children. Although it is known that diarrheal diseases are a major burden in children, we have demonstrated for the first time the importance of contaminated food as a cause. There is a need to focus food safety interventions on preventing foodborne diseases, particularly in low- and middle-income settings.
m
Disease and symptoms dataset 2023
data.mendeley.com
Updated Mar 3, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
Explore at:
Unique identifier
https://doi.org/10.17632/2cxccsxydc.1
Dataset updated
Mar 3, 2025
Authors
Bran Stark
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. There may even be one single symptom contributing to a disease in a row or sample. This is an indicator of a very high correlation between the symptom and that particular disease. A larger number of rows for a particular disease corresponds to its higher probability of occurrence in the real world. Similarly, in a row, if the feature vector has the occurrence of a single symptom, it implies that this symptom has more correlation to classify the disease than any one symptom of a feature vector with multiple symptoms in another sample.
p
Counts of Meningococcal infectious disease reported in UNITED STATES OF...
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Meningococcal infectious disease reported in UNITED STATES OF AMERICA: 1951-2010 [Dataset]. https://www.tycho.pitt.edu/dataset/US.23511006
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1951 - 2010
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
m
Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...
data.mendeley.com
Updated Jul 25, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
Explore at:
Unique identifier
https://doi.org/10.17632/xmcg82mx9k.3
Dataset updated
Jul 25, 2022
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

Abstract The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.

Data Description The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files. • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022) • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022) • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022) • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022) • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022) • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
p
Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945 [Dataset]. https://www.tycho.pitt.edu/dataset/US.409498004
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1942 - 1945
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

BRFSS 2020 Heart Disease Dataset(Cleaned Version)

zenodo.org

csv

Updated May 4, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Koushal Kumar; BP Pande; Koushal Kumar; BP Pande (2025). BRFSS 2020 Heart Disease Dataset(Cleaned Version) [Dataset]. http://doi.org/10.5281/zenodo.15336526

Explore at:

csvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.15336526

Dataset updated

May 4, 2025

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Koushal Kumar; BP Pande; Koushal Kumar; BP Pande

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Originally, the dataset come from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.". The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes included administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction—such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predicting variables could be divided into 4 broad categories:

Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)
Diseases: weather respondent ever had such diseases as asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)
Unhealthy habits:
- Smoking - respondents that smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
- Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week
General Health:
- Difficulty Walking - weather respondent have serious difficulty walking or climbing stairs
- Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
- Sleep Time - respondent’s reported average hours of sleep in a 24-hour period
- Physical Health - number of days being physically ill or injured (0-30 days)
- Mental Health - number of days having bad mental health (0-30 days)
- General Health - respondents declared their health as ’Excellent’, ’Very good’, ’Good’ ,’Fair’ or ’Poor’

Below is a description of the features collected for each patient:

#	Feature	Coded Variable Name	Description
1	HeartDisease	CVDINFR4	Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)
2	BMI	_BMI5CAT	Body Mass Index (BMI)
3	Smoking	_SMOKER3	Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]
4	AlcoholDrinking	_RFDRHV7	Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week
5	Stroke	CVDSTRK3	(Ever told) (you had) a stroke?
6	PhysicalHealth	PHYSHLTH	Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30
7	MentalHealth	MENTHLTH	Thinking about your mental health, for how many days during the past 30 days was your mental health not good?
8	DiffWalking	DIFFWALK	Do you have serious difficulty walking or climbing stairs?
9	Sex	SEXVAR	Are you male or female?
10	AgeCategory	_AGE_G,	Fourteen-level age category
11	Race	_IMPRACE	Imputed race/ethnicity value
12	Diabetic	DIABETE4	(Ever told) (you had) diabetes?
13	PhysicalActivity	EXERANY2	Adults who reported doing physical activity or exercise during the past 30 days other than their regular job
14	GenHealth	GENHLTH	Would you say that in general your health is...
15	SleepTime	SLEPTIM1	On average, how many hours of sleep do you get in a 24-hour period?
16	Asthma	CHASTHMA	(Ever told) (you had) asthma?
17	KidneyDisease	CHCKDNY2	Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?
18	SkinCancer	CHCSCNCR	(Ever told) (you had) skin cancer?

t
FAIR Dataset for Disease Prediction in Healthcare Applications
test.researchdata.tuwien.ac.at
bin, csv, json, png
Updated Apr 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sufyan Yousaf; Sufyan Yousaf; Sufyan Yousaf; Sufyan Yousaf (2025). FAIR Dataset for Disease Prediction in Healthcare Applications [Dataset]. http://doi.org/10.70124/5n77a-dnf02
Explore at:
csv, json, bin, pngAvailable download formats
Unique identifier
https://doi.org/10.70124/5n77a-dnf02
Dataset updated
Apr 14, 2025
Dataset provided by
TU Wien
Authors
Sufyan Yousaf; Sufyan Yousaf; Sufyan Yousaf; Sufyan Yousaf
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Description

Context and Methodology

Research Domain/Project:
This dataset was created for a machine learning experiment aimed at developing a classification model to predict outcomes based on a set of features. The primary research domain is disease prediction in patients. The dataset was used in the context of training, validating, and testing.

Purpose of the Dataset:
The purpose of this dataset is to provide training, validation, and testing data for the development of machine learning models. It includes labeled examples that help train classifiers to recognize patterns in the data and make predictions.

Dataset Creation:
Data preprocessing steps involved cleaning, normalization, and splitting the data into training, validation, and test sets. The data was carefully curated to ensure its quality and relevance to the problem at hand. For any missing values or outliers, appropriate handling techniques were applied (e.g., imputation, removal, etc.).

Technical Details

Structure of the Dataset:
The dataset consists of several files organized into folders by data type:

Training Data: Contains the training dataset used to train the machine learning model.

Validation Data: Used for hyperparameter tuning and model selection.

Test Data: Reserved for final model evaluation.

Each folder contains files with consistent naming conventions for easy navigation, such as train_data.csv, validation_data.csv, and test_data.csv. Each file follows a tabular format with columns representing features and rows representing individual data points.

Software Requirements:
To open and work with this dataset, you need VS Code or Jupyter, which could include tools like:

Python (with libraries such as pandas, numpy, scikit-learn, matplotlib, etc.)

Further Details

Reusability:
Users of this dataset should be aware that it is designed for machine learning experiments involving classification tasks. The dataset is already split into training, validation, and test subsets. Any model trained with this dataset should be evaluated using the test set to ensure proper validation.

Limitations:
The dataset may not cover all edge cases, and it might have biases depending on the selection of data sources. It's important to consider these limitations when generalizing model results to real-world applications.
H
Data from: Global Health Atlas
data.niaid.nih.gov
dataverse.harvard.edu
Updated May 5, 2011
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2011). Global Health Atlas [Dataset]. http://doi.org/10.7910/DVN/GJKWGR
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/GJKWGR
Dataset updated
May 5, 2011
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Users can view statistics and generate cross-country comparisons pertaining to infectious diseases and health indicators in 193 WHO member states. Background The Global Health Atlas is a database maintained by the World Health Organization (WHO) that provides information regarding infectious diseases in WHO member states. Health conditions include: malaria, HIV/AIDS, cholera, STIs, meningitis, and polio, among others. User Functionality Users can generate statistics regarding infectious diseases and health systems indicators by country or region, or generate cross-country comparisons. In addition, users can v iew maps showing the distribution of various health indicators and diseases by geographic region or individual country. Data Notes Statistics are available for all WHO member states. Data are available from 1949 to 2009.
m
Global Burden of Disease analysis dataset of cardiovascular disease...
data.mendeley.com
narcis.nl
Updated Jun 23, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Cundiff (2021). Global Burden of Disease analysis dataset of cardiovascular disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.4
Explore at:
Unique identifier
https://doi.org/10.17632/g6b39zxck4.4
Dataset updated
Jun 23, 2021
Authors
David Cundiff
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This formatted dataset originates from raw data files from the Institute of Health Metrics and Evaluation Global Burden of Disease (GBD2017). It is population weighted worldwide data on male and female cohorts ages 15-69 years including cardiovascular disease early death and associated dietary, metabolic and other risk factors. The purpose of creating this formatted database is to explore the univariate and multiple regression correlations of cardiovascular early deaths and other health outcomes with risk factors. Our research hypothesis is that we can successfully apply artificial intelligence to model cardiovascular disease outcomes with risk factors. We found that fat-soluble vitamin containing foods (animal products) and added fats are negatively correlated with CVD early deaths worldwide but positively correlated with CVD early deaths in high fat-soluble vitamin cohorts. We interpret this as showing that optimal cardiovascular outcomes come with moderate (not low and not high) intakes of animal foods and added fats. You are invited to download the dataset, the associated SAS code to access the dataset, and the tables that have resulted from the analysis. Please comment on the article by indicating what you found by exploring the dataset with the provided SAS codes. Please say whether or not you found the outputs from the SAS codes accurately reflected the tables provided and the tables in the published article. If you use our data to reproduce our findings and comment on your findings on the MedRxIV website (https://www.medrxiv.org/content/10.1101/2021.04.17.21255675v4) and would like to be recognized, we will be happy to list you as a contributor when the article is summited to JAMA. For questions, please email davidkcundiff@gmail.com. Thanks.
Deaths related to infectious diseases
ec.europa.eu
Updated Oct 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eurostat (2025). Deaths related to infectious diseases [Dataset]. http://doi.org/10.2908/HLTH_CD_IDO
Explore at:
json, tsv, application/vnd.sdmx.data+csv;version=2.0.0, application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.genericdata+xml;version=2.1, application/vnd.sdmx.data+csv;version=1.0.0Available download formats
Unique identifier
https://doi.org/10.2908/HLTH_CD_IDO
Dataset updated
Oct 10, 2025
Dataset authored and provided by
Eurostathttps://ec.europa.eu/eurostat
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2011 - 2024
Area covered
Germany, Finland, Italy, Czechia, Estonia, Serbia, Moldova, Ireland, Romania, Norway
Description
Data on causes of death (COD) provide information on mortality patterns and form a major element of public health information.

The COD data refer to the underlying cause which - according to the World Health Organisation (WHO) - is "the disease or injury which initiated the train of morbid events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury".

The data are derived from the medical certificate of death, which is obligatory in the Member States. The information recorded in the death certificate is according to the rules specified by the WHO.

Data published in Eurostat's dissemination database are broken down by sex, 5-year age groups, cause of death and by residency and country of occurrence. For stillbirths and neonatal deaths additional breakdowns might include age of mother and parity.

Data are available for Member States, Iceland, Norway, Liechtenstein, Switzerland, United Kingdom, Serbia, Turkey, North Macedonia and Albania. Regional data (NUTS level 2) are available for all of the countries having NUTS2 regions except Albania.

Annual national data are available in Eurostat's dissemination database in absolute number, crude death rates and standardised death rates. At regional level the same is provided in form of 3-years averages (the average of year, year -1 and year -2). Annual crude and standardised death rates are also available at NUTS2 level. Monthly national data are available for 21 EU Member States from reference year 2019 and in 24 Member States from reference year 2022 in absolute numbers and standardised death rates.
d
Johns Hopkins COVID-19 Case Tracker
data.world
kaggle.com
csv, zip
Updated Dec 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Explore at:
zip, csvAvailable download formats
Dataset updated
Dec 3, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Center for Disease Control and Preventions's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here

Attribution

This data should be credited to Johns Hopkins University COVID-19 tracking project
p
Counts of Disease caused by West Nile virus reported in UNITED STATES OF...
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Disease caused by West Nile virus reported in UNITED STATES OF AMERICA: 2002-2005 [Dataset]. https://www.tycho.pitt.edu/dataset/US.417093003
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
2002 - 2005
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
m
COVID-19 Combined Data-set with Improved Measurement Errors
data.mendeley.com
Updated May 13, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
Explore at:
Unique identifier
https://doi.org/10.17632/nw5m4hs3jr.3
Dataset updated
May 13, 2020
Authors
Afshin Ashofteh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
m
Data from: COVID-19 Datasets for predicting the number of new cases of...
data.mendeley.com
narcis.nl
Updated Jul 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pınar Tüfekci (2020). COVID-19 Datasets for predicting the number of new cases of COVID-19 ahead of 1 day, 3 days, and 10 days [Dataset]. http://doi.org/10.17632/499vtcykvw.1
Explore at:
Unique identifier
https://doi.org/10.17632/499vtcykvw.1
Dataset updated
Jul 28, 2020
Authors
Pınar Tüfekci
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Four datasets are presented here. The original dataset is a collection of the COVID-19 data maintained by Our World in Data. It includes data on confirmed cases, and deaths, as well as other variables of potential interest for ten countries such as Australia, Brazil, Canada, China, Denmark, France, Israel, Italy, the United Kingdom, and the United States. The original dataset includes the data from the date of 31st December in 2019 to 31st May in 2020 with a total of 1.530 instances and 19 features. This dataset is collected from a variety of sources (the European Centre for Disease Prevention and Control, United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). After the original dataset is pre-processed by cleaning and removing some data including unnecessary and blank. Then, all strings are converted numeric values, and some new features such as continent, hemisphere, year, month, and day are added by extracting the original features. After that, the processed original dataset is organized for prediction of the number of new cases of COVID-19 for 1 day, 3 days, and 10 days ago and three datasets (Dataset-1, 2, 3) are created for that.
d
Data from: Non-listed disease report to OIE (World Organisation for Animal...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Non-listed disease report to OIE (World Organisation for Animal Health) for the 1st semester of 2019 [Dataset]. https://catalog.data.gov/dataset/non-listed-disease-report-to-oie-world-organisation-for-animal-health-for-the-1st-semester
Explore at:
Dataset updated
Nov 12, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
As a member of the World Organisation for Animal Health (OIE), and the reporting authority for the United States, the USGS National Wildlife Health Center (NWHC) is responsible for reporting wildlife disease outbreaks that involve diseases which are not OIE-listed (https://www.oie.int/wahis_2/public/wahidwild.php# ). These outbreaks are to be reported on a semesterly basis via OIE’s WAHIS-Wild reporting system. The data fields described within are based on those in WAHIS-Wild. Since OIE’s reporting mechanism is based primarily on domestic and agricultural animals, several of the variables are not applicable to wildlife (i.e. vaccination, slaughtered, etc.). In an effort to use a consistent data source that is broad in scope and captures information from around the country, from various natural resource management authorities, NWHC will use the Wildlife Health Information Sharing Partnership - Event Reporting System (WHISPers - https://whispers.usgs.gov/home ) as the sole source to generate and supply the requested information to OIE. Data supplied to OIE have been restricted to publicly available information on wildlife morbidity/mortality and surveillance events in WHISPers.

Facebook

Twitter

Click to copy link

Link copied

Cite

Mohamadreza Momeni (2024). Pandemics in World [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/pandemics-in-world

Pandemics in World

"an epidemic occurring on a scale that crosses international boundaries.

Explore at:

102 scholarly articles cite this dataset (View in Google Scholar)

zip(1428134 bytes)Available download formats

Dataset updated

Jan 9, 2024

Authors

Mohamadreza Momeni

Area covered

World

Description

By Saloni Dattani, Lucas Rodés-Guirao, Edouard Mathieu, Hannah Ritchie and Max Roser.

Data description:

Disease outbreaks may be inevitable, but large-scale pandemics are not. The world can respond swiftly and effectively to pandemic risks in the future with better understanding, resources, and effort.

To avoid suffering through another large pandemic, we have to take the risk of pandemics seriously. Despite warnings that another one was likely, the COVID-19 pandemic killed more than 27 million people.1

We must build the capacity to test for pathogens and understand them: which pathogens put us at the greatest risk, how they spread, and how to tackle them.

We know it is possible to greatly reduce the risk of infectious disease. We’ve learned over history how to reduce their impact with vaccines, public health efforts, and medicine.

In addition to the old risks, we face new threats from factory farming, genetic modification, climate change, and antimicrobial resistance. With more attention and effort, we can reduce their risks too.

Good luck in your analysis.

Clear search

Close search

Google apps

Main menu

Pandemics in World

Global Health Observatory (GHO)

Data from: A global dataset of pandemic- and epidemic-prone disease...

California Infectious Disease Cases

California Infectious Disease Cases

Rates and Counts By County, Disease, Sex, and Year (2001-2014)

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Global Burden of Disease analysis dataset of noncommunicable disease...

World Health Organization Estimates of the Global and Regional Disease...

Disease and symptoms dataset 2023

Counts of Meningococcal infectious disease reported in UNITED STATES OF...

Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...

Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945

BRFSS 2020 Heart Disease Dataset(Cleaned Version)

FAIR Dataset for Disease Prediction in Healthcare Applications

Dataset Description

Context and Methodology

Technical Details

Further Details

Data from: Global Health Atlas

Global Burden of Disease analysis dataset of cardiovascular disease...

Deaths related to infectious diseases

Johns Hopkins COVID-19 Case Tracker

Updates

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

Queries

Interactive

Interactive Embed Code

Caveats

Attribution

Counts of Disease caused by West Nile virus reported in UNITED STATES OF...

COVID-19 Combined Data-set with Improved Measurement Errors

Data from: COVID-19 Datasets for predicting the number of new cases of...

Data from: Non-listed disease report to OIE (World Organisation for Animal...

Pandemics in World

"an epidemic occurring on a scale that crosses international boundaries.