100+ datasets found
  1. Pandemics in World

    • kaggle.com
    zip
    Updated Jan 9, 2024
    Cite
    Mohamadreza Momeni (2024). Pandemics in World [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/pandemics-in-world
    Available download formats: zip (1428134 bytes)
    Dataset updated
    Jan 9, 2024
    Authors
    Mohamadreza Momeni
    Area covered
    World
    Description

    By Saloni Dattani, Lucas Rodés-Guirao, Edouard Mathieu, Hannah Ritchie and Max Roser.

    Data description:

    Disease outbreaks may be inevitable, but large-scale pandemics are not. The world can respond swiftly and effectively to pandemic risks in the future with better understanding, resources, and effort.

    To avoid suffering through another large pandemic, we have to take the risk of pandemics seriously. Despite warnings that another one was likely, the COVID-19 pandemic killed more than 27 million people.

    We must build the capacity to test for pathogens and understand them: which pathogens put us at the greatest risk, how they spread, and how to tackle them.

    We know it is possible to greatly reduce the risk of infectious disease. We’ve learned over history how to reduce their impact with vaccines, public health efforts, and medicine.

    In addition to the old risks, we face new threats from factory farming, genetic modification, climate change, and antimicrobial resistance. With more attention and effort, we can reduce their risks too.

    Good luck in your analysis.

  2. Global Health Observatory (GHO)

    • data.niaid.nih.gov
    • dataverse.harvard.edu
    Updated May 5, 2011
    Cite
    (2011). Global Health Observatory (GHO) [Dataset]. http://doi.org/10.7910/DVN/JILCZW
    Dataset updated
    May 5, 2011
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Users can find data on a range of global health topics such as mortality, the burden of disease, infectious diseases, risk factors, and health expenditures.

    Background: The Global Health Observatory (GHO) database is the World Health Organization's main health statistics repository. Data are available for 193 World Health Organization member states on topics including but not limited to: health-related millennium goals, mortality, immunization, nutrition, infectious disease, non-communicable disease, tobacco control, violence, injuries, alcohol, HIV/AIDS, tuberculosis, malaria, water and sanitation, maternal and reproductive health, cholera, child health, child nutrition, and road safety.

    User Functionality: Users can generate tables and charts according to country or region, health indicator, and time period. Data can also be compared across countries. Data can be filtered, tabulated, charted, and downloaded into Excel. These data are also published in statistical reports covering topics including: alcohol and health, child health, cholera, HIV/AIDS, malaria, maternal and reproductive health, non-communicable diseases, public health and environment, road safety, tuberculosis, and tobacco control.

    Data Notes: Data are derived from surveillance and household surveys. The years in which data were collected are indicated with these health statistics. Information is available for each WHO member country and international region. The most recent data available are from 2009.

  3. Data from: A global dataset of pandemic- and epidemic-prone disease...

    • figshare.com
    7z
    Updated Oct 8, 2025
    Cite
    Juan Armando Torres Munguía (2025). A global dataset of pandemic- and epidemic-prone disease outbreaks [Dataset]. http://doi.org/10.6084/m9.figshare.17207183.v6
    Available download formats: 7z
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Juan Armando Torres Munguía
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    IMPORTANT NOTE: From October 2024, this project is being updated by Dr. Juan Armando Torres Munguía. In case of questions, requests, or collaborations, you can contact me via GitHub or here. Updated data can be found in data-monthly-updated-1996-2025.zip. You can also access the updated data here: https://github.com/jatorresmunguia/disease_outbreak_news.

  4. California Infectious Disease Cases

    • kaggle.com
    zip
    Updated Jan 24, 2023
    Cite
    The Devastator (2023). California Infectious Disease Cases [Dataset]. https://www.kaggle.com/datasets/thedevastator/california-infectious-disease-cases
    Available download formats: zip (2093378 bytes)
    Dataset updated
    Jan 24, 2023
    Authors
    The Devastator
    Area covered
    California
    Description

    California Infectious Disease Cases

    Rates and Counts By County, Disease, Sex, and Year (2001-2014)

    By Health [source]

    About this dataset

    This dataset provides comprehensive information on the number and rate of infectious diseases in California. Focusing on counties, sexes, and various diseases between 2001-2014, it offers powerful insights into the health status of its citizens. Its data also reveals trends in the spread of common illnesses in this state. Whether you are an epidemiologist looking to inform public health policy or a researcher seeking to investigate particular illnesses within certain populations, this dataset contains all the necessary information to answer your questions. Explore it today and discover hidden stories waiting to be uncovered!

    How to use the dataset

    This dataset contains counts and rates of infectious diseases in California by county, disease, sex, and year. This dataset can be used to generate trends to understand the changes in incidence of different types of diseases over time and across counties or between sexes.

    To use this dataset:
    - Select the columns you are interested in exploring, such as Disease, County, Sex, or Year.
    - Filter out the rows that do not relate to your question, for example by restricting to a specific county or disease.
    - Examine the average rate per 100,000 people for each group you selected, along with its lower and upper confidence limits (CI.lower, CI.upper).
    - Use Rate as your dependent variable for analysis; Population is likely to be an important explanatory factor as well. Check whether any rates carry an 'unstable' flag.
    - Visualise or statistically analyse your data using suitable methods, such as descriptive statistics (mean, median, mode) for comparing two or more groups, or correlation and regression models for relating one variable to another over time.
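    The steps above can be sketched with the Python standard library. This is a minimal illustration, not the dataset author's method: the sample rows are made up, the column names follow the Columns description for this dataset, and the assumption that any non-empty value in the Unstable column marks an unstable rate is a guess about the flag's encoding.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows that mimic the documented columns; values are illustrative only.
SAMPLE_CSV = """Disease,County,Year,Sex,Population,Rate,CI.lower,CI.upper,Unstable
Campylobacteriosis,Alameda,2001,Male,700000,20.0,18.0,22.0,
Campylobacteriosis,Alameda,2001,Female,720000,15.0,13.0,17.0,
Campylobacteriosis,Alameda,2002,Male,705000,22.0,20.0,24.0,
Campylobacteriosis,Alameda,2002,Female,725000,17.0,15.0,19.0,*
"""


def mean_rate_by_group(rows, disease, county, group_col="Sex"):
    """Filter to one disease and county, skip flagged rows, average Rate per group."""
    rates = defaultdict(list)
    for row in rows:
        if row["Disease"] != disease or row["County"] != county:
            continue
        # Assumption: any non-empty Unstable value flags an unreliable rate.
        if row["Unstable"].strip():
            continue
        rates[row[group_col]].append(float(row["Rate"]))
    return {group: mean(values) for group, values in rates.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = mean_rate_by_group(rows, "Campylobacteriosis", "Alameda")
print(summary)
```

    To run this against the real file, replace the in-memory sample with an open handle to the downloaded CSV and verify how the Unstable column is actually encoded before filtering on it.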

    Research Ideas

    • Analyzing the geographic spread of infectious diseases over time to identify areas in need of increased education, resources, and care.
    • Comparing rates of disease by sex to identify and understand any gender-based differences in infectious disease cases.
    • Using the Unstable column to determine whether a particular county or region needs further study of a certain type of infectious disease due to unusual spikes or drops in rate or count during a specific year

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors
    - You are free to:
      - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
      - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
    - You must:
      - Give appropriate credit: provide a link to the license, and indicate if changes were made.
      - ShareAlike: distribute your contributions under the same license as the original.
      - Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: Infectious_Disease_Cases_by_County_Year_and_Sex_2001-2014.csv

    | Column name | Description |
    |:------------|:------------|
    | Disease | The type of infectious disease reported. (String) |
    | County | The county in California where the cases were reported. (String) |
    | Year | The year in which the cases were reported. (Integer) |
    | Sex | The gender of the individuals who contracted the disease. (String) |
    | Population | The population size of the county in which the cases were reported. (Integer) |
    | Rate | The rate of infection per 100 thousand people living in the county. (Float) |
    | CI.lower | The lower confidence interval associated with the rate of infection. (Float) |
    | CI.upper | The upper confidence interval associated with the rate of infection. (Float) |
    ...

  5. Global Burden of Disease analysis dataset of noncommunicable disease...

    • data.mendeley.com
    Updated Apr 6, 2023
    Cite
    David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
    Dataset updated
    Apr 6, 2023
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

    The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

    These Global Burden of Disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following:
    1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc).
    2. A text file to import the analysis database into SAS
    3. The SAS code to format the analysis database to be used for analytics
    4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6
    5. SAS code for deriving the multiple regression formula in Table 4
    6. SAS code for deriving the multiple regression formula in Table 5
    7. SAS code for deriving the multiple regression formula in Supplementary Table 7
    8. SAS code for deriving the multiple regression formula in Supplementary Table 8
    9. The Excel files that accompanied the above SAS code to produce the tables

    For questions, please email davidkcundiff@gmail.com. Thanks.

  6. World Health Organization Estimates of the Global and Regional Disease...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 3, 2023
    Cite
    Martyn D. Kirk; Sara M. Pires; Robert E. Black; Marisa Caipo; John A. Crump; Brecht Devleesschauwer; Dörte Döpfer; Aamir Fazil; Christa L. Fischer-Walker; Tine Hald; Aron J. Hall; Karen H. Keddy; Robin J. Lake; Claudio F. Lanata; Paul R. Torgerson; Arie H. Havelaar; Frederick J. Angulo (2023). World Health Organization Estimates of the Global and Regional Disease Burden of 22 Foodborne Bacterial, Protozoal, and Viral Diseases, 2010: A Data Synthesis [Dataset]. http://doi.org/10.1371/journal.pmed.1001921
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Martyn D. Kirk; Sara M. Pires; Robert E. Black; Marisa Caipo; John A. Crump; Brecht Devleesschauwer; Dörte Döpfer; Aamir Fazil; Christa L. Fischer-Walker; Tine Hald; Aron J. Hall; Karen H. Keddy; Robin J. Lake; Claudio F. Lanata; Paul R. Torgerson; Arie H. Havelaar; Frederick J. Angulo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Foodborne diseases are important worldwide, resulting in considerable morbidity and mortality. To our knowledge, we present the first global and regional estimates of the disease burden of the most important foodborne bacterial, protozoal, and viral diseases.

    Methods and Findings: We synthesized data on the number of foodborne illnesses, sequelae, deaths, and Disability Adjusted Life Years (DALYs), for all diseases with sufficient data to support global and regional estimates, by age and region. The data sources included varied by pathogen and included systematic reviews, cohort studies, surveillance studies and other burden of disease assessments. We sought relevant data circa 2010, and included sources from 1990–2012. The number of studies per pathogen ranged from as few as 5 studies for bacterial intoxications through to 494 studies for diarrheal pathogens. To estimate mortality for Mycobacterium bovis infections and morbidity and mortality for invasive non-typhoidal Salmonella enterica infections, we excluded cases attributed to HIV infection. We excluded stillbirths in our estimates. We estimate that the 22 diseases included in our study resulted in two billion (95% uncertainty interval [UI] 1.5–2.9 billion) cases, over one million (95% UI 0.89–1.4 million) deaths, and 78.7 million (95% UI 65.0–97.7 million) DALYs in 2010. To estimate the burden due to contaminated food, we then applied proportions of infections that were estimated to be foodborne from a global expert elicitation. Waterborne transmission of disease was not included. We estimate that 29% (95% UI 23–36%) of cases caused by diseases in our study, or 582 million (95% UI 401–922 million), were transmitted by contaminated food, resulting in 25.2 million (95% UI 17.5–37.0 million) DALYs. Norovirus was the leading cause of foodborne illness causing 125 million (95% UI 70–251 million) cases, while Campylobacter spp. caused 96 million (95% UI 52–177 million) foodborne illnesses. Of all foodborne diseases, diarrheal and invasive infections due to non-typhoidal S. enterica infections resulted in the highest burden, causing 4.07 million (95% UI 2.49–6.27 million) DALYs. Regionally, DALYs per 100,000 population were highest in the African region followed by the South East Asian region. Considerable burden of foodborne disease is borne by children less than five years of age. Major limitations of our study include data gaps, particularly in middle- and high-mortality countries, and uncertainty around the proportion of diseases that were foodborne.

    Conclusions: Foodborne diseases result in a large disease burden, particularly in children. Although it is known that diarrheal diseases are a major burden in children, we have demonstrated for the first time the importance of contaminated food as a cause. There is a need to focus food safety interventions on preventing foodborne diseases, particularly in low- and middle-income settings.

  7. Disease and symptoms dataset 2023

    • data.mendeley.com
    Updated Mar 3, 2025
    Cite
    Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
    Dataset updated
    Mar 3, 2025
    Authors
    Bran Stark
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. There may even be one single symptom contributing to a disease in a row or sample. This is an indicator of a very high correlation between the symptom and that particular disease. A larger number of rows for a particular disease corresponds to its higher probability of occurrence in the real world. Similarly, in a row, if the feature vector has the occurrence of a single symptom, it implies that this symptom has more correlation to classify the disease than any one symptom of a feature vector with multiple symptoms in another sample.
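    As a rough illustration of the two properties described above, the sketch below estimates each disease's relative frequency from row counts and picks out rows that carry a single symptom. The record layout, the disease labels, and the symptom names are all hypothetical; the real file's column layout is not given here and would need to be mapped into this shape first.

```python
from collections import Counter

# Hypothetical records: one dict per row, with the set of symptoms present.
samples = [
    {"disease": "disease_a", "symptoms": {"symptom_1", "symptom_2"}},
    {"disease": "disease_a", "symptoms": {"symptom_3"}},
    {"disease": "disease_a", "symptoms": {"symptom_1"}},
    {"disease": "disease_b", "symptoms": {"symptom_2", "symptom_4"}},
]

# Row share per disease, read here as a proxy for its probability of occurrence.
counts = Counter(sample["disease"] for sample in samples)
total = sum(counts.values())
relative_frequency = {disease: n / total for disease, n in counts.items()}

# Rows with exactly one symptom, which the description treats as strong indicators.
single_symptom_rows = [
    (sample["disease"], next(iter(sample["symptoms"])))
    for sample in samples
    if len(sample["symptoms"]) == 1
]

print(relative_frequency)
print(single_symptom_rows)
```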

  8. Counts of Meningococcal infectious disease reported in UNITED STATES OF...

    • tycho.pitt.edu
    • data.niaid.nih.gov
    Updated Apr 1, 2018
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Meningococcal infectious disease reported in UNITED STATES OF AMERICA: 1951-2010 [Dataset]. https://www.tycho.pitt.edu/dataset/US.23511006
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    1951 - 2010
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
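    The two recommended steps might look like the sketch below. Only the attribute name PartOfCumulativeCountSeries comes from the description above; the other field names (PeriodStartDate, PeriodEndDate, CountValue), the weekly interval length, and the sample records are assumptions made for illustration.

```python
from datetime import date, timedelta

# Made-up records; field names other than PartOfCumulativeCountSeries are assumed.
records = [
    {"PeriodStartDate": date(1951, 1, 1), "PeriodEndDate": date(1951, 1, 7),
     "CountValue": 4, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1951, 1, 15), "PeriodEndDate": date(1951, 1, 21),
     "CountValue": 0, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1951, 1, 1), "PeriodEndDate": date(1951, 1, 14),
     "CountValue": 9, "PartOfCumulativeCountSeries": 1},
]


def split_series(rows):
    """Separate fixed-interval counts from cumulative counts."""
    fixed = [r for r in rows if not r["PartOfCumulativeCountSeries"]]
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"]]
    return fixed, cumulative


def fill_missing_weeks(fixed):
    """List every weekly start date in range; None marks an unreported week."""
    by_start = {r["PeriodStartDate"]: r["CountValue"] for r in fixed}
    current, end = min(by_start), max(by_start)
    filled = []
    while current <= end:
        # A reported zero stays 0; only absent intervals become None.
        filled.append((current, by_start.get(current)))
        current += timedelta(days=7)
    return filled


fixed, cumulative = split_series(records)
weekly = fill_missing_weeks(fixed)
print(weekly)
```

    Keeping unreported weeks as None rather than 0 preserves the distinction the description draws between a missing interval and a reported count of zero.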

  9. Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...

    • data.mendeley.com
    Updated Jul 25, 2022
    Cite
    Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

    Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

    Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.

    • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
    • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
    • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
    • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
    • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
    • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
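    Before hydration, the IDs usually have to be read from the .txt files and grouped into request-sized batches. The sketch below shows one way to do that; it assumes one Tweet ID per line in each file, and the batch size of 100 is an assumed value that should be checked against the limits of whichever hydration tool or lookup endpoint is used. The hydration call itself is deliberately left out.

```python
from typing import Iterable, Iterator, List


def read_tweet_ids(paths: Iterable[str]) -> Iterator[str]:
    """Yield Tweet IDs from text files, assuming one ID per line."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                tweet_id = line.strip()
                if tweet_id:
                    yield tweet_id


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group IDs into lists of at most `size` items (size is an assumed limit)."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Demonstration with placeholder IDs instead of the real files.
demo_batches = list(batched((str(n) for n in range(250)), size=100))
print([len(batch) for batch in demo_batches])
```

    With the real data, `batched(read_tweet_ids(["TweetIDs_Part1.txt"]))` would stream the first file in batches without loading all IDs into memory.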

  10. Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945

    • tycho.pitt.edu
    • data.niaid.nih.gov
    Updated Apr 1, 2018
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945 [Dataset]. https://www.tycho.pitt.edu/dataset/US.409498004
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    1942 - 1945
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  11. BRFSS 2020 Heart Disease Dataset(Cleaned Version)

    • zenodo.org
    csv
    Updated May 4, 2025
    Cite
    Koushal Kumar; BP Pande (2025). BRFSS 2020 Heart Disease Dataset(Cleaned Version) [Dataset]. http://doi.org/10.5281/zenodo.15336526
    Available download formats: csv
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Koushal Kumar; BP Pande
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

    To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes included administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction—such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predicting variables could be divided into 4 broad categories:

    1. Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)

    2. Diseases: weather respondent ever had such diseases as asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)

    3. Unhealthy habits:

      • Smoking - respondents that smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
      • Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
    4. General Health:

      • Difficulty Walking - whether the respondent has serious difficulty walking or climbing stairs
      • Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
      • Sleep Time - respondent’s reported average hours of sleep in a 24-hour period
      • Physical Health - number of days being physically ill or injured (0-30 days)
      • Mental Health - number of days having bad mental health (0-30 days)
      • General Health - respondents rated their health as ‘Excellent’, ‘Very good’, ‘Good’, ‘Fair’ or ‘Poor’

    Below is a description of the features collected for each patient:

    # | Feature | Coded Variable Name | Description
    1 | HeartDisease | CVDINFR4 | Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)
    2 | BMI | _BMI5CAT | Body Mass Index (BMI)
    3 | Smoking | _SMOKER3 | Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]
    4 | AlcoholDrinking | _RFDRHV7 | Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
    5 | Stroke | CVDSTRK3 | (Ever told) (you had) a stroke?
    6 | PhysicalHealth | PHYSHLTH | Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30
    7 | MentalHealth | MENTHLTH | Thinking about your mental health, for how many days during the past 30 days was your mental health not good?
    8 | DiffWalking | DIFFWALK | Do you have serious difficulty walking or climbing stairs?
    9 | Sex | SEXVAR | Are you male or female?
    10 | AgeCategory | _AGE_G | Fourteen-level age category
    11 | Race | _IMPRACE | Imputed race/ethnicity value
    12 | Diabetic | DIABETE4 | (Ever told) (you had) diabetes?
    13 | PhysicalActivity | EXERANY2 | Adults who reported doing physical activity or exercise during the past 30 days other than their regular job
    14 | GenHealth | GENHLTH | Would you say that in general your health is...
    15 | SleepTime | SLEPTIM1 | On average, how many hours of sleep do you get in a 24-hour period?
    16 | Asthma | CHASTHMA | (Ever told) (you had) asthma?
    17 | KidneyDisease | CHCKDNY2 | Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?
    18 | SkinCancer | CHCSCNCR | (Ever told) (you had) skin cancer?
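
    As an illustration of how the feature table above could be used, the following minimal sketch maps the BRFSS coded variable names to the readable feature names and drops every other column. The coded names are copied from the table; the function name `rename_record` and the dict-per-respondent layout are assumptions made for this example, not part of the dataset.

```python
# Coded BRFSS variable name -> readable feature name, as listed in the table.
BRFSS_TO_FEATURE = {
    "CVDINFR4": "HeartDisease",
    "_BMI5CAT": "BMI",
    "_SMOKER3": "Smoking",
    "_RFDRHV7": "AlcoholDrinking",
    "CVDSTRK3": "Stroke",
    "PHYSHLTH": "PhysicalHealth",
    "MENTHLTH": "MentalHealth",
    "DIFFWALK": "DiffWalking",
    "SEXVAR": "Sex",
    "_AGE_G": "AgeCategory",
    "_IMPRACE": "Race",
    "DIABETE4": "Diabetic",
    "EXERANY2": "PhysicalActivity",
    "GENHLTH": "GenHealth",
    "SLEPTIM1": "SleepTime",
    "CHASTHMA": "Asthma",
    "CHCKDNY2": "KidneyDisease",
    "CHCSCNCR": "SkinCancer",
}


def rename_record(record):
    """Keep only the 18 selected columns of one respondent record and rename them.

    `record` is assumed to be a dict keyed by BRFSS coded variable names
    (hypothetical layout, e.g. one row read with csv.DictReader).
    """
    return {
        BRFSS_TO_FEATURE[name]: value
        for name, value in record.items()
        if name in BRFSS_TO_FEATURE
    }
```

    Columns that are not in the mapping (respondent IDs, timestamps and so on) are silently dropped, which mirrors the attribute selection described earlier.
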
  12. FAIR Dataset for Disease Prediction in Healthcare Applications

    • test.researchdata.tuwien.ac.at
    bin, csv, json, png
    Updated Apr 14, 2025
    Cite
    Sufyan Yousaf (2025). FAIR Dataset for Disease Prediction in Healthcare Applications [Dataset]. http://doi.org/10.70124/5n77a-dnf02
    Explore at:
    Available download formats: csv, json, bin, png
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    TU Wien
    Authors
    Sufyan Yousaf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Description

    Context and Methodology

    • Research Domain/Project:
      This dataset was created for a machine learning experiment aimed at developing a classification model to predict outcomes based on a set of features. The primary research domain is disease prediction in patients. The dataset was used for training, validating, and testing that model.

    • Purpose of the Dataset:
      The purpose of this dataset is to provide training, validation, and testing data for the development of machine learning models. It includes labeled examples that help train classifiers to recognize patterns in the data and make predictions.

    • Dataset Creation:
      Data preprocessing steps involved cleaning, normalization, and splitting the data into training, validation, and test sets. The data was carefully curated to ensure its quality and relevance to the problem at hand. For any missing values or outliers, appropriate handling techniques were applied (e.g., imputation, removal, etc.).

    Technical Details

    • Structure of the Dataset:
      The dataset consists of several files organized into folders by data type:

      • Training Data: Contains the training dataset used to train the machine learning model.

      • Validation Data: Used for hyperparameter tuning and model selection.

      • Test Data: Reserved for final model evaluation.

      Each folder contains files with consistent naming conventions for easy navigation, such as train_data.csv, validation_data.csv, and test_data.csv. Each file follows a tabular format with columns representing features and rows representing individual data points.

    • Software Requirements:
      To open and work with this dataset, you can use an environment such as VS Code or Jupyter, together with tools like:

      • Python (with libraries such as pandas, numpy, scikit-learn, matplotlib, etc.)
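
    A minimal sketch of loading the three splits, assuming only the file names given above (train_data.csv, validation_data.csv, test_data.csv). The exact folder names are not stated in the description, so the sketch searches below a root directory; `load_all_splits` and the root path are hypothetical. It uses the standard library only; pandas would work equally well.

```python
import csv
from pathlib import Path

# File names as given in the dataset description.
SPLIT_FILES = {
    "train": "train_data.csv",
    "validation": "validation_data.csv",
    "test": "test_data.csv",
}


def load_split(path):
    """Read one CSV file into a list of dicts (one dict per data point)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def load_all_splits(root):
    """Find each split file anywhere below `root` and load it.

    The folder layout is an assumption: the first match for each file name is used.
    """
    root = Path(root)
    splits = {}
    for split_name, filename in SPLIT_FILES.items():
        matches = sorted(root.rglob(filename))
        if not matches:
            raise FileNotFoundError(f"{filename} not found under {root}")
        splits[split_name] = load_split(matches[0])
    return splits
```

    All values are read as strings here; converting feature columns to numbers is left to the caller because the column names are not documented.
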

    Further Details

    • Reusability:
      Users of this dataset should be aware that it is designed for machine learning experiments involving classification tasks. The dataset is already split into training, validation, and test subsets. Any model trained with this dataset should be evaluated using the test set to ensure proper validation.

    • Limitations:
      The dataset may not cover all edge cases, and it might have biases depending on the selection of data sources. It's important to consider these limitations when generalizing model results to real-world applications.

  13. Data from: Global Health Atlas

    • data.niaid.nih.gov
    • dataverse.harvard.edu
    Updated May 5, 2011
    Cite
    (2011). Global Health Atlas [Dataset]. http://doi.org/10.7910/DVN/GJKWGR
    Explore at:
    Dataset updated
    May 5, 2011
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Users can view statistics and generate cross-country comparisons pertaining to infectious diseases and health indicators in 193 WHO member states.

    Background: The Global Health Atlas is a database maintained by the World Health Organization (WHO) that provides information regarding infectious diseases in WHO member states. Health conditions include: malaria, HIV/AIDS, cholera, STIs, meningitis, and polio, among others.

    User Functionality: Users can generate statistics regarding infectious diseases and health systems indicators by country or region, or generate cross-country comparisons. In addition, users can view maps showing the distribution of various health indicators and diseases by geographic region or individual country.

    Data Notes: Statistics are available for all WHO member states. Data are available from 1949 to 2009.

  14. Global Burden of Disease analysis dataset of cardiovascular disease...

    • data.mendeley.com
    • narcis.nl
    Updated Jun 23, 2021
    Cite
    David Cundiff (2021). Global Burden of Disease analysis dataset of cardiovascular disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.4
    Explore at:
    Dataset updated
    Jun 23, 2021
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This formatted dataset originates from raw data files from the Institute for Health Metrics and Evaluation Global Burden of Disease (GBD2017). It contains population-weighted worldwide data on male and female cohorts aged 15-69 years, including cardiovascular disease (CVD) early deaths and associated dietary, metabolic and other risk factors. The purpose of creating this formatted database is to explore the univariate and multiple regression correlations of cardiovascular early deaths and other health outcomes with risk factors. Our research hypothesis is that we can successfully apply artificial intelligence to model cardiovascular disease outcomes with risk factors. We found that fat-soluble vitamin containing foods (animal products) and added fats are negatively correlated with CVD early deaths worldwide but positively correlated with CVD early deaths in high fat-soluble vitamin cohorts. We interpret this as showing that optimal cardiovascular outcomes come with moderate (not low and not high) intakes of animal foods and added fats. You are invited to download the dataset, the associated SAS code to access the dataset, and the tables that have resulted from the analysis. Please comment on the article by indicating what you found by exploring the dataset with the provided SAS codes, and say whether the outputs from the SAS codes accurately reflect the tables provided and the tables in the published article. If you use our data to reproduce our findings, comment on your findings on the medRxiv website (https://www.medrxiv.org/content/10.1101/2021.04.17.21255675v4) and would like to be recognized, we will be happy to list you as a contributor when the article is submitted to JAMA. For questions, please email davidkcundiff@gmail.com. Thanks.

  15. Deaths related to infectious diseases

    • ec.europa.eu
    Updated Oct 10, 2025
    Cite
    Eurostat (2025). Deaths related to infectious diseases [Dataset]. http://doi.org/10.2908/HLTH_CD_IDO
    Explore at:
    Available download formats: json, tsv, application/vnd.sdmx.data+csv;version=2.0.0, application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.genericdata+xml;version=2.1, application/vnd.sdmx.data+csv;version=1.0.0
    Dataset updated
    Oct 10, 2025
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2011 - 2024
    Area covered
    Germany, Finland, Italy, Czechia, Estonia, Serbia, Moldova, Ireland, Romania, Norway
    Description

    Data on causes of death (COD) provide information on mortality patterns and form a major element of public health information.

    The COD data refer to the underlying cause which - according to the World Health Organisation (WHO) - is "the disease or injury which initiated the train of morbid events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury".

    The data are derived from the medical certificate of death, which is obligatory in the Member States. The information in the death certificate is recorded according to the rules specified by the WHO.

    Data published in Eurostat's dissemination database are broken down by sex, 5-year age groups, cause of death and by residency and country of occurrence. For stillbirths and neonatal deaths additional breakdowns might include age of mother and parity.

    Data are available for Member States, Iceland, Norway, Liechtenstein, Switzerland, United Kingdom, Serbia, Turkey, North Macedonia and Albania. Regional data (NUTS level 2) are available for all of the countries having NUTS2 regions except Albania.

    Annual national data are available in Eurostat's dissemination database as absolute numbers, crude death rates and standardised death rates. At regional level the same measures are provided as 3-year averages (the average of year, year-1 and year-2). Annual crude and standardised death rates are also available at NUTS2 level. Monthly national data are available for 21 EU Member States from reference year 2019 and for 24 Member States from reference year 2022, in absolute numbers and standardised death rates.
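
    The regional 3-year average described above (the average of year, year-1 and year-2) can be sketched as follows. This is an illustration only: the function name, the dict-by-year input and the choice to return None when a year is missing are assumptions, not Eurostat's documented procedure.

```python
def three_year_average(values_by_year, year):
    """Average of the values for year, year-1 and year-2.

    `values_by_year` maps a reference year to a number (for example a
    regional death count). Returns None if any of the three years is
    missing, which is an assumption made for this sketch.
    """
    window = [values_by_year.get(y) for y in (year, year - 1, year - 2)]
    if any(value is None for value in window):
        return None
    return sum(window) / 3
```

    For example, with illustrative values {2019: 10, 2020: 20, 2021: 30}, the 2021 figure would be the mean of the three years, while the 2020 figure would be undefined because 2018 is absent.
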

  16. Johns Hopkins COVID-19 Case Tracker

    • data.world
    • kaggle.com
    csv, zip
    Updated Dec 3, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Dec 3, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
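
    The rates per 100,000 people mentioned above follow from a count and a population figure. A minimal sketch, assuming the usual definition (count divided by population, times 100,000); the function name and the handling of missing populations are choices made for this example, not AP's code:

```python
def rate_per_100k(count, population):
    """Cases or deaths per 100,000 people.

    Returns None when the population is missing or not positive, so that
    counties without a population estimate do not produce a bogus rate.
    """
    if not population or population <= 0:
        return None
    return count * 100_000 / population
```

    A county with 50 cases and 200,000 residents has a rate of 25 per 100,000. As the text notes, such rates reflect reported cases and may track test availability rather than true infection rates.
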

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    @(https://datawrapper.dwcdn.net/nRyaf/15/)

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
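
    Because the timeseries holds daily snapshots of cumulative counts, day-over-day changes are obtained by differencing, and a downward revision by a source shows up as a negative change. The sketch below illustrates this; the function names are hypothetical and the input is assumed to be a list of cumulative counts in date order.

```python
def day_over_day_changes(cumulative):
    """Differences between consecutive cumulative snapshots.

    The result has one element fewer than the input.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def downward_revisions(cumulative):
    """Positions in the change series where the cumulative count decreased."""
    changes = day_over_day_changes(cumulative)
    return [index for index, change in enumerate(changes) if change < 0]
```

    Flagging negative changes instead of clipping them to zero keeps the raw data unaltered and leaves the decision about how to treat revisions to the analyst.
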

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project

  17. Counts of Disease caused by West Nile virus reported in UNITED STATES OF...

    • tycho.pitt.edu
    • data.niaid.nih.gov
    Updated Apr 1, 2018
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Disease caused by West Nile virus reported in UNITED STATES OF AMERICA: 2002-2005 [Dataset]. https://www.tycho.pitt.edu/dataset/US.417093003
    Explore at:
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    2002 - 2005
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
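
    The second recommended step, separating cumulative from fixed-interval series, could look like the sketch below. Only the attribute name "PartOfCumulativeCountSeries" comes from the description; the encoding of its values (1 or "1" for cumulative) and the function name are assumptions that should be checked against the downloaded file.

```python
def split_by_series_type(rows, flag="PartOfCumulativeCountSeries"):
    """Split case-count rows into (cumulative, fixed_interval) lists.

    `rows` is assumed to be a list of dicts, e.g. from csv.DictReader.
    A row counts as cumulative when its flag reads 1/"1"/"true"
    (assumed encoding, not confirmed by the dataset description).
    """
    cumulative, fixed_interval = [], []
    for row in rows:
        is_cumulative = str(row[flag]).strip().lower() in {"1", "true"}
        (cumulative if is_cumulative else fixed_interval).append(row)
    return cumulative, fixed_interval
```

    Keeping the two groups apart avoids summing overlapping cumulative intervals together with mutually exclusive fixed intervals, which would double count cases.
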

  18. COVID-19 Combined Data-set with Improved Measurement Errors

    • data.mendeley.com
    Updated May 13, 2020
    Cite
    Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
    Explore at:
    Dataset updated
    May 13, 2020
    Authors
    Afshin Ashofteh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. 
    This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
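
    The description mentions using the root mean square error in paired comparisons of datasets. A minimal sketch of that measure for two aligned series follows, for example the same attribute reported by two official sources on the same dates. The function name and the requirement of equal-length inputs are assumptions for this illustration, not the author's code.

```python
import math


def rmse(series_a, series_b):
    """Root mean square error between two aligned numeric series."""
    if len(series_a) != len(series_b) or not series_a:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))
```

    A value of zero means the two sources agree on every date; larger values point to attributes where the sources diverge most.
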

  19. Data from: COVID-19 Datasets for predicting the number of new cases of...

    • data.mendeley.com
    • narcis.nl
    Updated Jul 28, 2020
    Cite
    Pınar Tüfekci (2020). COVID-19 Datasets for predicting the number of new cases of COVID-19 ahead of 1 day, 3 days, and 10 days [Dataset]. http://doi.org/10.17632/499vtcykvw.1
    Explore at:
    Dataset updated
    Jul 28, 2020
    Authors
    Pınar Tüfekci
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Four datasets are presented here. The original dataset is a collection of the COVID-19 data maintained by Our World in Data. It includes data on confirmed cases and deaths, as well as other variables of potential interest, for ten countries: Australia, Brazil, Canada, China, Denmark, France, Israel, Italy, the United Kingdom, and the United States. The original dataset covers 31 December 2019 to 31 May 2020, with a total of 1,530 instances and 19 features. It is collected from a variety of sources (the European Centre for Disease Prevention and Control, United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). The original dataset is first pre-processed by cleaning it and removing unnecessary and blank data. Then all strings are converted to numeric values, and new features such as continent, hemisphere, year, month, and day are derived from the original features. Finally, the processed dataset is organized for predicting the number of new COVID-19 cases 1 day, 3 days, and 10 days ahead, and three datasets (Dataset-1, 2, 3) are created for that purpose.
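
    One way to build such horizon-specific datasets is to pair each day's features with the number of new cases observed a fixed number of days later, as the dataset title (1 day, 3 days, and 10 days ahead) suggests. The sketch below is an assumption about the construction, not the author's documented procedure; `make_horizon_dataset` and its inputs are hypothetical.

```python
def make_horizon_dataset(feature_rows, new_cases, horizon):
    """Pair the features of day t with the new-case count of day t + horizon.

    `feature_rows` and `new_cases` are assumed to be aligned by day for a
    single country. The last `horizon` days have no target and are dropped.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    if len(feature_rows) != len(new_cases):
        raise ValueError("features and targets must cover the same days")
    usable_days = len(feature_rows) - horizon
    return [(feature_rows[t], new_cases[t + horizon]) for t in range(usable_days)]
```

    Calling the function with horizons 1, 3 and 10 would give three datasets of decreasing length. Series from different countries should be shifted separately so that a target never comes from another country's data.
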

  20. Data from: Non-listed disease report to OIE (World Organisation for Animal...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 12, 2025
    Cite
    U.S. Geological Survey (2025). Non-listed disease report to OIE (World Organisation for Animal Health) for the 1st semester of 2019 [Dataset]. https://catalog.data.gov/dataset/non-listed-disease-report-to-oie-world-organisation-for-animal-health-for-the-1st-semester
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    As a member of the World Organisation for Animal Health (OIE), and the reporting authority for the United States, the USGS National Wildlife Health Center (NWHC) is responsible for reporting wildlife disease outbreaks that involve diseases which are not OIE-listed (https://www.oie.int/wahis_2/public/wahidwild.php#). These outbreaks are to be reported on a semesterly basis via OIE’s WAHIS-Wild reporting system. The data fields described within are based on those in WAHIS-Wild. Since OIE’s reporting mechanism is based primarily on domestic and agricultural animals, several of the variables (e.g. vaccination, slaughtered) are not applicable to wildlife. To use a consistent data source that is broad in scope and captures information from around the country and from various natural resource management authorities, NWHC will use the Wildlife Health Information Sharing Partnership - Event Reporting System (WHISPers - https://whispers.usgs.gov/home) as the sole source to generate and supply the requested information to OIE. Data supplied to OIE have been restricted to publicly available information on wildlife morbidity/mortality and surveillance events in WHISPers.

Pandemics in World

"An epidemic occurring on a scale that crosses international boundaries."

Explore at:
102 scholarly articles cite this dataset (View in Google Scholar)