100+ datasets found

Pandemics in World
kaggle.com
zip
Updated Jan 9, 2024
Cite
Mohamadreza Momeni (2024). Pandemics in World [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/pandemics-in-world
Explore at:
Available download formats: zip (1428134 bytes)
Dataset updated
Jan 9, 2024
Authors
Mohamadreza Momeni
Area covered
World
Description
By Saloni Dattani, Lucas Rodés-Guirao, Edouard Mathieu, Hannah Ritchie and Max Roser.

Data description:

Disease outbreaks may be inevitable, but large-scale pandemics are not. The world can respond swiftly and effectively to pandemic risks in the future with better understanding, resources, and effort.

To avoid suffering through another large pandemic, we have to take the risk of pandemics seriously. Despite warnings that another one was likely, the COVID-19 pandemic killed more than 27 million people.

We must build the capacity to test for pathogens and understand them: which pathogens put us at the greatest risk, how they spread, and how to tackle them.

We know it is possible to greatly reduce the risk of infectious disease. We’ve learned over history how to reduce their impact with vaccines, public health efforts, and medicine.

In addition to the old risks, we face new threats from factory farming, genetic modification, climate change, and antimicrobial resistance. With more attention and effort, we can reduce their risks too.

Good luck in your analysis.
Data from: A global dataset of pandemic- and epidemic-prone disease...
figshare.com
7z
Updated Oct 8, 2025
Cite
Juan Armando Torres Munguía (2025). A global dataset of pandemic- and epidemic-prone disease outbreaks [Dataset]. http://doi.org/10.6084/m9.figshare.17207183.v6
Explore at:
Available download formats: 7z
Unique identifier
https://doi.org/10.6084/m9.figshare.17207183.v6
Dataset updated
Oct 8, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Juan Armando Torres Munguía
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IMPORTANT NOTE: From October 2024, this project is being updated by Dr. Juan Armando Torres Munguía. In case of questions, requests, or collaborations, you can contact me via GitHub or here. Updated data can be found in data-monthly-updated-1996-2025.zip. You can also access the updated data here: https://github.com/jatorresmunguia/disease_outbreak_news.
World Health Organization Estimates of the Global and Regional Disease...
plos.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 3, 2023
Cite
Martyn D. Kirk; Sara M. Pires; Robert E. Black; Marisa Caipo; John A. Crump; Brecht Devleesschauwer; Dörte Döpfer; Aamir Fazil; Christa L. Fischer-Walker; Tine Hald; Aron J. Hall; Karen H. Keddy; Robin J. Lake; Claudio F. Lanata; Paul R. Torgerson; Arie H. Havelaar; Frederick J. Angulo (2023). World Health Organization Estimates of the Global and Regional Disease Burden of 22 Foodborne Bacterial, Protozoal, and Viral Diseases, 2010: A Data Synthesis [Dataset]. http://doi.org/10.1371/journal.pmed.1001921
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pmed.1001921
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Martyn D. Kirk; Sara M. Pires; Robert E. Black; Marisa Caipo; John A. Crump; Brecht Devleesschauwer; Dörte Döpfer; Aamir Fazil; Christa L. Fischer-Walker; Tine Hald; Aron J. Hall; Karen H. Keddy; Robin J. Lake; Claudio F. Lanata; Paul R. Torgerson; Arie H. Havelaar; Frederick J. Angulo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Foodborne diseases are important worldwide, resulting in considerable morbidity and mortality. To our knowledge, we present the first global and regional estimates of the disease burden of the most important foodborne bacterial, protozoal, and viral diseases.
Methods and Findings: We synthesized data on the number of foodborne illnesses, sequelae, deaths, and Disability Adjusted Life Years (DALYs), for all diseases with sufficient data to support global and regional estimates, by age and region. The data sources included varied by pathogen and included systematic reviews, cohort studies, surveillance studies and other burden of disease assessments. We sought relevant data circa 2010, and included sources from 1990–2012. The number of studies per pathogen ranged from as few as 5 studies for bacterial intoxications through to 494 studies for diarrheal pathogens. To estimate mortality for Mycobacterium bovis infections and morbidity and mortality for invasive non-typhoidal Salmonella enterica infections, we excluded cases attributed to HIV infection. We excluded stillbirths in our estimates. We estimate that the 22 diseases included in our study resulted in two billion (95% uncertainty interval [UI] 1.5–2.9 billion) cases, over one million (95% UI 0.89–1.4 million) deaths, and 78.7 million (95% UI 65.0–97.7 million) DALYs in 2010. To estimate the burden due to contaminated food, we then applied proportions of infections that were estimated to be foodborne from a global expert elicitation. Waterborne transmission of disease was not included. We estimate that 29% (95% UI 23–36%) of cases caused by diseases in our study, or 582 million (95% UI 401–922 million), were transmitted by contaminated food, resulting in 25.2 million (95% UI 17.5–37.0 million) DALYs. Norovirus was the leading cause of foodborne illness causing 125 million (95% UI 70–251 million) cases, while Campylobacter spp. caused 96 million (95% UI 52–177 million) foodborne illnesses. Of all foodborne diseases, diarrheal and invasive infections due to non-typhoidal S. enterica infections resulted in the highest burden, causing 4.07 million (95% UI 2.49–6.27 million) DALYs. Regionally, DALYs per 100,000 population were highest in the African region followed by the South East Asian region. Considerable burden of foodborne disease is borne by children less than five years of age. Major limitations of our study include data gaps, particularly in middle- and high-mortality countries, and uncertainty around the proportion of diseases that were foodborne.
Conclusions: Foodborne diseases result in a large disease burden, particularly in children. Although it is known that diarrheal diseases are a major burden in children, we have demonstrated for the first time the importance of contaminated food as a cause. There is a need to focus food safety interventions on preventing foodborne diseases, particularly in low- and middle-income settings.
Global Health Observatory (GHO)
data.niaid.nih.gov
dataverse.harvard.edu
Updated May 5, 2011
Cite
(2011). Global Health Observatory (GHO) [Dataset]. http://doi.org/10.7910/DVN/JILCZW
Unique identifier
https://doi.org/10.7910/DVN/JILCZW
Dataset updated
May 5, 2011
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Users can find data on a range of global health topics like mortality, the burden of disease, infectious diseases, risk factors and health expenditures.
Background: The Global Health Observatory (GHO) database is the World Health Organization's main health statistics repository. Data are available for 193 World Health Organization member states on topics including but not limited to: Health related millennium goals, mortality, immunization, nutrition, infectious disease, non-communicable disease, tobacco control, violence, injuries, alcohol, HIV/AIDS, tuberculosis, malaria, water and sanitation, maternal and reproductive health, cholera, child health, child nutrition, and road safety.
User Functionality: Users can generate tables and charts according to country or region, health indicator, and time period. Data can also be compared across countries. Data can be filtered, tabulated, charted, and downloaded into Excel statistical software. These data are also published in statistical reports covering topics including: Alcohol and health, Child health, Cholera, HIV/AIDS, Malaria, Maternal and reproductive health, Non-communicable diseases, Public health and environment, Road safety, Tuberculosis, Tobacco control.
Data Notes: Data are derived from surveillance and household surveys. The years in which data were collected are indicated with these health statistics. Information is available for each WHO member country and international region. The most recent data available are from 2009.
California Infectious Disease Cases
kaggle.com
zip
Updated Jan 24, 2023
Cite
The Devastator (2023). California Infectious Disease Cases [Dataset]. https://www.kaggle.com/datasets/thedevastator/california-infectious-disease-cases
Explore at:
Available download formats: zip (2093378 bytes)
Dataset updated
Jan 24, 2023
Authors
The Devastator
Area covered
California
Description
California Infectious Disease Cases

Rates and Counts By County, Disease, Sex, and Year (2001-2014)

By Health [source]

About this dataset

This dataset provides comprehensive information on the number and rate of infectious diseases in California. Focusing on counties, sexes, and various diseases between 2001-2014, it offers powerful insights into the health status of its citizens. Its data also reveals trends in the spread of common illnesses in this state. Whether you are an epidemiologist looking to inform public health policy or a researcher seeking to investigate particular illnesses within certain populations, this dataset contains all the necessary information to answer your questions. Explore it today and discover hidden stories waiting to be uncovered!

How to use the dataset

This dataset contains counts and rates of infectious diseases in California by county, disease, sex, and year. This dataset can be used to generate trends to understand the changes in incidence of different types of diseases over time and across counties or between sexes.

To use this dataset:
- Select the columns you are interested in exploring; these could include Disease, County, Sex or Year.
- Filter out the rows that do not relate to your question, for example by keeping only a specific county or disease.
- Examine the average rate per 100,000 people for each group you selected, as well as its lower and upper confidence intervals (CI).
- Use Rate as your dependent variable for analysis; Population is likely also an important determining factor. Make sure to check whether any Rates have 'unstable' flags.
- Visualise or statistically analyse your data using suitable methods, such as descriptive statistics (means, medians, modes, etc.) for comparison between two or more groups, or correlation/regression-based models when comparing one variable to another over time.
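
The steps above can be sketched with the Python standard library alone. This is only an illustration: the column names follow the dataset's column table, but the sample rows, the `mean_rate_by_group` helper, and the assumption that an unstable rate is marked with `*` in an `Unstable` column are hypothetical and should be checked against the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the values and the "*" encoding of the
# Unstable flag are assumptions, not taken from the real CSV.
SAMPLE_CSV = """Disease,County,Year,Sex,Population,Rate,CI.lower,CI.upper,Unstable
Campylobacteriosis,Alameda,2012,Female,780000,20.0,17.0,23.0,
Campylobacteriosis,Alameda,2013,Female,790000,22.0,19.0,25.0,
Campylobacteriosis,Alameda,2013,Male,770000,30.0,26.0,34.0,
Campylobacteriosis,Alpine,2013,Male,600,166.7,4.2,928.6,*
"""


def mean_rate_by_group(rows, disease, group_key="Sex", skip_unstable=True):
    """Average Rate per group for one disease, optionally dropping unstable rates."""
    groups = defaultdict(list)
    for row in rows:
        if row["Disease"] != disease:
            continue  # filter out rows unrelated to the question
        if skip_unstable and row["Unstable"].strip():
            continue  # skip rates flagged as unstable
        groups[row[group_key]].append(float(row["Rate"]))
    return {key: mean(values) for key, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = mean_rate_by_group(rows, "Campylobacteriosis")
print(result)
```

Note that this takes a plain mean of the rates; weighting each rate by Population may be more appropriate when counties differ widely in size.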

Research Ideas

Analyzing the geographic spread of infectious diseases over time to identify areas in need of increased education, resources, and care.

Comparing rates of disease by sex to identify and understand any gender-based differences in infectious disease cases.

Using the Unstable column to determine whether a particular county or region needs further study of a certain type of infectious disease due to unusual spikes or drops in rate or count during a specific year.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: Infectious_Disease_Cases_by_County_Year_and_Sex_2001-2014.csv

| Column name | Description |
|:-----------|:------------|
| Disease | The type of infectious disease reported. (String) |
| County | The county in California where the cases were reported. (String) |
| Year | The year in which the cases were reported. (Integer) |
| Sex | The gender of the individuals who contracted the disease. (String) |
| Population | The population size of the county in which the cases were reported. (Integer) |
| Rate | The rate of infection per 100 thousand people living in the county. (Float) |
| CI.lower | The lower confidence interval associated with the rate of infection. (Float) |
| CI.upper | The upper confidence interval associated with the rate of infection. (Float) |

...
Global Burden of Disease analysis dataset of noncommunicable disease...
data.mendeley.com
Updated Apr 6, 2023
Cite
David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
Unique identifier
https://doi.org/10.17632/g6b39zxck4.10
Dataset updated
Apr 6, 2023
Authors
David Cundiff
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017), which is affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

These Global Burden of Disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following:
1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc.)
2. A text file to import the analysis database into SAS
3. The SAS code to format the analysis database to be used for analytics
4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6
5. SAS code for deriving the multiple regression formula in Table 4
6. SAS code for deriving the multiple regression formula in Table 5
7. SAS code for deriving the multiple regression formula in Supplementary Table 7
8. SAS code for deriving the multiple regression formula in Supplementary Table 8
9. The Excel files that accompanied the above SAS code to produce the tables

For questions, please email davidkcundiff@gmail.com. Thanks.
Disease and symptoms dataset 2023
data.mendeley.com
Updated Mar 3, 2025
Cite
Bran Stark (2025). Disease and symptoms dataset 2023 [Dataset]. http://doi.org/10.17632/2cxccsxydc.1
Unique identifier
https://doi.org/10.17632/2cxccsxydc.1
Dataset updated
Mar 3, 2025
Authors
Bran Stark
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains disease names along with the symptoms faced by the respective patient. There are a total of 773 unique diseases and 377 symptoms, with ~246,000 rows. The dataset was artificially generated, preserving Symptom Severity and Disease Occurrence Possibility. Several distinct groups of symptoms might all be indicators of the same disease. There may even be one single symptom contributing to a disease in a row or sample. This is an indicator of a very high correlation between the symptom and that particular disease. A larger number of rows for a particular disease corresponds to its higher probability of occurrence in the real world. Similarly, in a row, if the feature vector has the occurrence of a single symptom, it implies that this symptom has more correlation to classify the disease than any one symptom of a feature vector with multiple symptoms in another sample.
Data from: Global Health Atlas
data.niaid.nih.gov
dataverse.harvard.edu
Updated May 5, 2011
Cite
(2011). Global Health Atlas [Dataset]. http://doi.org/10.7910/DVN/GJKWGR
Unique identifier
https://doi.org/10.7910/DVN/GJKWGR
Dataset updated
May 5, 2011
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Users can view statistics and generate cross-country comparisons pertaining to infectious diseases and health indicators in 193 WHO member states.
Background: The Global Health Atlas is a database maintained by the World Health Organization (WHO) that provides information regarding infectious diseases in WHO member states. Health conditions include: malaria, HIV/AIDS, cholera, STIs, meningitis, and polio, among others.
User Functionality: Users can generate statistics regarding infectious diseases and health systems indicators by country or region, or generate cross-country comparisons. In addition, users can view maps showing the distribution of various health indicators and diseases by geographic region or individual country.
Data Notes: Statistics are available for all WHO member states. Data are available from 1949 to 2009.
U.S. Healthcare Data
kaggle.com
zip
Updated Dec 22, 2017
Cite
BuryBuryZymon (2017). U.S. Healthcare Data [Dataset]. https://www.kaggle.com/maheshdadhich/us-healthcare-data
Explore at:
Available download formats: zip (37547642 bytes)
Dataset updated
Dec 22, 2017
Authors
BuryBuryZymon
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Health care in the United States is provided by many distinct organizations. Health care facilities are largely owned and operated by private sector businesses. 58% of US community hospitals are non-profit, 21% are government owned, and 21% are for-profit. According to the World Health Organization (WHO), the United States spent more on health care per capita ($9,403), and more on health care as a percentage of its GDP (17.1%), than any other nation in 2014. Many different datasets are needed to portray different aspects of health care in the US, such as disease prevalence, pharmaceuticals and drugs, and nutritional data for food products available in the US. Such data are collected by surveys (or otherwise) conducted by the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration, the Centers for Medicare & Medicaid Services and the Agency for Healthcare Research and Quality (AHRQ). These datasets can be used to review demographics and diseases, to determine star ratings of healthcare providers, and to look up different drugs and their compositions as well as package information for different diseases and for food quality. We often want such information, and finding and scraping such data can be a huge hurdle. So here an attempt is made to make all US healthcare data available in one place, to download as CSV files.

Content

NHANES (National Health and Nutrition Examination Survey) - The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and has the responsibility for producing vital and health statistics for the Nation. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. The diseases, medical conditions, and health indicators to be studied include: Anemia, Cardiovascular disease, Diabetes, Environmental exposures, Eye diseases, Hearing loss, Infectious diseases, Kidney disease, Nutrition, Obesity, Oral health, Osteoporosis, Physical fitness and physical functioning, Reproductive history and sexual behavior, Respiratory disease (asthma, chronic bronchitis, emphysema), Sexually transmitted diseases, Vision. 10,000 individuals are surveyed to represent US statistics. Five files in this dataset contain recent NHANES data:
Nhanes_2005_2006.csv
Nhanes_2007_2008.csv
Nhanes_2009_2010.csv
Nhanes_2011_2012.csv
Nhanes_2013_2014.csv

Data fields' description -

Nhanes_2005_2006.csv - Demographic, Dietary, Examinations, Laboratory

Nhanes_2007_2008.csv - Demographic, Dietary, Examinations, Laboratory

Nhanes_2009_2010.csv - Demographic, Dietary, Examinations, Laboratory

Nhanes_2011_2012.csv - Demographic, Dietary, [Examinations](http://https://wwwn.cdc.gov/nchs/nhanes/search/variab...
Counts of Meningococcal infectious disease reported in UNITED STATES OF...
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Meningococcal infectious disease reported in UNITED STATES OF AMERICA: 1951-2010 [Dataset]. https://www.tycho.pitt.edu/dataset/US.23511006
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1951 - 2010
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
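
The two processing steps above could look roughly like the following Python sketch. Only the attribute name `PartOfCumulativeCountSeries` comes from the description; the other field names (`PeriodStartDate`, `PeriodEndDate`, `CountValue`), the 0/1 encoding of the flag, the sample records, and both helper functions are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical records; only "PartOfCumulativeCountSeries" is named in the
# description. Other field names and the 0/1 encoding are assumptions.
records = [
    {"PeriodStartDate": date(1951, 1, 7), "PeriodEndDate": date(1951, 1, 13),
     "CountValue": 4, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1951, 1, 21), "PeriodEndDate": date(1951, 1, 27),
     "CountValue": 0, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1951, 1, 1), "PeriodEndDate": date(1951, 1, 27),
     "CountValue": 9, "PartOfCumulativeCountSeries": 1},
]


def split_series(rows):
    """Separate fixed-interval rows from cumulative rows."""
    fixed = [r for r in rows if not r["PartOfCumulativeCountSeries"]]
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"]]
    return fixed, cumulative


def fill_weekly_gaps(fixed_rows):
    """List (start_date, count) for every week from the first to the last
    reported start date; weeks with no report get None, not zero."""
    by_start = {r["PeriodStartDate"]: r["CountValue"] for r in fixed_rows}
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        filled.append((current, by_start.get(current)))
        current += timedelta(days=7)
    return filled


fixed, cumulative = split_series(records)
weekly = fill_weekly_gaps(fixed)
for start, count in weekly:
    print(start, count)
```

Keeping missing weeks as `None` rather than `0` preserves the distinction the description draws between intervals with no report and intervals with a reported count of zero.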
Deaths related to infectious diseases
ec.europa.eu
Updated Oct 10, 2025
Cite
Eurostat (2025). Deaths related to infectious diseases [Dataset]. http://doi.org/10.2908/HLTH_CD_IDO
Explore at:
Available download formats: json, tsv, application/vnd.sdmx.data+csv;version=2.0.0, application/vnd.sdmx.data+xml;version=3.0.0, application/vnd.sdmx.genericdata+xml;version=2.1, application/vnd.sdmx.data+csv;version=1.0.0
Unique identifier
https://doi.org/10.2908/HLTH_CD_IDO
Dataset updated
Oct 10, 2025
Dataset authored and provided by
Eurostat (https://ec.europa.eu/eurostat)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2011 - 2024
Area covered
Finland, Germany, Romania, Italy, Czechia, Ireland, Moldova, Norway, Serbia, Estonia
Description
Data on causes of death (COD) provide information on mortality patterns and form a major element of public health information.

The COD data refer to the underlying cause which - according to the World Health Organisation (WHO) - is "the disease or injury which initiated the train of morbid events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury".

The data are derived from the medical certificate of death, which is obligatory in the Member States. The information recorded in the death certificate is according to the rules specified by the WHO.

Data published in Eurostat's dissemination database are broken down by sex, 5-year age groups, cause of death and by residency and country of occurrence. For stillbirths and neonatal deaths additional breakdowns might include age of mother and parity.

Data are available for Member States, Iceland, Norway, Liechtenstein, Switzerland, United Kingdom, Serbia, Turkey, North Macedonia and Albania. Regional data (NUTS level 2) are available for all of the countries having NUTS2 regions except Albania.

Annual national data are available in Eurostat's dissemination database as absolute numbers, crude death rates and standardised death rates. At regional level the same is provided in the form of 3-year averages (the average of year, year -1 and year -2). Annual crude and standardised death rates are also available at NUTS2 level. Monthly national data are available for 21 EU Member States from reference year 2019 and for 24 Member States from reference year 2022, in absolute numbers and standardised death rates.
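
As a small illustration of the 3-year average described above (the average of year, year -1 and year -2), here is a Python sketch. The `three_year_average` helper and the sample rates are hypothetical, and returning `None` for an incomplete window is an assumption rather than Eurostat's documented handling.

```python
def three_year_average(values_by_year, year):
    """Average of year, year-1 and year-2; None if any of the three is missing."""
    window = [values_by_year.get(y) for y in (year, year - 1, year - 2)]
    if any(value is None for value in window):
        return None  # assumption: no average when the window is incomplete
    return sum(window) / 3


# Hypothetical regional death rates per 100,000 inhabitants.
rates = {2019: 12.0, 2020: 15.0, 2021: 18.0}

print(three_year_average(rates, 2021))  # (18.0 + 15.0 + 12.0) / 3
print(three_year_average(rates, 2020))  # 2018 is missing
```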
fdata-02-00018_Global Awareness Landscape for Ailments—A Twitter Based...
frontiersin.figshare.com
bin
Updated Jun 2, 2023
Cite
Durga Toshniwal; Soumya Somani; Rohit Aggarwal; Preeti Malik (2023). fdata-02-00018_Global Awareness Landscape for Ailments—A Twitter Based Microscopic View Into Thought Processes of People.xml [Dataset]. http://doi.org/10.3389/fdata.2019.00018.s002
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.3389/fdata.2019.00018.s002
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Durga Toshniwal; Soumya Somani; Rohit Aggarwal; Preeti Malik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this day and age, people face a lot of stress due to the fast pace of life. Because of this, people in today's digital age suffer from a plethora of ailments. It is universally accepted that a greater awareness of ailments and their corresponding symptoms leads to an increased lifespan and better quality of life. Early detection and screening can help doctors nip diseases in their natal stages. However, not everyone is aware of them, which makes it a global issue. The study of the degree of disease awareness amongst people belonging to different nations and continents is a matter of great interest. One method that is suitable for this purpose is using clinical data, but this data is not readily available. However, today a plethora of platforms are available for people to share their thoughts and experiences. People post about many of the important events in their lives on social media. Their posts offer a microscopic view into their lives and thought processes. Based on this intuition, Twitter data pertaining to various chronic and acute diseases has been collected. Tweets for 30 deadly ailments have been collected over a period of 3 months, amounting to a total of 19 million. A feature extraction approach is proposed which is used to identify the disease awareness levels across different nations. Deriving the global awareness landscape for ailments can help to identify regions which are well aware and also those that need to become aware. Clustering has been used for this purpose.
Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945 [Dataset]. https://www.tycho.pitt.edu/dataset/US.409498004
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1942 - 1945
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
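The two recommended steps can be sketched in Python as follows. This is a minimal illustration, not Project Tycho's own tooling. Only the attribute name `PartOfCumulativeCountSeries` comes from the description above; the other field names (`PeriodStartDate`, `CountValue`), the `"1"`/`"0"` encoding, the weekly interval length, the helper names, and the sample rows are all assumptions made up for the example.

```python
from datetime import date, timedelta


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows.

    Assumes the flag is stored as the string "1" for cumulative series.
    """
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"] == "1"]
    fixed = [r for r in rows if r["PartOfCumulativeCountSeries"] != "1"]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows, first_start, last_start):
    """Return (week_start, count) pairs for every week in the range.

    Weeks that are absent from the data get None, which keeps them
    distinct from weeks where a count of zero was actually reported.
    """
    by_start = {
        date.fromisoformat(r["PeriodStartDate"]): int(r["CountValue"])
        for r in fixed_rows
    }
    filled = []
    current = first_start
    while current <= last_start:
        filled.append((current, by_start.get(current)))
        current += timedelta(weeks=1)
    return filled


# Made-up sample rows, for illustration only (not real Project Tycho data).
rows = [
    {"PeriodStartDate": "1942-01-04", "CountValue": "2",
     "PartOfCumulativeCountSeries": "0"},
    {"PeriodStartDate": "1942-01-18", "CountValue": "0",
     "PartOfCumulativeCountSeries": "0"},
    {"PeriodStartDate": "1942-01-01", "CountValue": "5",
     "PartOfCumulativeCountSeries": "1"},
]

cumulative, fixed = split_series(rows)
weekly = fill_missing_weeks(fixed, date(1942, 1, 4), date(1942, 1, 18))
```

Using `None` rather than `0` for the added intervals preserves the distinction the description draws between "no count reported" and "zero cases reported".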
Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...
data.mendeley.com
Updated Jul 25, 2022
Cite
Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
Explore at:
Unique identifier
https://doi.org/10.17632/xmcg82mx9k.3
Dataset updated
Jul 25, 2022
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.

Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.
• Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
• Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
• Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
• Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
• Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
• Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
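Before hydrating, it can help to confirm that the downloaded files match the documented counts. The sketch below does only that; hydration itself needs Twitter API access and is not shown. The filenames and counts are taken from the description above, while the one-ID-per-line file layout and the helper names `load_tweet_ids` and `check_counts` are assumptions.

```python
from pathlib import Path

# Filenames and Tweet ID counts as listed in the dataset description.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}


def load_tweet_ids(path):
    """Read one Tweet ID per non-empty line (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_counts(data_dir, expected=EXPECTED_COUNTS):
    """Return {filename: (found, expected)} for files whose counts differ."""
    mismatches = {}
    for name, want in expected.items():
        found = len(load_tweet_ids(Path(data_dir) / name))
        if found != want:
            mismatches[name] = (found, want)
    return mismatches
```

Calling `check_counts("path/to/download")` (a placeholder path) returns an empty dictionary when every file holds the documented number of IDs. The six per-file counts sum to the stated total of 255,363.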

BRFSS 2020 Heart Disease Dataset(Cleaned Version)

zenodo.org

csv

Updated May 4, 2025

Cite

Koushal Kumar; BP Pande (2025). BRFSS 2020 Heart Disease Dataset(Cleaned Version) [Dataset]. http://doi.org/10.5281/zenodo.15336526

Explore at:

Available download formats: csv

Unique identifier

https://doi.org/10.5281/zenodo.15336526

Dataset updated

May 4, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Koushal Kumar; BP Pande

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions asked of respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]".

To improve the efficiency and relevance of our analysis, we removed certain attributes from the original BRFSS dataset. Many of the 279 original attributes included administrative codes, metadata, or survey-specific variables that do not contribute meaningfully to heart disease prediction—such as respondent IDs, timestamps, state-level identifiers, and detailed lifestyle questions unrelated to cardiovascular health. By focusing on a carefully selected subset of 18 attributes directly linked to medical, behavioral, and demographic factors known to influence heart health, we streamlined the dataset. This not only reduced computational complexity but also improved model interpretability and performance by eliminating noise and irrelevant information. All predicting variables could be divided into 4 broad categories:

Demographic factors: sex, age category (14 levels), race, BMI (Body Mass Index)
Diseases: whether the respondent ever had such diseases as asthma, skin cancer, diabetes, stroke or kidney disease (not including kidney stones, bladder infection or incontinence)
Unhealthy habits:
- Smoking - respondents that smoked at least 100 cigarettes in their entire life (5 packs = 100 cigarettes)
- Alcohol Drinking - heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
General Health:
- Difficulty Walking - whether the respondent has serious difficulty walking or climbing stairs
- Physical Activity - adults who reported doing physical activity or exercise during the past 30 days other than their regular job
- Sleep Time - respondent’s reported average hours of sleep in a 24-hour period
- Physical Health - number of days being physically ill or injured (0-30 days)
- Mental Health - number of days having bad mental health (0-30 days)
- General Health - respondents declared their health as ’Excellent’, ’Very good’, ’Good’, ’Fair’ or ’Poor’

Below is a description of the features collected for each patient:

#	Feature	Coded Variable Name	Description
1	HeartDisease	CVDINFR4	Respondents that have ever reported having coronary heart disease (CHD) or myocardial infarction (MI)
2	BMI	_BMI5CAT	Body Mass Index (BMI)
3	Smoking	_SMOKER3	Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]
4	AlcoholDrinking	_RFDRHV7	Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week)
5	Stroke	CVDSTRK3	(Ever told) (you had) a stroke?
6	PhysicalHealth	PHYSHLTH	Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good?
7	MentalHealth	MENTHLTH	Thinking about your mental health, for how many days during the past 30 days was your mental health not good?
8	DiffWalking	DIFFWALK	Do you have serious difficulty walking or climbing stairs?
9	Sex	SEXVAR	Are you male or female?
10	AgeCategory	_AGE_G	Fourteen-level age category
11	Race	_IMPRACE	Imputed race/ethnicity value
12	Diabetic	DIABETE4	(Ever told) (you had) diabetes?
13	PhysicalActivity	EXERANY2	Adults who reported doing physical activity or exercise during the past 30 days other than their regular job
14	GenHealth	GENHLTH	Would you say that in general your health is...
15	SleepTime	SLEPTIM1	On average, how many hours of sleep do you get in a 24-hour period?
16	Asthma	CHASTHMA	(Ever told) (you had) asthma?
17	KidneyDisease	CHCKDNY2	Not including kidney stones, bladder infection or incontinence, were you ever told you had kidney disease?
18	SkinCancer	CHCSCNCR	(Ever told) (you had) skin cancer?
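The category definitions and value ranges above translate into a few simple checks. The sketch below is an illustration only: the thresholds (more than 14 or 7 drinks per week, 0-30 days, five general-health levels) come from the description, but the `"Male"`/`"Female"` encoding, the dictionary-per-row layout, and the helper names are assumptions rather than the authors' own cleaning code.

```python
# The five self-reported health levels named in the description.
GEN_HEALTH_LEVELS = ("Excellent", "Very good", "Good", "Fair", "Poor")


def is_heavy_drinker(sex, drinks_per_week):
    """Apply the stated rule: men > 14 drinks/week, women > 7 drinks/week.

    The "Male"/"Female" labels are an assumed encoding of the Sex feature.
    """
    if sex == "Male":
        limit = 14
    elif sex == "Female":
        limit = 7
    else:
        raise ValueError(f"unexpected sex value: {sex!r}")
    return drinks_per_week > limit


def invalid_fields(row):
    """Return the names of fields whose values fall outside the stated ranges."""
    problems = []
    for field in ("PhysicalHealth", "MentalHealth"):
        # Both features count days within the past 30 days.
        if not 0 <= float(row[field]) <= 30:
            problems.append(field)
    if row["GenHealth"] not in GEN_HEALTH_LEVELS:
        problems.append("GenHealth")
    return problems
```

For example, `invalid_fields({"PhysicalHealth": "3", "MentalHealth": "31", "GenHealth": "Good"})` flags only `MentalHealth`, since 31 exceeds the 30-day window.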

Death Profiles by Leading Causes of Death
catalog.data.gov
data.chhs.ca.gov
Updated Nov 23, 2025
Cite
California Department of Public Health (2025). Death Profiles by Leading Causes of Death [Dataset]. https://catalog.data.gov/dataset/death-profiles-by-leading-causes-of-death-35077
Explore at:
Dataset updated
Nov 23, 2025
Dataset provided by
California Department of Public Health
Description
Data for deaths by leading cause of death categories are now available in the death profiles dataset for each geographic granularity. The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death. Cause of death categories for years 1999 and later are based on the tenth revision of the International Classification of Diseases (ICD-10) codes. Comparable categories are provided for years 1979 through 1998 based on the ninth revision (ICD-9) codes. For more information on the comparability of cause of death classification between ICD revisions, see Comparability of Cause-of-death Between ICD Revisions.
FAIR Dataset for Disease Prediction in Healthcare Applications
test.researchdata.tuwien.ac.at
bin, csv, json, png
Updated Apr 14, 2025
Cite
Sufyan Yousaf (2025). FAIR Dataset for Disease Prediction in Healthcare Applications [Dataset]. http://doi.org/10.70124/5n77a-dnf02
Explore at:
Available download formats: csv, json, bin, png
Unique identifier
https://doi.org/10.70124/5n77a-dnf02
Dataset updated
Apr 14, 2025
Dataset provided by
TU Wien
Authors
Sufyan Yousaf
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Description

Context and Methodology

Research Domain/Project:
This dataset was created for a machine learning experiment aimed at developing a classification model to predict outcomes based on a set of features. The primary research domain is disease prediction in patients. The dataset was used in the context of training, validating, and testing.

Purpose of the Dataset:
The purpose of this dataset is to provide training, validation, and testing data for the development of machine learning models. It includes labeled examples that help train classifiers to recognize patterns in the data and make predictions.

Dataset Creation:
Data preprocessing steps involved cleaning, normalization, and splitting the data into training, validation, and test sets. The data was carefully curated to ensure its quality and relevance to the problem at hand. For any missing values or outliers, appropriate handling techniques were applied (e.g., imputation, removal, etc.).

Technical Details

Structure of the Dataset:
The dataset consists of several files organized into folders by data type:

Training Data: Contains the training dataset used to train the machine learning model.

Validation Data: Used for hyperparameter tuning and model selection.

Test Data: Reserved for final model evaluation.

Each folder contains files with consistent naming conventions for easy navigation, such as train_data.csv, validation_data.csv, and test_data.csv. Each file follows a tabular format with columns representing features and rows representing individual data points.
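Loading the three splits described above might look like the sketch below. It assumes each CSV has a header row and that the caller supplies the file locations, since the folder names are not given in the description; `read_table` and `load_splits` are hypothetical helper names, not part of the dataset.

```python
import csv


def read_table(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_splits(paths):
    """Load each split and check that all splits share the same columns.

    `paths` maps a split name ("train", "validation", "test") to the
    location of the corresponding CSV file.
    """
    splits = {name: read_table(path) for name, path in paths.items()}
    headers = {tuple(rows[0].keys()) for rows in splits.values() if rows}
    if len(headers) > 1:
        raise ValueError("splits do not share the same columns")
    return splits
```

A call such as `load_splits({"train": "train_data.csv", "validation": "validation_data.csv", "test": "test_data.csv"})` keeps the test rows separate, so they can be held back for the final evaluation.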

Software Requirements:
To open and work with this dataset, you need an environment such as VS Code or Jupyter, together with tools like:

Python (with libraries such as pandas, numpy, scikit-learn, matplotlib, etc.)

Further Details

Reusability:
Users of this dataset should be aware that it is designed for machine learning experiments involving classification tasks. The dataset is already split into training, validation, and test subsets. Any model trained with this dataset should be evaluated using the test set to ensure proper validation.

Limitations:
The dataset may not cover all edge cases, and it might have biases depending on the selection of data sources. It's important to consider these limitations when generalizing model results to real-world applications.
Data from: Non-listed disease report to OIE (World Organisation for Animal...
catalog.data.gov
data.usgs.gov
Updated Nov 12, 2025
Cite
U.S. Geological Survey (2025). Non-listed disease report to OIE (World Organisation for Animal Health) for the 1st semester of 2019 [Dataset]. https://catalog.data.gov/dataset/non-listed-disease-report-to-oie-world-organisation-for-animal-health-for-the-1st-semester
Explore at:
Dataset updated
Nov 12, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
As a member of the World Organisation for Animal Health (OIE), and the reporting authority for the United States, the USGS National Wildlife Health Center (NWHC) is responsible for reporting wildlife disease outbreaks that involve diseases which are not OIE-listed (https://www.oie.int/wahis_2/public/wahidwild.php# ). These outbreaks are to be reported on a semesterly basis via OIE’s WAHIS-Wild reporting system. The data fields described within are based on those in WAHIS-Wild. Since OIE’s reporting mechanism is based primarily on domestic and agricultural animals, several of the variables are not applicable to wildlife (i.e. vaccination, slaughtered, etc.). In an effort to use a consistent data source that is broad in scope and captures information from around the country, from various natural resource management authorities, NWHC will use the Wildlife Health Information Sharing Partnership - Event Reporting System (WHISPers - https://whispers.usgs.gov/home ) as the sole source to generate and supply the requested information to OIE. Data supplied to OIE have been restricted to publicly available information on wildlife morbidity/mortality and surveillance events in WHISPers.
Counts of Disease caused by West Nile virus reported in UNITED STATES OF...
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Disease caused by West Nile virus reported in UNITED STATES OF AMERICA: 2002-2005 [Dataset]. https://www.tycho.pitt.edu/dataset/US.417093003
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
2002 - 2005
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
NNDSS - TABLE 1O. Hansen's disease to Hantavirus pulmonary syndrome
catalog.data.gov
datahub.hhs.gov
Updated Jul 9, 2025
Cite
Centers for Disease Control and Prevention (2025). NNDSS - TABLE 1O. Hansen's disease to Hantavirus pulmonary syndrome [Dataset]. https://catalog.data.gov/dataset/nndss-table-1o-hansens-disease-to-hantavirus-pulmonary-syndrome
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
NNDSS - TABLE 1O. Hansen's disease to Hantavirus pulmonary syndrome - 2020. In this Table, provisional cases* of notifiable diseases are displayed for United States, U.S. territories, and Non-U.S. residents.

Note: This table contains provisional cases of national notifiable diseases from the National Notifiable Diseases Surveillance System (NNDSS). NNDSS data from the 50 states, New York City, the District of Columbia and the U.S. territories are collated and published weekly on the NNDSS Data and Statistics web page (https://wwwn.cdc.gov/nndss/data-and-statistics.html). Cases reported by state health departments to CDC for weekly publication are provisional because of the time needed to complete case follow-up. Therefore, numbers presented in later weeks may reflect changes made to these counts as additional information becomes available. The national surveillance case definitions used to define a case are available on the NNDSS web site at https://wwwn.cdc.gov/nndss/. Information about the weekly provisional data and guides to interpreting data are available at: https://wwwn.cdc.gov/nndss/infectious-tables.html.

Footnotes:
- U: Unavailable. The reporting jurisdiction was unable to send the data to CDC or CDC was unable to process the data.
- -: No reported cases. The reporting jurisdiction did not submit any cases to CDC.
- N: Not reportable. The disease or condition was not reportable by law, statute, or regulation in the reporting jurisdiction.
- NN: Not nationally notifiable. This condition was not designated as being nationally notifiable.
- NP: Nationally notifiable but not published.
- NC: Not calculated. There is insufficient data available to support the calculation of this statistic.
- Cum: Cumulative year-to-date counts.
- Max: Maximum. Maximum case count during the previous 52 weeks.

* Case counts for reporting years 2019 and 2020 are provisional and subject to change. Cases are assigned to the reporting jurisdiction submitting the case to NNDSS, if the case's country of usual residence is the U.S., a U.S. territory, unknown, or null (i.e. country not reported); otherwise, the case is assigned to the 'Non-U.S. Residents' category. Country of usual residence is currently not reported by all jurisdictions or for all conditions. For further information on interpretation of these data, see https://wwwn.cdc.gov/nndss/document/Users_guide_WONDER_tables_cleared_final.pdf.
† Previous 52 week maximum and cumulative YTD are determined from periods of time when the condition was reportable in the jurisdiction (i.e., may be less than 52 weeks of data or incomplete YTD data).
§ Includes data for old world hantavirus infections, such as Seoul virus infections. Prior to 2015, this condition was not nationally notifiable and data for this condition was not submitted to CDC's National Notifiable Diseases Surveillance System (NNDSS).
¶ Includes data for Andes virus infections.
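When reading these tables programmatically, the footnote codes need to be kept apart from numeric counts. The sketch below shows one way to do that. The codes and their meanings are taken from the footnotes; the assumption that a cell holds either a code or an integer, and the helper name `parse_cell`, are illustrative and may not match how the published files actually store these flags.

```python
# Footnote codes and meanings as given in the table description.
FOOTNOTE_CODES = {
    "U": "Unavailable",
    "-": "No reported cases",
    "N": "Not reportable",
    "NN": "Not nationally notifiable",
    "NP": "Nationally notifiable but not published",
    "NC": "Not calculated",
}


def parse_cell(raw):
    """Return (count, flag) for one table cell.

    count is an int when the cell is numeric, otherwise None;
    flag is the footnote meaning when the cell is a code, otherwise None.
    """
    text = raw.strip()
    if text in FOOTNOTE_CODES:
        return None, FOOTNOTE_CODES[text]
    # Tolerate thousands separators such as "1,234".
    return int(text.replace(",", "")), None
```

Keeping `-` as a flagged missing value rather than converting it to zero follows the footnote's wording, which says no cases were submitted, not that zero cases were confirmed.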

Mohamadreza Momeni (2024). Pandemics in World [Dataset]. https://www.kaggle.com/datasets/imtkaggleteam/pandemics-in-world

Pandemics in World

"an epidemic occurring on a scale that crosses international boundaries."

Explore at:

102 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (1428134 bytes)

Dataset updated

Jan 9, 2024

Authors

Mohamadreza Momeni

Area covered

World

Description

By Saloni Dattani, Lucas Rodés-Guirao, Edouard Mathieu, Hannah Ritchie and Max Roser.

Data description:

Disease outbreaks may be inevitable, but large-scale pandemics are not. The world can respond swiftly and effectively to pandemic risks in the future with better understanding, resources, and effort.

To avoid suffering through another large pandemic, we have to take the risk of pandemics seriously. Despite warnings that another one was likely, the COVID-19 pandemic killed more than 27 million people.

We must build the capacity to test for pathogens and understand them: which pathogens put us at the greatest risk, how they spread, and how to tackle them.

We know it is possible to greatly reduce the risk of infectious disease. We’ve learned over history how to reduce their impact with vaccines, public health efforts, and medicine.

In addition to the old risks, we face new threats from factory farming, genetic modification, climate change, and antimicrobial resistance. With more attention and effort, we can reduce their risks too.

Good luck in your analysis.
