100+ datasets found

Death in the United States
kaggle.com
zip
Updated Aug 3, 2017
Cite
Centers for Disease Control and Prevention (2017). Death in the United States [Dataset]. https://www.kaggle.com/datasets/cdc/mortality
Explore at:
Available download formats: zip (766333584 bytes)
Dataset updated
Aug 3, 2017
Dataset authored and provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics System. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.

It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.

Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.

Overview

This dataset is a collection of CSV files, each containing one year's worth of data, and paired JSON files containing the code mappings, plus an ICD-10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors, as the text extracted from the PDFs is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.

A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of the data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.
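
Because code meanings can drift between years, it may help to compare the yearly code mappings before pooling data. The following is a minimal sketch, not the dataset's own tooling: the JSON layout (a field name mapping codes to labels) is an assumption, the function name `changed_entries` is hypothetical, and only the ICD ranges for entry 069 are taken from the description above.

```python
import json

# Hypothetical excerpts of two yearly code-mapping files. The nesting
# (field -> code -> label) is assumed; only the ICD ranges for entry 069
# come from the dataset description.
codes_2005 = json.loads('{"113_cause_recode": {"069": "(I10,I12)"}}')
codes_2015 = json.loads('{"113_cause_recode": {"069": "(I10,I12,I15)"}}')


def changed_entries(old, new, field):
    """Return {code: (old_label, new_label)} for codes whose label differs."""
    old_map = old.get(field, {})
    new_map = new.get(field, {})
    return {
        code: (old_map.get(code), new_map.get(code))
        for code in sorted(set(old_map) | set(new_map))
        if old_map.get(code) != new_map.get(code)
    }


diff = changed_entries(codes_2005, codes_2015, "113_cause_recode")
print(diff)
```

Codes that appear in the result are candidates for harmonization before comparing counts across years.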

All data comes from the CDC’s National Vital Statistics System, with the exception of the Icd10Code set, which is sourced from the World Health Organization.

Project ideas

The CDC's mortality data was the basis of a widely publicized paper, by Anne Case and Nobel prize winner Angus Deaton, arguing that middle-aged whites are dying at elevated rates. One of the criticisms against the paper is that it failed to properly account for the exact ages within the broad bins available through the CDC's WONDER tool. What do these results look like with exact/not-binned age data?

Similarly, how sensitive are the mortality trends being discussed in the news to the choice of bin-widths?
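
One way to start on the bin-width question is to aggregate the same exact ages under several widths and compare the results. This is a rough sketch on synthetic ages, not on the dataset itself; the function names `bin_counts` and `mean_age_in_bin` are hypothetical.

```python
from collections import Counter


def bin_counts(ages, width):
    """Count records per age bin of the given width, keyed by (low, high)."""
    counts = Counter((age // width) * width for age in ages)
    return {(low, low + width - 1): n for low, n in sorted(counts.items())}


def mean_age_in_bin(ages, low, high):
    """Mean exact age of the records falling inside [low, high], or None."""
    inside = [age for age in ages if low <= age <= high]
    return sum(inside) / len(inside) if inside else None


# Synthetic exact ages at death (illustrative only).
ages = [45, 46, 49, 50, 51, 53, 54, 54]

print(bin_counts(ages, 10))
print(bin_counts(ages, 5))
print(mean_age_in_bin(ages, 45, 54))
```

Two years can share identical counts in a wide bin while the mean exact age inside that bin shifts, which is the kind of effect exact-age data can expose.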

As noted above, the data preparation process could have introduced errors. Can you find any discrepancies compared to the aggregate metrics on WONDER? If so, please let me know in the forums!

WONDER is cited in numerous economics, sociology, and public health research papers. Can you find any papers whose conclusions would be altered if they used the exact data available here rather than binned data from WONDER?

Differences from the first version of the dataset

This version of the dataset was prepared in a completely different manner. This has allowed us to provide a much larger volume of data and ensure that codes are available for every field.

We've replaced the batch of SQL files with a single JSON per year. Kaggle's platform currently offers better support for JSON files, and this keeps the number of files manageable.

A tutorial kernel providing a quick introduction to the new format is available here.

Lastly, I apologize if the transition has interrupted anyone's work! If need be, you can still download v1.
CORONAVIRUS DEATHS by Country Dataset
tradingeconomics.com
csv, excel, json, xml
Updated Mar 4, 2020
+ more versions
Cite
TRADING ECONOMICS (2020). CORONAVIRUS DEATHS by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/coronavirus-deaths
Explore at:
Available download formats: csv, excel, xml, json
Dataset updated
Mar 4, 2020
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
World
Description
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Death Profiles by County
data.chhs.ca.gov
healthdata.gov
+3 more
csv, zip
Updated Oct 2, 2025
+ more versions
Cite
California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
Explore at:
Available download formats: zip, csv(28125832), csv(60023260), csv(15127221), csv(60201673), csv(75015194), csv(5095), csv(52019564), csv(73906266), csv(74351424), csv(1128641), csv(24235858), csv(74497014), csv(74043128), csv(26976161), csv(74689382), csv(51592721), csv(60676655), csv(11738570), csv(60517511)
Dataset updated
Oct 2, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Child and Infant Mortality
kaggle.com
Updated Aug 21, 2022
Cite
hrterhrter (2022). Child and Infant Mortality [Dataset]. https://www.kaggle.com/datasets/programmerrdai/child-and-infant-mortality
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 21, 2022
Dataset provided by
Kaggle
Authors
hrterhrter
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
One in every 100 children dies before completing one year of life. Around 68 percent of infant mortality is attributed to deaths of children before completing 1 month. 15,000 children die every day: child mortality is an everyday tragedy of enormous scale that rarely makes the headlines. Child mortality rates have declined in all world regions, but the world is not on track to reach the Sustainable Development Goal for child mortality. Before the Modern Revolution child mortality was very high in all societies that we have knowledge of: a quarter of all children died in the first year of life, and almost half died before reaching the end of puberty. Over the last two centuries all countries in the world have made very rapid progress against child mortality. From 1800 to 1950 global mortality halved from around 43% to 22.5%. Since 1950 the mortality rate has declined five-fold to 4.5% in 2015. All countries in the world have benefitted from this progress. In the past it was very common for parents to see children die, because both child mortality rates and fertility rates were very high. In Europe in the mid 18th century parents lost on average between 3 and 4 of their children. Based on this overview we are asking where the world is today: where are children dying and what are they dying from?

5.4 million children died in 2017 – Where did these children die? Pneumonia is the most common cause of death, preterm births and neonatal disorders is second, and diarrheal diseases are third – What are children today dying from? This is the basis for answering the question what can we do to make further progress against child mortality? We will extend this entry over the course of 2020.

@article{owidchildmortality, author = {Max Roser, Hannah Ritchie and Bernadeta Dadonaite}, title = {Child and Infant Mortality}, journal = {Our World in Data}, year = {2013}, note = {https://ourworldindata.org/child-mortality} }
Statewide Death Profiles
data.chhs.ca.gov
data.ca.gov
+3 more
csv, zip
Updated Oct 2, 2025
+ more versions
Cite
California Department of Public Health (2025). Statewide Death Profiles [Dataset]. https://data.chhs.ca.gov/dataset/statewide-death-profiles
Explore at:
Available download formats: csv(419332), csv(5034), csv(5401561), csv(463460), csv(2026589), csv(16301), csv(200270), csv(4689434), zip, csv(164006), csv(429224)
Dataset updated
Oct 2, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Provisional COVID-19 death counts, rates, and percent of total deaths, by...
catalog.data.gov
data.virginia.gov
+2 more
Updated Sep 26, 2025
+ more versions
Cite
Centers for Disease Control and Prevention (2025). Provisional COVID-19 death counts, rates, and percent of total deaths, by jurisdiction of residence [Dataset]. https://catalog.data.gov/dataset/provisional-covid-19-death-counts-rates-and-percent-of-total-deaths-by-jurisdiction-of-res
Explore at:
Dataset updated
Sep 26, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
This file contains COVID-19 death counts, death rates, and percent of total deaths by jurisdiction of residence. The data is grouped by different time periods including 3-month period, weekly, and total (cumulative since January 1, 2020). United States death counts and rates include the 50 states, plus the District of Columbia and New York City. New York state estimates exclude New York City. Puerto Rico is included in HHS Region 2 estimates. Deaths with confirmed or presumed COVID-19, coded to ICD–10 code U07.1. Number of deaths reported in this file are the total number of COVID-19 deaths received and coded as of the date of analysis and may not represent all deaths that occurred in that period. Counts of deaths occurring before or after the reporting period are not included in the file. Data during recent periods are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction and cause of death. Death counts should not be compared across states. Data timeliness varies by state. Some states report deaths on a daily basis, while other states report deaths weekly or monthly. The ten (10) United States Department of Health and Human Services (HHS) regions include the following jurisdictions. 
Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont; Region 2: New Jersey, New York, New York City, Puerto Rico; Region 3: Delaware, District of Columbia, Maryland, Pennsylvania, Virginia, West Virginia; Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee; Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, Wisconsin; Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, Texas; Region 7: Iowa, Kansas, Missouri, Nebraska; Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, Wyoming; Region 9: Arizona, California, Hawaii, Nevada; Region 10: Alaska, Idaho, Oregon, Washington.
Rates were calculated using the population estimates for 2021, which are estimated as of July 1, 2021 based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Blended Base consists of the blend of Vintage 2020 postcensal population estimates, 2020 Demographic Analysis Estimates, and 2020 Census PL 94-171 Redistricting File (see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2020-2021/methods-statement-v2021.pdf). Rates are based on deaths occurring in the specified week/month and are age-adjusted to the 2000 standard population using the direct method (see https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-08-508.pdf). These rates differ from the annual age-adjusted rates typically presented in NCHS publications, which are based on a full year of data, and from annualized weekly/monthly age-adjusted rates, which have been adjusted to allow comparison with annual rates. Annualized rates present deaths per year per 100,000 population that would be expected in a year if the observed period-specific (weekly/monthly) rate prevailed for a full year. Sub-national death counts between 1 and 9 are suppressed in accordance with NCHS data confidentiality standards.
Rates based on death counts less than 20 are suppressed in accordance with NCHS standards of reliability as specified in NCHS Data Presentation Standards for Proportions (available from: https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf).
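
The direct method and the suppression rules described above can be illustrated with a short sketch. This is not NCHS code: the function names are hypothetical, and the age groups, populations, and standard weights below are made-up values rather than the actual 2000 standard population.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Direct method: weight each age-specific rate by its standard-population share."""
    total_weight = sum(standard_weights)
    weighted = sum(
        (d / p) * (w / total_weight)
        for d, p, w in zip(deaths, population, standard_weights)
    )
    return weighted * per


def publishable_count(count):
    """Suppress (return None for) sub-national counts between 1 and 9."""
    return None if 1 <= count <= 9 else count


def publishable_rate(rate, count):
    """Suppress (return None for) rates based on fewer than 20 deaths."""
    return None if count < 20 else rate


# Two hypothetical age groups; the weights are illustrative only.
deaths = [10, 30]
population = [100_000, 50_000]
weights = [0.6, 0.4]

rate = age_adjusted_rate(deaths, population, weights)
print(rate, publishable_rate(rate, sum(deaths)), publishable_count(7))
```

Applying the two suppression checks separately mirrors the description: a count can be publishable while the rate derived from it is still suppressed.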
Leading causes of death, total population, by age group
www150.statcan.gc.ca
open.canada.ca
Updated Feb 19, 2025
+ more versions
Cite
Government of Canada, Statistics Canada (2025). Leading causes of death, total population, by age group [Dataset]. http://doi.org/10.25318/1310039401-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310039401-eng
Dataset updated
Feb 19, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
Deaths, by month
www150.statcan.gc.ca
gimi9.com
+2 more
Updated Feb 19, 2025
Cite
Government of Canada, Statistics Canada (2025). Deaths, by month [Dataset]. http://doi.org/10.25318/1310070801-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310070801-eng
Dataset updated
Feb 19, 2025
Dataset provided by
Government of Canada (http://www.gg.ca/)
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number and percentage of deaths, by month and place of residence, 1991 to most recent year.
World Bank Group Entrepreneurship, Entreprenuership Database World Bank,...
geocommons.com
Updated Apr 29, 2008
Cite
data (2008). World Bank Group Entrepreneurship, Entreprenuership Database World Bank, World, 2007 [Dataset]. http://geocommons.com/search.html
Explore at:
Dataset updated
Apr 29, 2008
Dataset provided by
World Bank Group Entrepreneurship
data
Description
The 2007 World Bank Group Entrepreneurship Survey measures entrepreneurial activity in 84 developing and industrial countries over the period 2003-2005. The database includes cross-country, time-series data on the number of total and newly registered businesses, collected directly from Registrars of Companies around the world. In its second year, this survey incorporates improvements in methodology and expanded participation from the countries covered, allowing for greater cross-border compatibility of data compared with the 2006 survey. This joint effort by the IFC SME Department and the World Bank Development Research Group is the most comprehensive dataset on cross-country firm entry data available today. The World Bank Group Entrepreneurship Dataset presents data collected primarily from country business registries using the first annual World Bank Group Questionnaire on Entrepreneurship (alternative sources were tax authorities, finance ministries, and national statistics offices). For more information on the author of the database, Leora Klapper, visit: http://go.worldbank.org/DK5AHCQSO0. This data was accessed at the preceding link on October 11, 2007. Please visit the link for more information regarding this dataset.
CIA Factbook, Death Rate by Country, World, 2007
geocommons.com
Updated May 27, 2008
Cite
data (2008). CIA Factbook, Death Rate by Country, World, 2007 [Dataset]. http://geocommons.com/search.html
Explore at:
Dataset updated
May 27, 2008
Dataset provided by
data
Description
This dataset gives the average annual number of deaths during a year per 1,000 population at midyear; also known as crude death rate. This information was found at the CIA's World Factbook 2007. The site had this to say about death rate, "The death rate, while only a rough indicator of the mortality situation in a country, accurately indicates the current mortality impact on population growth. This indicator is significantly affected by age distribution, and most countries will eventually show a rise in the overall death rate, in spite of continued decline in mortality at all ages, as declining fertility results in an aging population." Source: https://www.cia.gov/library/publications/the-world-factbook/docs/notesanddefs.html#2010 Accessed: 9.17.07
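
As a small worked illustration of the definition above, the crude death rate is the number of deaths in a year divided by the midyear population, scaled to 1,000. The function name and the input figures below are hypothetical, chosen only to show the arithmetic.

```python
def crude_death_rate(deaths, midyear_population, per=1_000):
    """Deaths during the year per `per` people in the midyear population."""
    if midyear_population <= 0:
        raise ValueError("midyear_population must be positive")
    return deaths / midyear_population * per


# Made-up country: 8,000 deaths in a midyear population of 1,000,000.
example_rate = crude_death_rate(8_000, 1_000_000)
print(round(example_rate, 2))
```

Because the rate is not age-adjusted, two countries with identical age-specific mortality can still show different values if their age distributions differ, as the Factbook note points out.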
OWID COVID19
kaggle.com
Updated Oct 27, 2020
Cite
beluga (2020). OWID COVID19 [Dataset]. https://www.kaggle.com/gaborfodor/owid-covid19/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 27, 2020
Dataset provided by
Kaggle
Authors
beluga
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Content

Widely available data on confirmed cases only becomes meaningful when it can be interpreted in light of how much a country is testing. This is why Our World in Data built the global database on COVID-19 testing [1]. The additional smoothing and per capita rates make different countries (somewhat) comparable.

Our World in Data also published a good overview of global causes of death two years ago [2]. I shared that data as well for additional comparisons.

Acknowledgements

[1] Max Roser, Hannah Ritchie, Esteban Ortiz-Ospina and Joe Hasell (2020) - "Coronavirus Pandemic (COVID-19)". Published online at OurWorldInData.org. https://ourworldindata.org/coronavirus

[2] Hannah Ritchie (2018) - "Causes of Death". Published online at OurWorldInData.org. https://ourworldindata.org/causes-of-death

Inspiration

First and second wave differences in Europe

Global Forecasting for total 2020 numbers

Covid death toll compared to main cause of death
Deaths Involving COVID-19 by Vaccination Status
open.canada.ca
gimi9.com
+1 more
csv, docx, html, xlsx
Updated Jul 30, 2025
Cite
Government of Ontario (2025). Deaths Involving COVID-19 by Vaccination Status [Dataset]. https://open.canada.ca/data/dataset/1375bb00-6454-4d3e-a723-4ae9e849d655
Explore at:
Available download formats: docx, csv, html, xlsx
Dataset updated
Jul 30, 2025
Dataset provided by
Government of Ontario (https://www.ontario.ca/)
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Mar 1, 2021 - Nov 12, 2024
Description
This dataset reports daily 7-day moving average rates of deaths involving COVID-19 by vaccination status and by age group. Learn how the Government of Ontario is helping to keep Ontarians safe during the 2019 Novel Coronavirus outbreak. Effective November 14, 2024 this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool

Data includes:
* Date on which the death occurred
* Age group
* 7-day moving average of the last seven days of the death rate per 100,000 for those not fully vaccinated
* 7-day moving average of the last seven days of the death rate per 100,000 for those fully vaccinated
* 7-day moving average of the last seven days of the death rate per 100,000 for those vaccinated with at least one booster

Additional notes

As of June 16, all COVID-19 datasets will be updated weekly on Thursdays by 2pm. As of January 12, 2024, data from the date of January 1, 2024 onwards reflect updated population estimates. This update specifically impacts data for the 'not fully vaccinated' category. On November 30, 2023 the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023. CCM is a dynamic disease reporting system which allows ongoing updates to data previously entered. As a result, data extracted from CCM represents a snapshot at the time of extraction and may differ from previous or subsequent results. Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and current totals being different from previously reported cases and deaths. Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.

The data does not include vaccination data for people who did not provide consent for vaccination records to be entered into the provincial COVaxON system. This includes individual records as well as records from some Indigenous communities where those communities have not consented to including vaccination information in COVaxON. The “Not fully vaccinated” category includes people with no vaccine and people with one dose of a double-dose vaccine. The “People with one dose of double-dose vaccine” category has a small and constantly changing number, so combining the two stabilizes the results.

Spikes, negative numbers and other data anomalies: Due to ongoing data entry and data quality assurance activities in the Case and Contact Management system (CCM) file, Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes, negative numbers and current totals being different from previously reported case and death counts. Public Health Units report cause of death in the CCM based on information available to them at the time of reporting and in accordance with definitions provided by Public Health Ontario. The medical certificate of death is the official record and the cause of death could be different. Deaths are defined per the outcome field in CCM marked as “Fatal”. Deaths in COVID-19 cases identified as unrelated to COVID-19 are not included in the Deaths involving COVID-19 reported. Rates for the most recent days are subject to reporting lags. All data reflects totals from 8 p.m. the previous day. This dataset is subject to change.
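
The 7-day moving average columns described above can be thought of as a trailing mean over the current day and the six days before it. The sketch below is an assumption about that windowing, not the publisher's exact procedure; the function name `trailing_mean` and the daily values are made up.

```python
def trailing_mean(values, window=7):
    """Trailing moving average; None until a full window of values is available."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window : i + 1]) / window)
    return averages


# Synthetic daily death rates per 100,000 (illustrative only).
daily_rates = [0, 7, 0, 7, 0, 7, 0, 7]
smoothed = trailing_mean(daily_rates)
print(smoothed)
```

Since late corrections can introduce spikes or negative values into the daily series, the smoothed series for the most recent days can change between extractions.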
An Extensive Dataset for the Heart Disease Classification System
data.mendeley.com
Updated Feb 15, 2022
+ more versions
Cite
Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.1
Explore at:
Unique identifier
https://doi.org/10.17632/65gxgy2nmg.1
Dataset updated
Feb 15, 2022
Authors
Sozan S. Maghdid
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70.

A comprehensive database of factors that contribute to a heart attack has been constructed. The main purpose here is to collect characteristics of heart attacks or factors that contribute to them. A form was created in Microsoft Excel to accomplish this. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin are the input fields, while the output field records the presence of a heart attack and is divided into two categories (negative and positive). Negative refers to the absence of a heart attack, while positive refers to the presence of a heart attack. Table 1 shows the detailed information and the maximum and minimum attribute values for 1319 cases in the whole database.

To confirm the validity of this data, we looked at the patient files in the hospital archive and compared them with the data stored in the laboratories system. In addition, we interviewed the patients and specialized doctors. Table 2 is a sample for 1320 cases, which shows 44 cases and the factors that lead to a heart attack in the whole database. After collecting this data, we checked whether it has null values (invalid values) or whether there was an error during data collection. The value is null if it is unknown. Null values necessitate special treatment. This value is used to indicate that the target isn’t a valid data element. When trying to retrieve data that isn't present, you can come across the keyword null in Processing. If you try to do arithmetic operations on a numeric column with one or more null values, the outcome will be null. An example of null value processing is shown in Figure 2.

The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages. The first is to avoid attributes in larger numeric ranges overshadowing attributes in smaller numeric ranges. The second is to avoid numerical problems throughout the process. After completion of the normalization process, we split the data set into two parts, training and test sets. We used 1060 cases for training and 259 for testing. Modeling was implemented using the input and output variables.
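
The scaling and splitting steps described above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the description does not say how cases were assigned to the two sets, so the seeded random shuffle is assumed, and the function names and sample values are hypothetical.

```python
import random


def min_max_scale(column):
    """Rescale a numeric column to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


def train_test_split(rows, n_train, seed=0):
    """Shuffle with a fixed seed (an assumption) and cut into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Made-up heart-rate values, then a split sized like the one in the text.
scaled = min_max_scale([50, 75, 100])
train, test = train_test_split(range(1319), n_train=1060)
print(scaled, len(train), len(test))
```

In practice the minimum and maximum would usually be computed on the training part only and then reused for the test part, so that no information from the test set leaks into the scaling.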
UNEP, Diseases of the Respiratory System - Number of Deaths per 100000...
geocommons.com
Updated Jun 2, 2008
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data (2008). UNEP, Diseases of the Respiratory System - Number of Deaths per 100000 Population by Country, World, 1979-2003 [Dataset]. http://geocommons.com/search.html
Explore at:
Dataset updated
Jun 2, 2008
Dataset provided by
UNEP-United Nations Environment Programme
data
Description
Diseases of the Respiratory System: Effects are generally irritation and reduced lung function with increased incidence of respiratory disease, especially in more susceptible members of the population such as young children, the elderly and asthmatics. Diseases of the Respiratory System includes: ICD-9 BTL codes B31-B32; ICD-9 code CH08 for some ex-USSR countries; ICD-9 code C052 for China; ICD-10 codes J00-J99. Missing figures for some European countries come from the European mortality indicator database (HFA-MDB), available at www.euro.who.int, indicator "3250 Deaths, Diseases of the Respiratory System". The original dataset uses a value of -9999 to indicate no data available; I have substituted a value of 0. Online resource: http://geodata.grid.unep.ch URL original source: http://www3.who.int/whosis/mort/text/download.cfm?path=whosis,evidence,whsa,mort_download&language=english
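Because 0 is also a valid rate, substituting it for the -9999 sentinel makes missing entries indistinguishable from real zeros and pulls averages down. The sketch below shows one alternative for anyone working from the original source files, where the sentinel is still present. The sample rates and function names are hypothetical, and this is not the uploader's procedure.

```python
NO_DATA = -9999  # sentinel the original dataset uses for "no data available"


def decode(values, sentinel=NO_DATA):
    """Replace sentinel entries with None so they are not read as real zeros."""
    return [None if v == sentinel else v for v in values]


def mean_ignoring_missing(values):
    """Average only the entries that are present; None if nothing is present."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical deaths per 100000 for one country over three years.
raw = [52.0, NO_DATA, 48.0]
decoded = decode(raw)

mean_missing_aware = mean_ignoring_missing(decoded)
# Zero-filling, as done in this upload, counts the gap as a real value of 0.
mean_zero_filled = sum(0 if v is None else v for v in decoded) / len(decoded)
```

With these sample values the missing-aware mean is 50.0, while the zero-filled mean drops to about 33.3.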
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Data from: Epidemiology, resource use, and treatment patterns of locally...
tandf.figshare.com
docx
Updated Mar 3, 2025
Cite
Florence Joly; Stephane Culine; Morgan Roupret; Aurore Tricotel; Emilie Casarotto; Sandrine Brice; Rafael Minacori; Marthe Vuillet; Marie-Catherine Thomas; Kirsten Leyland; Anil Upadhyay; Vicki Munro; Torsten Strunz-McKendry (2025). Epidemiology, resource use, and treatment patterns of locally advanced or metastatic urothelial carcinoma in France [Dataset]. http://doi.org/10.6084/m9.figshare.28450102.v1
Explore at:
docx (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.28450102.v1
Dataset updated
Mar 3, 2025
Dataset provided by
Taylor & Francis
Authors
Florence Joly; Stephane Culine; Morgan Roupret; Aurore Tricotel; Emilie Casarotto; Sandrine Brice; Rafael Minacori; Marthe Vuillet; Marie-Catherine Thomas; Kirsten Leyland; Anil Upadhyay; Vicki Munro; Torsten Strunz-McKendry
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Describe real-world epidemiology, treatment patterns, health care resource utilization, and costs of locally advanced or metastatic urothelial carcinoma (la/mUC) in France. Retrospective study including all adults with la/mUC diagnosis during January 2017 to December 2020 in the PMSI database. Annual prevalence and incidence ranged from 36.4 to 38.9 and 16.4 to 18.5 cases per 100,000 people, respectively. Of the 25,314 patients with incident la/mUC, 37.6% did not receive first-line systemic treatment. Of the 14,656 patients who started first-line systemic treatment, 66.6%, 22.5%, and 10.9% received 1, 2, and 3 lines of therapy, respectively. Annual per-patient costs in second-/third-line setting ranged from €8803 to €16,012. The substantial disease burden of la/mUC in France highlights the unmet need for new therapies.

What is this article about?

Urothelial carcinoma (UC) is a type of cancer affecting the urinary system. It can spread to other parts of the body, described as locally advanced or metastatic (la/m). We used information from a French database recording hospitalizations in France to find out how many people have la/mUC, how many new cases develop each year, what treatments they receive, how many die in the hospital, and how much their care costs.

What were the results?

Based on database information, 37 to 39 of every 100,000 people have la/mUC and 17 to 19 of every 100,000 people are identified with a new case yearly. Slightly more than one-third of patients with la/mUC did not receive recommended treatment (chemotherapy) when first diagnosed. Chemotherapy was the most common treatment type for the first, second, or third treatment; checkpoint inhibitors (a unique treatment) became more commonly used as a second treatment over time. Yearly in-hospital death rates were high, ranging from 47.8% of patients who died within 1 year from diagnosis to 62.9% dying within 3 years. Yearly cost of care was high (costing €8803 to €16,012) in patients starting a second or third treatment.

What do the results of the study mean?

The study shows many patients may not be fit enough or choose not to receive treatment. Even those receiving treatment are at high risk for poor outcomes. The burden of la/mUC in France is high, underscoring the need for more therapies and better supportive care early in disease management.
Anime Quest Dataset
kaggle.com
Updated Jun 28, 2023
Cite
Md Yasmi Tohabar Evon (2023). Anime Quest Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6045074
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6045074
Dataset updated
Jun 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Md Yasmi Tohabar Evon
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset contains information about Anime scraped from Anime Planet on 28/06/2023. It contains information about anime (episodes, aired date, rating, genre, etc.), and favorite anime based on the countries and top countries that watch the most anime.

The scraping program for this dataset is in the Anime.Quest GitHub repository.

A Tableau visualization of this dataset can also be found in Anime Quest: Visualization.

Content

The dataset contains 3 files:

📁 anime_data.csv:
1. Name: Full name of the anime
2. Media Type: TV, Web, Movie, etc.
3. Episodes: Total episodes of the anime
4. Studio: Name of the studios of the anime, from most recent to oldest.
5. Start Year: Release Year of the anime
6. End Year: Last year of the anime airing
7. Ongoing: Is the anime currently airing or not? True or False.
8. Release Season: Spring, Fall, Winter, and Summer
9. Rating: The global rating ranges from 0 to 5.
10. Rank: Global ranking of the anime
11. Members: Total members of the anime
12. Genre: The category of the anime
13. Creator: Creator of the anime

📁 anime_top_by_country_data.csv:
1. Country: Individual country name
2. Most Popular: The most popular anime in the country
3. 2nd Place: Second-most popular anime in the country
4. 3rd Place: Third-most popular anime in the country
5. 4th Place: Fourth-most popular anime in the country
6. 5th Place: The fifth-most popular anime in the country

📁 anime_watching_data.csv:
1. Rank: Ranking of countries based on the number of anime viewers
2. Country: Individual country name
3. Population: Total population of the country
4. Percentage of People Watching: Percentage of people watching anime in the country
5. Number of People Watching: Total number of people watching anime in the country

Acknowledgements

The website Anime Planet was used to scrape this dataset. Please include citations for this dataset if you use it in your own research.

Inspiration

This dataset can be used to find the factors determining an anime's rating and ranking. It can also be used to make anime recommendations and to look for patterns across anime.
Shuttle Radar Topography Mission 1-arc second Global
catalog.data.gov
cmr.earthdata.nasa.gov
Updated Apr 11, 2025
Cite
DOI/USGS/EROS (2025). Shuttle Radar Topography Mission 1-arc second Global [Dataset]. https://catalog.data.gov/dataset/shuttle-radar-topography-mission-1-arc-second-global
Dataset updated
Apr 11, 2025
Dataset provided by
DOI/USGS/EROS
Description
The Shuttle Radar Topography Mission (SRTM) was flown aboard the space shuttle Endeavour February 11-22, 2000. The National Aeronautics and Space Administration (NASA) and the National Geospatial-Intelligence Agency (NGA) participated in an international project to acquire radar data which were used to create the first near-global set of land elevations.

The radars used during the SRTM mission were actually developed and flown on two Endeavour missions in 1994. The C-band Spaceborne Imaging Radar and the X-Band Synthetic Aperture Radar (X-SAR) hardware were used on board the space shuttle in April and October 1994 to gather data about Earth's environment. The technology was modified for the SRTM mission to collect interferometric radar, which compared two radar images or signals taken at slightly different angles. This mission used single-pass interferometry, which acquired two signals at the same time by using two different radar antennas. An antenna located on board the space shuttle collected one data set and the other data set was collected by an antenna located at the end of a 60-meter mast that extended from the shuttle. Differences between the two signals allowed for the calculation of surface elevation.

Endeavour orbited Earth 16 times each day during the 11-day mission, completing 176 orbits. SRTM successfully collected radar data over 80% of the Earth's land surface between 60° north and 56° south latitude with data points posted every 1 arc-second (approximately 30 meters).

Two resolutions of finished grade SRTM data are available through EarthExplorer from the collection held in the USGS EROS archive:

1 arc-second (approximately 30-meter) high resolution elevation data offer worldwide coverage of void filled data at a resolution of 1 arc-second (30 meters) and provide open distribution of this high-resolution global data set. Some tiles may still contain voids. The SRTM 1 Arc-Second Global (30 meters) data set will be released in phases starting September 24, 2014. Users should check the coverage map in EarthExplorer to verify if their area of interest is available.

3 arc-second (approximately 90-meter) medium resolution elevation data are available for global coverage. The 3 arc-second data were resampled using cubic convolution interpolation for regions between 60° north and 56° south latitude.

[Summary provided by the USGS.]
COVID Vaccination in World (updated daily)
kaggle.com
Updated Jun 16, 2024
Cite
Rishav Sharma (2024). COVID Vaccination in World (updated daily) [Dataset]. http://doi.org/10.34740/kaggle/dsv/8704848
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/8704848
Dataset updated
Jun 16, 2024
Dataset provided by
Kaggle
Authors
Rishav Sharma
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
World
Description
Context

The data is collected from the OWID (Our World in Data) GitHub repository, which is updated on a daily basis.

Content

This dataset contains only one file, vaccinations.csv, which contains the records of vaccination doses received by people from all the countries.

* location: name of the country (or region within a country).
* iso_code: ISO 3166-1 alpha-3 – three-letter country codes.
* date: date of the observation.
* total_vaccinations: total number of doses administered. This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses). If a person receives one dose of the vaccine, this metric goes up by 1. If they receive a second dose, it goes up by 1 again.
* total_vaccinations_per_hundred: total_vaccinations per 100 people in the total population of the country.
* daily_vaccinations_raw: daily change in the total number of doses administered. It is only calculated for consecutive days. This is a raw measure provided for data checks and transparency, but we strongly recommend that any analysis on daily vaccination rates be conducted using daily_vaccinations instead.
* daily_vaccinations: new doses administered per day (7-day smoothed). For countries that don't report data on a daily basis, we assume that doses changed equally on a daily basis over any periods in which no data was reported. This produces a complete series of daily figures, which is then averaged over a rolling 7-day window. An example of how we perform this calculation can be found here.
* daily_vaccinations_per_million: daily_vaccinations per 1,000,000 people in the total population of the country.
* people_vaccinated: total number of people who received at least one vaccine dose. If a person receives the first dose of a 2-dose vaccine, this metric goes up by 1. If they receive the second dose, the metric stays the same.
* people_vaccinated_per_hundred: people_vaccinated per 100 people in the total population of the country.
* people_fully_vaccinated: total number of people who received all doses prescribed by the vaccination protocol. If a person receives the first dose of a 2-dose vaccine, this metric stays the same. If they receive the second dose, the metric goes up by 1.
* people_fully_vaccinated_per_hundred: people_fully_vaccinated per 100 people in the total population of the country.

Note: for people_vaccinated and people_fully_vaccinated we are dependent on the necessary data being made available, so we may not be able to make these metrics available for some countries.
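The daily_vaccinations definition above (fill unreported days by assuming doses changed equally each day, then average over a rolling 7-day window) can be sketched as follows. This is an illustration of the described procedure, not Our World in Data's actual code. The function names and sample totals are hypothetical, and using a shorter window for the first few days is an assumption.

```python
def interpolate_totals(totals):
    """Linearly fill None gaps between reported cumulative totals.

    Assumes the first and last entries are reported (not None).
    """
    filled = list(totals)
    known = [i for i, v in enumerate(filled) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (filled[b] - filled[a]) / (b - a)
        for i in range(a + 1, b):
            filled[i] = filled[a] + step * (i - a)
    return filled


def daily_smoothed(totals, window=7):
    """Daily change in the filled totals, averaged over a trailing window."""
    filled = interpolate_totals(totals)
    daily = [later - earlier for earlier, later in zip(filled, filled[1:])]
    smoothed = []
    for i in range(len(daily)):
        # Assumption: use however many days are available at the start.
        chunk = daily[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical cumulative doses; days 2 and 3 were not reported.
reported = [0, None, None, 300, 400]
filled_totals = interpolate_totals(reported)
smoothed_daily = daily_smoothed(reported)
```

For the sample series the gap is filled as 100 and 200, so every day shows 100 doses and the smoothed series is flat.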

Acknowledgements

This data is collected by Our World in Data, which updates it daily on their GitHub.

Inspiration

Possible uses for this dataset could include:
- Sentiment analysis in a variety of forms
- Statistical analysis over time.
Data from: Out-Of-Hospital Cardiac Arrest during the Coronavirus Disease...
scielo.figshare.com
datasetcatalog.nlm.nih.gov
jpeg
Updated Jun 1, 2023
Cite
Claudio Tinoco Mesquita (2023). Out-Of-Hospital Cardiac Arrest during the Coronavirus Disease 2019 (COVID-19) Pandemic in Brazil: The Hidden Mortality [Dataset]. http://doi.org/10.6084/m9.figshare.14277965.v1
Explore at:
jpeg (available download formats)
Unique identifier
https://doi.org/10.6084/m9.figshare.14277965.v1
Dataset updated
Jun 1, 2023
Dataset provided by
SciELO journals
Authors
Claudio Tinoco Mesquita
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Brazil
Description
Abstract: The world changed in just a few months after the emergence of the novel coronavirus disease 2019 (COVID-19), caused by a beta coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). COVID-19 was declared a pandemic by the World Health Organization (WHO) on March 11, 2020. Brazil currently has the world’s second-highest COVID-19 death toll, second only to the USA. The COVID-19 pandemic is spreading fast in the world with more than 181 countries affected. This editorial refers to the article published in Arquivos Brasileiros de Cardiologia: “Increase in home deaths due to cardiorespiratory arrest in times of COVID-19 pandemic.” [1] Their main results show a gradual increase in the rate of out-of-hospital cardiac arrest during the Coronavirus disease 2019 (COVID-19) pandemic in the city of Belo Horizonte, Minas Gerais, Brazil. Their data demonstrate a proportional increase of 33% of home deaths in March 2020 compared to previous periods. Their study is the first Brazilian paper to demonstrate the same trend observed in other countries.


Death in the United States

Learn more about the leading causes of death from 2005-2015

Overview

This dataset is a collection of CSV files each containing one year's worth of data and paired JSON files containing the code mappings, plus an ICD 10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors as the text extracted from the pdf is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.

A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of the data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.

All data comes from the CDC’s National Vital Statistics Systems, with the exception of the Icd10Code set, which is sourced from the World Health Organization.

Project ideas

The CDC's mortality data was the basis of a widely publicized paper, by Anne Case and Nobel prize winner Angus Deaton, arguing that middle-aged whites are dying at elevated rates. One of the criticisms against the paper is that it failed to properly account for the exact ages within the broad bins available through the CDC's WONDER tool. What do these results look like with exact/not-binned age data?
Similarly, how sensitive are the mortality trends being discussed in the news to the choice of bin-widths?
As noted above, the data preparation process could have introduced errors. Can you find any discrepancies compared to the aggregate metrics on WONDER? If so, please let me know in the forums!
WONDER is cited in numerous economics, sociology, and public health research papers. Can you find any papers whose conclusions would be altered if they used the exact data available here rather than binned data from WONDER?
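As a starting point for the bin-width questions above, the sketch below counts the same exact ages under two bin widths. The ages are hypothetical rather than drawn from the dataset, and the helper names are illustrative; real analyses would read exact ages from the yearly CSV files.

```python
from collections import Counter


def bin_label(age, width):
    """Label the fixed-width age bin that contains the given exact age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def counts_by_bin(ages, width):
    """Count deaths per age bin for a given bin width."""
    return Counter(bin_label(age, width) for age in ages)


# Hypothetical exact ages at death.
ages_at_death = [45, 47, 52, 53, 54, 58]

ten_year_bins = counts_by_bin(ages_at_death, width=10)
five_year_bins = counts_by_bin(ages_at_death, width=5)
```

With ten-year bins the 50-59 group holds four deaths. Five-year bins show that three of those fall in 50-54 and one in 55-59, detail that a wide bin hides.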

Differences from the first version of the dataset

This version of the dataset was prepared in a completely different manner. This has allowed us to provide a much larger volume of data and ensure that codes are available for every field.
We've replaced the batch of SQL files with a single JSON per year. Kaggle's platform currently offers better support for JSON files, and this keeps the number of files manageable.
A tutorial kernel providing a quick introduction to the new format is available here.
Lastly, I apologize if the transition has interrupted anyone's work! If need be, you can still download v1.
