65 datasets found

Death Profiles by County
data.chhs.ca.gov
data.ca.gov
+3 more
csv, zip
Updated Nov 26, 2025
Cite
California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
Explore at:
Available download formats: csv(74351424), csv(75015194), csv(11738570), csv(1128641), csv(15127221), csv(60517511), csv(73906266), csv(60201673), csv(60676655), csv(28125832), csv(60023260), csv(51592721), csv(74689382), csv(52019564), csv(5095), csv(74043128), csv(24235858), csv(74497014), zip, csv(29775349)
Dataset updated
Nov 26, 2025
Dataset authored and provided by
California Department of Public Health
Description
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
🐥 Global Animal Welfare
kaggle.com
zip
Updated Mar 11, 2024
Cite
mexwell (2024). 🐥 Global Animal Welfare [Dataset]. https://www.kaggle.com/datasets/mexwell/global-animal-welfare
Explore at:
Available download formats: zip(1859095 bytes)
Dataset updated
Mar 11, 2024
Authors
mexwell
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
🚨 A starter script that joins everything is coming soon! 🚨

On Our World in Data, we cover many topics related to reducing human suffering: alleviating poverty, reducing child and maternal mortality, curing diseases, and ending hunger.

But if we aim to reduce total suffering, society’s ability to reduce this in other animals – which feel pain, too – also matters.

This is especially true when we look at the numbers: every year, humans slaughter more than 80 billion land-based animals for farming alone. Most of these animals are raised in factory farms, often in painful and inhumane conditions.

Estimates for fish are more uncertain, but when we include them, these numbers more than double.1

These numbers are large – but this also means that there are large opportunities to alleviate animal suffering by reducing the number of animals we use for food, science, cosmetics, and other industries and improving the living conditions of those we continue to raise.

Original Data

Acknowledgement

Photo by Sam Carter on Unsplash
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
data.ct.gov
s.cnmilf.com
+2 more
csv, xlsx, xml
Updated Jun 24, 2022
Cite
Department of Public Health (2022). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-and-Deaths-by-Race-Ethnicity-ARCHIV/7rne-efic
Explore at:
Available download formats: xlsx, csv, xml
Dataset updated
Jun 24, 2022
Dataset authored and provided by
Department of Public Health
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.

The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut because the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf
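
As a rough illustration of the crude and age-adjusted rates described above, the following Python sketch applies direct standardization to made-up numbers. The three age bands, the counts, and the standard-population weights are invented for the example (the source uses 19 age groups and the 2000 US standard population), and the function names are hypothetical.

```python
def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, scaled per 100,000."""
    return sum(deaths) / sum(population) * per


def age_adjusted_rate(deaths, population, std_population, per=100_000):
    """Direct standardization: weight each age-specific rate by that
    age group's share of the standard population, then sum."""
    total_std = sum(std_population)
    rate = 0.0
    for d, p, s in zip(deaths, population, std_population):
        rate += (d / p) * (s / total_std)
    return rate * per


# Illustrative numbers only (young, middle, old age bands).
deaths = [2, 10, 88]
population = [40_000, 40_000, 20_000]
std_population = [50_000, 35_000, 15_000]  # hypothetical standard population

crude = crude_rate(deaths, population)
adjusted = age_adjusted_rate(deaths, population, std_population)
print(f"crude: {crude:.2f}, age-adjusted: {adjusted:.2f}")
```

In this invented example the group is older than the standard population and most deaths fall in the oldest band, so the age-adjusted rate comes out lower than the crude rate, the same direction the description reports for the non-Hispanic white population.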

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME.25,26 As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain whether COVID-19 caused or contributed to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made.31 The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10) classification system.25,26 COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
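
The COVID-19 death definition above (ICD-10 code U07.1 as either the underlying or a contributing cause) can be expressed as a simple check. This is a minimal sketch: the record layout, field names, and sample records are assumptions made for illustration and do not reflect the actual structure of the Connecticut Deaths Registry.

```python
COVID_ICD10 = "U07.1"


def is_covid_death(underlying_cause, contributing_causes):
    """Return True if U07.1 appears as the underlying cause or as
    any contributing cause on the (hypothetical) record."""
    codes = [underlying_cause, *contributing_causes]
    return any(code.strip().upper() == COVID_ICD10 for code in codes)


# Invented records for illustration only.
records = [
    {"underlying": "U07.1", "contributing": ["J18.9"]},
    {"underlying": "I21.9", "contributing": ["U07.1", "E11.9"]},
    {"underlying": "C34.9", "contributing": []},
]

covid_flags = [is_covid_death(r["underlying"], r["contributing"]) for r in records]
print(covid_flags)
```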

Data are subject to future revision as reporting changes.

Starting in July 2020, this dataset will be updated every weekday.

Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differs from the timestamp in DPH's daily PDF reports.

Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
Statewide Death Profiles
data.chhs.ca.gov
data.ca.gov
+3 more
csv, zip
Updated Dec 2, 2025
Cite
California Department of Public Health (2025). Statewide Death Profiles [Dataset]. https://data.chhs.ca.gov/dataset/statewide-death-profiles
Explore at:
Available download formats: csv(4689434), csv(164006), csv(5034), csv(476576), csv(2026589), csv(5401561), csv(463460), csv(419332), csv(200270), csv(16301), zip
Dataset updated
Dec 2, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Death Profiles by ZIP Code
data.chhs.ca.gov
data.ca.gov
+2 more
csv, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). Death Profiles by ZIP Code [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-zip-code
Explore at:
Available download formats: csv(4571), csv(78958555), csv(80055974), csv(80054609), csv(40627562), zip
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health
Description
This dataset contains counts of deaths for California residents by ZIP Code based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths of California residents. The data tables include deaths of residents of California by ZIP Code of residence (by residence). The data are reported as totals, as well as stratified by age and gender. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Fatalities in the Israeli-Palestinian Conflict
kaggle.com
zip
Updated Feb 17, 2024
Cite
asaniczka (2024). Fatalities in the Israeli-Palestinian Conflict [Dataset]. https://www.kaggle.com/datasets/asaniczka/fatalities-in-the-israeli-palestinian-conflict
Explore at:
Available download formats: zip(474146 bytes)
Dataset updated
Feb 17, 2024
Authors
asaniczka
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Palestine, Israel
Description
This dataset provides information on the individuals killed during the Israeli-Palestinian conflict since the second intifada, which began in September 2000. The data has been meticulously collected and investigated by B’Tselem – The Israeli Information Center for Human Rights in the Occupied Territories.

The dataset includes statistics on all human beings – Palestinians, Israelis, and foreign nationals – who lost their lives during this conflict. It provides details such as name, age, citizenship, date of death, gender, participation in hostilities, place of residence, type of injury, ammunition used, and more.

Some Task Ideas:

Analyze Fatality Trends: Explore the dataset and track the trends in fatalities over time. Identify any significant changes, spikes, or declines in the number of fatalities.

Demographic Analysis: Conduct a demographic analysis by examining the age, gender, and citizenship of the individuals killed. Determine if there are any notable patterns or disparities in the data.

Geospatial Analysis: Utilize the event location, district, and region information to perform geospatial analysis. Visualize the distribution of fatalities on a map and identify areas that have experienced higher levels of violence.

Hostilities Participation Analysis: Investigate the extent of individuals' participation in hostilities before their deaths. Analyze the relationship between participation and the circumstances surrounding each fatality.

Injury Analysis: Examine the types of injuries inflicted on individuals. Identify the most common types of injuries and assess their severity.

Weapons Used: Analyze the ammunition and means by which the individuals were killed. Determine the most frequently used weapons or methods and evaluate their impact.

Victim Profiles: Create profiles of the victims based on the available data such as age, gender, citizenship, and place of residence. Identify common characteristics among the victims.

Please note that the dataset contains sensitive information and focuses on the humanitarian aspect rather than taking any political stance.

If you find this dataset valuable, don't forget to hit the upvote button! 😊💝

Related Datasets

Data on Palestinian Structures Israel Demolished

Daily Public Opinion on Israel-Palestine War

Photo by Levi Meir Clancy on Unsplash
North Carolina Social and Human Services Dataset
kaggle.com
zip
Updated May 3, 2024
Cite
Varun Deepak Gudhe (2024). North Carolina Social and Human Services Dataset [Dataset]. https://www.kaggle.com/datasets/varundeepakgudhe/north-carolina-social-and-human-services-dataset
Explore at:
Available download formats: zip(1460450 bytes)
Dataset updated
May 3, 2024
Authors
Varun Deepak Gudhe
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
North Carolina
Description
This dataset encompasses comprehensive social and human services data for North Carolina, offering insights into public assistance, child services, vocational rehabilitation, and transfer payments across state and county levels. Each entry delineates specific services within various geographical areas, classified by type, for each year recorded. This rich dataset enables a deep dive into the trends and distributions of social services, assisting in policy-making and community support initiatives.

Suggested usage

Policy Development and Evaluation:

Government agencies and policymakers can utilize the data to assess the effectiveness of current social service programs and to design new policies. Analyzing trends over time can help identify needs and allocate resources more effectively.

Academic Research:

Researchers in social sciences, public health, and economics could use the dataset to study the impact of social services on various demographics within North Carolina. This can lead to scholarly articles, studies on social welfare, and the development of new theories in social service provision.

Community Planning:

Local government planners and community organizations can use the dataset to better understand the distribution of services such as child services and vocational rehabilitation, and plan community resources accordingly.

Grant Writing and Funding Applications:

Non-profit organizations can use detailed data to justify the need for funding in grant applications. By showing specific needs within communities, they can target their proposals to address gaps in services.

Public Awareness and Advocacy:

Advocacy groups can use the data to raise public awareness about the state of social services in North Carolina. This can drive campaigns for enhanced funding or changes in how services are delivered.

Economic Analysis:

Economists could explore the dataset to correlate the investment in social services with economic outcomes like employment rates, economic mobility, and community health indicators.
Maternal Mortality Dataset
kaggle.com
zip
Updated Jan 5, 2024
Cite
Sourav Banerjee (2024). Maternal Mortality Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/maternal-mortality-dataset
Explore at:
Available download formats: zip(10357 bytes)
Dataset updated
Jan 5, 2024
Authors
Sourav Banerjee
Description
Context

The Maternal Mortality Ratio (MMR) is a crucial indicator within the Gender Inequality Index (GII), an encompassing measure designed to assess gender disparities and inequities within a society. The GII, an extension of the Human Development Index (HDI), focuses on three primary dimensions: reproductive health, empowerment, and economic activity. Reproductive health, one of the key dimensions, sheds light on the challenges faced by individuals based on their gender. Within this context, the Maternal Mortality Ratio specifically gauges the number of maternal deaths per 100,000 live births, providing insight into the disparities in health outcomes experienced by women. This indicator reflects the state of maternal health and underscores the importance of addressing reproductive rights to mitigate gender inequalities.
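
To make the definition concrete, the ratio described above (maternal deaths per 100,000 live births) can be computed as in the short Python sketch below. The function name and the input figures are invented for illustration and are not taken from the dataset.

```python
def maternal_mortality_ratio(maternal_deaths, live_births):
    """Maternal deaths per 100,000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return maternal_deaths * 100_000 / live_births


# Illustrative numbers only: 25 maternal deaths among 125,000 live births.
mmr = maternal_mortality_ratio(25, 125_000)
print(mmr)
```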

Content

This dataset encompasses extensive historical information regarding gender development indicators on a global scale. Key columns include ISO3 (the ISO3 code assigned to each country/territory), Country (the name of the country or territory), Continent (the continent of the country's location), Hemisphere (the hemisphere in which the country is positioned), Human Development Groups, UNDP Developing Regions, HDI Rank (2021) representing the Human Development Index Rank for the year 2021, and Maternal Mortality Ratio (deaths per 100,000 live births) spanning from 1990 to 2021.

Dataset Glossary (Column-wise)

ISO3 - ISO3 for the Country/Territory

Country - Name of the Country/Territory

Continent - Name of the Continent

Hemisphere - Name of the Hemisphere

Human Development Groups - Human Development Groups

UNDP Developing Regions - UNDP Developing Regions

HDI Rank (2021) - Human Development Index Rank for 2021

Maternal Mortality Ratio (deaths per 100,000 live births) from 1990 to 2021 - Maternal Mortality Ratio from 1990 to 2021

Data Dictionary

UNDP Developing Regions:

SSA - Sub-Saharan Africa

LAC - Latin America and the Caribbean

EAP - East Asia and the Pacific

AS - Arab States

ECA - Europe and Central Asia

SA - South Asia

Structure of the Dataset

https://i.imgur.com/d1iGY3d.png

Acknowledgement

This Dataset is created from Human Development Reports. This Dataset falls under the Creative Commons Attribution 3.0 IGO License. You can check the Terms of Use of this Data. If you want to learn more, visit the Website.

Cover Photo by: Image by gstudioimagen1 on Freepik

Thumbnail by: Baby icons created by Victoruler - Flaticon
Synthia-v1.3
kaggle.com
huggingface.co
zip
Updated Nov 22, 2023
Cite
The Devastator (2023). Synthia-v1.3 [Dataset]. https://www.kaggle.com/datasets/thedevastator/human-machine-dialogue-interactions
Explore at:
Available download formats: zip(79056480 bytes)
Dataset updated
Nov 22, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Human-Machine Dialogue Interactions

Exploring Communication Models for Machine Learning

By Huggingface Hub [source]

About this dataset

This Synthia-v1.3 dataset provides insight into human-machine communication through its collection of dialogue interactions between humans and machines. It details how conversations develop between the two, including behavioural changes in both humans and machines towards one another over time. With information on user instructions to machines, as well as the system, machine responses, and other related data points, the dataset offers a detailed view of how systems use dialogue to interact with people in various scenarios. This can offer insight into how predictive intelligence is applied by these systems in conversational settings, informing developers seeking to build their own human-machine interfaces for effective two-way communication. Taken as a whole, the data can help build an understanding of how connections form between humans and machines, and of the ongoing challenges faced when working on projects that involve these technologies.

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨

How to use the dataset

The dataset consists of a collection of dialogue interactions between humans and machines, providing insight into human-machine communication. It includes information about the system being used, instructions given by humans to machines and responses from machines.

To start using this data set:
- Download the CSV file containing all of the dialogue interactions from the Kaggle datasets page.
- Open your favourite spreadsheet software, such as Excel or Google Sheets, and load the CSV file.
- Take a look at each of the columns to familiarize yourself with what they contain: the ‘system’ column contains details about what system was used for role play between human and machine; the ‘instruction’ column contains instructions given by humans to machines; the ‘response’ column contains responses from machines back to humans based on their instructions.
- Start exploring how conversations progress between humans and machines over time by examining the information in each of these columns separately or together as required.

You can also filter for specific conditions within the data set, such as conversations driven entirely by particular systems or involving certain instruction types. In addition, you have an opportunity to conduct various kinds of analysis, such as statistical analysis (e.g., descriptive statistics or correlation analysis). With so many possibilities for exploration, you are sure to find something interesting!
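
As an alternative to a spreadsheet, the filtering and descriptive statistics mentioned above can be sketched in a few lines of Python. The column names (system, instruction, response) come from the dataset description; the sample rows and the helper function names are invented for illustration, and the in-memory sample stands in for the real train.csv.

```python
import csv
import io

# Tiny invented sample with the same three columns as train.csv.
SAMPLE = """system,instruction,response
You are a helpful assistant.,Summarize the text.,Here is a summary.
You are a math tutor.,What is 2 + 2?,2 + 2 equals 4.
You are a helpful assistant.,Translate 'hello' to French.,Bonjour.
"""


def load_rows(handle):
    """Read all rows from an open CSV handle into a list of dicts."""
    return list(csv.DictReader(handle))


def filter_by_system(rows, keyword):
    """Keep rows whose 'system' text contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [r for r in rows if keyword in r["system"].lower()]


def mean_response_length(rows):
    """Average number of characters in the 'response' column."""
    if not rows:
        return 0.0
    return sum(len(r["response"]) for r in rows) / len(rows)


rows = load_rows(io.StringIO(SAMPLE))
tutor_rows = filter_by_system(rows, "tutor")
print(len(rows), len(tutor_rows), mean_response_length(rows))
```

To run this against the downloaded file instead, replace `io.StringIO(SAMPLE)` with `open("train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.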

Research Ideas

Utilizing the dataset to understand how various types of instruction styles can influence conversation order and flow between humans and machines.

Using the data to predict potential responses in a given dialogue interaction from varying sources, such as robots or virtual assistants.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:------------|:--------------------------------------------------------------|
| system | The type of system used in the dialogue interaction. (String) |
| instruction | The instruction given by the human to the machine. (String) |
| response | The response given by the machine to the human. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.
Death Profiles by Leading Causes of Death
data.chhs.ca.gov
data.ca.gov
+4 more
web link, zip
Updated Nov 7, 2025
Cite
California Department of Public Health (2025). Death Profiles by Leading Causes of Death [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-leading-causes-of-death
Explore at:
Available download formats: web link, zip
Dataset updated
Nov 7, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Data for deaths by leading cause of death categories are now available in the death profiles dataset for each geographic granularity.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.

Cause of death categories for years 1999 and later are based on tenth revision of International Classification of Diseases (ICD-10) codes. Comparable categories are provided for years 1979 through 1998 based on ninth revision (ICD-9) codes. For more information on the comparability of cause of death classification between ICD revisions see Comparability of Cause-of-death Between ICD Revisions.
Coronavirus (Covid-19) Data in the United States
nytimes.com
openicpsr.org
+4more
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Explore at:
Dataset provided by
New York Times
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Data from: The HAM10000 dataset, a large collection of multi-source...
dataverse.harvard.edu
opendatalab.com
+1more
Updated Feb 7, 2023
+ more versions
Cite
Philipp Tschandl (2023). The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions [Dataset]. http://doi.org/10.7910/DVN/DBW86T
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/DBW86T
Dataset updated
Feb 7, 2023
Dataset provided by
Harvard Dataverse
Authors
Philipp Tschandl
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.7910/DVN/DBW86T
Description
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc). More than 50% of lesions are confirmed through histopathology (histo); the ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked by the lesion_id column within the HAM10000_metadata file.

Due to upload size limitations, images are stored in two files:

HAM10000_images_part1.zip (5000 JPEG files)

HAM10000_images_part2.zip (5015 JPEG files)

Additional data for evaluation purposes

The HAM10000 dataset served as the training set for the ISIC 2018 challenge (Task 3), with the same sources contributing the majority of the validation and test sets as well. The test-set images are available herein as ISIC2018_Task3_Test_Images.zip (1511 images); the ground truth, in the same format as the HAM10000 data (public since 2023), is available as ISIC2018_Task3_Test_GroundTruth.csv. The ISIC-Archive also provides the challenge images and metadata (training, validation, test) at their "ISIC Challenge Datasets" page.

Comparison to physicians

Test-set evaluations of the ISIC 2018 challenge were compared to physicians on an international scale, where the majority of challenge participants outperformed expert readers: Tschandl P. et al., Lancet Oncol 2019.

Human-computer collaboration

The test-set images were also used in a study comparing different methods and scenarios of human-computer collaboration: Tschandl P. et al., Nature Medicine 2020. The following corresponding metadata is available herein:

ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.csv: Human ratings for test images with and without interaction with a ResNet34 CNN (Malignancy Probability, Multi-Class probability, CBIR) or Human-Crowd Multi-Class probabilities. This data was collected for and analyzed in Tschandl P. et al., Nature Medicine 2020; therefore, please refer to this publication when using the data. Some details on the abbreviated column headings:

image_id: This is the ISIC image_id of an image at the time of the study. There should be no duplications in the combination image_id & interaction_modality. As not every image was shown with every interaction modality, not every combination is present.

prob_m_dx_akiec, ... : m is "machine probabilities". Values are values after softmax, and "_mal" is all malignant classes summed.

prob_h_dx_akiec, ... : h is "human probabilities". Values are aggregated percentages of human ratings from past studies distinguishing between seven classes. Note there is no "prob_h_mal" as this was none of the tested interaction modalities.

user_dx_without_interaction_akiec, ...: Number of participants choosing this diagnosis without interaction.

user_dx_with_interaction_akiec, ...: Number of participants choosing this diagnosis with interaction.

HAM10000_segmentations_lesion_tschandl.zip: To evaluate regions of CNN activations in Tschandl P. et al., Nature Medicine 2020 (please refer to this publication when using the data), a single dermatologist (Tschandl P) created binary segmentation masks for all 10015 images from the HAM10000 dataset. Masks were initialized with the segmentation network as described by Tschandl et al., Computers in Biology and Medicine 2019, and subsequently verified, corrected or replaced via the free-hand selection tool in FIJI.
Weather Disaster Costs and Deaths
kaggle.com
zip
Updated Dec 12, 2023
Cite
The Devastator (2023). Weather Disaster Costs and Deaths [Dataset]. https://www.kaggle.com/datasets/thedevastator/weather-disaster-costs-and-deaths
Explore at:
Available download formats: zip (59216 bytes)
Dataset updated
Dec 12, 2023
Authors
The Devastator
Description
Weather Disaster Costs and Deaths

Costs and Deaths of Billion Dollar Weather Disasters in the US

By Throwback Thursday [source]

About this dataset

The Billion Dollar Weather Disasters in the US dataset is a valuable resource containing comprehensive historical data on weather events in the United States that have caused billions of dollars in damages and resulted in loss of lives. It provides insights into various types and categories of weather disasters, such as hurricanes, tornadoes, floods, wildfires, and more.

The dataset includes essential information about each weather disaster event, starting with its name or title referred to as Disaster. A brief summary or description of each event is provided under the column Description, giving readers an understanding of its impact and extent. Furthermore, the dataset categorizes each disaster based on its type under the column Disaster Type. This classification helps researchers and analysts to identify patterns or common characteristics among similar types of weather disasters.

One crucial aspect covered by this dataset is the economic impact of these severe weather events. The total cost incurred due to each catastrophic occurrence has been meticulously recorded in millions of dollars. To ensure accuracy across different time periods, these costs are adjusted for inflation using the Consumer Price Index (CPI), providing a standardized measure that enables meaningful comparisons between different events.

A significant measure reflecting the severity of these weather disasters is the number of deaths they have caused. This dataset presents this valuable statistic under the column Deaths, allowing researchers to assess not only economic implications but also human impacts associated with each disaster event.

Obtained from NOAA National Centers for Environmental Information (NCEI) U.S., this data serves as a reliable source for understanding past weather calamities within US borders. Its wide range includes devastating storms, destructive wildfires, deadly heatwaves, crippling droughts; all contributing to one overarching objective – better preparedness for future climate-related challenges.

By analyzing this comprehensive dataset, researchers can gain insights into trends over time while identifying regions most vulnerable to specific types of extreme weather events. These findings allow policymakers and emergency response planners to make informed decisions regarding resource allocation, risk mitigation strategies, and community resilience-building initiatives

How to use the dataset

1. Understanding the Columns

The dataset contains several columns that provide important information about each weather disaster event. Let's understand what each column represents:

Disaster: The name or title of the weather disaster event.

Disaster Type: The type or category of the weather disaster event.

Total CPI-Adjusted Cost (Millions of Dollars): The total cost of the weather disaster event in millions of dollars, adjusted for inflation using the Consumer Price Index (CPI).

Deaths: The number of deaths caused by the weather disaster event.

Description: A brief description or summary of the weather disaster event.

2. Exploring Total Cost and Deaths

One key aspect to explore is how much damage was caused by each weather disaster event, as well as its human impact in terms of fatalities. By analyzing these factors, you can gain insights into which types of disasters are more costly and have a higher mortality rate.

You can start by visualizing the Total CPI-Adjusted Cost (Millions of Dollars) column to identify which disasters have been more financially devastating over time. Additionally, you can analyze the Deaths column to gauge which types of disasters have had a greater impact on human lives.

3. Comparing Disasters

Another interesting analysis would involve comparing different disasters based on their characteristics such as type, cost, and fatalities. You can group similar types together and compare their costs or death tolls across different time periods.

For example, you could examine whether hurricanes tend to cause higher financial losses compared to floods or wildfires. Or, you could analyze if certain types of disasters have been more deadly than others.
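The comparison described in this step can be sketched as a small grouping routine. This is a minimal illustration using only the Python standard library and the column names listed in the description above. The sample rows and disaster-type labels are hypothetical, not taken from the dataset, and the real file may format cost values differently (for example with thousands separators), which would need extra parsing.

```python
import csv
import io
from collections import defaultdict

COST = "Total CPI-Adjusted Cost (Millions of Dollars)"

# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """Disaster,Disaster Type,Total CPI-Adjusted Cost (Millions of Dollars),Deaths
Example Hurricane A,Tropical Cyclone,1500.0,10
Example Hurricane B,Tropical Cyclone,2500.0,30
Example Flood A,Flooding,1200.0,5
"""


def summarize_by_type(csv_file):
    """Accumulate event count, total cost and total deaths per disaster type."""
    totals = defaultdict(lambda: {"events": 0, "cost": 0.0, "deaths": 0})
    for row in csv.DictReader(csv_file):
        entry = totals[row["Disaster Type"]]
        entry["events"] += 1
        entry["cost"] += float(row[COST])
        entry["deaths"] += int(row["Deaths"])
    return dict(totals)


summary = summarize_by_type(io.StringIO(SAMPLE_CSV))
for disaster_type, stats in sorted(summary.items()):
    mean_cost = stats["cost"] / stats["events"]
    print(f"{disaster_type}: {stats['events']} events, "
          f"mean cost {mean_cost:.1f}M, {stats['deaths']} deaths")
```

To run this against the downloaded file, the `io.StringIO(SAMPLE_CSV)` argument could be replaced with an open file handle.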

4. Analyzing Descriptions

The Description column provides a brief summary of each weather disaster event. Analyzing the descriptions can give you valuable insights into the specific circumstances surrounding each event. By understanding the context and conditions, you can get a better understanding of why some events resulted i...
WHO Malaysia Health Indicators
kaggle.com
zip
Updated Jan 28, 2023
Cite
The Devastator (2023). WHO Malaysia Health Indicators [Dataset]. https://www.kaggle.com/datasets/thedevastator/who-malaysia-health-indicators
Explore at:
Available download formats: zip (752315 bytes)
Dataset updated
Jan 28, 2023
Authors
The Devastator
Area covered
Malaysia
Description
WHO Malaysia Health Indicators

Malaria, HIV/STIs, Suicide, CVD, Mortality, and more

By Humanitarian Data Exchange [source]

About this dataset

This dataset contains a range of indicators related to health, health systems, and sustainable development from the World Health Organization's data portal. It covers topics ranging from mortality and global health estimates to essential health technologies, youth engagement, mental health initiatives, and infectious diseases. With data points including publish state codes and display values, this dataset provides detailed insight into how healthcare is managed all around the globe. From tracking malaria outbreaks to exploring various international agreements on public healthcare initiatives, this dataset offers a wide array of information for machine learning projects that are designed to improve our understanding of global healthcare trends. Explore the correlations between different countries' universal healthcare coverage measures or investigate any discrepancies between developed and developing nations.

How to use the dataset

Getting Started: First, download the dataset from Kaggle. Once you have saved it on your computer, open it with spreadsheet software such as Excel or Google Sheets.

Exploring the Data: The dataset contains columns that offer information about indicators related to health in Malaysia, including mortality rates, prevention programs and providers, financing information, human resource information, and more. To explore particular aspects of this data, filter the rows using any of these column values. For example, if you want results for a specific year or region, you can filter by ‘year’ or ‘region’ accordingly. Note that some columns are related to each other (e.g., the country code corresponds to the country display name).

Data Outputs:
Using this dataset, users can generate visual representations, such as graphs, that display trends over time in variables such as human resources, funding rates, or pregnancy outcomes, among the others included in the WHO dashboard's global summary outputs for member countries like Malaysia. Such views can highlight areas where risk mitigation efforts and other core elements are still lacking in the effort to improve quality of life through good governance practices and demand reduction strategies informed by healthcare professionals' expertise.

Conclusion:

Research Ideas

Analysis of health coverage and services in Malaysia, allowing comparison between different public health organizations and the effect of specific prevention programs.

Identification of gaps in existing healthcare access, providing a standardized, data-driven reference point to ensure equitable access across different regions in the country.

Creation of interactive geographical dashboards that display comparisons among relevant indicators, providing a visual representation of how best to target the distribution of resources for optimal impact.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: rsud-service-organization-and-delivery-prevention-programs-and-providers-indicators-for-malaysia-38.csv

| Column name | Description |
|:--------------------------------------|:----------------------------------------------------------------|
| GHO (CODE) | The Global Health Observatory code for the indicator. (String) |
| GHO (DISPLAY) | The name of the indicator. (String) |
| GHO (URL) | The URL for the indicator. (URL) |
| PUBLISHSTATE (CODE) | The code for the publishing state of the indicator. (String) |
| PUBLISHSTATE (DISPLAY) | The name of the publishing state of the indicator. (String) |
| PUBLISHSTATE (URL) | The URL for the publishing state of the indicator. (URL) |
| YEAR (CODE) | The code for...
Wrist-mounted IMU data towards the investigation of free-living human eating...
data.niaid.nih.gov
Updated Jun 20, 2022
+ more versions
Cite
Kyritsis, Konstantinos; Diou, Christos; Delopoulos, Anastasios (2022). Wrist-mounted IMU data towards the investigation of free-living human eating behavior - the Free-living Food Intake Cycle (FreeFIC) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4420038
Explore at:
Dataset updated
Jun 20, 2022
Dataset provided by
Harokopio University of Athens
Aristotle University of Thessaloniki
Authors
Kyritsis, Konstantinos; Diou, Christos; Delopoulos, Anastasios
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

The Free-living Food Intake Cycle (FreeFIC) dataset was created by the Multimedia Understanding Group towards the investigation of in-the-wild eating behavior. This is achieved by recording the subjects’ meals as a small part of their everyday, unscripted activities. The FreeFIC dataset contains the 3D acceleration and orientation velocity signals (6 DoF) from 22 in-the-wild sessions provided by 12 unique subjects. All sessions were recorded using a commercial smartwatch (6 using the Huawei Watch 2™ and the rest using the MobVoi TicWatch™) while the participants performed their everyday activities. In addition, FreeFIC also contains the start and end moments of each meal session as reported by the participants.

Description

FreeFIC includes 22 in-the-wild sessions that belong to 12 unique subjects. Participants were instructed to wear the smartwatch on the hand of their preference well ahead of any meal and to continue wearing it throughout the day until the battery was depleted. In addition, we followed a self-report labeling model, meaning that the ground truth is provided by the participants, who documented the start and end moments of their meals to the best of their abilities, as well as the hand they wore the smartwatch on. The total duration of the 22 recordings sums up to 112.71 hours, with a mean duration of 5.12 hours. Additional data statistics can be obtained by executing the provided Python script stats_dataset.py. Furthermore, the accompanying Python script viz_dataset.py will visualize the IMU signals and ground truth intervals for each of the recordings. Information on how to execute the Python scripts can be found below.

The script(s) and the pickle file must be located in the same directory.

Tested with Python 3.6.4

Requirements: Numpy, Pickle and Matplotlib

Calculate and echo dataset statistics

$ python stats_dataset.py

Visualize signals and ground truth

$ python viz_dataset.py

FreeFIC is also tightly related to Food Intake Cycle (FIC), a dataset we created in order to investigate the in-meal eating behavior. More information about FIC can be found here and here.

Publications

If you plan to use the FreeFIC dataset or any of the resources found in this page, please cite our work:

@article{kyritsis2020data,
title={A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches},
author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
journal={IEEE Journal of Biomedical and Health Informatics},
year={2020},
publisher={IEEE}}

@inproceedings{kyritsis2017automated,
title={Detecting Meals In the Wild Using the Inertial Data of a Typical Smartwatch},
author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
booktitle={2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
year={2019},
organization={IEEE}}

Technical details

We provide the FreeFIC dataset as a pickle. The file can be loaded using Python in the following way:

import pickle as pkl
import numpy as np

with open('./FreeFIC_FreeFIC-heldout.pkl','rb') as fh:
    dataset = pkl.load(fh)

The dataset variable in the snippet above is a dictionary with 5 keys. Namely:

'subject_id'

'session_id'

'signals_raw'

'signals_proc'

'meal_gt'

The contents under a specific key can be obtained by:

sub = dataset['subject_id']     # for the subject id
ses = dataset['session_id']     # for the session id
raw = dataset['signals_raw']    # for the raw IMU signals
proc = dataset['signals_proc']  # for the processed IMU signals
gt = dataset['meal_gt']         # for the meal ground truth

The sub, ses, raw, proc and gt variables in the snippet above are lists with a length equal to 22. Elements across all lists are aligned; e.g., the 3rd element of the list under the 'session_id' key corresponds to the 3rd element of the list under the 'signals_proc' key.

sub: list. Each element of the sub list is a scalar (integer) that corresponds to the unique identifier of the subject and can take the following values: [1, 2, 3, 4, 13, 14, 15, 16, 17, 18, 19, 20]. It should be emphasized that the subjects with ids 15, 16, 17, 18, 19 and 20 belong to the held-out part of the FreeFIC dataset (more information can be found in the publication titled "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al.). Moreover, the subject identifier in FreeFIC is in line with the subject identifier in the FIC dataset (more info here and here); i.e., FIC’s subject with id equal to 2 is the same person as FreeFIC’s subject with id equal to 2.
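Because the lists are index-aligned, the held-out subjects can be separated by collecting session indices from the subject list alone. The following is a minimal sketch: the helper name `split_sessions` and the toy `sub` list are illustrative assumptions and not part of the dataset's accompanying scripts.

```python
# Ids listed in the description as the held-out part of FreeFIC.
HELD_OUT_IDS = {15, 16, 17, 18, 19, 20}


def split_sessions(subject_ids):
    """Return (development, held_out) lists of session indices."""
    development, held_out = [], []
    for index, subject in enumerate(subject_ids):
        if subject in HELD_OUT_IDS:
            held_out.append(index)
        else:
            development.append(index)
    return development, held_out


# Toy stand-in for dataset['subject_id']; the real list has 22 entries.
sub = [1, 1, 2, 15, 20]
dev_idx, held_idx = split_sessions(sub)
print(dev_idx, held_idx)
```

The returned indices can then be used to select the matching entries of the session, raw, processed and ground-truth lists.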

ses: list. Each element of this list is a scalar (integer) that corresponds to the unique identifier of the session and can range between 1 and 5. It should be noted that not all subjects have the same number of sessions.

raw: list. Each element of this list is a dictionary with the 'acc' and 'gyr' keys. The data under the 'acc' key is an N_acc × 4 numpy.ndarray that contains the timestamps in seconds (first column) and the 3D raw accelerometer measurements in g (second, third and fourth columns, representing the x, y and z axes, respectively). The data under the 'gyr' key is an N_gyr × 4 numpy.ndarray that contains the timestamps in seconds (first column) and the 3D raw gyroscope measurements in degrees/second (second, third and fourth columns, representing the x, y and z axes, respectively). All sensor streams are transformed in such a way that reflects all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is on par with the signals in the FIC dataset (more info here and here). Finally, the lengths of the raw accelerometer and gyroscope numpy.ndarrays are different (N_acc ≠ N_gyr). This behavior is predictable and is caused by the Android platform.

proc: list. Each element of this list is an M × 7 numpy.ndarray that contains the timestamps and the 3D accelerometer and gyroscope measurements for each meal. Specifically, the first column contains the timestamps in seconds, the second, third and fourth columns contain the x, y and z accelerometer values in g, and the fifth, sixth and seventh columns contain the x, y and z gyroscope values in degrees/second. Unlike elements in the raw list, processed measurements (in the proc list) have a constant sampling rate of 100 Hz and the accelerometer/gyroscope measurements are aligned with each other. In addition, all sensor streams are transformed in such a way that reflects all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is on par with the signals in the FIC dataset (more info here and here). No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is present in the processed acceleration measurements. The potential researcher can consult the article "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth and remove the gravitational component).

meal_gt: list. Each element of this list is a K × 2 matrix. Each row represents a meal interval for the specific in-the-wild session. The first column contains the timestamps of the meal start moments, whereas the second one contains the timestamps of the meal end moments. All timestamps are in seconds. The number of meals K varies across recordings (e.g., a recording exists where a participant consumed two meals).
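One way the processed timestamps and the meal intervals might be combined is to derive a per-sample meal label. The sketch below is illustrative only: the helper name `label_samples`, the inclusive interval boundaries and the toy values are assumptions, and it relies on the timestamps and the meal intervals of a session sharing the same time base. It accepts plain lists as well as numpy arrays, since it only iterates over its inputs.

```python
def label_samples(timestamps, meal_intervals):
    """Return one 0/1 label per timestamp: 1 if it lies inside any meal.

    Interval boundaries are treated as inclusive (an assumption).
    """
    labels = []
    for t in timestamps:
        inside = any(start <= t <= end for start, end in meal_intervals)
        labels.append(1 if inside else 0)
    return labels


# Toy values standing in for the first column of one processed session
# and for that session's K x 2 meal ground truth (here K = 1).
timestamps = [0.0, 10.0, 20.0, 30.0, 40.0]
meals = [[5.0, 25.0]]
labels = label_samples(timestamps, meals)
print(labels)
```

This loop is linear in the number of samples times the number of meals, which is acceptable because K is small; for full-length recordings at 100 Hz a vectorized numpy comparison would likely be faster.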

Ethics and funding

Informed consent, including permission for third-party access to anonymised data, was obtained from all subjects prior to their engagement in the study. The work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 727688 - BigO: Big data against childhood obesity.

Contact

Any inquiries regarding the FreeFIC dataset should be addressed to:

Dr. Konstantinos KYRITSIS

Multimedia Understanding Group (MUG) Department of Electrical & Computer Engineering Aristotle University of Thessaloniki University Campus, Building C, 3rd floor Thessaloniki, Greece, GR54124

Tel: +30 2310 996359, 996365 Fax: +30 2310 996398 E-mail: kokirits [at] mug [dot] ee [dot] auth [dot] gr
Data from: Genetic connectivity in a cooperatively-breeding carnivore...
verso.uidaho.edu
s.cnmilf.com
+1more
Updated Mar 18, 2025
+ more versions
Cite
Jennifer Adams; David Ausband; Bridget Borg; Ariana Cerreta; Mathew Sorum; Lisette Waits; National Park Service (2025). Genetic connectivity in a cooperatively-breeding carnivore between two protected areas: Dataset [Dataset]. https://verso.uidaho.edu/esploro/outputs/dataset/Genetic-connectivity-in-a-cooperatively-breeding-carnivore/996782948201851
Explore at:
Dataset updated
Mar 18, 2025
Dataset provided by
National Park Service/United States Geological Survey
Authors
Jennifer Adams; David Ausband; Bridget Borg; Ariana Cerreta; Mathew Sorum; Lisette Waits; National Park Service
Time period covered
2025
Description
Wildlife populations are increasingly threatened by human activities. Most studies, however, are often short in duration or do not encompass the large spatial extent necessary to measure the potential effects of human activities on population vital rates. Furthermore, the life history features of species with high fecundity and excellent dispersal capabilities can act as buffers against the potential negative effects of human activities on their populations. We used a 30-year dataset of genetic samples from gray wolves (Canis lupus) in Alaska, USA to examine genetic connectivity and diversity between National Park units separated by a region with recurrent human-caused mortality. We found that the 2 protected populations were genetically similar and that dispersal events occurred between them even though they are >450 km apart. We posit that intact ecosystems and a history of continuous distribution of wolves surrounding the affected regions likely maintained the genetic connectivity of wolves in the 2 protected areas.
Deep Fake Dataset
kaggle.com
zip
Updated Oct 1, 2024
Cite
UCI Machine Learning (2024). Deep Fake Dataset [Dataset]. https://www.kaggle.com/datasets/ucimachinelearning/deep-fake-dataset
Explore at:
Available download formats: zip (16066303 bytes)
Dataset updated
Oct 1, 2024
Authors
UCI Machine Learning
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
About Dataset

Context

The motivation behind the creation of this dataset is to have a challenging test set for the task of classifying fake and real human faces. Most of the available datasets on Kaggle are "uniform" and don't present a good variance of face features, particularly for the "Fake" class. Moreover, the fake faces collected in this dataset are generated using StyleGAN2, which makes them harder to classify correctly, even for the human eye. The real human faces in this dataset are gathered so that we have a fair representation of different features (age, sex, makeup, ethnicity, etc.) that may be encountered in a production setup.

Content

The images available in this dataset are in JPEG format and of a uniform size of 300x300. The "Fake" faces are collected from the website thispersondoesnotexist.com. The "Real" face images are collected through the API of the website Unsplash, and the faces are then cropped out using the OpenCV library.

Total number of images: 1288

Number of "Fake" faces: 700

Number of "Real" faces: 589

The data.csv contains the image Id and the corresponding label.

Inspiration

Can you achieve a high accuracy on this dataset?
CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE
data.ct.gov
s.cnmilf.com
+1more
csv, xlsx, xml
Updated Aug 5, 2021
Cite
CT DPH (2021). CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County-14-d/e4bh-ax24
Explore at:
Available download formats: csv, xml, xlsx
Dataset updated
Aug 5, 2021
Dataset authored and provided by
CT DPH
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Area covered
Connecticut
Description
NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.

This dataset includes the leading and secondary metrics identified by the Connecticut Department of Public Health (DPH) and the Department of Education (CSDE) to support local district decision-making on the level of in-person, hybrid (blended), and remote learning model for Pre K-12 education.

Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever and cough or shortness of breath or difficulty breathing or the presence of coronavirus diagnosis code and excludes patients with influenza-like illness. All data are preliminary.

These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

These metrics were adapted from recommendations by the Harvard Global Institute and supplemented by existing DPH measures.

For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy
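The difference between a 7-day and a 14-day daily average can be illustrated with a short sketch. The helper name `daily_average` and the daily counts below are hypothetical; this is not DPH's actual calculation, which may also involve population-based rates.

```python
def daily_average(daily_counts, window_days):
    """Mean of the most recent `window_days` daily counts."""
    if len(daily_counts) < window_days:
        raise ValueError("not enough days of data")
    recent = daily_counts[-window_days:]
    return sum(recent) / window_days


# Hypothetical daily case counts: a quiet week followed by a busier week.
counts = [10] * 7 + [24] * 7
avg_7 = daily_average(counts, 7)    # only the most recent week
avg_14 = daily_average(counts, 14)  # both weeks
print(avg_7, avg_14)
```

In this toy series the 14-day average responds less sharply to the recent jump than the 7-day average does, because it pools twice as many days.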

As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).
Human-Like-DPO-Dataset
huggingface.co
Updated May 19, 2024
Cite
Human-Like LLMs (2024). Human-Like-DPO-Dataset [Dataset]. https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset
Explore at:
Croissant
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 19, 2024
Dataset authored and provided by
Human-Like LLMs
License
https://choosealicense.com/licenses/llama3/
Description
Enhancing Human-Like Responses in Large Language Models

Human-Like-DPO-Dataset

This dataset was created as part of research aimed at improving conversational fluency and engagement in large language models. It is suited to preference-based training methods such as Direct Preference Optimization (DPO), which guide models toward generating more human-like responses. The dataset includes 10,884 samples across 256 topics, including Technology, Daily Life, Science… See the full description on the dataset page: https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset.
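For readers unfamiliar with DPO, the sketch below shows the standard per-example DPO loss computed from log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response. The function and argument names are hypothetical, the numbers are made up, and nothing here is taken from the dataset authors' training code.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a response under the
    policy being trained or under the frozen reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow in exp().
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

if __name__ == "__main__":
    # Policy prefers the chosen response more than the reference does: lower loss.
    print(dpo_loss(-1.0, -9.0, -5.0, -5.0))
    # Policy prefers the rejected response: higher loss.
    print(dpo_loss(-9.0, -1.0, -5.0, -5.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.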
Microbiomehd: The Human Gut Microbiome In Health And Disease
search.datacite.org
zenodo.org
Updated Aug 8, 2017
Cite
Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm (2017). Microbiomehd: The Human Gut Microbiome In Health And Disease [Dataset]. http://doi.org/10.5281/zenodo.1146764
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.1146764
Dataset updated
Aug 8, 2017
Dataset provided by
DataCite (https://www.datacite.org/)
Zenodo (http://zenodo.org/)
Authors
Claire Duvallet; Sean Gibbons; Thomas Gurry; Rafael Irizarry; Eric Alm
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Overview

MicrobiomeHD is a standardized database of human gut microbiome studies in health and disease. This database includes publicly available 16S data from published case-control studies and their associated patient metadata. Raw sequencing data for each study was downloaded and processed through a standardized pipeline.

To be included in MicrobiomeHD, datasets must have:

publicly available raw sequencing data (fastq or fasta)
publicly available metadata with at least case and control labels for each patient

Currently, MicrobiomeHD is focused on stool samples. Additional samples may be included in certain datasets, as indicated in the metadata.

Files

Additional information about the datasets included in this MicrobiomeHD release is in the MicrobiomeHD github repo https://github.com/cduvallet/microbiomeHD, in the file db/dataset_info.yaml. Top-level identifiers correspond to dataset IDs labeled by disease_first-author. For the most part, sample sizes in the yaml file are those described in the papers and may not exactly reflect the actual data (due to missing or extra data, samples that didn't pass quality control, etc.).

Each dataset was downloaded and processed through a standardized pipeline. The raw processing results are available in the *.tar.gz files here. Each file has the same directory structure and files, as described in the pipeline documentation: http://amplicon-sequencing-pipeline.readthedocs.io/en/latest/output.html.

Specific files of interest in each *.tar.gz folder include:

summary_file.txt: this file contains a summary of all parameters used to process the data.
datasetID.metadata.txt: the metadata associated with the samples. Note that some samples in the metadata may not have sequencing data, and vice versa.
RDP/datasetID.otu_table.100.denovo.rdp_assigned: the 100% OTU tables with Latin taxonomic names assigned using the RDP classifier (c = 0.5).
datasetID.otu_seqs.100.fasta: representative sequences for each OTU in the 100% OTU table. OTU labels in the OTU table end with d_denovoID; these denovoIDs correspond to the sequences in this file.
README.txt: additional information about steps taken to download and process each dataset, as needed.
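The link between OTU labels and representative sequences can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes denovo IDs look like "denovo" followed by digits, that FASTA headers start with that ID, and that OTU labels are semicolon-separated taxonomy strings. The example label and sequences are made up, and both function names are hypothetical.

```python
import re

def parse_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records

def denovo_id(otu_label):
    """Extract a trailing denovo ID from an OTU label (assumed format)."""
    match = re.search(r"(denovo\d+)$", otu_label)
    return match.group(1) if match else None

if __name__ == "__main__":
    fasta_text = ">denovo1\nACGT\nTT\n>denovo2\nGGCC\n"  # made-up sequences
    sequences = parse_fasta(fasta_text)
    label = "k__Bacteria;g__Blautia;d__denovo2"  # made-up label
    print(sequences.get(denovo_id(label)))
```

Check the label and header formats against the actual files before relying on the regular expression.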

The raw data was acquired as described in the supplementary materials of Duvallet et al.'s "Meta analysis of microbiome studies identifies shared and disease-specific patterns" and, when available, the respective dataset README files.

Raw sequencing data was processed with the Alm lab's in-house 16S processing pipeline: https://github.com/thomasgurry/amplicon_sequencing_pipeline

Pipeline documentation is available at: http://amplicon-sequencing-pipeline.readthedocs.io/

Metadata was extracted from the original papers and/or data sources, and formatted manually. When possible, these steps are documented in each dataset's associated README.txt file.

Contributing

MicrobiomeHD is a resource that can be used to extract disease-specific microbiome signals in individual case-control studies. Many microbes respond non-specifically to health and disease, and the majority of bacterial associations within individual studies overlap with this non-specific response. Researchers should cross-check their results with the data presented here to ensure that their identified microbial associations are specific to their disease under study.

We provide an updated list of non-specific microbes here, as well as the raw OTU tables for anyone who wishes to reproduce and adapt this analysis to their study question.

If you would like to include your case-control dataset in MicrobiomeHD, please email ejalm[at]mit.edu and duvallet[at]mit.edu.

For us to process your data through our standard pipeline, you will need to provide the following files and information about your data:

raw sequencing data in fastq or fasta format (preferably fastq)
information about which processing steps will be required (e.g. removing primers or barcodes, merging paired-end reads, etc)
sample IDs associated with the sequencing data (either mapped to barcodes still in the sequences, or to each de-multiplexed sequencing file)
case/control metadata of each sample
other relevant metadata (e.g. sampling site, if not all samples are stool; sampling time point, if multiple samples per patient were taken; etc)

By using MicrobiomeHD in your own analyses, you agree to contribute your dataset to this database and to make your raw sequencing data (i.e. fastq files) publicly available.

Citing MicrobiomeHD

The MicrobiomeHD database and original publications for each of these datasets are described in Duvallet et al. (2017): http://dx.doi.org/10.1038/s41467-017-01973-8

Duvallet, C., Gibbons, S. M., Gurry, T., Irizarry, R. A., & Alm, E. J. (2017). Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nature communications, 8(1), 1784.

If you use any of these datasets in your analysis, please cite both MicrobiomeHD (Duvallet et al. (2017)) and the original publication for each dataset that you use.

The code used to process and analyze this data in the paper is available on github: https://github.com/cduvallet/microbiomeHD

Files

Data files

file-S3.nonspecific_genera.txt: Supplemental Table 3 from Duvallet et al. (2017), listing the non-specific health- and disease-associated microbes.
dataset_info.yaml: yaml file with additional dataset metadata.

Datasets

Note that MicrobiomeHD contains all 28 datasets from Duvallet et al. (2017), as well as additional datasets which did not meet the inclusion criteria for the meta-analysis presented in the paper. Additional information about the datasets included in this MicrobiomeHD release is in the original publications, in the MicrobiomeHD github repo https://github.com/cduvallet/microbiomeHD, and in the file dataset_info.yaml.

The sample sizes listed here reflect what was reported in the original publications. Some may have discrepancies between what is reported and what is in the actual data due to missing data, quality issues, barcode mismatches, etc.

asd_son_results.tar.gz (asd_son): NT: 44, ASD: 59 http://dx.doi.org/10.1371/journal.pone.0137725
autism_kb_results.tar.gz (asd_kang): H: 20, ASD: 20 http://dx.doi.org/10.1371/journal.pone.0068322
cdi_schubert_results.tar.gz (cdi_schubert): H: 155, nonCDI: 89, CDI: 94 http://dx.doi.org/10.1128/mBio.01021-14
cdi_vincent_v3v5_results.tar.gz (cdi_vincent): H: 25, CDI: 25 http://dx.doi.org/10.1186/2049-2618-1-18
cdi_youngster_results.tar.gz (cdi_youngster): H: 4, CDI: 19 http://dx.doi.org/10.1093/cid/ciu135
crc_baxter_results.tar.gz (crc_baxter): adenoma: 198, H: 172, CRC: 120 http://dx.doi.org/10.1186/s13073-016-0290-3
crc_xiang_results.tar.gz (crc_chen): H: 22, CRC: 21 http://dx.doi.org/10.1371/journal.pone.0039743
crc_zackular_results.tar.gz (crc_zackular): adenoma: 30, H: 30, CRC: 30 http://dx.doi.org/10.1158/1940-6207.CAPR-14-0129
crc_zeller_results.tar.gz (crc_zeller): H: 75, CRC: 41 http://dx.doi.org/10.15252/msb.20145645
crc_zhao_results.tar.gz (crc_wang): H: 56, CRC: 46 http://dx.doi.org/10.1038/ismej.2011.109
edd_singh_results.tar.gz (edd_singh): STEC: 28, CAMP: 71, SALM: 66, SHIG: 34, H: 75 http://dx.doi.org/10.1186/s40168-015-0109-2
hiv_dinh_results.tar.gz (hiv_dinh): H: 16, HIV: 21 http://dx.doi.org/10.1093/infdis/jiu409
hiv_lozupone_results.tar.gz (hiv_lozupone): H: 13, HIV: 25 http://dx.doi.org/10.1016/j.chom.2013.08.006
hiv_noguerajulian_results.tar.gz (hiv_noguerajulian): H: 34, HIV: 206 https://doi.org/10.1016%2Fj.ebiom.2016.01.032
ibd_alm_results.tar.gz (ibd_papa): IBDundef: 1, nonIBD: 24, UC: 43, CD: 23 http://dx.doi.org/10.1371/journal.pone.0039242
ibd_engstrand_maxee_results.tar.gz (ibd_willing): CCD: 12, H: 35, ICD: 15, UC: 16, ICCD: 2 http://dx.doi.org/10.1053/j.gastro.2010.08.049
ibd_gevers_2014_results.tar.gz (ibd_gevers): H: 31, CD: 224 http://dx.doi.org/10.1016/j.chom.2014.02.005
ibd_huttenhower_results.tar.gz (ibd_morgan): H: 18, UC: 48, CD: 62 http://dx.doi.org/10.1186/gb-2012-13-9-r79
mhe_zhang_results.tar.gz (liv_zhang): CIRR: 25, H: 26, MHE: 26 http://dx.doi.org/10.1038/ajg.2013.221
nash_chan_results.tar.gz (nash_wong): H: 22, NASH: 16 http://dx.doi.org/10.1371/journal.pone.0062885
nash_ob_baker_results.tar.gz (nash_ob_zhu): H: 16, NASH: 22, OB: 25 http://dx.doi.org/10.1002/hep.26093
ob_escobar_results.tar.gz (ob_escobar): OW: 10, H: 10, OB: 10 https://doi.org/10.1186/s12866-014-0311-6
ob_goodrich_results.tar.gz (ob_goodrich): OW: 322, H: 433, OB: 183 http://dx.doi.org/10.1016/j.cell.2014.09.053
ob_gordon_2008_v2_results.tar.gz (ob_turnbaugh): H: 61, OB: 219 http://dx.doi.org/10.1038/nature07540
ob_jumpertz_results.tar.gz (ob_jumpertz): H: 12, OB: 9 http://ajcn.nutrition.org/content/early/2011/05/03/ajcn.110.010132
ob_ross_results.tar.gz (ob_ross): H: 26, OB:
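The entries above follow a regular "archive (dataset ID): label: count, ... URL" pattern, so they can be turned into a lookup table with a short script. The sketch below is one hypothetical way to do that in Python. The regular expressions are assumptions inferred from the listing itself, and an entry without a URL (such as a truncated one) is simply skipped.

```python
import re

# Patterns inferred from the listing; they are assumptions, not an official schema.
ENTRY = re.compile(r"(\S+\.tar\.gz) \((\w+)\): (.*?)\s+(https?://\S+)", re.DOTALL)
COUNT = re.compile(r"(\w+): (\d+)")

def parse_entries(text):
    """Return {dataset_id: {"archive", "counts", "url"}} for each complete entry."""
    entries = {}
    for archive, dataset_id, counts, url in ENTRY.findall(text):
        entries[dataset_id] = {
            "archive": archive,
            "counts": {label: int(n) for label, n in COUNT.findall(counts)},
            "url": url,
        }
    return entries

if __name__ == "__main__":
    listing = (
        "asd_son_results.tar.gz (asd_son): NT: 44, ASD: 59 "
        "http://dx.doi.org/10.1371/journal.pone.0137725 "
        "autism_kb_results.tar.gz (asd_kang): H: 20, ASD: 20 "
        "http://dx.doi.org/10.1371/journal.pone.0068322"
    )
    for dataset_id, info in parse_entries(listing).items():
        print(dataset_id, sum(info["counts"].values()), info["url"])
```

Because the counts are the sample sizes reported in the original publications, totals computed this way may differ from the number of samples actually present in each archive.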

Cite

California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county

Death Profiles by County

Explore at:

3 scholarly articles cite this dataset (View in Google Scholar)

csv(74351424), csv(75015194), csv(11738570), csv(1128641), csv(15127221), csv(60517511), csv(73906266), csv(60201673), csv(60676655), csv(28125832), csv(60023260), csv(51592721), csv(74689382), csv(52019564), csv(5095), csv(74043128), csv(24235858), csv(74497014), zip, csv(29775349)Available download formats

Dataset updated

Nov 26, 2025

Dataset authored and provided by

California Department of Public Health

Description

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
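Because the tables report totals alongside stratified counts and several cause categories, summing every row for a county would count the same deaths more than once. The Python sketch below shows one way to select only the all-cause total rows. The column names (County, Strata, Cause, Count), the "Total Population" value, and the sample rows are assumptions made for illustration and should be checked against the dataset's data dictionary.

```python
import csv
import io

# Hypothetical extract: column names and numbers are made up for illustration.
SAMPLE = """County,Strata,Strata_Name,Cause,Count
Alameda,Total Population,Total Population,ALL,100
Alameda,Gender,Female,ALL,48
Alameda,Gender,Male,ALL,52
Alameda,Total Population,Total Population,HTD,20
Alameda,Total Population,Total Population,ALL,
"""

def all_cause_total(csv_text, county):
    """Sum all-cause counts from total-population rows only."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(
        int(row["Count"])
        for row in reader
        if row["County"] == county
        and row["Strata"] == "Total Population"  # skip stratified rows
        and row["Cause"] == "ALL"                # skip cause-specific rows
        and row["Count"]                         # skip blank (e.g. suppressed) counts
    )

if __name__ == "__main__":
    print(all_cause_total(SAMPLE, "Alameda"))
```

The same filter can be changed to a single cause category or a single stratum, as long as totals and strata are never added together.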
