84 datasets found

Death Profiles by Leading Causes of Death
data.chhs.ca.gov
data.ca.gov
web link, zip
Updated Aug 28, 2024
Cite
California Department of Public Health (2024). Death Profiles by Leading Causes of Death [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-leading-causes-of-death
Available download formats: web link, zip
Dataset updated
Aug 28, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
Data for deaths by leading cause of death categories are now available in the death profiles dataset for each geographic granularity.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.

Cause of death categories for years 1999 and later are based on codes from the tenth revision of the International Classification of Diseases (ICD-10). Comparable categories are provided for years 1979 through 1998 based on ninth revision (ICD-9) codes. For more information on the comparability of cause of death classification between ICD revisions, see Comparability of Cause-of-death Between ICD Revisions.
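The year ranges above can be expressed as a small lookup. This is a minimal sketch, not part of the dataset: the function name `icd_revision_for_year` is hypothetical, and years before 1979 are treated as out of range only because the description does not cover them.

```python
def icd_revision_for_year(year: int) -> str:
    """Return the ICD revision the description associates with a death year."""
    if 1979 <= year <= 1998:
        return "ICD-9"
    if year >= 1999:
        return "ICD-10"
    # The description says nothing about years before 1979.
    raise ValueError(f"no cause-of-death categories described for {year}")


if __name__ == "__main__":
    for y in (1979, 1998, 1999, 2020):
        print(y, icd_revision_for_year(y))
```

When comparing categories across the 1998/1999 boundary, the comparability note in the description still applies; the lookup only tells you which revision a given year falls under.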
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
catalog.data.gov
data.ct.gov
Updated Aug 12, 2023
Cite
data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
Dataset updated
Aug 12, 2023
Dataset provided by
data.ct.gov
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj .

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
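The difference between the crude and age-adjusted rates described in this entry can be illustrated with direct standardization. The sketch below is an assumption-laden illustration, not DPH's code: the function names are hypothetical, and the example uses three made-up age groups instead of the 19 groups and the 2000 US standard population named in the description.

```python
def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, scaled to `per` people."""
    return sum(deaths) / sum(population) * per


def age_adjusted_rate(deaths, population, std_population, per=100_000):
    """Direct standardization: weight each age-specific rate by the
    share of the standard population in that age group."""
    if not (len(deaths) == len(population) == len(std_population)):
        raise ValueError("all inputs need one value per age group")
    std_total = sum(std_population)
    adjusted = 0.0
    for d, p, s in zip(deaths, population, std_population):
        adjusted += (d / p) * (s / std_total)
    return adjusted * per


if __name__ == "__main__":
    # Hypothetical numbers for three age groups (young, middle, old).
    deaths = [1, 10, 100]
    population = [10_000, 10_000, 10_000]
    std_population = [500_000, 300_000, 200_000]
    print(crude_rate(deaths, population))
    print(age_adjusted_rate(deaths, population, std_population))
```

In this made-up example the standard population is younger than the group being studied, so the adjusted rate comes out lower than the crude rate, which is the same direction the description reports for the non-Hispanic white population.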
Coronavirus (Covid-19) Data in the United States
github.com
openicpsr.org
csv
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
Available download formats: csv
Dataset provided by
New York Times
License
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
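Because the files hold cumulative counts, daily new counts have to be derived by differencing consecutive days. The following is a minimal sketch under stated assumptions: the column names (`date`, `state`, `cases`, `deaths`) and the sample rows are illustrative placeholders, not taken from the repository, and `daily_new_counts` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in an assumed layout; not real data.
SAMPLE_CSV = """date,state,cases,deaths
2020-03-01,Example State,2,0
2020-03-02,Example State,5,0
2020-03-03,Example State,9,1
"""


def daily_new_counts(csv_text):
    """Turn cumulative per-state counts into day-over-day increases."""
    by_state = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_state[row["state"]].append(row)

    result = []
    for state, rows in by_state.items():
        rows.sort(key=lambda r: r["date"])  # ISO dates sort as text
        prev_cases = prev_deaths = 0
        for r in rows:
            cases, deaths = int(r["cases"]), int(r["deaths"])
            result.append({
                "date": r["date"],
                "state": state,
                "new_cases": cases - prev_cases,
                "new_deaths": deaths - prev_deaths,
            })
            prev_cases, prev_deaths = cases, deaths
    return result


if __name__ == "__main__":
    for entry in daily_new_counts(SAMPLE_CSV):
        print(entry)
```

A derived daily value can be negative if a cumulative total is revised downward, so it is worth checking for that before plotting or summing.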
Statewide Death Profiles
data.chhs.ca.gov
data.ca.gov
csv, zip
Updated Mar 25, 2025
Cite
California Department of Public Health (2025). Statewide Death Profiles [Dataset]. https://data.chhs.ca.gov/dataset/statewide-death-profiles
Available download formats: csv(463460), csv(164006), csv(4689434), zip, csv(16301), csv(200270), csv(5034), csv(2026589), csv(5401561), csv(419332), csv(300479)
Dataset updated
Mar 25, 2025
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE
data.ct.gov
gimi9.com
application/rdfxml +5
Updated Jun 24, 2022
Cite
Department of Public Health (2022). COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-Tests-Cases-Hospitalizations-and-Deaths-S/rf3k-f8fg
Available download formats: tsv, application/rdfxml, xml, json, csv, application/rssxml
Dataset updated
Jun 24, 2022
Dataset authored and provided by
Department of Public Health
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
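The confidentiality rule for town-level data can be sketched as a small-count filter. This is an illustration only, not DPH's procedure: the function name, the input shape (a list of daily counts per town), and the use of `None` to mark a suppressed value are all assumptions. It follows the literal wording, so a 7-day total of zero is also suppressed.

```python
def suppress_small_counts(daily_counts_by_town, threshold=5, window_days=7):
    """Return each town's total for the last `window_days` days,
    or None when that total is below `threshold`."""
    published = {}
    for town, daily in daily_counts_by_town.items():
        total = sum(daily[-window_days:])
        published[town] = total if total >= threshold else None
    return published


if __name__ == "__main__":
    # Hypothetical towns and counts.
    example = {
        "Town A": [0, 1, 0, 1, 0, 1, 1],  # 7-day total 4, suppressed
        "Town B": [1, 1, 1, 1, 1, 0, 0],  # 7-day total 5, published
    }
    print(suppress_small_counts(example))
```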

This dataset contains COVID-19 tests, cases, and associated deaths that have been reported among Connecticut residents. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Hospitalization data were collected by the Connecticut Hospital Association and reflect the number of patients currently hospitalized with laboratory-confirmed COVID-19. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the daily COVID-19 update.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME [25,26]. As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made [31]. The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10) classification system [25,26]. COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
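The case definition at the end of this paragraph (U07.1 as either the underlying or a contributing cause) can be sketched as a simple check. The record layout, the function names, and the normalization of code formatting are assumptions made for illustration; they do not describe how NCHS or DPH store coded records.

```python
COVID_CODE = "U07.1"


def _normalize(code):
    """Compare codes without regard to case, spaces, or the dot."""
    return code.strip().upper().replace(".", "")


def is_covid_death(underlying_cause, contributing_causes=()):
    """True when U07.1 appears as the underlying cause or as any
    contributing cause, mirroring the definition in the description."""
    target = _normalize(COVID_CODE)
    codes = [underlying_cause, *contributing_causes]
    return any(_normalize(c) == target for c in codes)


if __name__ == "__main__":
    print(is_covid_death("U07.1"))                    # underlying cause
    print(is_covid_death("J18.9", ["I10", "u07.1"]))  # contributing cause
    print(is_covid_death("I21.9", ["E11.9"]))         # not a COVID-19 death
```

Counting contributing causes makes this definition broader than the underlying-cause-only categories used by some other mortality datasets, so the resulting totals are not directly comparable with them.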

Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes.

Starting in July 2020, this dataset will be updated every weekday.

Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differs from the timestamp in DPH's daily PDF reports.

Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.

Starting April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health. Negative test results from laboratory-based molecular (PCR/NAAT) tests are still required to be reported, as are all positive test results from both molecular (PCR/NAAT) and antigen tests.

On 5/16/2022, 8,622 historical cases were included in the data. The date range for these cases was August 2021 – April 2022.
Public Health Statistics - Selected underlying causes of death in Chicago,...
data.cityofchicago.org
datasets.ai
application/rdfxml +5
Updated Oct 6, 2014
Cite
Illinois Department of Public Health (IDPH) (2014). Public Health Statistics - Selected underlying causes of death in Chicago, 2006–2010 - Historical [Dataset]. https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Selected-underlying-cause/j6cj-r444
Available download formats: csv, tsv, application/rdfxml, json, application/rssxml, xml
Dataset updated
Oct 6, 2014
Dataset authored and provided by
Illinois Department of Public Health (IDPH)
Area covered
Chicago
Description
Note: This dataset is historical only, and there are no corresponding datasets for more recent time periods. For more recent information, please visit the Chicago Health Atlas at https://chicagohealthatlas.org.

This dataset contains the cumulative number of deaths, average number of deaths annually, average annual crude and adjusted death rates with corresponding 95% confidence intervals, and average annual years of potential life lost per 100,000 residents aged 75 and younger due to selected causes of death, by Chicago community area, for the years 2006 – 2010. A ranking for each measure is also provided, with the highest value indicated with a ranking of 1. See the full description at: https://data.cityofchicago.org/api/views/6vw3-8p6f/files/CqPqfHSv8UUAoXCBjn4_tLqcQHhb36Ih4-meM-4zNzs?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\MORTALITY\Dataset_Description_06_10_PORTAL_ONLY.pdf
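As a rough illustration of one of the measures named above, years of potential life lost (YPLL) before age 75 can be computed by summing the years remaining to 75 for each death below that age and scaling by the population under 75. This is a generic sketch, not the dataset's method: the function names and inputs are hypothetical, and the linked full description remains the authority on how the published values (including the five-year averaging) were derived.

```python
def ypll(ages_at_death, age_limit=75):
    """Sum of (age_limit - age) over deaths that occurred before age_limit."""
    return sum(age_limit - age for age in ages_at_death if age < age_limit)


def ypll_rate(ages_at_death, population_under_limit, age_limit=75, per=100_000):
    """YPLL scaled to `per` residents younger than age_limit."""
    return ypll(ages_at_death, age_limit) / population_under_limit * per


if __name__ == "__main__":
    # Hypothetical ages at death and population size.
    ages = [30, 60, 80, 74]
    print(ypll(ages))               # deaths at 80 contribute nothing
    print(ypll_rate(ages, 50_000))
```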
Death Profiles by County
data.chhs.ca.gov
data.ca.gov
csv, zip
Updated Feb 27, 2025
Cite
California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
Available download formats: csv(11738570), csv(15127221), csv(1128641), csv(60023260), csv(60201673), csv(17520989), zip, csv(74497014), csv(60676655), csv(60517511), csv(73906266), csv(74689382), csv(52019564), csv(51592721), csv(28125832), csv(24235858), csv(75015194), csv(74043128), csv(5095), csv(74351424)
Dataset updated
Feb 27, 2025
Dataset authored and provided by
California Department of Public Health
Description
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Death Profiles by ZIP Code
data.chhs.ca.gov
data.ca.gov
csv, zip
Updated Dec 27, 2024
Cite
California Department of Public Health (2024). Death Profiles by ZIP Code [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-zip-code
Available download formats: csv(78958555), csv(4571), csv(40627562), csv(80055974), csv(80054609), zip
Dataset updated
Dec 27, 2024
Dataset authored and provided by
California Department of Public Health
Description
This dataset contains counts of deaths for California residents by ZIP Code based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths of California residents. The data tables include deaths of residents of California by ZIP Code of residence (by residence). The data are reported as totals, as well as stratified by age and gender. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
High-risk human papillomavirus status and prognosis in invasive cervical...
snd.se
Updated Oct 21, 2019
Cite
Pär Sparén (2019). High-risk human papillomavirus status and prognosis in invasive cervical cancer: a nationwide cohort study. Dataset 1 [Dataset]. http://doi.org/10.5878/rtxk-6790
Explore at:
Unique identifier
https://doi.org/10.5878/rtxk-6790
Dataset updated
Oct 21, 2019
Dataset provided by
Karolinska Institutet
Swedish National Data Service
Authors
Pär Sparén
License
https://snd.se/en/search-and-order-data/using-data
Area covered
Sweden
Dataset funded by
Swedish Cancer Society (http://www.cancerfonden.se/sv/Information-in-English/)
Swedish Foundation for Strategic Research, SSF
Swedish Research Council
Description
High-risk human papillomavirus (hrHPV) infection is established as the major cause of invasive cervical cancer (ICC). However, whether hrHPV status in the tumor is associated with subsequent prognosis of ICC is controversial. We aim to evaluate the association between tumor hrHPV status and ICC prognosis using national registers and comprehensive human papillomavirus (HPV) genotyping.

In this nationwide population-based cohort study, we identified all ICC diagnosed in Sweden during the years 2002–2011 (4,254 confirmed cases), requested all archival formalin-fixed paraffin-embedded blocks, and performed HPV genotyping. Twenty out of 25 pathology biobanks agreed to the study, yielding a total of 2,845 confirmed cases with valid HPV results. Cases were prospectively followed up from date of cancer diagnosis to 31 December 2015, migration from Sweden, or death, whichever occurred first. The main exposure was tumor hrHPV status classified as hrHPV-positive and hrHPV-negative. The primary outcome was all-cause mortality by 31 December 2015. Five-year relative survival ratios (RSRs) were calculated, and excess hazard ratios (EHRs) with 95% confidence intervals (CIs) were estimated using Poisson regression, adjusting for education, time since cancer diagnosis, and clinical factors including age at cancer diagnosis and International Federation of Gynecology and Obstetrics (FIGO) stage.

Of the 2,845 included cases, hrHPV was detected in 2,293 (80.6%), and we observed 1,131 (39.8%) deaths during an average of 6.2 years follow-up. The majority of ICC cases were diagnosed at age 30–59 years (57.5%) and classified as stage IB (40.7%). hrHPV positivity was significantly associated with screen-detected tumors, young age, high education level, and early stage at diagnosis (p < 0.001). The 5-year RSR compared to the general female population was 0.74 (95% CI 0.72–0.76) for hrHPV-positive cases and 0.54 (95% CI 0.50–0.59) for hrHPV-negative cases, yielding a crude EHR of 0.45 (95% CI 0.38–0.52) and an adjusted EHR of 0.61 (95% CI 0.52–0.71). Risk of all-cause mortality as measured by EHR was consistently and statistically significantly lower for cases with hrHPV-positive tumors for each age group above 29 years and each FIGO stage above IA. The difference in prognosis by hrHPV status was highly robust, regardless of the clinical, histological, and educational characteristics of the cases. The main limitation was that, except for education, we were not able to adjust for lifestyle factors or other unmeasured confounders.
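The percentages in this paragraph follow directly from the reported counts, and a relative survival ratio is, by definition, observed survival divided by the survival expected in a comparable general population. The sketch below reproduces the two percentages and shows the ratio with hypothetical inputs; the function names are made up, and it does not attempt the study's Poisson regression or excess hazard estimates.

```python
def percent(part, whole, digits=1):
    """Share of `whole` made up by `part`, as a rounded percentage."""
    return round(100 * part / whole, digits)


def relative_survival_ratio(observed_survival, expected_survival):
    """Observed survival in the cohort divided by expected survival
    in a comparable general population."""
    return observed_survival / expected_survival


if __name__ == "__main__":
    print(percent(2293, 2845))  # hrHPV detected among included cases
    print(percent(1131, 2845))  # deaths among included cases
    # Hypothetical survival proportions, for illustration only.
    print(relative_survival_ratio(0.70, 0.95))
```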

In conclusion, women with hrHPV-positive cervical tumors had a substantially better prognosis than women with hrHPV-negative tumors. hrHPV appears to be a biomarker for better prognosis in cervical cancer independent of age, FIGO stage, and histological type, extending information from already established prognostic factors. The underlying biological mechanisms relating lack of detectable tumor hrHPV to considerably worse prognosis are not known and should be further investigated.

Purpose:

To compile comprehensive survival and HPV genotyping data and provide a large-scale population-based evaluation of the association between tumor high-risk HPV status and prognosis of invasive cervical cancer.

This dataset (ccHPV_RelativeSurvival.dta) comprises 2845 invasive cervical cancer (ICC) cases that were diagnosed in Sweden during the years 2002-2011 and had valid human papillomavirus (HPV) results assessed from the formalin-fixed, paraffin-embedded (FFPE) blocks.

In order to control the risk of incidental disclosure of personal information, the data available here has been anonymized in the following manner:
• The date of diagnosis has been moved to 2008-07-01 for all subjects.
• Follow-up time has been censored at five years after diagnosis.
• Age at diagnosis and follow-up time after diagnosis have been microaggregated in groups of five subjects (using function microaggregation in R package sdcMicro 2.5.9, available from https://cran.r-project.org/package=sdcMicro).
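To give a feel for what microaggregation in groups of five does to a variable such as age, here is a deliberately simple univariate sketch: sort the values, cut them into groups of at least five, and replace each value with its group mean. It is not the algorithm used by sdcMicro, whose `microaggregation` function offers several methods; the function name and the handling of leftover values (folded into the last group) are assumptions.

```python
def microaggregate(values, group_size=5):
    """Replace each value with the mean of its group of neighbours,
    where groups are formed from the sorted values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + group_size
        # Fold a too-small remainder into the current group.
        if n - end < group_size:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


if __name__ == "__main__":
    # Hypothetical ages.
    ages = [10, 20, 30, 40, 50, 1, 2, 3, 4, 5]
    print(microaggregate(ages))
```

Every published value is then shared by at least five subjects, which is why the description notes small shifts in the age-group distribution.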

Analysis of the anonymized data replicates the results presented in the main part of the study (Figures 2 & 3, Tables 1-3) with only minor numerical differences, with the following exceptions:
• In Figure 2, relative survival can only be calculated up to five years after diagnosis.
• In Table 1, the number of person years and the mean follow-up time differ considerably due to censoring; the distribution of subjects between age groups varies somewhat due to microaggregation.
• In Figure 3, the excess hazard ratios for age groups 30-44 and 45-59 in Panel A shift noticeably, but without affecting the overall message (comparable reduced risk across all age strata).

The dataset includes 12 variables, eight of which are necessary for the analysis (core variables) and four of which are included for administrative purposes and convenience of coding the analysis (extra variables).

Core variables:
• dx_date: Date of diagnosis
• age: Age (in years) at diagnosis
• x_stage_group: International Federation of Gynecology and Obstetrics (FIGO) stage of tumor, IA; IB; II and III+
• edu_cat: Education (categorical, three levels): 1=low (less than high school); 2=middle (high school); 3=high (university exam and above); 99=missing
• exit_new: End of follow-up (date)
• censor_new: Censoring status: 1=death; 2=censored due to migration, loss of follow-up or end of study
• final_type: Histological type of tumor: SCC=squamous cell carcinoma; AC=adenocarcinoma.
• hr_hpv: High-risk HPV status of tumor (main exposure, binary): 0=hrHPV negative; 1= hrHPV positive

Extra variables:
• entry: Entry date (copy of diagnosis date)
• sex: Gender (all female, for linking to standard population mortality file): 2=female.
• dx_year: Year of diagnosis (for linking to standard population mortality file)
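The value codes listed for the core variables can be collected into lookup tables when reading the data. This is a sketch under assumptions: it presumes the coded variables arrive as plain integers or short strings (the .dta file may instead carry its own value labels), and `decode_record` is a hypothetical helper.

```python
# Code-to-label mappings copied from the variable list above.
CODEBOOK = {
    "edu_cat": {
        1: "low (less than high school)",
        2: "middle (high school)",
        3: "high (university exam and above)",
        99: "missing",
    },
    "censor_new": {
        1: "death",
        2: "censored due to migration, loss of follow-up or end of study",
    },
    "final_type": {
        "SCC": "squamous cell carcinoma",
        "AC": "adenocarcinoma",
    },
    "hr_hpv": {0: "hrHPV negative", 1: "hrHPV positive"},
}


def decode_record(record):
    """Return a copy of `record` with coded variables replaced by labels.
    Variables or codes not in the codebook are left unchanged."""
    decoded = dict(record)
    for name, labels in CODEBOOK.items():
        if name in decoded:
            decoded[name] = labels.get(decoded[name], decoded[name])
    return decoded


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    row = {"age": 42, "edu_cat": 2, "censor_new": 1,
           "final_type": "SCC", "hr_hpv": 1}
    print(decode_record(row))
```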
Human Mortality Database
neuinfo.org
dknet.org
Updated Mar 22, 2025
Cite
(2025). Human Mortality Database [Dataset]. http://identifiers.org/RRID:SCR_002370
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002370
Dataset updated
Mar 22, 2025
Description
A database providing detailed mortality and population data to those interested in the history of human longevity. For each country, the database includes calculated death rates and life tables by age, time, and sex, along with all of the raw data (vital statistics, census counts, population estimates) used in computing these quantities. Data are presented in a variety of formats with regard to age groups and time periods. The main goal of the database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. New data series are continually added to this collection. However, the database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included are relatively wealthy and for the most part highly industrialized. The database replaces an earlier NIA-funded project, known as the Berkeley Mortality Database.
* Dates of Study: 1751-present
* Study Features: Longitudinal, International
* Sample Size: 37 countries or areas
Death rate by age and sex in the U.S. 2021
statista.com
Updated Oct 25, 2024
Cite
Statista (2024). Death rate by age and sex in the U.S. 2021 [Dataset]. https://www.statista.com/statistics/241572/death-rate-by-age-and-sex-in-the-us/
Explore at:
Dataset updated
Oct 25, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2021
Area covered
United States
Description
In the United States in 2021, the death rate was highest among those aged 85 and over, at about 17,190.5 per 100,000 men and 14,914.5 per 100,000 women. For all ages, the death rate was 1,118.2 per 100,000 of the population for males and 970.8 per 100,000 of the population for females.

The death rate
Death rates are generally counted as the number of deaths per 1,000 or 100,000 of the population and include deaths from both natural and unnatural causes. The death rate in the United States had held largely steady since 1990 until it started to increase over the last decade, with the highest death rates recorded in recent years. While the birth rate in the United States has been decreasing, it is still currently higher than the death rate.

Causes of death
There are myriad causes of death in the United States, but the most recent data show the top three leading causes of death to be heart disease, cancers, and accidents. Heart disease was also the leading cause of death worldwide.
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
zenodo.org
Updated Jul 19, 2024
Cite
Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
Nishat Anjum
Nafiz Sadman
Kishor Datta Gupta
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, United States
Description
1 Introduction

There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [ 1 ] mined opinions from headlines using Stanford NLP and SVM and compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done by Jelodar et al. [ 9 ], using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NKK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

2 Data-set Introduction

2.1 Data Collection

We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were highly likely to collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper charged $1 per subscription. Some of the criteria given as guidelines to the human data-collectors were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

Avoid taking duplicate reports.

Maintain a time frame for the above mentioned newspapers.

To collect these data we used a Google form for USA and BD. Two human editors went through each entry to check for any spam or troll entries.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.
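The four steps above could be sketched roughly as follows. This is a minimal illustration, not the authors' script: the stop-word list is a tiny hypothetical sample, "non-English alphanumeric characters" is interpreted as anything other than ASCII letters, digits, and whitespace, lowercasing is added so stop words match, and lemmatization is left as a pluggable function (identity by default) because the lemmatizer used is not named.

```python
import re
from typing import Callable, FrozenSet, List

# Tiny illustrative stop-word list; a real run would use a full list.
STOP_WORDS: FrozenSet[str] = frozenset(
    {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on", "for"}
)

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NON_ASCII_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(
    text: str,
    stop_words: FrozenSet[str] = STOP_WORDS,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> List[str]:
    """Apply the four listed steps in order and return the remaining tokens."""
    text = URL_RE.sub(" ", text)              # 1. remove hyperlinks
    text = NON_ASCII_ALNUM_RE.sub(" ", text)  # 2. keep ASCII letters and digits only
    tokens = [t for t in text.lower().split() if t not in stop_words]  # 3. stop words
    return [lemmatize(t) for t in tokens]     # 4. lemmatize (identity unless supplied)


if __name__ == "__main__":
    print(preprocess("The spread of COVID-19 in Minnesota: https://example.com/news"))
```

Passing a real lemmatizer through the lemmatize argument keeps the sketch free of third-party dependencies while leaving the fourth step replaceable.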

While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.

The primary data statistics of the two datasets are shown in Tables 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100

Table 2: Covid-News-BD-NNK data statistics

No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
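Regenerating JSON from an updated CSV file can be done with the Python standard library alone. The sketch below is one possible way to do it, not the repository's actual script; the function names and the column names in the usage example are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


def regenerate_json(csv_path: str, json_path: str) -> None:
    """Read a CSV file and write the equivalent JSON file next to it."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        payload = csv_to_json(src.read())
    with open(json_path, "w", encoding="utf-8") as dst:
        dst.write(payload)


if __name__ == "__main__":
    # Hypothetical two-column sample; the real column names may differ.
    print(csv_to_json("headline,body\nExample headline,Example body\n"))
```

Every value comes out as a string, because the CSV format carries no type information; any numeric conversion would need to be added explicitly.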

3 Literature Review

Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

Some well-known applications of NLP include fraud detection on online media sites [ 10 ], authorship attribution in fallback authentication systems [ 11 ], intelligent conversational agents or chatbots [ 12 ], and machine translation as used by Google Translate [ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms built solely for Natural Language Processing tasks. The two most prominent are BERT [ 14 ], which uses the bidirectional encoder of the Transformer architecture and performs very well on classification and word-prediction tasks, and the GPT-3 models released by OpenAI [ 15 ], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computation cost.

Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [ 17 ].

Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
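To make the graph-based idea concrete, here is a minimal pure-Python sketch of unweighted TextRank for keywords: words that co-occur within a small window are linked, and the PageRank update is iterated over that graph. The window size, damping factor, iteration count, and function name are assumptions chosen for illustration; this is not the code used to produce the results described here.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(
    tokens: List[str],
    window: int = 2,
    damping: float = 0.85,
    iterations: int = 50,
    top_k: int = 5,
) -> List[str]:
    """Rank words by PageRank score on an undirected co-occurrence graph."""
    # Link each word to the distinct words that follow it within `window`.
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Iterate S(w) = (1 - d) + d * sum(S(n) / degree(n)) over neighbors n of w.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    sample = ["virus", "economy", "virus", "masks", "virus", "election"]
    print(textrank_keywords(sample, window=1, top_k=3))
```

In practice the tokens would first be cleaned (stop words removed, text lemmatized), so that the graph links content words rather than function words.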

Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out the following:

In February, both newspapers talked about China and the source of the outbreak.

StarTribune emphasized Minnesota as the state of most concern. In April, it seemed to be even more concerned.

Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

Washington Post discussed global issues more than StarTribune.

In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.

We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
California Hospital Inpatient Mortality Rates and Quality Ratings
data.chhs.ca.gov
data.ca.gov
csv, pdf, xls, zip
Updated Aug 28, 2024
Cite
Department of Health Care Access and Information (2024). California Hospital Inpatient Mortality Rates and Quality Ratings [Dataset]. https://data.chhs.ca.gov/dataset/california-hospital-inpatient-mortality-rates-and-quality-ratings
Explore at:
Available download formats: csv(3189182), pdf, pdf(150793), pdf(288823), pdf(280571), pdf(238223), pdf(267033), pdf(798633), pdf(306372), pdf(730246), pdf(363570), pdf(791847), pdf(100994), xls(166400), pdf(134270), pdf(445171), pdf(713960), pdf(700782), xls(163840), xls(141824), xls(165376), xls(143872), xls(172032), csv(6420523), pdf(83317), pdf(419645), xls, pdf(264343), pdf(114573), xls(214016), zip, pdf(451935), pdf(538945), pdf(254426), pdf(1235022), pdf(796065), pdf(452858), pdf(146736), pdf(253971)
Dataset updated
Aug 28, 2024
Dataset authored and provided by
Department of Health Care Access and Information
Area covered
California
Description
The dataset contains risk-adjusted mortality rates, quality ratings, and number of deaths and cases for 6 medical conditions treated (Acute Stroke, Acute Myocardial Infarction, Heart Failure, Gastrointestinal Hemorrhage, Hip Fracture and Pneumonia) and 5 procedures performed (Abdominal Aortic Aneurysm Repair, Unruptured/Open, Abdominal Aortic Aneurysm Repair, Unruptured/Endovascular, Carotid Endarterectomy, Pancreatic Resection, Percutaneous Coronary Intervention) in California hospitals. The 2022 IMIs were generated using AHRQ Version 2023, while previous years' IMIs were generated with older versions of AHRQ software (2021 IMIs by Version 2022, 2020 IMIs by Version 2021, 2019 IMIs by Version 2020, 2016-2018 IMIs by Version 2019, 2014 and 2015 IMIs by Version 5.0, and 2012 and 2013 IMIs by Version 4.5). The differences in the statistical method employed and inclusion and exclusion criteria using different versions can lead to different results. Users should not compare trends of mortality rates over time. However, many hospitals showed consistent performance over years; “better” performing hospitals may perform better and “worse” performing hospitals may perform worse consistently across years. This dataset does not include conditions treated or procedures performed in outpatient settings. Please refer to statewide table for California overall rates: https://data.chhs.ca.gov/dataset/california-hospital-inpatient-mortality-rates-and-quality-ratings/resource/af88090e-b6f5-4f65-a7ea-d613e6569d96
Elephant movement in a risky landscape - Dataset - B2FIND
b2find.dkrz.de
Updated Sep 12, 2018
Cite
(2018). Elephant movement in a risky landscape - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/c630d4dc-d21f-598c-bf25-3f67717a1a33
Explore at:
Dataset updated
Sep 12, 2018
Description
The illegal killing of elephants, i.e. poaching and human-elephant related mortality, is the greatest immediate threat to elephants. It has led to the decline of many elephant populations in Africa. The Monitoring of Illegal Killing of Elephants (MIKE) program of the Convention on International Trade in Endangered Species (CITES) was set up in the year 2002 as a framework for monitoring trends in illegal killing in 57 African sites. The MIKE program seeks to establish the relationships between the levels of illegal killing of elephants and various possible explanatory variables within and beyond the monitoring sites. The effort in implementing the MIKE program varies from site to site, and to make the results comparable, a metric referred to as the Proportion of Illegally Killed Elephants (PIKE) out of all recorded deaths in a site has been adopted as the standard measure of severity of illegal killing. Loss of habitat due to the expansion of agriculture and infrastructural developments is the largest long-term threat to elephants. The migratory corridors of elephants and other wildlife in many landscapes have been cut off. The majority of wildlife resides outside formally protected areas on private and community lands. In the landscapes shared by wildlife and humans, competition for resources influences the spatial-temporal distributions of wildlife. Efforts to win the goodwill of private and community landowners regarding hosting of wildlife on their lands are ongoing in many sites across the elephant range. Despite the numerous studies on the nature of risk faced by elephants, fewer studies have focused on the behavioural adaptations of elephants living in those risky landscapes. This thesis sought to understand the site-level drivers of illegal killing and how elephants adapt to the threat in Africa’s most intensively monitored site, the Laikipia-Samburu MIKE in northern Kenya.

Using field-verified records of causes of elephant mortality, the distribution of live elephants, and the cadastral attributes of land parcels in the ecosystem, the thesis established that land use type, and not land ownership, is the most important correlate of levels of illegal killing. The study analyses the movement of elephants at hourly, day and night, and overall 24-hour activity cycles in relation to the spatial and temporal variation of the levels of illegal killing. Past studies have given a lot of attention to movement behaviour along corridors. The research in this thesis focusses on movement within core areas. At the hourly time interval, the research showed that elephants walk with lower tortuosity when they are in core areas with higher levels of illegal killing, i.e., higher risk. The study found that elephants move more at night when they are in core areas with higher risk than when they are in safer core areas. Based on this finding, the research presents a new metric for inferring the levels of risk, i.e., the night-day speed ratio. When elephants move from a core area to another one with a different level of risk, they alter their daily activity pattern to include a longer resting phase during the mid-day hours, and this is even more pronounced in core areas closest to permanent human settlements. The study found that, as a result of the alteration of the activity cycle within 24-hour periods, elephants lose approximately one hour of activity time. The results have potential use as a remote means of assessing the spatial and temporal variation of risk by analysing elephant movement behaviour, thus complementing patrol-based anti-poaching efforts. The study provides new insight into the ecology of elephants living in fear. The confirmed increase of night-time movement potentially predisposes calves to the savannah predators, which are more active at night.
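The two metrics named in this description can be written down in a few lines. PIKE follows directly from its definition above (illegally killed elephants out of all recorded deaths). The night-day speed ratio is not defined in detail here, so the sketch assumes it is mean night-time speed divided by mean day-time speed; that formula, the function names, and the numbers in the usage example are illustrative assumptions.

```python
def pike(illegally_killed: int, total_recorded_deaths: int) -> float:
    """Proportion of Illegally Killed Elephants among all recorded deaths at a site."""
    if total_recorded_deaths <= 0:
        raise ValueError("total_recorded_deaths must be positive")
    if not 0 <= illegally_killed <= total_recorded_deaths:
        raise ValueError("illegally_killed must lie between 0 and the total")
    return illegally_killed / total_recorded_deaths


def night_day_speed_ratio(
    night_km: float, night_hours: float, day_km: float, day_hours: float
) -> float:
    """Assumed definition: mean night-time speed divided by mean day-time speed."""
    if night_hours <= 0 or day_hours <= 0:
        raise ValueError("durations must be positive")
    if day_km <= 0:
        raise ValueError("day_km must be positive to form a ratio")
    return (night_km / night_hours) / (day_km / day_hours)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(pike(30, 40))
    print(night_day_speed_ratio(6.0, 12.0, 4.0, 12.0))
```

Under this assumed definition, a ratio above 1 means more movement at night than by day, which is the pattern the study associates with higher-risk core areas.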
Table_1_Structured data vs. unstructured data in machine learning prediction...
frontiersin.figshare.com
figshare.com
xlsx
Updated Jun 1, 2023
Cite
Danielle Hopkins; Debra J. Rickwood; David J. Hallford; Clare Watsford (2023). Table_1_Structured data vs. unstructured data in machine learning prediction models for suicidal behaviors: A systematic review and meta-analysis.xlsx [Dataset]. http://doi.org/10.3389/fdgth.2022.945006.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fdgth.2022.945006.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Danielle Hopkins; Debra J. Rickwood; David J. Hallford; Clare Watsford
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Suicide remains a leading cause of preventable death worldwide, despite advances in research and decreases in mental health stigma through government health campaigns. Machine learning (ML), a type of artificial intelligence (AI), is the use of algorithms to simulate and imitate human cognition. Given the lack of improvement in clinician-based suicide prediction over time, advancements in technology have allowed for novel approaches to predicting suicide risk. This systematic review and meta-analysis aimed to synthesize current research regarding data sources in ML prediction of suicide risk, incorporating and comparing outcomes between structured data (human interpretable such as psychometric instruments) and unstructured data (only machine interpretable such as electronic health records). Online databases and gray literature were searched for studies relating to ML and suicide risk prediction. There were 31 eligible studies. The outcome for all studies combined was AUC = 0.860, structured data showed AUC = 0.873, and unstructured data was calculated at AUC = 0.866. There was substantial heterogeneity between the studies, the sources of which were unable to be defined. The studies showed good accuracy levels in the prediction of suicide risk behavior overall. Structured data and unstructured data also showed similar outcome accuracy according to meta-analysis, despite different volumes and types of input data.
Data from: High temperatures and human pressures interact to influence...
datadryad.org
data.niaid.nih.gov
zip
Updated Oct 29, 2022
Cite
Daniella Rabaiotti; Rosemary Groom; J. W. McNutt; Jessica Watermeyer; Helen O'Neill; Rosie Woodroffe (2022). High temperatures and human pressures interact to influence mortality in an African carnivore [Dataset]. http://doi.org/10.5061/dryad.4j0zpc8b9
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.4j0zpc8b9
Dataset updated
Oct 29, 2022
Dataset provided by
Dryad
Authors
Daniella Rabaiotti; Rosemary Groom; J. W. McNutt; Jessica Watermeyer; Helen O'Neill; Rosie Woodroffe
Time period covered
2021
Description
Details of the column names can be found in the ReadMe file. There are three CSV files, one for each study site. Missing values are indicated by NA. In particular, at the Kenya site the weather station was broken for a period in 2011, so data are missing for that period. There are also a number of time periods where variables such as age or dispersal status are unknown for particular dogs.
Number of victims of the Holocaust and Nazi persecution 1933-1945, by...
statista.com
Updated Aug 9, 2024
Cite
Number of victims of the Holocaust and Nazi persecution 1933-1945, by background [Dataset]. https://www.statista.com/statistics/1071011/holocaust-nazi-persecution-victims-wwii/
Explore at:
Dataset updated
Aug 9, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Europe
Description
Most estimates place the total number of deaths during the Second World War at around 70-85 million people. Approximately 17 million of these deaths (20-25 percent of the total) were due to crimes against humanity carried out by the Nazi regime in Europe. In comparison to the millions of deaths that took place through conflict, famine, or disease, these 17 million stand out due to the reasoning behind them, along with the systematic nature and scale in which they were carried out. Nazi ideology claimed that the Aryan race (a non-existent ethnic group referring to northern Europeans) was superior to all other ethnicities; this became the justification for German expansion and the extermination of others. During the war, millions of people deemed to be of lesser races were captured and used as slave laborers, with a large share dying of exhaustion, starvation, or individual execution. Murder campaigns were also used for systematic extermination; the most famous of these were the extermination camps, such as at Auschwitz, where roughly 80 percent of the 1.1 million victims were murdered in gas chambers upon arrival at the camp. German death squads in Eastern Europe carried out widespread mass shootings, and up to two million people were killed in this way. In Germany itself, many disabled, homosexual, and "undesirables" were also killed or euthanized as part of a wider eugenics program, which aimed to "purify" German society.

The Holocaust
Of all races, the Nazis viewed Jews as being the most inferior. Conspiracy theories involving Jews go back for centuries in Europe, and they have been repeatedly marginalized throughout history. German fascists used the Jews as scapegoats for the economic struggles during the interwar period. Following Hitler's ascendency to the Chancellorship in 1933, the German authorities began constructing concentration camps for political opponents and so-called undesirables, but the share of Jews being transported to these camps gradually increased in the following years, particularly after Kristallnacht (the Night of Broken Glass) in 1938. In 1939, Germany then invaded Poland, home to Europe's largest Jewish population. German authorities segregated the Jewish population into ghettos, and constructed thousands more concentration and detention camps across Eastern Europe, to which millions of Jews were transported from other territories. By the end of the war, over two thirds of Europe's Jewish population had been killed, and this share is higher still when one excludes the neutral or non-annexed territories.

Lebensraum
Another key aspect of Nazi ideology was that of the Lebensraum (living space). The populations of both the Soviet Union and the United States were heavily concentrated on one side of the country, with vast territories extending to the east and west, respectively. Germany was much smaller and more densely populated; therefore, Hitler aspired to extend Germany's territory to the east and create new "living space" for Germany's population and industry to grow. While Hitler may have envied the U.S. in this regard, the USSR was seen as undeserving; Slavs were the largest major group in the east and the Nazis viewed them as inferior, which was again used to justify the annexation of their land and subjugation of their people. As the Germans took Slavic lands in Poland, the USSR, and Yugoslavia, ethnic cleansings (often with the help of local conspirators) became commonplace in the annexed territories. It is also believed that the majority of Soviet prisoners of war (PoWs) died through starvation and disease, and they were not given the same treatment as PoWs on the western front. The Soviet Union lost as many as 27 million people during the war, and 10 million of these were due to Nazi genocide. It is estimated that Poland lost up to six million people, and almost all of these were through genocide.
Fire Ignition Probability Lightning Cause - Dataset - CKAN
ndp.sdsc.edu
nationaldataplatform.org
Updated Mar 7, 2025
Cite
(2025). Fire Ignition Probability Lightning Cause - Dataset - CKAN [Dataset]. https://ndp.sdsc.edu/catalog/dataset/clm-fire-ignition-probability-lightning-cause3
Explore at:
Dataset updated
Mar 7, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These rasters depict the predicted human- and lightning-caused ignition probability for the state of California. Ignition is regulated by complex interactions among climate, fuel, topography, and humans. Many studies have advanced our knowledge of the patterns and drivers of total area burned and fire frequency, but much less is known about wildfire ignition. To better design effective fire prevention and management strategies, it is critical to understand contemporary ignition patterns and predict the probability of wildfire ignitions from different sources. UC Davis researchers modeled and analyzed human- and lightning-caused ignition probability across the whole state and sub-ecoregions of California, USA. Findings reinforce the importance of varying human vs. biophysical controls in different fire regimes, highlighting the need for locally optimized land management to reduce ignition probability. Based on the most complete ignition database available, researchers developed maximum entropy models to predict the spatial distribution of long-term human- and lightning-caused ignition probability at 1 km and investigated how a set of biophysical and anthropogenic variables controlled their spatial variation in California and across its sub-ecoregions. Results showed that the integrated models with both biophysical and anthropogenic drivers predicted well the spatial patterns of both human- and lightning-caused ignitions statewide and in the sub-ecoregions of California. Model diagnostics of the relative contribution and marginalized response curves showed that precipitation, slope, human settlement, and road network were the most important variables shaping human-caused ignition probability, while snow water equivalent, lightning density, and fuel amount were the most important variables controlling the spatial patterns of lightning-caused ignition probability. The relative importance of biophysical and anthropogenic predictors differed across various sub-ecoregions of California.
Dataset: Faces extracted from Time Magazine 1923-2014
dataverse.harvard.edu
marketplace.sshopencloud.eu
Updated Mar 18, 2020
Cite
Ana Jofre (2020). Dataset: Faces extracted from Time Magazine 1923-2014 [Dataset]. http://doi.org/10.7910/DVN/JMFQT7
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/JMFQT7
Dataset updated
Mar 18, 2020
Dataset provided by
Harvard Dataverse
Authors
Ana Jofre
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The data presented here consist of three parts.

Dataset 1: In this set, we extracted 327,322 faces from our entire collection of 3389 issues and automatically classified each face as male or female. We present this data as a single table with columns identifying the date, issue, page number, the coordinates identifying the position of the face on the page, and classification (male or female). The coordinates identifying the position of the face on the page are based on the size and resolution of the pages found in the “Time Vault”.

Dataset 2: Dataset 2 consists of 8,789 classified faces from 100 selected issues. Human labor was used to identify and extract 3,299 face images from 39 issues, which were later classified by another set of workers. This selection of 39 issues contains one issue per decade spanned by the archive plus one issue per year between 1961 and 1991, and the extracted face images were used to train the face extraction algorithm. The remaining 5,490 faces from 61 issues were extracted via machine learning before being classified by human coders. These 61 issues were chosen to complement the first selection of 39 issues: one issue per year for all years in the archive excluding those between 1961 and 1991. Thus, Dataset 2 contains fully-labelled faces from at least one issue per year.

Dataset 3: In the interest of transparency, Dataset 3 consists of the raw data collected to create Dataset 2, and consists of 2 tables. Before explaining these tables we first briefly describe our data collection and verification procedures, which have been fully described elsewhere. A custom AMT interface was used to enable human workers to classify faces according to the categories in Table 4. Each worker was given a randomly-selected batch of 25 pages, each with a clearly highlighted face to be categorized, of which three pages were verification pages with known features, which were used for quality control. Each face was labeled by two distinct human coders, determined at random so that the pairing of coders varied with the image. A proficiency rating was calculated for each coder by considering all images they annotated and computing the average number of labels that matched those identified by the image’s other coder. The tables in Dataset 2 were created by resolving inconsistencies between the two image coders by selecting the labels from the coder with the highest proficiency rating. Prior to calculating the proficiency score, all faces that were tagged as having ‘Poor’ or ‘Error’ image quality by either of the two coders were eliminated. Due to technical bugs when the AMT interface was first implemented, a small number of images were only labeled once; these were also eliminated from Datasets 2 and 3. In Dataset 3, we present the raw annotations for each coder that tagged each face, along with demographic data for each coder. Dataset 3 consists of two tables: the raw data from each of the two sets of coders, and the demographic information for each of the coders.
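The proficiency rating and the conflict-resolution rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: annotations are modelled as hypothetical (image_id, coder_id, labels) tuples, images with only one annotation are skipped (matching the elimination rule), and ties in proficiency go to the coder listed first, since no tie-breaking rule is given.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Annotation = Tuple[str, str, Labels]  # (image_id, coder_id, labels)


def _by_image(annotations: List[Annotation]) -> Dict[str, Dict[str, Labels]]:
    """Group annotations so each image maps coder_id to that coder's labels."""
    grouped: Dict[str, Dict[str, Labels]] = defaultdict(dict)
    for image_id, coder_id, labels in annotations:
        grouped[image_id][coder_id] = labels
    return grouped


def proficiency_ratings(annotations: List[Annotation]) -> Dict[str, float]:
    """Average number of labels per image that match the image's other coder."""
    matches: Dict[str, List[int]] = defaultdict(list)
    for coders in _by_image(annotations).values():
        if len(coders) != 2:
            continue  # images labelled only once are excluded
        (coder_a, labels_a), (coder_b, labels_b) = coders.items()
        agree = sum(1 for key, value in labels_a.items() if labels_b.get(key) == value)
        matches[coder_a].append(agree)
        matches[coder_b].append(agree)
    return {coder: sum(counts) / len(counts) for coder, counts in matches.items()}


def resolve(annotations: List[Annotation], ratings: Dict[str, float]) -> Dict[str, Labels]:
    """For each doubly-labelled image, keep the labels of the higher-rated coder."""
    resolved: Dict[str, Labels] = {}
    for image_id, coders in _by_image(annotations).items():
        if len(coders) != 2:
            continue
        # max() returns the first coder on a tie (an assumption, see lead-in).
        best = max(coders, key=lambda coder: ratings.get(coder, 0.0))
        resolved[image_id] = coders[best]
    return resolved
```

Taking one coder's full label set, rather than voting label by label, mirrors the description: inconsistencies are settled in favour of the coder with the highest proficiency rating.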
Human Detection (Drone Imagery)
sdiinnovation-geoplatform.hub.arcgis.com
Updated Feb 28, 2023
Esri (2023). Human Detection (Drone Imagery) [Dataset]. https://sdiinnovation-geoplatform.hub.arcgis.com/content/42bfd5392d834c83aa21193450888a9e
Dataset updated
Feb 28, 2023
Dataset authored and provided by
Esri (http://esri.com/)
Description
Human life is precious, and in the event of any unfortunate occurrence, the highest efforts are made to safeguard it. To provide timely aid or extract humans in distress, it is critical to locate them accurately. Drones are increasingly used to detect and track humans in such situations, capturing high-resolution images during search and rescue operations. It is possible to find survivors from a drone feed, but that requires manual analysis, which is time-consuming and prone to human error. This model can detect humans in drone imagery and draw bounding boxes around their locations. It is trained on the IPSAR and SARD datasets, in which humans appear on macadam roads, in quarries, in low and high grass, in forest shade, and in Mediterranean and Sub-Mediterranean landscapes. Deep learning models are highly capable of learning complex semantics and can produce superior results. Use this deep learning model to automate the task of detection, significantly reducing the time and effort required.

Using the model
Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

Fine-tuning the model
This model can be fine-tuned using the Train Deep Learning Model tool. Follow the guide to fine-tune this model.

Input
High resolution (1-5 cm) individual drone images or an orthomosaic.

Output
Feature class containing detected humans.

Applicable geographies
The model is expected to work well in Mediterranean and Sub-Mediterranean landscapes but can also be tried in other areas.

Model architecture
This model uses the FasterRCNN model architecture implemented in ArcGIS API for Python.

Accuracy metrics
This model has an average precision score of 82.2 percent for the human class.

Training data
This model is trained on the search and rescue datasets provided by IPSAR and SARD.

Limitations
This model tends to maximize detection of humans and errs toward producing false positives in rocky areas.

Sample results
Here are a few results from the model.



Death Profiles by Leading Causes of Death

COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE

Coronavirus (Covid-19) Data in the United States

Statewide Death Profiles

COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE

Public Health Statistics - Selected underlying causes of death in Chicago,...

Death Profiles by County

Death Profiles by ZIP Code

High-risk human papillomavirus status and prognosis in invasive cervical...

Human Mortality Database

Death rate by age and sex in the U.S. 2021

INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

California Hospital Inpatient Mortality Rates and Quality Ratings

Elephant movement in a risky landscape - Dataset - B2FIND

Table_1_Structured data vs. unstructured data in machine learning prediction...

Data from: High temperatures and human pressures interact to influence...

Number of victims of the Holocaust and Nazi persecution 1933-1945, by...

Fire Ignition Probability Lightning Cause - Dataset - CKAN

Dataset: Faces extracted from Time Magazine 1923-2014

Human Detection (Drone Imagery)

Death Profiles by Leading Causes of Death