11 datasets found

US county-level mortality
kaggle.com
Updated Nov 17, 2019
Cite
IHME (2019). US county-level mortality [Dataset]. https://www.kaggle.com/IHME/us-countylevel-mortality/data
Dataset updated
Nov 17, 2019
Dataset provided by
Kaggle
Authors
IHME
Area covered
United States
Description
Context

IHME United States Mortality Rates by County 1980-2014: National - All. (Deaths per 100,000 population)

To quickly get started creating maps, like the one below, see the Quick Start R kernel.

[Image: Neoplasms map, https://storage.googleapis.com/montco-stats/kaggleNeoplasms.png]

How the Dataset was Created

This Dataset was created from the Excel Spreadsheet, which can be found in the download. Or, you can view the source here. If you take a look at the row for United States, for the column Mortality Rate, 1980*, you'll see the set of numbers 1.52 (1.44, 1.61). The numbers in parentheses are the 95% uncertainty interval. The 1.52 is an age-standardized mortality rate for both sexes combined (deaths per 100,000 population).

In this Dataset, 1.44 is placed in the column named Mortality Rate, 1980 (Min)* and 1.61 in the column named Mortality Rate, 1980 (Max)*. For information on how these age-standardized mortality rates were calculated, see the December 2016 JAMA article, which you can download for free.
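As an illustration, a spreadsheet cell in the form 1.52 (1.44, 1.61) can be split into the estimate and the two bounds of its uncertainty interval. The following is a minimal Python sketch; the function name split_rate and the regular expression are assumptions made for illustration and are not part of the dataset or its kernels.

```python
import re

# Matches text such as "1.52 (1.44, 1.61)": an estimate followed by
# "(lower, upper)". The pattern is an assumption about the cell format.
NUMBER = r"(-?\d+(?:\.\d+)?)"
RATE_PATTERN = re.compile(
    r"^\s*" + NUMBER + r"\s*\(\s*" + NUMBER + r"\s*,\s*" + NUMBER + r"\s*\)\s*$"
)


def split_rate(cell):
    """Split 'estimate (lower, upper)' into a tuple of three floats."""
    match = RATE_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized rate format: {cell!r}")
    estimate, lower, upper = (float(group) for group in match.groups())
    return estimate, lower, upper


print(split_rate("1.52 (1.44, 1.61)"))  # (1.52, 1.44, 1.61)
```

The second and third values correspond to what the description calls the (Min) and (Max) columns.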

[Image: Spreadsheet, https://storage.googleapis.com/montco-stats/kaggleUSMort.png]

Reference

JAMA Full Article

Video Describing this Study (short and worth viewing)

Data Resources

How Americans Die May Depend On Where They Live, by Anna Maria Barry-Jester (FiveThirtyEight)

Interactive Map from healthdata.org

IHME Data

Acknowledgements

This Dataset was provided by IHME

Institute for Health Metrics and Evaluation 2301 Fifth Ave., Suite 600, Seattle, WA 98121, USA Tel: +1.206.897.2800 Fax: +1.206.897.2899 © 2016 University of Washington
Vital Signs: Life Expectancy – Bay Area
data.bayareametro.gov
application/rdfxml +5
Updated Apr 7, 2017
Cite
State of California, Department of Health: Death Records (2017). Vital Signs: Life Expectancy – Bay Area [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Life-Expectancy-Bay-Area/emjt-svg9
Explore at:
Available download formats: xml, csv, tsv, application/rssxml, json, application/rdfxml
Dataset updated
Apr 7, 2017
Dataset authored and provided by
State of California, Department of Health: Death Records
Area covered
San Francisco Bay Area
Description
VITAL SIGNS INDICATOR Life Expectancy (EQ6)

FULL MEASURE NAME Life Expectancy

LAST UPDATED April 2017

DESCRIPTION Life expectancy refers to the average number of years a newborn is expected to live if mortality patterns remain the same. The measure reflects the mortality rate across a population for a point in time.

DATA SOURCE State of California, Department of Health: Death Records (1990-2013) No link

California Department of Finance: Population Estimates Annual Intercensal Population Estimates (1990-2010) Table P-2: County Population by Age (2010-2013) http://www.dof.ca.gov/Forecasting/Demographics/Estimates/

CONTACT INFORMATION vitalsigns.info@mtc.ca.gov

METHODOLOGY NOTES (across all datasets for this indicator) Life expectancy is commonly used as a measure of the health of a population. Life expectancy does not reflect how long any given individual is expected to live; rather, it is an artificial measure that captures an aspect of the mortality rates across a population. Vital Signs measures life expectancy at birth (as opposed to cohort life expectancy). A statistical model was used to estimate life expectancy for Bay Area counties and Zip codes based on current life tables, which require both age and mortality data. A life table is a table which shows, for each age, the survivorship of people from a certain population.

Current life tables were created using death records and population estimates by age. The California Department of Public Health provided death records based on California death certificate information. Records include age at death and residential Zip code. Single-year age population estimates at the regional and county level come from the California Department of Finance population estimates and projections for ages 0-100+. Population estimates for ages 100 and over are aggregated to a single age interval. Using these data, death rates within age groups for a given year are computed to form unabridged life tables (as opposed to abridged life tables). To calculate life expectancy, the probability of dying between the jth and (j+1)st birthday is assumed uniform after age 1. Special consideration is taken to account for infant mortality.

For the Zip code-level life expectancy calculation, it is assumed that postal Zip codes share the same boundaries as Zip Code Census Tabulation Areas (ZCTAs). More information on the relationship between Zip codes and ZCTAs can be found at https://www.census.gov/geo/reference/zctas.html. Zip code-level data uses three years of mortality data to make robust estimates due to small sample size. Year 2013 Zip code life expectancy estimates reflect death records from 2011 through 2013. 2013 is the last year with available mortality data. Death records for Zip codes with zero population (like those associated with P.O. Boxes) were assigned to the nearest Zip code with population.

Zip code population for the 2000 estimates comes from the Decennial Census. Zip code population for the 2013 estimates is from the American Community Survey (5-Year Average). The ACS provides Zip code population by age in five-year age intervals. Single-year age population estimates were calculated by distributing population within an age interval to single-year ages using the county distribution. Counties were assigned to Zip codes based on majority land-area.
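The life table calculation described above can be sketched roughly as follows. This is a generic textbook-style period life table in Python, not the Vital Signs model itself: the function name, the infant_fraction parameter (the assumed average fraction of the first year lived by infants who die), and the example death rates are all hypothetical. Deaths are spread uniformly within each single-year interval after age 0, and the last entry is treated as an open-ended interval (for example, ages 100 and over).

```python
def life_expectancy_at_birth(death_rates, infant_fraction=0.1):
    """Period life expectancy at birth from single-year central death rates.

    death_rates[x] is the death rate at age x; the last entry covers the
    open-ended final age interval. infant_fraction is an assumed value.
    """
    last_age = len(death_rates) - 1
    if death_rates[last_age] <= 0:
        raise ValueError("the open-ended interval needs a positive death rate")

    survivors = 1.0      # proportion of the cohort alive at exact age x
    person_years = 0.0   # total years lived by the cohort so far

    for age, rate in enumerate(death_rates):
        if age == last_age:
            # Open interval: all remaining survivors die; years lived = l / m.
            person_years += survivors / rate
            break
        # Average fraction of the year lived by those who die in the interval:
        # 0.5 under the uniform assumption, a smaller value for infants.
        fraction = infant_fraction if age == 0 else 0.5
        death_probability = rate / (1.0 + (1.0 - fraction) * rate)
        deaths = survivors * death_probability
        person_years += (survivors - deaths) + fraction * deaths
        survivors -= deaths

    return person_years


# Hypothetical rates for ages 0, 1, 2 and an open-ended 3+ interval.
print(life_expectancy_at_birth([0.006, 0.0004, 0.0003, 0.0125]))
```

Raising any age-specific rate lowers the result, which is the sense in which life expectancy summarizes mortality across a population at a point in time.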

Zip codes in the Bay Area vary in population from over 10,000 residents to fewer than 20 residents. Traditional life expectancy estimation (like that used for the regional- and county-level Vital Signs estimates) cannot be used because it is highly inaccurate for small populations and may result in over- or underestimation of life expectancy. To avoid inaccurate estimates, Zip codes with populations of less than 5,000 were aggregated with neighboring Zip codes until the merged areas had a population of more than 5,000. In this way, the original 305 Bay Area Zip codes were reduced to 218 Zip code areas for 2013 estimates. Next, a form of Bayesian random-effects analysis was used which established a prior distribution of the probability of death at each age using the regional distribution. This prior is used to shore up the life expectancy calculations where data were sparse.
COVID-19 Visualisation and Epidemic Analysis Data
kaggle.com
Updated Jan 24, 2021
Cite
Dylan Shen (2021). COVID-19 Visualisation and Epidemic Analysis Data [Dataset]. https://www.kaggle.com/dylansp/covid19-country-level-data-for-epidemic-model/code
Dataset updated
Jan 24, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dylan Shen
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
COVID-19 Dataset for Epidemic Model Development

I combined several data sources into an integrated dataset of country-level COVID-19 confirmed cases, recovered cases, and fatalities, which can be used to build epidemic models such as SIR and SIR with mortality. Population figures are included so that incidence and prevalence rates can be calculated. One of my applications based on this dataset is published at https://dylansp.shinyapps.io/COVID19_Visualization_Analysis_Tool/.

Content

My approach is to retrieve cumulative confirmed cases, fatalities, and recovered cases from 2020-01-22 onwards from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) COVID-19 dataset and merge them with the country code and population of each country. For the purpose of building epidemic models, I calculated daily new confirmed cases, recovered cases, and fatalities, together with remaining confirmed cases, which equal cumulative confirmed cases - cumulative recovered cases - cumulative fatalities. I have yet to find credible data sources on probable cases for the various countries. I'll add them once I find them.

Date: The date of the record.

Country_Region: The name of the country/region.

alpha-3_code: Country code that can be used for map visualization.

Population: The population of the given country/region.

Total_Confirmed_Cases: Cumulative confirmed cases.

Total_Fatalities: Cumulative fatalities.

Total_Recovered_Cases: Cumulative recovered cases.

New_Confirmed_Cases: Daily new confirmed cases.

New_Fatalities: Daily new fatalities.

New_Recovered_Cases: Daily new recovered cases.

Remaining_Confirmed_Cases: Remaining infected cases, equal to (cumulative confirmed cases - cumulative recovered cases - cumulative fatalities).
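The derived columns above can be reproduced from the three cumulative series. The following Python sketch is only an illustration: the function name derive_daily and the sample numbers are hypothetical, and it assumes the counts before the first record are zero, which may not match how the dataset author handled the first day.

```python
def derive_daily(total_confirmed, total_recovered, total_fatalities):
    """Build daily and remaining-case columns from cumulative series."""
    rows = []
    previous = (0, 0, 0)  # assumption: counts before the first record are zero
    for confirmed, recovered, fatalities in zip(
        total_confirmed, total_recovered, total_fatalities
    ):
        rows.append(
            {
                # Daily values are differences of consecutive cumulative totals.
                "New_Confirmed_Cases": confirmed - previous[0],
                "New_Recovered_Cases": recovered - previous[1],
                "New_Fatalities": fatalities - previous[2],
                # Remaining = confirmed - recovered - fatalities (cumulative).
                "Remaining_Confirmed_Cases": confirmed - recovered - fatalities,
            }
        )
        previous = (confirmed, recovered, fatalities)
    return rows


# Made-up cumulative counts for three consecutive days.
for row in derive_daily([10, 15, 21], [1, 3, 6], [0, 1, 1]):
    print(row)
```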

Acknowledgements

The data source of confirmed cases, recovered cases and deaths is JHU CSSE https://github.com/CSSEGISandData/COVID-19;

The data source of the country-level population mainly comes from https://storage.guidotti.dev/covid19/data/ and Worldometer (https://www.worldometers.info/population/).

Inspiration

Building up the country-level COVID-19 case track dashboard.

Insights regarding the incidence rate, prevalence rate, mortality and recovery rate of various countries.

Building up epidemic models for forecasting.
National Demographic and Health Survey 2022 - Philippines
microdata.worldbank.org
catalog.ihsn.org
+1more
Updated Jun 7, 2023
Cite
Philippine Statistics Authority (PSA) (2023). National Demographic and Health Survey 2022 - Philippines [Dataset]. https://microdata.worldbank.org/index.php/catalog/5846
Dataset updated
Jun 7, 2023
Dataset authored and provided by
Philippine Statistics Authority (PSA)
Time period covered
2022
Area covered
Philippines
Description
Abstract

The 2022 Philippines National Demographic and Health Survey (NDHS) was implemented by the Philippine Statistics Authority (PSA). Data collection took place from May 2 to June 22, 2022.

The primary objective of the 2022 NDHS is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the NDHS collected information on fertility, fertility preferences, family planning practices, childhood mortality, maternal and child health, nutrition, knowledge and attitudes regarding HIV/AIDS, violence against women, child discipline, early childhood development, and other health issues.

The information collected through the NDHS is intended to assist policymakers and program managers in designing and evaluating programs and strategies for improving the health of the country’s population. The 2022 NDHS also provides indicators anchored to the attainment of the Sustainable Development Goals (SDGs) and the new Philippine Development Plan for 2023 to 2028.

Geographic coverage

National coverage

Analysis unit

Household

Individual

Children age 0-5

Woman age 15-49

Universe

The survey covered all de jure household members (usual residents), all women aged 15-49, and all children aged 0-4 resident in the household.

Kind of data

Sample survey data [ssd]

Sampling procedure

The sampling scheme provides data representative of the country as a whole, for urban and rural areas separately, and for each of the country’s administrative regions. The sample selection methodology for the 2022 NDHS was based on a two-stage stratified sample design using the Master Sample Frame (MSF) designed and compiled by the PSA. The MSF was constructed based on the listing of households from the 2010 Census of Population and Housing and updated based on the listing of households from the 2015 Census of Population. The first stage involved a systematic selection of 1,247 primary sampling units (PSUs) distributed by province or HUC. A PSU can be a barangay, a portion of a large barangay, or two or more adjacent small barangays.

In the second stage, an equal take of either 22 or 29 sample housing units was selected from each sampled PSU using systematic random sampling. In situations where a housing unit contained one to three households, all households were interviewed. In the rare situation where a housing unit contained more than three households, no more than three households were interviewed. The survey interviewers were instructed to interview only the preselected housing units. No replacements and no changes of the preselected housing units were allowed in the implementing stage in order to prevent bias. Survey weights were calculated, added to the data file, and applied so that weighted results are representative estimates of indicators at the regional and national levels.

All women age 15–49 who were either usual residents of the selected households or visitors who stayed in the households the night before the survey were eligible to be interviewed. Among women eligible for an individual interview, one woman per household was selected for a module on women’s safety.

For further details on sample design, see APPENDIX A of the final report.

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Two questionnaires were used for the 2022 NDHS: the Household Questionnaire and the Woman’s Questionnaire. The questionnaires, based on The DHS Program’s model questionnaires, were adapted to reflect the population and health issues relevant to the Philippines. Input was solicited from various stakeholders representing government agencies, academe, and international agencies. The survey protocol was reviewed by the ICF Institutional Review Board.

After all questionnaires were finalized in English, they were translated into six major languages: Tagalog, Cebuano, Ilocano, Bikol, Hiligaynon, and Waray. The Household and Woman’s Questionnaires were programmed into tablet computers to allow for computer-assisted personal interviewing (CAPI) for data collection purposes, with the capability to choose any of the languages for each questionnaire.

Cleaning operations

Processing the 2022 NDHS data began almost as soon as fieldwork started, and data security procedures were in place in accordance with confidentiality of information as provided by Philippine laws. As data collection was completed in each PSU or cluster, all electronic data files were transferred securely via SyncCloud to a server maintained by the PSA Central Office in Quezon City. These data files were registered and checked for inconsistencies, incompleteness, and outliers. The field teams were alerted to any inconsistencies and errors while still in the area of assignment. Timely generation of field check tables allowed for effective monitoring of fieldwork, including tracking questionnaire completion rates. Only the field teams, project managers, and NDHS supervisors in the provincial, regional, and central offices were given access to the CAPI system and the SyncCloud server.

A team of secondary editors in the PSA Central Office carried out secondary editing, which involved resolving inconsistencies and recoding “other” responses; the former was conducted during data collection, and the latter was conducted following the completion of the fieldwork. Data editing was performed using the CSPro software package. The secondary editing of the data was completed in August 2022. The final cleaning of the data set was carried out by data processing specialists from The DHS Program in September 2022.

Response rate

A total of 35,470 households were selected for the 2022 NDHS sample, of which 30,621 were found to be occupied. Of the occupied households, 30,372 were successfully interviewed, yielding a response rate of 99%. In the interviewed households, 28,379 women age 15–49 were identified as eligible for individual interviews. Interviews were completed with 27,821 women, yielding a response rate of 98%.
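The two response rates follow directly from the counts quoted above. A short Python check (the helper name response_rate is only illustrative):

```python
def response_rate(completed, eligible):
    """Completed interviews as a percentage of eligible units."""
    return 100.0 * completed / eligible


# Households: interviewed out of occupied.
print(round(response_rate(30372, 30621)))  # 99
# Women age 15-49: completed interviews out of eligible women.
print(round(response_rate(27821, 28379)))  # 98
```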

Sampling error estimates

The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and in data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2022 Philippines National Demographic and Health Survey (2022 NDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2022 NDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.
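The plus or minus two standard errors rule can be written out as a small Python helper. The function name and the example figures (an estimate of 50.0 percent with a standard error of 1.5 percentage points) are hypothetical and do not come from the survey report.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical indicator: 50.0 percent with a standard error of 1.5 points.
print(confidence_interval(50.0, 1.5))  # (47.0, 53.0)
```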

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2022 NDHS sample was the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulas. Sampling errors are computed in SAS using programs developed by ICF. These programs use the Taylor linearization method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
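To give a feel for the jackknife repeated replication idea, here is a minimal Python sketch of a delete-one-cluster jackknife for a ratio estimate. It is not ICF's SAS program: it ignores stratification, assumes any survey weights are already folded into the per-cluster sums, and all names and sample values are hypothetical. Each replicate drops one cluster, recomputes the ratio, and the spread of the resulting pseudo-values gives the variance.

```python
from math import sqrt


def ratio(clusters):
    """Ratio of summed numerators to summed denominators across clusters."""
    numerator = sum(y for y, _ in clusters)
    denominator = sum(x for _, x in clusters)
    return numerator / denominator


def jackknife_standard_error(clusters):
    """Delete-one-cluster jackknife standard error of a ratio.

    clusters is a list of (numerator_sum, denominator_sum) pairs,
    one pair per cluster.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    full_estimate = ratio(clusters)
    pseudo_values = []
    for i in range(k):
        reduced = clusters[:i] + clusters[i + 1:]  # drop cluster i
        pseudo_values.append(k * full_estimate - (k - 1) * ratio(reduced))
    variance = sum((p - full_estimate) ** 2 for p in pseudo_values) / (k * (k - 1))
    return sqrt(variance)


# Made-up cluster totals: (events, exposure) per cluster.
sample = [(3, 40), (5, 55), (2, 35), (6, 60)]
print(ratio(sample), jackknife_standard_error(sample))
```

When every cluster has a denominator of 1, the ratio is a plain mean and the result reduces to the familiar standard error of the mean.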

A more detailed description of estimates of sampling errors is presented in APPENDIX B of the survey report.

Data appraisal

Data Quality Tables

Household age distribution

Age distribution of eligible and interviewed women

Age displacement at age 14/15

Age displacement at age 49/50

Pregnancy outcomes by years preceding the survey

Completeness of reporting

Observation of handwashing facility

School attendance by single year of age

Vaccination cards photographed

Population pyramid

Five-year mortality rates

See details of the data quality tables in Appendix C of the final report.
5 Year Female Cancer Incidence MSSA
usc-geohealth-hub-uscssi.hub.arcgis.com
Updated Nov 10, 2021
Cite
Spatial Sciences Institute (2021). 5 Year Female Cancer Incidence MSSA [Dataset]. https://usc-geohealth-hub-uscssi.hub.arcgis.com/datasets/5-year-female-cancer-incidence-mssa
Dataset updated
Nov 10, 2021
Dataset authored and provided by
Spatial Sciences Institute
Area covered

Description
Medical Service Study Areas (MSSAs)

As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.

Age-Adjusted Incidence Rate (AAIR)

Age-adjustment is a statistical method that allows comparisons of incidence rates to be made between populations with different age distributions. This is important since the incidence of most cancers increases with age. An age-adjusted cancer incidence (or death) rate is defined as the number of new cancers (or deaths) per 100,000 population that would occur in a certain period of time if that population had a 'standard' age distribution. In the California Health Maps, incidence rates are age-adjusted using the U.S. 2000 Standard Population.

Cancer incidence rates

Incidence rates were calculated using case counts from the California Cancer Registry. Population data from 2010 Census and SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators. Yearly SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators for 5-year incidence rates (2013-2017).

According to California Department of Public Health guidelines, cancer incidence rates cannot be reported if based on <15 cancer cases and/or a population <10,000 to ensure confidentiality and stable statistical rates.

Spatial extent: California
Spatial Unit: MSSA
Created: n/a
Updated: n/a
Source: California Health Maps
Contact Email: gbacr@ucsf.edu
Source Link: https://www.californiahealthmaps.org/?areatype=mssa&address=&sex=Both&site=AllSite&race=&year=05yr&overlays=none&choropleth=Obesity
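Direct age standardization, as described above, weights each age group's rate by that group's share of a standard population. The Python sketch below illustrates the idea together with the reporting thresholds quoted in the description; the function names, age groups, counts, and standard-population weights are hypothetical placeholders, not the actual U.S. 2000 Standard Population or California Cancer Registry figures.

```python
def age_adjusted_rate(cases, population, standard_population, per=100_000):
    """Directly age-standardized rate per `per` population.

    cases[i] and population[i] are the local counts for age group i;
    standard_population[i] is the standard population for the same group.
    """
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs must cover the same age groups")
    total_standard = sum(standard_population)
    rate = 0.0
    for c, p, s in zip(cases, population, standard_population):
        # Age-specific rate weighted by the standard population share.
        rate += (c / p) * (s / total_standard)
    return rate * per


def reportable(total_cases, total_population):
    """Thresholds quoted in the text: at least 15 cases and 10,000 people."""
    return total_cases >= 15 and total_population >= 10_000


# Hypothetical area with three age groups and placeholder standard weights.
cases = [4, 12, 30]
population = [9_000, 7_000, 4_000]
standard = [40, 35, 25]
if reportable(sum(cases), sum(population)):
    print(age_adjusted_rate(cases, population, standard))
```

If the standard population has the same age structure as the local population, the adjusted rate equals the crude rate, which is a useful sanity check.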
5 year Colorectal Cancer Incidence MSSA
usc-geohealth-hub-uscssi.hub.arcgis.com
Updated Nov 12, 2021
+ more versions
Cite
Spatial Sciences Institute (2021). 5 year Colorectal Cancer Incidence MSSA [Dataset]. https://usc-geohealth-hub-uscssi.hub.arcgis.com/datasets/5-year-colorectal-cancer-incidence-mssa
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Spatial Sciences Institute
Area covered

Description
Medical Service Study Areas (MSSAs)

As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.

Age-Adjusted Incidence Rate (AAIR)

Age-adjustment is a statistical method that allows comparisons of incidence rates to be made between populations with different age distributions. This is important since the incidence of most cancers increases with age. An age-adjusted cancer incidence (or death) rate is defined as the number of new cancers (or deaths) per 100,000 population that would occur in a certain period of time if that population had a 'standard' age distribution. In the California Health Maps, incidence rates are age-adjusted using the U.S. 2000 Standard Population.

Cancer incidence rates

Incidence rates were calculated using case counts from the California Cancer Registry. Population data from 2010 Census and SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators. Yearly SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators for 5-year incidence rates (2013-2017).

According to California Department of Public Health guidelines, cancer incidence rates cannot be reported if based on <15 cancer cases and/or a population <10,000 to ensure confidentiality and stable statistical rates.

Spatial extent: California
Spatial Unit: MSSA
Created: n/a
Updated: n/a
Source: California Health Maps
Contact Email: gbacr@ucsf.edu
Source Link: https://www.californiahealthmaps.org/?areatype=mssa&address=&sex=Both&site=AllSite&race=&year=05yr&overlays=none&choropleth=Obesity
Data from: Spatiotemporal incidence of Zika and associated environmental...
eprints.soton.ac.uk
data.niaid.nih.gov
+1more
Updated Jan 1, 2019
Cite
Siraj, Amir S.; Rodriguez-Barraquer, I.; Barker, Christopher M.; Tejedor-Garavito, Natalia; Harding, Dennis; Lorton, Christopher; Lukacevic, Dejan; Oates, Gene; Espana, Guido; Kraemer, Moritz U. G.; Manore, Carrie; Johansson, Michael A.; Tatem, Andrew J.; Reiner, Robert C.; Perkins, T. Alex (2019). Data from: Spatiotemporal incidence of Zika and associated environmental drivers for the 2015-2016 epidemic in Colombia [Dataset]. http://doi.org/10.5061/dryad.83nj1
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.83nj1
Dataset updated
Jan 1, 2019
Dataset provided by
DRYAD
Authors
Siraj, Amir S.; Rodriguez-Barraquer, I.; Barker, Christopher M.; Tejedor-Garavito, Natalia; Harding, Dennis; Lorton, Christopher; Lukacevic, Dejan; Oates, Gene; Espana, Guido; Kraemer, Moritz U. G.; Manore, Carrie; Johansson, Michael A.; Tatem, Andrew J.; Reiner, Robert C.; Perkins, T. Alex
Area covered
Colombia
Description
Despite a long history of mosquito-borne virus epidemics in the Americas, the impact of the Zika virus (ZIKV) epidemic of 2015-2016 was unexpected. The need for scientifically informed decision-making is driving research to understand the emergence and spread of ZIKV. To support that research, we assembled a data set of key covariates for modeling ZIKV transmission dynamics in Colombia, where ZIKV transmission was widespread and the government made incidence data publically available. On a weekly basis between January 1, 2014 and October 1, 2016 at three administrative levels, we collated spatiotemporal Zika incidence data, nine environmental variables, and demographic data into a single downloadable database. These new datasets and those we identified, processed, and assembled at comparable spatial and temporal resolutions will save future researchers considerable time and effort in performing these data processing steps, enabling them to focus instead on extracting epidemiological insights from this important data set. Similar approaches could prove useful for filling data gaps to enable epidemiological analyses of future disease emergence events.,Weekly mean temperature at 2.5 arc-minutesRaster brick of weekly mean temperature calculated as the average of the daily mean temperature (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. For example, the layer named tmean_wk151025 has mean temperature for the week that starts on October 25, 2015.mean_temperature.zipWeekly minimum temperature at 2.5 arc-minutesRaster brick of weekly minimum temperature calculated as the average of the daily minimum temperature (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. 
For example, the layer named tmin_wk151025 has minimum temperature for the week that starts on October 25, 2015.min_temperature.zipWeekly maximum temperature at 2.5 arc-minutesRaster brick of weekly maximum temperature calculated as the average of the daily maximum temperature (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. For example, the layer named tmax_wk151025 has maximum temperature for the week that starts on October 25, 2015.max_temperature.zipWeekly relative humidity at 2.5 arc-minutesRaster brick of weekly average relative humidity calculated as the average of the daily mean relative humidity (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. For example, the layer named rh_wk151025 has average relative humidity for the week that starts on October 25, 2015.rel_humidity.zipWeekly MODIS Terra NDVI at 2.5 arc-minutesRaster brick of weekly average NDVI from NASA's Terra satellite and MODIS sensor (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. For example, the layer named ndvi_modis_terra_wk151025 has the NDVI values for the week that starts on October 25, 2015.ndvi_modis_terra.zipWeekly MODIS Aqua NDVI at 2.5 arc-minutesRaster brick of weekly average NDVI from NASA's Aqua satellite and MODIS sensor (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. 
For example, the layer named ndvi_modis_aqua_wk151025 has average NDVI for the week starting on October 25, 2015.
ndvi_modis_aqua.zip

Weekly precipitation at 2.5 arc-minutes
Raster brick of weekly total precipitation calculated as the total of the daily precipitation (a total of 143 weeks between Jan 5, 2014 and Oct 1, 2016), in GRI format, at a resolution of 2.5 arc-minutes. Layer names indicate the date each week starts on. For example, the layer named precip_wk151025 has precipitation for the week that starts on October 25, 2015.
precipitation.zip

Aedes aegypti population at 2.5 arc-minutes
Raster brick of the ratio of Aedes aegypti population to human population at each week of the year (a total of 52 weeks), in GRI format, at a resolution of 2.5 arc-minutes.
aegypti_population.zip

Gridded population in 2015 at 3 arc-seconds
Colombia population in 2015. The file is in BIL format, at a resolution of 3 arc-seconds.
wpop_ppp_v2b_col_2015_0_05m.zip

Gridded births in 2015 at 3 arc-seconds
Colombia births in 2015. The file is in BIL format, at a resolution of 3 arc-seconds.
wpop_births_col_2015_0_05m.zip

Gridded urban population in 2015 at 15 arc-seconds
Colombia urban population in 2015, obtained by multiplying the WorldPop gridded population by an urban extent binary raster file. The file is in BIL format, at a resolution of 15 arc-seconds.
urban_pop_col_0_25m.zip

Gridded travel time at 30 arc-seconds
Colombia travel time (in minutes) to the nearest city of 50,000 or more population in year 2000. The file is in BIL format, at a resolution of 30 arc-seconds.
travel_time_50k_col_0_5m.zip

Gridded gross cell product at 2.5 arc-minutes
Per capita gross cell product in 2005 $US for Colombia, cropped to match all other raster outputs. The file is in BIL format, at a resolution of 2.5 arc-minutes (resampled from the original 60 arc-minutes raster file).
gecon_col_pcppp_2005_2_5m.zip

Weekly Zika cases
Weekly Zika cases at municipality, department and national levels (in .csv format).
weekly_zika_cases.zip

Weekly covariates aggregated at municipality level
Time series of weekly covariates aggregated at municipality level. This .zip file contains eight tables (in .csv format) of time series for aegypti population, maximum temperature, mean temperature, minimum temperature, NDVI MODIS Aqua, NDVI MODIS Terra, precipitation and relative humidity.
spatial_aggregates_municip.zip

Weekly covariates aggregated at department level
Time series of weekly covariates aggregated at department level. This .zip file contains eight tables (in .csv format) of time series for aegypti population, maximum temperature, mean temperature, minimum temperature, NDVI MODIS Aqua, NDVI MODIS Terra, precipitation and relative humidity.
spatial_aggregates_dept.zip

Weekly covariates aggregated at national level
Time series of weekly covariates aggregated at national level. This .zip file contains eight tables (in .csv format) of time series for aegypti population, maximum temperature, mean temperature, minimum temperature, NDVI MODIS Aqua, NDVI MODIS Terra, precipitation and relative humidity.
spatial_aggregates_national.zip

Weekly weighted covariates aggregated at municipality level
Time series of weekly covariates, weighted by population, aggregated at municipality level. This .zip file contains eight tables (in .csv format) of time series for aegypti population, maximum temperature, mean temperature, minimum temperature, NDVI MODIS Aqua, NDVI MODIS Terra, precipitation and relative humidity.
weighted_spatial_aggregates_municip.zip

Weekly weighted covariates aggregated at department level
Time series of weekly covariates, weighted by population, aggregated at department level. This .zip file contains eight tables (in .csv format) of time series for aegypti population, maximum temperature, mean temperature, minimum temperature, NDVI MODIS Aqua, NDVI MODIS Terra, precipitation and relative humidity.
weighted_spatial_aggregates_dept.zip

Weekly weighted covariates aggregated at national level
Time series of weekly covariates, weighted by population, aggregated at national level. This .zip file contains eight tables (in .csv format) of time series for aegypti population, maximum temperature, mean temperature, minimum temperature, NDVI MODIS Aqua, NDVI MODIS Terra, precipitation and relative humidity.
weighted_spatial_aggregates_national.zip

Fixed time covariates aggregated at all levels
Fixed time covariates aggregated at municipality, department and national levels (in .csv format). This .zip file contains three tables (in .csv format), one for each level, with data on population, births, urban population, mean gross cell product and mean travel time.
spatial_aggregate_non_timeseries.zip

Spatial time series movies of ZIKV cases and environmental drivers
Spatial time series movies of ZIKV cases and environmental drivers at weekly time step and municipality level (in .MP4 format). This file includes nine movies: weekly number of cases, cumulative number of cases, average NDVI (Aqua), average NDVI (Terra), total precipitation, relative humidity, minimum temperature, mean temperature and maximum temperature.
spatial_timeseries_movies.zip
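The weekly raster layers listed above are named after the date their week starts on (for example, precip_wk151025 for the week starting October 25, 2015). A minimal Python sketch of decoding such a name follows. It assumes every layer name ends in `_wk` plus a six-digit YYMMDD stamp, which is inferred from the two examples given; the function name `layer_week` is hypothetical and not part of the dataset.

```python
import re
from datetime import datetime, timedelta

# Assumed pattern (inferred from the examples in the description):
# a variable prefix, then "_wk", then a six-digit YYMMDD date.
_LAYER_RE = re.compile(r"^(?P<variable>.+)_wk(?P<stamp>\d{6})$")


def layer_week(layer_name):
    """Return (variable, first day, last day) for a weekly layer name."""
    match = _LAYER_RE.match(layer_name)
    if match is None:
        raise ValueError(f"unrecognised layer name: {layer_name!r}")
    # %y maps two-digit years 00-68 to 2000-2068, which covers 2014-2016.
    start = datetime.strptime(match.group("stamp"), "%y%m%d").date()
    # A week covers its start day plus the six following days.
    return match.group("variable"), start, start + timedelta(days=6)
```

Calling `layer_week("precip_wk151025")` yields the variable name `precip` together with the first and last day of that week, which can then be joined against the weekly case tables.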
Data_Sheet_1_Excess multi-cause mortality linked to influenza virus...
figshare.com
frontiersin.figshare.com
docx
Updated Jun 3, 2024
Tian-Lu Yin; Ning Chen; Jin-Yao Zhang; Shuang Yang; Wei-Min Li; Xiao-Huan Gao; Hao-Lin Shi; Hong-Pu Hu (2024). Data_Sheet_1_Excess multi-cause mortality linked to influenza virus infection in China, 2012–2021: a population-based study.docx [Dataset]. http://doi.org/10.3389/fpubh.2024.1399672.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2024.1399672.s001
Dataset updated
Jun 3, 2024
Dataset provided by
Frontiers
Authors
Tian-Lu Yin; Ning Chen; Jin-Yao Zhang; Shuang Yang; Wei-Min Li; Xiao-Huan Gao; Hao-Lin Shi; Hong-Pu Hu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objectives

The aim of this study is to estimate the excess mortality burden of influenza virus infection in China from 2012 to 2021, with a concurrent analysis of its associated disease manifestations.

Methods

Laboratory surveillance data on influenza, relevant population demographics, and mortality records, including cause of death data in China, spanning the years 2012 to 2021, were incorporated into a comprehensive analysis. A negative binomial regression model was utilized to calculate the excess mortality rate associated with influenza, taking into consideration factors such as year, subtype, and cause of death.

Results

There was no evidence to indicate a correlation between malignant neoplasms and any subtype of influenza, despite the examination of the effect of influenza on the mortality burden of eight diseases. A total of 327,520 samples testing positive for influenza virus were isolated between 2012 and 2021, with a significant decrease in the positivity rate observed during the periods of 2012–2013 and 2019–2020. China experienced an average of 201,721.78 influenza-associated excess deaths per year and an average annual excess mortality rate of 14.53 per 100,000 people during the research period. Among the causes of mortality that were examined, respiratory and circulatory diseases (R&C) accounted for the most significant proportion (58.50%). Fatalities attributed to respiratory and circulatory diseases exhibited discernible temporal patterns, whereas deaths attributable to other causes were dispersed over the course of the year.

Conclusion

Theoretically, the contribution of these disease types to excess influenza-related fatalities can serve as a foundation for early warning and targeted influenza surveillance. Additionally, it is possible to assess the costs of prevention and control measures and the public health repercussions of epidemics with greater precision.
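The two headline figures in the results (average annual excess deaths and the excess mortality rate per 100,000) are linked by simple arithmetic. The Python sketch below shows that relationship only; it is not the study's negative binomial model. The population it backs out is an inference from the two reported numbers, not a value stated in the abstract, and both function names are hypothetical.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 population."""
    return deaths / population * 100_000


def implied_population(deaths, rate):
    """Population consistent with a death count and a rate per 100,000."""
    return deaths / rate * 100_000


# Figures reported in the abstract above.
excess_deaths = 201_721.78
excess_rate = 14.53

# Inferred, not reported: roughly 1.39 billion people.
population = implied_population(excess_deaths, excess_rate)
```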
5 year Female Kidney Cancer Incidence MSSA
usc-geohealth-hub-uscssi.hub.arcgis.com
Updated Nov 12, 2021
+ more versions
Spatial Sciences Institute (2021). 5 year Female Kidney Cancer Incidence MSSA [Dataset]. https://usc-geohealth-hub-uscssi.hub.arcgis.com/datasets/USCSSI::5-year-female-kidney-cancer-incidence-mssa
Dataset updated
Nov 12, 2021
Dataset authored and provided by
Spatial Sciences Institute
Area covered

Description
Medical Service Study Areas (MSSAs)

As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.

Age-Adjusted Incidence Rate (AAIR)

Age-adjustment is a statistical method that allows comparisons of incidence rates to be made between populations with different age distributions. This is important since the incidence of most cancers increases with age. An age-adjusted cancer incidence (or death) rate is defined as the number of new cancers (or deaths) per 100,000 population that would occur in a certain period of time if that population had a 'standard' age distribution. In the California Health Maps, incidence rates are age-adjusted using the U.S. 2000 Standard Population.

Cancer incidence rates

Incidence rates were calculated using case counts from the California Cancer Registry. Population data from 2010 Census and SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators. Yearly SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators for 5-year incidence rates (2013-2017).

According to California Department of Public Health guidelines, cancer incidence rates cannot be reported if based on <15 cancer cases and/or a population <10,000 to ensure confidentiality and stable statistical rates.

Spatial extent: California
Spatial Unit: MSSA
Created: n/a
Updated: n/a
Source: California Health Maps
Contact Email: gbacr@ucsf.edu
Source Link: https://www.californiahealthmaps.org/?areatype=mssa&address=&sex=Both&site=AllSite&race=&year=05yr&overlays=none&choropleth=Obesity
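The age-adjustment and suppression rules described above can be sketched in Python as follows. This is a minimal illustration of direct standardization under stated assumptions, not the California Health Maps implementation: the function names are hypothetical, and any weights passed in would need to come from the U.S. 2000 Standard Population, whose values are not reproduced here.

```python
def age_adjusted_rate(cases, populations, standard_weights, per=100_000):
    """Directly standardized rate.

    cases, populations -- counts per age group for the area of interest
    standard_weights   -- standard population per age group (any scale;
                          normalized to sum to 1 inside the function)
    """
    if not (len(cases) == len(populations) == len(standard_weights)):
        raise ValueError("all inputs need one entry per age group")
    total_weight = sum(standard_weights)
    rate = 0.0
    for count, population, weight in zip(cases, populations, standard_weights):
        # Age-specific rate, weighted by that group's share of the standard.
        rate += (count / population) * (weight / total_weight)
    return rate * per


def is_reportable(total_cases, total_population,
                  min_cases=15, min_population=10_000):
    """Suppression rule quoted above: no rate if <15 cases or <10,000 people."""
    return total_cases >= min_cases and total_population >= min_population
```

With two made-up age groups of 10,000 people each, 10 and 20 cases, and equal standard weights, the age-specific rates are 100 and 200 per 100,000 and the adjusted rate is their weighted average, 150 per 100,000.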
Table 1_Trends in cervical cancer incidence and mortality in the United...
frontiersin.figshare.com
docx
Updated Apr 30, 2025
Xianying Cheng; Ping Wang; Li Cheng; Feng Zhao; Jiangang Liu (2025). Table 1_Trends in cervical cancer incidence and mortality in the United States, 1975–2018: a population-based study.docx [Dataset]. http://doi.org/10.3389/fmed.2025.1579446.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fmed.2025.1579446.s001
Dataset updated
Apr 30, 2025
Dataset provided by
Frontiers
Authors
Xianying Cheng; Ping Wang; Li Cheng; Feng Zhao; Jiangang Liu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

Cervical cancer incidence and mortality rates in the United States have substantially declined over recent decades, primarily driven by reductions in squamous cell carcinoma cases. However, the trend in recent years remains unclear. This study aimed to explore the trends in cervical cancer incidence and mortality, stratified by demographic and tumor characteristics from 1975 to 2018.

Methods

The age-adjusted incidence, incidence-based mortality, and relative survival of cervical cancer were calculated using the Surveillance, Epidemiology, and End Results (SEER)-9 database. Trend analyses with annual percent change (APC) and average annual percent change (AAPC) calculations were performed using Joinpoint Regression Software (Version 4.9.1.0, National Cancer Institute).

Results

During 1975–2018, 49,658 cervical cancer cases were diagnosed, with 17,099 recorded deaths occurring between 1995 and 2018. Squamous cell carcinoma was the most common histological type, with 34,169 cases and 11,859 deaths. Over the study period, the cervical cancer incidence rate decreased by an average of 1.9% (95% CI: −2.3% to −1.6%) per year, although the decline slowed in recent years (APC −0.5% [95% CI: −1.1% to 0.1%] in 2006–2018). Squamous cell carcinoma incidence trends closely paralleled overall cervical cancer patterns, but the incidence of squamous cell carcinoma in the distant stage increased significantly (1.1% [95% CI: 0.4% to 1.8%] in 1990–2018). From 1995 to 2018, the overall cervical cancer mortality rate decreased by 1.0% (95% CI: −1.2% to −0.8%) per year. For distant-stage squamous cell carcinoma, however, the mortality rate increased by 1.2% (95% CI: 0.3% to 2.1%) per year.

Conclusion

For cervical cancer cases diagnosed in the United States from 1975 to 2018, the overall incidence and mortality rates decreased significantly. However, there was an increase in the incidence and mortality of advanced-stage squamous cell carcinoma. These epidemiological patterns offer critical insights for refining cervical cancer screening protocols and developing targeted interventions for advanced-stage cases.
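The annual percent change (APC) quoted in the results is conventionally obtained from a log-linear trend: fit ln(rate) = a + b·year, then APC = 100·(e^b − 1). The Python sketch below fits a single segment by ordinary least squares. It is an illustration of that formula only; it does not search for joinpoints or reproduce the Joinpoint Regression Software, and the function name is hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """APC from an ordinary least-squares fit of ln(rate) on year."""
    n = len(years)
    if n != len(rates) or n < 2:
        raise ValueError("need at least two (year, rate) pairs")
    logs = [math.log(rate) for rate in rates]
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxx = sum((year - mean_year) ** 2 for year in years)
    sxy = sum((year - mean_year) * (log - mean_log)
              for year, log in zip(years, logs))
    slope = sxy / sxx  # change in ln(rate) per year
    return (math.exp(slope) - 1) * 100
```

A synthetic series that falls by exactly 1.9% each year returns an APC of −1.9, matching the definition.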
AIDS Impact Survey III 2008 - Botswana
catalog.ihsn.org
dev.ihsn.org
+1more
Updated Mar 29, 2019
Central Statistics Office (CSO) (2019). AIDS Impact Survey III 2008 - Botswana [Dataset]. https://catalog.ihsn.org/index.php/catalog/2045
Dataset updated
Mar 29, 2019
Dataset provided by
National AIDS Coordinating Agency (NACA)
Central Statistics Office (CSO)
Time period covered
2008
Area covered
Botswana
Description
Abstract

The primary objective of the 2008 BAIS III was to update current information on the behavioral patterns of the populations aged 10-64 years and the HIV prevalence and incidence rates among those aged 18 months and above at national, district and sub-district level. This information will be used for continuous strategic prevention and national HIV program planning and future HIV and AIDS research.

Specifically, the survey was intended to provide: i. current national HIV prevalence and incidence estimates among the population aged 18 months and above; ii. indicative trends in sexual and preventive behavior among the population aged 10-64 years; iii. a comparison of HIV rate, behavior, knowledge, attitude, and cultural factors that are associated with the epidemic with estimates derived from previous surveys; and iv. information on demographic, socio-economic, housing and household members' conditions that are associated with, or are determinants and consequences of, the pandemic.

A related objective is to produce survey results in a timely manner and ensure that the data are disseminated to a wide audience of potential users in government and non-governmental organizations within and outside Botswana, as part of a broader effort to strengthen strategies aimed at combating the disease.

Geographic coverage

National

Kind of data

Sample survey data [ssd]

Sampling procedure

The 2008 Botswana AIDS Impact Survey III (BAIS III) was designed to provide a comparison (trend) of HIV prevalence rate, behavior, knowledge, attitude, and other factors that are associated with the epidemic with estimates derived from the previous survey, the 2004 BAIS II.

Sampling Frame: For BAIS-II the sampling frame was based on the 2001 Population and Housing Census. This comprised the list of all Enumeration Areas (EAs) together with their number of households. In the 2001 Census the EAs were framed to be of manageable size (in terms of dwellings/households), so the primary sampling units (PSUs) were EAs.

Stratification: Stratification was undertaken such that all districts and major urban centers became their own strata. To increase precision, consideration was also given to grouping EAs according to ecological zones in rural districts and according to income categories in cities/towns. Geographical stratification along ecological zones and income categories was expected to improve the accuracy of survey data, given that homogeneity of the variables within each stratum was relatively high.

Sample Design: A stratified two-stage probability sample design was used for the selection of the sample.

The first stage was the selection of EAs as Primary Sampling Units (PSUs), selected with probability proportional to measures of size (PPS), where measures of size (MOS) were the number of households in the EA as defined by the 2001 Population and Housing Census. In all, 459 EAs were selected with probability proportional to size.

At the second stage of sampling, the households were systematically selected from a fresh list of occupied households prepared at the beginning of the survey's fieldwork (i.e. listing of households for the selected EAs). Overall, 8,275 households were drawn systematically.
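The two selection stages described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the text says EAs were drawn with probability proportional to size but does not say how, so systematic PPS is used here as one common way to do it. Both function names and any inputs are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Stage 1: pick n unit indices with probability proportional to size.

    A unit larger than the sampling interval can be picked more than once.
    """
    interval = sum(sizes) / n
    start = rng.uniform(0, interval)
    targets = [start + i * interval for i in range(n)]
    chosen, cumulative, t = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Every target that falls inside this unit's size range selects it.
        while t < n and targets[t] < cumulative:
            chosen.append(index)
            t += 1
    # Guard against floating-point rounding leaving a target unassigned.
    chosen.extend([len(sizes) - 1] * (n - t))
    return chosen


def systematic_sample(items, n, rng):
    """Stage 2: pick n items at a fixed interval from a random start."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    interval = len(items) / n
    start = rng.uniform(0, interval)
    last = len(items) - 1
    return [items[min(int(start + i * interval), last)] for i in range(n)]
```

In use, `sizes` would hold the census household count of each EA in a stratum, and `items` the fresh household listing of one selected EA.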

Note: See detailed sampling information in the BAIS-III final report.

Mode of data collection

Face-to-face [f2f]

Research instrument

The questionnaires are the primary recording documents of the survey. Professionals and other stakeholders (including some users) were invited to take part in developing the questionnaires. The final version of the questionnaires was settled on the basis of the experience gained from the pretest conducted using the draft questionnaires.

The 2008 BAIS III has three major components, namely the Household Questionnaire, the Individual Questionnaire and the Blood Collection Form.

The Household Questionnaire was used to list all members of the selected households and their demographic characteristics such as age, sex, orphanhood (0-17 years) and economic activity.

The Individual Questionnaire was designed to capture information regarding demographic characteristics, care and support, marriage and cohabiting partnerships, alcohol consumption and drug use, sexual history and behavior, male circumcision and sexually transmitted diseases, knowledge about HIV/AIDS and level of interventions, attitudes towards people with HIV/AIDS, childbearing and antenatal care as well as availability of social and medical services in response to the pandemic.

The third component is on the scale of the pandemic. It was designed to collect blood samples from members of households aged 18 months and over for testing and estimation of HIV prevalence and derivation of incidence measures.

Cleaning operations

Before data entry was carried out, the questionnaires were edited to check that all the relevant questions had been answered and coded according to the codes designed for the study. Editing and coding started in July 2007 and finished in February 2008. Data entry was carried out under the supervision of one programmer/supervisor. Consistency checks on the data set were performed as per the computer edit specifications designed by the subject matter specialists.

Sampling error estimates

The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2008 BAIS III to minimize these types of errors, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2008 BAIS III is only one of many samples that could have been selected from the same population, using the same sample design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

A sampling error is usually measured in terms of standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.

The standard error can also be used to compute the design effect (DEFT) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1 indicates that the sample design is as efficient as a simple random sample; a value greater than 1 indicates an increase in the sampling error due to the use of a more complex and less statistically efficient design.
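The confidence interval and DEFT definitions above amount to a few lines of arithmetic, sketched here in Python. The simple-random-sample standard error of a proportion, sqrt(p(1 − p)/n), is a textbook formula brought in for illustration and is not taken from the survey report. The function names are hypothetical.

```python
import math


def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


def srs_standard_error(proportion, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(proportion * (1 - proportion) / n)


def deft(design_standard_error, srs_se):
    """Design effect as defined above: design SE over simple-random SE."""
    return design_standard_error / srs_se
```

With made-up numbers, a proportion of 0.20 from 1,600 respondents has a simple-random standard error of 0.01. If the complex design yields 0.015, the DEFT is 1.5 and the interval is 0.20 ± 0.03.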

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulae for calculating standard errors. However, the BFHS sample is the result of a stratified two-stage design, which is considered a complex design; hence special methods and software are required to take into account the complexity of the design.

WesVar 4.3 statistical software (supported by WESTAT) was used to obtain standard errors, confidence intervals and design effects for selected indicators. It is a powerful tool for statistical analysis of data from complex survey designs, including multi-stage, stratified and unequal-probability samples. The jackknife replication method, one of the replication options within this software, was applied. Estimating variances using the jackknife method requires forming replications from the full sample by randomly eliminating one sample cluster (enumeration area) from a domain or stratum at a time. A pseudo-estimate is then formed from the retained EAs, which are re-weighted to compensate for the eliminated unit. Thus, for a particular stratum containing k clusters, k replicated estimates are formed by eliminating one of these at a time and increasing the weight of the remaining (k - 1) clusters by a factor of k / (k - 1). This process is repeated for each cluster.
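The delete-one-cluster replication described above can be sketched in Python as follows. The re-weighting by k / (k − 1) follows the text. The step that combines replicates into a variance is not given in the text, so the sketch assumes the standard stratified jackknife formula: each squared deviation from the full-sample estimate is multiplied by (k − 1) / k and summed. The function names are hypothetical, and this is not WesVar's code.

```python
from collections import Counter


def jackknife_variance(clusters, estimator):
    """Delete-one-cluster jackknife variance for a stratified cluster sample.

    clusters  -- list of (stratum, weight, value) tuples, one per cluster
    estimator -- function mapping a list of (weight, value) pairs to a number
    """
    full = estimator([(w, y) for _, w, y in clusters])
    per_stratum = Counter(stratum for stratum, _, _ in clusters)
    variance = 0.0
    for drop, (drop_stratum, _, _) in enumerate(clusters):
        k = per_stratum[drop_stratum]
        if k < 2:
            continue  # a stratum with one cluster cannot be jackknifed
        replicate = []
        for index, (stratum, w, y) in enumerate(clusters):
            if index == drop:
                continue  # eliminate this cluster
            if stratum == drop_stratum:
                w = w * k / (k - 1)  # re-weight the k - 1 retained clusters
            replicate.append((w, y))
        # Assumed combination rule: (k - 1) / k times the squared deviation.
        variance += (k - 1) / k * (estimator(replicate) - full) ** 2
    return variance


def weighted_total(pairs):
    """Example estimator: weighted sum of cluster values."""
    return sum(w * y for w, y in pairs)
```

For one stratum with unit weights and cluster values 1, 2 and 3, the replicate totals are 7.5, 6 and 4.5 around a full total of 6, giving a variance of 3, which equals k times the sample variance of the cluster values.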

Note: See the detailed sampling error calculation presented in the 2008 BAIS-III final report.

33 scholarly articles cite the US county-level mortality dataset (IHME, 2019).
