Data set is for private consumption for the competition.
According to IBEF, “Domestic automobiles production increased at 2.36% CAGR between FY16-20 with 26.36 million vehicles being manufactured in the country in FY20. Overall, domestic automobiles sales increased at 1.29% CAGR between FY16-FY20 with 21.55 million vehicles being sold in FY20”. More vehicles on the road bring multiple challenges, and roads become more prone to accidents. Higher accident rates also lead to more insurance claims and larger payouts for insurance companies.
To plan pre-emptively for losses, insurance firms use accident data to understand risk across geographical units such as postal codes or districts.
In this challenge, we provide a dataset for predicting the “Accident_Risk_Index” for each postcode.
Accident_Risk_Index (mean casualties at a postcode) = sum(Number_of_casualities) / count(Accident_ID)
Working example:
Train Data (given)
Accident_ID    Postcode    Number_of_casualities
1              AL1 1JJ     2
2              AL1 1JP     3
3              AL1 3PS     2
4              AL1 3PS     1
5              AL1 3PS     1
Modelling Train Data (Rolled up at Postcode level)
Postcode    Derived_feature1    Derived_feature2    Accident_risk_Index
AL1 1JJ     _                   _                   2
AL1 1JP     _                   _                   3
AL1 3PS     _                   _                   1.33
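The roll-up in the working example can be sketched in a few lines of Python. This is only an illustration of the formula above using the five example rows; the function name `accident_risk_index` and the tuple layout are assumptions made for this sketch, not part of the competition files.

```python
from collections import defaultdict

# Rows copied from the working example: (Accident_ID, Postcode, casualties).
train_rows = [
    (1, "AL1 1JJ", 2),
    (2, "AL1 1JP", 3),
    (3, "AL1 3PS", 2),
    (4, "AL1 3PS", 1),
    (5, "AL1 3PS", 1),
]


def accident_risk_index(rows):
    """Return {postcode: sum(casualties) / count(accidents)}."""
    totals = defaultdict(int)   # total casualties per postcode
    counts = defaultdict(int)   # number of accidents per postcode
    for _accident_id, postcode, casualties in rows:
        totals[postcode] += casualties
        counts[postcode] += 1
    return {pc: totals[pc] / counts[pc] for pc in totals}


risk = accident_risk_index(train_rows)
for postcode, value in sorted(risk.items()):
    print(f"{postcode}: {value:.2f}")
```

For AL1 3PS the three accidents have 2 + 1 + 1 = 4 casualties, so the index is 4 / 3, which rounds to the 1.33 shown in the rolled-up table.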
Participants must predict the 'Accident_risk_index' for each postcode in test.csv.
Then submit your 'my_submission_file.csv' on the submission tab of the hackathon page.
Pro-tip: First roll up the train data to postcode level and create an “accident_risk_index” column through feature engineering, then optimize the model at postcode level.
A few hypotheses to help you think:
"More accidents happen in the later part of the day, as those are office hours that cause congestion"
"Postal codes with more single carriageway roads have more accidents"
(From the hypotheses above, features such as office_hours_flag and #single_carriage_roads, the number of single carriageway roads, can be derived.)
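As a rough sketch of how the two hypotheses could become postcode-level features, the snippet below counts accidents per postcode, in total and by each hypothesis. Several things here are assumptions, not facts about the data: the office-hours window (16:00 to 19:59), the 'HH:MM' format of the Time column, and the text label "Single carriageway" in Road_Type. The sample rows are made up.

```python
from collections import defaultdict

# Assumption: the "office hours" congestion window is 16:00-19:59;
# the source does not define it.
OFFICE_HOURS = range(16, 20)
# Assumption: Road_Type holds this text label; the real encoding may differ.
SINGLE_CARRIAGEWAY = "Single carriageway"


def office_hours_flag(time_str):
    """Return 1 if an 'HH:MM' time falls inside OFFICE_HOURS, else 0."""
    hour = int(time_str.split(":")[0])
    return int(hour in OFFICE_HOURS)


def postcode_features(rows):
    """Count accidents per postcode, in total and by each hypothesis."""
    features = defaultdict(lambda: {
        "accidents": 0,
        "office_hours_accidents": 0,
        "single_carriageway_accidents": 0,
    })
    for row in rows:
        f = features[row["postcode"]]
        f["accidents"] += 1
        f["office_hours_accidents"] += office_hours_flag(row["Time"])
        f["single_carriageway_accidents"] += int(
            row["Road_Type"] == SINGLE_CARRIAGEWAY
        )
    return dict(features)


# Made-up rows for illustration only.
sample_rows = [
    {"postcode": "AL1 3PS", "Time": "17:45", "Road_Type": "Single carriageway"},
    {"postcode": "AL1 3PS", "Time": "09:10", "Road_Type": "Dual carriageway"},
    {"postcode": "AL1 1JJ", "Time": "18:05", "Road_Type": "Single carriageway"},
]
features = postcode_features(sample_rows)
print(features["AL1 3PS"])
```

Counts like these can then sit alongside the rolled-up target as derived feature columns.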
Additionally, we are providing road network data (information on the road nearest to each postcode and its characteristics) and population data (population information at area level). These files are for feature augmentation; using them is optional.
The provided dataset contains the following files:
train.csv & test.csv:
'Accident_ID', 'Police_Force', 'Number_of_Vehicles', 'Number_of_Casualties', 'Date', 'Day_of_Week', 'Time', 'Local_Authority_(District)', 'Local_Authority_(Highway)', '1st_Road_Class', '1st_Road_Number', 'Road_Type', 'Speed_limit', '2nd_Road_Class', '2nd_Road_Number', 'Pedestrian_Crossing-Human_Control', 'Pedestrian_Crossing-Physical_Facilities', 'Light_Conditions', 'Weather_Conditions', 'Road_Surface_Conditions', 'Special_Conditions_at_Site', 'Carriageway_Hazards', 'Urban_or_Rural_Area', 'Did_Police_Officer_Attend_Scene_of_Accident', 'state', 'postcode', 'country'
population.csv:
'postcode', 'Rural Urban', 'Variable: All usual residents; measures: Value', 'Variable: Males; measures: Value', 'Variable: Females; measures: Value', 'Variable: Lives in a household; measures: Value', 'Variable: Lives in a communal establishment; measures: Value', 'Variable: Schoolchild or full-time student aged 4 and over at their non term-time address; measures: Value', 'Variable: Area (Hectares); measures: Value', 'Variable: Density (number of persons per hectare); measures: Value'
roads_network.csv:
'WKT', 'roadClassi', 'roadFuncti', 'formOfWay', 'length', 'primaryRou', 'distance to the nearest point on rd', 'postcode'
Overview
Swiss Re is one of the largest reinsurers in the world, headquartered in Zurich with offices in over 25 countries. Swiss Re’s core expertise is in underwriting in life, health, and property and casualty insurance, while its tech strategy focuses on developing smarter and innovative solutions for clients’ value chains by leveraging data and technology.
The company’s vision is to make the world more resilient. Swiss Re believes in applying fresh perspectives, knowledge and capital to anticipate and manage risk to create smarter solutions and help the world rebuild, renew and move forward. About 1300 professionals who work in the Swiss Re Global Business Solutions Center (BSC), Bangalore combine experience, expertise and out-of-the-box thinking to bring Swiss Re's core business to life by creating new business opportunities.
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.
Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.
Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset to make it easier for API and other interfaces. In addition, historical data has been extended back to January 5, 2021.
This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.
This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.
This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.
The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.
These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
For some ZCTAs, vaccination coverage may exceed 100%. This may be a result of many people from outside the county coming to that ZCTA to get their vaccine and providers reporting the county of administration as the county of residence, and/or the DOF estimates of the population in that ZCTA are too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
As of 1/19/2022, this dataset is no longer being updated. For more data on COVID-19 in Connecticut, visit data.ct.gov/coronavirus.
This table shows the percent of people who have received at least one dose of COVID-19 vaccine and who live in a Priority SVI Zip Code. About a third of people in CT live in a Priority SVI zip code.
SVI refers to the CDC's Social Vulnerability Index - a measure that combines 15 demographic variables to identify communities most vulnerable to negative health impacts from disasters and public health crises. Measures of social vulnerability include socioeconomic status, household composition, disability, race, ethnicity, language, and transportation limitations - among others. SVI scores were calculated for each zip code in CT. The zip codes in the top 20% were designated as Priority SVI zip codes. Percentages are based on 2018 zip code population data supplied by ESRI corporation.
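The "top 20%" designation can be illustrated with a short ranking sketch. The zip labels and SVI scores below are invented, and the rounding and tie-handling choices are assumptions of this sketch; the source does not describe how the cut-off was computed.

```python
def priority_zip_codes(svi_by_zip, top_share=0.20):
    """Return the zip codes whose SVI score ranks in the top `top_share`.

    Assumptions: higher score means more vulnerable, the cut-off count is
    rounded to the nearest whole zip code, and ties are broken by sort order.
    """
    ranked = sorted(svi_by_zip, key=svi_by_zip.get, reverse=True)
    n_priority = max(1, round(len(ranked) * top_share))
    return set(ranked[:n_priority])


# Invented labels and scores, for illustration only.
svi_scores = {
    "Z01": 0.91, "Z02": 0.15, "Z03": 0.48, "Z04": 0.77, "Z05": 0.33,
    "Z06": 0.62, "Z07": 0.05, "Z08": 0.84, "Z09": 0.29, "Z10": 0.56,
}
priority = priority_zip_codes(svi_scores)
print(sorted(priority))
```

With ten zip codes and a 20% share, the two highest-scoring entries (Z01 and Z08 here) are flagged as priority.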
All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected.
The data are presented cumulatively and by week of first dose of vaccine. Percentages are reported for all providers combined and for pharmacies, FQHCs (Federally Qualified Health Centers), local public health departments / districts and hospitals. The table excludes people with a missing or out-of-state zip code and doses administered by the Federal government (including Department of Defense, Department of Correction, Department of Veteran’s Affairs, Indian Health Service) or out-of-state providers.
VITAL SIGNS INDICATOR Life Expectancy (EQ6)
FULL MEASURE NAME Life Expectancy
LAST UPDATED April 2017
DESCRIPTION Life expectancy refers to the average number of years a newborn is expected to live if mortality patterns remain the same. The measure reflects the mortality rate across a population for a point in time.
DATA SOURCE State of California, Department of Health: Death Records (1990-2013) No link
California Department of Finance: Population Estimates Annual Intercensal Population Estimates (1990-2010) Table P-2: County Population by Age (2010-2013) http://www.dof.ca.gov/Forecasting/Demographics/Estimates/
U.S. Census Bureau: Decennial Census ZCTA Population (2000-2010) http://factfinder.census.gov
U.S. Census Bureau: American Community Survey 5-Year Population Estimates (2013) http://factfinder.census.gov
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) Life expectancy is commonly used as a measure of the health of a population. Life expectancy does not reflect how long any given individual is expected to live; rather, it is an artificial measure that captures an aspect of the mortality rates across a population that can be compared across time and populations. More information about the determinants of life expectancy that may lead to differences in life expectancy between neighborhoods can be found in the Bay Area Regional Health Inequities Initiative (BARHII) Health Inequities in the Bay Area report at http://www.barhii.org/wp-content/uploads/2015/09/barhii_hiba.pdf. Vital Signs measures life expectancy at birth (as opposed to cohort life expectancy). A statistical model was used to estimate life expectancy for Bay Area counties and ZIP Codes based on current life tables, which require both age and mortality data. A life table is a table which shows, for each age, the survivorship of people from a certain population.
Current life tables were created using death records and population estimates by age. The California Department of Public Health provided death records based on the California death certificate information. Records include age at death and residential ZIP Code. Single-year age population estimates at the regional and county level come from the California Department of Finance population estimates and projections for ages 0-100+. Population estimates for ages 100 and over are aggregated to a single age interval. Using these data, death rates in a population within age groups for a given year are computed to form unabridged life tables (as opposed to abridged life tables). To calculate life expectancy, the probability of dying between the jth and (j+1)st birthday is assumed uniform after age 1. Special consideration is taken to account for infant mortality.
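To make the life-table step concrete, here is a minimal sketch of a standard unabridged period life table, not the Vital Signs model itself. It assumes single-year death rates with an open-ended final age group, deaths spread uniformly through each year after infancy (average fraction of the year lived by those who die = 0.5), and an illustrative infant value `infant_a0=0.1`; the source only says infant mortality gets special consideration, so that value is an assumption.

```python
def life_expectancy_at_birth(death_rates, infant_a0=0.1):
    """Period life expectancy at birth from single-year death rates.

    death_rates[x] is deaths / population at age x; the last entry is the
    open-ended final age group (for example 100+) and must be > 0.
    infant_a0 is an assumed average fraction of the first year lived by
    infants who die; 0.5 is used for every later age (uniform deaths).
    """
    survivors = 1.0       # l_x, as a share of the birth cohort
    person_years = 0.0    # running sum of L_x
    last_age = len(death_rates) - 1
    for age, m in enumerate(death_rates):
        if age == last_age:
            # Open interval: everyone remaining dies, L = l_x / m_x.
            person_years += survivors / m
            break
        a = infant_a0 if age == 0 else 0.5
        q = m / (1.0 + (1.0 - a) * m)           # probability of dying at this age
        deaths = survivors * q
        person_years += (survivors - deaths) + a * deaths   # L_x
        survivors -= deaths
    return person_years   # e_0, because l_0 = 1


# Toy rates: nobody dies at ages 0 or 1, open-ended rate of 1.0 from age 2.
print(life_expectancy_at_birth([0.0, 0.0, 1.0]))
```

In the toy case everyone survives two full years and then lives 1 / 1.0 = 1 more year on average, so the result is 3.0.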
For the ZIP Code-level life expectancy calculation, it is assumed that postal ZIP Codes share the same boundaries as ZIP Code Census Tabulation Areas (ZCTAs). More information on the relationship between ZIP Codes and ZCTAs can be found at http://www.census.gov/geo/reference/zctas.html. ZIP Code-level data uses three years of mortality data to make robust estimates due to small sample size. Year 2013 ZIP Code life expectancy estimates reflect death records from 2011 through 2013. 2013 is the last year with available mortality data. Death records for ZIP Codes with zero population (like those associated with P.O. Boxes) were assigned to the nearest ZIP Code with population. ZIP Code population for 2000 estimates comes from the Decennial Census. ZIP Code population for 2013 estimates comes from the American Community Survey (5-Year Average). ACS estimates are adjusted using Decennial Census data for more accurate population estimates. An adjustment factor was calculated using the ratio between the 2010 Decennial Census population estimates and the 2012 ACS 5-Year (with middle year 2010) population estimates. This adjustment factor is particularly important for ZCTAs with a high homeless population (not living in group quarters), where the ACS may underestimate the ZCTA population and therefore underestimate the life expectancy. The ACS provides ZIP Code population by age in five-year age intervals. Single-year age population estimates were calculated by distributing population within an age interval to single-year ages using the county distribution. Counties were assigned to ZIP Codes based on majority land area.
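The two population adjustments described above can be sketched as simple arithmetic. The function names and all numbers are made up for illustration, and the sketch assumes the factor is applied multiplicatively to the ACS count, which the source implies but does not spell out.

```python
def adjustment_factor(census_2010_pop, acs_2012_5yr_pop):
    """Ratio of the 2010 Decennial Census count to the 2012 ACS 5-Year count."""
    return census_2010_pop / acs_2012_5yr_pop


def split_to_single_years(interval_pop, county_single_year_pops):
    """Distribute a five-year interval's population to single-year ages,
    in proportion to the county's single-year populations for those ages."""
    county_total = sum(county_single_year_pops)
    return [interval_pop * p / county_total for p in county_single_year_pops]


# Made-up numbers for one ZCTA and one five-year age interval.
factor = adjustment_factor(census_2010_pop=1000, acs_2012_5yr_pop=800)
adjusted_interval = 80 * factor   # ACS interval count scaled by the factor
single_years = split_to_single_years(adjusted_interval, [10, 20, 30, 20, 20])
print(factor, adjusted_interval, single_years)
```

Here the ACS undercounts relative to the census, so the factor is 1.25; the interval count of 80 becomes 100 and is then split 10/20/30/20/20 following the county's age distribution.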
ZIP Codes in the Bay Area vary in population from over 10,000 residents to fewer than 20 residents. Traditional life expectancy estimation methods (like the one used for the regional- and county-level Vital Signs estimates) cannot be used because they are highly inaccurate for small populations and may result in over- or underestimation of life expectancy. To avoid inaccurate estimates, ZIP Codes with populations of less than 5,000 were aggregated with neighboring ZIP Codes until the merged areas had a population of more than 5,000. ZIP Code 94103, representing Treasure Island, was dropped from the dataset due to its small population and having no bordering ZIP Codes. In this way, the original 305 Bay Area ZIP Codes were reduced to 217 ZIP Code areas for 2013 estimates. Next, a form of Bayesian random-effects analysis was used which established a prior distribution of the probability of death at each age using the regional distribution. This prior is used to shore up the life expectancy calculations where data were sparse.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The first resource below provides a list of all 2011 census frozen postcodes across the UK as well as the:
Suppressed postcodes in Northern Ireland
For confidentiality reasons, counts were suppressed for postcodes that had fewer than 10 usual residents and only 1, 2 or 3 households.
The Registrar General took steps to ensure that the confidentiality of respondents was fully protected. Accordingly, all published results from the 2011 Census (including those relating to Postcodes) were subject to statistical processes to ensure that individuals could not be identified. For these postcodes, averages were taken at Postcode District level and released in a separate table, which can be found below.
Missing postcodes
These postcodes are based upon the sets of enumeration postcodes provided by the three UK census agencies. Enumeration postcodes are a subset of the complete set of live postcodes at the time of the 2011 Census. These are aggregated to create census output areas, which are themselves aggregated to create most other census geographies.
Only postcodes with at least one resident person are included. Many postcodes, such as those assigned to businesses, don't have any resident populations and so won't appear in the table.
Postcodes are quite volatile; new postcodes are created and old ones are terminated regularly. Existing/live postcodes can also change through the addition or removal of delivery points. The ONSPD records all live and terminated postcodes. Each postcode has a date of introduction and, if relevant, a date of termination. Things are complicated further because postcodes can be re-used, so a postcode can be terminated and then reappear with a new date of introduction, replacing/removing the record for the previous instance of the postcode. Postcodes that weren't current at the time of the census also won't appear in the table.
Note: Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset to make it easier for API and other interfaces. In addition, historical data has been extended back to January 5, 2021.
Note: Starting on November 10, 2021, the denominator for calculating vaccine coverage has been changed from age 12+ to age 5+ to reflect new vaccine eligibility criteria. The previous dataset based on age 12+ denominators has been uploaded as an archived table. Previously on May 18, 2021, the denominator was changed from age 16+ to age 12+ to reflect a previous change in vaccine eligibility criteria.
This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.
This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.
This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.
The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.
These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
For some ZCTAs, vaccination coverage may exceed 100%. This may be a result of many people from outside the county coming to that ZCTA to get their vaccine and providers reporting the county of administration as the county of residence, and/or the DOF estimates of the population in that ZCTA are too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.
NOTE: As of 2/16/2023, this page is not being updated.
This table shows the number and percent of people who have initiated COVID-19 vaccination, are fully vaccinated, and had additional dose 1, grouped by whether they live in an SVI Priority Zip Code. People with an out-of-state zip code are excluded from this analysis. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected.
A person who has received at least one dose of any COVID-19 vaccine is considered to have initiated vaccination. A person is considered fully vaccinated if they have completed a primary vaccine series by receiving 2 doses of the Pfizer, Novavax or Moderna vaccines or 1 dose of the Johnson & Johnson vaccine. The fully vaccinated are a subset of the number who have received at least one dose. A person who completed a Pfizer, Moderna, Novavax or Johnson & Johnson primary series (as defined above) and then had an additional dose of COVID-19 vaccine is considered to have had additional dose 1. The additional monovalent dose may be Pfizer, Moderna, Novavax or Johnson & Johnson and may be a different type from the primary series. For people who had a primary Pfizer or Moderna series, additional dose 1 was counted starting August 18th, 2021. For people with a Johnson & Johnson primary series, additional dose 1 was counted starting October 22nd, 2021. For most people, additional dose 1 is a booster. However, additional dose 1 may represent a supplement to the primary series for a person who is moderately or severely immunosuppressed. Bivalent booster administrations are not included in the additional dose 1 calculations.
SVI refers to the CDC's Social Vulnerability Index - a measure that combines 15 demographic variables to identify communities most vulnerable to negative health impacts from disasters and public health crises. Measures of social vulnerability include socioeconomic status, household composition, disability, race, ethnicity, language, and transportation limitations - among others. SVI scores were calculated for each zip code in CT. The zip codes in the top 20% were designated as SVI Priority Zip Codes. Percentages are based on 2018 zip code population data supplied by ESRI corporation.
The percent with at least one dose may be over-estimated, and the percent fully vaccinated and with additional dose 1 may be under-estimated, because some vaccine administration records for the same individual cannot be linked due to differences in how names or dates of birth are reported.
Connecticut COVID-19 Vaccine Program providers are required to report information on all COVID-19 vaccine doses administered to CT WiZ, the Connecticut Immunization Information System. Data on doses administered to CT residents out-of-state are being added to CT WiZ jurisdiction-by-jurisdiction. Doses administered by some Federal entities (including Department of Defense, Department of Correction, Department of Veteran’s Affairs, Indian Health Service) are not yet reported to CT WiZ. Data reported here reflect the vaccination records currently reported to CT WiZ.
Note: As part of continuous data quality improvement efforts, duplicate records were removed from the COVID-19 vaccination data during the weeks of 4/19/2021 and 4/26/2021.
Abstract copyright UK Data Service and data collection copyright owner.
The UK censuses took place on 29th April 2001. They were run by the Northern Ireland Statistics & Research Agency (NISRA), General Register Office for Scotland (GROS), and the Office for National Statistics (ONS) for both England and Wales. The UK comprises the countries of England, Wales, Scotland and Northern Ireland.
Statistics from the UK censuses help paint a picture of the nation and how we live. They provide a detailed snapshot of the population and its characteristics, and underpin funding allocation to provide public services.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This is the data associated with the paper submitted entitled:
A Novel Approach for Mapping Exposure to Land Cover at the Small Statistical Geography Level
Joanne K. Garrett1, Lewis R. Elliott1, Rebecca Lovell1, Benedict W. Wheeler1, Tom Marshall2, Fränze Kibowski2, Benjamin B. Philips3, Kevin J. Gaston3
This dataset includes the Living England percentage land cover at the LSOA level calculated by two methods. These methods are referred to as the "proposed" method and the "typical" method. Our proposed method uses data at both LSOA and postcode (sub-LSOA) levels for England, first calculating the percentage coverage of land cover types within 300 m postcode buffers, then averaging these at the LSOA level weighted by the number of domestic postal delivery addresses (as a proxy for population per postcode). It mitigates edge effects by allowing habitat exposure to extend beyond the LSOA boundary through the use of a 300-metre postcode buffer and maintains consistency across varying LSOA sizes. We argue that the new proposed approach reduces the potential for exposure misclassification associated with variable unit size at the small statistical geography level.
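The aggregation step of the proposed method (averaging postcode-buffer percentages to LSOA level, weighted by domestic delivery addresses) can be sketched as below. This is a simplified illustration, not the authors' code (which they publish on GitHub); the field names `lsoa`, `pct_cover` and `addresses` and all values are hypothetical, and the 300 m buffer percentages are taken as already computed.

```python
from collections import defaultdict


def lsoa_land_cover(postcodes):
    """Address-weighted mean of postcode-buffer land cover per LSOA.

    Each postcode record carries hypothetical keys: 'lsoa' (LSOA code),
    'pct_cover' (percentage of one land cover type within the postcode's
    300 m buffer) and 'addresses' (domestic delivery addresses, used as a
    proxy for population).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for pc in postcodes:
        weighted_sum[pc["lsoa"]] += pc["pct_cover"] * pc["addresses"]
        weight_total[pc["lsoa"]] += pc["addresses"]
    # LSOAs with no domestic addresses are skipped to avoid dividing by zero.
    return {
        lsoa: weighted_sum[lsoa] / weight_total[lsoa]
        for lsoa in weight_total
        if weight_total[lsoa] > 0
    }


# Made-up postcode records for two hypothetical LSOAs.
records = [
    {"lsoa": "LSOA_A", "pct_cover": 10.0, "addresses": 30},
    {"lsoa": "LSOA_A", "pct_cover": 50.0, "addresses": 10},
    {"lsoa": "LSOA_B", "pct_cover": 80.0, "addresses": 5},
]
print(lsoa_land_cover(records))
```

For LSOA_A the result is (10 × 30 + 50 × 10) / 40 = 20%, so the postcode with more addresses pulls the average toward its own buffer value.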
The variables are described in the included file data_variable_descriptions.xlsx
The code is available on Github at github.com/j-k-garrett/RENEW_mapping.
LSOAs were obtained for the year 2011 [1]. Postcode locations were obtained from the UK’s Ordnance Survey dataset of postcode locations (Codepoint; [2]), which is accessible through the Edina Digimap service for UK educational and research institutions. The Living England Habitat Map is a probability-based map showing the extent and distribution of broad habitats across England [3]. Estuaries and rias were obtained from the Coastal Physiographic features product from JNCC [4, 5]. Boundary data for Scotland and Wales were obtained from the Ordnance Survey dataset Boundary-Line [6].
The study was funded by the Natural Environment Research Council ‘Renewing biodiversity through a people-in-nature approach (RENEW)’ project (NE/W004941/1)
References
[1] Office for National Statistics: Lower Layer Super Output Areas (December 2011) Boundaries Full Clipped (BFC) EW V3. https://geoportal.statistics.gov.uk/datasets/1f23484eafea45f98485ef816e4fee2d_0/explore; 2021.
[2] Ordnance Survey: Code-Point. August 2021. EDINA Digimap Ordnance Survey Service; 2021.
[3] Kilcoyne A, Clement M, Moore C, Picton Phillipps G, Keane R, Woodget A, Potter S, Stefaniak A, Trippier B: Living England: Technical User Guide. NERR108. http://nepubprod.appspot.com/publication/4918342350798848; 2022: 38.
[4] Joint Nature Conservation Committee: Coastal Physiographic Features - Estuaries. https://www.data.gov.uk/dataset/225fb0e1-5cfd-43fa-a6bf-c108091f3825/coastal-physiographic-features-estuaries; 2018.
[5] Joint Nature Conservation Committee: Coastal Physiographic Features - Ria. https://www.data.gov.uk/dataset/71bb8571-6214-45ba-8f14-a9b8d014b90c/coastal-physiographic-features-ria; 2018.
[6] Ordnance Survey: Boundary-Line, Scotland and Wales region. 23rd April 2022 edn. https://digimap.edina.ac.uk/os: EDINA Digimap Ordnance Survey Service; 2022.
Understanding Society (UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on, and incorporates, the British Household Panel Survey (BHPS), which began in 1991.
This release combines fourteen waves of Understanding Society data with harmonised data from all eighteen waves of the BHPS. As multi-topic studies, the purpose of Understanding Society and BHPS is to understand short- and long-term effects of social and economic change in the UK at the household and individual levels. The study has a strong emphasis on domains of family and social ties, employment, education, financial resources, and health. Understanding Society is an annual survey of each adult member of a nationally representative sample. The same individuals are re-interviewed in each wave approximately 12 months apart. When individuals move they are followed within the UK, and anyone joining their households is also interviewed as long as they are living with them. The study has five sample components: the general population sample; a boost sample of ethnic minority group members; an immigrant and ethnic minority boost sample (from wave 6); participants from the BHPS; and a second general population boost sample added at this wave. In addition, there is the Understanding Society Innovation Panel, which is a separate standalone survey (see SN 6849). The fieldwork period is 24 months. Data collection uses computer assisted personal interviewing (CAPI) and web interviews (from wave 7), and includes a telephone mop-up. From March 2020 (the end of wave 10 and the 2nd year of wave 11), due to the coronavirus pandemic, face-to-face interviews were suspended and the survey was conducted by web and telephone only, but otherwise continued as before. Face-to-face interviewing was resumed from April 2022. One person completes the household questionnaire. Each person aged 16 or over is invited to complete the individual adult interview and self-completed questionnaire. Parents are asked questions about their children under 10 years old. Youths aged 10 to 15 are asked to respond to a self-completion questionnaire.
For the general and BHPS samples biomarker, genetic and epigenetic data are also available. The biomarker data, and summary genetics and epigenetic scores, are available via UKDS (see SN 7251); detailed genetics and epigenetics data are available by application (see below). In 2020-21 an additional frequent web survey was separately issued to sample members to capture data on the rapid changes in people’s lives due to the COVID-19 pandemic (see SN 8644). Participants are asked consent to link their data to wide-ranging administrative data sets (see below).
Further information may be found on the Understanding Society Main stage webpage and links to publications based on the study can be found on the Understanding Society Latest Research webpage.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.
End User Licence, Special Licence and Secure Access versions:
There are three versions of the main Understanding Society data with different access conditions. One is available under the standard End User Licence (EUL) agreement (this study), one is a Special Licence (SL) version (SN 6931) and the third is a Secure Access version (SN 6676). The SL version contains month as well as year of birth variables, more detailed country and occupation coding for a number of variables, various income variables that have not been top-coded, and other potentially sensitive variables (see 6931_eul_vs_sl_variable_differences document available with the SL version for full details of the differences). The Secure Access version, in addition to containing all the variables in the SL version, also contains day of birth as well as Grid Reference geographical variables. Users are advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL and Secure Access versions of the data have more restrictive access conditions and prospective users of those versions should visit the catalogue entries for SN 6931 and SN 6676 respectively for further information.
Low- and Medium-level geographical identifiers are also available subject to SL access conditions; see SNs 6666, 6668-6675, 7453-4, 7629-30, 7245, 7248-9 and 9169-9170. Schools data are available subject to SL access conditions in SN 7182. Higher Education establishments for Wave 5 are available subject to SL access conditions in SN 8578. Interviewer Characteristics data, also subject to SL access conditions is available in SN 8579. In addition, a fine detail geographic dataset (SN 6676) is available under more restrictive Secure Access conditions that contains National Grid postcode grid references (at 1m resolution) for the unit postcode of each household surveyed, derived from ONS Postcode Directories (ONSPD). For details on how to make an application for Secure Access dataset, please see the SN 6676 catalogue record.
How to access genetic and/or bio-medical sample data from Understanding Society:
Information on how to access genetics and epigenetics data directly from the study team is available on the Understanding Society Accessing data webpage.
Linked administrative data
Linked Understanding Society / administrative data are available on a number of different platforms. See the Understanding Society Data linkage webpage for details of those currently available and how they can be accessed.
Latest edition information
For the 19th edition (November 2024) Wave 14 data has been added. Other minor changes and corrections have also been made to Waves 1-13. Please refer to the revisions document for full details.
m_hhresp and n_hhresp files updated, December 2024
In the previous release (19th edition, November 2024), there was an issue with household income estimates in m_hhresp and n_hhresp where a household resides in a new local authority (approx. 300 households in wave 14). The issue has been corrected and imputation models re-estimated and imputed values updated for the full sample. Imputed values will therefore change compared to the versions in the original release. The variable affected is n_ctband_dv.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain over 2,047 variables.
VITAL SIGNS INDICATOR Life Expectancy (EQ6)
FULL MEASURE NAME Life Expectancy
LAST UPDATED April 2017
DESCRIPTION Life expectancy refers to the average number of years a newborn is expected to live if mortality patterns remain the same. The measure reflects the mortality rate across a population for a point in time.
DATA SOURCE State of California, Department of Health: Death Records (1990-2013) No link
California Department of Finance: Population Estimates Annual Intercensal Population Estimates (1990-2010) Table P-2: County Population by Age (2010-2013) http://www.dof.ca.gov/Forecasting/Demographics/Estimates/
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) Life expectancy is commonly used as a measure of the health of a population. Life expectancy does not reflect how long any given individual is expected to live; rather, it is an artificial measure that captures an aspect of the mortality rates across a population. Vital Signs measures life expectancy at birth (as opposed to cohort life expectancy). A statistical model was used to estimate life expectancy for Bay Area counties and Zip codes based on current life tables, which require both age and mortality data. A life table is a table which shows, for each age, the survivorship of people from a certain population.
Current life tables were created using death records and population estimates by age. The California Department of Public Health provided death records based on the California death certificate information. Records include age at death and residential Zip code. Single-year age population estimates at the regional and county level come from the California Department of Finance population estimates and projections for ages 0-100+. Population estimates for ages 100 and over are aggregated to a single age interval. Using these data, death rates in a population within age groups for a given year are computed to form unabridged life tables (as opposed to abridged life tables). To calculate life expectancy, the probability of dying between the jth and (j+1)st birthday is assumed uniform after age 1. Special consideration is taken to account for infant mortality. For the Zip code-level life expectancy calculation, it is assumed that postal Zip codes share the same boundaries as ZIP Code Tabulation Areas (ZCTAs). More information on the relationship between Zip codes and ZCTAs can be found at https://www.census.gov/geo/reference/zctas.html. Zip code-level data uses three years of mortality data to make robust estimates due to small sample size. Year 2013 Zip code life expectancy estimates reflect death records from 2011 through 2013. 2013 is the last year with available mortality data. Death records for Zip codes with zero population (like those associated with P.O. Boxes) were assigned to the nearest Zip code with population. Zip code population for 2000 estimates comes from the Decennial Census. Zip code population for 2013 estimates is from the American Community Survey (5-Year Average). The ACS provides Zip code population by age in five-year age intervals. Single-year age population estimates were calculated by distributing population within an age interval to single-year ages using the county distribution. Counties were assigned to Zip codes based on majority land area.
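As a rough illustration of the unabridged life table calculation described above, the following sketch computes life expectancy at birth from single-year death rates. It is not the Vital Signs implementation: the function name, the conversion from death rates to death probabilities, and the infant value `infant_a0` (the average fraction of the first year lived by infants who die) are assumptions made for this example. Deaths are treated as uniformly spread within each year after age 0, and the last age is treated as an open interval (for example 100+).

```python
def life_expectancy_at_birth(death_rates, infant_a0=0.1):
    """Period life expectancy at birth from single-year death rates.

    death_rates[x] is the death rate at age x; the last entry is the open
    interval (e.g. 100+) and must be greater than zero.
    infant_a0 is an assumed value, not one taken from the source.
    """
    survivors = 1.0      # l_x, starting from a radix of 1 person
    person_years = 0.0   # running sum of L_x
    last_age = len(death_rates) - 1
    for age, rate in enumerate(death_rates):
        if age == last_age:
            # Open interval: everyone remaining dies; L = l / m.
            person_years += survivors / rate
            break
        # Fraction of the year lived by those who die in the interval.
        a = infant_a0 if age == 0 else 0.5
        # Convert the death rate to a probability of dying in the interval.
        q = rate / (1.0 + (1.0 - a) * rate)
        deaths = survivors * q
        # Person-years lived in the interval by survivors and decedents.
        person_years += survivors - (1.0 - a) * deaths
        survivors -= deaths
    return person_years  # radix is 1, so the sum of L_x equals e_0


# Sanity check with a constant death rate of 0.02 at every age:
# the result should be close to 1 / 0.02 = 50 years.
constant_rates = [0.02] * 101
print(life_expectancy_at_birth(constant_rates, infant_a0=0.5))
```

The constant-rate case is only a sanity check; real inputs would be the age-specific death rates computed from the death records and population estimates.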
Zip codes in the Bay Area vary in population from over 10,000 residents to fewer than 20 residents. Traditional life expectancy estimation (like the one used for the regional- and county-level Vital Signs estimates) cannot be used because it is highly inaccurate for small populations and may result in over- or underestimation of life expectancy. To avoid inaccurate estimates, Zip codes with populations of less than 5,000 were aggregated with neighboring Zip codes until the merged areas had a population of more than 5,000. In this way, the original 305 Bay Area Zip codes were reduced to 218 Zip code areas for 2013 estimates. Next, a form of Bayesian random-effects analysis was used which established a prior distribution of the probability of death at each age using the regional distribution. This prior is used to shore up the life expectancy calculations where data were sparse.
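The idea of a regional prior shoring up sparse local data can be illustrated with a much simpler stand-in than the random-effects model mentioned above: a Beta prior centred on the regional probability of death, updated with local deaths and population. This is a swapped-in conjugate technique for illustration only; the function name and the `prior_strength` parameter are hypothetical and do not come from the source.

```python
def shrink_death_probability(local_deaths, local_population,
                             regional_probability, prior_strength=1000.0):
    """Posterior mean death probability under a Beta prior.

    The prior is centred on regional_probability and carries the weight of
    prior_strength pseudo-observations (an assumed tuning value).
    """
    alpha = regional_probability * prior_strength
    beta = (1.0 - regional_probability) * prior_strength
    return (alpha + local_deaths) / (alpha + beta + local_population)


# With no local data the estimate equals the regional probability;
# with more local data it moves toward the local rate.
print(shrink_death_probability(0, 0, 0.01))
print(shrink_death_probability(30, 1000, 0.01))
print(shrink_death_probability(3000, 100000, 0.01))
```

The smaller the local population, the more the estimate leans on the regional value, which is the qualitative behaviour the text describes.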
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset provides detailed information on the 2019 Index of Multiple Deprivation (IMD) for Birmingham, UK. The data is available at the postcode level and includes the Lower Layer Super Output Area (LSOA) information. Data is provided at the LSOA 2011 Census geography. The decile score ranges from 1 to 10, with decile 1 representing the most deprived 10% of areas and decile 10 representing the least deprived 10% of areas. The IMD rank and decile score are allocated to the LSOA and all postcodes within it at the time of creation (2019). Note that some postcodes cross over LSOA boundaries. The Office for National Statistics sets boundaries for LSOAs and allocates every postcode to one LSOA only: this is the one which contains the majority of residents in that postcode area (as at 2011 Census).
The English Indices of Deprivation 2019 provide a comprehensive measure of relative deprivation across small areas in England. The indices are divided into several domains, each capturing a different aspect of deprivation:
Income Deprivation: Measures the proportion of the population experiencing deprivation due to low income, including those receiving income-related benefits.
Employment Deprivation: Captures the proportion of the working-age population excluded from the labor market due to unemployment, illness, disability, or caring responsibilities.
Education, Skills, and Training Deprivation: Assesses the lack of educational attainment and skills in the local population, including adult qualifications and children's educational performance.
Health Deprivation and Disability: Measures the risk of premature death and the impairment of quality of life through poor physical or mental health.
Crime Deprivation: Assesses the risk of personal and material victimization, including recorded crimes for violence, burglary, theft, and criminal damage.
Barriers to Housing and Services: Measures the physical and financial accessibility of housing and local services, including overcrowding, homelessness, housing affordability, and distance to key services.
Living Environment Deprivation: Assesses the quality of the local environment, including housing quality, air quality, and road traffic accidents.
Additionally, there are two supplementary indices:
Income Deprivation Affecting Children Index (IDACI): Focuses on children aged 0-15 living in income-deprived families.
Income Deprivation Affecting Older People Index (IDAOPI): Focuses on people aged 60 and over living in income-deprived households.
These indices help identify areas with high levels of deprivation, guiding policy interventions and resource allocation to address socio-economic inequalities.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID - 19 Vaccination by Residence in a SVI Priority Zip Code’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/2c39dcc1-e82b-43d4-8e76-3831022cab08 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
This table shows the number and percent of people who have initiated COVID-19 vaccination and who are fully vaccinated, grouped by whether they live in an SVI Priority Zip Code.
All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected.
A person who has received at least one dose of any vaccine is considered to have initiated vaccination. A person is considered fully vaccinated if they have completed a primary series by receiving 2 doses of the Pfizer or Moderna vaccines or 1 dose of the Johnson & Johnson vaccine. The fully vaccinated are a subset of the number who have received at least one dose.
SVI refers to the CDC's Social Vulnerability Index - a measure that combines 15 demographic variables to identify communities most vulnerable to negative health impacts from disasters and public health crises. Measures of social vulnerability include socioeconomic status, household composition, disability, race, ethnicity, language, and transportation limitations - among others. SVI scores were calculated for each zip code in CT. The zip codes in the top 20% were designated as SVI Priority Zip Codes. Percentages are based on 2018 zip code population data supplied by ESRI corporation.
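The designation and percentage steps described above can be sketched as follows. All Zip codes, SVI scores, populations, and vaccination counts below are made-up values for illustration, and the handling of ties and rounding at the 20% cut-off is an assumption, since the source does not specify it.

```python
# Hypothetical rows; none of these values come from the source dataset.
zip_rows = [
    {"zip": "Z1", "svi": 0.91, "population": 1000, "initiated": 600, "fully": 500},
    {"zip": "Z2", "svi": 0.40, "population": 2000, "initiated": 1500, "fully": 1400},
    {"zip": "Z3", "svi": 0.75, "population": 1500, "initiated": 900, "fully": 800},
    {"zip": "Z4", "svi": 0.10, "population": 500, "initiated": 450, "fully": 400},
    {"zip": "Z5", "svi": 0.55, "population": 1000, "initiated": 700, "fully": 650},
]


def priority_zip_codes(rows, top_percent=20):
    """Return the zip codes whose SVI score falls in the top `top_percent`."""
    n_priority = max(1, len(rows) * top_percent // 100)  # assumed rounding rule
    ranked = sorted(rows, key=lambda r: r["svi"], reverse=True)
    return {r["zip"] for r in ranked[:n_priority]}


def vaccination_summary(rows, priority):
    """Percent initiated and fully vaccinated, by priority status."""
    totals = {
        True: {"population": 0, "initiated": 0, "fully": 0},
        False: {"population": 0, "initiated": 0, "fully": 0},
    }
    for r in rows:
        group = totals[r["zip"] in priority]
        for key in group:
            group[key] += r[key]
    return {
        is_priority: {
            "pct_initiated": 100 * t["initiated"] / t["population"],
            "pct_fully": 100 * t["fully"] / t["population"],
        }
        for is_priority, t in totals.items()
    }


priority = priority_zip_codes(zip_rows)
summary = vaccination_summary(zip_rows, priority)
print(priority)
print(summary)
```

The fully vaccinated count is a subset of the initiated count in each row, matching the definition given earlier in the description.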
People with an out-of-state zip code are excluded from this analysis. This table does not include doses administered to CT residents by out-of-state providers or by some Federal entities (including Department of Defense, Department of Correction, Department of Veteran’s Affairs, Indian Health Service) because they are not yet reported to CT WiZ (the CT Immunization Information System). It is expected that these data will be added in the future.
Note: As part of continuous data quality improvement efforts, duplicate records were removed from the COVID-19 vaccination data during the weeks of 4/19/2021 and 4/26/2021.
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The indicator reports the number of credits in progress during the year (without taking into account the year of signature of the contract) to the population aged 18 and over. All loans are registered with the National Bank (including credit openings of less than 1250 euros and repayable within 3 months, which mainly concern overdraft facilities on bank accounts). Having credit is therefore not necessarily an indicator of "over-indebtedness risk". At the end of 2013, only 7.3% of Walloons with outstanding credits were in default of payment. Note: the data at contract level are disseminated by postal code on the website of the personal credit centre. They have been aggregated at the municipal level by IWEPS. It is possible that this aggregation leads to some double counting. When a credit is taken out by several people who do not live in the same postal code, the data are included in the file for each of the postal codes concerned. If two contractors live in the same municipality but not in the same postal code, there will be duplication in the information related to the credit (amount, number, ...). These cases are probably rare because loans to several borrowers most often concern people domiciled at the same address. See also: the website of the National Bank of Belgium (NBB).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This is a MD iMAP hosted service. Find more information at http://imap.maryland.gov. The units of geography used for the 2010 Census maps displayed here are the Census tracts. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. Census tract boundaries generally follow visible and identifiable features. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. The data collected on the short form survey are general demographic characteristics such as age, race, ethnicity, household relationship, housing vacancy and tenure (owner/renter).
Feature Service Link: https://mdgeodata.md.gov/imap/rest/services/Demographics/MD_CensusData/FeatureServer
ADDITIONAL LICENSE TERMS: The Spatial Data and the information therein (collectively "the Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
These statistics update the English indices of deprivation 2015.
The English indices of deprivation measure relative deprivation in small areas in England called lower-layer super output areas. The index of multiple deprivation is the most widely used of these indices.
The statistical release and FAQ document (above) explain how the Indices of Deprivation 2019 (IoD2019) and the Index of Multiple Deprivation (IMD2019) can be used and expand on the headline points in the infographic. Both documents also help users navigate the various data files and guidance documents available.
The first data file contains the IMD2019 ranks and deciles and is usually sufficient for the purposes of most users.
Mapping resources and links to the IoD2019 explorer and Open Data Communities platform can be found on our IoD2019 mapping resource page.
Further detail is available in the research report, which gives detailed guidance on how to interpret the data and presents some further findings, and the technical report, which describes the methodology and quality assurance processes underpinning the indices.
We have also published supplementary outputs covering England and Wales.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Live births and stillbirths annual summary statistics, by sex, age of mother, whether within marriage or civil partnership, percentage of non-UK-born mothers, birth rates and births by month and mothers' area of usual residence.
Data set is for private consumption for the competition.
According to IBEF, “Domestic automobiles production increased at 2.36% CAGR between FY16-20 with 26.36 million vehicles being manufactured in the country in FY20. Overall, domestic automobiles sales increased at 1.29% CAGR between FY16-FY20 with 21.55 million vehicles being sold in FY20”. The rise in the number of vehicles on the road will also create multiple challenges, and roads will become more prone to accidents. Increased accident rates also lead to more insurance claims and higher payouts for insurance companies.
To plan pre-emptively for these losses, insurance firms leverage accident data to understand the risk across geographical units, e.g. postal code or district.
In this challenge, we are providing you with the dataset to predict the “Accident_Risk_Index” against the postcodes.
Accident_Risk_Index (mean casualties at a postcode) = sum(Number_of_casualities) / count(Accident_ID)
Working example:
Train Data (given)
Accident_ID, Postcode, Number_of_casualities
1, AL1 1JJ, 2
2, AL1 1JP, 3
3, AL1 3PS, 2
4, AL1 3PS, 1
5, AL1 3PS, 1
Modelling Train Data (rolled up at postcode level)
Postcode, Derived_feature1, Derived_feature2, Accident_risk_Index
AL1 1JJ, _, _, 2
AL1 1JP, _, _, 3
AL1 3PS, _, _, 1.33
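The roll-up in the working example can be reproduced with a few lines of standard-library Python. This is a minimal sketch using the five example rows; the function and variable names are chosen for this illustration only.

```python
from collections import defaultdict

# Rows from the working example: (Accident_ID, Postcode, Number_of_casualities)
train_rows = [
    (1, "AL1 1JJ", 2),
    (2, "AL1 1JP", 3),
    (3, "AL1 3PS", 2),
    (4, "AL1 3PS", 1),
    (5, "AL1 3PS", 1),
]


def accident_risk_index(rows):
    """Return {postcode: sum(casualties) / count(accidents)}."""
    casualty_totals = defaultdict(int)
    accident_counts = defaultdict(int)
    for _accident_id, postcode, casualties in rows:
        casualty_totals[postcode] += casualties
        accident_counts[postcode] += 1
    return {
        postcode: casualty_totals[postcode] / accident_counts[postcode]
        for postcode in casualty_totals
    }


risk_index = accident_risk_index(train_rows)
for postcode, value in risk_index.items():
    print(postcode, round(value, 2))
```

For AL1 3PS the three accidents carry 2 + 1 + 1 = 4 casualties, giving 4 / 3, which rounds to the 1.33 shown in the rolled-up table.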
Participants are required to predict the 'Accident_risk_index' for each postcode in test.csv.
Then submit your 'my_submission_file.csv' on the submission tab of the hackathon page.
Pro-tip: Participants should first perform feature engineering to roll up the train data at postcode level, create a column named “accident_risk_index”, and then optimize the model at postcode level.
A few hypotheses to help you think:
"More accidents happen in the later part of the day, as those are office hours that cause congestion"
"Postal codes with more single carriageway roads have more accidents"
(From these hypotheses, features such as office_hours_flag and the number of single carriageway roads can be derived.)
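The two hypotheses could be turned into postcode-level features along the following lines. This is only a sketch: the sample rows are invented, the 'Time' format ("HH:MM"), the 'Road_Type' label "Single carriageway", and the office-hours window are all assumptions that should be checked against the actual train.csv. Note that it counts accidents on single carriageway roads per postcode, which is one possible reading of the second hypothesis.

```python
from collections import defaultdict

# Hypothetical sample rows; real formats and category labels may differ.
accidents = [
    {"postcode": "AL1 1JJ", "Time": "08:45", "Road_Type": "Single carriageway"},
    {"postcode": "AL1 3PS", "Time": "17:30", "Road_Type": "Dual carriageway"},
    {"postcode": "AL1 3PS", "Time": "23:10", "Road_Type": "Single carriageway"},
]

OFFICE_START_HOUR = 9   # assumed start of office hours (inclusive)
OFFICE_END_HOUR = 18    # assumed end of office hours (exclusive)


def office_hours_flag(time_text):
    """Return 1 if an 'HH:MM' time falls inside the assumed office hours."""
    hour = int(time_text.split(":")[0])
    return 1 if OFFICE_START_HOUR <= hour < OFFICE_END_HOUR else 0


def postcode_features(rows):
    """Aggregate accident-level rows into per-postcode feature counts."""
    features = defaultdict(lambda: {
        "accident_count": 0,
        "office_hours_accidents": 0,
        "single_carriageway_accidents": 0,
    })
    for row in rows:
        entry = features[row["postcode"]]
        entry["accident_count"] += 1
        entry["office_hours_accidents"] += office_hours_flag(row["Time"])
        if row["Road_Type"] == "Single carriageway":
            entry["single_carriageway_accidents"] += 1
    return dict(features)


features = postcode_features(accidents)
for postcode, values in features.items():
    print(postcode, values)
```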
Additionally, we are providing you with road network data (information on the nearest road to a postcode and its characteristics) and population data (information about population at area level). These data are intended for feature augmentation; using them is not mandatory.
The provided dataset contains the following files:
train.csv & test.csv:
'Accident_ID', 'Police_Force', 'Number_of_Vehicles', 'Number_of_Casualties', 'Date', 'Day_of_Week', 'Time', 'Local_Authority_(District)', 'Local_Authority_(Highway)', '1st_Road_Class', '1st_Road_Number', 'Road_Type', 'Speed_limit', '2nd_Road_Class', '2nd_Road_Number', 'Pedestrian_Crossing-Human_Control', 'Pedestrian_Crossing-Physical_Facilities', 'Light_Conditions', 'Weather_Conditions', 'Road_Surface_Conditions', 'Special_Conditions_at_Site', 'Carriageway_Hazards', 'Urban_or_Rural_Area', 'Did_Police_Officer_Attend_Scene_of_Accident', 'state', 'postcode', 'country'
population.csv:
'postcode', 'Rural Urban', 'Variable: All usual residents; measures: Value', 'Variable: Males; measures: Value', 'Variable: Females; measures: Value', 'Variable: Lives in a household; measures: Value', 'Variable: Lives in a communal establishment; measures: Value', 'Variable: Schoolchild or full-time student aged 4 and over at their non term-time address; measures: Value', 'Variable: Area (Hectares); measures: Value', 'Variable: Density (number of persons per hectare); measures: Value'
roads_network.csv:
'WKT', 'roadClassi', 'roadFuncti', 'formOfWay', 'length', 'primaryRou', 'distance to the nearest point on rd', 'postcode'
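Since all three files share a 'postcode' column, the optional population and road network data could be attached to postcode-level rows roughly as shown below. The inline CSV snippets are tiny made-up stand-ins for population.csv and roads_network.csv (only the column names come from the file listing above), and the sketch assumes one row per postcode in each file; if a postcode appears more than once, the last row wins.

```python
import csv
import io

RESIDENTS_COLUMN = "Variable: All usual residents; measures: Value"

# Made-up stand-ins for population.csv and roads_network.csv.
POPULATION_CSV = """postcode,Variable: All usual residents; measures: Value
AL1 1JJ,120
AL1 3PS,340
"""

ROADS_CSV = """postcode,formOfWay,length
AL1 1JJ,Single Carriageway,55
AL1 1JP,Dual Carriageway,80
"""


def index_by_postcode(csv_text):
    """Read CSV text and index its rows by the 'postcode' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["postcode"]: row for row in reader}


def augment(postcodes, population, roads):
    """Left-join population and road attributes onto a list of postcodes."""
    augmented = []
    for postcode in postcodes:
        population_row = population.get(postcode, {})
        road_row = roads.get(postcode, {})
        augmented.append({
            "postcode": postcode,
            # Values stay as strings; missing matches become None.
            "residents": population_row.get(RESIDENTS_COLUMN),
            "formOfWay": road_row.get("formOfWay"),
            "road_length": road_row.get("length"),
        })
    return augmented


population = index_by_postcode(POPULATION_CSV)
roads = index_by_postcode(ROADS_CSV)
augmented = augment(["AL1 1JJ", "AL1 1JP", "AL1 3PS"], population, roads)
for row in augmented:
    print(row)
```

Postcodes without a match keep None for the missing attributes, so a later step would need to decide how to impute or drop them.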
Overview
Swiss Re is one of the largest reinsurers in the world, headquartered in Zurich with offices in over 25 countries. Swiss Re’s core expertise is in underwriting in the life, health, and property and casualty insurance space, whereas its tech strategy focuses on developing smarter and innovative solutions for clients’ value chains by leveraging data and technology.
The company’s vision is to make the world more resilient. Swiss Re believes in applying fresh perspectives, knowledge and capital to anticipate and manage risk to create smarter solutions and help the world rebuild, renew and move forward. About 1,300 professionals who work in the Swiss Re Global Business Solutions Center (BSC), Bangalore combine experience, expertise and out-of-the-box thinking to bring Swiss Re's core business to life by creating new business opportunities.