Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The accurate measurement of educational attainment is of great importance for population research. Past studies measuring average years of schooling rely on strong assumptions to incorporate binned data. These assumptions, which we refer to as the standard duration method, have not been previously evaluated for bias or accuracy.
Methods
We assembled a database of 1,680 survey and census datasets, representing both binned and single-year education data. We developed two models that split bins of education into single-year values. We evaluated our models, and compared them to the standard duration method, using out-of-sample predictive validity.
Results
Our results indicate that typical methods used to split bins of educational attainment introduce substantial error and bias into estimates of average years of schooling, as compared to new approaches. Globally, the standard duration method underestimates average years of schooling, with a median error of -0.47 years. This effect is especially pronounced in datasets with a smaller number of bins or higher true average attainment, leading to irregular error patterns between geographies and time periods. Both models we developed resulted in unbiased predictions of average years of schooling, with smaller average error than previous methods. We find that one approach, which uses a metric of distance in space and time to identify training data, had the best performance, with a root mean squared error of mean attainment of 0.26 years, compared to 0.92 years for the standard duration algorithm.
Conclusions
Education is a key social indicator and its accurate estimation should be a population research priority. The use of a space-time distance bin-splitting model drastically improved the estimation of average years of schooling from binned education data. We provide a detailed description of how to use the method and recommend that future studies estimating educational attainment across time or geographies use a similar approach.
What does the data show?
The dataset is derived from projections of seasonal mean wind speeds from UKCP18 which are averaged to produce values for the 1981-2000 baseline and two warming levels: 2.0°C and 4.0°C above the pre-industrial (1850-1900) period. All wind speeds have units of metres per second (m / s). These data enable users to compare future seasonal mean wind speeds to those of the baseline period.
What is a warming level and why are they used?
The wind speeds were calculated from the UKCP18 local climate projections which used a high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g., decades) for this scenario, the dataset is calculated at two levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), so this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 2°C and 4°C in line with recommendations in the third UK Climate Risk Assessment. The data at each warming level were calculated using 20 year periods over which the average warming was equal to 2°C and 4°C. The exact time period will be different for different model ensemble members. To calculate the seasonal mean wind speeds, an average is taken across the 20 year period. Therefore, the seasonal wind speeds represent those for a given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world in the future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected under current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate; the warming level reached will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
The columns (fields) correspond to each global warming level and two baselines. They are named 'windspeed' (Wind Speed), the season, warming level or baseline, and ‘upper’ ‘median’ or ‘lower’ as per the description below. For example, ‘windspeed winter 2.0 median’ is the median winter wind speed for the 2°C projection. Decimal points are included in field aliases but not field names; e.g., ‘windspeed winter 2.0 median’ is ‘ws_winter_20_median’.
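As an illustration of this naming convention, the following minimal Python sketch builds a field name from a field alias. The parsing rules are inferred from the single example above and are an assumption, not a published schema for the portal.

```python
def alias_to_field_name(alias: str) -> str:
    """Convert a field alias such as 'windspeed winter 2.0 median'
    to the corresponding field name, e.g. 'ws_winter_20_median'.
    Assumes the alias pattern described above: variable, season,
    warming level or baseline, and statistic ('lower'/'median'/'upper')."""
    # 'windspeed' appears abbreviated to 'ws' in field names (assumed from the example)
    parts = alias.replace("windspeed", "ws").split()
    # Decimal points appear in aliases but not in field names (e.g. '2.0' -> '20')
    parts = [p.replace(".", "") for p in parts]
    return "_".join(parts)

print(alias_to_field_name("windspeed winter 2.0 median"))  # ws_winter_20_median
```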
To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, seasonal mean wind speeds were calculated for each ensemble member and then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty.
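The selection of the 'lower', 'median' and 'upper' values described above can be sketched in Python as follows. The ensemble values are illustrative only, and taking the median of an even-sized ensemble as the average of the two central members is an assumption about the exact convention used.

```python
import numpy as np

# Illustrative seasonal mean wind speeds (m/s) for one grid cell,
# one value per ensemble member (12 members in UKCP18).
ensemble = np.array([5.1, 5.4, 4.9, 5.8, 5.2, 5.6, 5.0, 5.3, 5.7, 5.5, 4.8, 5.9])

ranked = np.sort(ensemble)    # rank members from lowest to highest
lower = ranked[1]             # 'lower'  = second lowest ranked member
upper = ranked[-2]            # 'upper'  = second highest ranked member
median = np.median(ranked)    # 'median' = central value of the ensemble

print(lower, median, upper)
```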
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Data source
The seasonal mean wind speeds were calculated from daily values of wind speeds generated from the UKCP Local climate projections; they are one of the standard UKCP18 products. These projections were created with a 2.2km convection-permitting climate model. To aid comparison with other models and UK-based datasets, the UKCP Local model data were aggregated to a 5km grid on the British National grid; the 5km data were processed to generate the seasonal mean wind speeds.
Useful links
Further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.13°C.]
What does the data show?
This dataset shows the change in annual temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Note, as the values in this dataset are averaged over a year they do not represent possible extreme conditions.
The dataset uses projections of daily average air temperature from UKCP18 which are averaged to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value. This enables users to compare annual average temperature trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.
Period
Description
1981-2000 baseline
Average temperature (°C) for the period
2001-2020 (recent past)
Average temperature (°C) for the period
2001-2020 (recent past) change
Temperature change (°C) relative to 1981-2000
1.5°C global warming level change
Temperature change (°C) relative to 1981-2000
2°C global warming level change
Temperature change (°C) relative to 1981-2000
2.5°C global warming level change
Temperature change (°C) relative to 1981-2000
3°C global warming level change
Temperature change (°C) relative to 1981-2000
4°C global warming level change
Temperature change (°C) relative to 1981-2000
What is a global warming level?
The Annual Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Average Temperature Change, an average is taken across the 21 year period.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for the 1981-2000 baseline, the 2001-2020 period and each warming level. They are named 'tas annual change' (change in air 'temperature at surface'), the warming level or historic time period, and 'upper', 'median' or 'lower' as per the description below. E.g. 'tas annual change 2.0 median' is the median value for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas annual change 2.0 median' is named 'tas_annual_change_20_median'.
To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tas annual change 2.0°C median’ values.
What do the 'median', 'upper', and 'lower' values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups.
Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Average Temperature Change was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member. The ‘higher’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and higher fields, the greater the uncertainty.
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
For further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005–2023 for 11 Local Government Districts in Northern Ireland. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year which could be included in the regression model, i.e. the following sales are excluded:
• Non arms-length sales
• Sales of properties where the habitable space is less than 30m² or greater than 1000m²
• Sales less than £20,000
Annual median or simple mean prices should not be used to calculate the property price change over time. The quality (where quality refers to the combination of all characteristics of a residential property, both physical and locational) of the properties that are sold may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, therefore producing a biased estimate of average price. The median and simple mean prices are not ‘standardised’ and so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. In order to calculate the pure property price change over time it is necessary to compare like with like, and this can only be achieved if the ‘characteristics-mix’ of properties traded is standardised. To calculate pure property price change over time please use the standardised prices in the NI House Price Index Detailed Statistics file.
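A minimal pandas sketch of how such descriptive statistics could be reproduced from a table of verified sales is shown below. The column names (lgd, year, price, habitable_m2, arms_length) and the input file are hypothetical, the exclusions follow the list above, and the regression-model eligibility detail is ignored for simplicity.

```python
import pandas as pd

# One row per verified sale, with hypothetical column names:
# lgd (Local Government District), year, price (£), habitable_m2, arms_length (bool)
sales = pd.read_csv("ni_sales.csv")  # hypothetical input file

# Apply the exclusions described above
eligible = sales[
    sales["arms_length"]
    & sales["habitable_m2"].between(30, 1000)
    & (sales["price"] >= 20_000)
]

stats = (
    eligible.groupby(["lgd", "year"])["price"]
    .agg(
        minimum="min",
        lower_quartile=lambda s: s.quantile(0.25),
        median="median",
        simple_mean="mean",
        upper_quartile=lambda s: s.quantile(0.75),
        maximum="max",
        n_sales="count",
    )
    .reset_index()
)

# Statistics are only published where at least 30 eligible sales were recorded
stats = stats[stats["n_sales"] >= 30]
```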
[Update 28/03/24 - This description previously stated that the field “2001-2020 (recent past) change” was a percentage change. This field is actually the difference, in units of mm/day. The table below has been updated to reflect this.]
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell but for the fixed periods which are expressed in mm, the average difference between the 'lower' values before and after this update is 0.04mm. For the fixed periods and global warming levels which are expressed as percentage changes, the average difference between the 'lower' values before and after this update is 4.65%.]
What does the data show?
This dataset shows the change in summer precipitation rate for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Here, summer is defined as June-July-August. Note, as the values in this dataset are averaged over a season they do not represent possible extreme conditions.
The dataset uses projections of daily precipitation from UKCP18 which are averaged over the summer period to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a percentage change (%) relative to the 1981-2000 value. This enables users to compare summer precipitation trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.
Period
Description
1981-2000 baseline
Average value for the period (mm/day)
2001-2020 (recent past)
Average value for the period (mm/day)
2001-2020 (recent past) change
Change (mm/day) relative to 1981-2000
1.5°C global warming level change
Percentage change (%) relative to 1981-2000
2°C global warming level change
Percentage change (%) relative to 1981-2000
2.5°C global warming level change
Percentage change (%) relative to 1981-2000
3°C global warming level change
Percentage change (%) relative to 1981-2000
4°C global warming level change
Percentage change (%) relative to 1981-2000
What is a global warming level?
The Summer Precipitation Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Summer Precipitation Change, an average is taken across the 21 year period.
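The window selection described above can be sketched as follows, assuming an annual series of global mean warming relative to 1850-1900 is available for one ensemble member; the numbers used here are illustrative only.

```python
import numpy as np

def warming_level_window(years, warming, level, half_width=10):
    """Return the 21-year window (10 years either side of the first year
    at which `level` °C of warming relative to 1850-1900 is reached)
    for one ensemble member."""
    first = next((y for y, w in zip(years, warming) if w >= level), None)
    if first is None:
        raise ValueError(f"{level} °C is not reached in this series")
    return first - half_width, first + half_width

# Illustrative: warming increasing by 0.03 °C per year from 1.0 °C in 2000
years = np.arange(2000, 2101)
warming = 1.0 + 0.03 * (years - 2000)

start, end = warming_level_window(years, warming, level=2.0)
print(start, end)  # a 21-year period centred on the first year reaching 2.0 °C
```

The Summer Precipitation Change for that ensemble member would then be averaged over this 21 year window.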
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
These data contain a field for each warming level and the 1981-2000 baseline. They are named 'pr summer change', the warming level or baseline, and 'upper' 'median' or 'lower' as per the description below. e.g. 'pr summer change 2.0 median' is the median value for summer for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'pr summer change 2.0 median' is named 'pr_summer_change_20_median'.
To understand how to explore the data, refer to the New Users ESRI Storymap.
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘pr summer change 2.0°C median’ values.
What do the 'median', 'upper', and 'lower' values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Summer Precipitation Change was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member.
The ‘higher’ fields are the second highest ranked ensemble member.
The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and higher fields, the greater the uncertainty.
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
For further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
[Update 28/03/24 - This description previously stated that the field “2001-2020 (recent past) change” was a percentage change. This field is actually the difference, in units of mm/day. The table below has been updated to reflect this.]
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell but for the fixed periods which are expressed in mm, the average difference between the 'lower' values before and after this update is 0.04mm. For the fixed periods and global warming levels which are expressed as percentage changes, the average difference between the 'lower' values before and after this update is 3.2%.]
What does the data show?
This dataset shows the change in winter precipitation rate for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Here, winter is defined as December-January-February. Note, as the values in this dataset are averaged over a season they do not represent possible extreme conditions.
The dataset uses projections of daily precipitation from UKCP18 which are averaged over the winter period to give values for the 1981-2000 baseline, the recent past (2001-2020) and global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a percentage change (%) relative to the 1981-2000 value. This enables users to compare winter precipitation trends for the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.
Period
Description
1981-2000 baseline
Average value for the period (mm/day)
2001-2020 (recent past)
Average value for the period (mm/day)
2001-2020 (recent past) change
Change (mm/day) relative to 1981-2000
1.5°C global warming level change
Percentage change (%) relative to 1981-2000
2°C global warming level change
Percentage change (%) relative to 1981-2000
2.5°C global warming level change
Percentage change (%) relative to 1981-2000
3°C global warming level change
Percentage change (%) relative to 1981-2000
4°C global warming level change
Percentage change (%) relative to 1981-2000
What is a global warming level?
The Winter Precipitation Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Winter Precipitation Change, an average is taken across the 21 year period.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
These data contain a field for each warming level and the 1981-2000 baseline. They are named 'pr winter change', the warming level or baseline, and 'upper', 'median' or 'lower' as per the description below. E.g. 'pr winter change 2.0 median' is the median value for winter for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'pr winter change 2.0 median' is named 'pr_winter_change_20_median'.
To understand how to explore the data, refer to the New Users ESRI Storymap.
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘pr winter change 2.0°C median’ values.
What do the 'median', 'upper', and 'lower' values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Winter Precipitation Change was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member.
The ‘higher’ fields are the second highest ranked ensemble member.
The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and higher fields, the greater the uncertainty.
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
For further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.0.]
What does the data show?
The Annual Count of Extreme Summer Days is the number of days per year where the maximum daily temperature is above 35°C. It measures how many times the threshold is exceeded (not by how much) in a year. Note, the term ‘extreme summer days’ is used to refer to the threshold and temperatures above 35°C outside the summer months also contribute to the annual count. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded as there will be many factors such as natural variability and local scale processes that the climate model is unable to represent.
The Annual Count of Extreme Summer Days is calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of extreme summer days to previous values.
What are the possible societal impacts?
The Annual Count of Extreme Summer Days indicates increased health risks, transport disruption and damage to infrastructure from high temperatures. It is based on exceeding a maximum daily temperature of 35°C. Impacts include:
Increased heat-related illnesses, hospital admissions or death, affecting not just the vulnerable.
Transport disruption due to overheating of road and railway infrastructure.
Other metrics such as the Annual Count of Summer Days (days above 25°C), Annual Count of Hot Summer Days (days above 30°C) and the Annual Count of Tropical Nights (where the minimum temperature does not fall below 20°C) also indicate impacts from high temperatures, however they use different temperature thresholds.
What is a global warming level?
The Annual Count of Extreme Summer Days is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Extreme Summer Days, an average is taken across the 21 year period. Therefore, the Annual Count of Extreme Summer Days shows the number of extreme summer days that could occur each year, for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘ESD’ (where ESD means Extreme Summer Days), the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘Extreme Summer Days 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names, e.g. ‘Extreme Summer Days 2.5 median’ is ‘ExtremeSummerDays_25_median’.
To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘ESD 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Extreme Summer Days was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty.
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report.
Further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
What does the data show?
Wind-driven rain refers to falling rain blown by a horizontal wind so that it falls diagonally towards the ground and can strike a wall. The annual index of wind-driven rain is the sum of all wind-driven rain spells for a given wall orientation and time period. It’s measured as the volume of rain blown from a given direction in the absence of any obstructions, with the unit litres per square metre per year.
Wind-driven rain is calculated from hourly weather and climate data using an industry-standard formula from ISO 15927–3:2009, which is based on the product of wind speed and rainfall totals. Wind-driven rain is only calculated if the wind would strike a given wall orientation. A wind-driven rain spell is defined as a wet period separated by at least 96 hours with little or no rain (below a threshold of 0.001 litres per m2 per hour).
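The calculation described above can be sketched roughly as follows. This is a simplified illustration of an ISO 15927-3-style index based on the product of hourly wind speed and rainfall, accumulated only when the wind has a component onto the wall; the 2/9 scaling factor and 8/9 exponent are assumptions quoted from the commonly cited form of the airfield index, the spell separation is not modelled, and the standard itself should be consulted for the exact formulation.

```python
import numpy as np

def wind_driven_rain_index(wind_speed, wind_dir, rainfall, wall_orientation):
    """Rough sketch of an ISO 15927-3-style wind-driven rain index
    (litres per m² over the period) for one wall orientation.

    wind_speed: hourly mean wind speed (m/s)
    wind_dir: hourly wind direction (degrees the wind blows from)
    rainfall: hourly rainfall totals (mm)
    wall_orientation: compass direction the wall faces (degrees)
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    rainfall = np.asarray(rainfall, dtype=float)
    # Component of the wind onto the wall; only hours where rain is driven
    # against the wall (positive component) contribute.
    onto_wall = np.cos(np.radians(np.asarray(wind_dir) - wall_orientation))
    onto_wall = np.clip(onto_wall, 0.0, None)
    # Illustrative coefficients: treat 2/9 and the 8/9 exponent as assumptions.
    return (2.0 / 9.0) * np.sum(wind_speed * rainfall ** (8.0 / 9.0) * onto_wall)
```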
The annual index of wind-driven rain is calculated for a baseline (historical) period of 1981-2000 (corresponding to 0.61°C warming) and for global warming levels of 2.0°C and 4.0°C above the pre-industrial period (defined as 1850-1900). The warming between the pre-industrial period and baseline is the average value from six datasets of global mean temperatures available on the Met Office Climate Dashboard: https://climate.metoffice.cloud/dashboard.html. Users can compare the magnitudes of future wind-driven rain with the baseline values.
What is a warming level and why are they used?
The annual index of wind-driven rain is calculated from the UKCP18 local climate projections which used a high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g., decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), so this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 2°C and 4°C in line with recommendations in the third UK Climate Risk Assessment. The data at each warming level were calculated using 20 year periods over which the average warming was equal to 2°C and 4°C. The exact time period will be different for different model ensemble members. To calculate the value for the annual wind-driven rain index, an average is taken across the 20 year period. Therefore, the annual wind-driven rain index provides an estimate of the total wind-driven rain that could occur in each year, for a given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world in the future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected under current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate; the warming level reached will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
Each row in the data corresponds to one of eight wall orientations – 0, 45, 90, 135, 180, 225, 270, 315 compass degrees. This can be viewed and filtered by the field ‘Wall orientation’.
The columns (fields) correspond to each global warming level and two baselines. They are named 'WDR' (Wind-Driven Rain), the warming level or baseline, and ‘upper’ ‘median’ or ‘lower’ as per the description below. For example, ‘WDR 2.0 median’ is the median value for the 2°C projection. Decimal points are included in field aliases but not field names; e.g., ‘WDR 2.0 median’ is ‘WDR_20_median’.
Please note that this data MUST be filtered with the ‘Wall orientation’ field before styling it by warming level. Otherwise it will not show the data you expect to see on the map. This is because there are several overlapping polygons at each location, for each different wall orientation.
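For example, if the layer has been exported to a table, filtering to a single wall orientation before styling or summarising a field might look like the minimal pandas sketch below; the file name is hypothetical, while the field names follow the conventions above.

```python
import pandas as pd

wdr = pd.read_csv("wind_driven_rain.csv")  # hypothetical export of the layer

# Keep only south-facing walls (180 compass degrees) before styling the map
south_facing = wdr[wdr["Wall orientation"] == 180]

# Then inspect or style a single field, e.g. the 2°C median projection
print(south_facing["WDR_20_median"].describe())
```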
To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, annual wind-driven rain indices were calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty.
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Data source
The annual wind-driven rain index was calculated from hourly values of rainfall, wind speed and wind direction generated from the UKCP Local climate projections. These projections were created with a 2.2km convection-permitting climate model. To aid comparison with other models and UK-based datasets, the UKCP Local model data were aggregated to a 5km grid on the British National Grid; the 5 km data were processed to generate the wind-driven rain data.
Useful links
Further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average percentage change between the 'lower' values before and after this update is -1%.]
What does the data show?
A Growing Degree Day (GDD) is counted on any day when the average temperature is above 5.5°C; the number of degrees above this threshold is the number of Growing Degree Days that day contributes. For example, if the average temperature for a specific day is 6°C, this would contribute 0.5 Growing Degree Days to the annual sum; an average temperature of 10.5°C would contribute 5 Growing Degree Days. Given the data shows the annual sum of Growing Degree Days, this value can be above 365 in some parts of the UK.
Annual Growing Degree Days are calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of GDD to previous values.
What are the possible societal impacts?
Annual Growing Degree Days indicate if conditions are suitable for plant growth. An increase in GDD can indicate larger crop yields due to increased crop growth from warm temperatures, but crop growth also depends on other factors. For example, GDD do not include any measure of rainfall/drought, sunlight, day length or wind, species vulnerability, or plant dieback in extremely high temperatures. GDD can indicate increased crop growth until temperatures reach a critical level above which there are detrimental impacts on plant physiology. GDD does not estimate the growth of specific species and is not a measure of season length.
What is a global warming level?
Annual Growing Degree Days are calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), whilst this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level was calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Growing Degree Days, an average is taken across the 21 year period. Therefore, the Annual Growing Degree Days show the number of growing degree days that could occur each year, for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could either be higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named 'GDD' (Growing Degree Days), the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘GDD 2.5 median’ is the median value for the 2.5°C projection. Decimal points are included in field aliases but not field names, e.g. ‘GDD 2.5 median’ is ‘GDD_25_median’.
To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘GDD 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future.
For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, Annual Growing Degree Days were calculated for each ensemble member and they were then ranked in order from lowest to highest for each location.
The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble.
This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty.
‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report.
Further information on the UK Climate Projections (UKCP).
Further information on understanding climate data within the Met Office Climate Data Portal.
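As an illustration of the Growing Degree Day sum described at the start of this dataset description, the following minimal sketch computes the annual total from a series of daily average temperatures; the input values are illustrative only.

```python
import numpy as np

def annual_growing_degree_days(daily_mean_temp_c, base=5.5):
    """Annual sum of Growing Degree Days: each day contributes the number of
    degrees by which its average temperature exceeds the 5.5 °C threshold."""
    t = np.asarray(daily_mean_temp_c, dtype=float)
    return np.sum(np.maximum(t - base, 0.0))

# A day at 6 °C contributes 0.5 GDD; a day at 10.5 °C contributes 5 GDD
print(annual_growing_degree_days([6.0, 10.5, 4.0]))  # 5.5
```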
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This is the unadjusted median house price for residential property sales (transactions) in the area for a 12 month period with April in the middle (year-ending September). These figures have been produced by the ONS (Office for National Statistics) using the Land Registry (LR) Price Paid data on residential dwelling transactions.
The LR Price Paid data are comprehensive in that they capture changes of ownership for individual residential properties which have sold for full market value and covers both cash sales and those involving a mortgage.
The median is the value determined by putting all the house sales for a given year, area and type in order of price and then selecting the price of the house sale which falls in the middle. The median is less susceptible to distortion by the presence of extreme values than is the mean. It is the most appropriate average to use because it best takes account of the skewed distribution of house prices.
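As a small illustration of why the median is preferred here, the sketch below compares the median and mean for a skewed set of sale prices; the prices are invented for illustration.

```python
from statistics import mean, median

# Made-up sale prices for one area, year and property type
# (skewed by one very expensive sale)
prices = [95_000, 110_000, 120_000, 125_000, 135_000, 150_000, 900_000]

print(median(prices))  # 125000: the middle value when prices are ordered
print(mean(prices))    # ~233571: pulled upwards by the extreme value
```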
Note that a transaction occurs when a change of freeholder or leaseholder takes place regardless of the amount of money involved and a property can transact more than once in the time period.
The LR records the actual price for which the property changed hands. This will usually be an accurate reflection of the market value for the individual property, but it is not always the case. In order to generate statistics that more accurately reflect market values, the LR has excluded records of houses that were not sold at market value from the dataset. The remaining data are considered a good reflection of market values at the time of the transaction. For full details of exclusions and more information on the methodology used to produce these statistics please see http://www.ons.gov.uk/peoplepopulationandcommunity/housing/qmis/housepricestatisticsforsmallareasqmi
The LR Price Paid data are not adjusted to reflect the mix of houses in a given area. Fluctuations in the types of house that are sold in that area can cause differences between the median transactional value of houses and the overall market value of houses. Therefore these statistics differ to the new UK House Price Index (HPI) which reports mix-adjusted average house prices and house price indices.
If, for a given year, house type and area, there were fewer than 5 sales records in the LR Price Paid data, the house price statistics are not reported. Data is powered by LG Inform Plus and automatically checked for new data on the 3rd of each month.
https://datafinder.stats.govt.nz/license/attribution-4-0-international/
Dataset contains counts and measures for individuals from the 2013, 2018, and 2023 Censuses. Data is available by statistical area 1.
The variables included in this dataset are for the census usually resident population count (unless otherwise stated). All data is for level 1 of the classification.
The variables for part 2 of the dataset are:
Download lookup file for part 2 from Stats NZ ArcGIS Online or embedded attachment in Stats NZ geographic data service. Download data table (excluding the geometry column for CSV files) using the instructions in the Koordinates help guide.
Footnotes
Te Whata
Under the Mana Ōrite Relationship Agreement, Te Kāhui Raraunga (TKR) will be publishing Māori descent and iwi affiliation data from the 2023 Census in partnership with Stats NZ. This will be available on Te Whata, a TKR platform.
Geographical boundaries
Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.
Subnational census usually resident population
The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city.
Population counts
Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts.
Caution using time series
Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data), while the 2013 Census used a full-field enumeration methodology (with no use of administrative data).
Study participation time series
In the 2013 Census, study participation was only collected for the census usually resident population count aged 15 years and over.
About the 2023 Census dataset
For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.
Data quality
The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.
Concept descriptions and quality ratings
Data quality ratings for 2023 Census variables has additional details about variables found within totals by topic, for example, definitions and data quality.
Disability indicator
This data should not be used as an official measure of disability prevalence. Disability prevalence estimates are only available from the 2023 Household Disability Survey. Household Disability Survey 2023: Final content has more information about the survey.
Activity limitations are measured using the Washington Group Short Set (WGSS). The WGSS asks about six basic activities that a person might have difficulty with: seeing, hearing, walking or climbing stairs, remembering or concentrating, washing all over or dressing, and communicating. A person was classified as disabled in the 2023 Census if there was at least one of these activities that they had a lot of difficulty with or could not do at all.
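A minimal sketch of the classification rule described above is shown below; the answer labels used here are hypothetical placeholders, not the official census response categories.

```python
WGSS_ACTIVITIES = [
    "seeing", "hearing", "walking_or_climbing_stairs",
    "remembering_or_concentrating", "washing_or_dressing", "communicating",
]

def classified_as_disabled(answers: dict) -> bool:
    """Return True if the respondent reported at least one activity they had
    a lot of difficulty with or could not do at all. Hypothetical answer
    labels: 'no_difficulty', 'some_difficulty', 'a_lot_of_difficulty',
    'cannot_do_at_all'."""
    return any(
        answers.get(activity) in {"a_lot_of_difficulty", "cannot_do_at_all"}
        for activity in WGSS_ACTIVITIES
    )

print(classified_as_disabled({"seeing": "some_difficulty", "hearing": "cannot_do_at_all"}))  # True
```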
Using data for good
Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that "data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.
Confidentiality
The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.
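The sketch below illustrates random rounding to base 3 with suppression of small counts, in the spirit of the rules described above; it is a generic illustration with an assumed 2/3–1/3 rounding probability, not Stats NZ's exact fixed (seeded) implementation.

```python
import random

def random_round_base3(count: int, rng: random.Random) -> int:
    """Randomly round a count to a multiple of 3: counts already divisible by
    3 are unchanged; otherwise round to the nearer multiple with probability
    2/3 and to the farther multiple with probability 1/3 (illustrative)."""
    remainder = count % 3
    if remainder == 0:
        return count
    down, up = count - remainder, count + (3 - remainder)
    nearer, farther = (down, up) if remainder == 1 else (up, down)
    return nearer if rng.random() < 2 / 3 else farther

def publishable(count: int, sensitive: bool, rng: random.Random):
    """Suppress 'sensitive' counts below six, otherwise return the rounded count."""
    if sensitive and count < 6:
        return None  # suppressed
    return random_round_base3(count, rng)

rng = random.Random(2023)
print([publishable(c, sensitive=True, rng=rng) for c in [4, 7, 9, 11]])
```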
Measures
Measures like averages, medians, and other quantiles are calculated from unrounded counts, with input noise added to or subtracted from each contributing value.
https://spdx.org/licenses/CC0-1.0.html
Effective population size (Ne) is a particularly useful metric for conservation as it affects genetic drift, inbreeding and adaptive potential within populations. Current guidelines recommend a minimum Ne of 50 and 500 to avoid short-term inbreeding and to preserve long-term adaptive potential, respectively. However, the extent to which wild populations reach these thresholds globally has not been investigated, nor has the relationship between Ne and human activities. Through a quantitative review, we generated a dataset with 4610 georeferenced Ne estimates from 3829 unique populations, extracted from 723 articles. These data show that certain taxonomic groups are less likely to meet 50/500 thresholds and are disproportionately impacted by human activities; plant, mammal, and amphibian populations had a <54% probability of reaching Ne = 50 and a <9% probability of reaching Ne = 500. Populations listed as being of conservation concern according to the IUCN Red List had a smaller median Ne than unlisted populations, and this was consistent across all taxonomic groups. Ne was reduced in areas with a greater Global Human Footprint, especially for amphibians, birds, and mammals, however relationships varied between taxa. We also highlight several considerations for future works, including the role that gene flow and subpopulation structure plays in the estimation of Ne in wild populations, and the need for finer-scale taxonomic analyses. Our findings provide guidance for more specific thresholds based on Ne and help prioritize assessment of populations from taxa most at risk of failing to meet conservation thresholds.
Methods
Literature search, screening, and data extraction
A primary literature search was conducted using ISI Web of Science Core Collection and any articles that referenced two popular single-sample Ne estimation software packages: LDNe (Waples & Do, 2008) and NeEstimator v2 (Do et al., 2014). The initial search included 4513 articles published up to the search date of May 26, 2020. Articles were screened for relevance in two steps, first based on title and abstract, and then based on the full text. For each step, a consistency check was performed using 100 articles to ensure they were screened consistently between reviewers (n = 6). We required a kappa score (Collaboration for Environmental Evidence, 2020) of ≥ 0.6 in order to proceed with screening of the remaining articles. Articles were screened based on three criteria: (1) Is an estimate of Ne or Nb reported; (2) for a wild animal or plant population; (3) using a single-sample genetic estimation method. Further details on the literature search and article screening are found in the Supplementary Material (Fig. S1).
We extracted data from all studies retained after both screening steps (title and abstract; full text). Each line of data entered in the database represents a single estimate from a population. Some populations had multiple estimates over several years, or from different estimation methods (see Table S1), and each of these was entered on a unique row in the database. Data on N̂e, N̂b, or N̂c were extracted from tables and figures using WebPlotDigitizer software version 4.3 (Rohatgi, 2020). A full list of data extracted is found in Table S2.
Data Filtering
After the initial data collation, correction, and organization, there was a total of 8971 Ne estimates (Fig. S1).
We used regression analyses to compare Ne estimates on the same populations, using different estimation methods (LD, Sibship, and Bayesian), and found that the R2 values were very low (R2 values of <0.1; Fig. S2 and Fig. S3). Given this inconsistency, and the fact that LD is the most frequently used method in the literature (74% of our database), we proceeded with only using the LD estimates for our analyses. We further filtered the data to remove estimates where no sample size was reported or no bias correction (Waples, 2006) was applied (see Fig. S6 for more details). Ne is sometimes estimated to be infinity or negative within a population, which may reflect that a population is very large (i.e., where the drift signal-to-noise ratio is very low), and/or that there is low precision with the data due to small sample size or limited genetic marker resolution (Gilbert & Whitlock, 2015; Waples & Do, 2008; Waples & Do, 2010). We retained infinite and negative estimates only if they reported a positive lower confidence interval (LCI), and we used the LCI in place of a point estimate of Ne or Nb. We chose to use the LCI as a conservative proxy for Ne in cases where a point estimate could not be generated, given its relevance for conservation (Fraser et al., 2007; Hare et al., 2011; Waples & Do, 2008; Waples, 2023). We also compared results using the LCI to a dataset where infinite or negative values were all assumed to reflect very large populations and replaced the estimate with an arbitrary large value of 9,999 (for reference, in the LCI dataset only 51 estimates, or 0.9%, had an N̂e or N̂b > 9,999). Using this 9999 dataset, we found that the main conclusions from the analyses remained the same as when using the LCI dataset, with the exception of the HFI analysis (see discussion in supplementary material; Table S3, Table S4, Fig. S4, S5). We also note that point estimates with an upper confidence interval of infinity (n = 1358) were larger on average (mean = 1380.82, compared to 689.44 and 571.64, for estimates with no CIs or with an upper boundary, respectively). Nevertheless, we chose to retain point estimates with an upper confidence interval of infinity because accounting for them in the analyses did not alter the main conclusions of our study and would have significantly decreased our sample size (Fig. S7, Table S5). We also retained estimates from populations that were reintroduced or translocated from a wild source (n = 309), whereas those from captive sources were excluded during article screening (see above). In exploratory analyses, the removal of these data did not influence our results, and many of these populations are relevant to real-world conservation efforts, as reintroductions and translocations are used to re-establish or support small, at-risk populations. We removed estimates based on duplication of markers (keeping estimates generated from SNPs when studies used both SNPs and microsatellites), and duplication of software (keeping estimates from NeEstimator v2 when studies used it alongside LDNe). Spatial and temporal replication were addressed with two separate datasets (see Table S6 for more information): the full dataset included spatially and temporally replicated samples, while these two types of replication were removed from the non-replicated dataset. Finally, for all populations included in our final datasets, we manually extracted their protection status according to the IUCN Red List of Threatened Species.
Taxa were categorized as “Threatened” (Vulnerable, Endangered, Critically Endangered), “Nonthreatened” (Least Concern, Near Threatened), or “N/A” (Data Deficient, Not Evaluated).
Mapping and Human Footprint Index (HFI)
All populations were mapped in QGIS using the coordinates extracted from articles. The maps were created using a World Behrmann equal area projection. For the summary maps, estimates were grouped into grid cells with an area of 250,000 km2 (roughly 500 km x 500 km, but the dimensions of each cell vary due to distortions from the projection). Within each cell, we generated the count and median of Ne. We used the Global Human Footprint dataset (WCS & CIESIN, 2005) to generate a value of human influence (HFI) for each population at its geographic coordinates. The footprint ranges from zero (no human influence) to 100 (maximum human influence). Values were available in 1 km x 1 km grid cell size and were projected over the point estimates to assign a value of human footprint to each population. The human footprint values were extracted from the map into a spreadsheet to be used for statistical analyses. Not all geographic coordinates had a human footprint value associated with them (i.e., in the oceans and other large bodies of water), therefore marine fishes were not included in our HFI analysis. Overall, 3610 Ne estimates in our final dataset had an associated footprint value.
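A minimal sketch of this kind of point extraction, assuming the Human Footprint layer is available as a GeoTIFF and that the coordinates are already expressed in the raster's coordinate reference system; the file name and example coordinates are placeholders, not the study's actual inputs.

import rasterio

coords = [(-122.42, 37.77), (151.21, -33.87)]  # example population coordinates (x, y)
with rasterio.open("human_footprint_1km.tif") as src:
    # src.sample() returns the raster value at each point; band 1 holds the footprint score.
    hfi_values = [float(v[0]) for v in src.sample(coords)]
print(hfi_values)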
VITAL SIGNS INDICATOR
Rent Payments (EC8)
FULL MEASURE NAME
Median rent payment
LAST UPDATED
January 2023
DESCRIPTION
Rent payments refer to the cost of leasing an apartment or home and serve as a measure of housing costs for individuals who do not own a home. The data reflect the median monthly rent paid by Bay Area households across apartments and homes of various sizes and various levels of quality. This differs from advertised rents for available apartments, which are usually higher. Note that rent can be presented using nominal or real (inflation-adjusted) dollar values; data are presented inflation-adjusted to reflect changes in household purchasing power over time.
DATA SOURCE
U.S. Census Bureau: Decennial Census - https://nhgis.org
Count 2 (1970)
Form STF1 (1980-1990)
Form SF3a (2000)
U.S. Census Bureau: American Community Survey - https://data.census.gov/
Form B25058 (2005-2021; median contract rent)
Bureau of Labor Statistics: Consumer Price Index - https://www.bls.gov/data/
1970-2021
CONTACT INFORMATION
vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator)
Rent data reflects median rent payments rather than list rents (refer to measure definition above). American Community Survey 1-year data is used for larger geographies – Bay counties and most metropolitan area counties – while smaller geographies rely upon 5-year rolling average data due to their smaller sample sizes. Note that 2020 data uses the 5-year estimates because the ACS did not collect 1-year data for 2020.
1970 Census data for median rent payments have been imputed from quintiles using methodology from the California Department of Finance, as the source data only provided the mean, rather than the median, monthly rent. Metro area boundaries reflect today’s metro area definitions by county for consistency, rather than historical metro area boundaries.
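One common way to approximate a median from binned (e.g., quintile) counts is linear interpolation within the bin that contains the 50th percentile. The sketch below illustrates that general idea only; it is not the Department of Finance's exact procedure, and the bin edges and counts are made up.

def median_from_bins(edges, counts):
    # Approximate the median by linear interpolation within the bin holding the 50th percentile.
    total = sum(counts)
    half = total / 2.0
    cum = 0
    for (lo, hi), n in zip(zip(edges[:-1], edges[1:]), counts):
        if cum + n >= half and n > 0:
            return lo + (half - cum) / n * (hi - lo)
        cum += n
    return float("nan")

# Illustrative 1970-style monthly rent bins (dollars) and household counts -- not real data.
print(median_from_bins([0, 60, 80, 100, 120, 150], [500, 800, 900, 600, 200]))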
Inflation-adjusted data are presented to illustrate how rent payments have grown relative to overall price increases; that said, the use of the Consumer Price Index (CPI) does create some challenges because housing represents a major component of the consumer goods bundle used to calculate the CPI. This reflects a methodological tradeoff between precision and accuracy and is a common concern when working with any commodity that is a major component of the CPI itself.
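A minimal sketch of the inflation adjustment, assuming annual CPI values and a reference year; the CPI figures below are rough placeholders, not official BLS series values.

# Convert nominal rents to real (inflation-adjusted) dollars using annual CPI values.
cpi = {1970: 38.8, 2000: 172.2, 2021: 271.0}   # placeholder CPI values
reference_year = 2021

def to_real(nominal, year):
    # Real value expressed in reference-year dollars.
    return nominal * cpi[reference_year] / cpi[year]

print(round(to_real(150, 1970)))   # a 1970 rent of $150 expressed in 2021 dollars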
VITAL SIGNS INDICATOR List Rents (EC9)
FULL MEASURE NAME List Rents
LAST UPDATED October 2016
DESCRIPTION List rent refers to the advertised rents for available rental housing and serves as a measure of housing costs for new households moving into a neighborhood, city, county or region.
DATA SOURCE real Answers (1994 – 2015) no link
Zillow Metro Median Listing Price All Homes (2010-2016) http://www.zillow.com/research/data/
CONTACT INFORMATION vitalsigns.info@mtc.ca.gov
METHODOLOGY NOTES (across all datasets for this indicator) List rents data reflects median rent prices advertised for available apartments rather than median rent payments; more information is available in the indicator definition above. Regional and local geographies rely on data collected by real Answers, a research organization and database publisher specializing in the multifamily housing market. real Answers focuses on collecting longitudinal data for individual rental properties through quarterly surveys. For the Bay Area, their database comprises properties with 40 to 3,000+ housing units. Median list prices most likely have an upward bias due to the exclusion of smaller properties. The bias may be most extreme in geographies where large rental properties represent a small portion of the overall rental market. A map of the individual properties surveyed is included in the Local Focus section.
Individual properties surveyed provided lower- and upper-bound ranges for the various types of housing available (studio, 1 bedroom, 2 bedroom, etc.). Median lower- and upper-bound prices are determined across all housing types for the regional and county geographies. The median list price represented in Vital Signs is the average of the median lower- and upper-bound prices for the region and counties. Median upper-bound prices are determined across all housing types for the city geographies. The median list price represented in Vital Signs is the median upper-bound price for cities. For simplicity, only the mean list rent is displayed for the individual properties. The metro area geography relies upon Zillow data, which is the median price for rentals listed through www.zillow.com during the month. Like the real Answers data, Zillow's median list prices most likely have an upward bias since small properties are underrepresented in Zillow's listings. The metro area data for the Bay Area cannot be compared to the regional Bay Area data. Due to the aforementioned data limitations, this data is suitable for analyzing the change in list rents over time but not necessarily comparisons of absolute list rents. Metro area boundaries reflect today’s metro area definitions by county for consistency, rather than historical metro area boundaries.
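The county versus city calculations described above can be summarised in a short sketch; the table layout, column names, and rent figures below are hypothetical, not the real Answers schema.

import pandas as pd

# One row per surveyed property x housing type, with advertised rent ranges (illustrative values).
surveys = pd.DataFrame({
    "geography": ["County A"] * 4,
    "unit_type": ["studio", "1br", "2br", "3br"],
    "rent_low":  [1400, 1700, 2100, 2600],
    "rent_high": [1600, 1950, 2450, 3000],
})

# Regional/county measure: average of the median lower-bound and median upper-bound prices.
county_list_rent = (surveys["rent_low"].median() + surveys["rent_high"].median()) / 2

# City measure: median upper-bound price across housing types.
city_list_rent = surveys["rent_high"].median()
print(county_list_rent, city_list_rent)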
Due to the limited number of rental properties surveyed, city-level data is unavailable for Atherton, Belvedere, Brisbane, Calistoga, Clayton, Cloverdale, Cotati, Fairfax, Half Moon Bay, Healdsburg, Hillsborough, Los Altos Hills, Monte Sereno, Moraga, Oakley, Orinda, Portola Valley, Rio Vista, Ross, San Anselmo, San Carlos, Saratoga, Sebastopol, Windsor, Woodside, and Yountville.
Inflation-adjusted data are presented to illustrate how rents have grown relative to overall price increases; that said, the use of the Consumer Price Index does create some challenges because housing represents a major component of the consumer goods bundle used to calculate the CPI. This reflects a methodological tradeoff between precision and accuracy and is a common concern when working with any commodity that is a major component of the CPI itself. Percent change in inflation-adjusted median is calculated with respect to the median price from the fourth quarter or December of the base year.
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.1.]
What does the data show?
The Annual Count of Icing Days is the number of days per year where the maximum daily temperature is below 0°C. Note that the Annual Count of Icing Days is more severe than the Annual Count of Frost Days, as icing days refer to the daily maximum temperature whereas frost days refer to the daily minimum temperature. The Annual Count of Icing Days measures how many times the threshold is exceeded (not by how much) in a year. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded, as there will be many factors, such as natural variability and local scale processes, that the climate model is unable to represent.
The Annual Count of Icing Days is calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of icing days to previous values.
What are the possible societal impacts?
The Annual Count of Icing Days indicates increased cold weather disruption due to a higher than normal chance of ice and snow. It is based on the maximum daily temperature being below 0°C, i.e. the temperature does not rise above 0°C for the entire day. Impacts include damage to crops, transport disruption, and increased energy demand. The Annual Count of Frost Days is a similar metric measuring impacts from cold temperatures; it indicates less severe cold weather impacts.
What is a global warming level?
The Annual Count of Icing Days is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), so this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level were calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Icing Days, an average is taken across the 21 year period. Therefore, the Annual Count of Icing Days shows the number of icing days that could occur each year for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate, as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be either higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘Icing Days’, the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘Icing Days 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names, e.g. ‘Icing Days 2.5 median’ is ‘IcingDays_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘Icing Days 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Icing Days was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and the recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report. Further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.
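As an illustration of the counting described above (not the Met Office processing chain), the sketch below generates synthetic daily maximum temperatures, counts the days below 0°C in each year, and averages the annual counts over a 21 year window.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic daily maximum temperatures (°C) over a 21 year window -- stand-ins for one ensemble member.
dates = pd.date_range("2040-01-01", "2060-12-31", freq="D")
tmax = 10 - 8 * np.cos(2 * np.pi * dates.dayofyear / 365.25) + rng.normal(0, 4, len(dates))

icing_day = pd.Series(tmax < 0, index=dates)          # an icing day: daily maximum below 0°C
annual_counts = icing_day.groupby(dates.year).sum()   # number of icing days in each year
print(annual_counts.mean())                           # average over the 21 year period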
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 1.2.]
What does the data show?
The Annual Count of Frost Days is the number of days per year where the minimum daily temperature is below 0°C. It measures how many times the threshold is exceeded (not by how much) in a year. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded, as there will be many factors, such as natural variability and local scale processes, that the climate model is unable to represent.
The Annual Count of Frost Days is calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of frost days to previous values.
What are the possible societal impacts?
The Annual Count of Frost Days indicates increased cold weather disruption due to a higher than normal chance of ice and snow. It is based on the minimum daily temperature being below 0°C. Impacts include damage to crops, transport disruption, and increased energy demand. The Annual Count of Icing Days is a similar metric measuring impacts from cold temperatures; it indicates more severe cold weather impacts.
What is a global warming level?
The Annual Count of Frost Days is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), so this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level were calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Frost Days, an average is taken across the 21 year period. Therefore, the Annual Count of Frost Days shows the number of frost days that could occur each year for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate, as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be either higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘Frost Days’, the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘Frost Days 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names, e.g. ‘Frost Days 2.5 median’ is ‘FrostDays_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘Frost Days 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Frost Days was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and the recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report. Further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.
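The ‘lower’, ‘median’ and ‘upper’ selection described above can be illustrated with a small sketch; the twelve values below are made-up stand-ins for one location's ensemble results, not UKCP18 output.

import numpy as np

# One location's metric value from each of 12 ensemble members (synthetic numbers).
members = np.array([38, 41, 44, 45, 47, 48, 50, 52, 53, 55, 58, 62], dtype=float)

ranked = np.sort(members)
lower = ranked[1]            # second lowest ranked ensemble member
upper = ranked[-2]           # second highest ranked ensemble member
median = np.median(ranked)   # central value of the ensemble
print(lower, median, upper)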
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average difference between the 'lower' values before and after this update is 0.0.]
What does the data show?
The Annual Count of Tropical Nights is the number of days per year where the minimum daily temperature is above 20°C. It measures how many times the threshold is exceeded (not by how much) in a year. The results should be interpreted as an approximation of the projected number of days when the threshold is exceeded, as there will be many factors, such as natural variability and local scale processes, that the climate model is unable to represent.
The Annual Count of Tropical Nights is calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of tropical nights to previous values.
What are the possible societal impacts?
The Annual Count of Tropical Nights indicates increased health risks and heat stress due to high night-time temperatures. It is based on exceeding a minimum daily temperature of 20°C, i.e. the temperature does not fall below 20°C for the entire day. Impacts include increased heat-related illnesses, hospital admissions or death for vulnerable people, and increased heat stress; it is important that the body has time to recover from high daytime temperatures during the lower temperatures at night. Other metrics such as the Annual Count of Summer Days (days above 25°C), the Annual Count of Hot Summer Days (days above 30°C) and the Annual Count of Extreme Summer Days (days above 35°C) also indicate impacts from high temperatures; however, they use different temperature thresholds.
What is a global warming level?
The Annual Count of Tropical Nights is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), so this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level were calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Count of Tropical Nights, an average is taken across the 21 year period. Therefore, the Annual Count of Tropical Nights shows the number of tropical nights that could occur each year for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate, as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be either higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each global warming level and two baselines. They are named ‘Tropical Nights’, the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘Tropical Nights 2.5 median’ is the median value for the 2.5°C warming level. Decimal points are included in field aliases but not field names, e.g. ‘Tropical Nights 2.5 median’ is ‘TropicalNights_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘Tropical Nights 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Count of Tropical Nights was calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and the recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report. Further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.
[Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming levels but the average percentage change between the 'lower' values before and after this update is -1%.]
What does the data show?
A Heating Degree Day (HDD) is a day in which the average temperature is below 15.5°C. It is the number of degrees below this threshold that counts towards the Heating Degree Days total. For example, if the average temperature for a specific day is 15°C, this would contribute 0.5 Heating Degree Days to the annual sum; alternatively, an average temperature of 10.5°C would contribute 5 Heating Degree Days. Given that the data shows the annual sum of Heating Degree Days, this value can be above 365 in some parts of the UK.
Annual Heating Degree Days is calculated for two baseline (historical) periods, 1981-2000 (corresponding to 0.51°C warming) and 2001-2020 (corresponding to 0.87°C warming), and for global warming levels of 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. This enables users to compare the future number of HDD to previous values.
What are the possible societal impacts?
Heating Degree Days indicate the energy demand for heating due to cold days. A higher number of HDD means an increase in power consumption for heating, therefore this index is useful for predicting future changes in energy demand for heating.
What is a global warming level?
Annual Heating Degree Days are calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5) where greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850–1900 and 2011–2020), so this dataset allows for the exploration of greater levels of warming.
The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C. The data at each warming level were calculated using a 21 year period. These 21 year periods are calculated by taking 10 years either side of the first year at which the global warming level is reached. This time will be different for different model ensemble members. To calculate the value for the Annual Heating Degree Days, an average is taken across the 21 year period. Therefore, the Annual Heating Degree Days show the number of heating degree days that could occur each year for each given level of warming.
We cannot provide a precise likelihood for particular emission scenarios being followed in the real world future. However, we do note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate, as it will depend on future greenhouse emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on the assumption of current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be either higher or lower than this level.
What are the naming conventions and how do I explore the data?
This data contains a field for each warming level and two baselines. They are named ‘HDD’ (Heating Degree Days), the warming level or baseline, and ‘upper’, ‘median’ or ‘lower’ as per the description below. E.g. ‘HDD 2.5 median’ is the median value for the 2.5°C projection. Decimal points are included in field aliases but not field names, e.g. ‘HDD 2.5 median’ is ‘HDD_25_median’. To understand how to explore the data, see this page: https://storymaps.arcgis.com/stories/457e7a2bc73e40b089fac0e47c63a578
Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘HDD 2.0°C median’ values.
What do the ‘median’, ‘upper’, and ‘lower’ values mean?
Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models are run. Each ensemble member has slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, Annual Heating Degree Days were calculated for each ensemble member and they were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member. The ‘upper’ fields are the second highest ranked ensemble member. The ‘median’ field is the central value of the ensemble. This gives a median value, and a spread of the ensemble members indicating the range of possible outcomes in the projections. This spread of outputs can be used to infer the uncertainty in the projections. The larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline periods as these values also come from the model that was used to produce the projections. This allows a fair comparison between the model projections and the recent past.
Useful links
This dataset was calculated following the methodology in the ‘Future Changes to high impact weather in the UK’ report and uses the same temperature thresholds as the 'State of the UK Climate' report. Further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.
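A minimal sketch of the degree-day arithmetic described above, using the 15.5°C threshold and the worked examples from the text; the daily mean temperatures are made up, not model output.

# Annual Heating Degree Days: sum of the degrees by which each day's mean temperature
# falls below 15.5°C (days at or above the threshold contribute nothing).
THRESHOLD = 15.5

def annual_hdd(daily_mean_temps):
    return sum(max(0.0, THRESHOLD - t) for t in daily_mean_temps)

# Example daily mean temperatures (°C): contributions of 0.5, 5.0 and 0, giving 5.5 HDD.
print(annual_hdd([15.0, 10.5, 16.2]))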