Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.
In this dataset:
We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated against population register data from Statistics Finland for night-time hours and against a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to, for instance, spatial accessibility analyses, crisis management and planning.
Please cite this dataset as:
Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4
Organization of data
The dataset is packaged into a single ZIP file, Helsinki_dynpop_matrix.zip, which contains the following files:
HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for an average workday in the study area.
HMA_Dynamic_population_24H_sat.csv represents the dynamic population for an average Saturday in the study area.
HMA_Dynamic_population_24H_sun.csv represents the dynamic population for an average Sunday in the study area.
target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.
Column names
YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.
H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0 to 23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals 100 (i.e. 100% of the total population for each one-hour period).
In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.
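The join described above can be sketched in Python with only the standard library. This is a minimal illustration, not part of the dataset: the three grid cells, their YKR_ID values and the hourly shares are invented, and the real files would be read with csv.DictReader and json.load instead of being defined inline. Whether YKR_ID is stored as text or as a number in each file is an assumption, so the sketch compares the keys as strings.

```python
# Hypothetical miniature stand-ins for one CSV table and the GeoJSON grid.
# All IDs and values below are invented for illustration.
population_rows = [
    {"YKR_ID": "1001", "H0": "40.0", "H12": "25.0"},
    {"YKR_ID": "1002", "H0": "35.0", "H12": "30.0"},
    {"YKR_ID": "1003", "H0": "25.0", "H12": "45.0"},
]
grid = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"YKR_ID": 1001}, "geometry": None},
        {"type": "Feature", "properties": {"YKR_ID": 1002}, "geometry": None},
        {"type": "Feature", "properties": {"YKR_ID": 1003}, "geometry": None},
    ],
}


def join_population(grid, rows, key="YKR_ID"):
    """Copy the hourly shares of each table row onto the grid feature with the same key."""
    by_id = {str(row[key]): row for row in rows}
    joined = 0
    for feature in grid["features"]:
        row = by_id.get(str(feature["properties"][key]))
        if row is None:
            continue  # grid cell without a matching table row
        for name, value in row.items():
            if name != key:
                feature["properties"][name] = float(value)
        joined += 1
    return joined


def column_total(rows, column):
    """Sum one hourly column; per the description this should be about 100."""
    return sum(float(row[column]) for row in rows)


joined_count = join_population(grid, population_rows)
h0_total = column_total(population_rows, "H0")
print(joined_count, h0_total)
```

Checking the column total before mapping is a cheap way to confirm that a table was read with the right delimiter and decimal separator.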
License Creative Commons Attribution 4.0 International.
Related datasets
Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612
Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564
Effective September 27, 2023, this dataset will no longer be updated. Similar data are accessible from wonder.cdc.gov. This visualization provides data that can be used to illustrate potential differences in the burden of deaths due to COVID-19 by race and ethnicity.
The statistic shows the growth of the distribution software market worldwide from 2018 to 2025. In 2020, the distribution software market was valued at approximately **** billion U.S. dollars, an increase from the previous year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research work in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed codes in the RStudio® script environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The employed R packages are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to follow all the codes in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario, we selected the Generalized Linear Model (GLM) and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the codes used to run GLM and DOMAIN:
In the first instance, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend the use of 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we also considered factors such as the extent of the area and the type of study species to be important for selecting the correct number of points (Pers. Obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) at the presence and background points (e.g., Hijmans and Elith, 2017).
Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, we selected the 10-fold cross-validation method, with the response variable presence assigned as a factor. If some other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), where we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and cross iteration of 5,000 repetitions (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we considered selecting a validation interval of 4 random training points (Personal test). The plots obtained were the partial dependence blocks, as a function of each predictor variable.
Subsequently, we computed the correlation between variables using Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity (Guisan & Hofer, 2003). A bivariate correlation threshold of ±0.70 is recommended for discarding highly correlated variables (e.g., Awan et al., 2021).
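The 75%/25% subdivision and the 10-fold assignment can be illustrated with a short Python sketch. It is not the authors' R code: the point counts, the random seed and the function names are assumptions chosen only to make the example reproducible.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and cut it into training and test subsets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_items, k=10, seed=42):
    """Assign every index 0..n_items-1 to exactly one of k folds."""
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


# Hypothetical point groups: 40 presence records and 200 background points.
presence = [("presence", i) for i in range(40)]
background = [("background", i) for i in range(200)]

# Each group is split separately, so both keep the 75/25 proportion.
presence_train, presence_test = train_test_split(presence)
background_train, background_test = train_test_split(background)

# The folds are built on the combined training data only.
training = presence_train + background_train
folds = k_fold_indices(len(training), k=10)
print(len(presence_train), len(presence_test), len(background_train), len(folds))
```

Splitting presence and background points separately, rather than pooling them first, keeps the ratio of the two classes the same in the training and test subsets.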
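As a minimal sketch of this screening step, the following Python code computes Pearson's r for every pair of variables and flags pairs at or beyond the 0.70 threshold in absolute value. The variable names and values are invented, and the sketch only flags pairs; deciding which variable of a flagged pair to discard is left to the analyst.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(variables, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical predictor values extracted at five points.
variables = {
    "temperature": [10.0, 12.0, 14.0, 16.0, 18.0],
    "depth": [50.0, 45.0, 39.0, 36.0, 30.0],
    "slope": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = highly_correlated_pairs(variables)
print(flagged)
```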
Once the above codes were run, we uploaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran the GLM models per variable to obtain the p-significance value of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models are of polynomial degree to obtain linear and quadratic response (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots included the probability of occurrence and values for continuous variables or categories for discrete variables. The points of the presence and background training group are also included.
We also ran a global GLM, which was evaluated by means of a 2 x 2 contingency matrix including both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and avoid a high percentage of bias in type I (commission) or type II (omission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of a 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - errors of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
Model        Validation set
             True    False
Presence     A       B
Background   C       D
We then calculated the overall accuracy and the True Skill Statistic (TSS). The first assesses the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002). TSS also gives equal weight to the prevalence of presence predictions and to the correction for random performance (Fielding and Bell, 1997; Allouche et al., 2006).
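Both metrics can be computed directly from the four cells of Table 1. The Python sketch below uses the standard definitions (overall accuracy as the share of correct predictions, TSS as sensitivity plus specificity minus one) with the cell labels of Table 1; the counts are invented for illustration.

```python
def performance_metrics(a, b, c, d):
    """Overall accuracy and TSS from the cells of Table 1.

    a: true positives, b: false positives (commission),
    c: true negatives, d: false negatives (omission).
    """
    total = a + b + c + d
    overall = (a + c) / total          # proportion of correctly predicted cases
    sensitivity = a / (a + d)          # presences correctly predicted
    specificity = c / (c + b)          # backgrounds correctly predicted
    tss = sensitivity + specificity - 1
    return {
        "overall": overall,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": tss,
    }


# Hypothetical counts for a test subset of 100 records.
metrics = performance_metrics(a=40, b=10, c=45, d=5)
print(metrics)
```

TSS ranges from -1 to +1, where values near zero indicate performance no better than random.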
The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background group subdivided into 75% training and 25% test, each. We only included the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
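For readers unfamiliar with DOMAIN, the following Python sketch shows the Gower-based similarity as it is commonly described for this algorithm: the distance between a site and a presence point is the mean of the absolute differences per variable, each divided by that variable's range, and the site's score is one minus the distance to its most similar presence point. This is an assumption-laden illustration, not the content of Code8_DOMAIN_SuitHab_model.R; taking the ranges from the presence training points, the two predictors and all values are assumptions.

```python
def domain_similarity(site, presence_points, ranges):
    """Highest Gower similarity between one site and any presence point."""
    best = float("-inf")
    for point in presence_points:
        # Gower distance: mean range-standardized absolute difference.
        distance = sum(
            abs(s - p) / r for s, p, r in zip(site, point, ranges)
        ) / len(site)
        best = max(best, 1.0 - distance)
    return best


# Hypothetical presence training points with two predictors each.
presence_training = [(12.0, 30.0), (14.0, 40.0), (13.0, 35.0)]

# Range of each predictor across the presence training points (assumed convention).
ranges = [max(column) - min(column) for column in zip(*presence_training)]

inside = domain_similarity((13.0, 35.0), presence_training, ranges)
outside = domain_similarity((16.0, 60.0), presence_training, ranges)
print(inside, outside)
```

A site identical to a presence point scores 1, and sites far outside the sampled environmental range can score below zero.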
Regarding the model evaluation and estimation, we selected the following estimators:
1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model predicts the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) ROC/AUC curve for model validation, where an optimal performance threshold is estimated to have an expected confidence of 75% to 99% probability (DeLong et al., 1988).
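As a small illustration of the second estimator, the area under the ROC curve can be computed as the probability that a randomly chosen presence record receives a higher predicted score than a randomly chosen background point, with ties counted as one half. The Python sketch below uses this rank-based definition; the scores are invented, and it does not reproduce the partial ROC procedure or the pROC package.

```python
def roc_auc(presence_scores, background_scores):
    """AUC as the share of presence/background pairs ranked correctly."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model predictions for test records.
presence_scores = [0.9, 0.8, 0.6, 0.55]
background_scores = [0.7, 0.4, 0.3, 0.2, 0.1]

auc = roc_auc(presence_scores, background_scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presence and background scores.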
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.
As of August 2024, approximately ** percent of business-to-business (B2B) content marketers surveyed worldwide (but predominantly based in North America) included social media advertising and promoted posts among their paid channels for content distribution. Search engine marketing (SEM) and pay-per-click (PPC) tactics followed, mentioned by ** percent of respondents. According to the same study, B2B content marketers' top channels included organic social media and corporate blogs.
Repository that contains alerts that will be sent to SSA employees when certain conditions exist, to inform them of work that needs to be done, is being reviewed, or has been completed.
Note: Starting April 27, 2023, updates change from daily to weekly.
Summary: The cumulative number of probable COVID-19 deaths among Maryland residents by race and ethnicity: African American; White; Hispanic; Asian; Other; Unknown.
Description: The MD COVID-19 - Probable Deaths by Race and Ethnicity Distribution data layer is a collection of the statewide confirmed and probable COVID-19 related deaths that have been reported each day by the Vital Statistics Administration by categories of race and ethnicity. A death is classified as probable if the person's death certificate notes COVID-19 to be a probable, suspect or presumed cause or condition. Probable deaths have not yet been confirmed by a laboratory test. Some data on deaths may be unavailable due to the time lag between the death, typically reported by a hospital or other facility, and the submission of the complete death certificate. Confirmed deaths are available from the MD COVID-19 - Confirmed Deaths by Race and Ethnicity Distribution data layer.
Terms of Use: The Spatial Data, and the information therein, (collectively the "Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
These experimental statistics about PPE items distributed for use by health and social care services in England include a breakdown of deliveries by PPE item, and information about orders using the e-Portal.
The ‘Monthly PPE data’ attachment gives a more detailed breakdown of daily PPE deliveries from 1 March to 31 March 2022, and a breakdown of e-Portal orders by sector and PPE item type.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The performance of eight different methods to predict human volume of distribution (VDss) using a large data set (N > 100) was evaluated. The accuracy was assessed by the end points % within two-fold and absolute average fold error (AAFE). The ability to rank order was assessed by the σ and bias was examined using average fold error. Significance of observed differences was established using statistical permutation testing. The Rodgers-Lukova equation, a tissue composition model, for acids and single species scaling based on rat for other ion classes showed the best results in absence of non-rodent data. The semimechanistic Øie-Tozer model based on all three preclinical species showed the best performance overall (81% within two-fold, AAFE 1.55, σ 0.62). This was not statistically significantly better at the 95% confidence level than the same model based on two preclinical species or single species scaling from monkey. Thus, the use of primates appears difficult to justify when the sole goal is to extrapolate human volume of distribution.
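The accuracy and bias end points named in this abstract have widely used definitions, which the Python sketch below applies: average fold error (AFE) is ten raised to the mean of log10(predicted/observed), AAFE uses the mean of the absolute log ratios instead, and % within two-fold counts predictions between half and twice the observed value. Whether the study used exactly these formulas is an assumption, and the predicted and observed values are invented.

```python
import math


def fold_error_metrics(predicted, observed):
    """AFE (bias), AAFE (accuracy) and % of predictions within two-fold."""
    logs = [math.log10(p / o) for p, o in zip(predicted, observed)]
    n = len(logs)
    afe = 10 ** (sum(logs) / n)
    aafe = 10 ** (sum(abs(v) for v in logs) / n)
    within = sum(1 for p, o in zip(predicted, observed) if 0.5 <= p / o <= 2.0)
    return {"afe": afe, "aafe": aafe, "pct_within_two_fold": 100.0 * within / n}


# Hypothetical predicted and observed VDss values (arbitrary units).
predicted = [2.0, 0.5, 4.0, 1.0]
observed = [1.0, 1.0, 1.0, 1.0]

result = fold_error_metrics(predicted, observed)
print(result)
```

In this toy case AFE is about 1.41 (a tendency to over-predict), AAFE is 2.0, and three of four predictions fall within two-fold.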
https://www.polarismarketresearch.com/privacy-policy
The U.S. chemical distribution market size is projected to reach USD 49.99 billion by 2032, at a CAGR of 6.1%. Market by indications and by distribution channels.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
November 2022 Version
This dataset represents the "Observed Distribution" for coho salmon in California by using observations made only between 1990 and the present. It was developed for the express purpose of assisting with species recovery planning efforts. The process for developing this dataset was to collect as many observations of the species as possible and derive the stream-based geographic distribution for the species based solely on these positive observations.
For the purpose of this dataset, an observation is defined as a report of a sighting or other evidence of the presence of the species at a given place and time. As such, observations are modeled by year observed as point locations in the GIS. All such observations were collected with information regarding who reported the observation, their agency/organization/affiliation, the date that they observed the species, who compiled the information, etc. This information is maintained in the developer's file geodatabase (© Environmental Systems Research Institute (ESRI) 2016).
To develop this distribution dataset, the species observations were applied to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography. For each observation, a path was traced down the hydrography from the point of observation to the ocean, thereby deriving the shortest migration route from the point of observation to the sea. By appending all of these migration paths together, the "Observed Distribution" for the species is developed.
It is important to note that this layer does not attempt to model the entire possible distribution of the species. Rather, it only represents the known distribution based on where the species has been observed and reported. While some observations indeed represent the upstream extent of the species (e.g., an observation made at a hard barrier), the majority of observations only indicate where the species was sampled for or otherwise observed. Because of this, this dataset likely underestimates the absolute geographic distribution of the species.
It is also important to note that the species may not be found on an annual basis in all indicated reaches due to natural variations in run size, water conditions, and other environmental factors. As such, the information in this dataset should not be used to verify that the species are currently present in a given stream. Conversely, the absence of distribution linework for a given stream does not necessarily indicate that the species does not occur in that stream.
The observation data were compiled from a variety of disparate sources including but not limited to CDFW, USFS, NMFS, timber companies, and the public. Forms of documentation include CDFW administrative reports, personal communications with biologists, observation reports, and literature reviews. The source of each feature (to the best available knowledge) is included in the data attributes for the observations in the geodatabase, but not for the resulting linework.
The spatial data has been referenced to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography.
Usage of this dataset:
Examples of appropriate uses include:
- Species recovery planning
- Evaluation of future survey sites for the species
- Validating species distribution models
Examples of inappropriate uses include:
- Assuming absence of a line feature means that the species are not present in that stream.
- Using this data to make parcel or ground level land use management decisions.
- Using this dataset to prove or support non-existence of the species at any spatial scale.
- Assuming that the line feature represents the maximum possible extent of species distribution.
All users of this data should seek the assistance of qualified professionals such as surveyors, hydrologists, or fishery biologists as needed to ensure that such users possess complete, precise, and up to date information on species distribution and water body location.
Any copy of this dataset is considered to be a snapshot of the species distribution at the time of release. It is incumbent upon the user to ensure that they have the most recent version prior to making management or planning decisions.
Please refer to the "Use Constraints" section below.
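The path-tracing step described for this dataset (follow the hydrography downstream from each observation to the ocean, then combine all paths) can be sketched in Python as follows. The stream network, the segment names and the representation of the hydrography as a "flows into" lookup are all hypothetical simplifications; the actual processing was done in a GIS on NHD-derived linework, and the sketch assumes a network without loops.

```python
# Hypothetical network: each segment maps to the segment it flows into.
# None marks the segment that drains to the ocean.
downstream = {
    "headwater_a": "mainstem_upper",
    "headwater_b": "mainstem_upper",
    "tributary_c": "mainstem_lower",
    "mainstem_upper": "mainstem_lower",
    "mainstem_lower": None,
}


def trace_to_ocean(segment, downstream):
    """List the segments from an observation down to the ocean outlet."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = downstream[segment]
    return path


def observed_distribution(observed_segments, downstream):
    """Union of all downstream paths from the observed segments."""
    distribution = set()
    for segment in observed_segments:
        distribution.update(trace_to_ocean(segment, downstream))
    return distribution


# Two invented observations, one in a headwater and one on the mainstem.
observations = ["headwater_a", "mainstem_upper"]
distribution = observed_distribution(observations, downstream)
print(sorted(distribution))
```

Segments without an observation on or upstream of them (here headwater_b and tributary_c) stay out of the result, which mirrors why the dataset likely underestimates the full distribution.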
This dataset includes percent distribution of births for females by age group in the United States since 1933.
The number of states in the reporting area has differed historically. In 1915 (when the birth registration area was established), 10 states and the District of Columbia reported births; by 1933, 48 states and the District of Columbia were reporting births, with the last two states, Alaska and Hawaii, added to the registration area in 1959 and 1960, when these regions gained statehood. Reporting area information is detailed in references 1 and 2 below. Trend lines for 1909–1958 are based on live births adjusted for under-registration; beginning with 1959, trend lines are based on registered live births.
SOURCES
NCHS, National Vital Statistics System, birth data (see https://www.cdc.gov/nchs/births.htm); public-use data files (see https://www.cdc.gov/nchs/data_access/VitalStatsOnline.htm); and CDC WONDER (see http://wonder.cdc.gov/).
REFERENCES
National Office of Vital Statistics. Vital Statistics of the United States, 1950, Volume I. 1954. Available from: https://www.cdc.gov/nchs/data/vsus/vsus_1950_1.pdf.
Hetzel AM. U.S. vital statistics system: major activities and developments, 1950-95. National Center for Health Statistics. 1997. Available from: https://www.cdc.gov/nchs/data/misc/usvss.pdf.
National Center for Health Statistics. Vital Statistics of the United States, 1967, Volume I–Natality. 1969. Available from: https://www.cdc.gov/nchs/data/vsus/nat67_1.pdf.
Martin JA, Hamilton BE, Osterman MJK, et al. Births: Final data for 2015. National vital statistics reports; vol 66 no 1. Hyattsville, MD: National Center for Health Statistics. 2017. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_01.pdf.
Martin JA, Hamilton BE, Osterman MJK, Driscoll AK, Drake P. Births: Final data for 2016. National Vital Statistics Reports; vol 67 no 1. Hyattsville, MD: National Center for Health Statistics. 2018. Available from: https://www.cdc.gov/nvsr/nvsr67/nvsr67_01.pdf.
Martin JA, Hamilton BE, Osterman MJK, Driscoll AK. Births: Final data for 2018. National vital statistics reports; vol 68 no 13. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_13.pdf.
Note: Starting April 27, 2023, updates change from daily to weekly.
Summary: The cumulative number of confirmed COVID-19 deaths among Maryland residents by gender: Female; Male; Unknown.
Description: The MD COVID-19 - Confirmed Deaths by Gender Distribution data layer is a collection of the statewide confirmed and probable COVID-19 related deaths that have been reported each day by the Vital Statistics Administration by gender. A death is classified as confirmed if the person had a laboratory-confirmed positive COVID-19 test result. Some data on deaths may be unavailable due to the time lag between the death, typically reported by a hospital or other facility, and the submission of the complete death certificate. Probable deaths are available from the MD COVID-19 - Probable Deaths by Gender Distribution data layer.
Terms of Use: The Spatial Data, and the information therein, (collectively the "Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Total population per municipality harmonized according to INSPIRE requirements. Description copied from catalog.inspire.geoportail.lu.
Total sales of flowers, plants, fruit and vegetables to retail florists, wholesalers, market chain stores, public agencies, etc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ukraine Population Distribution: with Avg Income per Capita: 3360.1 to 3720.0 UAH data was reported at 10.800 % in 2017. This records an increase from the previous number of 7.900 % for 2016. Ukraine Population Distribution: with Avg Income per Capita: 3360.1 to 3720.0 UAH data is updated yearly, averaging 3.450 % from Dec 2012 (Median) to 2017, with 6 observations. The data reached an all-time high of 10.800 % in 2017 and a record low of 2.000 % in 2013. Ukraine Population Distribution: with Avg Income per Capita: 3360.1 to 3720.0 UAH data remains active status in CEIC and is reported by State Statistics Service of Ukraine. The data is categorized under Global Database’s Ukraine – Table UA.H009: Household Income and Expenditure: Annual.
This section includes statistical data on the distribution of invasive alien plants in the land transportation corridor of the Qinghai-Tibet Plateau at different altitudes, at different distances from the road, and in different regions, by family and functional group. The main invasive alien plants in China were first determined from the list of invasive alien plants in China. Then, based on nationwide species distribution point data recorded on the GBIF website, ArcGIS spatial analysis was used to determine the invasive alien plants distributed on the Qinghai-Tibet Plateau, and the number of species and distribution points of each family and fruit-type functional group were counted by family. Because the distributions of some species are not recorded on the website, the data may be incomplete. These data can provide a basis for research on invasive plants on the Qinghai-Tibet Plateau.
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
The statistics illustrate consumption, production, prices, and trade of Dairy Spreads in Cyprus from 2007 to 2024.
In 2024, **** percent of Medicare's Part D beneficiaries were insured through United Healthcare. Part D covers prescription drugs, and beneficiaries in traditional Medicare plans in the United States must enroll in it separately. This statistic shows the distribution of Medicare Part D enrollment in 2024, by firm.