This dataset contains estimates of the number of persons per square kilometer, consistent with national censuses and population registers. There is one image for each modeled year.
General Documentation
The Gridded Population of the World, Version 4 (GPWv4), Revision 11 models the distribution of the global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.
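As a rough illustration of proportional allocation, the sketch below splits one administrative unit's population across grid cells in proportion to the area of the unit that falls in each cell, then converts a cell count to a density. The function names, the cell identifiers, and the pure area weighting are assumptions made for illustration; this is not the GPWv4 production code.

```python
def allocate_by_area(unit_population, overlap_areas_km2):
    """Split a unit's population across cells in proportion to overlap area.

    overlap_areas_km2 maps a (hypothetical) cell id to the area of the
    administrative unit that falls inside that cell.
    """
    total_area = sum(overlap_areas_km2.values())
    if total_area <= 0:
        raise ValueError("unit has no area to allocate population to")
    return {
        cell: unit_population * area / total_area
        for cell, area in overlap_areas_km2.items()
    }


def density_per_km2(count, land_area_km2):
    """Persons per square kilometer; cells without land get a density of 0."""
    return count / land_area_km2 if land_area_km2 > 0 else 0.0


# One unit of 1,200 people overlapping three cells.
cells = allocate_by_area(1200.0, {"a": 0.5, "b": 0.25, "c": 0.25})
density_a = density_per_km2(cells["a"], 0.5)
```

The allocation preserves the unit total, so summing the cell counts returns the original census figure.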
The Crisis Mapping Toolkit (CMT) is a collection of tools for processing geospatial data (images, satellite data, etc.) into cartographic products that improve understanding of large-scale crises, such as natural disasters. The cartographic products produced by CMT include flood inundation maps, maps of damaged or destroyed structures, forest fire maps, population density estimates, etc. CMT is designed to rapidly process large-scale data using Google Earth Engine and other geospatial data systems.
Global high-resolution, contemporary data on human population distributions are a prerequisite for the accurate measurement of the impacts of population growth, for monitoring changes, and for planning interventions. The WorldPop project aims to meet these needs through the provision of detailed and open access population distribution datasets built using transparent and peer-reviewed approaches. Full details on the methods and datasets used in constructing the data, along with open access publications, are provided on the WorldPop website. In brief, recent census-based population counts matched to their associated administrative units are disaggregated to ~100x100m grid cells through machine learning approaches that exploit the relationships between population densities and a range of geospatial covariate layers. The mapping approach is Random Forest-based dasymetric redistribution. This dataset depicts the estimated number of people residing in each grid cell in 2010, 2015, and other years. For 2020, the breakdown of population by age and sex is available in the WorldPop/GP/100m/pop_age_sex and WorldPop/GP/100m/pop_age_sex_cons_unadj collections. Further WorldPop gridded datasets on population age structures, poverty, urban growth, and population dynamics are freely available on the WorldPop website. WorldPop represents a collaboration between researchers at the University of Southampton, Université Libre de Bruxelles, and University of Louisville. The project is principally funded by the Bill and Melinda Gates Foundation.
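The redistribution step of a dasymetric approach can be sketched as follows: a weight layer (here standing in for densities predicted by a model such as a Random Forest) determines what share of an administrative unit's census count each grid cell receives. The weights, cell names, and the equal-split fallback are hypothetical and only illustrate the idea; they are not taken from the WorldPop code.

```python
def dasymetric_redistribute(unit_count, cell_weights):
    """Distribute a unit's census count across cells using a weight layer.

    cell_weights maps a (hypothetical) cell id to a non-negative weight,
    for example a model-predicted population density.
    """
    total_weight = sum(cell_weights.values())
    if total_weight <= 0:
        # Assumed fallback: with no usable weights, split the count evenly.
        share = unit_count / len(cell_weights)
        return {cell: share for cell in cell_weights}
    return {
        cell: unit_count * weight / total_weight
        for cell, weight in cell_weights.items()
    }


# Predicted densities for three cells inside one administrative unit.
weights = {"c1": 8.0, "c2": 1.5, "c3": 0.5}
grid = dasymetric_redistribute(5000, weights)
```

Because the weights are normalized within each unit, the gridded values always sum back to the census count for that unit.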
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p. 60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data and reproduce the figure, so that anyone accessing this archive can repeat the analysis.
The Python code was developed in Google Colaboratory (Google Colab for short), a hosted, Jupyter-based notebook environment that streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are kept in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.
The notebook features extensive explanations, comments, code snippets, and code output. It can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code that performs the following functions:
- install/import necessary Python packages
- download the Census Tract shapefile from the TIGER/Line FTP Server
- download Census data via the Census API
- manipulate Census tabular data
- merge Census data with the TIGER/Line shapefile
- apply a coordinate reference system
- calculate land area and population density
- map and export the map to HTML
- export the map to ESRI shapefile
- export the table to CSV
The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the TIGER/Line shapefile and Census API downloads.
The notebook can be adapted for use in other environments (e.g., Jupyter Notebook) and to read and write files on a local, shared, or cloud drive (e.g., Google Drive).
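The merge and density steps listed above can be sketched without any geospatial libraries. The example assumes that each tract record carries a land area in square meters (as the ALAND attribute does in recent TIGER/Line shapefiles) and that both tables share a tract identifier; the field names and sample values are hypothetical and are not taken from the notebook.

```python
SQ_M_PER_SQ_KM = 1_000_000


def merge_and_density(population_rows, tract_rows):
    """Join population counts to tract records and compute density per km2."""
    pop_by_id = {row["geoid"]: row["population"] for row in population_rows}
    merged = []
    for tract in tract_rows:
        geoid = tract["geoid"]
        if geoid not in pop_by_id:
            continue  # no Census record for this tract geometry
        area_km2 = tract["aland_m2"] / SQ_M_PER_SQ_KM
        population = pop_by_id[geoid]
        merged.append({
            "geoid": geoid,
            "population": population,
            "area_km2": area_km2,
            # Tracts with no land area get no density value.
            "density_per_km2": population / area_km2 if area_km2 > 0 else None,
        })
    return merged


# Hypothetical sample records standing in for API and shapefile output.
population_rows = [{"geoid": "T1", "population": 4000},
                   {"geoid": "T2", "population": 1500}]
tract_rows = [{"geoid": "T1", "aland_m2": 2_000_000},
              {"geoid": "T2", "aland_m2": 6_000_000},
              {"geoid": "T3", "aland_m2": 1_000_000}]
result = merge_and_density(population_rows, tract_rows)
```

In the notebook itself the same join would typically be done on a GeoDataFrame, so that the density column can be mapped directly.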
The computed population density data for the map are based on a media CD released by ESRI in 2006. According to the media CD, China in 2006 comprised 33 provinces. These include Tibet (now named Xizang, an autonomously administered region), Hong Kong and Macau (both of which are designated as special districts), along with Xinjiang in the west (parts of which are involved in an unsettled border dispute with a neighboring country, as can be seen by a dotted line in the Google base map of the region) and Taiwan. Compare this map with the population density map of 2002 that now has only 32 provinces...
Description
This dataset includes the inputs and outputs generated in the spatial modeling of CES using social media data for eight mountain parks in Spain and Portugal (Aigüestortes, Sierra de Guadarrama, Ordesa, Peneda-Gerês, Picos de Europa, Sierra de las Nieves, Sierra Nevada and Teide). This spatial modeling is addressed in the article in preparation entitled: "What drives cultural ecosystem services in mountain protected areas? An AI-assisted answer using social media."
The variables used as inputs to generate the models come from different sources:
- CES presence points come from social media photos (Flickr and Twitter) labeled using AI models and validated by experts. The models used for automatic labeling were DINOv2 and OpenAI's GPT-4.1. Consensus labels from these two sources, which showed F1 values above 0.75, were used as presence data.
The environmental variables used are mainly derived from:
- OpenStreetMap (OSM) https://www.openstreetmap.org/
- Variables derived from remote sensing
- Topographic variables
- Current and future climate variables derived from CHELSA (https://chelsa-climate.org/)
- Landscape metrics (calculated with Fragstats software https://www.fragstats.org/)
- Viewshed
- Land use and land cover maps (https://land.copernicus.eu/en/products/corine-land-cover)
The models were generated with the maximum entropy (MaxEnt) algorithm using the biomod2 R package, leveraging its suitability for presence-only data, low sample sizes, and mixed predictor types. To address sampling bias, we generated 10 pseudo-absence replicates based on the “target-group background” method. Models were evaluated using AUC-ROC and True Skill Statistic (TSS), with performance validation via 10-fold cross-validation, resulting in 100 runs per model. Ensemble models were created from runs with AUC-ROC > 0.6, using the median for spatial projections of CES and the coefficient of variation to estimate uncertainty. We implemented two modelling approaches: one assuming consistent CES preferences across parks, and another assuming park-specific preferences shaped by local environmental contexts.
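The ensemble step described above can be sketched as follows: runs whose AUC-ROC exceeds the threshold are kept, the per-cell median gives the projection, and the per-cell coefficient of variation gives the uncertainty. The run structure, the sample values, and the use of the population standard deviation are assumptions made for illustration; the authors' workflow runs in the biomod2 R package, not in this code.

```python
from statistics import mean, median, pstdev


def ensemble(runs, auc_threshold=0.6):
    """Median projection and coefficient of variation across retained runs.

    Each run is a dict with an "auc" score and a "pred" list holding one
    suitability value per grid cell (a hypothetical structure).
    """
    kept = [run["pred"] for run in runs if run["auc"] > auc_threshold]
    if not kept:
        raise ValueError("no run exceeds the AUC threshold")
    medians, cvs = [], []
    for cell_values in zip(*kept):
        cell_mean = mean(cell_values)
        medians.append(median(cell_values))
        # Coefficient of variation = standard deviation / mean.
        cvs.append(pstdev(cell_values) / cell_mean if cell_mean else float("nan"))
    return medians, cvs


runs = [
    {"auc": 0.82, "pred": [0.2, 0.8]},
    {"auc": 0.55, "pred": [0.9, 0.1]},  # dropped: AUC-ROC not above 0.6
    {"auc": 0.71, "pred": [0.4, 0.6]},
    {"auc": 0.64, "pred": [0.3, 0.7]},
]
med, cv = ensemble(runs)
```

Using the median rather than the mean keeps a single poorly behaved run from dominating the projection for a cell.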
Table 1. Categories used in social media photo tagging: Stoten, based on the scientific framework proposed by Moreno-Llorca et al. (2020) (https://doi.org/10.1016/j.scitotenv.2020.140067).
Stoten
Cultural
Fauna/Flora
Gastronomy
Nature & Landscape
Not relevant
Recreational
Religious
Rural tourism
Sports
Sun and beach
Urban
Table 2. Table of contents of the dataset
Folder
format
Description
Inputs
Base layers
by National Park
100-meter grid
grid_wgs84_atrib
.shp
100 x 100 meter grid for each of the studied national parks that cover the study area
Biosphere Reserve
MAB_wgs84
.shp
Biosphere reserve layers present in each of the national parks studied
Municipality
Municipality
.shp
Layers of municipalities that overlap with the park area, biosphere reserve, Natura 2000 and the socioeconomic influence area with a 100-meter buffer
National park limit
National_park_limit
.shp
Boundaries of each of the national parks studied
Natura 2000
RN2000
.shp
Layers of the Natura 2000 for each of the national parks studied
Socioeconomic influence area
AIS
.shp
Area of socioeconomic influence of each of the parks studied
Readme
.txt
File containing layer metadata, including download locations and descriptions of shape attributes.
by National Park
Accessibility
.tif
Accessibility variables that include routes, streets, parking, and train tracks
Climate
.tif
CHELSA-derived climate variable layers and solar radiation layers
Ecosystem functioning
.tif
Layers derived from remote sensing that are related with the functional attributes of ecosystems
Ecosystem structure
.tif
Landscape and spectral diversity metrics
Geodiversity
.tif
Topographic and derived variables
Land use Land cover
.tif
Layers related to land use and cover
Tourism and Culture
.tif
Layers related to infrastructure associated with tourism such as bars, restaurants, lodgings and places of cultural interest such as monuments
Scripts
Modeling to get output data
Biomod_modelling_by_park
.R
Script used for modeling CES using data from social media by fitting one ENM for each park and CES.
Biomod_modelling_all_parks
.R
Script used for modeling CES using data from social media by fitting one ENM for each CES.
Modeling to get output data
Downloading and processing variables
EFAS
EFAs code
.js
GEE scripts used to download the Ecosystem Functional Attributes (EFAs) (Paruelo et al. 2001; Alcaraz-Segura et al. 2006) derived from Sentinel-2 data for each of the national parks studied
OSM
1) Download layers
.py
Python scripts used to download the OpenStreetMap layers of interest for each of the national parks studied.
2) Join layers
.py
Scripts used to merge OSM layers belonging to the same category. e.g., primary, secondary, and tertiary highways.
3) Count point
.py
Scripts used to count the number of points in each cell of the 100-meter grid for each park; used for point-type data.
4) Presence and absence
.py
Scripts used to assess presence in each cell of the 100-meter grid for each park; used for point, line, and polygon data.
Remote sensing
Canopy
.js
GEE scripts used to download and crop the canopy layer (https://gee-community-catalog.org/projects/canopy/) for each of the national parks studied
ESPI
.js
GEE scripts used to download and crop the ESPI (Ecosystem Service Provision Index) for each of the national parks studied
European disturbance map
.js
GEE scripts used to download and crop European disturbance maps (https://www.eea.europa.eu/data-and-maps/figures/biogeographical-regions-in-europe-2) for each of the national parks studied
LST
.js
GEE scripts used to download and crop LST maps (from the Landsat Collection) for each of the national parks studied
Night lights
.js
GEE scripts used to download and crop nighttime light maps (https://developers.google.com/earth-engine/datasets/catalog/NOAA_VIIRS_DNB_ANNUAL_V22) for each of the national parks studied
Population density
.js
GEE scripts used to download and crop population density maps (https://developers.google.com/earth-engine/datasets/catalog/CIESIN_GPWv411_GPW_Population_Density) for each of the national parks studied
Soil groups
.js
GEE scripts used to download and crop Hydrologic Soil Group maps (https://gee-community-catalog.org/projects/hihydro_soil/) for each of the national parks studied
Solar radiation
.js
GEE scripts used to download and crop solar radiation maps (https://globalsolaratlas.info/support/faq) for each of the national parks studied
RGB diversity
Seasonal KMeans clustering
.js
GEE scripts used to calculate seasonal clusters from Sentinel-2 RGB bands with GEE's wekaKMeans clusterer. These layers were downloaded and cropped for each of the national parks studied.
Colour diversity analysis
.R
R script used to calculate spectral diversity (Shannon, Simpson and inverse Simpson) using the cluster layers and RGB bands derived from Sentinel 2.
Post processing
Align_and_Clip_rasters
.py
Python scripts used to align and clip the downloaded layers to a 100-meter grid reference layer for each of the national parks studied.
Outputs
CES projections
proj_Aiguestortes_Sports_ensemble
.tif
Spatial projections for the best models obtained for each CES and park
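The Shannon, Simpson, and inverse Simpson indices named in the colour diversity entry of Table 2 can be computed from the cluster labels found in a window of pixels, as in the sketch below. The dataset's own script is written in R; this Python version, its function name, and the choice of the Gini-Simpson form (1 minus the sum of squared proportions) for "Simpson" are assumptions made for illustration.

```python
from collections import Counter
from math import log


def diversity_indices(labels):
    """Return (Shannon, Simpson, inverse Simpson) for a list of class labels."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]
    sum_of_squares = sum(p * p for p in proportions)
    shannon = -sum(p * log(p) for p in proportions)  # natural logarithm
    simpson = 1 - sum_of_squares                     # Gini-Simpson form (assumed)
    inverse_simpson = 1 / sum_of_squares
    return shannon, simpson, inverse_simpson


# Cluster labels of the pixels inside one hypothetical 100-meter cell.
shannon, simpson, inverse_simpson = diversity_indices(["a", "a", "b", "b"])
```

A cell covered by a single cluster gives a Shannon and Simpson value of 0 and an inverse Simpson value of 1, the minimum diversity under all three indices.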
References:
Alcaraz-Segura, D., Paruelo, J., and Cabello, J. 2006: Identification of current ecosystem functional types in the Iberian Peninsula, Global Ecol. Biogeogr., 15, 200–212, https://doi.org/10.1111/j.1466-822X.2006.00215.x
Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M., 2017. Climatologies at high resolution for the earth’s land surface areas. Sci Data 4, 170122. https://doi.org/10.1038/sdata.2017.122
Lobo, J.M., Jiménez-Valverde, A., Hortal, J., 2010. The uncertain nature of absences and their importance in species distribution modelling. Ecography 33, 103–114. https://doi.org/10.1111/j.1600-0587.2009.06039.x
Paruelo, J. M., Jobbágy, E. G., and Sala, O. E. 2001: Current Distribution of Ecosystem Functional Types in Temperate South America, Ecosystems, 4, 683–698, https://doi.org/10.1007/s10021-001-0037-9
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S., 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197. https://doi.org/10.1890/07-2153.1
Thuiller, W., Georges, D., Gueguen, M., Engler, R., Breiner, F., Lafourcade, B., Patin, R., 2023. biomod2: Ensemble Platform for Species Distribution Modeling.
Sillero, N., Arenas-Castro, S., Enriquez‐Urzelai, U., Vale, C.G., Sousa-Guedes, D., Martínez-Freiría, F., Real,
https://spdx.org/licenses/CC0-1.0.html
Identifying environmental characteristics that limit species’ distributions is important for contemporary conservation and inferring responses to future environmental change. The Tasmanian native hen is an island-endemic flightless rail and a survivor of a prehistoric extirpation event. Little is known about the regional-scale environmental characteristics influencing the distribution of native hens, or how their future distribution might be impacted by environmental shifts (e.g., climate change). Using a combination of local fieldwork and species distribution modelling, we assess environmental factors shaping the contemporary distribution of the native hen, and project future distribution changes under predicted climate change. We find 37.2% of Tasmania is currently suitable for the native hens, owing to low summer precipitation, low elevation, human-modified vegetation, and urban areas. Moreover, in unsuitable regions, urban areas can create ‘oases’ of habitat, able to support populations with high breeding activity by providing resources and buffering against environmental constraints. Under climate change predictions, native hens were predicted to lose only 5% of their occupied range by 2055. We conclude that the species is resilient to climate change and benefits overall from anthropogenic landscape modifications. As such, this constitutes a rare example of a flightless rail that has adapted to human activity.
Methods
Local-scale factor measurements (fieldwork)
We selected geographically distant populations presenting different rainfall profiles during the late-autumn to spring period, April-November 2019, as rainfall is an important factor for native hens’ survival and reproduction (Ridpath, 1972a; Lévêque, 2022): ‘East’ (wukaluwikiwayna/Maria Island National Park; 42°34'51"S 148°03'56"E), ‘North’ (Narawntapu National Park; 41°08'53"S 146°36'52"E), and ‘West’ (adjacent to the town of Zeehan [712 inhabitants]; 41°53'03"S 145°19'56"E).
The period April-November corresponds to the six-month period preceding the midpoint of the breeding season, generally used for native hens’ surveys (Goldizen et al., 1998; Lévêque, 2022). All three populations were surveyed between the 10th and the 22nd of November 2019 (late spring, at the midpoint of the breeding season) to determine population structure (total number of groups, group composition [number of adults and young], and breeding activity). Each population was monitored over two to five days, depending on habitat complexity and extent of the population area, until all native hens in the area had been surveyed, i.e., when the territories’ structure was found identical at least four times for populations with no previous data (‘North’ and ‘West’), and at least two times in well-known populations (‘East’; Lévêque, 2022), over two different half-days. To align with methods used by Lévêque (2022), we used territory mapping (Bibby et al., 2000; Gibbons & Gregory, 2006), as native hens maintain year-round territories and population sizes were measurable with our survey methodology. Territory mapping consists of establishing the location of birds over a number of visits to obtain distinct clusters representing each territory. Boundaries are determined by vocal disputes between neighbours, which are frequent in native hens. During each survey, a minimum of two observers conducted repeated group identification, based on location, neighbours’ location, and number of individuals per group (from two to five individuals per group in this study). The number of individuals and their age category (fledgling, juvenile, or adult) were recorded per territory. The total pasture area surveyed per population, and the total pasture area occupied by native hens were: North population: 2.0 km2 (1.3 km2 occupied); West population: 1.5 km2 (0.7 km2 occupied); East population: 1.5 km2 (0.6 km2 occupied).
We measured environmental characteristics in the native-hens’ territory following methods established by Goldizen et al. (1998) to obtain quantitative measures of i) protection cover, ii) water availability, and iii) food availability; these parameters are important for native hen reproduction (Goldizen et al., 1998).
Protection cover was determined as the length (m) of the interface between dense patches of bushes and pasture, used by native hens for hiding and protecting chicks against predators (Lévêque, 2022). It is an important parameter for breeding success (Goldizen et al., 1998). We measured the total protection cover available to native hens in each population using satellite data from Google Maps (www.google.com/maps, accessed on 09/12/2019). For measures of food availability (grass) on territories, we selected random transects of a total length of 1 m across all territories (East: n = 15, North: n = 26, West: n = 22). Vegetation characteristics were measured and recorded every 2 cm along each transect, including the percentage of i) total vegetation cover, ii) green vegetation, iii) vegetation cover that was grass, iv) vegetation cover that was moss, and v) the grass height (average length of grass blades). The same observer (LL) recorded all measures. Water availability on territories was recorded as territories that had access to water (running or stagnant) at the time the surveys were undertaken. Rainfall data were collected from the Bureau of Meteorology (B.O.M.; www.bom.gov.au/climate/data) at the three population sites: North population at Port Sorell (Narawntapu National Park, 4 km away from the population site), West population at Zeehan (West Coast Pioneers Museum), East population at Maria Island (Darlington). Rainfall was reported as the amount of rainwater that had accumulated i) during the six months prior to the breeding season midpoint (31/10/2019; following Goldizen et al. (1998)) and ii) during summer (December-February). Information on recent droughts (on a 3- to 11-month period prior to 31/10/2019) was assessed using values on rainfall percentile deficiency (below the 10th percentile) from B.O.M. (http://www.bom.gov.au/climate/drought/#tabs=Rainfall-tracker). The 6-, 7-, and 12-month periods were not accessible. B.O.M. defines the category ‘Serious deficiency’ as rainfall that “lies above the lowest five percent of recorded rainfall but below the lowest ten percent (decile range 1) for the period in question”, and ‘severe deficiency’ as “rainfall is among the lowest five percent for the period in question”.
Species Distribution Modelling
Data preparation
We collected presence-point data for native hens across Tasmania from the Atlas of Living Australia (ALA: www.ala.org.au; accessed 19 February 2021). We additionally included data from BirdLife Tasmania, the Department of Primary Industries, Water and Environment (DPIPWE) reports, and our personal observations, resulting in a total of 23,923 occurrences. Our study area included the Tasmanian mainland and nearby islands. A large area in the south-west of Tasmania was removed because native hen distribution is not well documented there; the species is thought to be rare or absent in this region due to the large proportion of button grass vegetation creating unsuitable habitat (Fig. S2). All subsequent analyses were undertaken in Program R v4.0.4 (R Core Team, 2021). Duplicates were removed by converting presence points into grid presences at 1 km2 resolution and retaining one native hen observation per grid cell (n = 2447 grid points after this step). Occurrences were visually inspected for any potential errors/outliers from outside Tasmania and Tasmanian islands: this removed seven false occurrences on King and Flinders islands and two observations in freshwater inland lakes (Lake Crescent and Great Lake). As true-absence records were mostly unavailable, we generated pseudoabsences for sites where other land-bird species had been recorded (indicating observation effort at that point), but without native hen detections (Hanberry et al., 2012; Amin et al., 2021; Barlow et al., 2021). Native hens are large-bodied, ground-dwelling, active in the day, and have a loud, distinct call, all of which account for high detectability when present at a location. We extracted these data from ALA, with 780,499 possible observations on the Tasmanian mainland and all nearby islands.
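The duplicate-removal step (one observation retained per 1 km2 grid cell) can be sketched as below. The authors worked in R; this Python version assumes that coordinates are already in a projected system measured in metres and keeps the first point encountered in each cell, both of which are illustrative assumptions.

```python
from math import floor


def thin_to_grid(points, cell_size_m=1000.0):
    """Keep one (x, y) point per square grid cell of the given size."""
    kept = {}
    for x, y in points:
        cell = (floor(x / cell_size_m), floor(y / cell_size_m))
        # setdefault keeps the first point seen in each cell.
        kept.setdefault(cell, (x, y))
    return list(kept.values())


# Four hypothetical observations falling in two 1 km cells.
observations = [(100, 200), (900, 950), (1500, 200), (1501, 210)]
thinned = thin_to_grid(observations)
```

Which point is retained within a cell does not matter for a presence grid, since only the occupied cell is carried into the model.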
We then excluded all grid cells with a native hen presence and removed any records within 3 km of native hen records: this value was chosen because it is the dispersal distance under which a native hen can naturally move outside of its territory (Ridpath, 1972a). This process resulted in 3,222 pseudoabsence grid points. Citizen-science datasets offer unique opportunities to study a species distribution using ‘crowd-sourced’ effort; however, they tend to be access-biased and have non-random, clustered observations, leading to overrepresentation of certain regions and biases towards some environmental conditions (usually near urban areas; Steen et al., 2021). One way to reduce spatial autocorrelation is to selectively de-cluster occurrences in biased areas using a pre-defined (minimum linear) Nearest Minimum-neighbour Distance (NMD; Pearson et al. 2007). As un-urbanised, sparsely populated areas have the least spatial point clustering (and hence spatial bias), the average number of observations in areas of low human density provides the threshold number of records that can be used to tune and select the optimal NMD (Amin et al., 2021). Therefore, we subdivided our data on a grid of 25 km2 cells to be relevant to the metric of human density and used the median of the population density index (excluding cells < 1 human/km2) to define thresholds for low and high density. Population density was extracted from the ‘2011 Census of Population and Housing across Australia’ (bit.ly/3bth7W9). ‘Low density’ was defined as < 6 people/km2 and ‘High density’ as
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This MSOA atlas provides a summary of demographic and related data for each Middle Super Output Area in Greater London. The average population of an MSOA in London in 2010 was 8,346, compared with 1,722 for an LSOA and 13,078 for a ward. The profiles are designed to provide an overview of the population in these small areas by combining a range of data on the population, births, deaths, health, housing, crime, commercial property/floorspace, income, poverty, benefits, land use, environment, deprivation, schools, and employment. If you need to find an MSOA and you know the postcode of the area, the ONS NESS search page has a tool for this. The MSOA Atlas is available as an XLS as well as being presented using InstantAtlas mapping software. This is a useful tool for displaying a large amount of data for numerous geographies in one place (requires HTML 5). The atlas is presented for both the current MSOA boundaries (2011) and the previous MSOA boundaries (2001).
NB. It is currently not possible to export the map as a picture due to a software issue with the Google Maps background. We advise you to use print screen to copy an image to the clipboard.
Tips:
- Select a new indicator from the Data box on the left. Select the theme, then indicator and then year to show the data.
- To view data just for one borough, use the filter tool.
- The legend settings can be altered by clicking on the pencil icon next to the MSOA tick box within the map legend.
- The areas can be ranked in order by clicking at the top of the indicator column of the data table.
Themes included here are Census 2011 Population, Mid-year Estimates, Population by Broad Age, Households, Household composition, Ethnic Group, Country of Birth, Language, Religion, Tenure, Dwelling type, Land Area, Population Density, Births, General Fertility Rate, Deaths, Standardised Mortality Ratio (SMR), Population Turnover Rates (per 1000), Crime (numbers), Crime (rates), House Prices, Commercial property (number), Rateable Value (£ per m2), Floorspace ('000s m2), Household Income, Household Poverty, County Court Judgements (2005), Qualifications, Economic Activity, Employees, Employment, Claimant Count, Pupil Absence, Early Years Foundation Stage, Key Stage 1, GCSE and Equivalent, Health, Air Emissions, Car or Van availability, Income Deprivation, Central Heating, Incidence of Cancer, Life Expectancy, and Road Casualties. The London boroughs are: City of London, Barking and Dagenham, Barnet, Bexley, Brent, Bromley, Camden, Croydon, Ealing, Enfield, Greenwich, Hackney, Hammersmith and Fulham, Haringey, Harrow, Havering, Hillingdon, Hounslow, Islington, Kensington and Chelsea, Kingston upon Thames, Lambeth, Lewisham, Merton, Newham, Redbridge, Richmond upon Thames, Southwark, Sutton, Tower Hamlets, Waltham Forest, Wandsworth, Westminster. These profiles were created using the most up-to-date information available at the time of collection (Spring 2014). You may also be interested in the LSOA Atlas and Ward Atlas.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dear Scientist!

This database contains data collected for the study "Analysis of the route safety of abnormal vehicle from the perspective of traffic parameters and infrastructure characteristics with the use of web technologies and machine learning", funded by the National Science Centre Poland (grant reference 2021/05/X/ST8/01669). The file structure follows from the aims of the study and from the numerous sources needed to assemble data suitable as the input layer of a neural network. You will find the following folders and files:

1. Road_Parameters_Data (.csv) - data collected by the author before the study (2021). It contains information about the technical quality and types of the main roads in Mazovia province (Poland). The source was the Polish General Directorate for National Roads and Motorways.
2. Google_Maps_Data (.json) - data collected by the authors' web service, written in Python, which queried the Distance Matrix API service of Google Maps at two-hour intervals from 25 May 2022 to 22 June 2022. The application retrieved the TRAFFIC FACTOR parameter, the ratio of the actual travel time to the historical travel time for a particular road.
3. Geocoding_Roads_Data (.json) - data obtained by reverse geocoding from geographical coordinates, using the request parameter latlng. Google Maps returned a response containing the postal code for the field types defined as postal_code and the name of the lowest possible level of the territorial unit for the field administrative_area_level.
4. Population_Density_Data (.csv) - data for territorial units. The territorial units assigned to individual records were used to search the database of the Polish Postal Service using the authors' original web service written in Python. Records that contained a postal code were assigned the name of the corresponding municipality. Finally, postal codes and names of territorial units were compared with the database of Statistics Poland (GUS), which contains population density for individual municipalities, and the density was assigned to the existing records.
5. Roads_Incidents_Data (.json) - data collected by a web service, written in Python, that analysed the reported obstructions available on the website of the General Directorate for National Roads and Motorways. When a traffic obstruction occurred in Mazovia Province, the application used the number and kilometre of the road on which it occurred to associate it later with the appropriate records based on the links parameters. The data were collected from 26 May to 22 June 2022.
6. Weather_For_Roads_Data (.json) - data on the weather conditions on the roads on the days of the study. For this purpose a web service was written in Python that retrieved selected items from the response returned by the www.timeanddate.com server for the corresponding input parameters: the geographical coordinates of the midpoint between the nodes of each road. The data were collected for days between 27 May and 22 June 2022.
7. data_v_1 (.csv) - collected data for road parameters only
8. data_v_2 (.csv) - collected data for road parameters + population density
9. data_v_3 (.json) - collected data for road parameters + population density + traffic
10. data_v_4 (.json) - collected data for road parameters + population density + traffic + weather + road incidents
11. data_v_5 (.csv) - collected VALIDATED and cleaned data for road parameters + population density + traffic + weather + road incidents

At the data_v_5 stage, the road sections for which the traffic factor was judged to have been estimated incorrectly were eliminated. These were sections for which the value of the traffic factor remained the same regardless of the time of day, or which took only a few repeated values over the course of the whole study. It was also assumed that the final database should consist of road sections for which traffic factor values less than 1.2 constitute at least 10% of all results. Thus, the sections with no tendency to become congested and characterized by a small number of road traffic users were eliminated.

Good luck with your research!
Igor Betkier, PhD
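The traffic factor and the first cleaning rule described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the sample travel times, and the `min_distinct` cut-off are assumptions made for the example, since the description does not state how many repeated values counted as "several".

```python
from collections import defaultdict


def traffic_factor(actual_s: float, historical_s: float) -> float:
    """Ratio of actual travel time to historical travel time."""
    return actual_s / historical_s


def drop_flat_sections(records, min_distinct=3):
    """Return the ids of road sections whose traffic factor actually varies.

    records: iterable of (section_id, traffic_factor) pairs.
    min_distinct: hypothetical cut-off; the source does not state one.
    """
    values = defaultdict(set)
    for section_id, factor in records:
        values[section_id].add(round(factor, 3))
    return {sid for sid, vals in values.items() if len(vals) >= min_distinct}


# Illustrative observations (section, actual seconds, historical seconds);
# these numbers are invented, not taken from the dataset.
observations = [
    ("A", 600, 500), ("A", 550, 500), ("A", 700, 500),
    ("B", 500, 500), ("B", 500, 500), ("B", 500, 500),
]
records = [(sid, traffic_factor(a, h)) for sid, a, h in observations]
kept = drop_flat_sections(records)
```

Section "B" never departs from a factor of 1.0, so only "A" survives the filter in this toy input.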
Walk Score measures the walkability of any address using a patented system developed by the Walk Score company. For each 2010 Census Tract centroid, Walk Score analyzed walking routes to nearby amenities. Points are awarded based on the distance to amenities in each category. Amenities within a 5 minute walk (.25 miles) are given maximum points. A decay function is used to give points to more distant amenities, with no points given after a 30 minute walk. Walk Score also measures pedestrian friendliness by analyzing population density and road metrics such as block length and intersection density. Data sources include Google, Education.com, Open Street Map, the U.S. Census, Localeze, and places added by the Walk Score user community.
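The distance-based scoring can be illustrated with a small sketch. The actual Walk Score decay function is not given here, so the linear decay below is purely an assumption, as is the 1.5-mile cut-off, which extrapolates the stated pace of 0.25 miles per 5 minutes to a 30-minute walk.

```python
def amenity_points(distance_miles, max_points=1.0, full_until=0.25, zero_after=1.5):
    """Illustrative distance-decay scoring for one amenity.

    Full points up to `full_until` miles, no points beyond `zero_after`
    miles, and an ASSUMED linear decay in between.
    """
    if distance_miles <= full_until:
        return max_points
    if distance_miles >= zero_after:
        return 0.0
    remaining = (zero_after - distance_miles) / (zero_after - full_until)
    return max_points * remaining


nearby = amenity_points(0.1)     # within a 5 minute walk
halfway = amenity_points(0.875)  # midway through the decay range
too_far = amenity_points(2.0)    # beyond a 30 minute walk
```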
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was developed for the project of analyzing the transport network in the Mazowieckie Voivodeship and comprises a wide range of traffic-related information. The data were collected from various sources, including road technical quality and road incident data from the Polish General Directorate for National Roads and Motorways, travel time information from Google Maps, data obtained from reverse geocoding, population density data from the GUS database, and specific weather conditions for roads.
Key Features of the Dataset:
Multidimensional Information: The dataset includes information on the date, days of the week, holidays, time (in minutes), and various temporal parameters (T1 - T24).
Road and Node Identifiers: Each record contains identifiers for the road (roadId), the start node (start_node), and the end node (end_node).
Traffic Factors: It includes key traffic information such as the traffic factor (trafficFactor), midlongitude and midlatitude of the road segment (midLongitude, midLatitude), and details about the number of lanes, road width, presence of two-way traffic (two_ways), and traffic density (density).
Weather Conditions: The dataset accounts for various weather conditions, including heavy rain, partial rain, no rain, partial clouds, heavy clouds, clear sky, storms, and fog.
Prediction Outcomes: Data include results on traffic speed (result_speed) and conditions such as shuttle traffic (result_shuttle), full (result_fullyclosed) and partial (result_partiallyclosed) road closures, and the presence of traffic lights (result_trafficlight).
Data Collection Period:
Traffic data were collected from May 25, 2022, to June 22, 2022, providing a comprehensive view of traffic conditions over a nearly one-month period.
Data Preparation Process:
The collected data were unified and processed to create one large CSV file. This file was then divided into 384 smaller files, each representing the state of the transport network at a specific moment. This dataset forms a comprehensive basis for analyzing and forecasting traffic conditions, offering extensive possibilities for use in machine learning models.
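Splitting one large table into per-moment snapshots could look roughly like the sketch below. The column names `date` and `time` and the sample rows are assumptions for illustration; the real files may use different headers, and the sketch groups rows in memory instead of writing the 384 files.

```python
import csv
import io
from collections import defaultdict


def split_by_moment(csv_text, key_columns=("date", "time")):
    """Group the rows of one large CSV into one row list per moment."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[tuple(row[c] for c in key_columns)].append(row)
    return dict(groups)


# Invented sample rows; only the idea of one snapshot per moment matters.
sample = """date,time,roadId,trafficFactor
2022-05-25,0,1,1.05
2022-05-25,0,2,1.30
2022-05-25,120,1,1.10
"""
snapshots = split_by_moment(sample)
```

Each key of `snapshots` identifies one moment, and its value holds every road record observed at that moment.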
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The LSOA atlas provides a summary of demographic and related data for each Lower Super Output Area in Greater London. The average population of an LSOA in London in 2010 was 1,722 compared with 8,346 for an MSOA and 13,078 for a ward. The profiles are designed to provide an overview of the population in these small areas by combining a range of data on the population, diversity, households, health, housing, crime, benefits, land use, deprivation, schools, and employment.

Due to significant population change in some areas, not all 2011 LSOA boundaries are the same as previous LSOA boundaries that had been used from 2001. A lot of data is still only available using the 2001 boundaries, therefore two Atlases have been created: one using the current LSOA boundaries (2011) and one using the previous boundaries (2001). If you need to find an LSOA and you know the postcode of the area, the ONS NESS search page has a tool for this.

The LSOA Atlas is available as an XLS as well as being presented using InstantAtlas mapping software. This is a useful tool for displaying a large amount of data for numerous geographies in one place (requires HTML 5).

CURRENT LSOA BOUNDARIES (2011)
NOTE: There is comparatively less data for the new boundaries compared with the old boundaries.

PREVIOUS LSOA BOUNDARIES (2001)
For 2011 Census data used in the 2001 Boundaries Atlas: For simplicity, where two or more areas have been merged, the figures for these areas have been divided by the number of LSOAs that used to make that area up. Therefore, these data are not official ONS statistics, but are presented here as indicative to display trends.

NB. It is currently not possible to export the map as a picture due to a software issue with the Google Maps background. We advise you to print screen to copy an image to the clipboard.

IMPORTANT: Due to the large amount of data and areas, the LSOA Atlas may take up to a minute to fully load.
Once loaded, the report will work more efficiently by using the filter tool and selecting one borough at a time. Displaying every LSOA in London will slow down the data reload.

Tips:
- Select a new indicator from the Data box on the left. Select the theme, then indicator and then year to show the data.
- To view data just for one borough, use the filter tool.
- The legend settings can be altered by clicking on the pencil icon next to the LSOA tick box within the map legend.
- The areas can be ranked in order by clicking at the top of the indicator column of the data table.

Beware of large file size for 2001 Boundary Atlas (58MB); alternatively download Zip file (21MB).

Themes included in the atlases are Census 2011 population, Mid-year Estimates by age, Population Density, Households, Household Composition, Ethnic Group, Language, Religion, Country of Birth, Tenure, Number of dwellings, Vacant Dwellings, Dwellings by Council Tax Band, Crime (numbers), Crime (rates), Economic Activity, Qualifications, House Prices, Workplace employment numbers, Claimant Count, Employment and Support Allowance, Benefits claimants, State Pension, Pension Credit, Incapacity Benefit/ SDA, Disability Living Allowance, Income Support, Financial vulnerability, Health and Disability, Land use, Air Emissions, Energy consumption, Car or Van access, Accessibility by Public Transport/walk, Road Casualties, Child Benefit, Child Poverty, Lone Parent Families, Out-of-Work families, Fuel Poverty, Free School Meals, Pupil Absence, Early Years Foundation Stage, Key Stage 1, Key Stage 2, GCSE, Level 3 (e.g. A/AS level), The Indices of Deprivation 2010, Economic Deprivation Index, and The IMD 2010 Underlying Indicators.
The London boroughs are: City of London, Barking and Dagenham, Barnet, Bexley, Brent, Bromley, Camden, Croydon, Ealing, Enfield, Greenwich, Hackney, Hammersmith and Fulham, Haringey, Harrow, Havering, Hillingdon, Hounslow, Islington, Kensington and Chelsea, Kingston upon Thames, Lambeth, Lewisham, Merton, Newham, Redbridge, Richmond upon Thames, Southwark, Sutton, Tower Hamlets, Waltham Forest, Wandsworth, Westminster. These profiles were created using the most up to date information available at the time of collection (Spring 2014). You may also be interested in MSOA Atlas and Ward Atlas.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This score equally weights two components: 1) female adult obesity rate, and 2) elderly population. Female adult obesity rate is defined as the percentage of female adults aged 15 to 49 with a body mass index of over 30. This indicator is only available for females. Elderly population is defined as the number of people over the age of 50. Indicators are combined with z-scores at the 1km squared grid level.
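Combining two indicators with equally weighted z-scores can be sketched as below. The grid-cell values are invented, and using the population standard deviation is an assumption, since the description does not say which variant Fraym applies.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) std deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def composite_score(obesity_rate, elderly_count):
    """Equal-weight average of the two indicators' z-scores, cell by cell."""
    z_a, z_b = z_scores(obesity_rate), z_scores(elderly_count)
    return [(a + b) / 2 for a, b in zip(z_a, z_b)]


# Invented values for three 1 km grid cells.
obesity_rate = [10.0, 20.0, 30.0]
elderly_count = [100.0, 300.0, 200.0]
scores = composite_score(obesity_rate, elderly_count)
```

Standardizing first puts a percentage and a head count on the same scale, which is what makes an equal weighting meaningful.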
Source: Fraym 2020
The Fraym platform weaves together the latest satellite imagery and geostatistical datasets with professionally enumerated household surveys. This allows for the disaggregation and re-aggregation of large datasets to cover any geographically bounded area. Indicators are drawn and harmonized from a wide variety of household surveys and other data sources. These include the following sources:
USAID: Demographic and health surveys
United Nations: UN population division database
World Bank: Enterprise surveys, living standards, global index surveys, and respective country statistics
National Statistical Offices: National censuses and surveys covering population, businesses, health, housing, agriculture, and other areas
International Monetary Fund: World economic outlook databases and respective country statistics
National Aeronautics and Space Administration: Remote sensing satellite data, such as vegetation, temperature, and precipitation
USGS: Landscan, Google Earth, GeoData Institute, OSM
WorldPop: Population density by age groups
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The exposure risk score equally weights three components: 1) population density, 2) proximity to others in the household and 3) water, sanitation, and hygiene. Three indicators are equally weighted within the proximity to others in the household component: intergenerational households (household has a child less than eleven years old and a person above 60 years old), average number of household members per sleeping room, and adults aged 15 to 49 employed in essential occupations (health professions, construction, manufacturing, transport, sales, textiles, protective services). Five indicators are equally weighted within the water, sanitation, and hygiene component: soap for hand washing inside the home is unavailable, water for handwashing inside the home is unavailable, household must walk 30 or more minutes roundtrip to collect drinking water, household shares toilet with other households or does not have a toilet, and household does not have piped-in drinking water. Population density is defined as population at the 1km squared level. Indicators within components are combined with z-scores at the 1km squared grid level.
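The nested equal weighting implies a different effective weight for each individual indicator. The short calculation below works that out. The dictionary keys are labels made up for this example, and the counts come from the description above (one indicator for population density, three for proximity, five for water, sanitation, and hygiene).

```python
from fractions import Fraction

# Number of equally weighted indicators inside each component.
components = {
    "population_density": 1,
    "proximity_in_household": 3,
    "water_sanitation_hygiene": 5,
}

# Each component gets an equal share of the overall score.
component_weight = Fraction(1, len(components))

# Each indicator gets an equal share of its component's weight.
indicator_weight = {name: component_weight / n for name, n in components.items()}

# Sanity check: all indicator weights together account for the whole score.
total = sum(w * components[name] for name, w in indicator_weight.items())
```

So a single proximity indicator carries 1/9 of the score and a single water, sanitation, and hygiene indicator 1/15, while population density alone carries 1/3.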
Source: Fraym 2020
The Fraym platform weaves together the latest satellite imagery and geostatistical datasets with professionally enumerated household surveys. This allows for the disaggregation and re-aggregation of large datasets to cover any geographically bounded area. Indicators are drawn and harmonized from a wide variety of household surveys and other data sources. These include the following sources:
USAID: Demographic and health surveys
United Nations: UN population division database
World Bank: Enterprise surveys, living standards, global index surveys, and respective country statistics
National Statistical Offices: National censuses and surveys covering population, businesses, health, housing, agriculture, and other areas
International Monetary Fund: World economic outlook databases and respective country statistics
National Aeronautics and Space Administration: Remote sensing satellite data, such as vegetation, temperature, and precipitation
USGS: Landscan, Google Earth, GeoData Institute, OSM
WorldPop: Population density by age groups
The World Values Survey (www.worldvaluessurvey.org) is a global network of social scientists studying changing values and their impact on social and political life, led by an international team of scholars, with the WVS association and secretariat headquartered in Stockholm, Sweden. The survey, which started in 1981, seeks to use the most rigorous, high-quality research designs in each country. The WVS consists of nationally representative surveys conducted in almost 100 countries which contain almost 90 percent of the world’s population, using a common questionnaire. The WVS is the largest non-commercial, cross-national, time series investigation of human beliefs and values ever executed, currently including interviews with almost 400,000 respondents. Moreover the WVS is the only academic study covering the full range of global variations, from very poor to very rich countries, in all of the world’s major cultural zones. The WVS seeks to help scientists and policy makers understand changes in the beliefs, values and motivations of people throughout the world. Thousands of political scientists, sociologists, social psychologists, anthropologists and economists have used these data to analyze such topics as economic development, democratization, religion, gender equality, social capital, and subjective well-being. These data have also been widely used by government officials, journalists and students, and groups at the World Bank have analyzed the linkages between cultural factors and economic development.
The Survey covers China.
The WVS for China covers national population aged 18 years and over, for both sexes.
Sample survey data [ssd]
To meet the requirement of overall coverage of Chinese adults, including the migrant population, GPS/GIS Assistant Area Sampling was used in this survey. Respondents are sampled through stratified, multi-stage PPS (probability proportional to size) sampling. With careful consideration of representativeness, feasibility, and budgetary constraints, it was decided that this project would draw a subsidiary probability sample out of an RCCC's previous national survey, Social Inequality and Distributive Justice in China, conducted in 2004. The 2004 survey was a national survey conducted throughout the country. The target population was the same as the one defined for this survey. Meanwhile, through the stratification, the proportionally allocated multi-stage PPS technique was employed in order to obtain self-weighted household samples.

Sampling Frames: A GIS dataset was established as the sampling frame for this project, which was based on: 1) township level population data from the 2000 Census, 2) the most recent and detailed (paper and electronic) maps, 3) the highest possible resolution images from Google Earth. Compiling all the information above, the population density was calculated for each of the HSMs in township level units. Within the target population, there were 3,004 half-square degrees (HSDs) of latitude and longitude for the first stage sampling. The total population was 1,242,612,226.
Sampling Processes:
1) Out of 3,004 half-square degrees (HSDs) in China, 40 HSDs were chosen by PPS.
2) Two OSMs were selected by PPS within each of the selected HSDs.
3) One HSM was drawn by PPS within each of the selected OSMs. The measure of size (MOS) used at these stages was the density of the population per sampling unit.
4) Within each of the selected HSMs, the number of SSSs (90m*90m) was calculated based on the population density, and the SSSs were then selected by simple random sampling.
5) Trained surveyors equipped with GPS receivers were then sent to locate and enumerate the sampled spatial square seconds (SSSs). To maintain equal probabilities of selection across households, all dwellings enumerated in the SSSs were included in the sample. Using systematic sampling, we drew 50 dwellings in each HSM.
6) Respondents were selected from dwellings using the Kish Grid method.
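A PPS draw such as the ones in the first three stages can be sketched with the cumulative-total (systematic) method shown below. The survey documentation does not say which PPS procedure was used, so this is only one common variant, and the unit sizes are invented.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Pick n unit indices with probability proportional to size.

    Uses a random start and a fixed step along the cumulative size totals.
    A unit larger than the step can be selected more than once.
    """
    step = sum(sizes) / n
    start = rng.uniform(0, step)
    cumulative = list(accumulate(sizes))
    picks, unit = [], 0
    for i in range(n):
        target = start + i * step
        # Advance to the first unit whose cumulative total reaches the target.
        while unit < len(cumulative) - 1 and cumulative[unit] < target:
            unit += 1
        picks.append(unit)
    return picks


# Invented measures of size (e.g. population density per sampling unit).
sizes = [10, 20, 70]
picks = pps_systematic(sizes, n=2, rng=random.Random(0))
```

With these sizes the step is 50, so the second selection always falls in the largest unit, which shows how bigger units are favoured in proportion to their size.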
The sample size for China is N=1991.
Face-to-face [f2f]
The sample size was set at approximately 2,800 eligible individuals, to be drawn from the target population defined above in all provinces of China.
Target sample size: 2,873
Sample drawn in the field: 2,534
Completed, valid interviews: 1,991
Response rate: 78.6%
+/- 2,2%
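The reported figures can be checked with a few lines of arithmetic. The response rate follows directly from the counts above. Reading the ±2.2% as the conventional 95% margin of error for a simple random sample at p = 0.5 is an assumption, since the source does not state how it was computed, and a multi-stage design would normally also involve a design effect.

```python
import math

completed = 1991  # completed, valid interviews
drawn = 2534      # sample drawn in the field

# Response rate as completed interviews over the drawn sample.
response_rate = completed / drawn

# 95% margin of error for a proportion of 0.5 under simple random sampling
# (an ASSUMED interpretation of the reported +/- 2.2%).
margin = 1.96 * math.sqrt(0.5 * 0.5 / completed)
```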
ps-places-metadata-v1.01
This dataset comprises a pair of layers (points and polys) which attempt to better locate "populated places" in NZ. Populated places are defined here as settled areas, either urban or rural, where densities of around 20 persons per hectare exist, and something is able to be seen from the air.
The only liberally licensed placename dataset is currently LINZ geographic placenames, which has the following drawbacks:
- coordinates are not place centers but left most label on 260 series map
- the attributes are outdated
This dataset necessarily involves cleaving the linz placenames set into two, those places that are populated, and those unpopulated. Work was carried out in four steps. First placenames were shortlisted according to the following criteria:
- all places that rated at least POPL in the linz geographic places layer, ie POPL, METR or TOWN or USAT were adopted.
- Then many additional points were added from a statnz meshblock density analysis.
- Finally remaining points were added from a check against linz residential polys, and zenbu poi clusters.
Spelling is broadly as per linz placenames, but there are differences for no particular reason. Instances of LINZ all upper case have been converted to sentence case. Some places not presently in the linz dataset are included in this set, usually new places, or those otherwise unnamed. They appear with no linz id, and are not authoritative, in some cases just wild guesses.
Density was derived from the 06 meshblock boundaries (level 2, geometry fixed), multipart conversion, merging in 06 usually resident MB population then using the formula pop/area*10000. An initial urban/rural threshold level of 0.6 persons per hectare was used.
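The density formula converts a meshblock population and an area in square metres into persons per hectare, since one hectare is 10000 m2. A minimal sketch follows. The function names are made up for the example, and treating the 0.6 threshold as inclusive is an assumption.

```python
def persons_per_hectare(population, area_m2):
    """pop/area*10000: area in square metres, result in persons per hectare."""
    return population / area_m2 * 10000


def is_urban(population, area_m2, threshold=0.6):
    """Apply the initial urban/rural threshold (inclusive by ASSUMPTION)."""
    return persons_per_hectare(population, area_m2) >= threshold


# Invented meshblocks: 120 people on 4 ha, and 1 person on 10 ha.
dense = persons_per_hectare(120, 40000)
sparse_is_urban = is_urban(1, 100000)
```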
Step two was to trace the approx extent of each populated place. The main purpose of this step was to determine the relative area of each place, and to create an intersection with meshblocks for population. Step 3 involved determining the political center of each place, broadly defined as the commercial center.
Tracing was carried out at 1:9000 for small places, and 1:18000 for large places using either bing or google satellite views. No attempt was made to relate to actual town 'boundaries'. For example large parks or raceways on the urban fringe were not generally included. Outlying industrial areas were included somewhat erratically depending on their connection to urban areas.
Step 3 involved determining the centers of each place. Points were overlaid over the following layers by way of a base reference:
a. original linz placenames
b. OSM nz-locations points layer
c. zenbu pois, latest set as of 5/4/11
d. zenbu AllSuburbsRegions dataset (a heavily hand modified LINZ BDE extract derived dataset, courtesy Zenbu)
e. LINZ road-centerlines, sealed and highway
f. LINZ residential areas
g. LINZ building-locations and building footprints
h. Olivier and Co nz-urban-north and south
Therefore in practice, sources c and e form the effective basis of the point coordinates in this dataset. Be aware that e, f and g are referenced to the LINZ topo data, while c and d are likely referenced to whatever roading dataset google possesses. As such minor discrepancies may occur when moving from one to the other.
Regardless of the above, this place centers dataset was created using the following criteria, in order of priority:
To be clear the coordinates are manually produced by eye without any kind of computation. As such the points are placed approximately perhaps plus or minus 10m, but given that the roads layers are not that flash, no attempt was made to actually snap the coordinates to the road junctions themselves.
The final step involved merging in population from SNZ meshblocks (merge+sum by location of popl polys). Be aware that due to the inconsistent way that meshblocks are defined this will result in inaccurate populations; in particular, small places will collect population from their surrounding area. In any case the population will generally always overestimate by including meshblocks that just nicked the place poly. Also there are a couple of dozen cases of overlapping meshblocks between two place polys and these will double count, which I have so far made no attempt to fix.
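The overestimate described above is easy to reproduce in a toy version of the merge-and-sum step. The sketch uses axis-aligned boxes in place of real polygons, and all coordinates and populations are invented. It is not the GIS operation actually used.

```python
def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_population(place_box, meshblocks):
    """Sum the full population of every meshblock that touches the place."""
    return sum(pop for box, pop in meshblocks if intersects(place_box, box))


meshblocks = [
    ((0, 0, 10, 10), 100),  # lies inside the place
    ((9, 0, 30, 10), 50),   # only clips the edge, but is counted in full
    ((40, 0, 50, 10), 70),  # outside the place
]
place = (0, 0, 10, 10)
total = place_population(place, meshblocks)
```

The second meshblock contributes all 50 of its residents even though only a thin strip overlaps the place, which is the "just nicked the place poly" effect.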
Merged in also tla and regions from SNZ shapes, a few of the original linz attributes, and lastly grading the size of urban areas according to SNZ 'urban areas' criteria. Ie: class codes:
Note that while this terminology is shared with SNZ the actual places differ owing to different decisions being made about where one area ends and another starts, and what constitutes a suburb or satellite. I expect some discussion around this issue. For example I have included tinwald and washdyke as part of ashburton and timaru, but not richmond or waikawa as part of nelson and picton. I'm open to discussion on these.
No attempt has or will likely ever be made to locate the entire LOC and SBRB data subsets. We will just have to wait for NZFS to release what is thought to be an authoritative set.
Shapefiles are all nztm. Orig data from SNZ and LINZ was all sourced in nztm, via koordinates, or SNZ. Satellite tracings were in spherical mercator/wgs84 and converted to nztm by Qgis. Zenbu POIS were also similarly converted.
Shapefile: Points
id : integer unique to dataset
name : name of popl place, string
class : urban area size as above, integer
tcode : SNZ tla code, integer
rcode : SNZ region code, 1-16, integer
area : area of poly place features, integer in square meters
pop : 2006 usually resident population, being the sum of meshblocks that intersect the place poly features, integer
lid : linz geog places id
desc_code : linz geog places place type code

Shapefile: Polygons
gid : integer unique to dataset, shared by points and polys
name : name of popl place, string, where spelling conflicts occur points wins
area : place poly area, m2, integer
Clarification about the minorly derived nature of LINZ and google data needs to be sought. But pending these copyright complications, the actual points data is essentially an original work, released as public domain. I retain no copyright, nor any responsibility for data accuracy, either as is, or regardless of any changes that are subsequently made to it.
Peter Scott 16/6/2011
v1.01 minor spelling and grammar edits 17/6/11