Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The process of automatic generalization is one of the elements of spatial data preparation for the purpose of creating digital cartographic studies. The presented data include a part of the process of generalization of building groups obtained from the Open Street Map databases (OSM) [1].
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data used in the forthcoming “The modifiable areal unit problem in geospatial least-cost electrification modelling” publication.
The work describes how different methods of aggregating population data affect the results produced by the Open Source Spatial Electrification Tool (OnSSET, https://github.com/OnSSET). In the initial study, three countries were assessed: Benin, Malawi and Namibia. The countries were chosen for their different national population densities and starting electrification rates. This repository includes three zipped files, one for each country, containing the 26 input files used in the study. These input files were generated with the QGIS tools published in the OnSSET repository (https://github.com/onsset). The repository also contains a file describing the naming conventions for the results used and the summary files generated with OnSSET.
For more information on how to generate these datasets, please refer to the GitHub repository https://github.com/babakkhavari/MAUP and the corresponding publication (To Be Added).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The process of automatic generalization is one of the elements of spatial data preparation for the purpose of creating digital cartographic studies. The presented data include a part of the process of generalization of building groups obtained from the national geodesy and cartography resource from BDOT10k (10k topographic database) [1].
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In each experiment, 100 datasets were generated.
The 2006 Second Edition TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER database. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on the latest available governmental unit boundaries. The Census TIGER database represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set or the files can be combined to cover the whole Nation. The 2006 Second Edition TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. This shapefile represents the current State House Districts for New Mexico as posted on the Census Bureau website for 2006.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Health data and environmental data are commonly collected at different levels of aggregation. A persistent challenge of using a spatial regression model to link these data is that their associations can vary as a function of aggregation. This results in an ecological fallacy if an association found at one aggregation level is used for inference at another level. We address this challenge by presenting a hierarchically adaptable spatial regression model. In essence, the model extends the spatially varying coefficient model to allow the response to be count data at larger aggregation levels than that of the covariates. A Bayesian hierarchical approach is used to infer the model parameters. Robust inference and optimal prediction over geographical space and at different spatial aggregation levels are studied using simulated data sets. The spatial associations at different spatial supports differ substantially, but can be efficiently inferred when prior knowledge of the associations is available. The model is applied to study hand, foot and mouth disease (HFMD) in Da Nang city, Viet Nam. A decrease in vegetated areas corresponds with elevated HFMD risks. A study of the identifiability of the parameters shows a strong need for a highly informative prior distribution. We conclude that the model is robust to the underlying aggregation levels of the calibrating data for association inference and that it is ready for application in health geography.
The 2006 Second Edition TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER database. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on the latest available governmental unit boundaries. The Census TIGER database represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set or the files can be combined to cover the whole Nation. The 2006 Second Edition TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. This shapefile represents the current State Senate Districts for New Mexico as posted on the Census Bureau website for 2006.
The 2014 Austin Digital Assessment Project was supported by the Telecommunications & Regulatory Affairs Office of the City of Austin, the Telecommunications and Information Policy Institute at the University of Texas, and faculty and graduate students from the Department of Radio, Television, and Film and the University of Texas. This dataset includes the individual survey responses. To see the aggregated dataset weighted to reflect Austin demographics, refer to the attached document.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/0MER8P
The Record of American Democracy (ROAD) data provide election returns, socioeconomic summaries, and demographic details about the American public at unusually low levels of geographic aggregation. The NSF-supported ROAD project spans every state in the country from 1984 through 1990 (including some off-year elections). These data enable research on topics such as electoral behavior, the political characteristics of local community context, electoral geography, the role of minority groups in elections and legislative redistricting, split ticket voting and divided government, and elections under federalism. The data included in this particular collection contain all of the geographic boundary files, so that users may easily draw maps with the data. Documentation and frequently asked questions are available online at the ROAD Website. A downloadable PDF codebook is also available in the files section of this study.
Archived as of 6/26/2025: The datasets will no longer receive updates but the historical data will continue to be available for download. This dataset provides information related to access- and transportation-related claims. It contains information about the total number of patients, total number of claims, and total dollar amount, grouped by provider. It is restricted to claims with a service date between 01/2012 and 12/2017. Transportation claims are identified as billing provider type 26 and related category of service type. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Negative values mean that spatial aggregation estimates for peak measures were smaller than spatial aggregation differences for onset measures. Bolded values denote mean estimates that we interpret to have statistical significance; that is, the 95% credible intervals did not overlap with zero.
ssurgoOnDemand
The purpose of these tools is to give users the ability to get Soil Survey Geographic Database (SSURGO) properties and interpretations in an efficient manner. They are very similar to the United States Department of Agriculture - Natural Resources Conservation Service's distributed Soil Data Viewer (SDV), although there are distinct differences. The most important difference is that the data collected with the SSURGO On-Demand (SOD) tools are collected in real time via web requests to Soil Data Access (https://sdmdataaccess.nrcs.usda.gov/). SOD tools do not require users to have the data found in a traditional SSURGO download from the NRCS's official repository, Web Soil Survey (https://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm). The main intent of both SOD and SDV is to hide the complex relationships of the SSURGO tables and allow users to focus on asking the question they need to get the information they want. This is accomplished in the user interface of the tools, and the subsequent SQL is built and executed for the user. Currently, the tools packaged here are designed to run within the ESRI ArcGIS Desktop application ArcMap, version 10.1 or greater. However, much of the Python code is recyclable and could run within a Python interpreter or other GIS applications such as Quantum GIS with some modification.
NOTE: The queries in these tools only consider the major components of soil map units.
Within the SOD tools are 2 primary toolsets, described as follows:
1. Areasymbol
The Areasymbol tools collect SSURGO properties and interpretations based on a user-supplied list of soil survey areasymbols (e.g. NC123). After the areasymbols have been collected, an aggregation method (see below) is selected. The aggregation method has no effect on interpretations other than how the SSURGO data are aggregated. For soil properties, the aggregation method drives what properties can be run. For example, you can't run the weighted average aggregation method on Taxonomic Order. Similarly, for the same soil property, you wouldn't specify a depth range. The point here is that the aggregation method affects what parameters need to be supplied for the SQL generation. It is important to note that the user can specify any number of areasymbols and any number of interpretations. This is another distinct advantage of these tools. You could collect all of the SSURGO interpretations for every soil survey area (areasymbol) by executing the tool 1 time. This also demonstrates the flexibility SOD has in defining the geographic extent over which information is collected. The only constraint is the extent of the soil survey areas selected to run (and these can be discontinuous).
As the SOD Areasymbol tools execute, 2 lists are collected from the tool dialog: a list of interpretations/properties and a list of areasymbols. As each interpretation/property is run, every areasymbol is run against the interpretation/property requested. For instance, suppose you wanted to collect the weighted average of sand, silt and clay for 5 soil survey areas. The sand property would run for all 5 soil survey areas and be built into a table. Next, the silt property would run for all 5 soil survey areas and be built into a table, and so on. In this example, a total of 15 web requests would have been sent and 3 tables built. Two VERY IMPORTANT things here...
A. All the Areasymbol tools do is generate tables. They are not collecting spatial data.
B. They are collecting stored information. They are not making calculations (with the exception of the weighted average aggregation method).
2. Express
The Express toolset is nearly identical to the Areasymbol toolset, with 2 exceptions.
A. The area to collect SSURGO information over is defined by the user. The user digitizes coordinates into a 'feature set' after the tool is open. The points in the feature set are closed (first point is also the last) into a polygon. The polygon is sent to Soil Data Access and the feature set points (polygon) are used to clip SSURGO spatial data. The geometries of the clip operation are returned, along with the mapunit keys (unique identifier). It is best to keep the points in the feature set simple and to beware of self-intersections, as they are fatal.
B. Instead of running on a list of areasymbols, the SQL queries run on a list of mapunit keys.
The properties and interpretations options are identical to what was discussed for the Areasymbol toolset.
The Express tools present the user with the option of creating layer files (.lyr) in which the resultant interpretation/property is joined to the geometry and saved to disk as a virtual join. Additionally, for soil properties, an option exists to append all of the selected soil properties to a single table. In this case, if the user ran sand, silt, and clay properties, instead of 3 output tables, there is only 1 table with a sand column, a silt column, and a clay column.
Supplemental Information
Aggregation Method
Aggregation is the process by which a set of component attribute values is reduced to a single value to represent the map unit as a whole.
A map unit is typically composed of one or more "components". A component is either some type of soil or some nonsoil entity, e.g., rock outcrop. The components in the map unit name represent the major soils within a map unit delineation. Minor components make up the balance of the map unit. Great differences in soil properties can occur between map unit components and within short distances. Minor components may be very different from the major components. Such differences could significantly affect use and management of the map unit. Minor components may or may not be documented in the database. The results of aggregation do not reflect the presence or absence of limitations of the components which are not listed in the database. An on-site investigation is required to identify the location of individual map unit components. For queries of soil properties, only major components are considered for the Dominant Component (numeric) and Weighted Average aggregation methods (see below). Additionally, the aggregation method selected drives the available properties to be queried. For queries of soil interpretations, all components are considered.
For each of a map unit's components, a corresponding percent composition is recorded. A percent composition of 60 indicates that the corresponding component typically makes up approximately 60% of the map unit. Percent composition is a critical factor in some, but not all, aggregation methods.
For the attribute being aggregated, the first step of the aggregation process is to derive one attribute value for each of a map unit's components. From this set of component attributes, the next step of the aggregation process derives a single value that represents the map unit as a whole. Once a single value for each map unit is derived, a thematic map for soil map units can be generated. Aggregation must be done because, on any soil map, map units are delineated but components are not.
The aggregation method "Dominant Component" returns the attribute value associated with the component with the highest percent composition in the map unit. If more than one component shares the highest percent composition, the value of the first named component is returned.
The aggregation method "Dominant Condition" first groups like attribute values for the components in a map unit. For each group, percent composition is set to the sum of the percent composition of all components participating in that group. These groups now represent "conditions" rather than components. The attribute value associated with the group with the highest cumulative percent composition is returned. If more than one group shares the highest cumulative percent composition, the value of the group having the first named component of the map unit is returned.
The aggregation method "Weighted Average" computes a weighted average value for all components in the map unit. Percent composition is the weighting factor. The result returned by this aggregation method represents a weighted average value of the corresponding attribute throughout the map unit.
The aggregation method "Minimum or Maximum" returns either the lowest or highest attribute value among all components of the map unit, depending on the corresponding "tie-break" rule. In this case, the "tie-break" rule indicates whether the lowest or highest value among all components should be returned. For this aggregation method, percent composition ties cannot occur. The result may correspond to a map unit component of very minor extent. This aggregation method is appropriate for either numeric attributes or attributes with a ranked or logically ordered domain.
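The four aggregation methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SQL that the SOD tools generate: the component tuples, the function names, and the sample values are all hypothetical, and components are assumed to be listed in map unit name order so that "first named" means "first in the list". For Dominant Condition, a tie is resolved here in favor of the tied group whose first member appears earliest, which is one reading of the tie rule.

```python
# Each hypothetical component is (name, percent_composition, attribute_value),
# listed in the order the components are named in the map unit.

def dominant_component(components):
    # max() returns the first maximal item, so a tie in percent
    # composition goes to the first named component.
    return max(components, key=lambda c: c[1])[2]

def dominant_condition(components):
    # Group like attribute values and sum their percent composition.
    totals = {}
    for _name, pct, value in components:
        totals[value] = totals.get(value, 0) + pct
    # dicts keep insertion order, so a tie goes to the group that
    # contains the earliest named component (an assumed reading).
    return max(totals, key=totals.get)

def weighted_average(components):
    # Percent composition is the weighting factor.
    total_pct = sum(pct for _name, pct, _value in components)
    return sum(pct * value for _name, pct, value in components) / total_pct

def minimum_or_maximum(components, use_maximum):
    # The "tie-break" rule only selects lowest versus highest value.
    values = [value for _name, _pct, value in components]
    return max(values) if use_maximum else min(values)

# Made-up map unit: three components with a numeric attribute.
map_unit = [("Alpha", 40, 20.0), ("Beta", 35, 30.0), ("Gamma", 25, 30.0)]

print(dominant_component(map_unit))        # 20.0
print(dominant_condition(map_unit))        # 30.0
print(weighted_average(map_unit))          # 26.0
print(minimum_or_maximum(map_unit, True))  # 30.0
```

In this made-up map unit, Dominant Component and Dominant Condition disagree: the single largest component has the value 20.0, but the two smaller components share the value 30.0 and together cover 60% of the map unit.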
Archived as of 6/2/2025: The datasets will no longer receive updates but the historical data will continue to be available for download. This dataset provides information related to IN211 callers during the time period 1/2020 to 9/2022. It contains information about the total numbers of unique users and their county of residence at the time of calling IN211. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In spite of the potentially groundbreaking environmental sentinel applications, studies of canine cancer data sources are often limited due to undercounting of cancer cases. This source of uncertainty might be further amplified through the process of spatial data aggregation, manifested as part of the modifiable areal unit problem (MAUP). In this study, we explore potential explanatory factors for canine cancer incidence retrieved from the Swiss Canine Cancer Registry (SCCR) in a regression modeling framework. In doing so, we also evaluate differences in statistical performance and associations resulting from a dasymetric refinement of municipal units to their portion of residential land. Our findings document severe underascertainment of cancer cases in the SCCR, which we linked to specific demographic characteristics and reduced use of veterinary care. These explanatory factors result in improved statistical performance when computed using dasymetrically refined units. This suggests that dasymetric mapping should be further tested in geographic correlation studies of canine cancer incidence and in future comparative studies involving human cancers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Austin Digital Assessment - Aggregated Responses by Geography’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/ffc3e38b-bb92-40ec-9ec4-35a7d894b1e2 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
The 2014 Austin Digital Assessment Project was supported by the Telecommunications & Regulatory Affairs Office of the City of Austin, the Telecommunications and Information Policy Institute at the University of Texas, and faculty and graduate students from the Department of Radio, Television, and Film and the University of Texas. This dataset includes the individual survey responses. To see the aggregated dataset weighted to reflect Austin demographics, refer to the attached document.
--- Original source retains full ownership of the source dataset ---
2020 Census tracts for Yolo County. Census geographies are created ahead of each decennial census to tabulate census data. The geographic files are released ahead of data releases. Blocks are the smallest geographic unit available and are the basis for all other census geographic tabulations. Block groups are an aggregation of blocks; they are the next level up in the census geography hierarchy. Census tracts are an aggregation level above block groups. They nest within counties.
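The nesting described above is reflected in the standard census GEOID layout: a 15-digit block GEOID is made of state (2 digits), county (3), tract (6), and block (4), and the first digit of the block number is the block group. A minimal sketch of rolling block-level counts up the hierarchy by GEOID prefix follows. The function name `roll_up`, the block identifiers, and the counts are illustrative assumptions and are not taken from this dataset.

```python
from collections import Counter

def roll_up(block_counts):
    """Sum block-level counts to block groups and tracts by GEOID prefix."""
    block_groups, tracts = Counter(), Counter()
    for geoid, count in block_counts.items():
        if len(geoid) != 15 or not geoid.isdigit():
            raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
        block_groups[geoid[:12]] += count  # state + county + tract + block group
        tracts[geoid[:11]] += count        # state + county + tract
    return dict(block_groups), dict(tracts)

# Made-up counts for hypothetical blocks under the 06113 (Yolo County) prefix.
blocks = {
    "061130101011000": 10,
    "061130101011001": 5,
    "061130101012000": 7,
    "061130101021000": 3,
}

block_groups, tracts = roll_up(blocks)
print(block_groups)  # {'061130101011': 15, '061130101012': 7, '061130101021': 3}
print(tracts)        # {'06113010101': 22, '06113010102': 3}
```

Because every level is a prefix of the level below it, the same approach extends to counties (the first 5 digits) without any spatial join.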
The National Aggregates of Geospatial Data Collection: Population, Landscape, And Climate Estimates, Version 4 (PLACE IV) provides measures of population (head counts) and land area (square kilometers) as totals and by urban and rural designation, within multiple biophysical themes for 248 statistical areas (countries and other territories recognized by the United Nations (UN)), UN geographic regions and subregions, and World Bank economic classifications. It improves upon previous versions by providing these estimates at both the national level, and where possible, at subnational administrative level 1 for the years 2000, 2005, 2010, 2015, and 2020, and by 5-year and broad age groups for the year 2010.
2020 Census tracts for Placer County. Census geographies are created ahead of each decennial census to tabulate census data. The geographic files are released ahead of data releases. Blocks are the smallest geographic unit available and are the basis for all other census geographic tabulations. Block groups are an aggregation of blocks; they are the next level up in the census geography hierarchy. Census tracts are an aggregation level above block groups. They nest within counties. 2020 Census data is expected to be released by September 30, 2021.