Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
N: number of pairwise comparisons; r: correlation coefficient; U and L: upper and lower bounds of the 95% confidence interval about the null hypothesis of no spatial structure; Ur and Lr: 95% error bounds about r as determined by bootstrap resampling. The probability P of a one-tailed test for positive autocorrelation, the permuted r, and the bootstrapped r are also shown.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spatial autocorrelation plays an important role in geographical analysis; however, there is still room for improvement of this method. The formula for Moran’s index is complicated, and several basic problems remain to be solved. Therefore, I will reconstruct its mathematical framework using a derivation based on linear algebra and present four simple approaches to calculating Moran’s index. Moran’s scatterplot will be improved, and new test methods will be proposed. The relationship between the global Moran’s index and Geary’s coefficient will be discussed from two different vantage points: spatial population and spatial sample. The scope of application for both Moran’s index and Geary’s coefficient will be clarified and defined. One of the theoretical findings is that Moran’s index is a characteristic parameter of spatial weight matrices, so the selection of weight functions is very significant for autocorrelation analysis of geographical systems. A case study of 29 Chinese cities in 2000 will be employed to validate the new models and methods. This work is a methodological study, which will simplify the process of autocorrelation analysis. The results of this study will lay the foundation for the scaling analysis of spatial autocorrelation.
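As a rough illustration of the linear-algebra view mentioned above, the following Python sketch computes the global Moran's index as a quadratic form of standardized values and a globally normalized weight matrix, alongside the textbook form of Geary's coefficient. It is a minimal sketch, not the author's code: the function names, the chain-shaped contiguity matrix, and the sample values are all assumptions made for the example.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I written as the quadratic form z^T W z."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)      # a location is not its own neighbour
    w /= w.sum()                  # global normalisation: all entries sum to 1
    z = (x - x.mean()) / x.std()  # population standard deviation (ddof=0)
    return float(z @ w @ z)


def gearys_c(values, weights):
    """Geary's C from squared differences between neighbouring values."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float).copy()
    np.fill_diagonal(w, 0.0)
    n = x.size
    squared_diffs = (x[:, None] - x[None, :]) ** 2
    numerator = (n - 1) * np.sum(w * squared_diffs)
    denominator = 2.0 * w.sum() * np.sum((x - x.mean()) ** 2)
    return float(numerator / denominator)


# Hypothetical example: four locations on a line; adjacent ones are neighbours.
chain = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
smooth_i = morans_i([1, 2, 3, 4], chain)          # smooth trend
alternating_i = morans_i([1, -1, 1, -1], chain)   # checkerboard-like pattern
smooth_c = gearys_c([1, 2, 3, 4], chain)
alternating_c = gearys_c([1, -1, 1, -1], chain)
```

In this toy case the smooth trend gives a positive Moran's I and a Geary's C below 1, while the alternating pattern gives a negative Moran's I and a Geary's C above 1, which is the usual inverse relationship between the two statistics.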
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for spatial correlation dimension and spatial autocorrelation analysis (Partial results).
https://spdx.org/licenses/CC0-1.0.html
Sex-biased dispersal is pervasive and has diverse evolutionary implications, but the fundamental drivers of dispersal sex biases remain unresolved. This is due in part to limited diversity within taxonomic groups in the direction of dispersal sex biases, which leaves hypothesis testing critically dependent upon identifying rare reversals of taxonomic norms. Here we use a combination of observational and genetic data to demonstrate a rare reversal of the avian sex bias in dispersal in the cooperatively breeding white-browed sparrow-weaver (Plocepasser mahali). Direct observations revealed that i) natal philopatry was rare, with both sexes typically dispersing locally to breed, and ii) unusually for birds, males bred at significantly greater distances from their natal group than females. Population genetic analyses confirmed these patterns, as i) corrected Assignment index (AIc), FST tests and isolation-by-distance metrics were all indicative of longer dispersal distances among males than females, and ii) spatial autocorrelation analysis indicated stronger within-group genetic structure among females than males. Examining the spatial scale of extra-group mating highlighted that the resulting ‘sperm dispersal’ could have acted in concert with individual dispersal to generate these genetic patterns, but gamete dispersal alone cannot account entirely for the sex differences in genetic structure observed. That leading hypotheses for the evolution of dispersal sex biases cannot readily account for these sex-reversed patterns of dispersal in white-browed sparrow-weavers highlights the continued need for attention to alternative explanations for this enigmatic phenomenon. We highlight the potential importance of sex differences in the distances over which dispersal opportunities can be detected.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for spatial autocorrelation function (ACF) and partial spatial autocorrelation function (PACF) based on Moran’s index (partial results).
Most reef-building corals have a pelagic larval phase, so estimating the spatial range of larval dispersal is of great importance for conservation. Spatial autocorrelation analysis is a well-established technique for estimating spatial dispersal range and is useful for deciding conservation units and sampling strategies for species with limited larval dispersal. While a few studies have examined the spatial genetic structure of reef-building coral species using several loci and provided important insights into spatial genetic structure within coral populations, no study has used genome-wide loci and compared the result with the dispersal range directly surveyed in the field. In this study, to examine the robustness of the spatial autocorrelation method for estimating larval dispersal in coral species, we examined the spatial genetic structure of a reef-building coral species, Heliopora coerulea, at two reefs (Shiraho and Akashi) using a moderate number of genome-wide SNPs derived from MIG-seq analysis, as well as 9 microsatellite loci for comparison.
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The dataset is the product of a study which assessed the impact of three spatial correlation structures on spatial predictions. Spatial predictions were calibrated using Bayesian model averaging (BMA) based on replicated, irregular point-referenced data. The data were measured in 17 chambers randomly placed across a 271 m² field between October 2007 and September 2008 in Mooloolah, Queensland. A Bayesian geostatistical model and a Bayesian spatial conditional autoregressive (CAR) model were used to investigate and accommodate spatial dependency, and to estimate the effects of environmental variables on N2O emissions across the study site. The dataset presents summary statistics of observed variables for the 17 chambers over the sampling period.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Results of spatial autocorrelation analyses for the predictor variables.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the number of social housing units in the Netherlands per urban area, as well as the intensity of their spatial autocorrelation and their proportion of the total housing stock, for the year 2023. The original data come from Statistics Netherlands (Centraal Bureau voor de Statistiek) and were released for 100 m x 100 m grid cells covering a large share of the Dutch territory. Grid cells with missing values were excluded from the analysis. The spatial autocorrelation of social housing was calculated with urban area-level and U-style computations of Global Moran's I based on the share of social housing units in the total housing stock of every grid cell. The limits and definition of urban areas are taken from the OECD. The data show considerable variation in the levels of social housing and its spatial concentration among Dutch urban areas.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Understanding dispersal mechanisms (the movement of propagules) can shed light on how organisms are adapted for their ecosystem. Guyanagaster necrorhizus is a sequestrate fungus, meaning its dispersal propagules, or spores, are entirely enclosed within a fruiting body, termed a sporocarp. This fungus is most closely related to Armillaria and its allies. While Armillaria species form mushrooms with forcibly discharged spores, G. necrorhizus spores have lost this ability, and by necessity, must be passively dispersed. However, G. necrorhizus does not possess characteristics of other sequestrate fungi with known dispersal mechanisms. Repeated observations of termites feeding on G. necrorhizus sporocarps, and spores adhering to their exoskeleton, led to the hypothesis that termites disperse G. necrorhizus spores. To test this hypothesis, we used microsatellite markers and implemented population genetics analyses to understand patterns of clonality and population structure of G. necrorhizus. While Armillaria individuals can spread vegetatively over large areas, high genotypic diversity in G. necrorhizus populations suggests spores are the primary mode of dispersal, consistent with termite dispersal. Spatial genetic structure analyses suggest that G. necrorhizus sporocarps within 238 m of each other are more closely related than would be expected by chance. Conservative estimates from population assignment tests suggest populations separated by two km no longer exchange genes. Patterns of spatial genetic structure and population structure are consistent with previous studies analyzing foraging distances of termites found associated with G. necrorhizus sporocarps. Termites have rarely been recorded to specifically target fungal sporocarps, making this a potentially novel fungal-insect interaction.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a research compendium (RC) for the publication "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data".
The code (including figures, appendices and the manuscript) is packed in pathogen-modeling-master.zip or can be found directly in the Github repository.
Publication figures: analysis/paper/submission/3/latex-source-files/
Appendices: analysis/paper/submission/3/
This RC represents a static snapshot of the publication mentioned above. The GitHub repository will continue to receive changes after the publication is published.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geostatistics analyzes and predicts the values associated with spatial or spatial-temporal phenomena. It incorporates the spatial (and in some cases temporal) coordinates of the data within the analyses. It is a practical means of describing spatial patterns and interpolating values for locations where samples were not taken (and measures the uncertainty of those values, which is critical to informed decision making). This archive contains results of geostatistical analysis of COVID-19 case counts for all available US counties. Test results were obtained with ArcGIS Pro (ESRI). Sources are state health departments, which are scraped and aggregated by the Johns Hopkins Coronavirus Resource Center and then pre-processed by MappingSupport.com.
This update of the Zenodo dataset (version 6) consists of three compressed archives containing geostatistical analyses of SARS-CoV-2 testing data. This dataset utilizes many of the geostatistical techniques used in previous versions of this Zenodo archive, but has been significantly expanded to include analyses of up-to-date U.S. COVID-19 case data (from March 24th to September 8th, 2020):
Archive #1: “1.Geostat. Space-Time analysis of SARS-CoV-2 in the US (Mar24-Sept6).zip” – results of a geostatistical analysis of COVID-19 cases incorporating spatially-weighted hotspots that are conserved over one-week timespans. Results are reported starting from when U.S. COVID-19 case data first became available (March 24th, 2020) for 25 consecutive 1-week intervals (March 24th through to September 6th, 2020). Hotspots, where found, are reported in each individual state, rather than the entire continental United States.
Archive #2: "2.Geostat. Spatial analysis of SARS-CoV-2 in the US (Mar24-Sept8).zip" – the results from geostatistical spatial analyses only of corrected COVID-19 case data for the continental United States, spanning the period from March 24th through September 8th, 2020. The geostatistical techniques utilized in this archive include ‘Hot Spot’ analysis and ‘Cluster and Outlier’ analysis.
Archive #3: "3.Kriging and Densification of SARS-CoV-2 in LA and MA.zip" – this dataset provides preliminary kriging and densification analysis of COVID-19 case data for certain dates within the U.S. states of Louisiana and Massachusetts.
These archives consist of map files (as both static images and as animations) and data files (including text files which contain the underlying data of said map files [where applicable]) which were generated when performing the following Geostatistical analyses: Hot Spot analysis (Getis-Ord Gi*) [‘Archive #1’: consecutive weeklong Space-Time Hot Spot analysis; ‘Archive #2’: daily Hot Spot Analysis], Cluster and Outlier analysis (Anselin Local Moran's I) [‘Archive #2’], Spatial Autocorrelation (Global Moran's I) [‘Archive #2’], and point-to-point comparisons with Kriging and Densification analysis [‘Archive #3’].
The Word document provided ("Description-of-Archive.Updated-Geostatistical-Analysis-of-SARS-CoV-2 (version 6).docx") details the contents of each file and folder within these three archives and gives general interpretations of these results.
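For readers unfamiliar with the Hot Spot statistic named above, the following Python sketch computes Getis-Ord Gi* z-scores from a value vector and a spatial weight matrix. It is only an illustrative sketch of the commonly published formula, not the ArcGIS Pro implementation used to produce these archives; the function name, the toy weight matrix (binary, with each location counted as its own neighbour), and the case counts are assumptions.

```python
import numpy as np


def getis_ord_gi_star(values, weights):
    """Gi* z-score per location; weights[i, i] should be non-zero for Gi*."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt(np.mean(x ** 2) - mean ** 2)   # population standard deviation
    w_sum = w.sum(axis=1)                      # sum_j w_ij
    w_sq_sum = (w ** 2).sum(axis=1)            # sum_j w_ij^2
    numerator = w @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical example: five areas in a row, each neighbouring itself and
# the areas directly next to it; the first two areas have high counts.
toy_weights = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
toy_counts = [10, 10, 1, 1, 1]
gi_star = getis_ord_gi_star(toy_counts, toy_weights)
```

Large positive scores mark locations surrounded by high values (candidate hot spots) and large negative scores mark candidate cold spots; in practice the scores are compared against a normal reference distribution, usually with a multiple-testing correction.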
Mixed-methods designs, especially those in which case selection is regression-based, have become popular across the social sciences. In this paper, we highlight why tools from spatial analysis—which have largely been overlooked in the mixed-methods literature—can be used for case selection and be particularly fruitful for theory development. We discuss two tools for integrating quantitative and qualitative analysis: (1) spatial autocorrelation in the outcome of interest; and (2) spatial autocorrelation in the residuals of a regression model. The case selection strategies presented here enable scholars to systematically use geography to learn more about their data and select cases that help identify scope conditions, evaluate the appropriate unit or level of analysis, examine causal mechanisms, and uncover previously omitted variables.
The fragmentation of habitats by roads and other artificial linear structures can have a profound effect on the movement of arboreal species due to their strong fidelity to canopies. Here, we used 12 microsatellite DNA loci to investigate the fine-scale spatial genetic structure and the effects of a major road and a narrow artificial waterway on a population of the endangered western ringtail possum (Pseudocheirus occidentalis) in Busselton, Western Australia. Using spatial autocorrelation analysis, we found positive genetic structure in continuous habitat over distances up to 600 m. These patterns are consistent with the sedentary nature of P. occidentalis and highlight their vulnerability to the effects of habitat fragmentation. Pairwise relatedness values and Bayesian cluster analysis also revealed significant genetic divergences across an artificial waterway, suggesting that it was a barrier to gene flow. By contrast, no genetic divergences were detected across the major road. While...
The data I submitted included topographic slope, irrigation guarantee rate, gross product, population size, soil pH, pesticide use, and total agricultural machinery power for each county and district in Henan Province. It also contains layer data of administrative divisions.
This is a research compendium (RC) for the publication "Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data".
The code (including figures, appendices and the manuscript) is packed in pathogen-modeling-3.zip or can be found directly in the GitHub repository.
Publication figures: analysis/paper/submission/3/latex-source-files/
Appendices: analysis/paper/submission/3/
This RC represents a static snapshot at the time of submission. The GitHub repository will continue to receive changes after the publication is published.
Data sources
Atlas Climatico: http://opengis.uab.es/wms/iberia/index.htm
DEM: ftp://ftp.geo.euskadi.eus/lidar/MDE_LIDAR_2016_ETRS89/
Lithology: http://www.geo.euskadi.eus/geonetwork/srv/spa/main.home
pH: https://esdac.jrc.ec.europa.eu/content/soil-ph-europe#tabs-0-description=0
soil: https://www.isric.org/explore/soilgrids
Licenses
All files are shared via the given license with the exception of "soil.tif", which is shared via the ODbL license.
Lianas are an important component of subtropical forests, but the mechanisms underlying their spatial distribution patterns have received relatively little attention. Here, we selected 12 most abundant liana species, constituting up to 96.9% of the total liana stems, in a 20-ha plot in a subtropical evergreen broadleaved forest at 2,472 – 2,628 m elevation in SW China. Combining data on topography (convexity, slope, aspect, and elevation) and host trees (density and size) of the plot, we addressed how liana distribution is shaped by host tree properties, topography and spatial autocorrelation by using principal coordinates of neighbor matrices (PCNM) analysis. We found that lianas had an aggregated distribution based on the Ripley’s K function. At the community level, PCNM analysis showed that spatial autocorrelation explained 43% variance in liana spatial distribution. Host trees and topography explained 4% and 18% of the variance, but less than 1% variance after taking spatial autocor...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for spatial autocorrelation functions based on Geary’s coefficient and Getis-Ord’s index (Partial results).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project presents a comprehensive spatial analysis and species distribution modeling (SDM) of American beech (Fagus grandifolia), integrating a range of geospatial and statistical methods to assess both current habitat suitability and future distribution projections under various climate scenarios. The workflow began by merging selected ecoregions to delineate the study area and aligning all spatial datasets to a common coordinate reference system. Raster datasets, including presence data and the target group background data, were processed by resampling to a coarser resolution (1 km²) for ecological consistency and then masked to the study area, ensuring that only relevant data were analyzed.
Subsequent data processing involved extracting non‐NA pixel values, applying k‑means clustering and Fisher’s natural breaks classification to determine optimal thresholds for categorizing the presence and background raster data. These thresholds served as critical cut-offs for distinguishing between different levels of basal area—a key ecological parameter. The derived threshold values were then used to filter raster data, facilitating the extraction of presence (and background) points which were subsequently converted into spatial vector (sf) objects.
A pivotal aspect of the analysis was the measurement of spatial autocorrelation. This was accomplished through the computation of semivariograms for both the basal area and target group datasets. By fitting a spherical model to the semivariogram, the study was able to quantify the range of spatial autocorrelation, thereby providing insights into the spatial structure inherent in the ecological data. This analysis not only confirmed the degree of spatial clustering but also informed subsequent modeling efforts by highlighting the spatial dependency in the dataset.
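The semivariogram step described above can be sketched in Python as follows. This is a hedged illustration rather than the project's own workflow: the function names, the distance bins, and the toy transect are assumptions, and fitting the spherical model to the empirical values (for example by least squares) is left out.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Half the mean squared difference of value pairs, per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # each pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)       # NaN marks empty bins
    for b in range(len(bin_edges) - 1):
        in_bin = (dist > bin_edges[b]) & (dist <= bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return gamma


def spherical_model(h, nugget, partial_sill, range_):
    """Spherical semivariogram: rises until `range_`, then stays at the sill."""
    h = np.asarray(h, dtype=float)
    ratio = np.clip(h / range_, 0.0, 1.0)
    gamma = nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)
    return np.where(h > 0, gamma, 0.0)                # gamma(0) is 0 by definition


# Hypothetical example: three points on a transect with a linear trend.
toy_gamma = empirical_semivariogram(
    coords=[[0, 0], [1, 0], [2, 0]], values=[0, 1, 2], bin_edges=[0, 1, 2]
)
toy_model = spherical_model([0, 5, 10, 20], nugget=0.0, partial_sill=1.0, range_=10.0)
```

The fitted `range_` parameter is the distance beyond which pairs of observations are treated as spatially uncorrelated, which is the quantity the paragraph above refers to as the range of spatial autocorrelation.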
Environmental variables were then carefully selected and processed, with multicollinearity among predictors being assessed using the variance inflation factor (VIF) and visualized through correlation matrices. The refined set of environmental predictors was integrated into the BIOMOD2 framework, where the modeling data were formatted to include both species presence–absence data and environmental layers. A diverse array of algorithms—including ANN, CTA, FDA, GAM, GBM, GLM, MARS, MAXENT, MAXNET, RF, SRE, and XGBOOST—was employed to develop individual models using a nested k‑fold cross-validation approach, ensuring robust model evaluation based on metrics such as TSS, ROC, and Boyce.
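The multicollinearity check mentioned above can be illustrated with a short Python sketch of the variance inflation factor. It relies on the identity that the VIF of each predictor equals the corresponding diagonal entry of the inverse correlation matrix; the function name and the toy predictor matrices are assumptions, and this is not the code used in the project.

```python
import numpy as np


def variance_inflation_factors(predictors):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    x = np.asarray(predictors, dtype=float)
    corr = np.corrcoef(x, rowvar=False)   # columns are predictors
    return np.diag(np.linalg.inv(corr))


# Hypothetical example: two uncorrelated predictors, then two correlated ones.
independent = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
correlated = [[1, 1], [2, 3], [3, 2], [4, 4]]   # correlation of 0.8
vif_independent = variance_inflation_factors(independent)
vif_correlated = variance_inflation_factors(correlated)
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that a predictor is increasingly well explained by the remaining ones, and a cut-off for dropping variables is a modelling choice that the text above does not specify.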
Ensemble modeling strategies were also implemented. Selected models (GAM, GBM, GLM, MARS, MAXNET, and RF) were combined using both algorithm-specific ensemble approaches and an “all models” ensemble strategy, where predictions were weighted (with options to incorporate basal area weights) and then aggregated to produce ensemble forecasts. Variable importance metrics and response curves were generated to elucidate the influence of each predictor on species distribution.
Looking forward, the project incorporated future climate scenarios (SSP1, SSP2, SSP3, SSP5) for multiple time periods (2011–2040, 2041–2070, and 2071–2100). For each scenario, ensemble forecasting was performed to predict potential shifts in habitat suitability. Additionally, a Multivariate Environmental Similarity Surface (MESS) analysis was conducted to evaluate the reliability of model predictions when extrapolating to novel future conditions. Finally, a range change analysis was carried out by comparing current and future model predictions, quantifying changes in the potential distribution area of American beech.
In summary, this study integrates advanced spatial data processing, semivariogram-based spatial autocorrelation analysis, multivariate statistical techniques, and ensemble species distribution modeling to provide a robust evaluation of American beech habitat suitability and projected distribution shifts in response to climate change.
https://www.icpsr.umich.edu/web/ICPSR/studies/2824/terms
CrimeStat III is a spatial statistics program for the analysis of crime incident locations, developed by Ned Levine and Associates under the direction of Ned Levine, PhD, and funded by grants from the National Institute of Justice (grants 1997-IJ-CX-0040, 1999-IJ-CX-0044, 2002-IJ-CX-0007, and 2005-IJ-CX-K037). The program is Windows-based and interfaces with most desktop GIS programs. The purpose is to provide supplemental statistical tools to aid law enforcement agencies and criminal justice researchers in their crime mapping efforts. CrimeStat is being used by many police departments around the country as well as by criminal justice and other researchers. The program inputs incident locations (e.g., robbery locations) in 'dbf', 'shp', ASCII or ODBC-compliant formats using either spherical or projected coordinates. It calculates various spatial statistics and writes graphical objects to ArcGIS, MapInfo, Surfer for Windows, and other GIS packages. CrimeStat is organized into five sections:
Data Setup
Primary file - this is a file of incident or point locations with X and Y coordinates. The coordinate system can be either spherical (lat/lon) or projected. Intensity and weight values are allowed. Each incident can have an associated time value.
Secondary file - this is an associated file of incident or point locations with X and Y coordinates. The coordinate system has to be the same as the primary file. Intensity and weight values are allowed. The secondary file is used for comparison with the primary file in the risk-adjusted nearest neighbor clustering routine and the dual kernel interpolation.
Reference file - this is a grid file that overlays the study area. Normally, it is a regular grid though irregular ones can be imported. CrimeStat can generate the grid if given the X and Y coordinates for the lower-left and upper-right corners.
Measurement parameters - this page identifies the type of distance measurement (direct, indirect or network) to be used and specifies parameters for the area of the study region and the length of the street network. CrimeStat III has the ability to utilize a network for linking points. Each segment can be weighted by travel time, travel speed, travel cost or simple distance. This allows the interaction between points to be estimated more realistically.
Spatial Description
Spatial distribution - statistics for describing the spatial distribution of incidents, such as the mean center, center of minimum distance, standard deviational ellipse, the convex hull, or directional mean.
Spatial autocorrelation - statistics for describing the amount of spatial autocorrelation between zones, including general spatial autocorrelation indices (Moran's I, Geary's C, and the Getis-Ord General G) and correlograms that calculate spatial autocorrelation for different distance separations (the Moran, Geary, and Getis-Ord correlograms). Several of these routines can simulate confidence intervals with a Monte Carlo simulation.
Distance analysis I - statistics for describing properties of distances between incidents including nearest neighbor analysis, linear nearest neighbor analysis, and Ripley's K statistic. There is also a routine that assigns the primary points to the secondary points, either on the basis of nearest neighbor or point-in-polygon, and then sums the results by the secondary point values.
Distance analysis II - calculates matrices representing the distance between points for the primary file, for the distance between the primary and secondary points, and for the distance between either the primary or secondary file and the grid.
'Hot spot' analysis I - routines for conducting 'hot spot' analysis including the mode, the fuzzy mode, hierarchical nearest neighbor clustering, and risk-adjusted nearest neighbor hierarchical clustering. The hierarchical nearest neighbor hot spots can be output as ellipses or convex hulls.
'Hot spot' analysis II - more routines for conducting hot spot analysis including the Spatial and Temporal Analysis of Crime (STAC), K-means clustering, Anselin's local Moran, and the Getis-Ord local G statistics. The STAC and K-means hot spots can be output as ellipses or convex hulls. All of these routines can simulate confidence intervals with a Monte Carlo simulation.
Spatial Modeling
Interpolation I - a single-variable kernel density estimation routine for producin