Use digitized natural history collection occurrence data from the Global Biodiversity Information Facility (GBIF) to map the distribution of the beaver in the state of Oregon from 1800-2020 using QGIS
https://spdx.org/licenses/CC0-1.0.html
The twisted-wing parasite order (Strepsiptera Kirby, 1813) is difficult to study due to the complexity of strepsipteran life histories, small body sizes, and a lack of accessible distribution data for most species. Here, we present a review of the strepsipteran species known from New York State. We also demonstrate successful collection methods and a survey of species carried out in an old-growth deciduous forest dominated by native New York species (Black Rock Forest, Cornwall, NY) and a private site in the Catskill Mountains (Shandaken, NY). Additionally, we model suitable habitat for Strepsiptera in the United States with species distribution modeling. We base our models on host distributions and climatic variables to inform predictions of where these twisted-wing parasites are likely to be found. With this work, we hope to provide a useful reference for the future collection of Strepsiptera.
Methods
Our specimens were collected in Black Rock Forest (BRF), Cornwall, New York over the course of six trips in July and August of 2022 and 2023. BRF is an old-growth forest protected and maintained by a namesake scientific organization dedicated to its study; as such, this forest provides a uniquely mature and native environment in which to collect ecological data. We sampled six areas: native growth by the Black Rock Forest (BRF) Science Center (41.41408°, -74.011919°), a patch of wild growth in the parking lot (41.413249°, -74.011421°), the meadow of the Upper Reservoir (41.411015°, -74.007048°), Aleck Meadow (41.406405°, -74.014587°), meadows of Jim’s Pond (41.387490°, -74.020348°), and brush near the Stone House (41.397177°, -74.021423°) (Figure S1). In addition to the BRF sites, we sampled one privately owned site in the Catskill Mountains, Shandaken, New York in June and July 2023 (42.129425°, -74.377613°).
To generate predictive models of host and Strepsiptera ranges, we gathered occurrence data for each host-parasite pair for which collection coordinates were available from the Global Biodiversity Information Facility (GBIF) and combined it with the locality data from our collection efforts. Of the 78 strepsipteran species documented in the United States, only a subset had occurrence data. Of these, 51 species included specific coordinate data, and only 15 species had multiple unique coordinates. If hosts of these strepsipterans did not have occurrence data, we excluded these host species from the predictive analyses as well. Since our models require at least 5 occurrence datapoints to run, we ran models on genera instead of species to ensure that our predictions were robust. Our list was based on a checklist of strepsipteran species and their hosts in the United States (Kathirithamby, 2005), plus a United States checklist (Zabinski & Cook, 2023) and a world checklist of the genus Stylops (Straka et al., 2015). Our GBIF search parameters specified human observation and preserved specimens as basis of record, data with coordinates, and the United States as an administrative area to restrict the search. When necessary to reduce computation time, we thinned the data by restricting coordinate uncertainty to between 0 and 1 meters. We took a species distribution modeling approach with the R package “wallace” and its modeling application Wallace v2.0 (Kass et al., 2018, 2023), using the algorithm MaxEnt (Maximum Entropy) (Phillips et al., 2004) and incorporating Bioclim environmental data (Booth et al., 2014) as explanatory variables driving species presence. For each species of Strepsiptera, we incorporated its host presence-absence prediction (10-percentile training presence threshold visualization) as a categorical variable.
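The GBIF search filters described above (basis of record, presence of coordinates, country, and optional coordinate-uncertainty thinning) can be sketched in Python as a filter over already-downloaded records. This is only an illustration, not the authors' workflow, which used GBIF's own search interface and Wallace. The field names follow Darwin Core terms as they appear in GBIF downloads; the function names are hypothetical, and counting unique coordinate pairs toward the 5-point minimum is an assumption.

```python
# Illustrative filter over GBIF-style occurrence records (dicts keyed by
# Darwin Core terms). Function names are hypothetical.

ALLOWED_BASIS = {"HUMAN_OBSERVATION", "PRESERVED_SPECIMEN"}


def filter_records(records, country="US", max_uncertainty_m=None):
    """Keep records matching the basis-of-record, coordinate and country filters."""
    kept = []
    for rec in records:
        if rec.get("basisOfRecord") not in ALLOWED_BASIS:
            continue
        if rec.get("decimalLatitude") is None or rec.get("decimalLongitude") is None:
            continue
        if rec.get("countryCode") != country:
            continue
        if max_uncertainty_m is not None:
            # Optional thinning step: drop records with missing or large uncertainty.
            uncertainty = rec.get("coordinateUncertaintyInMeters")
            if uncertainty is None or uncertainty > max_uncertainty_m:
                continue
        kept.append(rec)
    return kept


def has_enough_points(records, minimum=5):
    """Assumption: the minimum applies to unique coordinate pairs."""
    unique = {(r["decimalLatitude"], r["decimalLongitude"]) for r in records}
    return len(unique) >= minimum
```

A taxon whose filtered records fail `has_enough_points` would be the kind of case that motivates pooling records at the genus level.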
We standardized our models by specifying their region of study to a shapefile of the 48 contiguous United States, which we generated in QGIS using publicly available data (United States Government, 2023). We chose each model based on corrected Akaike information criterion (AICc), average omission rate when applying a 10-percentile training presence threshold to withheld validation data (OR.10p), and area under the curve of a receiver operating characteristic plot (auc.val.avg) (Kass et al., 2021; Peterson et al., 2011). Our R scripts for each model are openly available at Dryad. We visualized all data resulting from our models in QGIS v3.2.6 (Flenniken et al., 2020), and generated our host-parasite and species richness maps by using the QGIS Raster Calculator addition function.
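The Raster Calculator addition used for the richness maps amounts to a cell-by-cell sum of aligned presence-absence layers. The sketch below shows that operation on small nested lists in Python; it assumes all layers share the same extent and resolution, and the function name is hypothetical. It is not a substitute for the QGIS tool.

```python
def add_rasters(*layers):
    """Cell-wise sum of equally shaped 2-D grids (lists of rows)."""
    if not layers:
        raise ValueError("at least one layer is required")
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != n_rows or any(len(row) != n_cols for row in layer):
            raise ValueError("all layers must have the same shape")
    # Each output cell counts how many layers predict presence there.
    return [
        [sum(layer[r][c] for layer in layers) for c in range(n_cols)]
        for r in range(n_rows)
    ]
```

With binary inputs, a cell value of 2 in a host-plus-parasite sum marks locations where both are predicted present, and summing many taxa gives a richness count per cell.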
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The subspecies of American badger (Taxidea taxus berlandieri Baird, 1858), also called tlalcoyote (Figure 1), is distributed in north-central Mexico. However, its occurrence records are scarce and the few that exist are uncertain due to incorrect georeferencing or identification of the taxonomic unit. In view of this, we designed a spatial sampling scheme covering parts of the states of Coahuila de Zaragoza, Durango, Nuevo León, San Luis Potosí and Zacatecas. In this north-central portion of Mexico, we generated a grid of squares measuring 5 × 5 km over the entire study area using QGIS® 3.10 software. Subsequently, we excluded squares in which urban settlements, agricultural land, or water bodies covered more than 30% of their area; we also discarded squares located at an elevation above 2,250 meters above sea level. To perform this filtering, we used both the land use and vegetation chart of the INEGI [Instituto Nacional de Estadística, Geografía e Informática] (2018) and the Digital Elevation Model (DEM) downloaded from the USGS page [United States Geological Survey] (2019) as a basis. As a result, we obtained 3,471 squares separated by at least 5 km. Then, through simple random sampling, 177 (≈5%) squares were selected, where we generated centroids to be used as sampling sites.
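The square-exclusion rules and the simple random sampling described above can be sketched as follows. This is a hedged Python illustration, not the QGIS procedure actually used. The dictionary keys, function names and seed are hypothetical, and representing each square by a single elevation value is an assumption.

```python
import random

MAX_UNSUITABLE_FRACTION = 0.30  # urban, agricultural or water cover
MAX_ELEVATION_M = 2250          # metres above sea level


def keep_square(unsuitable_fraction, elevation_m):
    """Apply the two exclusion rules to one grid square."""
    return (unsuitable_fraction <= MAX_UNSUITABLE_FRACTION
            and elevation_m <= MAX_ELEVATION_M)


def sample_sites(squares, n_sites, seed=None):
    """Simple random sample (without replacement) of eligible square ids."""
    eligible = [
        sq["id"] for sq in squares
        if keep_square(sq["unsuitable_fraction"], sq["elevation_m"])
    ]
    rng = random.Random(seed)
    return rng.sample(eligible, n_sites)
```

Calling `sample_sites(squares, 177)` on a list with 3,471 eligible squares would reproduce the sampling fraction reported above (177 / 3,471 ≈ 5.1%).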
During field work between 2009 and 2015, we traced a 10 × 100 m transect at each of these 177 sites, where we searched for T. t. berlandieri signs (i.e., burrows and scratching posts). These burrows and scratching posts are easily observed and quantified, and there is no chance of mistaking them for burrows of other species (Long 1973; Merlin 1999). We also recorded possible sightings, as in other studies (e.g., Merlin 1999; Elbroch 2003). As a result, we found signs of occurrence at only 33 sites.
Figure 1. Individual of tlalcoyote (Taxidea taxus berlandieri). Photo obtained from Naturalista (2023) and uploaded by David Molina©. All rights reserved (CC BY-NC-ND).
To increase the number of records, we included occurrence data from GBIF [Global Biodiversity Information Facility portal] (2022). We downloaded only records that included coordinates and whose basis of record was "preserved specimen", because these are correctly identified specimens from biological collections (Maldonado et al. 2015). In addition, we selected only records for Mexico. Subsequently, we filtered the downloaded database, discarding records that were incorrectly georeferenced, had atypical or duplicate coordinates, or had low geospatial accuracy (e.g., fewer than three decimals of precision).
We loaded the remaining data into the QGIS® software and performed a spatial filtering, in which we excluded data that were outside the study area, located in unlikely areas (e.g., human settlements, bodies of water, agricultural areas), or less than 5 km from the records obtained in the field. This gave a total of 10 records from the GBIF portal. Finally, we loaded the raster layers of elevation (Elev; INEGI 2007), normalized difference vegetation index (NDVI; USGS 2019), and terrain slope into the software to extract the pixel values at the GBIF records and at those obtained in the field. With this, we generated a new global dataset, to which we applied environmental filtering to find environmental outliers. We plotted the normality distribution of the data for each variable and the dispersion of the data among the variables. In this filtering, we retained all records. Figure 2 shows the normality distribution of the records as a function of Elev. Figure 3 shows the dispersion of the data between Elev and NDVI.
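The 5 km distance rule in the spatial filtering step can be illustrated with a great-circle distance check. The Python sketch below is an assumption-laden stand-in for the QGIS operation: it uses the haversine formula on a spherical Earth, takes points as (latitude, longitude) tuples in decimal degrees, and uses hypothetical function names.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def drop_near_field_records(gbif_points, field_points, min_km=5.0):
    """Keep GBIF points at least min_km from every field record."""
    return [
        g for g in gbif_points
        if all(haversine_km(g[0], g[1], f[0], f[1]) >= min_km
               for f in field_points)
    ]
```

A projected coordinate system, as typically used in QGIS, would give slightly different distances than this spherical approximation.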
Figure 2. Normality distribution of T. t. berlandieri occurrence records as a function of the elevation variable (Elev).
Figure 3. Scatter plot of T. t. berlandieri occurrence records as a function of elevation (Elev) and normalized difference vegetation index (NDVI).
For the north-central region of Mexico, we present the global database (i.e., Tatabe_joint.csv), as well as the database that contains only the field evidence records (i.e., Tatabe_first_order.csv) and another one with the filtered GBIF records (i.e., Tatabe_GBIF.csv).
https://spdx.org/licenses/CC0-1.0.html
Understanding species distribution and habitat preferences is crucial for effective conservation strategies. However, the lack of information about population responses to environmental change at different scales hinders effective conservation measures. In this study, we estimate the potential and realized distribution of Atractus lasallei, a semi-fossorial snake endemic to the northwestern region of Colombia. We modelled the potential distribution of A. lasallei based on ecological niche theory (using MaxEnt), and habitat use was characterized while accounting for imperfect detection using a single-season occupancy model. Our results suggest that A. lasallei selects areas characterized by slopes below 10°, with high average annual precipitation (> 2500 mm/year) and herbaceous and shrubby vegetation. Its potential distribution encompasses the northern Central Cordillera and two smaller centers along the Western Cordillera, but its habitat is heavily fragmented within this potential distribution. When the two models are combined, the species’ realized distribution totals 935 km², highlighting its vulnerability. We recommend approaches that focus on variability at different spatio-temporal scales to better comprehend the variables that affect species’ ranges and identify threats to vulnerable species. Prompt actions are needed to protect herbaceous and shrub vegetation in this region, which is in high demand for agriculture and cattle grazing.
Methods
Ecological Niche Model and Potential Distribution
Presence data were acquired from three sources: 1) specimens from biological collections, obtained from the Global Biodiversity Information Facility (accessed 22 March 2022) [35], most of them reviewed in situ in the following collections: MHUA-Museo de Herpetología Universidad de Antioquia (Curator: J.M. Daza-Rojas), CSJ-h-Museo de Ciencias Naturales de La Salle (D.Z. Urrego), CBUCES-D-Colecciones Biológicas Universidad CES (J.C. Duque); 2) iNaturalist records obtained directly from the platform (accessed March-May 2022; we did not use iNaturalist records from GBIF) by searching Atractus records in the northwest of Colombia that included pictures that allowed verification through morphology (coloration patterns, and scale counts when possible); and 3) individuals encountered during the field phase for occupancy models. Identification of individuals was based on the original species description and taxonomic revisions of the genus [28, 33]. Further, a geographical filter was applied to presence records that were within 1 km of each other to reduce spatial autocorrelation [36, 37]. We used the final database to delimit the species' accessible area or M [38] based on the intersection between the minimum convex polygon generated with 50 km buffers around each presence record using QGIS (v.3.10) [39], and the biogeographic regions of the northern Central Cordillera and the Western Cordillera of Colombia [40]. The environmental variables for the niche model included topographic variables, atmospheric climate variables including temperature corrected to ground level [41], and soil variables [42]. Climate variables represent long-term averages (1980-2010 in the case of atmospheric variables, and 2000-2020 in the case of ground-level temperature; see S1). These variables were selected based on previous research findings regarding the distribution of semi-fossorial reptiles [25, 43-45]. All variables were used in the models at a spatial resolution of 1 km. Variables with finer resolutions were resampled using the bilinear method, with the R-package “raster” v.3.5 [46] in R v.4.2.1 [47]. Subsequently, a Spearman correlation test (S2) was conducted to select non-correlated variables (< 0.8), using R-package “corrplot” v.0.92 [48].
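The Spearman screening step can be illustrated with a small, dependency-free Python sketch. It is not the authors' R workflow. Two choices here are assumptions rather than details from the text: the threshold is applied to the absolute value of the coefficient, and variables are retained greedily in the order given. All function names are hypothetical, and constant variables (zero variance) are not handled.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def select_uncorrelated(variables, threshold=0.8):
    """Greedily keep variables whose |rho| with all kept ones is below threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(spearman(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the selection is greedy, the order of the input dictionary decides which member of a correlated pair survives; in practice that choice would be guided by ecological relevance.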
Finally, two sets of variables were created: one that included two ground-level temperature variables estimated at five centimeters above the ground (S1) [41], and a second that included the same two variables but measured at atmospheric level [49]. Models were calibrated with each data set independently, ensuring that none of the variables used were correlated. The ecological niche model was generated using the maximum entropy algorithm [15] through the R-package “kuenm” [50]. This methodology allows the evaluation of different sets of environmental variables (set 1 and set 2, S1) and various model parameterizations to ultimately identify the best model according to a set of criteria. We allowed the regularization parameter to vary from 0.1 (very strict in relation to observed values) to 5 (more flexible in relation to observed values), where 1 is the default value. We also evaluated linear (l), quadratic (q), and linear-quadratic (lq) responses. The models were trained with a random 20% partition of the occurrence data set aside for model evaluation. The evaluation criteria included omission rate (<5%), partial receiver operating characteristic (partial ROC), area under the curve (AUC ratio>1), and the Akaike Information Criterion corrected for sample size (AICc) [51]. Where several models met the evaluation criteria, we built a consensus model using the median of the selected models. Finally, to obtain the geographic projection, a cutoff threshold was applied using the 10-percentile training presence criterion from the best model(s) to generate a presence/absence map.
Occupancy Models
To identify fine-scale factors influencing the occupancy of A. lasallei within its known distribution area, single-season occupancy models were employed [24].
The sampling design followed the recommendations of a previous study for semi-fossorial species [52], wherein 30 linear transects of 100 m x 2 m were established within the sampling area, spaced at a minimum of 200 m apart to ensure independence of detection histories across sites (Fig 1). Each transect was equipped with nine artificial cover objects (three roof tiles, three boards, and three plastic sheets), which were installed a minimum of two months prior to sampling for the organisms to habituate to their presence (S3). The transects were surveyed between October 2021 and January 2022 to ensure consistent occupancy status during the sampling period (closed-site assumption) between 8 AM and 4 PM. Surveys involved searching beneath leaf litter and under cover objects (both artificial and natural). Each transect was surveyed a minimum of four times, with visits spaced at least two weeks apart to satisfy the assumption of temporal independence. Animals were photographed and examined in the field to ensure correct identification (Approval Act No. 138, February 9, 2021, granted by the Committee on Ethics for Animal Experimentation, Universidad de Antioquia). Occupancy models were constructed using the R-package “unmarked” (v.1.2.5) [53] implemented in the R software. All covariates were standardized (mean=0, units in standard deviations) prior to modelling. To identify the best models, we first established the best detection model assuming constant occupancy, and then we used this detection model in all occupancy models [54]. To model detectability, we included as covariates, the number of cover objects, both natural and artificial (N_obj); vegetation height (Veg_H) [55]; soil moisture (Soil_moisture); and soil temperature (T_ground), both measured using a HOBO proV2 datalogger beneath a roof tile or under the object where an individual of the species was located at the time of each visit. 
As covariates for occupancy, we used vegetation height (Veg_H) [55]; terrain slope (Slope); topographic convergence (Con); compound topographic index (CTI) [56]; annual mean soil temperature (Tprom), maximum temperature of the warmest month (Tmax), and minimum temperature of the coldest month (Tmin) [41]; depth of leaf litter (Leaf_Dep) and depth of the 0 horizon (Hori0), both measured in the field using a soil auger; and Euclidean distance to the nearest house (D_house), nearest forest (D_forest), and nearest water body (D_water). These distances were estimated in QGIS [39], identifying the nearest houses and forests to the centroid of each transect using satellite imagery from GoogleEarth (https://www.google.com/intl/es/earth/). To calculate the distance to water bodies, it was necessary to construct a detailed hydrographic network for the area using a 12.5 m resolution DEM obtained from Alaska vertex (https://search.asf.alaska.edu/), utilizing the hydrology toolbox in ArcGIS Pro (v.2.7) [57]. A total of 87 biologically plausible and simple models were evaluated, each including one or two variables (S4); 20 of the models were for the detection component with constant occupancy, and the remaining models were for the occupancy component. Finally, to evaluate model fit to our data, we performed a parametric bootstrap test on the chosen model, using the parboot function of R package “unmarked” v.1.4.1 [53]. This test generates multiple sets of data iteratively from the best model and then compares these sets with the detection histories obtained in the field. A chi-squared test was employed to evaluate the null hypothesis that the observations are consistent with the proposed model.
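To make the logic of a single-season occupancy model concrete, the sketch below computes its likelihood for the simplest case: constant occupancy probability (psi) and constant detection probability (p), with no covariates. This is a teaching illustration in Python, not the covariate models fitted with “unmarked”. Function names are hypothetical, and detection histories are assumed to be complete lists of 0/1 values (no missing visits).

```python
from math import log


def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 per visit)."""
    detections = sum(history)
    visits = len(history)
    # Site occupied and this exact sequence of detections observed.
    occupied = psi * p ** detections * (1 - p) ** (visits - detections)
    if detections > 0:
        return occupied
    # Never detected: either occupied but missed on every visit, or unoccupied.
    return occupied + (1 - psi)


def log_likelihood(histories, psi, p):
    """Sum of log-likelihoods across independent sites."""
    return sum(log(site_likelihood(h, psi, p)) for h in histories)
```

The all-zero branch is what separates true absence from non-detection, which is why repeated visits to each transect are needed to estimate both parameters.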
Integration of models
To estimate the species’ realized distribution area [58], we used the binary (presence-absence) geographic projection from the consensus niche model to identify the areas where the macro conditions were suitable and applied the best occupancy model within those areas at a higher spatial resolution (0.00025°, approximately 27 meters). Finally, the resulting map was transformed into a binary outcome using a threshold of 0.78, based on the Q3 (third quartile) of the occupancy distribution values of that map; this threshold corresponds to 4 m of vegetation height according to the best occupancy model (Fig 2), which is biologically justified if we consider that all presence records obtained in the field phase were found in places with vegetation below 4 m.
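The final thresholding step can be sketched as computing the third quartile of the predicted occupancy values and converting the map to presence (1) or absence (0). The Python below is illustrative only. Quartile definitions differ between software packages, so the value from `statistics.quantiles` may not match the method the authors used, and treating cells at or above the threshold as presence is an assumption. Function names are hypothetical.

```python
from statistics import quantiles


def third_quartile(values):
    """Q3 using the default 'exclusive' method of statistics.quantiles."""
    return quantiles(values, n=4)[2]


def binarize(values, threshold):
    """1 where predicted occupancy is at or above the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]
```

For example, `binarize(cell_values, third_quartile(cell_values))` would mark roughly the top quarter of cells as presence, where `cell_values` is a hypothetical flat list of occupancy predictions.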
Aim: Recent climate projections have shown that the distribution of organisms in island biotas is highly affected by climate change. Here, we present the results of an analysis of the niche dynamics of a plant group, Memecylon, on the island of Sri Lanka, using species occurrences and climate data. We aim to determine which climate variables explain current distribution, model how climate change impacts the availability of suitable habitat for Memecylon, and determine conservation priority areas for Sri Lankan Memecylon.
Location: Sri Lanka
Methods: We used georeferenced occurrence data of Sri Lankan Memecylon to develop ecological niche models and assess both current and future potential distributions under six climate change scenarios in 2041-2060 and 2061-2080. We also overlaid land-cover and protected-area maps and performed a gap analysis to understand the impacts of land-cover changes on Memecylon distributions and propose new areas for conservation.
Results: Differences among suitab...