Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, because they were often captured only once, during accessioning of specimens, or have a history of label changes without clear provenance. These issues must be addressed before the records can be used reliably in research. However, what is still missing is an assessment of the scope of the problem, an estimate of the effort needed to solve it, and a way to improve the effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspellings, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead: fewer than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide a feasible means of improving digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. Based on model selection, a set of candidate variables (geographic region, year collected, higher-level clade, and the institution's volume of digitally accessible data) and their two-way interactions all predict the probability that a record has taxon name issues. We strongly encourage further experiments that use this reference data set to compare automated or computer-aided taxon name tools on their ability to resolve and improve the existing wealth of legacy data.
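To make the vetting categories above (formatting problems, misspellings, synonymy) concrete, here is a minimal sketch of how a computer-aided tool might triage verbatim names. It assumes a local reference list of valid names and a synonym table; both are hypothetical placeholders (the synonym entry is invented, not a real nomenclatural decision), and fuzzy matching uses Python's standard `difflib` rather than any tool evaluated in the study. Input is assumed to be a bare binomial without authorship.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical reference checklist of currently valid names (illustrative only).
VALID_NAMES = ["Peromyscus maniculatus", "Peromyscus leucopus", "Neotoma lepida"]

# Hypothetical synonym table: outdated name -> currently valid name.
# "Hypotheticus lepidus" is a made-up placeholder, not a real synonym.
SYNONYMS = {"Hypotheticus lepidus": "Neotoma lepida"}


def normalize(name: str) -> str:
    """Collapse whitespace and apply 'Genus species' capitalization.

    Assumes a bare binomial/trinomial with no authorship string.
    """
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve(verbatim: str, cutoff: float = 0.9) -> Tuple[str, Optional[str]]:
    """Classify a verbatim name and return (issue_category, resolved_name)."""
    name = normalize(verbatim)
    if name in VALID_NAMES:
        # Valid as written, or valid only after fixing whitespace/case.
        return ("valid" if name == verbatim else "formatting"), name
    if name in SYNONYMS:
        return "synonym", SYNONYMS[name]
    # Closest valid name whose similarity ratio is at least `cutoff`.
    match = difflib.get_close_matches(name, VALID_NAMES, n=1, cutoff=cutoff)
    if match:
        return "misspelling", match[0]
    return "unresolved", None
```

For example, `resolve("Peromyscus maniculatis")` is flagged as a likely misspelling of `Peromyscus maniculatus`. A high `cutoff` trades recall for precision; a human-vetted reference set like the one described here is what would let such a threshold be tuned rather than guessed.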
The biodiversity informatics community has discussed aspirations and approaches for assigning globally unique identifiers (GUIDs) to biocollections for nearly a decade. During that time, and despite misgivings, the de facto standard identifier has become the “Darwin Core Triplet”, a concatenation of the values for institution code, collection code, and catalog number associated with biocollections material. Our aim is not to rehash the challenging discussions about which GUID system best supports, in theory, the biodiversity informatics use case of discovering and linking digital data across the Internet, but to assess how well those data can be linked together at present, using the identifier schemes that have already been deployed. We gathered Darwin Core Triplets from a subset of VertNet records, along with vertebrate records from GenBank and the Barcode of Life Data System, to determine how Darwin Core Triplets are deployed “in the wild”. We asked if those triple...
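As a rough illustration of the linking question above, the sketch below builds a triplet from the Darwin Core terms `institutionCode`, `collectionCode` and `catalogNumber`, parses voucher strings, and looks them up. It assumes the colon-separated `institution:collection:catalog` form (as used, for example, in INSDC `/specimen_voucher` qualifiers) and treats institution and collection codes as case-insensitive; both are assumptions, and the codes in the usage note are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Triplet(NamedTuple):
    institution_code: str
    collection_code: str
    catalog_number: str


def make_triplet(record: Dict[str, str]) -> Triplet:
    """Build a normalized triplet from a record keyed by Darwin Core terms."""
    return Triplet(
        record["institutionCode"].strip().upper(),  # assumed case-insensitive
        record["collectionCode"].strip().upper(),   # assumed case-insensitive
        record["catalogNumber"].strip(),            # kept exactly as published
    )


def parse_voucher(voucher: str) -> Optional[Triplet]:
    """Parse 'inst:coll:catalog'; return None for any other shape."""
    parts = [p.strip() for p in voucher.split(":")]
    if len(parts) != 3 or not all(parts):
        return None  # e.g. 'inst:catalog' with no collection code
    inst, coll, cat = parts
    return Triplet(inst.upper(), coll.upper(), cat)


def link(vouchers: Iterable[str], records: List[Dict[str, str]]) -> Dict[str, Optional[Dict[str, str]]]:
    """Map each voucher string to the first matching record, or None."""
    index: Dict[Triplet, Dict[str, str]] = {}
    for record in records:
        index.setdefault(make_triplet(record), record)
    result: Dict[str, Optional[Dict[str, str]]] = {}
    for voucher in vouchers:
        triplet = parse_voucher(voucher)
        result[voucher] = index.get(triplet) if triplet is not None else None
    return result
```

With a record `{"institutionCode": "ABC", "collectionCode": "Herp", "catalogNumber": "1234"}`, both `"ABC:Herp:1234"` and `" abc : herp : 1234 "` link to it, while `"ABC:1234"` does not. Every normalization rule added here (case folding, whitespace, leading zeros in catalog numbers) changes the link rate, which is exactly why how triplets are written “in the wild” matters.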
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The collection contains approximately 35,000 specimens of reptiles and amphibians. Most of the material is from southern Georgia, although some collections from other areas in the Southeast are included. The collection contains representatives of over 95% of Georgia's herpetofauna and is the second-largest collection in the state. Specimen data are digitized (Specify 6.4) and are available via VertNet and/or upon request. Contact Lance McBrayer for more information, loans, data requests, and/or a visit.
In response to the growth of biology datasets and broad efforts to digitize data, an increasingly important skill for science students is the management and analysis of large datasets. We designed an inquiry-based lab module to introduce students to museum research by quantitatively evaluating ecogeographical patterns using a VertNet dataset. VertNet is a free, NSF-funded database of museum specimens from over 100 research museums with spatial, temporal, and morphological data for thousands of individual specimens. Patterns observed by natural historians provide a context for students to enter the world of museum research. These patterns, especially in mammals, are largely associated with latitudinal gradients. For example, Bergmann's Rule states that animals are larger in colder environments, an adaptation to conserve energy in harsh climates. Allen's Rule states that endotherms in colder environments will have shorter extremities. After learning these general patterns, students develop questions to pursue for a particular group of mammals. Students measure available museum specimens and supplement their data with a downloadable VertNet dataset. Datasets include over 150 columns, requiring students to choose appropriate variables while accounting for errors that might occur in large datasets collected across many institutions. Students flex their statistical skills to examine their research question and present their results to the class, perhaps discovering that "Rules" were meant to be broken. By completing this module, students become familiar with how museums aid in research, gain confidence in asking and pursuing their own scientific questions, and practice managing and analyzing large datasets.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Detailed list of references and data sources. Explanation note: list of references used to determine the historical extent and regional first occurrences of coyotes (Canis latrans) in North and Central America.
The world’s “100 worst invasive species” were listed in 2000. The list is taxonomically diverse and often cited (typically for single-species studies), and its species are frequently reported in global biodiversity databases. We acted on the principle that these notorious species should be well reported, and used them to help answer two questions about the global biogeography of invasive species (i.e., not just their invaded ranges): (1) “how are data distributed globally?” and (2) “what predicts diversity?” We collected location data for each of the 100 species from multiple databases; 95 had sufficient data for analyses. For question (1), we mapped global species richness and cumulative occurrences since 2000 in 0.5° × 0.5° grid cells. For question (2), we compared alternative regression models representing non-exclusive hypotheses for geography (i.e., spatial autocorrelation), sampling effort, climate, and anthropocentric effects. Reported locations of the invasive species were spatially biased, leaving la...

Data Acquisition and Processing
Data were acquired from multiple databases for the 100 invasive species in February 2022 using the spocc package in R (Chamberlain 2021). Data sources (in alphabetical order) included: the Atlas of Living Australia ('ALA'; https://www.ala.org.au); eBird (http://www.ebird.org/home; Sullivan et al. 2009); the Integrated Digitized Biocollections ('iDigBio'; https://www.idigbio.org; Matsunaga et al. 2013); the Global Biodiversity Information Facility ('GBIF'; https://www.gbif.org); the Ocean Biogeographic Information System ('OBIS'; https://portal.obis.org; Grassle and Stocks 1999); VertNet (https://vertnet.org; Constable et al. 2010); and the US Geological Survey’s Biodiversity Information Serving Our Nation ('BISON'; replaced December 2021 by GBIF). When accessed using spocc, several databases limited queries to 100,000 initial point records (before cleaning, described below). As a result, data for 19 species with >100,000 point records (e.g., the European starli...

# Biogeography of the world’s worst invasive species has spatially-biased knowledge gaps but is predictable
https://doi.org/10.5061/dryad.zw3r228bh
The provided datatoanalyze.csv file contains data that were further processed in the provided R code to include spatial autocorrelation for both the species richness and the cumulative occurrence analyses.
The data file includes 59586 rows and 19 columns. NAs indicate missing data. Columns include:
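For question (1) of the invasive-species study, the per-cell summaries can be sketched as follows: assign each occurrence to a 0.5° × 0.5° latitude/longitude cell, then count distinct species (richness) and records (cumulative occurrences) per cell. This is an illustrative reimplementation in Python, not the authors' R code; the `(species, lat, lon, year)` record layout is hypothetical, and applying the "since 2000" filter to richness as well as occurrences is an assumption (pass `since_year=None` to disable it).

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row from the south pole, column from 180° W)


def grid_summaries(
    records: Iterable[Tuple[str, float, float, Optional[int]]],
    size: float = 0.5,
    since_year: Optional[int] = 2000,
) -> Tuple[Dict[Cell, int], Dict[Cell, int]]:
    """Return (species richness, occurrence count) per grid cell."""
    n_rows, n_cols = round(180 / size), round(360 / size)
    species: Dict[Cell, Set[str]] = defaultdict(set)
    counts: Dict[Cell, int] = defaultdict(int)
    for name, lat, lon, year in records:
        if since_year is not None and (year is None or year < since_year):
            continue  # outside the time window (or undated)
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue  # invalid coordinates
        # Clamp so that lat = 90 or lon = 180 falls in the last row/column.
        row = min(math.floor((lat + 90.0) / size), n_rows - 1)
        col = min(math.floor((lon + 180.0) / size), n_cols - 1)
        species[(row, col)].add(name)
        counts[(row, col)] += 1
    richness = {cell: len(names) for cell, names in species.items()}
    return richness, dict(counts)
```

Keeping richness and counts in separate dictionaries mirrors the two response variables; the spatial autocorrelation terms mentioned in the README would be computed afterwards from the neighbouring cells.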
https://spdx.org/licenses/CC0-1.0.html
The relative importance of abiotic and biotic factors in determining species distributions has long been of interest to ecologists but is often difficult to assess because spatially and temporally robust occurrence records are lacking. Furthermore, locating places where potentially highly competitive species co-occur may be challenging, but doing so would provide critical insight into the effects of competition on species ranges. We built species distribution models for two closely related species of small mammals (Neotoma) that are largely parapatric along mountainsides throughout the Great Basin Desert, USA, using extensive modern occurrence records. We hindcasted these models to the mid-Holocene to compare the response of each species to dramatic climatic change and used paleontological records to validate our models. Model results showed species co-occurrence at mid-elevations along select mountain ranges in this region. We confirmed our model results with fine-scale field surveys in a single mountain range containing one of the most extensive survey datasets across an elevational gradient in the Great Basin. We found close alignment of realized distributions with the respective abiotic species distribution model predictions, despite the presence of the congener, indicating that climate may be more influential than competition in shaping distribution at the scale of a single mountain range. Our models also predict differential species responses to historic climate change, leading to a reduced probability of species interactions during warmer and drier climatic conditions. Our results emphasize the utility of examining species distributions with regard to both abiotic variables and species interactions, and at various spatial scales, to make inferences about the mechanisms underlying distributional limits.
Methods: Occurrence records for species distribution models came from the Global Biodiversity Information Facility (GBIF) (https://www.gbif.org/, accessed January 2022), VertNet (http://vertnet.org/, accessed January 2022), and data we collected from surveys throughout the Great Basin in the summer and fall of 2021. We cropped all records to the Great Basin ecoregion, the boundary of which was obtained from the United States Geological Survey (USGS) database. We constrained records to those dated 1950 or later and with a coordinate uncertainty of at most 1 km, to reflect the resolution of the data layers. We used the spatial analysis georeferencing accuracy (SAGA) protocol to georeference data with no recorded coordinate uncertainty (Bloom et al. 2018). We thinned locality data to 1-km raster cells. From April through October 2022, we conducted surveys (24 sites, 6555 trap nights) for woodrats in the southern Snake Range of eastern Nevada. We also obtained mid-Holocene fossil and midden records assembled from the Neotoma Paleoecology Database (http://www.neotoma.db.org; January 2024) (Williams, Grimm et al. 2018) and from the primary literature (Grayson 1985, Terry et al. 2011) to validate mid-Holocene model projections. We filtered these records by excluding those with uncertain identification, restricting them to those with a calibrated median age between 4,500 and 7,500 years before present and a maximum age below 11,700 years (Grayson 2011), and trimming all records to our Great Basin extent. All paleontological records included were morphologically identified to species.
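The record filters described in these methods can be sketched as two small functions: one for modern occurrences (year ≥ 1950, coordinate uncertainty ≤ 1 km, inside the study region, thinned to one record per species per 1-km cell) and one for mid-Holocene records (certain identification, median age 4,500 to 7,500 years BP, maximum age < 11,700 years). This is a hypothetical Python illustration, not the authors' workflow: it assumes coordinates are already projected to metres (so a 1-km cell is simply `floor(x / 1000)`), takes the Great Basin boundary as an `in_region` callable rather than the USGS polygon, assumes SAGA georeferencing has already filled in missing uncertainties, and omits the spatial trimming of fossil records.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Occurrence:
    species: str
    x_m: float  # easting in metres (assumed equal-area projection)
    y_m: float  # northing in metres
    year: Optional[int]
    uncertainty_m: Optional[float]


@dataclass(frozen=True)
class FossilRecord:
    species: str
    median_age_bp: float  # calibrated median age, years before present
    max_age_bp: float
    identification_certain: bool


def filter_modern(
    records: Iterable[Occurrence],
    in_region: Callable[[float, float], bool],
    min_year: int = 1950,
    max_uncertainty_m: float = 1000.0,
    cell_m: float = 1000.0,
) -> List[Occurrence]:
    """Apply date, uncertainty and region filters, then thin to 1-km cells."""
    kept: List[Occurrence] = []
    seen: Set[Tuple[str, int, int]] = set()
    for r in records:
        if r.year is None or r.year < min_year:
            continue
        # Records without uncertainty are dropped; georeference them first.
        if r.uncertainty_m is None or r.uncertainty_m > max_uncertainty_m:
            continue
        if not in_region(r.x_m, r.y_m):
            continue
        cell = (r.species, math.floor(r.x_m / cell_m), math.floor(r.y_m / cell_m))
        if cell in seen:
            continue  # thinning: keep the first record per species per cell
        seen.add(cell)
        kept.append(r)
    return kept


def filter_mid_holocene(
    records: Iterable[FossilRecord],
    min_age: float = 4500,
    max_age: float = 7500,
    oldest: float = 11700,
) -> List[FossilRecord]:
    """Keep securely identified records within the mid-Holocene window."""
    return [
        r for r in records
        if r.identification_certain
        and min_age <= r.median_age_bp <= max_age
        and r.max_age_bp < oldest
    ]
```

Thinning per species, rather than across both species, is an assumption here; it keeps cells where the two woodrats co-occur, which matters for a study about co-occurrence.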