7 datasets found

A Standardized Reference Data Set for Vertebrate Taxon Name Resolution
plos.figshare.com
datasetcatalog.nlm.nih.gov
tiff
Updated Jun 2, 2023
Cite
Paula F. Zermoglio; Robert P. Guralnick; John R. Wieczorek (2023). A Standardized Reference Data Set for Vertebrate Taxon Name Resolution [Dataset]. http://doi.org/10.1371/journal.pone.0146894
Available download formats
tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0146894
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Paula F. Zermoglio; Robert P. Guralnick; John R. Wieczorek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.
Data from: The trouble with triplets in biodiversity informatics: a...
dataone.org
datasetcatalog.nlm.nih.gov
+2 more
Updated May 28, 2025
Cite
Robert Guralnick; Tom Conlin; John Deck; Brian Stucky; Nico Cellinese; Brian J. Stucky (2025). The trouble with triplets in biodiversity informatics: a data-driven case against current identifier practices [Dataset]. http://doi.org/10.5061/dryad.4b115
Unique identifier
https://doi.org/10.5061/dryad.4b115
Dataset updated
May 28, 2025
Dataset provided by
Dryad Digital Repository
Authors
Robert Guralnick; Tom Conlin; John Deck; Brian Stucky; Nico Cellinese; Brian J. Stucky
Time period covered
Nov 5, 2015
Description
The biodiversity informatics community has discussed aspirations and approaches for assigning globally unique identifiers (GUIDs) to biocollections for nearly a decade. During that time, and despite misgivings, the de facto standard identifier has become the “Darwin Core Triplet”, which is a concatenation of values for institution code, collection code, and catalog number associated with biocollections material. Our aim is not to rehash the challenging discussions regarding which GUID system in theory best supports the biodiversity informatics use case of discovering and linking digital data across the Internet, but how well we can link those data together at this moment, utilizing the current identifier schemes that have already been deployed. We gathered Darwin Core Triplets from a subset of VertNet records, along with vertebrate records from GenBank and the Barcode of Life Data System, in order to determine how Darwin Core Triplets are deployed “in the wild”. We asked if those triple...
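The concatenation described above can be illustrated with a minimal Python sketch. The colon separator, the function names, and the example values are assumptions made for illustration; the description does not state which separator a given provider uses, and identifiers deployed in practice may vary.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    institution_code: str
    collection_code: str
    catalog_number: str


def build_triplet(institution_code: str, collection_code: str,
                  catalog_number: str, sep: str = ":") -> str:
    """Concatenate the three Darwin Core values into one identifier string.

    The separator is an assumption; it is not specified in the description.
    """
    parts = (institution_code, collection_code, catalog_number)
    return sep.join(p.strip() for p in parts)


def parse_triplet(identifier: str, sep: str = ":") -> Triplet:
    """Split an identifier back into three parts; reject anything else."""
    parts = identifier.split(sep)
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {identifier!r}")
    return Triplet(*parts)


# Hypothetical example values, not taken from any real collection.
example_id = build_triplet("ABC", "Herps", " 1234 ")
example_parts = parse_triplet(example_id)
```

A strict parser like this one makes the fragility of the scheme visible: any identifier whose separator or number of parts differs is rejected rather than silently matched.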
Data from: Georgia Southern University - Savannah Science Museum Herpetology...
demo.gbif.org
gbif.org
Updated Jun 8, 2017
Cite
Georgia Southern University (2017). Georgia Southern University - Savannah Science Museum Herpetology Collection [Dataset]. http://doi.org/10.15468/nruxuc
Unique identifier
https://doi.org/10.15468/nruxuc
Dataset updated
Jun 8, 2017
Dataset provided by
Global Biodiversity Information Facility: https://www.gbif.org/
Georgia Southern University
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 1956 - Dec 31, 1956
Description
The collection contains approximately 35,000 specimens of reptiles and amphibians. Most of the material is from southern Georgia, although some collections from other areas in the Southeast are included. The collection represents over 95% of Georgia's herpetofauna and is the second largest collection in the state. Specimen data are digitized (Specify 6.4) and available via VertNet and/or upon request. Contact Lance McBrayer for more information, loans, data requests, and/or a visit.
Teaching Biodiversity with Museum Specimens in an Inquiry-Based Lab
qubeshub.org
Updated Nov 23, 2021
Cite
Lisa Walsh*†; Cynthia Giffen†; Cody Thompson (2021). Teaching Biodiversity with Museum Specimens in an Inquiry-Based Lab [Dataset]. http://doi.org/10.24918/cs.2019.45
Unique identifier
https://doi.org/10.24918/cs.2019.45
Dataset updated
Nov 23, 2021
Dataset provided by
QUBES
Authors
Lisa Walsh*†; Cynthia Giffen†; Cody Thompson
Description
In response to the growth of biology datasets and broad efforts to digitize data, an increasingly important skill for science students is the management and analysis of large datasets. We designed an inquiry-based lab module to introduce students to museum research by quantitatively evaluating ecogeographical patterns using a VertNet dataset. VertNet is a free, NSF-funded database of museum specimens from over 100 research museums with spatial, temporal, and morphological data for thousands of individual specimens. Patterns observed by natural historians provide a context for students to enter the world of museum research. These patterns, especially in mammals, are largely associated with latitudinal gradients. For example, Bergmann's Rule states that animals are larger in colder environments, an adaptation to conserve energy in harsh climates. Allen's Rule states that endotherms in colder environments will have shorter extremities. After learning these general patterns, students develop questions to pursue for a particular group of mammals. Students measure available museum specimens and supplement their data with a downloadable VertNet dataset. Datasets include over 150 columns, requiring students to choose appropriate variables while accounting for errors that might occur in large datasets collected across many institutions. Students flex their statistical skills to examine their research question and present their results to the class, perhaps discovering that "Rules" were meant to be broken. By completing this module, students become familiar with how museums aid in research, gain confidence in asking and pursuing their own scientific questions, and practice managing and analyzing large datasets.
Supplementary material 1 from: Hody JW, Kays R (2018) Mapping the expansion...
zenodo.org
data.niaid.nih.gov
bin
Updated Jan 24, 2020
Cite
James W. Hody; Roland Kays; James W. Hody; Roland Kays (2020). Supplementary material 1 from: Hody JW, Kays R (2018) Mapping the expansion of coyotes (Canis latrans) across North and Central America. ZooKeys 759: 81-97. https://doi.org/10.3897/zookeys.759.15149 [Dataset]. http://doi.org/10.3897/zookeys.759.15149.suppl1
Available download formats
bin
Unique identifier
https://doi.org/10.3897/zookeys.759.15149.suppl1
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
James W. Hody; Roland Kays; James W. Hody; Roland Kays
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Central America
Description
Detailed list of references and data sources. Explanation note: List of references used to determine historical extent and regional first-occurrences of coyotes (Canis latrans) in North and Central America.
Biogeography of the world’s worst invasive species has spatially-biased...
search.dataone.org
Updated Jul 28, 2025
+ more versions
Cite
David Jenkins; Hannah Bevan; Wei Chen; Jacob Hart; Amanda Lindsay; Laura Macamo; Mekail Negash; Leo Ohyama; Alessandra Pandolfi; George Zaragoza (2025). Biogeography of the world’s worst invasive species has spatially-biased knowledge gaps but is predictable [Dataset]. http://doi.org/10.5061/dryad.zw3r228bh
Unique identifier
https://doi.org/10.5061/dryad.zw3r228bh
Dataset updated
Jul 28, 2025
Dataset provided by
Dryad Digital Repository
Authors
David Jenkins; Hannah Bevan; Wei Chen; Jacob Hart; Amanda Lindsay; Laura Macamo; Mekail Negash; Leo Ohyama; Alessandra Pandolfi; George Zaragoza
Time period covered
Jan 1, 2022
Description
The world’s “100 worst invasive species” were listed in 2000. The list is taxonomically diverse and often cited (typically for single-species studies), and its species are frequently reported in global biodiversity databases. We acted on the principle that these notorious species should be well-reported to help answer two questions about global biogeography of invasive species (i.e., not just their invaded ranges): (1) “how are data distributed globally?” and (2) “what predicts diversity?” We collected location data for each of the 100 species from multiple databases; 95 had sufficient data for analyses. For question (1), we mapped global species richness and cumulative occurrences since 2000 in (0.5 degree)² grids. For question (2) we compared alternative regression models representing non-exclusive hypotheses for geography (i.e., spatial autocorrelation), sampling effort, climate, and anthropocentric effects. Reported locations of the invasive species were spatially-biased, leaving la...

Data Acquisition and Processing

Data were acquired from multiple databases for the 100 invasive species in February 2022 using the spocc package in R (Chamberlain 2021). Data sources (in alphabetical order) included: the Atlas of Living Australia ('ALA'; https://www.ala.org.au); eBird (http://www.ebird.org/home; Sullivan et al. 2009); the Integrated Digitized Biocollections ('iDigBio'; https://www.idigbio.org; Matsunaga et al. 2013); the Global Biodiversity Information Facility ('GBIF'; https://www.gbif.org); Ocean 'Biogeographic' Information System ('OBIS'; https://portal.obis.org; Grassle and Stocks 1999); VertNet (https://vertnet.org; Constable et al. 2010); and the US Geological Survey’s Biodiversity Information Serving Our Nation ('BISON'; replaced December 2021 by GBIF). Several databases set limits to 100,000 initial point records (before cleaning, described below) when accessed using spocc.
As a result, data for 19 species with >100,000 point records (e.g., the European starli...

# Biogeography of the world’s worst invasive species has spatially-biased knowledge gaps but is predictable

https://doi.org/10.5061/dryad.zw3r228bh

The provided datatoanalyze.csv file contains the data after further processing in the provided R code to include spatial autocorrelation for both the species richness and the cumulative occurrences analyses.

Description of the data and file structure

The data file includes 59586 rows and 19 columns. NAs indicate missing data. Columns include:

a row ID

lon: longitude (decimal degrees) for the center of a 0.5 degree grid cell

lat: latitude (decimal degrees) for the center of a 0.5 degree grid cell

UN: the UN code for the country

ISO3: the ISO3 code for the country

NAME: the country name

Country: may be identical to NAME, but some differences (e.g., The Republic of ...) occur via different data sources

corrupt: the corruption score (range = -2.5 to 2.5) for the country, from the Wo...
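As a rough illustration of the file structure described above, the following Python sketch counts rows, lists columns, and tallies NA values per column. The function name `summarize` and the sample rows are made up for illustration; only the column names and the use of NA for missing data come from the description, and the real file has 59586 rows and 19 columns.

```python
import csv
import io


def summarize(csv_text: str, na_token: str = "NA") -> dict:
    """Return row count, column names, and per-column missing-value counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            if row[name] == na_token:
                missing[name] += 1
    return {"rows": n_rows, "columns": columns, "missing": missing}


# Made-up sample rows using a few of the listed column names (assumption:
# values are illustrative only and do not come from datatoanalyze.csv).
SAMPLE = """lon,lat,ISO3,NAME,corrupt
-81.25,32.25,USA,United States,1.3
151.25,-33.75,AUS,Australia,NA
"""

summary = summarize(SAMPLE)
```

To apply the same check to the real file, its contents could be read with `open("datatoanalyze.csv").read()` and passed to `summarize`, assuming the file is comma-separated with a header row.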
Data from: The role of climate and species interactions in determining the...
data.niaid.nih.gov
datadryad.org
zip
Updated Dec 17, 2024
Cite
Alexandra Coconis; Kenneth Nussear; Rebecca Rowe; Angela Hornsby; Marjorie Matocq (2024). The role of climate and species interactions in determining the distribution of two elevationally segregated species of small mammals through time [Dataset]. http://doi.org/10.5061/dryad.mpg4f4r8q
Available download formats
zip
Unique identifier
https://doi.org/10.5061/dryad.mpg4f4r8q
Dataset updated
Dec 17, 2024
Dataset provided by
Bell Museum of Natural History
University of New Hampshire
University of Nevada, Reno
Authors
Alexandra Coconis; Kenneth Nussear; Rebecca Rowe; Angela Hornsby; Marjorie Matocq
License
https://spdx.org/licenses/CC0-1.0.html
Description
The relative importance of abiotic and biotic factors in determining species distributions has long been of interest to ecologists but is often difficult to assess due to the lack of spatially and temporally robust occurrence records. Furthermore, locating places where potentially highly competitive species co-occur may be challenging but would provide critical knowledge into the effects of competition on species ranges. We built species distribution models for two closely related species of small mammals (Neotoma) that are largely parapatric along mountainsides throughout the Great Basin Desert, USA using extensive modern occurrence records. We hindcasted these models to the mid-Holocene to compare the response of each species to dramatic climatic change and used paleontological records to validate our models. Model results showed species co-occurrence at mid-elevations along select mountain ranges in this region. We confirmed our model results with fine-scale field surveys in a single mountain range containing one of the most extensive survey datasets across an elevational gradient in the Great Basin. We found close alignment of realized distributions to the respective abiotic species distribution model predictions, despite the presence of the congener, indicating that climate may be more influential than competition in shaping distribution at the scale of a single mountain range. Our models also predict differential species responses to historic climate change, leading to a reduced probability of species interactions during warmer and dryer climatic conditions. Our results emphasize the utility of examining species distributions with regard to both abiotic variables and species interactions and at various spatial scales to make inferences about the mechanisms underlying distributional limits. 
Methods
Occurrence records for species distribution models came from the Global Biodiversity Information Facility (GBIF) (https://www.gbif.org/, accessed January 2022) and VertNet (http://vertnet.org/, accessed January 2022), and data we collected from surveys throughout the Great Basin in the Summer and Fall of 2021. We cropped all records to the Great Basin ecoregion, the boundary of which was obtained from the United States Geological Survey (USGS) database. We constrained records to dates on or after 1950, and with less than or equal to one kilometer coordinate uncertainty to reflect the resolution of the data layers. We used the spatial analysis georeferencing accuracy (SAGA) protocol to georeference data with no recorded coordinate uncertainty (Bloom et al. 2018). We thinned locality data to 1 km raster cells. From April through October 2022, we conducted surveys (24 sites, 6555 trap nights) for woodrats in the southern Snake Range of eastern Nevada. We also obtained mid-Holocene fossil and midden records assembled from the Neotoma Paleoecology Database (http://www.neotoma.db.org; January 2024) (Williams, Grimm et al. 2018), and primary literature (Grayson 1985, Terry et al. 2011) to validate mid-Holocene model projections. We filtered records by excluding records with uncertain identification, restricting to only those with a calibrated median age between 4,500 and 7,500 years before the present and max age <11,700 (Grayson 2011), and trimming all records to our Great Basin extent. All paleontological records included were morphologically identified to species.
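The date, uncertainty, and thinning steps described above can be sketched in Python as follows. This is a simplified illustration, not the authors' workflow: the `Occurrence` class and both function names are hypothetical, records with no recorded uncertainty are simply dropped here (the source instead georeferenced them with the SAGA protocol), and the grid is defined in degrees as a stand-in for the 1 km raster cells, whose projection is not stated.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Occurrence:
    year: Optional[int]
    lat: float
    lon: float
    uncertainty_m: Optional[float]  # coordinate uncertainty in meters


def filter_records(records: Iterable[Occurrence],
                   min_year: int = 1950,
                   max_uncertainty_m: float = 1000.0) -> List[Occurrence]:
    """Keep records dated min_year or later with uncertainty <= the limit."""
    kept = []
    for r in records:
        if r.year is None or r.year < min_year:
            continue
        # Simplification: drop records lacking an uncertainty value.
        if r.uncertainty_m is None or r.uncertainty_m > max_uncertainty_m:
            continue
        kept.append(r)
    return kept


def thin_to_cells(records: Iterable[Occurrence],
                  cell_deg: float = 0.01) -> List[Occurrence]:
    """Keep the first record encountered in each grid cell.

    cell_deg is an assumed cell size in degrees, roughly 1 km in latitude.
    """
    seen = set()
    thinned = []
    for r in records:
        cell = (math.floor(r.lat / cell_deg), math.floor(r.lon / cell_deg))
        if cell not in seen:
            seen.add(cell)
            thinned.append(r)
    return thinned


# Made-up records for illustration only.
records = [
    Occurrence(1949, 39.0012, -114.3012, 100.0),   # too old
    Occurrence(1950, 39.0012, -114.3012, 1000.0),  # kept
    Occurrence(2001, 39.0049, -114.3049, 500.0),   # same cell as previous
    Occurrence(2010, 39.0212, -114.3012, 1500.0),  # too uncertain
    Occurrence(2015, 39.0212, -114.3012, None),    # no uncertainty recorded
    Occurrence(2020, 39.0212, -114.3012, 30.0),    # kept, new cell
]
kept = filter_records(records)
thinned = thin_to_cells(kept)
```

Keeping the first record per cell is one of several possible thinning rules; a real analysis would more likely thin on the raster grid actually used for the environmental layers.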
A Standardized Reference Data Set for Vertebrate Taxon Name Resolution

10 scholarly articles cite this dataset (View in Google Scholar)
