7 datasets found
  1. A Standardized Reference Data Set for Vertebrate Taxon Name Resolution

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated Jun 2, 2023
    Cite
    Paula F. Zermoglio; Robert P. Guralnick; John R. Wieczorek (2023). A Standardized Reference Data Set for Vertebrate Taxon Name Resolution [Dataset]. http://doi.org/10.1371/journal.pone.0146894
    Available download formats: tiff
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Paula F. Zermoglio; Robert P. Guralnick; John R. Wieczorek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.
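    The description above suggests that computer-aided approaches could resolve most problematic name strings. As a minimal sketch of what such a tool might do, and not the authors' method, the following matches a verbatim name against a list of valid names using fuzzy string matching from Python's standard library. The function name `resolve_name`, the 0.85 cutoff, and the example names are assumptions chosen for illustration; the sketch handles misspellings only, not synonymy.

```python
from difflib import get_close_matches


def resolve_name(verbatim, valid_names, cutoff=0.85):
    """Return the closest valid name for a verbatim string, or None.

    Hypothetical helper: exact match first (ignoring case and extra
    whitespace), then the single best fuzzy match above `cutoff`.
    """
    lookup = {name.lower(): name for name in valid_names}
    cleaned = " ".join(verbatim.split()).lower()
    if cleaned in lookup:
        return lookup[cleaned]
    matches = get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# Illustrative reference list, not taken from the dataset.
valid = ["Canis latrans", "Canis lupus", "Neotoma lepida"]
print(resolve_name("Canis  latrens", valid))  # misspelled, extra space
print(resolve_name("Unknownus", valid))       # no plausible match
```

    A reference data set like the one described here could serve as ground truth for scoring such a resolver, for example by counting how many vetted names it recovers.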

  2. Data from: The trouble with triplets in biodiversity informatics: a...

    • dataone.org
    • datasetcatalog.nlm.nih.gov
    • +2 more
    Updated May 28, 2025
    Cite
    Robert Guralnick; Tom Conlin; John Deck; Brian Stucky; Nico Cellinese; Brian J. Stucky (2025). The trouble with triplets in biodiversity informatics: a data-driven case against current identifier practices [Dataset]. http://doi.org/10.5061/dryad.4b115
    Dataset updated
    May 28, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Robert Guralnick; Tom Conlin; John Deck; Brian Stucky; Nico Cellinese; Brian J. Stucky
    Time period covered
    Nov 5, 2015
    Description

    The biodiversity informatics community has discussed aspirations and approaches for assigning globally unique identifiers (GUIDs) to biocollections for nearly a decade. During that time, and despite misgivings, the de facto standard identifier has become the “Darwin Core Triplet”, which is a concatenation of values for institution code, collection code, and catalog number associated with biocollections material. Our aim is not to rehash the challenging discussions regarding which GUID system in theory best supports the biodiversity informatics use case of discovering and linking digital data across the Internet, but how well we can link those data together at this moment, utilizing the current identifier schemes that have already been deployed. We gathered Darwin Core Triplets from a subset of VertNet records, along with vertebrate records from GenBank and the Barcode of Life Data System, in order to determine how Darwin Core Triplets are deployed “in the wild”. We asked if those triple...
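    To make the triplet construction concrete, here is a minimal sketch that concatenates institution code, collection code, and catalog number into one identifier string. The colon separator, the class and function names, the normalization rule, and the sample values are all assumptions for illustration; the description above does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    institution_code: str
    collection_code: str
    catalog_number: str


def darwin_core_triplet(record, sep=":"):
    """Concatenate the three fields into one identifier (separator assumed)."""
    parts = (record.institution_code, record.collection_code, record.catalog_number)
    return sep.join(part.strip() for part in parts)


def normalize_triplet(triplet, sep=":"):
    """Hypothetical normalization: trim whitespace and upper-case each part."""
    return sep.join(part.strip().upper() for part in triplet.split(sep))


# Two renderings of what is meant to be the same (made-up) specimen.
a = darwin_core_triplet(SpecimenRecord("MVZ", "Mamm", "12345"))
b = darwin_core_triplet(SpecimenRecord("mvz", "mamm", " 12345"))
print(a == b)                                        # raw strings differ
print(normalize_triplet(a) == normalize_triplet(b))  # match after normalizing
```

    Because a triplet is plain concatenated text, small differences in case or spacing between sources are enough to stop two identifiers from matching, which is one reason linking records this way can be fragile.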

  3. Data from: Georgia Southern University - Savannah Science Museum Herpetology...

    • demo.gbif.org
    • gbif.org
    Updated Jun 8, 2017
    Cite
    Georgia Southern University (2017). Georgia Southern University - Savannah Science Museum Herpetology Collection [Dataset]. http://doi.org/10.15468/nruxuc
    Dataset updated
    Jun 8, 2017
    Dataset provided by
    Global Biodiversity Information Facility: https://www.gbif.org/
    Georgia Southern University
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1956 - Dec 31, 1956
    Description

    The collection contains approximately 35,000 specimens of reptiles and amphibians. Most of the material is from southern Georgia, although some collections from other areas in the southeast are included. The collection represents over 95% of Georgia's herpetofauna and is the second largest collection in the state. Specimen data are digitized (Specify 6.4) and available via VertNet and/or upon request. Contact Lance McBrayer for more information, loans, data requests, and/or a visit.

  4. Teaching Biodiversity with Museum Specimens in an Inquiry-Based Lab

    • qubeshub.org
    Updated Nov 23, 2021
    Cite
    Lisa Walsh*†; Cynthia Giffen†; Cody Thompson (2021). Teaching Biodiversity with Museum Specimens in an Inquiry-Based Lab [Dataset]. http://doi.org/10.24918/cs.2019.45
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    QUBES
    Authors
    Lisa Walsh*†; Cynthia Giffen†; Cody Thompson
    Description

    In response to the growth of biology datasets and broad efforts to digitize data, an increasingly important skill for science students is the management and analysis of large datasets. We designed an inquiry-based lab module to introduce students to museum research by quantitatively evaluating ecogeographical patterns using a VertNet dataset. VertNet is a free, NSF-funded database of museum specimens from over 100 research museums with spatial, temporal, and morphological data for thousands of individual specimens. Patterns observed by natural historians provide a context for students to enter the world of museum research. These patterns, especially in mammals, are largely associated with latitudinal gradients. For example, Bergmann's Rule states that animals are larger in colder environments, an adaptation to conserve energy in harsh climates. Allen's Rule states that endotherms in colder environments will have shorter extremities. After learning these general patterns, students develop questions to pursue for a particular group of mammals. Students measure available museum specimens and supplement their data with a downloadable VertNet dataset. Datasets include over 150 columns, requiring students to choose appropriate variables while accounting for errors that might occur in large datasets collected across many institutions. Students flex their statistical skills to examine their research question and present their results to the class, perhaps discovering that "Rules" were meant to be broken. By completing this module, students become familiar with how museums aid in research, gain confidence in asking and pursuing their own scientific questions, and practice managing and analyzing large datasets.

  5. Supplementary material 1 from: Hody JW, Kays R (2018) Mapping the expansion...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jan 24, 2020
    Cite
    James W. Hody; Roland Kays; James W. Hody; Roland Kays (2020). Supplementary material 1 from: Hody JW, Kays R (2018) Mapping the expansion of coyotes (Canis latrans) across North and Central America. ZooKeys 759: 81-97. https://doi.org/10.3897/zookeys.759.15149 [Dataset]. http://doi.org/10.3897/zookeys.759.15149.suppl1
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    James W. Hody; Roland Kays; James W. Hody; Roland Kays
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Central America
    Description

    Detailed list of references and data sources. Explanation note: List of references used to determine historical extent and regional first-occurrences of coyotes (Canis latrans) in North and Central America.

  6. Biogeography of the world’s worst invasive species has spatially-biased...

    • search.dataone.org
    Updated Jul 28, 2025
    + more versions
    Cite
    David Jenkins; Hannah Bevan; Wei Chen; Jacob Hart; Amanda Lindsay; Laura Macamo; Mekail Negash; Leo Ohyama; Alessandra Pandolfi; George Zaragoza (2025). Biogeography of the world’s worst invasive species has spatially-biased knowledge gaps but is predictable [Dataset]. http://doi.org/10.5061/dryad.zw3r228bh
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    David Jenkins; Hannah Bevan; Wei Chen; Jacob Hart; Amanda Lindsay; Laura Macamo; Mekail Negash; Leo Ohyama; Alessandra Pandolfi; George Zaragoza
    Time period covered
    Jan 1, 2022
    Description

    The world’s “100 worst invasive species” were listed in 2000. The list is taxonomically diverse and often cited (typically for single-species studies), and its species are frequently reported in global biodiversity databases. We acted on the principle that these notorious species should be well-reported to help answer two questions about global biogeography of invasive species (i.e., not just their invaded ranges): (1) “how are data distributed globally?” and (2) “what predicts diversity?” We collected location data for each of the 100 species from multiple databases; 95 had sufficient data for analyses. For question (1), we mapped global species richness and cumulative occurrences since 2000 in (0.5 degree)² grids. For question (2) we compared alternative regression models representing non-exclusive hypotheses for geography (i.e., spatial autocorrelation), sampling effort, climate, and anthropocentric effects. Reported locations of the invasive species were spatially-biased, leaving la...

    Data Acquisition and Processing: Data were acquired from multiple databases for the 100 invasive species in February 2022 using the spocc package in R (Chamberlain 2021). Data sources (in alphabetical order) included: the Atlas of Living Australia ('ALA'; https://www.ala.org.au); eBird (http://www.ebird.org/home; Sullivan et al. 2009); the Integrated Digitized Biocollections ('iDigBio'; https://www.idigbio.org; Matsunaga et al. 2013); the Global Biodiversity Information Facility ('GBIF'; https://www.gbif.org); the Ocean 'Biogeographic' Information System ('OBIS'; https://portal.obis.org; Grassle and Stocks 1999); VertNet (https://vertnet.org; Constable et al. 2010); and the US Geological Survey’s Biodiversity Information Serving Our Nation ('BISON'; replaced December 2021 by GBIF). Several databases set limits of 100,000 initial point records (before cleaning, described below) when accessed using spocc. As a result, data for 19 species with >100,000 point records (e.g., the European starli...

    Biogeography of the world’s worst invasive species has spatially-biased knowledge gaps but is predictable

    https://doi.org/10.5061/dryad.zw3r228bh

    The provided datatoanalyze.csv file represents data further processed in the provided R code to include spatial autocorrelation for each of the species richness and cumulative occurrences analyses.

    Description of the data and file structure

    The data file includes 59586 rows and 19 columns. NAs indicate missing data. Columns include:

    • a row ID
    • lon: longitude (decimal degrees) for the center of a 0.5 degree grid cell
    • lat: latitude (decimal degrees) for the center of a 0.5 degree grid cell
    • UN: the UN code for the country
    • ISO3: the ISO3 code for the country
    • NAME: the country name
    • Country: may be identical to NAME, but some differences (e.g., The Republic of ...) occur via different data sources
    • corrupt: the corruption score (range = -2.5 to 2.5) for the country, from the Wo...
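    The lon and lat columns above give the center of a 0.5 degree grid cell. The following is a rough sketch of how point records might be assigned to such cells and summarized as species richness and cumulative occurrences per cell. It is written in Python for illustration and is not the provided R code; the function names and the sample points are assumptions.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.5  # grid resolution in decimal degrees


def cell_center(lon, lat, cell=CELL_SIZE):
    """Return the (lon, lat) center of the grid cell containing a point."""
    col = math.floor(lon / cell)
    row = math.floor(lat / cell)
    return ((col + 0.5) * cell, (row + 0.5) * cell)


def summarize(points):
    """Map each cell center to (species richness, occurrence count).

    `points` is an iterable of (species, lon, lat) tuples.
    """
    species_by_cell = defaultdict(set)
    count_by_cell = defaultdict(int)
    for species, lon, lat in points:
        center = cell_center(lon, lat)
        species_by_cell[center].add(species)
        count_by_cell[center] += 1
    return {c: (len(species_by_cell[c]), count_by_cell[c]) for c in count_by_cell}


# Made-up records: three fall in one cell, one in another.
records = [
    ("sp_a", -81.3, 28.6),
    ("sp_b", -81.1, 28.9),
    ("sp_a", -81.2, 28.7),
    ("sp_b", 2.3, 48.8),
]
for center, (richness, n) in summarize(records).items():
    print(center, richness, n)
```

    Points exactly on the eastern or northern boundary (for example a longitude of 180) would land in a cell outside the usual range under this scheme, so a real pipeline would need to decide how to treat them.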
  7. Data from: The role of climate and species interactions in determining the...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Dec 17, 2024
    Cite
    Alexandra Coconis; Kenneth Nussear; Rebecca Rowe; Angela Hornsby; Marjorie Matocq (2024). The role of climate and species interactions in determining the distribution of two elevationally segregated species of small mammals through time [Dataset]. http://doi.org/10.5061/dryad.mpg4f4r8q
    Available download formats: zip
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    Bell Museum of Natural History
    University of New Hampshire
    University of Nevada, Reno
    Authors
    Alexandra Coconis; Kenneth Nussear; Rebecca Rowe; Angela Hornsby; Marjorie Matocq
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The relative importance of abiotic and biotic factors in determining species distributions has long been of interest to ecologists but is often difficult to assess due to the lack of spatially and temporally robust occurrence records. Furthermore, locating places where potentially highly competitive species co-occur may be challenging but would provide critical knowledge into the effects of competition on species ranges. We built species distribution models for two closely related species of small mammals (Neotoma) that are largely parapatric along mountainsides throughout the Great Basin Desert, USA using extensive modern occurrence records. We hindcasted these models to the mid-Holocene to compare the response of each species to dramatic climatic change and used paleontological records to validate our models. Model results showed species co-occurrence at mid-elevations along select mountain ranges in this region. We confirmed our model results with fine-scale field surveys in a single mountain range containing one of the most extensive survey datasets across an elevational gradient in the Great Basin. We found close alignment of realized distributions to the respective abiotic species distribution model predictions, despite the presence of the congener, indicating that climate may be more influential than competition in shaping distribution at the scale of a single mountain range. Our models also predict differential species responses to historic climate change, leading to a reduced probability of species interactions during warmer and dryer climatic conditions. Our results emphasize the utility of examining species distributions with regard to both abiotic variables and species interactions and at various spatial scales to make inferences about the mechanisms underlying distributional limits. 
    Methods: Occurrence records for species distribution models came from the Global Biodiversity Information Facility (GBIF) (https://www.gbif.org/, accessed January 2022) and VertNet (http://vertnet.org/, accessed January 2022), and data we collected from surveys throughout the Great Basin in the Summer and Fall of 2021. We cropped all records to the Great Basin ecoregion, the boundary of which was obtained from the United States Geological Survey (USGS) database. We constrained records to dates on or after 1950, and with less than or equal to one kilometer coordinate uncertainty to reflect the resolution of the data layers. We used the spatial analysis georeferencing accuracy (SAGA) protocol to georeference data with no recorded coordinate uncertainty (Bloom et al. 2018). We thinned locality data to 1km raster cells. From April through October 2022, we conducted surveys (24 sites, 6555 trap nights) for woodrats in the southern Snake Range of eastern Nevada. We also obtained mid-Holocene fossil and midden records assembled from the Neotoma Paleoecology Database (http://www.neotoma.db.org; January 2024) (Williams, Grimm et al. 2018), and primary literature (Grayson 1985, Terry et al. 2011) to validate mid-Holocene model projections. We filtered records by excluding records with uncertain identification, restricting to only those with a calibrated median age between 4,500 and 7,500 years before the present and max age <11,700 (Grayson 2011), and trimming all records to our Great Basin extent. All paleontological records included were morphologically identified to species.
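    The record filters described in the methods can be sketched as simple predicates. This is an illustrative reading of the stated thresholds, not the authors' code: the class and field names are hypothetical, treating the 4,500 and 7,500 year bounds as inclusive is an assumption, and the spatial steps (cropping to the Great Basin, thinning to 1 km raster cells) are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Occurrence:
    year: int
    coord_uncertainty_m: Optional[float]  # None when no uncertainty is recorded


@dataclass
class FossilRecord:
    median_age_bp: float  # calibrated median age, years before present
    max_age_bp: float
    identification_certain: bool


def triage_occurrence(rec, min_year=1950, max_uncertainty_m=1000.0):
    """Return 'keep', 'drop', or 'georeference' for a modern record."""
    if rec.year < min_year:
        return "drop"
    if rec.coord_uncertainty_m is None:
        # The methods georeference these records rather than discard them.
        return "georeference"
    return "keep" if rec.coord_uncertainty_m <= max_uncertainty_m else "drop"


def keep_fossil(rec):
    """Apply the identification and age filters (inclusive bounds assumed)."""
    return (
        rec.identification_certain
        and 4500 <= rec.median_age_bp <= 7500
        and rec.max_age_bp < 11700
    )


print(triage_occurrence(Occurrence(1975, 800.0)))
print(triage_occurrence(Occurrence(1990, None)))
print(keep_fossil(FossilRecord(6000, 9000, True)))
```

    Returning a three-way label for modern records keeps the records that lack coordinate uncertainty visible, so they can be routed to georeferencing instead of being silently lost.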


A Standardized Reference Data Set for Vertebrate Taxon Name Resolution

10 scholarly articles cite this dataset.