The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken and made available on AWS.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the latest version of the dataset initially published to GBIF by the Invasive Species Specialist Group (ISSG) on behalf of the U.S. Geological Survey on October 12, 2020, at https://www.gbif.org/dataset/6b64ef7e-82f7-47a3-8ddb-ec6794ea07d6. Like that checklist, this version presents validated and verified national checklists of introduced (alien) and invasive alien species at the sub-country level. The other two related checklists for the United States, also newly published separately as V2.0, are for the States of Alaska and Hawaii.
Differences between two previous versions and ver.2.0, 2022 (this dataset): SIZE: the first version V1.0 - 5,006 accepted names (arthropods were not included); the previous version - 8,654 accepted names and two unranked hybrids; ver.2.0, 2022 (this dataset) - 8,525 accepted names and two unranked hybrids. OTHER DIFFERENCES: the previous version provided: a broader inclusion of arthropods; approximate dates of introduction (where available); 4,693 references; improved disambiguation of scientific names; biocontrol species information (where applicable); taxonomic synonyms, where available, in taxonRemarks field; unique occurrenceIDs; no habitat information; ver.2.0, 2022 (this dataset) adds pathway and habitat information, where available, more precise management of names and synonyms (and so is smaller than the previous version), and additional data on approximate dates of introduction.
OVERVIEW: Introduced (non-native) species that become established may eventually become invasive, so tracking introduced species provides a baseline for effective modeling of species trends and interactions, geospatially and temporally. The umbrella dataset, called the United States Register of Introduced and Invasive Species (US-RIIS), comprises three lists, one each for Alaska (AK, with 545 records), Hawaii (HI, with 5,628 records), and the conterminous (or lower 48) United States (L48, with 8,527 records, this dataset). Each list includes introduced (non-native), established (reproducing) taxa that: are, or may become, invasive (harmful) in the locality; are not known to be harmful there; and/or have been used for biological control in the locality.
To be included in the Global Register of Introduced and Invasive Species - United States (Contiguous), or GRIIS-L48 (with L48 meaning the Lower 48 Conterminous United States), a taxon must be non-native everywhere in the locality and established (reproducing) anywhere in the locality. Native pest species are not included.
Each record has information on taxonomy, a vernacular name, establishment means designation (introduced unintentionally, or assisted colonization), degree of establishment (established, invasive, or widespread invasive), hybrid status, pathway of introduction (where available), habitat (where available), whether a biocontrol species, dates of introduction (where available; currently 46% of the records for the conterminous United States), associated taxa (where applicable), native and introduced distributions (where available), and citations for the authoritative source(s) from which this information is drawn. The umbrella dataset US-RIIS builds on a previous dataset, A Comprehensive List of Non-Native Species Established in Three Major Regions of the U.S.: Version 3.0 (Simpson et al., 2020, https://doi.org/10.5066/p9e5k160).
There are 14,700 records in the master list (USRIISv2_MasterList) and 12,571 unique scientific names. The list is derived from more than 5,800 authoritative sources (USRIISv2_AuthorityReferences) and was reviewed by (or based on input from) more than 30 taxonomic experts and invasive species scientists.
Many thanks to these reviewers and contributors: Coauthors Pam Fuller (USGS Emeritus), Kevin Faccenda (University of Hawaii), Neal Evenhuis (Bishop Museum), Janis Matsunaga (Hawaii Department of Agriculture), and Matt Bowser (U.S. Fish and Wildlife Service); contributors Rachael Blake (data science), National Socio-Environmental Synthesis Center (SESYNC); M. Lourdes Chamorro (Curculionidae), USDA-ARS Entomology; Meghan C. Eyler (data reviewer), U.S. Fish and Wildlife Service; Danielle Froelich (Hawaiian botany), SWCA Environmental Consultants; Thomas Henry (Heteroptera), USDA-ARS Entomology; Sam James (Annelida), Maharishi University; Nancy Khan (Hawaiian botany), Smithsonian Institution; Alex Konstantinov (Chrysomelidae), USDA-ARS Entomology; Andrew P. Landsman (Arachnida), National Park Service, C&O Canal National Historical Park; Christopher Lepczyk (Vertebrata), Auburn University; Sandy Liebhold (Coleoptera), USDA-FS; Steven Lingafelter (Cerambycidae), USDA-APHIS; Walter Meshaka (Herpetology), State Museum of Pennsylvania; Gary L. Miller (Aphididae), USDA-ARS Entomology; Allen Norrbom (Tephritidae), USDA-ARS Entomology; Shyama Pagad (global invasive species), IUCN SSC Invasive Species Specialist Group; John Reynolds (Annelida), Oligochaetology Laboratory; Alexander Salazar (Lycosidae), Miami University, Ohio; Elizabeth A. Sellers (data manager), USGS; Derek Sikes (Alaskan invertebrates), University of Alaska; Bruce A. Snyder (Annelida), Georgia College and State University; Alma Solis (Pyralid moths), USDA-ARS at the Smithsonian Institution; Rebecca Turner (data manager), Scion Inc., New Zealand; Darrell Ubick (Arachnida), Cal Academy; Warren Wagner (Hawaiian botany), Smithsonian Institution; Mark Wetzel (Annelida), Illinois Natural History Survey; and James D. Young (Lepidoptera), USDA-APHIS-PPQ-PHP. Our apologies to the many contributing experts we may have inadvertently omitted.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Species occurrence records for native and non-native bees, wasps and other insects collected mainly by pan, Malaise, and vane trapping and by insect netting in Canada, Mexico, the non-contiguous United States, U.S. Territories (specifically the U.S. Virgin Islands), U.S. Minor Outlying Islands and other global locations. The bulk of the specimens come from the Eastern United States, often from Federal lands such as those managed by USFWS, NPS, DOD, and USFS. Some records also contain notes regarding plants or substrates from which insects were collected or that were present and/or in flower at the time the insects were collected. Unless otherwise noted, taxonomic determinations (identifications) were completed by Sam Droege (USGS Eastern Ecological Science Center, EESC, Native Bee Laboratory) and Clare Maffei (USFWS, Inventory and Monitoring Branch).
The EESC Native Bee Lab currently keeps only a small synoptic collection; rare and voucher specimens are deposited in the Smithsonian National Collection (NMNH) and widely distributed to other institutions for DNA, revisions, and augmentation of existing collections. Surplus specimens are also made available to students to learn their identifications. Corrections to any of our determinations are always welcomed. Common species that are not in demand for surplus are usually destroyed and the pins recycled. Recent revisions to Lasioglossum, Ceratina, and to a much lesser extent Triepeolus, Epeolus, and other small groups have rendered determinations made prior to those revisions out of date for species involved in name changes, and users should account for that during analyses. Current data (including information on specimen codes without identifications) are always available without charge directly from Sam Droege.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
This dataset provides a direct internet link to FSM's data hosted on the GBIF website / records.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
Dataset that provides a direct link to PNG's data hosted on the GBIF website/ records.
Contact emails: info@gbif.org / helpdesk@gbif.org
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The Global Biodiversity Information Facility (GBIF) indexes thousands of biodiversity datasets from Natural History Collections, citizen science initiatives (e.g., iNaturalist, eBird), and other sources. As part of the indexing process, GBIF associates at least two identifiers with each indexed record: a record id (aka gbifID) and a dataset id (aka dataset key). These ids are central to looking up records, referencing data, and packaging interpreted data products.
This publication contains an exhaustive list of GBIF IDs and ids associated by their data providers as derived from:
GBIF.org (01 March 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.pk3trq
The resource (size: ~260GB) provided by GBIF had content id hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 and was used to generate the resource included in this publication using
preston cat 'zip:hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97!/0015281-230224095556074.csv'\
| cut -f 1,2,3,37,38,39\
| gzip\
> gbifid.tsv.gz
with the content id of gbifid.tsv.gz (size: ~35GB) being hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 .
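The column selection performed by the pipeline above can be sketched in Python. This is only an illustration of the `cut -f 1,2,3,37,38,39` and `gzip` steps plus the `hash://sha256/...` naming scheme; the function names are hypothetical, and the bytes produced by Python's gzip module will generally differ from those produced by the `gzip` command, so the resulting content id of the compressed file is not expected to match the one quoted above.

```python
import gzip
import hashlib

# 1-based field numbers taken from the `cut -f` call in the pipeline.
FIELDS = (1, 2, 3, 37, 38, 39)


def select_fields(lines, fields=FIELDS, delimiter="\t"):
    """Yield each line reduced to the requested 1-based fields, like `cut -f`."""
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        # Fields that a short line does not have are skipped.
        yield delimiter.join(parts[i - 1] for i in fields if i <= len(parts))


def content_id(data: bytes) -> str:
    """Return a hash://sha256/... content id for a byte string."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def repackage_columns(lines) -> bytes:
    """Return gzip-compressed bytes holding only the selected columns."""
    body = "".join(row + "\n" for row in select_fields(lines))
    # mtime=0 keeps the output reproducible across runs.
    return gzip.compress(body.encode("utf-8"), mtime=0)
```

For a file as large as the one described here, the lines would be streamed from disk rather than held in memory; the in-memory form is used only to keep the sketch short.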
The first 10 lines of gbifid.tsv.gz, as extracted via
preston cat --remote https://zenodo.org/record/7789866/files,https://linker.bio hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8\
| gunzip\
| head
are:
gbifID datasetKey occurrenceID institutionCode collectionCode catalogNumber
2997162320 c71c8000-9fc7-422c-804a-ce6abe751771 3399442 CEPEC CEPEC CEPEC00109669
2997162309 c71c8000-9fc7-422c-804a-ce6abe751771 2733085 CEPEC CEPEC CEPEC00000818
2997162317 c71c8000-9fc7-422c-804a-ce6abe751771 2733086 CEPEC CEPEC CEPEC00000888
2997162313 c71c8000-9fc7-422c-804a-ce6abe751771 3399443 CEPEC CEPEC CEPEC00109744
2997162306 c71c8000-9fc7-422c-804a-ce6abe751771 2733087 CEPEC CEPEC CEPEC00000889
2997162316 c71c8000-9fc7-422c-804a-ce6abe751771 3399440 CEPEC CEPEC CEPEC00109605
2997162324 c71c8000-9fc7-422c-804a-ce6abe751771 2733088 CEPEC CEPEC CEPEC00000890
2997162308 c71c8000-9fc7-422c-804a-ce6abe751771 3399441 CEPEC CEPEC CEPEC00109615
2997162303 c71c8000-9fc7-422c-804a-ce6abe751771 2733089 CEPEC CEPEC CEPEC00000891
Note that at time of writing, the HTML resources associated with the GBIF record id (gbifID) 2997162320 and the dataset key c71c8000-9fc7-422c-804a-ce6abe751771 (both taken from the first data row of the example above) are available via:
https://gbif.org/occurrence/2997162320
and
https://gbif.org/dataset/c71c8000-9fc7-422c-804a-ce6abe751771
respectively.
This resource was initially created to help integrate with Bionomia (https://bionomia.net), by associating the people identifiers provided by Bionomia with their original records via their GBIF ids. Bionomia re-uses GBIF record ids as a way to define links between records and the people (e.g., curators, collectors, identifiers) that worked on them.
In other words, this resource provides a versioned translation table from the GBIF data universe (as defined by GBIF record ids, and dataset keys) to the data collections that exist (and evolve) independent of it.
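A minimal sketch of how such a translation table might be used, assuming the six-column, tab-separated layout shown in the example rows above. The function names are hypothetical, the two sample rows are copied from the listing, and the URL patterns are the ones given in the text.

```python
import csv
import io

# Header and first two data rows from the example above, tab-separated.
SAMPLE = (
    "gbifID\tdatasetKey\toccurrenceID\tinstitutionCode\tcollectionCode\tcatalogNumber\n"
    "2997162320\tc71c8000-9fc7-422c-804a-ce6abe751771\t3399442\tCEPEC\tCEPEC\tCEPEC00109669\n"
    "2997162309\tc71c8000-9fc7-422c-804a-ce6abe751771\t2733085\tCEPEC\tCEPEC\tCEPEC00000818\n"
)


def build_index(tsv_text):
    """Map (institutionCode, collectionCode, catalogNumber) to matching rows."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    index = {}
    for row in reader:
        key = (row["institutionCode"], row["collectionCode"], row["catalogNumber"])
        # A provider-side key is not guaranteed to be unique, so keep a list.
        index.setdefault(key, []).append(row)
    return index


def occurrence_url(gbif_id):
    """URL of the GBIF occurrence page for a gbifID."""
    return f"https://gbif.org/occurrence/{gbif_id}"


def dataset_url(dataset_key):
    """URL of the GBIF dataset page for a dataset key."""
    return f"https://gbif.org/dataset/{dataset_key}"


index = build_index(SAMPLE)
match = index[("CEPEC", "CEPEC", "CEPEC00109669")][0]
print(occurrence_url(match["gbifID"]))
print(dataset_url(match["datasetKey"]))
```

Keeping a list per key is a deliberate choice: the table is described as exhaustive, and nothing in the text promises that provider-side identifiers are unique across datasets.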
Note that the resource identified by hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 was not included in this publication because it was too big (260GB) to fit. You may be able to retrieve the resource from its original location at https://api.gbif.org/v1/occurrence/download/request/0015281-230224095556074.zip .
https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588
Dataset that provides a direct link to the Cook Islands' data hosted on the GBIF website / records.
GBIF, the Global Biodiversity Information Facility, is an international network and data infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. Coordinated through its Secretariat in Copenhagen, the GBIF network of participating countries and organizations, working through participant nodes, provides data-holding institutions around the world with common standards and open-source tools that enable them to share information about where and when species have been recorded. This knowledge derives from many sources, including everything from museum specimens collected in the 18th and 19th century to geotagged smartphone photos shared by amateur naturalists in recent days and weeks. The GBIF network draws all these sources together through the use of data standards, such as Darwin Core, which forms the basis for the bulk of GBIF.org's index of hundreds of millions of species occurrence records. Publishers provide open access to their datasets using machine-readable Creative Commons licence designations, allowing scientists, researchers and others to apply the data in hundreds of peer-reviewed publications and policy papers each year. Many of these analyses, which cover topics from the impacts of climate change and the spread of invasive and alien pests to priorities for conservation and protected areas, food security and human health, would not be possible without this. GBIF arose from a 1999 recommendation by the Biodiversity Informatics Subgroup of the Organization for Economic Cooperation and Development's Megascience Forum. This report concluded that "An international mechanism is needed to make biodiversity data and information accessible worldwide", arguing that this mechanism could produce many economic and social benefits and enable sustainable development by providing sound scientific evidence.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Publication date:
2022-12-06T07:37:19-06:00
A Repackaged Taxonomic Backbone of Global Biodiversity Information Facility (GBIF)
---
The Global Biodiversity Information Facility (GBIF) facilitates access to billions of biodiversity data records. These records include detailed accounts of life on Earth.
To help organize records of specific life forms, GBIF provides a taxonomic backbone [1,2]. This backbone contains a long list of names used to describe species, along with associated hierarchies and taxonomic publications. These lists are sourced from datasets around the world.
At time of writing (6 Dec 2022), GBIF publishes a simplified version of their taxonomic backbone at [https://hosted-datasets.gbif.org/datasets/backbone/](https://hosted-datasets.gbif.org/datasets/backbone/) [1].
This repository provides a script to pre-process https://hosted-datasets.gbif.org/datasets/backbone/current/simple.txt.gz in order to facilitate access and speed up the creation of search indexes.
Pre-process steps currently include:
1. reducing the number of columns
2. reverse sort by id
3. reverse sort by name
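The three steps above can be sketched as follows. This is not the contents of repackage-gbif-backbone.sh; it is a small in-memory illustration that assumes rows are already split into lists of strings, that the id sits in the first column and the name in a column chosen by the caller (both positions are assumptions, since the column layout is not described here), and that "reverse sort" means a descending plain string comparison.

```python
def repackage_backbone(rows, keep=20, id_col=0, name_col=1):
    """Return (by_id, by_name) views of backbone rows.

    rows     -- iterable of lists of strings, one list per input line
    keep     -- number of leading columns to retain (20 per the file listing)
    id_col   -- assumed position of the id column
    name_col -- assumed position of the name column (hypothetical default)
    """
    # Step 1: reduce the number of columns.
    trimmed = [row[:keep] for row in rows]
    # Step 2: reverse sort by id (string comparison, not numeric).
    by_id = sorted(trimmed, key=lambda r: r[id_col], reverse=True)
    # Step 3: two-column (name, id) table, reverse sorted by name.
    by_name = sorted(((r[name_col], r[id_col]) for r in trimmed), reverse=True)
    return by_id, by_name
```

A file kept in a known sort order can be searched or merged without first building an in-memory index, which is one plausible reason for the sorting steps.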
Contents
---
README:
this file
repackage-gbif-backbone.sh:
script used to repackage GBIF Simple Backbone.
repackage-gbif-backbone.log:
log of repackaging of GBIF Simple Backbone.
backbone-current-simple.txt.gz:
original GBIF backbone archive
gbif-backbone-by-name.tsv.gz:
two columns, gzipped, tab-separated text file with columns name, and id
reverse sorted by name
gbif-backbone-by-name.tsv.sha256:
sha256 hash of the uncompressed gbif-backbone-by-name.tsv.gz
gbif-backbone-by-id.tsv.gz:
20 columns, gzipped, tab-separated text file with first 20 columns of repackaged GBIF backbone file
reverse sorted by id
gbif-backbone-by-id.tsv.sha256:
sha256 hash of the uncompressed gbif-backbone-by-id.tsv.gz
References
---
[1] Simplified GBIF Backbone Taxonomy. Accessed at https://hosted-datasets.gbif.org/datasets/backbone/ on 2022-12-06.
[2] GBIF Secretariat (2021). GBIF Backbone Taxonomy. Checklist dataset https://doi.org/10.15468/39omei accessed via GBIF.org on 2021-08-18.
Hash URIs
---
This publication includes the following content uris:
hash://sha256/82d5f2153b4533322692d95eeb18b0f103e1b2297e38bd9ea935b07ba86cd7d5
hash://sha256/50c155f66efb2efba0b8b624f8541e81cbe16a701d420a5073791fb993f72919
hash://sha256/9cd7d4c91292d86c726210446cd6fe45602505a7c0ea3b7c4f4f481f85f193ad (uncompressed)
hash://sha256/f950dde25cce9ba9cce67caa1c68ce0c99cb31fe2dc9658fec85a987d9f31654
hash://sha256/f21c6b90f17c6083fcfb4853f3c581dcc2aadd291691fa128392a205321f420b (uncompressed)
hash://sha256/5e0a4d1d2d1cccbdcc6b2c9831fafe61c54eb055f2d13ec40d9ac161889b9f89
hash://sha256/f6e477133d0585706ee5522963b204200cb3cd198f011cbf62be0fa8519763b5 (uncompressed)
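A content uri marked "(uncompressed)" refers to the bytes obtained after gunzipping a file. One way to recompute such a uri is sketched below; the function names are hypothetical, and the demo payload is made up purely for illustration.

```python
import gzip
import hashlib
import io


def sha256_uri_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks and return a hash://sha256/... uri."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()


def uncompressed_content_uri(gz_stream):
    """Return the content uri of the gunzipped bytes of a gzip stream."""
    with gzip.GzipFile(fileobj=gz_stream) as plain:
        return sha256_uri_of_stream(plain)


# Demo with a made-up payload; a real check would open one of the .tsv.gz
# files in binary mode and compare the result with the listed uri.
demo = gzip.compress(b"name\tid\n")
print(uncompressed_content_uri(io.BytesIO(demo)))
```

Reading in chunks keeps memory use flat, which matters for archives of this size.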
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the digitized treatments in Plazi based on the original journal article Tarter, Donald C., Chaffee, Dwight L., Grubbs, Scott A., DeWalt, R. Edward (2015): New State Records Of Kentucky (Usa) Stoneflies (Plecoptera). Illiesia 11 (13): 167-174, DOI: 10.5281/zenodo.4752800
GBIF, the Global Biodiversity Information Facility, is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF integrates datasets from around the world and currently documents more than two billion species occurrences. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. These data, which change constantly at GBIF.org, are compiled in periodic snapshots and made available on cloud-computing platforms like Google BigQuery. The field names in the occurrences table are based on the Darwin Core standard. This dataset is also available for access in Google Cloud Storage.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This is a filtered dataset from GBIF including all Bombyliidae observations identified to the species level in the North American range. Data were downloaded using rgbif::occ_download and accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2024-11-10. The original unfiltered GBIF occurrences can be downloaded at https://doi.org/10.15468/dl.p2tn6p and https://api.gbif.org/v1/occurrence/download/request/0005494-241107131044228.zip. The data are filtered to have coordinates in North America, no geospatial issues, no duplicates across species and coordinates, no coordinate uncertainty greater than 1 kilometer, and no occurrences lying within 1 km of a college or university. This dataset is incomplete: it does not include all observations that occur in North American countries, because observations lacking a continent field of "north_america" in GBIF are excluded. The filtered data were uploaded to Zenodo with the following DOI: 10.5281/zenodo.14062514.
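The dataset itself was filtered in R; the sketch below only restates some of the listed criteria in Python to make them concrete. It assumes each record is a dict keyed by GBIF-style field names (species, decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters, continent, hasGeospatialIssues), keeps records whose uncertainty is missing (the description does not say how those were handled), and omits the college and university proximity filter, which needs an external list of campus locations.

```python
def filter_occurrences(records, max_uncertainty_m=1000.0):
    """Apply continent, geospatial-issue, uncertainty, and duplicate filters."""
    seen = set()
    kept = []
    for rec in records:
        # Records without a North America continent value are dropped.
        if (rec.get("continent") or "").lower() != "north_america":
            continue
        if rec.get("hasGeospatialIssues"):
            continue
        lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
        if not rec.get("species") or lat is None or lon is None:
            continue
        uncertainty = rec.get("coordinateUncertaintyInMeters")
        # Assumption: a missing uncertainty value does not exclude a record.
        if uncertainty is not None and float(uncertainty) > max_uncertainty_m:
            continue
        # One record per (species, latitude, longitude) combination.
        key = (rec["species"], lat, lon)
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```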
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
https://choosealicense.com/licenses/cc0-1.0/
boettiger-lab/gbif dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
A dataset containing 90 species occurrences available in GBIF matching the query: DatasetKey: Plasm bearing foraminifera counts of multinet M21/2_MSN639. The dataset includes 90 records from 1 constituent datasets: 90 records from Plasm bearing foraminifera counts of multinet M21/2_MSN639. Data from some individual datasets included in this download may be licensed under less restrictive terms.
A dataset containing 473788584 species occurrences available in GBIF matching the query: { "or" : [ "Country is Mexico", "Country is United States of America" ] } The dataset includes 473788584 records from 3642 constituent datasets; see https://api.gbif.org/v1/occurrence/download/0007874-190415153152247/datasets/export for details. Data from some individual datasets included in this download may be licensed under less restrictive terms.
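For readers who want to express a comparable query programmatically, the sketch below builds an "or" over two country filters in the JSON predicate form used by GBIF's occurrence download API. The ISO country codes MX and US and the helper name are assumptions added for illustration; the download described above may have been defined differently.

```python
import json


def country_predicate(country_codes):
    """Build an 'or' predicate matching any of the given ISO country codes."""
    return {
        "type": "or",
        "predicates": [
            {"type": "equals", "key": "COUNTRY", "value": code}
            for code in country_codes
        ],
    }


# Mexico or United States of America, as in the query text above.
predicate = country_predicate(["MX", "US"])
print(json.dumps(predicate, indent=2))
```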
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This database contains all the presence records of plants, beetles, chironomids, foraminifera and diatoms contained in the GBIF database in September 2024. This new version of the database has a new, refined spatial resolution of 5 min (each grid cell in the previous version is now divided into 9 sub-grid cells). The curation of the input data has also been largely improved. The coordinates of the presence records have been homogenised on a 0.083x0.083° grid, and corresponding bioclimatic values from the Worldclim2.0 database have been added. These data are formatted and ready to use by the crestr R package. More information about the data is available at https://www.manuelchevalier.com/crestr/articles/calibration-data.html.
To download the latest version of the database, please follow this link: https://figshare.com/articles/GBIF_for_CREST_database/6743207
Please cite all the appropriate datasets from the following list:
GBIF.org (23 August 2024) GBIF Occurrence Download Part 1. https://doi.org/10.15468/dl.7bvejk
GBIF.org (23 August 2024) GBIF Occurrence Download Part 2. https://doi.org/10.15468/dl.mpfc47
GBIF.org (23 August 2024) GBIF Occurrence Download Part 3. https://doi.org/10.15468/dl.nuq5tn
GBIF.org (23 August 2024) GBIF Occurrence Download Part 4. https://doi.org/10.15468/dl.q8zuhh
GBIF.org (24 August 2024) GBIF Occurrence Download Part 5. https://doi.org/10.15468/dl.qwcs68
GBIF.org (24 August 2024) GBIF Occurrence Download Part 6. https://doi.org/10.15468/dl.y9kpwc
GBIF.org (24 August 2024) GBIF Occurrence Download Part 7. https://doi.org/10.15468/dl.uk2xv6
GBIF.org (25 August 2024) GBIF Occurrence Download Part 8. https://doi.org/10.15468/dl.zgmnq9
GBIF.org (26 August 2024) GBIF Occurrence Download Part 9. https://doi.org/10.15468/dl.68hqxg
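Homogenising coordinates on a 5 arc-minute grid can be illustrated as below. This is a sketch under assumptions, not the procedure used to build the database: it takes 5 arc-minutes as exactly 1/12 of a degree (about 0.083°) and snaps each point to the centre of its cell, whereas the actual database may use a different cell origin or reference point.

```python
import math

# 5 arc-minutes = 1/12 degree, i.e. 12 cells per degree.
CELLS_PER_DEGREE = 12


def snap_to_grid(lon, lat, cells_per_degree=CELLS_PER_DEGREE):
    """Return the centre (lon, lat) of the grid cell containing the point."""

    def centre(value):
        # floor() keeps negative coordinates in the correct cell.
        index = math.floor(value * cells_per_degree)
        return (index + 0.5) / cells_per_degree

    return centre(lon), centre(lat)
```

Using floor rather than truncation matters for western and southern coordinates, where truncating toward zero would shift points into the neighbouring cell.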
The purpose of this dataset is to evaluate herptile species' biodiversity in California's southwest desert. Species data were downloaded from the Global Biodiversity Information Facility (GBIF). GBIF.org (21 September 2022) GBIF Occurrence Download https://doi.org/10.15468/dl.5jhd82
https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588
Dataset that provides a direct link to Nauru's data hosted on the GBIF website/records.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
A dataset containing 797313708 species occurrences available in GBIF matching the query: All data. The dataset includes 797313708 records from 17376 constituent datasets: Please see http://www.gbif.org/occurrence/download/0000507-170826194755519 for full list of all constituents.