The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken and made available on AWS.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Observations from minka-sdg.org. The MINKA Citizen Science Observatory is a community-based platform dedicated to biodiversity and environmental data collection, using geolocated images and observations uploaded by citizens through a mobile app and website. The dataset is produced by the BioPlatgesMet project, which is nested within MINKA and focuses on documenting and monitoring biodiversity in Barcelona's urban beach areas. This project highlights the dynamic dune ecosystems and engages the local community, naturalists, students, and enthusiasts in data collection. MINKA is coordinated by the ICM-CSIC, and the BioPlatgesMet project by AMB in Barcelona.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Specimens preserved at Sala de Colecciones Biológicas Universidad Católica del Norte (SCBUCN), Facultad de Ciencias del Mar, Coquimbo.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Global Biodiversity Information Facility (GBIF) indexes thousands of biodiversity datasets from natural history collections, citizen science initiatives (e.g., iNaturalist, eBird), and other sources. As part of the indexing process, GBIF associates at least two identifiers with each indexed record: a record id (aka gbifID) and a dataset id (aka dataset key). These ids are central to looking up records, referencing data, and packaging interpreted data products.
This publication contains an exhaustive list of GBIF IDs and ids associated by their data providers as derived from:
GBIF.org (01 March 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.pk3trq
The resource (size: ~260GB) provided by GBIF had content id hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 and was used to generate the resource included in this publication using
preston cat 'zip:hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97!/0015281-230224095556074.csv'\
| cut -f 1,2,3,37,38,39\
| gzip\
> gbifid.tsv.gz
with the content id of gbifid.tsv.gz (size: ~35GB) being hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 .
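As a rough illustration of what the `cut -f 1,2,3,37,38,39` step in the pipeline above does, the following Python sketch keeps the same 1-based column positions from tab-separated lines. The function name `select_columns` and the sample row are made up for illustration; this is not part of preston or of the original workflow.

```python
from typing import Iterable, Iterator, Sequence

# 1-based column positions, as in `cut -f 1,2,3,37,38,39`.
KEPT_COLUMNS = (1, 2, 3, 37, 38, 39)


def select_columns(lines: Iterable[str],
                   columns: Sequence[int] = KEPT_COLUMNS) -> Iterator[str]:
    """Yield each tab-separated line reduced to the given 1-based columns.

    Like `cut`, positions beyond the end of a line are silently skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kept = [fields[i - 1] for i in columns if i <= len(fields)]
        yield "\t".join(kept)


if __name__ == "__main__":
    # Hypothetical 40-column row: c1, c2, ..., c40.
    row = "\t".join(f"c{i}" for i in range(1, 41))
    print(next(select_columns([row])))
```

Working on an iterable of lines keeps memory use flat, which matters for an input of this size (~260GB).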
The first 10 lines of gbifid.tsv.gz, as extracted via
preston cat --remote https://zenodo.org/record/7789866/files,https://linker.bio hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8\
| gunzip\
| head
are:
gbifID      datasetKey                            occurrenceID  institutionCode  collectionCode  catalogNumber
2997162320  c71c8000-9fc7-422c-804a-ce6abe751771  3399442       CEPEC            CEPEC           CEPEC00109669
2997162309  c71c8000-9fc7-422c-804a-ce6abe751771  2733085       CEPEC            CEPEC           CEPEC00000818
2997162317  c71c8000-9fc7-422c-804a-ce6abe751771  2733086       CEPEC            CEPEC           CEPEC00000888
2997162313  c71c8000-9fc7-422c-804a-ce6abe751771  3399443       CEPEC            CEPEC           CEPEC00109744
2997162306  c71c8000-9fc7-422c-804a-ce6abe751771  2733087       CEPEC            CEPEC           CEPEC00000889
2997162316  c71c8000-9fc7-422c-804a-ce6abe751771  3399440       CEPEC            CEPEC           CEPEC00109605
2997162324  c71c8000-9fc7-422c-804a-ce6abe751771  2733088       CEPEC            CEPEC           CEPEC00000890
2997162308  c71c8000-9fc7-422c-804a-ce6abe751771  3399441       CEPEC            CEPEC           CEPEC00109615
2997162303  c71c8000-9fc7-422c-804a-ce6abe751771  2733089       CEPEC            CEPEC           CEPEC00000891
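Rows like these can be read into records keyed by the header names. The sketch below is a Python stand-in for the `gunzip | head` step, assuming the file is gzipped, tab-separated and UTF-8 encoded with a header row; `head_records` is an illustrative name, and the in-memory sample reuses two rows from the listing above.

```python
import csv
import gzip
import io
from itertools import islice


def head_records(gz_bytes: bytes, n: int = 9) -> list[dict[str, str]]:
    """Return the first n data rows of a gzipped, tab-separated file
    as dictionaries keyed by the header row."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt",
                   encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t",
                                quoting=csv.QUOTE_NONE)
        return list(islice(reader, n))


# Small in-memory sample built from the header and first two data rows.
SAMPLE = gzip.compress(
    "gbifID\tdatasetKey\toccurrenceID\tinstitutionCode\tcollectionCode\tcatalogNumber\n"
    "2997162320\tc71c8000-9fc7-422c-804a-ce6abe751771\t3399442\tCEPEC\tCEPEC\tCEPEC00109669\n"
    "2997162309\tc71c8000-9fc7-422c-804a-ce6abe751771\t2733085\tCEPEC\tCEPEC\tCEPEC00000818\n"
    .encode("utf-8")
)

if __name__ == "__main__":
    for record in head_records(SAMPLE):
        print(record["gbifID"], record["catalogNumber"])
```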
Note that, at the time of writing, the HTML pages for the occurrence with id 2997162320 and the dataset with key c71c8000-9fc7-422c-804a-ce6abe751771 (both taken from the first data row above) are available via:
https://gbif.org/occurrence/2997162320
and
https://gbif.org/dataset/c71c8000-9fc7-422c-804a-ce6abe751771
respectively.
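The two links above suggest a simple URL pattern. The following Python sketch builds such links from a gbifID and a dataset key; the pattern is inferred only from these two examples, and the function names are illustrative.

```python
# URL patterns inferred from the two example links above (an assumption,
# not a documented GBIF contract).
OCCURRENCE_BASE = "https://gbif.org/occurrence/"
DATASET_BASE = "https://gbif.org/dataset/"


def occurrence_url(gbif_id: str) -> str:
    """Build the occurrence page link for a GBIF record id."""
    return OCCURRENCE_BASE + gbif_id


def dataset_url(dataset_key: str) -> str:
    """Build the dataset page link for a GBIF dataset key."""
    return DATASET_BASE + dataset_key


if __name__ == "__main__":
    print(occurrence_url("2997162320"))
    print(dataset_url("c71c8000-9fc7-422c-804a-ce6abe751771"))
```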
This resource was initially created to support integration with Bionomia (https://bionomia.net), by associating the people identifiers provided by Bionomia with their original records via GBIF ids. Bionomia re-uses GBIF record ids as a way to define links between records and the people (e.g., curators, collectors, identifiers) who worked on them.
In other words, this resource provides a versioned translation table from the GBIF data universe (as defined by GBIF record ids, and dataset keys) to the data collections that exist (and evolve) independent of it.
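One way to picture such a translation table is as an index from provider-side identifiers to GBIF ids. The Python sketch below is only an illustration under assumptions: it uses the column names from the header of gbifid.tsv.gz, treats the triple (institutionCode, collectionCode, catalogNumber) as the provider-side key, and the function name `build_index` is invented here.

```python
from collections import defaultdict
from typing import Iterable, Mapping

ProviderKey = tuple[str, str, str]
GbifRef = tuple[str, str]


def build_index(rows: Iterable[Mapping[str, str]]) -> dict[ProviderKey, list[GbifRef]]:
    """Map (institutionCode, collectionCode, catalogNumber) to a list of
    (gbifID, datasetKey) pairs. A list is used because the provider-side
    triple is not guaranteed to be unique."""
    index: dict[ProviderKey, list[GbifRef]] = defaultdict(list)
    for row in rows:
        key = (row["institutionCode"], row["collectionCode"], row["catalogNumber"])
        index[key].append((row["gbifID"], row["datasetKey"]))
    return dict(index)


if __name__ == "__main__":
    rows = [{
        "gbifID": "2997162320",
        "datasetKey": "c71c8000-9fc7-422c-804a-ce6abe751771",
        "occurrenceID": "3399442",
        "institutionCode": "CEPEC",
        "collectionCode": "CEPEC",
        "catalogNumber": "CEPEC00109669",
    }]
    index = build_index(rows)
    print(index[("CEPEC", "CEPEC", "CEPEC00109669")])
```

An in-memory dictionary is only practical for a subset of the table; for the full ~35GB file a sorted file or an on-disk index would be a more realistic choice.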
Note that the resource identified by hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 was not included in this publication because it was too big (~260GB) to fit. You may be able to retrieve the resource from its original location at https://api.gbif.org/v1/occurrence/download/request/0015281-230224095556074.zip .
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Luther entomological research collection, one of the collections of the Hoslett Museum of Natural History at Luther College in Decorah, Iowa, is an important repository of Northeast Iowa insect biodiversity and includes many state record specimens (insect species not previously found in Iowa) not found in the Iowa State University insect collection. The LERC has a unique role specializing in the documentation of insect biodiversity of the driftless region in NE Iowa, SE Minnesota, and SW Wisconsin.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Publication date:
2022-12-06T07:37:19-06:00
A Repackaged Taxonomic Backbone of Global Biodiversity Information Facility (GBIF)
---
The Global Biodiversity Information Facility (GBIF) facilitates access to billions of biodiversity data records. These records include detailed accounts of life on earth.
To help find records of specific life forms, GBIF provides a taxonomic backbone [1,2]. This backbone contains a long list of names used to describe species, along with associated hierarchies and taxonomic publications. These lists are sourced from datasets around the world.
At time of writing (6 Dec 2022), GBIF publishes a simplified version of their taxonomic backbone at [https://hosted-datasets.gbif.org/datasets/backbone/](https://hosted-datasets.gbif.org/datasets/backbone/) [1].
This repository provides a script to pre-process https://hosted-datasets.gbif.org/datasets/backbone/current/simple.txt.gz in order to facilitate access and speed up the creation of search indexes.
Pre-process steps currently include:
1. reducing the number of columns
2. reverse sorting by id
3. reverse sorting by name
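The three steps above can be sketched in Python as follows. This is not the contents of repackage-gbif-backbone.sh; it assumes the id is the first field, compares ids and names as plain text, and takes the position of the name column as a parameter (`name_index`) because the column layout of simple.txt is not stated here. The sample rows are hypothetical.

```python
from typing import Sequence


def by_id(rows: Sequence[Sequence[str]], n_columns: int = 20) -> list[list[str]]:
    """Keep the first n_columns fields and sort in reverse order by id.

    Assumes the id is the first field and compares ids as text.
    """
    trimmed = [list(row[:n_columns]) for row in rows]
    return sorted(trimmed, key=lambda row: row[0], reverse=True)


def by_name(rows: Sequence[Sequence[str]], name_index: int) -> list[list[str]]:
    """Reduce each row to [name, id] and sort in reverse order by name."""
    pairs = [[row[name_index], row[0]] for row in rows]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-column rows: id, name.
    rows = [["1", "Animalia"], ["2", "Plantae"], ["3", "Fungi"]]
    print(by_id(rows))
    print(by_name(rows, name_index=1))
```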
Contents
---
README:
this file
repackage-gbif-backbone.sh:
script used to repackage GBIF Simple Backbone.
repackage-gbif-backbone.log:
log of repackaging of GBIF Simple Backbone.
backbone-current-simple.txt.gz:
original GBIF backbone archive
gbif-backbone-by-name.tsv.gz:
two columns, gzipped, tab-separated text file with columns name, and id
reverse sorted by name
gbif-backbone-by-name.tsv.sha256:
sha256 hash of the uncompressed gbif-backbone-by-name.tsv.gz
gbif-backbone-by-id.tsv.gz:
20 columns, gzipped, tab-separated text file with first 20 columns of repackaged GBIF backbone file
reverse sorted by id
gbif-backbone-by-id.tsv.sha256:
sha256 hash of the uncompressed gbif-backbone-by-id.tsv.gz
References
---
[1] Simplified GBIF Backbone Taxonomy. Accessed at https://hosted-datasets.gbif.org/datasets/backbone/ on 2022-12-06.
[2] GBIF Secretariat (2021). GBIF Backbone Taxonomy. Checklist dataset https://doi.org/10.15468/39omei accessed via GBIF.org on 2021-08-18.
Hash URIs
---
This publication includes the following content uris:
hash://sha256/82d5f2153b4533322692d95eeb18b0f103e1b2297e38bd9ea935b07ba86cd7d5
hash://sha256/50c155f66efb2efba0b8b624f8541e81cbe16a701d420a5073791fb993f72919
hash://sha256/9cd7d4c91292d86c726210446cd6fe45602505a7c0ea3b7c4f4f481f85f193ad (uncompressed)
hash://sha256/f950dde25cce9ba9cce67caa1c68ce0c99cb31fe2dc9658fec85a987d9f31654
hash://sha256/f21c6b90f17c6083fcfb4853f3c581dcc2aadd291691fa128392a205321f420b (uncompressed)
hash://sha256/5e0a4d1d2d1cccbdcc6b2c9831fafe61c54eb055f2d13ec40d9ac161889b9f89
hash://sha256/f6e477133d0585706ee5522963b204200cb3cd198f011cbf62be0fa8519763b5 (uncompressed)
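Content URIs of this form can be derived from the bytes of a file. The Python sketch below assumes that the part after `hash://sha256/` is the hex-encoded SHA-256 digest of the content, and computes it both for a gzip file as stored and for its uncompressed content; the function names are illustrative.

```python
import gzip
import hashlib


def content_id(data: bytes) -> str:
    """Return a hash URI for the given bytes, assuming the
    hash://sha256/<hex digest> form used in the list above."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def content_ids_for_gzip(gz_bytes: bytes) -> tuple[str, str]:
    """Return (id of the compressed bytes, id of the uncompressed content)."""
    return content_id(gz_bytes), content_id(gzip.decompress(gz_bytes))


if __name__ == "__main__":
    compressed = gzip.compress(b"name\tid\n")
    as_stored, uncompressed = content_ids_for_gzip(compressed)
    print(as_stored)
    print(uncompressed, "(uncompressed)")
```

Hashing the uncompressed content gives an identifier that does not change when the same data is recompressed with different gzip settings. For large files, feeding the hash in chunks rather than loading everything into memory would be the more practical variant.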
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Biodiversity of the Weddell Sea: occurrence records of the macrozoobenthic species (demersal fish included) sampled during the expedition ANT XIII/3 (EASIZ I) with RV "Polarstern"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was compiled after a careful data-collection and cleaning procedure over four years for Hanieh Saeedi's PhD project. Data were collected through field sampling, literature and museum collections. All records then went through quality control procedures, such as validating the taxonomy of the species by examining and re-identifying the specimens in museum collections, and using the taxonomic and geographic data quality control tools in the World Register of Marine Species (WoRMS) and the robis package (Provoost and Bosch 2017). This dataset can thus be used for further taxonomic and biogeographical studies of Solenidae.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the DiGIR provider for CeDAMar.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains information on the algae specimens registered so far in the herbarium of the Swedish Museum of Natural History.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The extensive African Rodentia specimen and tissue collections of the Royal Museum for Central Africa (RMCA), the Royal Belgian Institute of Natural Sciences (RBINS) and the University of Antwerp (UA) provide taxonomical, ecological, geographical and genetic information, as well as measurements and data on parasitic and viral infections. The scientific importance of these collections is that, although numerous African rats and mice have been described over the last 150 years, many species descriptions are based on very few specimens.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The bird collections of the Estación Biológica de Doñana (EBD), like the other vertebrate collections, originated mainly during the founding and later consolidation of the Institute in the 1960s and 1970s. They are, first of all, the result of the interest in comparative anatomy and biogeographic studies of the first two directors of the EBD, Prof. JA Valverde and Prof. J Castroviejo. They promoted and led several biodiversity projects that were carried out worldwide and helped meet the need for systematic reference material, which was hardly accessible in Spain at that time. The bird collections are of outstanding scientific interest not only because of the number of specimens housed (more than 30,000) but also because of the areas represented, their high taxonomic diversity (about 1500 species belonging to 130 families) and, finally, because they hold good series of unique species. The geographic areas covered by the collections are: Palaearctic (mainly Spain, Portugal, Morocco, Western Sahara), Aethiopic (Cameroon, Equatorial Guinea, Islands of the Gulf of Guinea, Angola, Gabon and Ethiopia) and Neotropics (Argentina, Paraguay, Bolivia, Ecuador, Venezuela, Panama, Nicaragua, Mexico). Of exceptional importance are the holdings of extremely rare species such as the Spanish Imperial eagle (Aquila adalberti), represented by a series that is unique in the world and gives a special character to the collections of the EBD. The dataset currently available on GBIF.ES is one part of the bird collection and includes the following families: PODICIPEDIDAE, TINAMIDAE, GAVIIDAE, RHEIDAE, STRUTHIONIDAE, CASUARIIDAE, DROMAIIDAE, DIOMEDEIDAE, PROCELLARIIDAE, HYDROBATIDAE, PHAETHONTIDAE, SULIDAE, PHALACROCORACIDAE, ANHINGIDAE, ARDEIDAE, SCOPIDAE, CICONIIDAE, THRESKIORNITHIDAE, PHOENICOPTERIDAE, ANHIMIDAE, ANATIDAE, CATHARTIDAE, ACCIPITRIDAE.
Note: this dataset was previously orphaned. It has been rescued by ① extracting it from the GBIF.org index (see GBIF Download in External Data) and ② republishing it on this IPT data hosting centre as version 1.0.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The database contains information about the zoological collection of the Institute of Systematics and Evolution of Animals, Polish Academy of Sciences, in Kraków.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Specimen-records (of physical specimens) of fishes, mostly from southern Africa and surrounding oceans, but also from elsewhere in the world.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hyperbenthic decanted crustacea collected with a Rothlisberg-Piercy sledge, hauls of 5-10 min bottom time. All samples after 2012 are preserved on board in ethanol. Taxa are identified in as much detail as possible. Specimens are kept at Bergen Museum, Bergen, Norway.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Oklahoma Collection of Genomic Resources currently holds about 35,000 aliquots of tissue from 344 genera and over 600 species of mammals, birds, amphibians and reptiles, with particular strength in mammals from Argentina and Oklahoma, and amphibians and reptiles from the Great Plains.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of marine biological survey data collated from literature.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Crowd Source MUHW Specimens
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains species names, the number of specimens and the wet weight for each taxon (0.1 mg). Samples were originally preserved in formalin and later transferred to ethanol. After identification, samples are stored at Bergen Museum/University of Bergen.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Observations from iNaturalist.org, an online social network of people sharing biodiversity information to help each other learn about nature.
Observations included in this archive met the following requirements:
* Published under one of the following licenses or waivers: 1) https://creativecommons.org/publicdomain/zero/1.0/, 2) https://creativecommons.org/licenses/by/4.0/, 3) https://creativecommons.org/licenses/by-nc/4.0/
* Achieved one of the following iNaturalist quality grades: Research
* Created on or before 2025-09-16 15:00:20 -0700
You can view observations meeting these requirements at https://www.inaturalist.org/observations?created_d2=2025-09-16+15%3A00%3A20+-0700&d1=1600-01-01&license=CC0%2CCC-BY%2CCC-BY-NC&quality_grade=research
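The link above encodes the listed requirements as query parameters. As a hedged illustration, the Python sketch below rebuilds that query string with the standard library; the parameter names (`created_d2`, `d1`, `license`, `quality_grade`) are copied from the URL itself, nothing beyond what the URL shows is assumed about the iNaturalist site, and the function name is invented here.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.inaturalist.org/observations"


def observations_url(created_d2: str, licenses: list[str],
                     quality_grade: str = "research",
                     d1: str = "1600-01-01") -> str:
    """Build an observations link using the query parameters seen in the
    URL above. urlencode escapes spaces as '+', ':' as %3A and ',' as %2C."""
    params = {
        "created_d2": created_d2,
        "d1": d1,
        "license": ",".join(licenses),
        "quality_grade": quality_grade,
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(observations_url("2025-09-16 15:00:20 -0700",
                           ["CC0", "CC-BY", "CC-BY-NC"]))
```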