Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It is the taxonomic backbone that allows GBIF to integrate name-based information from different resources, whether these are occurrence datasets, species pages, names from nomenclators or external sources such as EOL, GenBank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and provides a means to crosswalk names from one source to another.
It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.
International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.
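The BIN consensus rule described above can be illustrated with a minimal sketch. This is not GBIF's or iBOL's implementation: the function name, the rank list and the choice to compute the share over the records that carry a name at the given rank are all assumptions made for illustration.

```python
from collections import Counter

# Major Linnaean ranks from most to least specific (assumed order).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_name(records, threshold=0.8):
    """Return (rank, name) for the most specific rank at which one name
    reaches the threshold share, or None if no rank qualifies.

    `records` is a list of dicts mapping rank -> name, one dict per
    identification applied to the BIN (hypothetical input shape).
    """
    for rank in RANKS:
        names = [rec[rank] for rec in records if rec.get(rank)]
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        # Assumption: the share is computed over records named at this rank.
        if count / len(names) >= threshold:
            return rank, name
    return None


example = (
    [{"species": "Apis mellifera", "genus": "Apis"}] * 3
    + [{"species": "Apis cerana", "genus": "Apis"}] * 2
)
print(consensus_name(example))
```

In the example, no species name reaches 80% (the best has 3 of 5), so the selection moves up to the genus, where all five records agree.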
The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.
The following 105 sources have been used to assemble the GBIF backbone with number of names given in brackets:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing 90 species occurrences available in GBIF matching the query: DatasetKey: Plasm bearing foraminifera counts of multinet M21/2_MSN648. The dataset includes 90 records from 1 constituent datasets: 90 records from Plasm bearing foraminifera counts of multinet M21/2_MSN648. Data from some individual datasets included in this download may be licensed under less restrictive terms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains all the presence records of plants, beetles, chironomids, foraminifera and diatoms contained in the GBIF database in September 2024. This new version of the database has a refined spatial resolution of 5 arc-minutes (each grid cell of the previous version is now divided into 9 sub-cells). The curation of the input data has also been largely improved. The coordinates of the presence records have been homogenised on a 0.083x0.083° grid, and corresponding bioclimatic values from the Worldclim2.0 database have been added. These data are formatted and ready to use by the crestr R package. More information about the data is available at https://www.manuelchevalier.com/crestr/articles/calibration-data.html. To download the latest version of the database, please follow this link: https://figshare.com/articles/GBIF_for_CREST_database/6743207
Please cite all the appropriate datasets from the following list:
GBIF.org (23 August 2024) GBIF Occurrence Download Part 1. https://doi.org/10.15468/dl.7bvejk
GBIF.org (23 August 2024) GBIF Occurrence Download Part 2. https://doi.org/10.15468/dl.mpfc47
GBIF.org (23 August 2024) GBIF Occurrence Download Part 3. https://doi.org/10.15468/dl.nuq5tn
GBIF.org (23 August 2024) GBIF Occurrence Download Part 4. https://doi.org/10.15468/dl.q8zuhh
GBIF.org (24 August 2024) GBIF Occurrence Download Part 5. https://doi.org/10.15468/dl.qwcs68
GBIF.org (24 August 2024) GBIF Occurrence Download Part 6. https://doi.org/10.15468/dl.y9kpwc
GBIF.org (24 August 2024) GBIF Occurrence Download Part 7. https://doi.org/10.15468/dl.uk2xv6
GBIF.org (25 August 2024) GBIF Occurrence Download Part 8. https://doi.org/10.15468/dl.zgmnq9
GBIF.org (26 August 2024) GBIF Occurrence Download Part 9. https://doi.org/10.15468/dl.68hqxg
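As a rough illustration of homogenising coordinates on a 5 arc-minute grid, the sketch below snaps a coordinate pair to the centre of its grid cell. It is not the code used to build the database: the function name, the grid origin at (-180, -90) and the use of cell centres are assumptions.

```python
import math

CELL = 1.0 / 12.0  # 5 arc-minutes in degrees (about 0.083)


def snap_to_grid(lon, lat, cell=CELL):
    """Return the centre (lon, lat) of the grid cell containing the point.

    Assumes a regular grid whose cell edges start at lon=-180, lat=-90.
    """
    col = math.floor((lon + 180.0) / cell)
    row = math.floor((lat + 90.0) / cell)
    return (-180.0 + (col + 0.5) * cell, -90.0 + (row + 0.5) * cell)


# Two nearby records fall in the same cell and share one coordinate pair.
print(snap_to_grid(0.01, 0.01))
print(snap_to_grid(0.08, 0.08))
```

Under these assumptions, a cell of 0.25° per side contains 3 x 3 = 9 cells of 5 arc-minutes, which matches the nine sub-cells mentioned above.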
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Global Biodiversity Information Facility (GBIF) indexes thousands of biodiversity datasets from Natural History Collections, citizen science initiatives (e.g., iNaturalist, eBird), and other sources. As part of the indexing process, GBIF associates at least two identifiers with indexed records: a record id (aka gbifID) and a dataset id (aka dataset key). These ids are central to looking up and referencing data, and to packaging interpreted data products.
This publication contains an exhaustive list of GBIF IDs and ids associated by their data providers as derived from:
GBIF.org (01 March 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.pk3trq
The resource (size: ~260GB) provided by GBIF had content id hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 and was used to generate the resource included in this publication using
preston cat 'zip:hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97!/0015281-230224095556074.csv' \
| cut -f 1,2,3,37,38,39 \
| gzip \
> gbifid.tsv.gz
with the content id of gbifid.tsv.gz (size: ~35GB) being hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 .
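To check a downloaded copy against the content ids quoted above, a content id can be recomputed locally. The sketch below assumes that the id is simply the SHA-256 digest of the file bytes, as the hash://sha256/ prefix suggests; the function name is hypothetical and this is not Preston's own code.

```python
import hashlib


def content_id(path, chunk_size=1 << 20):
    """Return a hash://sha256/... identifier for the bytes of a file.

    The file is read in chunks so that very large files (tens of GB)
    do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()
```

For example, calling content_id("gbifid.tsv.gz") on an intact copy would be expected to return the id listed above for that file.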
The first 10 lines of gbifid.tsv.gz as extracted via
preston cat --remote https://zenodo.org/record/7789866/files,https://linker.bio hash://sha256/a339e32e10edaad585f61f2ded06cbb23e0618c65a6360db18d7d729054940a8 \
| gunzip \
| head
are:
gbifID	datasetKey	occurrenceID	institutionCode	collectionCode	catalogNumber
2997162320	c71c8000-9fc7-422c-804a-ce6abe751771	3399442	CEPEC	CEPEC	CEPEC00109669
2997162309	c71c8000-9fc7-422c-804a-ce6abe751771	2733085	CEPEC	CEPEC	CEPEC00000818
2997162317	c71c8000-9fc7-422c-804a-ce6abe751771	2733086	CEPEC	CEPEC	CEPEC00000888
2997162313	c71c8000-9fc7-422c-804a-ce6abe751771	3399443	CEPEC	CEPEC	CEPEC00109744
2997162306	c71c8000-9fc7-422c-804a-ce6abe751771	2733087	CEPEC	CEPEC	CEPEC00000889
2997162316	c71c8000-9fc7-422c-804a-ce6abe751771	3399440	CEPEC	CEPEC	CEPEC00109605
2997162324	c71c8000-9fc7-422c-804a-ce6abe751771	2733088	CEPEC	CEPEC	CEPEC00000890
2997162308	c71c8000-9fc7-422c-804a-ce6abe751771	3399441	CEPEC	CEPEC	CEPEC00109615
2997162303	c71c8000-9fc7-422c-804a-ce6abe751771	2733089	CEPEC	CEPEC	CEPEC00000891
Note that at the time of writing, the HTML resources associated with the GBIF record id 2997162320 and the dataset key c71c8000-9fc7-422c-804a-ce6abe751771 (both extracted from the first data row above) are available via:
https://gbif.org/occurrence/2997162320
and
https://gbif.org/dataset/c71c8000-9fc7-422c-804a-ce6abe751771
respectively.
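The mapping from table rows to these two URLs can be sketched as follows. The URL patterns are the ones shown above; the function name is hypothetical, and the sketch assumes tab-separated input with the header row shown in the sample.

```python
import csv
import io


def resource_urls(tsv_text):
    """Yield (occurrence_url, dataset_url) for each data row of the table."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        yield (
            "https://gbif.org/occurrence/" + row["gbifID"],
            "https://gbif.org/dataset/" + row["datasetKey"],
        )


sample = (
    "gbifID\tdatasetKey\toccurrenceID\tinstitutionCode\tcollectionCode\tcatalogNumber\n"
    "2997162320\tc71c8000-9fc7-422c-804a-ce6abe751771\t3399442\tCEPEC\tCEPEC\tCEPEC00109669\n"
)
for occurrence_url, dataset_url in resource_urls(sample):
    print(occurrence_url, dataset_url)
```

For the full file, the same function could be fed the decompressed text in batches; whether the URLs still resolve depends on GBIF keeping the record ids stable.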
This resource was initially created to help integrate with Bionomia (https://bionomia.net), specifically to associate people identifiers provided by Bionomia with their original records via their GBIF ids. Bionomia re-uses GBIF record ids as a way to define links between records and the people (e.g., curators, collectors, identifiers) who worked on them.
In other words, this resource provides a versioned translation table from the GBIF data universe (as defined by GBIF record ids, and dataset keys) to the data collections that exist (and evolve) independent of it.
Note that the resource identified by hash://sha256/c8bac8acb28c8524c53589b3a40e322dbbbdadf5689fef2e20266fbf6ddf6b97 was not included in this publication because it was too big (260GB) to fit. You may be able to retrieve the resource from its original location at https://api.gbif.org/v1/occurrence/download/request/0015281-230224095556074.zip .
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A dataset containing 18 species occurrences available in GBIF matching the query: TaxonKey: Puya obconica L.B.Sm.. The dataset includes 18 records from 7 constituent datasets: 1 records from SysTax - Botanical Gardens. 9 records from Tropicos Specimen Data. 1 records from NMNH Extant Specimen Records. 1 records from The AAU Herbarium Database. 3 records from University of Vienna, Institute for Botany - Herbarium WU. 2 records from Field Museum of Natural History (Botany) Seed Plant Collection. 1 records from Harvard University Herbaria. Data from some individual datasets included in this download may be licensed under less restrictive terms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a filtered dataset from GBIF including all Apidae observations identified to the species level in the North American range. Data was downloaded using rgbif::occ_download and accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2024-11-10. The original unfiltered GBIF occurrences can be downloaded at https://doi.org/10.15468/dl.5k8kue, and https://api.gbif.org/v1/occurrence/download/request/0005473-241107131044228.zip. The data is filtered to have coordinates in North America, no geospatial issues, no duplicates across species and coordinates, no coordinate uncertainty greater than 1 kilometer, and no occurrences lying within 1km of a college or university. This dataset is incomplete as it does not include ALL observations that occur in North American countries: observations lacking a continent field of "north_america" in GBIF are not included. The data was uploaded to Zenodo after filtering with the following DOI: 10.5281/zenodo.14062444.
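The filtering steps listed above could look roughly like the sketch below. The original workflow was run in R with rgbif and is not reproduced here; the function names, the list of campus coordinates, the use of a haversine distance and the decision to keep records without a stated coordinate uncertainty are all assumptions. Field names follow common GBIF download columns.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def filter_occurrences(records, campuses, max_uncertainty_m=1000.0,
                       campus_buffer_km=1.0):
    """Apply the filters described in the text to a list of record dicts.

    `campuses` is a hypothetical list of (lat, lon) pairs for colleges
    and universities.
    """
    kept, seen = [], set()
    for rec in records:
        if str(rec.get("continent", "")).lower() != "north_america":
            continue  # continent field missing or different
        if rec.get("hasGeospatialIssues"):
            continue
        lat, lon = rec.get("decimalLatitude"), rec.get("decimalLongitude")
        if lat is None or lon is None:
            continue
        uncertainty = rec.get("coordinateUncertaintyInMeters")
        # Assumption: records with no stated uncertainty are kept.
        if uncertainty is not None and uncertainty > max_uncertainty_m:
            continue
        if any(haversine_km(lat, lon, c_lat, c_lon) <= campus_buffer_km
               for c_lat, c_lon in campuses):
            continue
        key = (rec.get("species"), lat, lon)
        if key in seen:
            continue  # duplicate species and coordinates
        seen.add(key)
        kept.append(rec)
    return kept
```

The order of the checks does not change the result, because each test depends only on the record itself apart from the duplicate check, which is applied last to records that passed all other filters.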
A dataset containing 11981835 species occurrences available in GBIF matching the query: { "and" : [ "HasCoordinate is false", "TaxonKey is Insecta" ] } The dataset includes 11981835 records from 5404 constituent datasets; see https://api.gbif.org/v1/occurrence/download/0044746-200221144449610/datasets/export for details. Data from some individual datasets included in this download may be licensed under less restrictive terms.
A dataset containing 3826495 species occurrences available in GBIF matching the query: { "Country" : [ "is Colombia" ] } The dataset includes 3826495 records from 1068 constituent datasets; see https://api.gbif.org/v1/occurrence/download/0018098-160910150852091/datasets/export for details. Data from some individual datasets included in this download may be licensed under less restrictive terms.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A dataset containing 987117827 species occurrences available in GBIF matching the query: All data. The dataset includes 987117827 records from 19784 constituent datasets: Please see https://www.gbif.org/occurrence/download/0032137-180508205500799 for full list of all constituents.
A dataset containing 20655509 species occurrences available in GBIF matching the query: { "Country" : [ "is Colombia" ] } The dataset includes 20655509 records from 4013 constituent datasets; see https://api.gbif.org/v1/occurrence/download/0259663-220831081235567/datasets/export for details. Data from some individual datasets included in this download may be licensed under less restrictive terms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains species names, the number of specimens and the wet weight for each taxon (0.1 mg). Samples were originally preserved in formalin and later transferred to ethanol. After identification, samples are stored at Bergen Museum/University of Bergen.
A dataset containing 6572299 species occurrences available in GBIF matching the query: { "and" : [ "Country is Ecuador", "TaxonKey is Animalia" ] } The dataset includes 6572299 records from 1504 constituent datasets; see https://api.gbif.org/v1/occurrence/download/0089934-230530130749713/datasets/export for details. Data from some individual datasets included in this download may be licensed under less restrictive terms.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A dataset containing 875010368 species occurrences available in GBIF matching the query: All data. The dataset includes 875010368 records from 18477 constituent datasets: Please see http://www.gbif.org/occurrence/download/0008114-171124123535762 for full list of all constituents.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a filtered dataset from GBIF including all Syrphidae observations identified to the species level in the North American range. Data was downloaded using rgbif::occ_download and accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2024-11-10. The original unfiltered GBIF occurrences can be downloaded at https://doi.org/10.15468/dl.p97dt2, and https://api.gbif.org/v1/occurrence/download/request/0005496-241107131044228.zip. The data is filtered to have coordinates in North America, no geospatial issues, no duplicates across species and coordinates, no coordinate uncertainty greater than 1 kilometer, and no occurrences lying within 1km of a college or university. This dataset is incomplete as it does not include ALL observations that occur in North American countries: observations lacking a continent field of "north_america" in GBIF are not included. The data was uploaded to Zenodo after filtering with the following DOI: 10.5281/zenodo.14062632.
A dataset containing 3835796 species occurrences available in GBIF matching the query: { "and" : [ "Country is China", "Year 1950-2021" ] } The dataset includes 3835796 records from 2309 constituent datasets; see https://api.gbif.org/v1/occurrence/download/0271528-200613084148143/datasets/export for details. Data from some individual datasets included in this download may be licensed under less restrictive terms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes all of the data downloaded from GBIF (DOIs provided in README.md as well as below, downloaded Feb 2021) as well as data downloaded from SCAN. This dataset has 2,808,432 records and can be used as a reference to the verbatim data before it underwent the cleaning process. The only modifications made to this dataset after direct download from the data portals are the following:
1) For GBIF records, I renamed the countryCode column to "country" so that the column title is consistent across both GBIF and SCAN.
2) A source column was added where I specify if the record came from GBIF or SCAN.
3) Duplicate records across SCAN and GBIF were removed by identifying identical instances of "catalogNumber" and "institutionCode".
4) Only the Darwin Core (DwC) columns that were shared across downloaded datasets were retained. GBIF contained ~249 DwC variables, and SCAN data contained fewer, so this combined dataset only includes the ~80 columns shared between the two datasets.
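The four modifications above can be sketched on small in-memory tables. This is an illustration only, not the author's script: the function name is hypothetical, rows are plain dicts, the first copy of a duplicate is the one kept, and rows lacking a catalogNumber or institutionCode are never treated as duplicates (both choices are assumptions).

```python
def combine(gbif_rows, scan_rows):
    """Combine GBIF and SCAN rows following the four steps in the text."""
    # 1) Rename countryCode to country in the GBIF rows.
    gbif = [
        {("country" if key == "countryCode" else key): value
         for key, value in row.items()}
        for row in gbif_rows
    ]
    # 4) Work out which columns both sources share.
    gbif_cols = set().union(*(row.keys() for row in gbif)) if gbif else set()
    scan_cols = set().union(*(row.keys() for row in scan_rows)) if scan_rows else set()
    shared = sorted(gbif_cols & scan_cols)

    combined, seen = [], set()
    for source, rows in (("GBIF", gbif), ("SCAN", scan_rows)):
        for row in rows:
            # 3) Drop duplicates on catalogNumber + institutionCode.
            key = (row.get("catalogNumber"), row.get("institutionCode"))
            if all(key):
                if key in seen:
                    continue
                seen.add(key)
            # 2) Add the source column, keeping shared columns only.
            record = {col: row.get(col) for col in shared}
            record["source"] = source
            combined.append(record)
    return combined
```

Because GBIF rows are processed first, a record present in both portals is kept with source "GBIF"; the original description does not say which copy was retained.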
For GBIF, we downloaded the data in three separate chunks, therefore there are three DOIs. See below:
GBIF.org (3 February 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.6cxfsw
GBIF.org (3 February 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.b9rfa7
GBIF.org (3 February 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.w2nndm
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Database on the recordings of Lepidoptera and Trichoptera by the Lepidopterological Society of Denmark (https://www.lepidoptera.dk). Occurrences are based on observations and collecting by any means, trapping, photos etc. Society specialist groups check data for errors and unusual records, mostly by contacting the observer/recorder. To assure quality of data, only members of the society are given access to enter records in the database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises occurrences of the selected phyla and classes / subclasses for the Walvis Ridge Project AOI. The data were extracted from the GBIF database on July 19th and 20th, 2022. The download covered the following groups of species: Procellariiformes, Testudines, Mollusca, Polychaeta, Crustacea, Echinodermata, Elasmobranchii, Mammalia and Actinopterygii. The GBIF database is available for download at: https://www.gbif.org
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test of DOI linking
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Context
Invasive alien species have been pointed out as an important driver of biodiversity loss. Many policy responses are being developed to address this threat. Protected areas often represent and preserve hotspots of biological diversity and ensure the maintenance of ecosystem services crucial to human livelihoods. The impact of biological invasions can be particularly severe in protected areas and their occurrence and impact in such areas is an important element of the risk they pose. To address this, there is a need for data on the occurrence and extent of alien species invasions in protected areas.
Description
This dataset contains species occurrence and occupancy in protected areas of the Natura2000 network in Belgium (Special Conservation Areas sensu Habitat Directive and Special Protection Areas sensu Bird Directive). The dataset was generated using the Belgian occurrence cube at species level and the Belgian occurrence cube for non-native taxa (both containing GBIF data aggregated using Oldoni et al. 2020), the 1x1km EEA reference grid and the Natura2000 protected areas shapefiles from the European Environment Agency.
Data are grouped by protected area (SITECODE), year (year) and (infra)species (taxonKey, speciesKey). For each group, the dataset provides the number of occurrences found in GBIF (n), the area of occupancy (aoo: number of 1 km² squares), the coverage (coverage: % of 1 km² squares), the minimum coordinateUncertaintyInMeters (min_coord_uncertainty), and the alien status (is_alien) based on the Global Register of Introduced and Invasive Species - Belgium. For infraspecific taxa in the latter, the alien status of the species is looked up and included.
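The relation between occurrences, area of occupancy and coverage can be sketched for a single group (one protected area, year and taxon). The function name and the cell codes are hypothetical, and the sketch assumes that coverage is the share of the site's 1 km² squares that are occupied; the dataset's own workflow may define the denominator differently.

```python
def occupancy(occurrence_cells, site_cells):
    """Return (n, aoo, coverage) for one protected area, year and taxon.

    occurrence_cells: grid cell code of each occurrence (one per record).
    site_cells: codes of all 1 km² grid cells belonging to the site.
    """
    site = set(site_cells)
    inside = [cell for cell in occurrence_cells if cell in site]
    n = len(inside)                 # number of occurrences
    aoo = len(set(inside))          # number of occupied 1 km² squares
    coverage = 100.0 * aoo / len(site) if site else 0.0
    return n, aoo, coverage


# Three occurrences in two of the four squares of a hypothetical site;
# the occurrence in square "z" lies outside the site and is ignored.
print(occupancy(["a", "a", "b", "z"], ["a", "b", "c", "d"]))
```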
The dataset is built on open science principles and intended to be completely reproducible:
Files
[...] (n), area of occupancy (aoo) and coverage of taxa (taxonKey) in Natura2000 areas of Belgium (SITECODE). Other columns included: speciesKey (for species, speciesKey = taxonKey), SITETYPE containing the site type of the Natura2000 area (one of A, B or C), min_coord_uncertainty with the lowest coordinate uncertainty in meters, is_alien containing the alien status (TRUE or FALSE) and remarks containing, if present, the infraspecific alien taxa whose occurrences contribute to the calculated aoo (only for species).
[...] protected_areas_species_occurrence.csv as retrieved from GBIF Backbone Taxonomy. Columns: taxonKey, speciesKey, scientificName, kingdom, phylum, order, class, genus, family, species, rank and includes. The latter contains the infraspecific taxa and synonyms whose occurrences contribute to the number of occurrences at species level.
[...] protected_areas_species_occurrence.csv. Columns: SITECODE as in protected_areas_species_occurrence.csv (BE*******), SITENAME containing the name of the protected area, SITETYPE as in protected_areas_species_occurrence.csv, flanders, wallonia and brussels containing whether the area is situated respectively in Flanders, Wallonia or Brussels-Capital Region (TRUE or FALSE). Field codes are in line with EEA element definitions for Natura 2000 sites.
Potential use of the dataset
Currently, there is no comprehensive reporting system for invasive alien species in Natura 2000 sites. This dataset provides a baseline as to which species occur in which protected area. We envisage that this dataset can be a useful starting point for various types of analyses on alien species in protected areas in Belgium, and that it can also be used as a complement to other data on alien species in protected areas to study more general patterns. Some examples of research questions:
This work has been funded under the Belgian Science Policies Brain program (BelSPO BR/165/A1/TrIAS) and the European Union's LIFE program (LIFE19 NAT/BE/000953 - LIFE RIPARIAS).