100+ datasets found
  1. Data from: Gap Analysis of Agrobiodiversity data in GBIF and the NAL...

    • agdatacommons.nal.usda.gov
    zip
    Updated Dec 14, 2023
    Cite
    Akshat Pant (2023). Gap Analysis of Agrobiodiversity data in GBIF and the NAL Thesaurus [Dataset]. http://doi.org/10.15482/USDA.ADC/1466041
    Available download formats: zip
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    Ag Data Commons
    Authors
    Akshat Pant
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    This dataset contains all documents, the text and the pdf files, as well as the code that was used to carry out the term analysis of agriculturally relevant organisms in GBIF. The Global Biodiversity Information Facility (GBIF) is an international network and research infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. The National Agricultural Library Thesaurus (NALT) has online vocabulary tools of agricultural terms. My task was to use the agricultural terms from the NALT and analyze the agriculturally relevant organisms in GBIF. Some of the goals were:

    Get descriptive statistics about Agrobiodiversity Data (AgData) in GBIF. Create visualizations to view occurrence trends of the GBIF corpus and AgData in GBIF to determine gaps or biases. Provide examples of and code for how agricultural researchers can work with GBIF data.

    Details about the process and the methodologies used to carry out this analysis

    I started by trying to extract names from the Agricultural Thesaurus, but ran into problems extracting them from the Thesaurus's RDF format. An employee at the Library later provided me with the names in the Thesaurus as a text file. I then extracted the scientific names from that text file and ran them through the GBIF API. Because there were so many names, the API would throw a connection error: it can handle only a limited number of requests in a given interval of time. To handle this, I used exception handling in Python. Every time the API threw an error, the script waited for 5 seconds and then resumed sending requests. Although this took a lot of time, it allowed me to get data such as year of occurrence and coordinate values for the agriculturally relevant data from the API.
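    The wait-and-retry approach described above can be sketched roughly as follows. This is not the author's code: the original notebooks use requests, whereas this sketch uses only the standard library, and the names `fetch_json`, `call_with_retry`, `occurrence_url`, and the `max_attempts` cap are assumptions introduced for illustration. The 5-second wait comes from the description; the URL is the public GBIF occurrence search endpoint.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_json(url, timeout=30):
    """Fetch a URL and decode its JSON body; network failures raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def call_with_retry(func, *args, wait_seconds=5, max_attempts=10, sleep=time.sleep):
    """Call func(*args); on a connection-type error, wait and then try again.

    max_attempts is an assumption added here so the loop cannot run forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except OSError:  # URLError, ConnectionError and timeouts all derive from OSError
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)


def occurrence_url(scientific_name, limit=20):
    """Build a GBIF occurrence search URL for one scientific name."""
    query = urllib.parse.urlencode({"scientificName": scientific_name, "limit": limit})
    return "https://api.gbif.org/v1/occurrence/search?" + query


# Example use (performs a real network request, so it is left commented out):
# data = call_with_retry(fetch_json, occurrence_url("Apis mellifera"))
```

    Passing the sleep function as a parameter keeps the retry logic testable without actually waiting between attempts.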

    Technology

    I used Python because it supports both web scraping and data analysis, both of which were needed for this project. I used Jupyter notebooks, run through Anaconda. Project Jupyter is a non-profit, open-source project that supports interactive data analysis and scientific computing. It lets users write code directly in the browser, removes the need to install a separate Integrated Development Environment, and makes it convenient to share code. The main packages used in this project are pandas for data manipulation, requests and json to interact with the GBIF API, and NumPy, which adds support for array and matrix operations, among other things. Tableau and matplotlib were used to create visualizations after performing the analysis in Python.

    Resources in this dataset:

    Resource Title: Code. File Name: Code.zip. Resource Description: This zip file contains multiple Jupyter notebooks that contain the code for all the analysis. Resource Software Recommended: Jupyter notebook, url: http://jupyter.org/

    Resource Title: Visualizations. File Name: Visualizations.zip. Resource Description: This zip file contains Tableau workbooks for the visualizations. Resource Software Recommended: Tableau, url: https://www.tableau.com/

    Resource Title: Corpus. File Name: Corpus.zip. Resource Description: This zip file contains the two datasets of family Apidae and Reduviidae.
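    The description above mentions using json to read GBIF API responses and collecting year of occurrence and coordinate values. As a minimal sketch of that kind of summary, the snippet below counts records per year and records with coordinates. The sample payload is invented for illustration and is not real GBIF data; the function name `summarise_occurrences` is hypothetical. The field names `results`, `year`, `decimalLatitude`, and `decimalLongitude` follow the GBIF occurrence search response. The original project used pandas for this step; plain Python is used here to keep the example dependency-free.

```python
import json
from collections import Counter

# Hypothetical sample shaped like a GBIF occurrence search response;
# the values are invented for illustration and are not real records.
SAMPLE_RESPONSE = """
{"results": [
  {"scientificName": "Apis mellifera", "year": 2015,
   "decimalLatitude": 38.9, "decimalLongitude": -77.0},
  {"scientificName": "Apis mellifera", "year": 2015},
  {"scientificName": "Apis mellifera", "year": 2018,
   "decimalLatitude": 40.1, "decimalLongitude": -75.3},
  {"scientificName": "Apis mellifera"}
]}
"""


def summarise_occurrences(payload):
    """Count records per year and records that carry both coordinates."""
    records = json.loads(payload).get("results", [])
    per_year = Counter(r["year"] for r in records if r.get("year") is not None)
    with_coordinates = sum(
        1
        for r in records
        if r.get("decimalLatitude") is not None
        and r.get("decimalLongitude") is not None
    )
    return {
        "total": len(records),
        "per_year": dict(per_year),
        "with_coordinates": with_coordinates,
    }


summary = summarise_occurrences(SAMPLE_RESPONSE)
```

    Records that lack a year or coordinates are counted in the total but skipped in the other tallies, which makes gaps in the data visible rather than hiding them.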

  2. Data from: Spatial analyses

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Oct 24, 2024
    Cite
    Laymon Ball (2024). Spatial analyses [Dataset]. http://doi.org/10.6084/m9.figshare.26360050.v1
    Available download formats: zip
    Dataset updated
    Oct 24, 2024
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Laymon Ball
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    R script to clean raw GBIF records, perform Getis-Ord Gi* analysis, and create maps. A vector shapefile with the total number of clean GBIF records per one-degree grid cell is also included.
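    For readers unfamiliar with the Getis-Ord Gi* statistic named above, the following is a minimal Python sketch of the standard formula applied to per-cell record counts. It is not the dataset's R script, which is not shown here. The counts are hypothetical, the function name `getis_ord_gi_star` is introduced for illustration, and the binary weights (each cell is a neighbour of itself and of adjacent cells) are an assumption.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every cell.

    values  -- one count per grid cell
    weights -- weights[i][j] is the spatial weight between cells i and j;
               for Gi* the focal cell is included (weights[i][i] > 0)
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w_sq = sum(w * w for w in row)
        local_sum = sum(w * v for w, v in zip(row, values))
        spread = std * math.sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        scores.append((local_sum - mean * sum_w) / spread)
    return scores


# Hypothetical record counts for five cells in a row (not real GBIF data).
counts = [10, 10, 1, 1, 1]
# Binary weights: each cell is a neighbour of itself and of adjacent cells.
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
scores = getis_ord_gi_star(counts, weights)
```

    Positive scores mark cells surrounded by high counts (hot spots) and negative scores mark cold spots. The sketch does not guard against division by zero, which occurs when all counts are equal or when a cell's neighbourhood covers the whole grid.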

  3. gbif-plants-raw

    • huggingface.co
    Updated Dec 4, 2025
    Cite
    Michael Jupp (2025). gbif-plants-raw [Dataset]. https://huggingface.co/datasets/juppy44/gbif-plants-raw
    Dataset updated
    Dec 4, 2025
    Authors
    Michael Jupp
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    gbif-plants-raw

    A large-scale dataset of 96.1 million research-grade plant observations sourced from iNaturalist Open Data and aligned with GBIF taxonomy. Each row contains species metadata, taxonomic identifiers, geolocation, event timing, dataset source info, and a direct image URL. This dataset is designed for large-scale image classification, biodiversity modelling, and pretraining work.

    Dataset Summary

    This dataset aggregates all research-grade Plantae… See the full description on the dataset page: https://huggingface.co/datasets/juppy44/gbif-plants-raw.

  4. Insect Species Occurrence Data from Multiple Projects Worldwide with Focus...

    • gbif.org
    • data.usgs.gov
    • +2 more
    Updated Dec 30, 2025
    Cite
    Sam Droege; Clare Maffei (2025). Insect Species Occurrence Data from Multiple Projects Worldwide with Focus on Bees and Wasps in North America [Dataset]. http://doi.org/10.15468/6autvb
    Dataset updated
    Dec 30, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Sam Droege; Clare Maffei
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jun 4, 1990 - Jun 11, 2019
    Description

    Species occurrence records for native and non-native bees, wasps and other insects collected using mainly pan, malaise, and vane trapping; and insect netting methods in Canada, Mexico, the non-contiguous United States, U.S. Territories (specifically U.S. Virgin Islands), U.S. Minor Outlying Islands and other global locations with the bulk of the specimens coming from the Eastern United States often from Federal lands such as USFWS, NPS, DOD, USFS. Some records also contain notes regarding plants or substrates from which insects were collected or that were present and/or in flower at the time the insects were collected. Unless otherwise noted, taxonomic determinations (identifications) were completed by Sam Droege (USGS Eastern Ecological Science Center- EESC, Native Bee Laboratory) and Clare Maffei (USFWS, Inventory and Monitoring Branch).

    The EESC Native Bee Lab currently keeps only a small synoptic collection; rare and voucher specimens are deposited in the Smithsonian National Collection (NMNH) and widely distributed to other institutions for DNA, revisions, and augmentation of existing collections. Surplus specimens are also made available to students to learn their identifications. Corrections to any of our determinations are always welcomed. Common species that are not in demand for surplus are usually destroyed and the pins recycled. Recent revisions to Lasioglossum, Ceratina, and to a much lesser extent Triepeolus and Epeolus and other small groups have rendered determinations made prior to those revisions out of date for species involved in name changes, and users should account for that during analyses. Current data (including information on specimen codes without identifications) are always available without charge directly from Sam Droege.

  5. GBIF Data Backbone File

    • smithsonian.figshare.com
    txt
    Updated Oct 10, 2023
    + more versions
    Cite
    Vanessa Gonzalez (2023). GBIF Data Backbone File [Dataset]. http://doi.org/10.25573/data.24280102.v1
    Available download formats: txt
    Dataset updated
    Oct 10, 2023
    Dataset provided by
    National Museum of Natural History
    Authors
    Vanessa Gonzalez
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    GBIF Data Backbone File -- Smithsonian Gap Analysis Tool; Data download of the GBIF database (https://www.gbif.org/) formatted for use in the Smithsonian Gap Analysis tool

  6. Lane 6 Analysis

    • gbif.org
    Updated Sep 23, 2025
    Cite
    MGnify (2025). Lane 6 Analysis [Dataset]. http://doi.org/10.15468/hhq9a6
    Dataset updated
    Sep 23, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    MGnify
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Negative control lane for ancient DNA taken from core MV1012 46.9

  7. Data from: GBIF Occurrence Download

    • search.datacite.org
    Updated Sep 29, 2016
    + more versions
    Cite
    Occdownload Gbif.Org (2016). GBIF Occurrence Download [Dataset]. http://doi.org/10.15468/dl.tdoaef
    Dataset updated
    Sep 29, 2016
    Dataset provided by
    DataCite
    The Global Biodiversity Information Facility
    Authors
    Occdownload Gbif.Org
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    A dataset containing 32117 species occurrences available in GBIF matching the query: Depth: 80m TaxonKey: Pimelodella australis Eigenmann, 1917. The dataset includes 32117 records from 561 constituent datasets: 16 records from Field Museum of Natural History (Zoology) Invertebrate Collection. 63 records from macnin. 111 records from Mollusca of Costa Rica (INBio). 581 records from The molluscs collection (IM) of the Muséum national d'Histoire naturelle (MNHN - Paris). 2 records from (Appendix 1) Coral analysis and isostatic rebound effects from different Holes of IODP Expedition 310. 1 records from ZUEC-OPH - Coleção de Ophiuroidea do Museu de Zoologia da UNICAMP. 46 records from (Table 5) Distribution of Miocene planktonic foraminifers in sediments of ODP Hole 138-848B. 71 records from Mollusca collection of National Museum of Nature and Science. 2 records from Zoology (Museum of Evolution - Uppsala). 53 records from TCWC Marine Invertebrates. 2 records from (Table 2) Oligocene to Pliocene nannofossil range chart for ODP Hole 134-828A. 128 records from CSIRO Ichthyology provider for OZCAM. 729 records from Fishbase. 41 records from Occurrence records of southern African aquatic biodiversity. 22 records from Biological Reference Collections ICM CSIC. 5 records from ZUEC-GAS - Coleção de Gastropoda do Museu de Zoologia da UNICAMP. 1 records from (Table 5) Oligocene to Pleistocene nannofossil range chart for ODP Hole 134-829A. 204 records from Abundance of megabenthic species in trawl catches per station in addition to table 2 during POLARSTERN cruise ARK-VIII/2 (EPOS). 4 records from (Table 7) Abundance of silicoflagellates and ebridians in selected samples from ODP Hole 138-850B. 3 records from UAM Fish Collection (Arctos). 2 records from Antarctic Porifera database from the Spanish benthic expeditions: Bentart, Gebrap and Ciemar. 6 records from Colección de Crustáceos Decápodos y Estomatópodos del Centro Oceanográfico de Cádiz: CCDE-IEOCD. 
3 records from (Appendix B) Nannofossil abundance in ODP Hole 165-998A sediments. 526 records from Museum Victoria provider for OZCAM. 10 records from (Table 4) Relative abundances of stratigraphically useful planktonic foraminifers from ODP Site 167-1013 sediments. 12 records from Flora of tanzania. 2 records from CSIRO, Benthic Plant Invertebrate and Fish Biodiversity, Great Barrier Reef, Northeast Australia, 2003-2006. 143 records from BMSM Bailey-Matthews National Shell Museum. 652 records from CSIRO, Soviet Fishery Data, Australia, 1965-1978. 142 records from CAS Invertebrate Zoology (IZ). 3 records from Microplankton abundance measured on water bottle samples during AEGAEO cruise LIA-8. 1 records from Freshwater plants of Cameroon. 36 records from Colección Nacional de Foraminíferos - Museo Argentino de Ciencias Naturales 'Bernardino Rivadavia'. 9 records from Colección de Invertebrados Cenpat (CNP-INV). 203 records from Norwegian Biodiversity Information Centre - Other datasets. 2 records from ZUEC-BIV - Coleção de Bivalvia do Museu de Zoologia da UNICAMP. 658 records from Museum of Comparative Zoology, Harvard University. 55 records from (Table 4) Distribution of benthic foraminifers from ODP Hole 134-829A. 32 records from (Appendix G) Benthic foraminifera extinction group species in ODP Hole 167-1012B. 6 records from (Supplement 1) Most common taxa or groups of coccoliths from ODP Site 1233 (past dataset) covering the last 70 kyr. 2 records from Arthropoda Collection of the Seto Marine Biological Laboratory, Kyoto University. 2 records from Cnidaria Collection of the Seto Marine Biological Laboratory, Kyoto University. 1 records from Coleção de Polychaeta do Museu Nacional. 381 records from Invertebrates Collection of the Swedish Museum of Natural History. 9 records from KUBI Ichthyology Tissue Collection. 1 records from University of California Museum of Paleontology. 3571 records from NMNH occurrence DwC-A. 
62 records from Colección Nacional de Invertebrados - Museo Argentino de Ciencias Naturales 'Bernardino Rivadavia'. 22 records from Counting of planktic foraminifera of ODP Hole 182-1129C. 6 records from (Table AT1) Grain size distribution and stable isotope record of benthic foraminifera of ODP Hole 175-1085A. 614 records from CSIRO, Marine Data Warehouse Biology Records Pre-1998, Australia, 1978-1997. 2 records from (Table AT3) Planktonic foraminiferal stratigraphy of ODP Hole 182-1128B. 10 records from Stable oxygen isotope ratios of foraminifera from late middle Eocene sediments of ODP Site 171-1052 from Blake Plateau, West Atlantic Ocean (Appendix). 11 records from Occurrence of planktic foraminifera in Pliocene to Holocene sediments of DSDP Hole 81-552A in the North Atlantic (Appendix 2). 54 records from Total foraminifera counts of multinet M6/7_MSN100. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/009-6. 1 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/011-3. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/012-4. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/014-6. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/041-5. 1 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/042-5. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/045-9. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/046-5. 1 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/048-5. 1 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/049-5. 
2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/061-3. 1 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/088-7. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/090-2. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/091-3. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/092-6. 3 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/107-6. 2 records from Large protozoa abundance measured on concentrated water bottle samples at station PS58/108-3. 6 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/424-22. 6 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/427-6. 7 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/508-22. 6 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/509-16. 10 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/511-12. 5 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/514-18. 6 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/543-5. 7 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/544-6. 
9 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/546-19. 8 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/553-11. 6 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/570-14. 7 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/580-12. 6 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/587-14. 7 records from Microzooplankton (larger protists and small copepods) abundance measured on concentrated water bottle samples at station PS65/593-9. 6 records from Large protozoan and small metazoan abundance measured on concentrated water bottle samples at station PS65/590-1. 4 records from Colección Ictiológica del CENPAT-CONICET. 34 records from Spores and dinoflagellates of Site 175-1075. 13 records from Abundance of microzooplankton measured on water bottle samples during POLARSTERN cruise ANT-X/6. 2 records from Biological data measured on water bottle sampels at station AT_II-119/5_35-4. 1 records from Microzooplankton abundance and biomass at station TT050_13-14. 1 records from Microzooplankton abundance and biomass at station TT050_17-4. 1 records from Microzooplankton abundance and biomass at station TT050_21-13. 1 records from Microzooplankton abundance and biomass at station TT054_13-18. 1 records from Abundance, biovolume and biomass of heterotrophic dinoflagellates at station TT007_1-CTD8. 1 records from Abundance, biovolume and biomass of heterotrophic dinoflagellates at station TT007_10-CTD124. 1 records from Abundance, biovolume and biomass of heterotrophic

  8. A GBIF reptile and amphibian analysis of burn sites in Southwestern, USA

    • knb.ecoinformatics.org
    Updated Oct 18, 2021
    Cite
    Marina Goldgisser; CJ Lortie (2021). A GBIF reptile and amphibian analysis of burn sites in Southwestern, USA [Dataset]. http://doi.org/10.5063/F1D21W1F
    Dataset updated
    Oct 18, 2021
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Marina Goldgisser; CJ Lortie
    Time period covered
    Jan 1, 2021
    Variables measured
    lat, long, fireID, obsDay, dcmlLng, dcmlLtt, endDate, obsYear, species, fireName, and 6 more
    Description

    The purpose of this dataset is to evaluate the impact of fires on reptile and amphibian biodiversity in California's southwest desert. Species data were downloaded from the Global Biodiversity Information Facility (GBIF): GBIF.org (28 July 2021) GBIF Occurrence Download https://doi.org/10.15468/dl.6kvrr7

  9. A GBIF endangered species diversity analysis of burn sites in the...

    • search.dataone.org
    • search-sandbox-2.test.dataone.org
    • +3 more
    Updated Sep 1, 2022
    Cite
    Marina Goldgisser; Tarmo K Remmel; CJ Lortie (2022). A GBIF endangered species diversity analysis of burn sites in the Southwestern, USA [Dataset]. http://doi.org/10.5063/F19C6VVV
    Dataset updated
    Sep 1, 2022
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Marina Goldgisser; Tarmo K Remmel; CJ Lortie
    Time period covered
    Aug 2, 2021 - Oct 12, 2021
    Variables measured
    day, lat, long, class, month, obsID, order, fireID, endDate, indvdlC, and 29 more
    Description

    The purpose of this dataset is to evaluate the impact of fires on endangered species biodiversity in California's southwest desert. Species data were downloaded from the Global Biodiversity Information Facility (GBIF). Wildland fire data were downloaded from the National Interagency Fire Network.

  10. gbif.org Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Nov 1, 2025
    Cite
    (2025). gbif.org Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/gbif.org
    Dataset updated
    Nov 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Science Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for gbif.org as of November 2025

  11. The Retrospective Analysis of Antarctic Tracking (Standardised) Data from...

    • gbif.org
    Updated Dec 17, 2025
    Cite
    Yan Ropert-Coudert; Anton P. Van de Putte; Horst Bornemann; Jean-Benoît Charrassin; Daniel P. Costa; Bruno Danis; Luis A. Hückstädt; Ian D. Jonsen; Mary-Anne Lea; Ryan R. Reisinger; David Thompson; Leigh G. Torres; Philip N. Trathan; Simon Wotherspoon; David G Ainley; Rachael Alderman; Virginia Andrews-Goff; Ben Arthur; Grant Ballard; John Bengtson; Marthán N. Bester; Lars Boehme; Charles-André Bost; Peter Boveng; Jaimie Cleeland; Rochelle Constantine; Robert J. M. Crawford; Luciano Dalla Rosa; P.J. Nico de Bruyn; Karine Delord; Sébastien Descamps; Mike Double; Louise Emmerson; Mike Fedak; Ari Friedlander; Nick Gales; Mike Goebel; Kimberly T. Goetz; Christophe Guinet; Simon D. Goldsworthy; Rob Harcourt; Jefferson Hinke; Kerstin Jerosch; Akiko Kato; Knowles R. Kerry; Roger Kirkwood; Gerald L. Kooyma; Kit M. Kovacs; Kieran Lawton; Andrew D. Lowther; Christian Lydersen; Phil O'B. Lyver; Azwianewi B. Makhado; Maria E. I. Márquez; Birgitte McDonald; Clive McMahon; Monica Muelbert; Dominik Nachtsheim; Keith W. Nicholls; Erling S. Nordøy; Silvia Olmastroni; Richard A. Phillips; Pierre Pistorius; Joachim Plötz; Klemens Pütz; Norman Ratcliffe; Peter G. Ryan; Mercedes Santos; Arnoldus Schytte Blix; Colin Southwell; Iain Staniland; Akinori Takahashi; Arnaud Tarroux; Wayne Trivelpiece; Ewan Wakefield; Henri Weimerskirch; Barbara Wienecke; José C. Xavier; Ben Raymond; Mark A. Hindell (2025). The Retrospective Analysis of Antarctic Tracking (Standardised) Data from the Scientific Committee on Antarctic Research [Dataset]. http://doi.org/10.4225/15/5afcb927e8162
    Dataset updated
    Dec 17, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    SCAR - AntOBIS
    Authors
    Yan Ropert-Coudert; Anton P. Van de Putte; Horst Bornemann; Jean-Benoît Charrassin; Daniel P. Costa; Bruno Danis; Luis A. Hückstädt; Ian D. Jonsen; Mary-Anne Lea; Ryan R. Reisinger; David Thompson; Leigh G. Torres; Philip N. Trathan; Simon Wotherspoon; David G Ainley; Rachael Alderman; Virginia Andrews-Goff; Ben Arthur; Grant Ballard; John Bengtson; Marthán N. Bester; Lars Boehme; Charles-André Bost; Peter Boveng; Jaimie Cleeland; Rochelle Constantine; Robert J. M. Crawford; Luciano Dalla Rosa; P.J. Nico de Bruyn; Karine Delord; Sébastien Descamps; Mike Double; Louise Emmerson; Mike Fedak; Ari Friedlander; Nick Gales; Mike Goebel; Kimberly T. Goetz; Christophe Guinet; Simon D. Goldsworthy; Rob Harcourt; Jefferson Hinke; Kerstin Jerosch; Akiko Kato; Knowles R. Kerry; Roger Kirkwood; Gerald L. Kooyma; Kit M. Kovacs; Kieran Lawton; Andrew D. Lowther; Christian Lydersen; Phil O'B. Lyver; Azwianewi B. Makhado; Maria E. I. Márquez; Birgitte McDonald; Clive McMahon; Monica Muelbert; Dominik Nachtsheim; Keith W. Nicholls; Erling S. Nordøy; Silvia Olmastroni; Richard A. Phillips; Pierre Pistorius; Joachim Plötz; Klemens Pütz; Norman Ratcliffe; Peter G. Ryan; Mercedes Santos; Arnoldus Schytte Blix; Colin Southwell; Iain Staniland; Akinori Takahashi; Arnaud Tarroux; Wayne Trivelpiece; Ewan Wakefield; Henri Weimerskirch; Barbara Wienecke; José C. Xavier; Ben Raymond; Mark A. Hindell
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1991 - Dec 31, 2015
    Description

    The Southern Ocean is a remote, hostile environment where conducting marine biology is challenging, so we know relatively little about this important region, which is critical as a habitat for breeding and foraging of many marine endotherms. Scientists from around the world have been tracking seals, penguins, petrels, whales and albatrosses for more than two decades to learn how they spend their time at sea. The Retrospective Analysis of Antarctic Tracking Data (RAATD) was initiated by the SCAR Expert Group on Birds and Marine Mammals (EG-BAMM) in 2010. This team has assembled tracking data shared by 38 biologists from 11 different countries to accumulate the largest animal tracking database in the world, containing information on 15 species, over 3,400 individual animals and almost 2.5 million at-sea locations. Analysing a dataset of this size brings its own challenges, and the team is developing new and innovative statistical approaches to integrate these complex data. When complete, RAATD will provide a greater understanding of fundamental ecosystem processes in the Southern Ocean, help predict the future of top predator distribution and help with spatial management planning.

  12. Data and code for: "Global Sampling Decline Erodes Science Potential of...

    • zenodo.org
    bin, zip
    Updated Nov 1, 2024
    Cite
    Owen Forbes (2024). Data and code for: "Global Sampling Decline Erodes Science Potential of Natural History Collections" [Dataset]. http://doi.org/10.5281/zenodo.14010666
    Available download formats: bin, zip
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Owen Forbes
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Nov 1, 2024
    Description

    # GBIF Specimen Data Analysis and Forecasting

    This repository contains the code and data for analysing and forecasting trends in Global Biodiversity Information Facility (GBIF) specimen records across three major taxonomic groups: Chordata, Arthropoda, and Plantae.
    The analysis pipeline includes data cleaning, anomaly detection, primary analyses, and forecasting based on historical database snapshots.

    These scripts and data correspond to analyses in the following manuscript:

    Global Sampling Decline Erodes Science Potential of Natural History Collections

    Authors:
    Owen Forbes
    Andrew G. Young
    Peter H. Thrall


    ## Repository Structure

    The repository consists of three main Quarto (.qmd) scripts and associated data files:

    1. `1_DataCleaning_Forbes-et-al_2024.qmd`: Data cleaning and anomaly detection
    2. `2_PrimaryAnalyses_Forbes-et-al_2024.qmd`: Primary analyses and visualisation
    3. `3_SnapshotsForecasting_Forbes-et-al_2024.qmd`: Historical snapshot analysis and forecasting

    ## Requirements

    - R (version 4.3.2 or later)
    - Required R packages:
      - tidyverse (v2.0.0) - for data manipulation and visualization
      - readr (v2.1.5) - for reading CSV/TSV files
      - ggplot2 (v3.4.0 or v3.5.0) - for creating visualizations
      - rnaturalearth (v1.0.1) - for accessing natural earth map data
      - dplyr (v1.1.0 or v1.1.4) - for data manipulation
      - countrycode (v1.6.0) - for converting country names and codes
      - spdep (v1.3-3) - for spatial dependence modeling
      - sp (v1.6-0 or v2.1-3) - for spatial data manipulation
      - sf (v1.0-15 or v1.0-16) - for simple features access
      - data.table (v1.14.8) - for fast aggregation of large data
      - lubridate (v1.9.2) - for date-time manipulation
      - viridis (v0.6.3) - for color palettes
      - gridExtra (v2.3) - for arranging multiple plots
      - ggpubr (v0.6.0) - for creating publication-ready plots
      - zoo (v1.8-12) - for time series, including moving averages
      - scales (v1.3.0) - for graphical scales
      - forecast (v8.22.0) - for ARIMA forecast models
      - purrr (v1.0.2) - for mapping custom forecast function onto each dataset
      - arrow - for working with parquet files

    Install these packages before running the scripts.

    ## How to Use

    1. Download this repository to your local machine.
    2. Set your working directory to the location of the scripts.
    3. Download the raw datasets from GBIF (as required).
    4. Ensure all required R packages are installed.
    5. Run the scripts in RStudio or your preferred R environment.

    ### Data Cleaning (`1_DataCleaning_Forbes-et-al_2024.qmd`)

    This script cleans the raw GBIF data and identifies anomalies. It produces files containing indexes of dataset records to be removed, which are used in subsequent analyses.

    **Note**: The raw GBIF exported datasets for contemporary records are not included in this repository due to file size constraints. Download them from the GBIF links provided in the script and place them in the `data/` directory.

    ### Primary Analyses (`2_PrimaryAnalyses_Forbes-et-al_2024.qmd`)

    This script performs the main analyses and generates visualisations. It uses the outputs from the data cleaning script to filter anomalous records.
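
    The filtering step described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R code. The column names (`gbifID`, `datasetKey`, `year`), the layout of the anomaly file, and the sample values are assumptions made for this example.

```python
import csv
import io

# Hypothetical stand-in for a "counts_to_highlight" file:
# dataset + year pairs flagged as anomalous (layout is an assumption).
ANOMALIES_CSV = """datasetKey,year
ds-a,1987
ds-b,2003
"""

# Hypothetical stand-in for a GBIF occurrence export
# (real exports contain many more columns).
RECORDS_CSV = """gbifID,datasetKey,year
1,ds-a,1987
2,ds-a,1988
3,ds-b,2003
4,ds-c,2003
"""


def load_anomaly_index(text):
    """Return the set of (datasetKey, year) pairs flagged for removal."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["datasetKey"], int(row["year"])) for row in reader}


def filter_records(text, anomaly_index):
    """Keep only records whose (datasetKey, year) pair is not flagged."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        if (row["datasetKey"], int(row["year"])) not in anomaly_index:
            kept.append(row)
    return kept


anomaly_index = load_anomaly_index(ANOMALIES_CSV)
clean_records = filter_records(RECORDS_CSV, anomaly_index)
print([row["gbifID"] for row in clean_records])
```

    Holding the flagged pairs in a set keeps each lookup constant-time, which matters when the occurrence export has millions of rows.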

    To reproduce all analysis stages from the original raw .csv files:
    - Start at the chunks labelled "DATA LOAD AND FILTERING".
    - Run the pipeline for non-spatial analyses before spatial analyses.
    - Due to memory constraints, it's recommended to run analyses for one taxonomic group and one analysis stream at a time.

    To skip to plot generation:
    - Navigate to sections tagged as "@! SKIP TO PLOTTING !@".
    - Ensure all required analysis output files are in the `data/` directory.

    ### Forecasting (`3_SnapshotsForecasting_Forbes-et-al_2024.qmd`)

    This script analyses historical GBIF database snapshots and forecasts future growth. It uses the cleaned snapshot data produced by the data cleaning script.

    ## Data Files

    ### GBIF Exports - Raw Data (not included on Zenodo due to file size; please download directly from GBIF)
    - `0016915-240425142415019.csv` for Chordata - https://www.gbif.org/occurrence/download/0016915-240425142415019

    - `0016914-240425142415019.csv` for Plantae - https://www.gbif.org/occurrence/download/0016914-240425142415019

    - `0016913-240425142415019.csv` for Arthropoda - https://www.gbif.org/occurrence/download/0016913-240425142415019

    ### Included Data Files

    #### Raw Data
    - `GBIF_snapshots.parquet` # Historical snapshots RAW dataset (arrow/parquet format)
    - `GBIF_integer_to_datasetKey.tsv` # Mapping old dataset IDs onto new datasetKey field

    #### Contemporary Datasets - data cleaning outputs
    - `chordata_counts_to_highlight_030724` # List of anomalous Chordata dataset + year indexes to filter
    - `arthropoda_counts_to_highlight_OG_030724` # List of anomalous Arthropoda dataset + year indexes to filter
    - `plantae_counts_to_highlight_030724` # List of anomalous Plantae dataset + year indexes to filter

    #### Cleaned Snapshots
    - `plantae_snapshots_filter_threshold_IN_040924` # Cleaned Plantae snapshots
    - `arthropoda_snapshots_filter_threshold_IN_040924` # Cleaned Arthropoda snapshots
    - `chordata_snapshots_filter_threshold_IN_040924` # Cleaned Chordata snapshots
    - `gbif_dates_df_anomaly_filtered_090724` # Anomaly-filtered snapshots (combined dataset)
    - `gbif_dates_df_anomalies_highlighted_090724` # Anomalies highlighted snapshots (combined dataset)

    #### Analysis Outputs - for skipping straight to plot/figure generation
    - `arthropoda_specimens_per_year_080724` # Arthropoda specimen counts per year
    - `arthropoda_unique_species_per_year_080724` # Arthropoda unique species counts per year
    - `arthropoda_grid_counts_080724` # Arthropoda grid counts
    - `chordata_specimens_per_year_080724` # Chordata specimen counts per year
    - `chordata_unique_species_per_year_080724` # Chordata unique species counts per year
    - `chordata_grid_counts_080724` # Chordata grid counts
    - `plantae_specimens_per_year_080724` # Plantae specimen counts per year
    - `plantae_unique_species_per_year_080724` # Plantae unique species counts per year
    - `plantae_grid_counts_080724` # Plantae grid counts
    - `chordata_continent_count_080724` # Chordata continent-specific counts
    - `arthropoda_continent_count_080724` # Arthropoda continent-specific counts
    - `plantae_continent_count_080724` # Plantae continent-specific counts

  13. The role of environmental factors in affecting species distribution: A joint...

    • zenodo.org
    • nde-dev.biothings.io
    pdf, zip
    Updated Jul 19, 2024
    Cite
    Tomer Gueta; Yohay Carmel (2024). The role of environmental factors in affecting species distribution: A joint analysis of GBIF data and virtual species [Dataset]. http://doi.org/10.5281/zenodo.4295742
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tomer Gueta; Yohay Carmel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A vigorous debate among ecologists concerns two contrasting theories of species distribution and diversity, the niche theory and the neutral theory. The 'continuum hypothesis', supported by modelling results, maintains that rather than being mutually exclusive, these theories represent two ends of a continuum. Here we develop the first empirical test capable of distinguishing between these three theories using continental-scale occurrence data from GBIF and a novel simulation framework of corresponding virtual species; application of this test to a set of 84 Australian mammals supported the continuum hypothesis over the two competing theories.

    Repository contains:

    - Manuscript supplementary information (Sp.Dis_F1000-Supplementary.pdf)
    - All analysis data and code (analysis_data_and_code.zip)
    - GBIF raw data in a DwC-A format (0054618-160910150852091.zip). Data is also publicly available via GBIF, with the following DOI: https://doi.org/10.15468/dl.3poqxs

  14. GBIF animal distribution in Spain

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 15, 2023
    Cite
    Gómez Varela, Alba (2023). GBIF animal distribution in Spain [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7523014
    Explore at:
    Dataset updated
    Jan 15, 2023
    Authors
    Gómez Varela, Alba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Spain
    Description

    CSV file containing 1,000 GBIF observations of animals that have been involved in wildlife–vehicle collisions on interurban roads in Spain, together with a buffer of each species' distribution calculated from these data. If you are interested in the data for the whole country, please contact me and I will forward it to you.

    Each record describes the observation by the following fields:

    gbifid: the unique identifier for an occurrence record in GBIF.
    
    datasetkey: the local dataset id within the GBIF network.
    
    occurrenceid: a unique identifier for the occurrence, allowing the same occurrence to be recognized across dataset versions as well as through data downloads and use.
    
    kingdom: the full scientific name specifying the kingdom that the occurrence's scientific name is classified under.
    
    phylum: the full scientific name of the phylum or division in which the taxon is classified.
    
    class: the full scientific name of the class in which the taxon is classified.
    
    order: the full scientific name of the order in which the taxon is classified.
    
    family: the full scientific name of the family in which the taxon is classified.
    
    genus: the full scientific name of the genus in which the taxon is classified.
    
    species: species classification key.
    
    infraspecificepithet: the name of the lowest or terminal infraspecific epithet of the scientificName, excluding any rank designation.
    
    taxonrank: the taxonomic rank of the supplied scientific name.
    
    scientificname: the full scientific name of the organism, to the lowest level taxonomic rank that is possible to supply, and including authorship and year of the name where applicable.
    
    verbatimscientificname: the taxonomic rank of the most specific name in the scientificName as it appears in the original record.
    
    verbatimscientificnameauthorship: not described.
    
    countrycode: a two-letter standard abbreviation for the country of the occurrence locality.
    
    locality: the specific description of the place.
    
    stateprovince: the name of the next smaller administrative region than country (state, province, canton, department, region, etc.) in which the Location occurs.
    
    occurrencestatus: a statement about the presence or absence of a Taxon at a Location.
    
    individualcount: the quantity of a species occurrence, e.g. the number of individuals, percentage of vegetation coverage, or biomass.

    publishingorgkey: the publishing organization key (a UUID).

    decimallatitude: the geographic latitude in decimal degrees.

    decimallongitude: the geographic longitude in decimal degrees.
    
    coordinateuncertaintyinmeters: the horizontal distance from the given decimalLatitude and decimalLongitude in meters, describing the smallest circle containing the whole of the Location.
    
    coordinateprecision: a decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude.
    
    elevation: elevation (altitude) in meters above sea level. Supports range queries.
    
    elevationaccuracy: not described.

    depth: depth in meters relative to altitude. For example 10 meters below a lake surface with given altitude. Supports range queries.

    depthaccuracy: not described.
    
    eventdate: the date or date interval during which the occurrence record was collected, following ISO 8601 date-time standard.
    
    day: the integer day of the month on which the Event occurred.
    
    month: the integer month in which the Event occurred.
    
    year: the four-digit year in which the Event occurred, according to the Common Era Calendar.
    
    taxonkey: a taxon key from the GBIF backbone. 
    
    specieskey: species classification key.
    
    basisofrecord: the type of the individual record, e.g. observation, physical specimen, fossil, living ex-situ, culture collection specimen.
    
    institutioncode: the name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record.
    
    collectioncode: the name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.
    
    catalognumber: an identifier (preferably unique) for the record within the data set or collection.
    
    recordnumber: an identifier given to the Occurrence at the time it was recorded. Often serves as a link between field notes and an Occurrence record, such as a specimen collector's number.
    
    identifiedby: a list (concatenated and separated) of names of people, groups, or organizations who assigned the Taxon to the subject.
    
    dateidentified: the date on which the subject was determined as representing the Taxon.
    
    license: a machine-readable statement of the rights assigned to the published dataset.
    
    rightsholder: a person or organization owning or managing rights over the resource.
    
    recordedby: the name of the institution or organization listed as the data publisher on GBIF.org.
    
    typestatus: a list (concatenated and separated) of nomenclatural types (type status, typified scientific name, publication) applied to the subject.
    
    establishmentmeans: The process by which the biological individual(s) represented in the Occurrence became established at the location.
    
    lastinterpreted: the date the record was last modified in GBIF, in ISO 8601 format: yyyy, yyyy-MM, yyyy-MM-dd, or MM-dd.

    mediatype: the kind of multimedia associated with an occurrence, as defined in the GBIF MediaType enum.
    
    issue: a specific interpretation issue as defined in GBIF OccurrenceIssue enum.
    
    geom (geometry): geometry from latitude and longitude position. Developed for this project.
    
    buff (geometry): buffer around 'geom' taking into account 'coordinateuncertaintyinmeters' and 'coordinateprecision'. Developed for this project.
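
    As an illustration of how the two project-specific fields could be derived from a record, here is a minimal Python sketch. The dataset description does not state the exact buffer rule, so the formula below (uncertainty in meters plus coordinate precision converted from degrees to meters) is an assumption, as are the comma-separated sample rows and the helper names `point_wkt` and `buffer_radius_m`.

```python
import csv
import io

# Approximate length of one degree of latitude in meters.
METERS_PER_DEGREE = 111_320.0

# Hypothetical two-row sample using the lowercase field names listed above.
SAMPLE = """gbifid,species,decimallatitude,decimallongitude,coordinateuncertaintyinmeters,coordinateprecision
1,Vulpes vulpes,40.4168,-3.7038,250,
2,Sus scrofa,41.3874,2.1686,,0.001
"""


def to_float(value, default=0.0):
    """Parse an optional numeric field; empty strings fall back to the default."""
    return float(value) if value.strip() else default


def point_wkt(row):
    """Build a WKT point; WKT order is longitude (x) then latitude (y)."""
    lon = float(row["decimallongitude"])
    lat = float(row["decimallatitude"])
    return f"POINT ({lon} {lat})"


def buffer_radius_m(row):
    """Assumed rule: uncertainty (m) plus precision (degrees) converted to meters."""
    uncertainty = to_float(row["coordinateuncertaintyinmeters"])
    precision_deg = to_float(row["coordinateprecision"])
    return uncertainty + precision_deg * METERS_PER_DEGREE


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["gbifid"], point_wkt(row), round(buffer_radius_m(row), 1))
```

    Missing uncertainty or precision values are treated as zero here; a real analysis would probably need an explicit policy for such records.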
    

    The context is the Final Master's Degree Project 'Analysis and Predictive Modelling of Wildlife–Vehicle Collision on Interurban Roads in Spain' (Data Science Master’s Degree of Universitat Oberta de Catalunya - UOC).

    This dataset is the output of the animal analysis and the code repository is available on GitHub.

  15. A biodiversity dataset graph: GBIF, iDigBio, BioCASe...

    • zenodo.org
    bin
    Updated Jun 2, 2023
    Cite
    Jorrit H. Poelen (2023). A biodiversity dataset graph: GBIF, iDigBio, BioCASe hash://sha256/450deb8ed9092ac9b2f0f31d3dcf4e2b9be003c460df63dd6463d252bff37b55 hash://md5/898a9c02bedccaea5434ee4c6d64b7a2 [Dataset]. http://doi.org/10.5281/zenodo.7651831
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jorrit H. Poelen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A biodiversity dataset graph: GBIF, iDigBio, BioCASe hash://sha256/450deb8ed9092ac9b2f0f31d3dcf4e2b9be003c460df63dd6463d252bff37b55 hash://md5/898a9c02bedccaea5434ee4c6d64b7a2

    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.

    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2023-02-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org".

    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .

    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar ls --remote https://zenodo.org/record/7651831/files > /dev/null

    Optionally, you can retrieve all associated data (>500GB) files using:

    $ java -jar preston.jar clone https://zenodo.org/record/7651831/files --remote https://zenodo.org/record/7651831/files,https://linker.bio,https://archive.org/download/biodiversity-dataset-archives/data.zip/data/

    Please note that https://archive.org/download/biodiversity-dataset-archives/data.zip/data/ and https://linker.bio are Preston remotes that provided access to GBIF, iDigBio, BioCASe data files at the time of writing (17 Feb 2023). These remotes can be replaced with any other Preston remote(s) if needed. Retrieval may take a while depending on network speed and hardware constraints. See also https://archive.org/details/biodiversity-dataset-archives .

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history

    hash://sha256/450deb8ed9092ac9b2f0f31d3dcf4e2b9be003c460df63dd6463d252bff37b55 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/1621a777e4a7442e9864424820c5f825d9cf1c65599cbfbbda039384f1b74ada .
    hash://sha256/1621a777e4a7442e9864424820c5f825d9cf1c65599cbfbbda039384f1b74ada http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/b1c11d9231768def925b9d076c1c4b711a727326ad99e62982aa4ede288e5aa2 .
    hash://sha256/b1c11d9231768def925b9d076c1c4b711a727326ad99e62982aa4ede288e5aa2 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/125e67d7c5077af8fa958569644d61e44a39bbbbdaaf16af0430dcf441e05cec .
    hash://sha256/125e67d7c5077af8fa958569644d61e44a39bbbbdaaf16af0430dcf441e05cec http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/63fae82e8a3aacd11e4a06b5736242aabe40802c6259a38de066de14848e3718 .
    hash://sha256/63fae82e8a3aacd11e4a06b5736242aabe40802c6259a38de066de14848e3718 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/7cd305e9d275763c96e7685847460fcc381b5c97c1460c00441f663c1788800f .
    hash://sha256/7cd305e9d275763c96e7685847460fcc381b5c97c1460c00441f663c1788800f http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/38e8e17f6742d39379b96cec2d4e70a5a63a85a28aee49727031c9061f4b1e03 .
    hash://sha256/38e8e17f6742d39379b96cec2d4e70a5a63a85a28aee49727031c9061f4b1e03 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/da7450941e7179c973a2fe1127718541bca6ccafe0e4e2bfb7f7ca9dbb7adb86 .
    hash://sha256/da7450941e7179c973a2fe1127718541bca6ccafe0e4e2bfb7f7ca9dbb7adb86 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/aab08c5c87ce6a8f400972e2b09b7fa3421947b59407a8feb98388d7e42b49e8 .
    hash://sha256/aab08c5c87ce6a8f400972e2b09b7fa3421947b59407a8feb98388d7e42b49e8 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/f449a34dd80d4e33248a1a7cb0d0fa2b8dac49865a0a32ed5bbaacb22addb0d1 .
    hash://sha256/f449a34dd80d4e33248a1a7cb0d0fa2b8dac49865a0a32ed5bbaacb22addb0d1 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/b771ed09aea78055e39d5c955997e5d9b42dd9edc6b094d9b8a27df16bdc6b6c .
    hash://sha256/b771ed09aea78055e39d5c955997e5d9b42dd9edc6b094d9b8a27df16bdc6b6c http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/15edbac974fb77347e07cda76358f7f662dd800bfc5b3e476fc66ecdc6203d03 .
    hash://sha256/15edbac974fb77347e07cda76358f7f662dd800bfc5b3e476fc66ecdc6203d03 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/4000d2a1af6da5b46f374038d884f91768782a1905d4a75fff3c8c3bb6629913 .
    hash://sha256/4000d2a1af6da5b46f374038d884f91768782a1905d4a75fff3c8c3bb6629913 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/f6d133620a665569a13a3fb7ca31b163bf849864812d447238994226d35e3253 .
    hash://sha256/f6d133620a665569a13a3fb7ca31b163bf849864812d447238994226d35e3253 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/e10c234ac54f02fd63da87b418f36428b876d91a30a42a4657e1726ba862b900 .
    hash://sha256/e10c234ac54f02fd63da87b418f36428b876d91a30a42a4657e1726ba862b900 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/83b4553edfc58e6389d427a08de533236e6a7eeb39b61239d225b0d4188d8c84 .
    hash://sha256/83b4553edfc58e6389d427a08de533236e6a7eeb39b61239d225b0d4188d8c84 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/49e079a0bac47ca17c0b14fa711b7742b9332ac64e1866adf13d294692720f9f .
    hash://sha256/49e079a0bac47ca17c0b14fa711b7742b9332ac64e1866adf13d294692720f9f http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/000cd23a8494a8a18f8b552e7f113af418eb2ae85e9908f61f44c720ce70608b .
    hash://sha256/000cd23a8494a8a18f8b552e7f113af418eb2ae85e9908f61f44c720ce70608b http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/1cce63489aa8618dcaf19ce2cd6166a7ba801798b235a25a725397d38c2fe957 .
    hash://sha256/1cce63489aa8618dcaf19ce2cd6166a7ba801798b235a25a725397d38c2fe957 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/251fa349c051bbda370decb7e5e58960d702add59f6e131ebf7c960d0f93b417 .
    hash://sha256/251fa349c051bbda370decb7e5e58960d702add59f6e131ebf7c960d0f93b417 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/810b22c16e1a3911c6eecfca348758d3ffd5b29fc36990015cda6427bdde2233 .
    hash://sha256/810b22c16e1a3911c6eecfca348758d3ffd5b29fc36990015cda6427bdde2233 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/dd16f4bae9a02ce71bc3ba4da2809cc5035743a4e23f61f5631f69b08d0e40f5 .
    hash://sha256/dd16f4bae9a02ce71bc3ba4da2809cc5035743a4e23f61f5631f69b08d0e40f5 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/bf676e2ce4164f8148a793188650f07c464dc52b2bfc07e92c9f16041baba8d5 .
    hash://sha256/bf676e2ce4164f8148a793188650f07c464dc52b2bfc07e92c9f16041baba8d5 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/1fd3e156c6ba1632a27b2bebaea36f76afeac8dfecf530d772988832821304ea .
    hash://sha256/1fd3e156c6ba1632a27b2bebaea36f76afeac8dfecf530d772988832821304ea http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/4dbee404b74775cac279e0e7fbc1aa72dddfc70df02b07b9a2f82023dccd4732 .
    hash://sha256/4dbee404b74775cac279e0e7fbc1aa72dddfc70df02b07b9a2f82023dccd4732 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/1f25e9a78ad0630ead9676807269185761f0d23544a4492a0337c2d306b10686 .
    hash://sha256/1f25e9a78ad0630ead9676807269185761f0d23544a4492a0337c2d306b10686 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/20b07c79ff0d48c818e2882816948ed192d5c86bdff2118881d7446b15e63bf1 .
    hash://sha256/20b07c79ff0d48c818e2882816948ed192d5c86bdff2118881d7446b15e63bf1 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/6b7bc4dc5901a459663f47628768b53622eda36bd0fa092390c6d1c0323abf6d .
    hash://sha256/6b7bc4dc5901a459663f47628768b53622eda36bd0fa092390c6d1c0323abf6d http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/d98e3bd2bc717bc11a3338cd43fc488bde1d96cb42d8cbe8301f0d9f9753007f .
    hash://sha256/d98e3bd2bc717bc11a3338cd43fc488bde1d96cb42d8cbe8301f0d9f9753007f http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/1d184ff657913a77d50b9f33b5bd1f483220fd83f26dbf02c020f98c778aafae .
    hash://sha256/1d184ff657913a77d50b9f33b5bd1f483220fd83f26dbf02c020f98c778aafae http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/a35ec845ec71a2951652d70e574e6280c843879efc3b1639e9ccdb4fbfd45e69 .
    hash://sha256/a35ec845ec71a2951652d70e574e6280c843879efc3b1639e9ccdb4fbfd45e69 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/8ff46ae6a30bf9647df0294b92434a83784626b3f8c37163db3edefb049daead .
    hash://sha256/8ff46ae6a30bf9647df0294b92434a83784626b3f8c37163db3edefb049daead http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/afc2a4caa7f07503ccda9154d34dea1852c8283dee5cb4c5df7ddb3ce238ab7d .
    hash://sha256/afc2a4caa7f07503ccda9154d34dea1852c8283dee5cb4c5df7ddb3ce238ab7d http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/5cf1a9e491f218a94af5439f90beb905ae923f94cdd85c542d85c74c241f9e6e .
    hash://sha256/5cf1a9e491f218a94af5439f90beb905ae923f94cdd85c542d85c74c241f9e6e http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b .
    hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d .
    hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1
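
    A history log like the one above can be checked for continuity with a short script. The following Python sketch is not part of Preston. It only assumes that each line has the form "subject predicate object", optionally followed by " .", and it uses the first two statements of the log as sample input. The helper names `parse_statement` and `check_chain` are hypothetical.

```python
import re

PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"
HASH_URI = re.compile(r"^hash://sha256/[0-9a-f]{64}$")

# First two statements of the provenance log history shown above.
LOG = """\
hash://sha256/450deb8ed9092ac9b2f0f31d3dcf4e2b9be003c460df63dd6463d252bff37b55 http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/1621a777e4a7442e9864424820c5f825d9cf1c65599cbfbbda039384f1b74ada .
hash://sha256/1621a777e4a7442e9864424820c5f825d9cf1c65599cbfbbda039384f1b74ada http://www.w3.org/ns/prov#wasDerivedFrom hash://sha256/b1c11d9231768def925b9d076c1c4b711a727326ad99e62982aa4ede288e5aa2 .
"""


def parse_statement(line):
    """Split one log line into (subject, predicate, object), ignoring a trailing '.'."""
    parts = line.split()
    if parts and parts[-1] == ".":
        parts = parts[:-1]
    if len(parts) != 3:
        raise ValueError(f"unexpected statement: {line!r}")
    return tuple(parts)


def check_chain(lines):
    """Return the ordered versions if every statement links to the next one."""
    versions = []
    for line in lines:
        if not line.strip():
            continue
        subject, predicate, obj = parse_statement(line)
        if predicate != PROV_DERIVED:
            raise ValueError(f"unexpected predicate: {predicate}")
        if not (HASH_URI.match(subject) and HASH_URI.match(obj)):
            raise ValueError(f"malformed hash URI in: {line!r}")
        if versions and versions[-1] != subject:
            raise ValueError(f"chain broken before: {subject}")
        if not versions:
            versions.append(subject)
        versions.append(obj)
    return versions


versions = check_chain(LOG.splitlines())
print(len(versions), "linked versions, newest:", versions[0])
```

    This only checks that the log is internally consistent. Verifying that the content matches its hashes still requires the Preston tool and the data files.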

  16. gbif.org Website Traffic, Ranking, Analytics [November 2025]

    • semrush.ebundletools.com
    Updated Dec 13, 2025
    Cite
    Semrush (2025). gbif.org Website Traffic, Ranking, Analytics [November 2025] [Dataset]. https://semrush.ebundletools.com/website/gbif.org/overview/
    Explore at:
    Dataset updated
    Dec 13, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://semrush.ebundletools.com/company/legal/terms-of-service/

    Time period covered
    Dec 13, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    gbif.org is ranked #7412 in MX with 614.09K Traffic. Categories: Science. Learn more about website traffic, market share, and more!

  17. Large herbaria and their contribution to GBIF at the time of this analysis.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Chris Yesson; Peter W. Brewer; Tim Sutton; Neil Caithness; Jaspreet S. Pahwa; Mikhaila Burgess; W. Alec Gray; Richard J. White; Andrew C. Jones; Frank A. Bisby; Alastair Culham (2023). Large herbaria and their contribution to GBIF at the time of this analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0001124.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Chris Yesson; Peter W. Brewer; Tim Sutton; Neil Caithness; Jaspreet S. Pahwa; Mikhaila Burgess; W. Alec Gray; Richard J. White; Andrew C. Jones; Frank A. Bisby; Alastair Culham
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *Source: Index Herbariorum http://sciweb.nybg.org/science2/IndexHerbariorum.asp Accessed 10/2005. # Source: http://www.gbif.org Accessed 10/2005. Note: K now has c. 140,000 records, GH has c. 220,000 records, and US has c. 766,000 records on GBIF (09/2007); some other institutions have increased their online records substantially during the past 24 months.

  18. Species occurrence and occupancy in protected areas of the Natura2000...

    • zenodo.org
    • nde-dev.biothings.io
    csv
    Updated Mar 28, 2024
    Cite
    Damiano Oldoni; Peter Desmet; Tim Adriaens (2024). Species occurrence and occupancy in protected areas of the Natura2000 network in Belgium [Dataset]. http://doi.org/10.5281/zenodo.3784227
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Damiano Oldoni; Peter Desmet; Tim Adriaens
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Belgium
    Description

    Context

    Invasive alien species have been pointed out as an important driver of biodiversity loss. Many policy responses are being developed to address this threat. Protected areas often represent and preserve hotspots of biological diversity and ensure the maintenance of ecosystem services crucial to human livelihoods. The impact of biological invasions can be particularly severe in protected areas and their occurrence and impact in such areas is an important element of the risk they pose. To address this, there is a need for data on the occurrence and extent of alien species invasions in protected areas.

    Description

    This dataset contains species occurrence and occupancy in protected areas of the Natura2000 network in Belgium (Special Conservation Areas sensu Habitat Directive and Special Protection Areas sensu Bird Directive). The dataset was generated using the Belgian occurrence cube at species level and the Belgian occurrence cube for non-native taxa (both containing GBIF data aggregated using Oldoni et al. 2020), the 1x1km EEA reference grid and the Natura2000 protected areas shapefiles from the European Environment Agency.

    Data are grouped by protected area (SITECODE), year (year) and (infra)species (taxonKey, speciesKey). For each group, it provides the number of occurrences found in GBIF (n), the area of occupancy (aoo: number of 1 km² squares), the coverage (coverage: % of 1 km² squares), the minimum coordinateUncertaintyInMeters (min_coord_uncertainty), and the alien status (is_alien) based on the Global Register of Introduced and Invasive Species - Belgium. For infraspecific taxa in the latter, the alien status of the species is looked up and included.
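
    The grouping described above can be sketched in a few lines of Python. This is an illustration only and not the published processing code. The sample rows, the grid cell codes, and the choice of denominator for coverage (the number of 1 km grid cells intersecting the protected area) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical occurrence rows: (SITECODE, year, taxonKey, 1 km grid cell code).
occurrences = [
    ("BE0000001", 2018, 1001, "1kmE3900N3100"),
    ("BE0000001", 2018, 1001, "1kmE3900N3100"),
    ("BE0000001", 2018, 1001, "1kmE3901N3100"),
    ("BE0000001", 2019, 1001, "1kmE3902N3100"),
]

# Hypothetical number of 1 km grid cells intersecting each protected area.
site_cell_counts = {"BE0000001": 4}


def summarise(occurrences, site_cell_counts):
    """Compute n, aoo and coverage per (SITECODE, year, taxonKey) group."""
    counts = defaultdict(int)
    cells = defaultdict(set)
    for site, year, taxon, cell in occurrences:
        key = (site, year, taxon)
        counts[key] += 1          # n: number of occurrences in the group
        cells[key].add(cell)      # distinct occupied grid cells
    summary = {}
    for key, n in counts.items():
        aoo = len(cells[key])     # area of occupancy in 1 km squares
        coverage = 100.0 * aoo / site_cell_counts[key[0]]
        summary[key] = {"n": n, "aoo": aoo, "coverage": coverage}
    return summary


summary = summarise(occurrences, site_cell_counts)
for key in sorted(summary):
    print(key, summary[key])
```

    Repeated occurrences in the same grid cell raise n but not aoo, which is why the two measures can diverge for heavily sampled sites.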

    The dataset is built on open science principles and intended to be completely reproducible:

    • The input data are publicly available on Zenodo, with the download DOIs listed in the related identifiers of this dataset package.
    • The code to process the data is publicly available and documented on GitHub.

    Files

    • protected_areas_species_occurrence.csv: number of occurrences (n), area of occupancy (aoo) and coverage of taxa (taxonKey) in Natura2000 areas of Belgium (SITECODE). Other columns included: speciesKey (for species, speciesKey = taxonKey), SITETYPE containing the site type of the Natura2000 area (one of A, B or C), min_coord_uncertainty with the lowest coordinate uncertainty in meters, is_alien containing the alien status (TRUE or FALSE) and remarks containing, if present, the infraspecific alien taxa whose occurrences contribute to the calculated aoo (only for species).
    • protected_areas_species_info.csv: taxonomic information of taxa in protected_areas_species_occurrence.csv as retrieved from GBIF Backbone Taxonomy. Columns: taxonKey, speciesKey, scientificName, kingdom, phylum, order, class, genus, family, species, rank and includes. The latter contains the infraspecific taxa and synonyms whose occurrences contribute to the number of occurrences at species level.
    • protected_areas_metadata.csv: protected area information for areas included in protected_areas_species_occurrence.csv. Columns: SITECODE as in protected_areas_species_occurrence.csv (BE*******), SITENAME containing the name of the protected area, SITETYPE as in protected_areas_species_occurrence.csv, flanders, wallonia and brussels containing whether the area is situated respectively in Flanders, Wallonia or Brussels-Capital Region (TRUE or FALSE). Field codes are in line with EEA element definitions for Natura 2000 sites.

    Potential use of the dataset

    Currently, there is no comprehensive reporting system for invasive alien species in Natura 2000 sites. This dataset provides a baseline as to which species occur in which protected area. We envisage this dataset can be an interesting starting point for various types of analyses on alien species in protected areas in Belgium, but that it can also be used in complement to other data on alien species in protected areas to study more general patterns. Some examples of research questions:

    • Which protected areas are most invaded by alien species?
    • Which alien species are most widely distributed in protected areas, and which traits do they have?
    • How does the proportion of alien species in protected areas change over time?
    • How does the occurrence/occupancy of alien species in protected areas match lists of regulated species (e.g. Union List, EPPO lists)?
    • To what extent can the network of protected areas contribute to providing safe refuge to native species from the impacts of invasive alien species?
    • How widespread are the impacts of certain alien species on protected areas?
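
    As a rough illustration of how the first and third questions might be approached, the sketch below uses pandas with the column names listed under Files (SITECODE, year, taxonKey, n, is_alien). The rows, site codes and taxon keys are invented for illustration, and the helper names alien_species_per_site and alien_proportion_per_year are hypothetical; in practice the table would be read from protected_areas_species_occurrence.csv.

```python
import pandas as pd

# Hypothetical rows mimicking protected_areas_species_occurrence.csv.
# Site codes, taxon keys and counts are invented for illustration only.
occ = pd.DataFrame(
    {
        "SITECODE": ["BE0000001", "BE0000001", "BE0000001", "BE0000002", "BE0000002"],
        "year": [2018, 2018, 2019, 2018, 2019],
        "taxonKey": [101, 102, 101, 103, 104],
        "n": [5, 2, 7, 1, 3],
        "is_alien": [True, True, True, False, True],
    }
)


def alien_species_per_site(df: pd.DataFrame) -> pd.Series:
    """Count distinct alien taxa per protected area, most invaded first."""
    alien = df[df["is_alien"]]
    return (
        alien.groupby("SITECODE")["taxonKey"]
        .nunique()
        .sort_values(ascending=False)
    )


def alien_proportion_per_year(df: pd.DataFrame) -> pd.Series:
    """Share of distinct taxa flagged as alien, per year."""
    # Keep one row per (year, taxon) so a taxon seen in several sites counts once.
    taxa = df.drop_duplicates(["year", "taxonKey"])
    return taxa.groupby("year")["is_alien"].mean()


ranking = alien_species_per_site(occ)
proportion = alien_proportion_per_year(occ)
print(ranking)
print(proportion)
```

    Counting distinct taxonKey values rather than summing n keeps heavily sampled sites from looking more invaded simply because they have more records; whether that is the right measure depends on the question being asked.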

    Acknowledgements

    This work has been funded under the Belgian Science Policies Brain program (BelSPO BR/165/A1/TrIAS) and the European Union's LIFE program (LIFE19 NAT/BE/000953 - LIFE RIPARIAS).

  19. Raw dataset on terrestrial plants occuring in Brazil

    • figshare.com
    • dataon.kisti.re.kr
    rar
    Updated Mar 24, 2022
    Cite
    Bruno Ribeiro (2022). Raw dataset on terrestrial plants occuring in Brazil [Dataset]. http://doi.org/10.6084/m9.figshare.14611776.v1
    Available download formats: rar
    Dataset updated
    Mar 24, 2022
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Bruno Ribeiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    We have made available two databases of the Brazilian flora. The “raw database” contains data on terrestrial plant species after excluding records with invalid or missing taxonomic and georeferenced information, records outside Brazil, or records from uncertain sources (i.e., the pre-filter step of the workflow). The results of each test used to flag data quality are appended in separate fields in this database and reported as TRUE or FALSE, in which the former indicates correct records and the latter potentially problematic or suspect records. It is worth noting that the “raw” database contains records with names not found in the Flora do Brasil and with taxonomic, spatial, and temporal issues.

    The “fitness-for-use” database is a filtered subset of the “raw” database and only contains valid records that passed all data quality tests. Consequently, the result of each cleaning test is not shown. This database includes verified and standardized data on species taxonomy, geolocation, and date of collection. The databases contain data on conservation status, distribution, and establishment retrieved directly from the Brazilian Flora 2020 and accessed through the flora R package (Carvalho, 2017). Importantly, records lacking information on collecting date were not removed because they are fit-for-use for some biodiversity applications even when date information is missing.

    The two databases are as follows. First, a “raw” database (n = 12,762,595 records) containing the results of data quality tests appended in separate fields. This database includes records of algae and fungi species, records of species with non-accepted names, and records with taxonomic, spatial, and temporal issues. Second, a “fit-for-use” or “cleaned” database, containing 4,070,313 records of 38,207 species from 432 families. This database includes data on land plants occurring in Brazil (angiosperm, gymnosperm, ferns and lycophytes, and bryophyte), except algae and fungi species and records lacking information on collecting data.

  20. Data from: GBIF - Global Biodiversity Information Facility

    • dknet.org
    • rrid.site
    Updated Aug 14, 2024
    Cite
    (2024). GBIF - Global Biodiversity Information Facility [Dataset]. http://doi.org/10.17616/R3J014
    Dataset updated
    Aug 14, 2024
    Description

    The Global Biodiversity Information Facility (GBIF) was established by governments in 2001 to encourage free and open access to biodiversity data via the Internet. Through a global network of countries and organizations, GBIF promotes and facilitates the mobilization, access, discovery and use of information about the occurrence of organisms over time and across the planet. GBIF provides three core services and products:

    • An information infrastructure: an Internet-based index of a globally distributed network of interoperable databases that contain primary biodiversity data (information on museum specimens, field observations of plants and animals in nature, and results from experiments) so that data holders across the world can access and share them.
    • Community-developed tools, standards and protocols: the tools data providers need to format and share their data.
    • Capacity-building: the training, access to international experts and mentoring programs that national and regional institutions need to become part of a decentralized network of biodiversity information facilities.

    GBIF and its many partners work to mobilize the data, and to improve search mechanisms, data and metadata standards, web services, and the other components of an Internet-based information infrastructure for biodiversity. GBIF makes available data that are shared by hundreds of data publishers from around the world. These data are shared according to the GBIF Data Use Agreement, which includes the provision that users of any data accessed through or retrieved via the GBIF Portal will always give credit to the original data publishers.

    • Explore Species: Find data for a species or other group of organisms. Information on species and other groups of plants, animals, fungi and micro-organisms, including species occurrence records, as well as classifications and scientific and common names.
    • Explore Countries: Find data on the species recorded in a particular country, territory or island. Information on the species recorded in each country, including records shared by publishers from throughout the GBIF network.
    • Explore Datasets: Find data from a data publisher, dataset or data network. Information on the data publishers, datasets and data networks that share data through GBIF, including summary information on 10028 datasets from 419 data publishers.

Cite
Akshat Pant (2023). Gap Analysis of Agrobiodiversity data in GBIF and the NAL Thesaurus [Dataset]. http://doi.org/10.15482/USDA.ADC/1466041

Data from: Gap Analysis of Agrobiodiversity data in GBIF and the NAL Thesaurus

Available download formats: zip
Dataset updated
Dec 14, 2023
Dataset provided by
Ag Data Commons
Authors
Akshat Pant
License

U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically

Description

This dataset contains all documents, the text and the pdf files, as well as the code that was used to carry out the term analysis of agriculturally relevant organisms in GBIF. The Global Biodiversity Information Facility (GBIF) is an international network and research infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. The National Agricultural Library Thesaurus (NALT) has online vocabulary tools of agricultural terms. My task was to use the agricultural terms from the NALT and analyze the agriculturally relevant organisms in GBIF. Some of the goals were:

• Get descriptive statistics about Agrobiodiversity Data (AgData) in GBIF.
• Create visualizations to view occurrence trends of the GBIF corpus and AgData in GBIF to determine gaps or biases.
• Provide examples of and code for how agricultural researchers can work with GBIF data.

Details about the process and the methodologies used to carry out this analysis

I started off by trying to extract names from the Agricultural Thesaurus. I encountered some problems trying to extract names using the RDF format in the Thesaurus. An employee at the Library later provided me with the names in the Thesaurus in a text file. I then extracted the scientific names from that text file to run them through the GBIF API. Because there were so many names, the API would throw a connection error: it can handle only so many requests in a particular interval of time. To handle this, I leveraged exception handling in Python. Every time the API threw an error, I told the script to wait for 5 seconds and then resume sending requests. Although this took a lot of time, it allowed me to get data such as year of occurrence and coordinate values for the agriculturally relevant data from the API.
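
The wait-and-retry approach described above could look roughly like the sketch below. It is not the code shipped in Code.zip: the function name call_with_retry, the max_attempts cap, the injectable sleep parameter and the stubbed lookup are all assumptions made for illustration, and only the 5-second pause comes from the description. With the requests package one would likely pass retry_on=(requests.exceptions.ConnectionError,), because that class does not derive from Python's built-in ConnectionError.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    wait_seconds: float = 5.0,
    max_attempts: int = 10,  # assumption: the description mentions no cap
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, pausing `wait_seconds` after each retryable error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(wait_seconds)
    raise ValueError("max_attempts must be at least 1")


# Stand-in for an API lookup: fails twice, then succeeds.
# The returned record is invented for illustration.
calls = {"count": 0}


def flaky_lookup() -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated connection error")
    return {"name": "Apis mellifera"}


waits = []  # records requested pauses instead of actually sleeping
result = call_with_retry(flaky_lookup, sleep=waits.append)
print(result, waits)
```

Passing the request as a callable keeps the retry logic independent of any particular HTTP client, and replacing sleep with a recorder lets the behaviour be checked without waiting.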

Technology

I used Python because it has support for both web scraping and data analysis, both of which were needed for this project. I used Jupyter notebooks, run through Anaconda. Project Jupyter is a non-profit, open-source project that supports interactive data analysis and scientific computing. It allows users to code right in their browser, eliminates the need to install any other Integrated Development Environment, and makes it very convenient to share code. The main packages used in this project are pandas for data manipulation, requests and json to interact with the GBIF API, and NumPy for array and matrix operations. Tableau and matplotlib were used to create visualizations after performing the analysis in Python.

Resources in this dataset:

• Resource Title: Code. File Name: Code.zip. Resource Description: This zip file contains multiple Jupyter notebooks that contain the code for all the analysis. Resource Software Recommended: Jupyter notebook, url: http://jupyter.org/
• Resource Title: Visualizations. File Name: Visualizations.zip. Resource Description: This zip file contains Tableau workbooks for the visualizations. Resource Software Recommended: Tableau, url: https://www.tableau.com/
• Resource Title: Corpus. File Name: Corpus.zip. Resource Description: This zip file contains the two datasets of the families Apidae and Reduviidae.
