    Global Biotic Interactions: Taxon Graph...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 13, 2025
    Cite
    Poelen, Jorrit H (2025). Global Biotic Interactions: Taxon Graph hash://sha256/0b58753e4ff5519442689d866c0f1d19ffa7d97f917144df1d1cd56ea756921d hash://md5/b23bd0210c88ca10c3e3253091f4fdfa [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_755513
    Authors
    Poelen, Jorrit H
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Global Biotic Interactions: Taxon Cache and Taxon Map

    Global Biotic Interactions (GloBI) provides access to existing species interaction datasets (Poelen et al. 2014, http://globalbioticinteractions.org). As part of the dataset integration and aggregation, a best effort is made to resolve, match and link taxonomic names and associated vernacular/common names, hierarchies and thumbnails.

    The data archives included in this publication contain established taxonomic links (taxonMap.tsv.gz) and taxonomic information (taxonCache.tsv.gz) that GloBI retrieved and integrated from taxonomic name sources and web services associated with http://itis.gov, http://globalnames.org, http://eol.org and other open data services.

    While GloBI is not a naming authority and the primary goal of the name matching process is to detect incorrect or outdated names, the archives may serve as an example of how to publish denormalized taxonomic records and their interrelationships in a pragmatic way.

    For related discussion threads, see https://github.com/globalbioticinteractions/globalbioticinteractions/issues/145 , https://github.com/globalbioticinteractions/globalbioticinteractions/issues/274 , https://github.com/globalbioticinteractions/globalbioticinteractions/issues/70 and https://github.com/EOL/tramea/issues/10 .

    Files

    README this file

    taxonCache.tsv.gz Taxonomic name, ids, hierarchies, common names and thumbnail associated to taxa known to GloBI.

    taxonCache.tsv.sha256 sha256 hash of taxonCache.tsv

    taxonCacheFirst10.tsv Header and 10 following lines from taxonCache.tsv

    taxonCacheFirst10.tsv.sha256 sha256 hash of taxonCacheFirst10.tsv

    taxonMap.tsv.gz Links between taxon name and ids across various taxon providers.

    taxonMap.tsv.sha256 sha256 hash of taxonMap.tsv

    taxonMapFirst10.tsv Header and 10 following lines from taxonMap.tsv

    taxonMapFirst10.tsv.sha256 sha256 hash of taxonMapFirst10.tsv

    prefixes.tsv Term prefixes and their associated uri schemes.

    names.tsv.gz Corpus of names used to resolve and link. Generated using https://github.com/globalbioticinteractions/elton .

    names.tsv.sha256 sha256 hash of names.tsv

    namesUnresolved.tsv.gz Names that are not (yet) linked to name sources using https://github.com/globalbioticinteractions/nomer .

    namesUnresolved.tsv.sha256 sha256 hash of namesUnresolved.tsv
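
Because each .sha256 file is described as the hash of the uncompressed .tsv, a downloaded .tsv.gz archive can be checked by hashing its decompressed content. The following is a minimal sketch, not part of the published archive: it assumes the .sha256 file holds the hex digest as its first whitespace-separated token, and the function names are hypothetical.

```python
import gzip
import hashlib
import io


def sha256_of_gzip_content(gz_stream, chunk_size=1 << 20):
    """Return the hex SHA-256 of the decompressed bytes of a gzip stream."""
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=gz_stream) as unpacked:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: unpacked.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(gz_stream, sha256_text):
    """Compare the computed digest with the first token of the .sha256 text.

    Assumption: the digest is the first whitespace-separated token.
    """
    expected = sha256_text.split()[0].lower()
    return sha256_of_gzip_content(gz_stream) == expected


# Demonstration on in-memory data (made up, not real archive content).
sample = b"providedTaxonId\tprovidedTaxonName\n"
packed = io.BytesIO(gzip.compress(sample))
published = hashlib.sha256(sample).hexdigest() + "  names.tsv\n"
print(matches_published_hash(packed, published))
```

For a real file, the same functions could be called with `open("taxonMap.tsv.gz", "rb")` and the text read from `taxonMap.tsv.sha256`.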

    Column Descriptions

    taxonCache.tsv.gz

    1 | id  2 | name  3 | rank  4 | commonNames  5 | path  6 | pathIds  7 | pathNames  8 | externalUrl  9 | thumbnailUrl

    taxonMap.tsv.gz

    1 | providedTaxonId  2 | providedTaxonName  3 | resolvedTaxonId  4 | resolvedTaxonName
    

    names.tsv.gz

    1 | providedTaxonId  2 | providedTaxonName
    

    namesUnresolved.tsv.gz

    1 | providedTaxonId  2 | providedTaxonName
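
The column layout above can be used to read the archives as tab-separated records. The sketch below reads taxonMap rows into dictionaries keyed by the four listed column names. It assumes UTF-8 encoding and a single header line (the file list mentions a header); the function name and the sample rows are made up for illustration.

```python
import csv
import gzip
import io

# Column names as listed for taxonMap.tsv.gz.
TAXON_MAP_COLUMNS = [
    "providedTaxonId",
    "providedTaxonName",
    "resolvedTaxonId",
    "resolvedTaxonName",
]


def read_taxon_map(gz_stream):
    """Yield one dict per data row of a gzipped taxonMap-style TSV stream."""
    with gzip.open(gz_stream, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header line (assumed to be present)
        for row in reader:
            yield dict(zip(TAXON_MAP_COLUMNS, row))


# Made-up sample content; the identifiers are placeholders, not real records.
sample_tsv = (
    "providedTaxonId\tprovidedTaxonName\tresolvedTaxonId\tresolvedTaxonName\n"
    "EXAMPLE:1\tsome provided name\tEXAMPLE:2\tSome resolved name\n"
    "\tname without id\tEXAMPLE:3\tAnother resolved name\n"
)
sample_gz = gzip.compress(sample_tsv.encode("utf-8"))
rows = list(read_taxon_map(io.BytesIO(sample_gz)))
for row in rows:
    print(row["providedTaxonName"], "->", row["resolvedTaxonId"])
```

Quoting is disabled because the files are plain TSV as far as the description indicates; if a file turns out to use quoted fields, the `quoting` argument would need to change.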
    

    References

    Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

    Updates

    org.globalbioticinteractions.taxon v0.3, 2018-03-02

    This taxon archive version was created by taking GloBI taxon v0.2 (Jan 2018) and appending a semi-automatically created WikiData taxon mapping and taxon cache.

    org.globalbioticinteractions.taxon v0.3.1, 2018-04-05

    This taxon archive version was created by taking GloBI taxon v0.2 (Jan 2018) and appending an automatically created WikiData taxon mapping and taxon cache using Apache Spark scripts at https://github.com/bio-guoda/guoda-datasets/tree/master/wikidata .

    org.globalbioticinteractions.taxon v0.3.2, 2018-05-21

    This taxon archive version includes the following:

    1. all lines in taxonMap.tsv.gz v0.3.1 that passed all validate-term-link tests defined in nomer v0.0.7 (see https://doi.org/10.5281/zenodo.1249964 or https://github.com/globalbioticinteractions/nomer/releases/tag/0.0.7).

    2. all lines in taxonCache.tsv.gz v0.3.1 that passed all validate-term tests defined in nomer v0.0.7

    3. all lines in 1. that did not pass the validate-term test were re-resolved using nomer v0.0.7 commands "append globi-enrich" and "append globi-globalnames". Only SAME_AS and SYNONYM_OF matches were used to generate new entries for taxonCache and taxonMap.

    4. in addition, elton v0.4.5 (see https://doi.org/10.5281/zenodo.1212599 or https://github.com/globalbioticinteractions/elton/releases/tag/0.4.5) was used to generate an up-to-date names list by running the "update" and "names" commands on 18-19 May 2018. Of the resulting names, only id/names pairs that were unknown to the taxon graph were resolved using the "append globi-enrich" and "append globi-globalnames" commands of nomer v0.0.7. Only matches classified as SAME_AS and SYNONYM_OF were used to generate new entries for taxonCache and taxonMap.

    5. the updated versions of taxonMap.tsv.gz and taxonCache.tsv.gz were produced by appending the results of 1., 2., 3. and 4., removing duplicate lines and sorting the result.

    6. finally, the resulting taxonMap.tsv.gz and taxonCache.tsv.gz files were validated using the nomer v0.0.7 validate-term-link and validate-term commands, respectively. The result indicated that all lines (other than the header) passed the validation tests.
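
Step 5 above (append, remove duplicates, sort) can be illustrated with a short sketch. This is not the script used to build the archive: the function name is hypothetical, the header is assumed to stay on the first line, and sorting is plain lexicographic string order, which may differ from the ordering of the original tooling.

```python
def merge_tsv_lines(header, *line_groups):
    """Combine several groups of TSV lines into one de-duplicated, sorted list.

    The header is kept as the first line; repeated header lines and empty
    lines in the inputs are dropped.
    """
    unique = set()
    for group in line_groups:
        for line in group:
            line = line.rstrip("\n")
            if line and line != header:
                unique.add(line)
    return [header] + sorted(unique)


# Made-up lines standing in for the results of steps 1. to 4.
header = "providedTaxonId\tprovidedTaxonName"
validated = [header + "\n", "EXAMPLE:2\tname b\n", "EXAMPLE:1\tname a\n"]
re_resolved = ["EXAMPLE:1\tname a\n", "EXAMPLE:3\tname c\n"]
merged = merge_tsv_lines(header, validated, re_resolved)
print("\n".join(merged))
```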

    org.globalbioticinteractions.taxon v0.3.3, 2018-06-12

    This taxon archive version includes the following:

    1. normalizing taxonomic ranks using nomer's taxon rank matcher

    2. include more manual taxonomic name mappings provided by Brian Hayden and collaborators.

    3. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.1286023 .

    4. remove mapping to NCBI taxa with name "Small" (and associated OTT).

    org.globalbioticinteractions.taxon v0.3.4, 2018-06-27

    1. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.1286023

    Please note that nomer and elton rely on web-accessible APIs such as taxonomy resolution services and data portals. This dependence on external, web-only services might make reproduction of the results tricky due to network outages, server failures, upgrades, downgrades, data loss and/or abandonment of informatics projects/datasets.

    org.globalbioticinteractions.taxon v0.3.5, 2018-06-28

    1. remove dubious provided name from taxon map. Names include "no name", "unidentified".

    2. remove dubious mappings to Pavlova (e.g., Unidentified Amoebozoa -> Pavlova). Related to 1.

    3. remove dubious mappings to resolve taxa that include names like "unidentified" or "organic species"

    4. removed dubious mappings to "Boiga dendrophila"

    5. removed dubious mappings from "Chaetognatha" (arrowworm) to a suspected homonym Lepidoptera GBIF:3257692 and IRMNG:1252651

    6. removed dubious mappings from "small sharks" to multiple NCBI/OTT terms with name "Small"

    Please note that nomer and elton rely on web-accessible APIs such as taxonomy resolution services and data portals. This dependence on external, web-only services might make reproduction of the results tricky due to network outages, server failures, upgrades, downgrades, data loss and/or abandonment of informatics projects/datasets.

    org.globalbioticinteractions.taxon v0.3.6, 2018-09-10

    1. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.1286023

    org.globalbioticinteractions.taxon v0.3.7, 2018-10-18

    1. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.1286023

    2. remove dubious mapping to Vertebrata (WORMS:370321 , http://www.marinespecies.org/aphia.php?p=taxdetails&id=370321). Also see https://github.com/globalbioticinteractions/globalbioticinteractions/issues/361 .

    3. remove dubious mapping to NCBITaxon:1585532 (Beta vulgaris/Cercospora beticola mixed EST library). Also see https://github.com/globalbioticinteractions/globalbioticinteractions/issues/346 and https://github.com/Planteome/samara/issues/50

    org.globalbioticinteractions.taxon v0.3.8, 2018-11-15

    1. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.1286023

    org.globalbioticinteractions.taxon v0.3.9, 2018-11-23

    1. label deprecated EOL ids by applying patches in http://doi.org/10.5281/zenodo.1495266 to taxonMap.tsv.gz and taxonCache.tsv.gz . Related to https://github.com/globalbioticinteractions/globalbioticinteractions/issues/383 .

    2. remove all Encyclopedia of Life thumbnail urls from taxonCache. Related to https://github.com/globalbioticinteractions/globalbioticinteractions/issues/381 .

    3. remove Encyclopedia of Life external urls associated with deprecated ids from taxonCache.

    org.globalbioticinteractions.taxon v0.3.10, 2018-11-26

    1. Remove suspicious name mappings related to Humpback scorpionfish (Scorpaenopsis gibbosa) by applying patch published in Poelen, Jorrit H. (2018). Global Biotic Interactions: Taxon Graph Patches (Version 0.2) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1560662

    org.globalbioticinteractions.taxon v0.3.11, 2018-12-21

    1. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.1286023

    2. remove suspicious name mappings using:

    zcat taxonMap.tsv.gz | grep -v -i -P "\tnone\t" | grep -v -P "(GBIF|IRMNG):.*\tBrachyura$" | grep -v -P "Gamarus" | grep -v -P "^EOL:1047365\ttrachurus trachurus" | grep -v -P "Loros\t.*Psittacidae" | grep -v -P "(GBIF|IRMNG).*Lucifer$" | grep -v -P "GBIF.*Diadema$" | gzip > taxonMapUpdated.tsv.gz
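
The exclusion filter in the v0.3.11 pipeline above can also be expressed in Python, which may help readers without GNU grep. This is a sketch rather than the command that was actually run: the regular expressions are copied from the pipeline, the function names are hypothetical, and small differences between grep -P and Python's re module (for example in locale or byte handling) are possible. The sample lines are made up.

```python
import re

# One compiled pattern per "grep -v" stage; only the first is case-insensitive.
EXCLUDE_PATTERNS = [
    re.compile(r"\tnone\t", re.IGNORECASE),
    re.compile(r"(GBIF|IRMNG):.*\tBrachyura$"),
    re.compile(r"Gamarus"),
    re.compile(r"^EOL:1047365\ttrachurus trachurus"),
    re.compile(r"Loros\t.*Psittacidae"),
    re.compile(r"(GBIF|IRMNG).*Lucifer$"),
    re.compile(r"GBIF.*Diadema$"),
]


def keep_line(line):
    """Return True if no exclusion pattern matches the line."""
    text = line.rstrip("\n")
    return not any(pattern.search(text) for pattern in EXCLUDE_PATTERNS)


def filter_taxon_map(lines):
    """Keep only the lines that pass every exclusion stage."""
    return [line for line in lines if keep_line(line)]


# Made-up taxonMap-style lines (ids are placeholders).
sample_lines = [
    "EXAMPLE:1\tNone\tEXAMPLE:2\tSomething\n",
    "EXAMPLE:3\tcrab\tGBIF:0\tBrachyura\n",
    "EXAMPLE:3\tcrab\tEXAMPLE:4\tBrachyura\n",
    "EOL:1047365\ttrachurus trachurus\tEXAMPLE:5\tTrachurus\n",
]
for line in filter_taxon_map(sample_lines):
    print(line, end="")
```

To process the real archive, the lines could be read with `gzip.open("taxonMap.tsv.gz", "rt")` and the kept lines written to a new gzip file.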

    org.globalbioticinteractions.taxon v0.3.12, 2019-06-05

    1. update taxonCache and taxonMap using automated scripts available at https://doi.org/10.5281/zenodo.3240558

    org.globalbioticinteractions.taxon v0.3.13, 2019-06-12

    1. update taxonCache and taxonMap using automated scripts available at