66 datasets found

wordnet
tensorflow.org
Cite
wordnet [Dataset]. https://www.tensorflow.org/datasets/catalog/wordnet
Description
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('wordnet', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
WordNet
dknet.org
neuinfo.org
Updated Apr 27, 2022
Cite
(2022). WordNet [Dataset]. http://identifiers.org/RRID:SCR_022182
Unique identifier
https://identifiers.org/RRID:SCR_022182 https://identifiers.org/RRID:SCR_022182/resolver?q=&i=rrid
Dataset updated
Apr 27, 2022
Description
Lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with a browser.
network_edges
kaggle.com
Updated May 15, 2021
Cite
Vassily Morozov (2021). network_edges [Dataset]. https://www.kaggle.com/datasets/vassyesboy/network-edges
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 15, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vassily Morozov
Description
This dataset was developed by using the WordNet dataset (https://www.kaggle.com/datasets/duketemon/wordnet-synonyms) and pairing words with possible synonyms.

The format of the dataset allows one to create a network of synonyms. Each row represents an edge of the network. This has been used for exploratory analysis of the English language such as finding the most commonly linked words and identifying communities of words.
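
As a rough illustration of the edge-list idea above, the following sketch builds an undirected synonym graph from (word, synonym) pairs, ranks words by number of links, and groups words into connected components as the simplest possible notion of "community". The sample rows are hypothetical; the actual column names and file layout of the dataset are not stated in this description.

```python
from collections import Counter, defaultdict

# Hypothetical sample edges; each tuple stands for one row of the dataset.
edges = [
    ("happy", "glad"),
    ("happy", "joyful"),
    ("glad", "joyful"),
    ("big", "large"),
]


def build_graph(edge_rows):
    """Return an undirected adjacency map: word -> set of linked words."""
    graph = defaultdict(set)
    for a, b in edge_rows:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def most_linked(graph, n=3):
    """Return the n words with the most distinct neighbours."""
    degree = Counter({word: len(neighbours) for word, neighbours in graph.items()})
    return degree.most_common(n)


def communities(graph):
    """Group words into connected components using a depth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups


graph = build_graph(edges)
print(most_linked(graph))
print(communities(graph))
```

Connected components are only a baseline; a dedicated community-detection method would likely give finer groupings on the full network.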

WordNet License This license is available as the file LICENSE in any downloaded version of WordNet.

WordNet 3.0 license: (Download)

WordNet Release 3.0

This software and database is being provided to you, the LICENSEE, by Princeton University under the following license. By obtaining, using and/or copying this software and database, you agree that you have read, understood, and will comply with these terms and conditions.:

Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution.

WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved.

THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of Princeton University or Princeton may not be used in advertising or publicity pertaining to distribution of the software and/or database. Title to copyright in this software, database and any associated documentation shall at all times remain with Princeton University and LICENSEE agrees to preserve same.
Thesaurus of Modern Slovene 2.0 - Dataset - B2FIND
b2find.eudat.eu
Updated Feb 6, 2024
+ more versions
Cite
(2024). Thesaurus of Modern Slovene 2.0 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/737eb78e-e319-5ef3-9018-cf84236c6472
Dataset updated
Feb 6, 2024
Description
Thesaurus of Modern Slovene is the largest automatically generated open-access collection of Slovene synonyms. It is sourced from the data in two principal language resources: The Oxford®-DZS Comprehensive English-Slovenian Dictionary and the Gigafida 1.0 corpus of written Slovene. The links identified between synonyms were additionally confirmed using the Dictionary of Standard Slovenian Language (SSKJ). The data extraction and structure for the Thesaurus were based on the frequency and manner in which words co-occur in translation strings of the Oxford-DZS Dictionary. This information is the basis for discriminating between ‘core’ and ‘near’ synonyms, with ‘core’ synonyms exhibiting a greater connection to the keyword. In the following step, an approach combining balanced co-occurrence graphs and the Personal PageRank algorithm automatically divides the synonyms into subgroups and ranks them according to the degree of semantic relatedness to the keyword, as well as their frequency in language use. For the creation methodology, see Krek et al. (2017) in the provided references. The database includes dictionary entries: single- and multiword headwords, their part-of-speech and other linguistic features, as well as automatically extracted synonyms, their type (core or near) and relevancy rank. In version 2.0, 4,544 manually revised antonyms were added to the database. Additionally, for a part of the database, synonyms were distributed under the corresponding word senses. Pertaining to how much lexicographic revision was involved in their preparation, database entries can have one of the following three statuses: (a) ssss-automatic (96,064 entries): no manual revision was conducted; (b) ssss-manual (3,421 entries): word senses and semantic indicators were prepared by lexicographers, and synonyms were manually distributed under each corresponding sense; (c) ssss-hybrid (1,352 entries): manually revised senses are combined with data compiled automatically. 
For novelties of v2.0, see Arhar Holdt et al. (2023) in the provided references.
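
The ranking step described above can be pictured with a generic personalized PageRank, which is presumably what "Personal PageRank" refers to. The sketch below is not the authors' implementation: the graph, node names, damping factor and iteration count are all assumptions chosen for illustration, and the real method also uses balanced co-occurrence graphs and frequency information that are not modelled here.

```python
def personalized_pagerank(graph, seed, damping=0.85, iterations=50):
    """Power-iteration PageRank whose restart mass always returns to `seed`.

    `graph` maps each node to a list of neighbours; every neighbour must
    also appear as a key.
    """
    nodes = list(graph)
    rank = {node: 1.0 if node == seed else 0.0 for node in nodes}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: hand its mass back to the seed.
                nxt[seed] += damping * rank[node]
                continue
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        # Restart step: the remaining mass jumps back to the keyword.
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


# Hypothetical co-occurrence graph around one keyword.
toy_graph = {
    "keyword": ["syn_a", "syn_b"],
    "syn_a": ["keyword"],
    "syn_b": ["keyword", "syn_c"],
    "syn_c": ["syn_b"],
}

scores = personalized_pagerank(toy_graph, seed="keyword")
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

Candidates that are closer to the keyword in the graph receive more of the restart mass, which is the intuition behind ranking synonyms by relatedness to the keyword.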
Detecting Synonymous Relationships by Shared Data-driven Definitions
figshare.com
txt
Updated Dec 9, 2019
Cite
Jan-Christoph Kalo (2019). Detecting Synonymous Relationships by Shared Data-driven Definitions [Dataset]. http://doi.org/10.6084/m9.figshare.11343785.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11343785.v1
Dataset updated
Dec 9, 2019
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jan-Christoph Kalo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets that can be used together with the code in: https://github.com/JanKalo/RuleAlign
Synonymous Relationships in DBpedia
figshare.com
zip
Updated Jul 3, 2019
Cite
Jan-Christoph Kalo; Wolf-Tilo Balke (2019). Synonymous Relationships in DBpedia [Dataset]. http://doi.org/10.6084/m9.figshare.8188394.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.8188394.v1
Dataset updated
Jul 3, 2019
Dataset provided by
figshare
Authors
Jan-Christoph Kalo; Wolf-Tilo Balke
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A sample of manually evaluated synonymous predicates in DBpedia-2016-10.
Data from: Knowledge Graph Consolidation by Unifying Synonymous...
figshare.com
bz2
Updated Sep 9, 2019
Cite
Jan-Christoph Kalo (2019). Knowledge Graph Consolidation by Unifying Synonymous Relationships [Dataset]. http://doi.org/10.6084/m9.figshare.8490134.v2
Available download formats: bz2
Unique identifier
https://doi.org/10.6084/m9.figshare.8490134.v2
Dataset updated
Sep 9, 2019
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jan-Christoph Kalo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets for reproducing the results of the paper "Knowledge Graph Consolidation by Unifying Synonymous Relationships" published at ISWC 2019.
Ooh Na Na Synonyms
live.european-language-grid.eu
zenodo.org
tsv
Updated Dec 10, 2023
Cite
(2023). Ooh Na Na Synonyms [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/18322
Available download formats: tsv
Dataset updated
Dec 10, 2023
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a gzipped three-column TSV file that has prefixes, identifiers, and synonyms for lots of biomedical entities drawn from the terminologies and ontologies in PyOBO. It was generated with the following code in the shell:

    pip install pyobo
    pyobo database synonyms

Any silly name suggestions that include literary references are welcome.
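
A file with this shape can be read with the Python standard library alone. The sketch below assumes the column order prefix, identifier, synonym as listed in the description, and it uses a small in-memory sample in place of the real download; whether the real file starts with a header row is not stated, so none is handled here.

```python
import csv
import gzip
import io


def iter_synonyms(source):
    """Yield (prefix, identifier, synonym) tuples from a gzipped three-column TSV.

    `source` may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) == 3:  # skip malformed or blank lines
                yield tuple(row)


# Hypothetical sample rows standing in for the real file.
sample_bytes = gzip.compress(
    "chebi\t15377\twater\nchebi\t15377\tH2O\n".encode("utf-8")
)

rows = list(iter_synonyms(io.BytesIO(sample_bytes)))
for prefix, identifier, synonym in rows:
    print(prefix, identifier, synonym)
```

Streaming rows with a generator avoids loading the whole file into memory, which matters if the real file is large.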
MultiWordNet database (included semantic fields) (MultiWordNet)
catalogue.elra.info
live.european-language-grid.eu
Updated Mar 10, 2010
+ more versions
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2010). MultiWordNet database (included semantic fields) (MultiWordNet) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-M0026_01/
Dataset updated
Mar 10, 2010
Dataset provided by
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
ELRA (European Language Resources Association)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_EVALUATION.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
MultiWordNet is a multilingual lexical database including information about English and Italian words. It is an extension of WordNet 1.6, a lexical database for English developed at Princeton University. MultiWordNet contains information about the following aspects of the English and Italian lexicons:
- lexical relations between words
- semantic relations between lexical concepts
- correspondences between Italian and English lexical concepts
- semantic fields

The basic lexical relationship in MultiWordNet is synonymy. Groups of synonyms are used to identify lexical concepts, which are also called synsets. Synsets are the most important unit in MultiWordNet. A lot of interesting information is attached to them, such as semantic fields and semantic relationships.

MultiWordNet can be used for a variety of NLP tasks including:
- Information Retrieval: synonymy relations are used for query expansion to improve the recall of IR; cross-language correspondences between Italian and English synsets are used for Cross Language Information Retrieval.
- Semantic tagging: MultiWordNet constitutes a large-coverage sense inventory which is the basis for semantic tagging, i.e. texts are tagged with synset identifiers.
- Disambiguation: semantic relationships are used to measure the semantic distance between words, which can be used to disambiguate the meaning of words in texts. Semantic fields have also proved to be very useful for the disambiguation task.
- Ontologies: MultiWordNet can be seen as an ontology to be used for a variety of knowledge-based NLP tasks.
- Terminologies: MultiWordNet constitutes a robust framework supporting the development of specific structured terminologies.

Release 1.1 of MultiWordNet is currently available. It includes information about 51,000 Italian word meanings and 28,000 synsets (in correspondence with the English equivalents). It also includes a labelling of most WordNet 1.6 synsets with semantic field labels. Work on MultiWordNet is ongoing. The next release will contain at least 10,000 new word meanings.

Data are contained in a specialized database server, which can be accessed by clients through a socket connection. The database server has been implemented in Lisp under the Unix and Windows environments. An application program interface and graphical browsing interface are provided with the database. A Java implementation of the database is planned for the next release.

For more information, visit: http://multiwordnet.itc.it
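
The query-expansion use case mentioned above can be sketched in a few lines. The synset table, its IDs and its contents below are hypothetical toy data, not MultiWordNet content, and the real database is accessed through its own server rather than a Python dictionary.

```python
# Hypothetical toy synsets keyed by a shared synset ID, with members per language.
SYNSETS = {
    "synset_001": {"en": {"car", "automobile"}, "it": {"automobile", "macchina"}},
    "synset_002": {"en": {"house", "home"}, "it": {"casa"}},
}


def expand_query(terms, synsets, source="en", target="en"):
    """Expand query terms with the members of every synset they belong to.

    With source == target this is monolingual synonym expansion; with a
    different target it maps the query into the other language through the
    shared synset IDs.
    """
    original = set(terms)
    expanded = set(original) if source == target else set()
    for languages in synsets.values():
        if original & languages.get(source, set()):
            expanded |= languages.get(target, set())
    return expanded


print(expand_query(["car"], SYNSETS))
print(expand_query(["car"], SYNSETS, target="it"))
```

Only the original terms are matched against synsets, so the expansion does not chain transitively through unrelated senses.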
Sino-Tibetan Etymological Dictionary and Thesaurus Database Software
datadryad.org
data.niaid.nih.gov
zip
Updated Jan 13, 2015
Cite
Daniel Bruhn; John Lowe; David Mortensen; Dominic Yu (2015). Sino-Tibetan Etymological Dictionary and Thesaurus Database Software [Dataset]. http://doi.org/10.6078/D1159Q
Available download formats: zip
Unique identifier
https://doi.org/10.6078/D1159Q
Dataset updated
Jan 13, 2015
Dataset provided by
Dryad
Authors
Daniel Bruhn; John Lowe; David Mortensen; Dominic Yu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 13, 2015
Description
Several software suites support access to the STEDT database. These are written in Perl and PHP, and present different capabilities and dimensions of this linguistic data. This object is a compressed archive of the svn code repository for the project as of January 5, 2015. The active repository is now on GitHub at https://github.com/stedt-project/sss.
The Importance of Species Name Synonyms in Literature Searches
plos.figshare.com
txt
Updated May 31, 2023
Cite
Gerald F. Guala (2023). The Importance of Species Name Synonyms in Literature Searches [Dataset]. http://doi.org/10.1371/journal.pone.0162648
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0162648
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Gerald F. Guala
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The synonyms of biological species names are shown to be an important component in comprehensive searches of electronic scientific literature databases, but they are not well leveraged within the major literature databases examined. For accepted or valid species names in the Integrated Taxonomic Information System (ITIS) which have synonyms in the system, and which are found in citations within PLoS, PMC, PubMed or Scopus, both the percentage of species for which citations will not be found if synonyms are not used, and the percentage increase in number of citations found by including synonyms, are very often substantial. However, there is no correlation between the number of synonyms per species and the magnitude of the effect. Further, the number of citations found does not generally increase proportionally to the number of synonyms available. Users looking for literature on specific species across all of the resources investigated here are often missing large numbers of citations if they are not manually augmenting their searches with synonyms. Of course, missing citations can have serious consequences by effectively hiding critical information. Literature searches should include synonym relationships. A new web service in ITIS, with examples of how to apply it to this issue, was developed as a result of this study and is announced here to aid in this.
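
The two measures named above (share of species missed without synonyms, and percentage increase in citations when synonyms are included) are simple ratios. The sketch below shows the arithmetic on made-up counts; the species names and numbers are hypothetical and are not taken from the study.

```python
def percent_increase(without_synonyms, with_synonyms):
    """Percentage increase in citations found when synonyms are added.

    Returns None when nothing is found without synonyms, since the
    relative increase is undefined in that case.
    """
    if without_synonyms == 0:
        return None
    return 100.0 * (with_synonyms - without_synonyms) / without_synonyms


def percent_species_missed(counts):
    """Share of species (in percent) found only when synonyms are used."""
    missed = sum(1 for without, with_syn in counts.values()
                 if without == 0 and with_syn > 0)
    return 100.0 * missed / len(counts)


# Hypothetical (citations without synonyms, citations with synonyms) per species.
citation_counts = {
    "species_a": (40, 50),
    "species_b": (0, 12),
    "species_c": (10, 10),
    "species_d": (5, 20),
}

for name, (without, with_syn) in citation_counts.items():
    print(name, percent_increase(without, with_syn))
print(percent_species_missed(citation_counts))
```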
RivFISH - An European database on fish species presence across river basins
data.niaid.nih.gov
zenodo.org
Updated Mar 14, 2025
+ more versions
Cite
Ferreira, Maria Teresa (2025). RivFISH - An European database on fish species presence across river basins [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13848976
Dataset updated
Mar 14, 2025
Dataset provided by
Ferreira, Maria Teresa
Duarte, Gonçalo
Santos, José Maria
Mameri, Daniel
Figueira, Rui
Branco, Paulo
Segurado, Pedro
Cabo, João
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The RivFISH database aggregates the available data on freshwater-dependent fish presence in Europe, validated at the river basin level and considering taxonomical synonyms for species names, thus allowing for a maximization of data usage and robustness. This database also promotes interoperability with other datasets, including the IUCN Red List of Threatened Species, FishBase and the Catchment Characterisation and Modelling (CCM2) – River and Catchment Database v2.1. It is, as far as the authors know, the most up-to-date and comprehensive database on the presence of freshwater-dependent fish species for European river basins. The structure of the database is also prepared to deal with future alterations in species taxonomy, as well as new records of species occurrence in river basins.
Hazelnut SSR database: genetic profiles of the accessions, list of synonyms,...
search.dataone.org
datadryad.org
+1more
Updated May 2, 2025
Cite
Paolo Boccacci; Maria Aramini; Matthew Ordidge; Theo van Hintum; Daniela Torello Marinoni; Nadia Valentini; Jean-Paul Sarraquigne; Anita Solar; Mercè Rovira; Loretta Bacchetta; Roberto Botta (2025). Hazelnut SSR database: genetic profiles of the accessions, list of synonyms, and true-to-type genotypes [Dataset]. http://doi.org/10.5061/dryad.cz8w9gj45
Unique identifier
https://doi.org/10.5061/dryad.cz8w9gj45
Dataset updated
May 2, 2025
Dataset provided by
Dryad Digital Repository
Authors
Paolo Boccacci; Maria Aramini; Matthew Ordidge; Theo van Hintum; Daniela Torello Marinoni; Nadia Valentini; Jean-Paul Sarraquigne; Anita Solar; Mercè Rovira; Loretta Bacchetta; Roberto Botta
Time period covered
Jan 1, 2021
Description
Hazelnut (Corylus avellana L.) is one of the most important tree nut crops in Europe. Germplasm accessions are conserved in ex situ repositories, located in countries where hazelnut production occurs. In this work, we used ten simple sequence repeat (SSR) markers as the basis to establish a core collection representative of the hazelnut genetic diversity conserved in different European collections. A total of 480 accessions were used: 430 from ex situ collections and 50 landraces maintained on-farm. SSR analysis identified 181 genotypes that represented our whole hazelnut germplasm collection (WHGC). Four approaches (utilizing MSTRAT, Power Core, and Core Hunter's single- and multi-strategy) based on the maximization (M) strategy were used to determine the best sampling method. Core Hunter's multi-strategy, optimizing both allele coverage (Cv) and Cavalli-Sforza and Edwards (Dce) distance with equal weight, outperformed the others and was selected as the best approach. The final core c...
ChemIDplus
catalog.data.gov
datadiscovery.nlm.nih.gov
+4more
Updated Jun 19, 2025
Cite
National Library of Medicine (2025). ChemIDplus [Dataset]. https://catalog.data.gov/dataset/chemidplus-subset-33c85
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
ChemIDplus, the Chemical Identification Plus Database, is no longer updated. These are the final files from February 22, 2023. All ChemIDplus data have been incorporated into PubChem. ChemIDplus was a dictionary of over 400,000 chemicals (names, synonyms, and structures). ChemIDplus includes links to NLM and other databases and resources, including links to federal, state and international agencies. NLM makes a subset of ChemIDplus data available for download. The ChemIDplus Subset does not include the structure or the toxicity data available from the NLM web versions of the database. The ChemIDplus Subset is updated monthly.
Czech Lexico-Semantic Database 0.1
live.european-language-grid.eu
binary format
Updated Dec 31, 2021
+ more versions
Cite
(2021). Czech Lexico-Semantic Database 0.1 [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/18277
Available download formats: binary format
Dataset updated
Dec 31, 2021
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
A lexicographical project whose aim is to digitize and align two Czech onomasiological dictionaries (Haller 1969–77; Klégr 2007) in order to create an integrated digital multi-purpose lexico-semantic database of Czech.
Percent Increase in Number of Citations Found Due to Synonyms.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Gerald F. Guala (2023). Percent Increase in Number of Citations Found Due to Synonyms. [Dataset]. http://doi.org/10.1371/journal.pone.0162648.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0162648.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Gerald F. Guala
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Percent Increase in Number of Citations Found Due to Synonyms.
MilkOligoThesaurus: A milk oligosaccharide thesaurus (HoloOLIGO project) -...
b2find.eudat.eu
Updated Apr 27, 2024
+ more versions
Cite
(2024). MilkOligoThesaurus: A milk oligosaccharide thesaurus (HoloOLIGO project) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/5386a062-4689-5bd2-a49e-73fcbbae61bd
Dataset updated
Apr 27, 2024
Description
This dataset is a thesaurus (MilkOligoThesaurus) gathering milk oligosaccharide names. It is a table of unique milk oligosaccharides (rows) and descriptors (columns), including the name of the molecule, abbreviation, chemical database ID if available, chemical information (monoisotopic mass, osidic composition, molecular formula), synonyms, isomer groups, and scientific article sources. The archive also includes two RDF serializations (in RDF/XML and TTL) of the dataset based on the W3C SKOS standard. The intermediate tabular file used to produce these serializations with SkosPlay!, a free online conversion tool, is included. A data paper has been published in Data in Brief to detail how data were collected, described and transformed (https://doi.org/10.1016/j.dib.2024.110404). English
EuroWordNet English Addition to English WordNet
catalogue.elra.info
live.european-language-grid.eu
Updated Jun 26, 2017
+ more versions
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). EuroWordNet English Addition to English WordNet [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-M0015/
Dataset updated
Jun 26, 2017
Dataset provided by
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
ELRA (European Language Resources Association)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_EVALUATION.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
A. Available Wordnets

Following the announcement of the EuroWordNet databases in the last issue of the ELRA Newsletter (Vol.4 N.2), we are happy to announce that the list of EuroWordNet languages has grown. The following wordnets are now available via ELRA:

ELRA ref.   Language                             Synsets  Word Meanings  Language Internal Relations  Equivalence Relations
ELRA-M0015  English Addition to English WordNet  16361    40588          42140                        0
ELRA-M0016  Dutch                                44015    70201          111639                       53448
ELRA-M0017  Spanish                              23370    50526          55163                        21236
ELRA-M0018  Italian                              48529    48499          117068                       71789
ELRA-M0019  German                               15132    20453          34818                        16347
ELRA-M0020  French                               22745    32809          49494                        22730
ELRA-M0021  Czech                                12824    19949          26259                        12824
ELRA-M0022  Estonian                             9317     13839          16318                        9004

B. LR(1) Common Components (All Foreground - Data of layer 1)

A. The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created. An ILI-record contains:
A.1 synset: set of synonymous words or phrases (mostly from WordNet1.5)
A.2 part-of-speech
A.3 one or more Top-Concept classifications (Optional)
A.4 one or more Domain labels (Optional)
A.5 a gloss in English (mostly from WordNet1.5)
A.6 a unique ID linking the synset to its source (mostly WordNet1.5)
B. Top-Ontology: an ontology of 63 basic semantic classes based on fundamental distinctions. By means of the Top-Ontology all the wordnets can be accessed using a single language-independent classification-scheme. Top-Concepts are only assigned to ILI-records.
C. Domain-ontology: an ontology of subject-domains optionally assigned to ILI-records.
D. A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets. These Base-Concepts form the core of all the wordnets. All the Base-Concepts are classified in terms of the Top-Concepts that apply to them.
E. WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.

C. LR(2) Language-Specific Components (Data of layer 2 - partly Foreground and partly Background)

Wordnets produced in the first project (LE2-4003):
F. Dutch wordnet
G. English wordnet (additional relations which are missing in WordNet1.5)
H. Italian wordnet
I. Spanish wordnet

After extension of the project (LE4-8328):
J. German wordnet
K. French wordnet
L. Czech wordnet
M. Estonian wordnet

The specific wordnets are language-internal structures, minimally containing:
o set of variants or synonyms making up the synset
o part-of-speech
o language-internal relations to other synsets
o equivalence relations with ILI-records
o a unique-id linking the synset to its source

Each wordnet will be distributed with LR1 and will include documentation on LR1 and the distributed wordnet. All the data will be distributed as text-files in the EuroWordNet import format and as Polaris database files (see below LR3). The EuroWordNet viewer (Periscope, see below LR3) can be used to access the database version. Polaris has to be licensed to modify and...
tla-demotic-v18-premium
huggingface.co
Updated Feb 20, 2024
Cite
Thesaurus Linguae Aegyptiae (2024). tla-demotic-v18-premium [Dataset]. https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-demotic-v18-premium
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 20, 2024
Authors
Thesaurus Linguae Aegyptiae
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for Dataset tla-demotic-v18-premium

This data set contains Demotic sentences in transliteration, with lemmatization, POS glossing and a German translation. The data comes from the database of the Thesaurus Linguae Aegyptiae, corpus version 18, and contains only fully intact, unambiguously readable sentences (13,383 of 31,156 sentences), adjusted for philological and editorial markup.

Dataset Details Dataset Description… See the full description on the dataset page: https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-demotic-v18-premium.
USGS Biocomplexity Thesaurus
cloud.csiss.gmu.edu
html
Updated Mar 21, 2019
Cite
GEOSS CSR (2019). USGS Biocomplexity Thesaurus [Dataset]. http://cloud.csiss.gmu.edu/dataset/7af09f8d-f376-4bc0-ad45-a5f94627e22d
Available download formats: html
Dataset updated
Mar 21, 2019
Dataset provided by
GEOSS CSR
Description
Development of the USGS Biocomplexity Thesaurus began in 2002-2003 through a partnership between the former USGS NBII Program and CSA, a worldwide information company with more than 30 years' experience as a leading bibliographic database provider. The original Biocomplexity Thesaurus, first made available online in 2003, was a merger of five individual thesauri:
- the CSA Aquatic Sciences and Fisheries Thesaurus
- the CSA Life Sciences Thesaurus
- the CSA Pollution Thesaurus
- the CSA Sociological Thesaurus
- the CERES/NBII Thesaurus

Additional thesauri, including fire-related terminologies, are in the process of being added. The CSA-NBII Biocomplexity Thesaurus is being used globally by USGS partners and other organizations in support of the classification and retrieval of biological data and information.
