Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Slovenian datasets for contextual synonym and antonym detection can be used for training machine learning classifiers, as described in the MSc thesis of Jasmina Pegan "Semantic detection of synonyms and antonyms with contextual embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=141456). The datasets contain example pairs of synonyms and antonyms in context, together with additional information on each sense pair. Candidate synonyms and antonyms were retrieved from the dataset created in the BSc thesis of Jasmina Pegan "Antonym detection with word embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=110533). Example sentences were retrieved from The comprehensive Slovenian-Hungarian dictionary (VSMS) (https://www.clarin.si/repository/xmlui/handle/11356/1453). Each dataset is class-balanced and contains an equal number of examples and counterexamples. An example is a pair of example sentences in which the two words are synonyms/antonyms. A counterexample is a pair of example sentences in which the two words are not synonyms/antonyms. Note that the two words of a counterexample may still be synonymous or antonymous in some other sense, just not in the given context.
The datasets are divided into two categories: synonym datasets and antonym datasets. Each category is further divided into base and updated datasets, and each of these consists of three files: a train, a validation and a test set. Base datasets include only manually reviewed sense pairs. They are generated from all pairs of VSMS sense examples for all confirmed pairs of antonym and synonym senses. Updated datasets include automatically generated sense pairs, while limiting the maximum number of examples per word. This makes them more balanced word-wise, but they are not fully manually reviewed and contain less accurate data.
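The per-word limit used for the updated datasets can be illustrated with a minimal sketch. The exact procedure and the cap value are not given here. The function name `cap_examples_per_word`, the `(base_word, candidate_word, label)` tuple layout and the cap of 2 are hypothetical. This version simply keeps the first examples it encounters for each base word, and the real selection may well differ (for example, it may be random).

```python
from collections import defaultdict


def cap_examples_per_word(pairs, max_per_word):
    """Keep at most `max_per_word` examples for each base word.

    `pairs` is a list of (base_word, candidate_word, label) tuples; this layout
    is a hypothetical stand-in for the real dataset columns.
    """
    kept = []
    counts = defaultdict(int)  # examples already kept per base word
    for base, candidate, label in pairs:
        if counts[base] < max_per_word:
            kept.append((base, candidate, label))
            counts[base] += 1
    return kept


# Invented antonym-style data: 1 = antonym pair, 0 = counterexample.
pairs = [
    ("lep", "grd", 1),
    ("lep", "čeden", 0),
    ("lep", "grd", 1),
    ("hiter", "počasen", 1),
]
print(cap_examples_per_word(pairs, max_per_word=2))
```

Capping by base word only is one design choice. A stricter variant could also count the candidate word.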
A single dataset entry contains information on the base word, followed by data on the synonym/antonym candidate. The last column indicates whether the sense pair is a pair of synonyms/antonyms. More details can be found in the included README file.
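The sketch below reads entries and checks the class-balance property. It assumes each dataset file is tab-separated, with the label in the last column (`"1"` for a synonym/antonym pair, `"0"` for a counterexample). The actual column layout and label encoding are documented in the README and may differ. The helper names and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def load_entries(handle):
    """Read tab-separated entries as (feature_columns, label) pairs.

    Assumes the label is the last column; quoting is disabled so that
    quotation marks inside example sentences are kept verbatim.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[:-1], row[-1]) for row in reader if row]


def is_class_balanced(entries):
    """True if there are exactly two labels and they occur equally often."""
    counts = Counter(label for _, label in entries)
    return len(counts) == 2 and len(set(counts.values())) == 1


# Hypothetical columns: base word, base sentence, candidate, candidate sentence, label.
sample = (
    "lep\tTo je lep dan.\tgrd\tTo je grd dan.\t1\n"
    "lep\tLep pozdrav.\thiter\tHiter avto.\t0\n"
)
entries = load_entries(io.StringIO(sample))
print(is_class_balanced(entries))  # True
```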
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the digitized treatments in Plazi based on the original journal article Yang, Yu-Xia, Okushima, Yûichi, Yang, Xing-Ke (2012): Synonym, new species and checklist of the genus Fissocantharis Pic from Taiwan (Coleoptera, Cantharidae). Zootaxa 3262 (1): 46-53, DOI: 10.11646/zootaxa.3262.1.4, URL: https://biotaxa.org/Zootaxa/article/view/zootaxa.3262.1.4
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Noun Compound Synonym Substitution in Books (NCSSB) datasets contain in-context instances of potentially idiomatic English noun compounds, obtained by substituting idioms for synonyms occurring in public domain books forming part of the Project Gutenberg corpus.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Currently, 11 species of Kamimuria have been reported from Guizhou Province, China. However, the original illustrations of Kamimuria magnimacula Du, 2005 and K. extremispina Du, 2006 lack the detail needed to accurately assess the spine patterns on the endophallus, a key diagnostic feature. Re-examining the type materials and documenting them with high-resolution colour photographs is therefore crucial for precise identification and reliable documentation of these species. Based on a detailed examination of the type materials of K. magnimacula and K. extremispina, we propose that K. hunanensis Li & Li, 2022 be treated as a synonym of K. magnimacula, and that K. circumspina Li, Mo & Yang, 2019 and K. dabieshana Yan, Kong & Li, 2021 be treated as synonyms of K. extremispina. We also provide holotype photographs of K. magnimacula and K. extremispina, together with a distribution map of both species.
https://spdx.org/licenses/CC0-1.0.html
The aim of the present study was to determine whether there is support for reducing Chaerophyllum karsianum and C. posofianum, both local endemics of northeast Anatolia, to synonyms of C. bulbosum. According to the protologue and the Flora of Turkey and the East Aegean Islands, Chaerophyllum karsianum is closely related to C. bulbosum but is distinguished from it by its pink petals, ciliate bracteoles, entire leaf segments, and 12–16 rays. Chaerophyllum posofianum is also closely related to C. bulbosum but is distinguished from it by its entire leaf segments, ciliate bracteoles, and purple anthers. However, the flower color of C. bulbosum ranges from white to purple within the same population or even within the same individual, and its bracteole margins range from entire to ciliate. Our field observations and examination of herbarium specimens showed that these morphological characters overlap in all of the examined samples. We also investigated and compared the anatomical and micromorphological characteristics of the fruits of C. bulbosum, C. karsianum, and C. posofianum. The nucleotide sequence data reported in the present study showed that the internal transcribed spacer sequences of C. karsianum and C. posofianum were identical to that of C. bulbosum. Our results strongly support treating C. karsianum and C. posofianum as conspecific with C. bulbosum.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
taxonIDs of synonyms that should be removed from DH 1.1
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This gzipped text file contains a list of all (live) substance records in PubChem with their "unfiltered" depositor-provided chemical synonyms, downloaded from PubChem in June 2017. Each line has a Substance ID (SID) and its chemical synonym, separated by a tab. The SID-synonym pairs in this file were used in the paper “PubChem Synonym Filtering Process Using Crowdsourcing” by Sunghwan Kim et al., published in the Journal of Cheminformatics (https://doi.org/10.1186/s13321-024-00868-3). The up-to-date version of this file can be downloaded from the PubChem FTP Site (https://ftp.ncbi.nlm.nih.gov/pubchem/Substance/Extras/).
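Because each line holds a SID and one synonym separated by a tab, the file can be streamed and grouped by SID without unpacking it first. The sketch below uses only the standard library and assumes UTF-8 text. The function name `read_sid_synonyms` is hypothetical. The demo input is a tiny in-memory gzip stream with made-up SIDs, not real PubChem records.

```python
import gzip
import io
from collections import defaultdict


def read_sid_synonyms(handle):
    """Group synonyms by Substance ID from lines of the form 'SID<TAB>synonym'."""
    synonyms = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sid, synonym = line.split("\t", 1)  # split on the first tab only
        synonyms[int(sid)].append(synonym)
    return dict(synonyms)


# For the real download, open the file in text mode, e.g.
#   with gzip.open(path, "rt", encoding="utf-8") as handle: ...
# where `path` is a placeholder for the local file name.
demo = gzip.compress(b"1\taspirin\n1\tacetylsalicylic acid\n2\tcaffeine\n")
with gzip.open(io.BytesIO(demo), "rt", encoding="utf-8") as handle:
    print(read_sid_synonyms(handle))  # {1: ['aspirin', 'acetylsalicylic acid'], 2: ['caffeine']}
```

Building a full dictionary keeps every synonym in memory. For the complete file, processing the lines in a single streaming pass may be preferable.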
https://doi.org/10.5061/dryad.j9kd51cnz
Molecular and morphological data of Lappula duplicicarpa var. brevispinula.
Total genomic DNA was extracted from silica-gel dried leaves of L. duplicicarpa var. duplicicarpa, L. duplicicarpa var. brevispinula, and L. macrantha using a modified CTAB method (Li et al. 2013). High-quality genomic DNA was sent to Novogene Biotechnology in Tianjin for library construction and sequencing. Libraries were sequenced on the NovaSeq 6000 platform, generating paired-end reads of 2×150 bp, with approximately 10 Gb of raw data per sample. Plastome assembly was performed using GetOrganelle version 1.7.5 (Jin et al. 2020) with default parameters, and annotation was carried out using the online tool CpGAVAS2 (Shi et al. 2019), with Lappula lasiocarpa (Accession: NC_077516) as the refer...
https://spdx.org/licenses/CC0-1.0.html
Sixteen lectotypifications of Asian Piper species are provided. Piper argyrites, P. baccatum, P. leptostachyum, P. majusculum, P. peepuloides, P. quinqueangulatum and P. sulcatum are accepted as species, and many new synonyms are proposed. Useful diagnostic characters are described, and geographical distribution data are provided for each species.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Thesaurus of Modern Slovene is the largest automatically generated open-access collection of Slovene synonyms. It is sourced from two principal language resources: the Oxford®-DZS Comprehensive English-Slovenian Dictionary and the Gigafida 1.0 corpus of written Slovene. The links identified between synonyms were additionally confirmed using the Dictionary of Standard Slovenian Language (SSKJ). Data extraction and the structure of the Thesaurus are based on how frequently, and in what manner, words co-occur in the translation strings of the Oxford-DZS Dictionary. This information is the basis for discriminating between ‘core’ and ‘near’ synonyms, with ‘core’ synonyms exhibiting a stronger connection to the keyword. In the next step, an approach combining balanced co-occurrence graphs and the Personalized PageRank algorithm automatically divides the synonyms into subgroups. It ranks them by their degree of semantic relatedness to the keyword and by their frequency in language use. For the creation methodology, see Krek et al. (2017) in the provided references.
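To show how a Personalized PageRank score can rank candidate synonyms around a keyword, here is a generic power-iteration sketch on a weighted graph. It is not the authors' pipeline. The balanced co-occurrence graph construction and the subgrouping step are not reproduced, and the toy graph, its edge weights and the damping factor `alpha=0.85` are invented assumptions.

```python
def personalized_pagerank(graph, source, alpha=0.85, iterations=100):
    """Power-iteration Personalized PageRank with restarts to `source`.

    `graph` maps every node to a dict {neighbour: edge_weight}; every
    neighbour must also appear as a key. Edges are treated as directed,
    so an undirected graph lists each edge in both directions.
    """
    nodes = list(graph)
    scores = {node: 1.0 if node == source else 0.0 for node in nodes}
    for _ in range(iterations):
        new_scores = dict.fromkeys(nodes, 0.0)
        # With probability 1 - alpha the walk restarts at the keyword.
        new_scores[source] += 1.0 - alpha
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if total == 0:
                # Dangling node: return its mass to the keyword.
                new_scores[source] += alpha * scores[node]
                continue
            for neighbour, weight in neighbours.items():
                # Follow an edge with probability proportional to its weight.
                new_scores[neighbour] += alpha * scores[node] * weight / total
        scores = new_scores
    return scores


# Toy co-occurrence graph around the keyword "hiter" ('fast'); weights are invented.
graph = {
    "hiter": {"nagel": 3.0, "uren": 2.0, "brz": 1.0},
    "nagel": {"hiter": 3.0, "uren": 1.0},
    "uren": {"hiter": 2.0, "nagel": 1.0},
    "brz": {"hiter": 1.0},
}
scores = personalized_pagerank(graph, "hiter")
ranking = sorted((w for w in graph if w != "hiter"), key=scores.get, reverse=True)
print(ranking)  # ['nagel', 'uren', 'brz']
```

Restarting at the keyword makes the scores measure relatedness to that keyword specifically, rather than global centrality in the graph.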
The database includes dictionary entries (single- and multi-word headwords, their part of speech and other linguistic features) together with automatically extracted synonyms, their type (core or near) and their relevancy rank. Version 2.0 adds 4,544 manually revised antonyms to the database. In addition, for part of the database, synonyms were distributed under the corresponding word senses. Depending on how much lexicographic revision was involved in their preparation, database entries have one of the following three statuses:
(a) ssss-automatic (96,064 entries): no manual revision was conducted;
(b) ssss-manual (3,421 entries): word senses and semantic indicators were prepared by lexicographers, and synonyms were manually distributed under each corresponding sense;
(c) ssss-hybrid (1,352 entries): manually revised senses are combined with automatically compiled data.
For the novelties of v2.0, see Arhar Holdt et al. (2023) in the provided references.
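A minimal in-memory model of an entry, sketched under stated assumptions, may help when working with the data. The class and field names are hypothetical and do not mirror the database's actual schema. Two conventions are also assumptions: core synonyms are listed before near ones, and a lower rank number means a more relevant synonym. The example entry is invented. Assuming every entry carries exactly one status, the three counts above add up to 100,837 entries.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    AUTOMATIC = "ssss-automatic"  # no manual revision
    MANUAL = "ssss-manual"        # senses and synonym distribution prepared by lexicographers
    HYBRID = "ssss-hybrid"        # manually revised senses combined with automatic data


@dataclass
class Synonym:
    lemma: str
    kind: str  # "core" or "near"
    rank: int  # relevancy rank; 1 = most relevant (assumed convention)


@dataclass
class Entry:
    headword: str
    pos: str
    status: Status
    synonyms: list[Synonym] = field(default_factory=list)
    antonyms: list[str] = field(default_factory=list)

    def ranked_synonyms(self):
        """Core synonyms first, then near synonyms, each group ordered by rank."""
        return sorted(self.synonyms, key=lambda s: (s.kind != "core", s.rank))


# Entry counts per status as reported for v2.0.
status_counts = {Status.AUTOMATIC: 96_064, Status.MANUAL: 3_421, Status.HYBRID: 1_352}
print(sum(status_counts.values()))  # 100837

# Invented example entry, not taken from the database.
entry = Entry(
    "hiter", "adjective", Status.AUTOMATIC,
    [Synonym("uren", "near", 2), Synonym("nagel", "core", 1), Synonym("brz", "near", 1)],
)
print([s.lemma for s in entry.ranked_synonyms()])  # ['nagel', 'brz', 'uren']
```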
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets for reproducing the results of the paper "Knowledge Graph Consolidation by Unifying Synonymous Relationships" published at ISWC 2019.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets that can be used together with the code at https://github.com/JanKalo/RuleAlign
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database dump containing words from 4 languages: English, Romanian, French and Spanish, and their translations.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the digitized treatments in Plazi based on the original journal article Li, Zheng-Long, Huang, Zhang-Jie, Chen, Da-Wei, Hong, Xin, Wen, Fang (2023): A new combination and a new synonym of Gesneriaceae in China. PhytoKeys 232: 99-107, DOI: http://dx.doi.org/10.3897/phytokeys.232.108644, URL: http://dx.doi.org/10.3897/phytokeys.232.108644
https://tokenterminal.com/terms
Detailed Price metrics and analytics for Synonym Finance, including historical data and trends.
This research utilizes cognitive neuroscience and information systems research to predict user engagement and decision-making in digital platforms. By applying Natural Language Processing (NLP) techniques and cognitive bias theories, we investigate user interactions with synonyms in digital content. Our approach incorporates four cognitive biases - representativeness, ease-of-use (processing fluency), affect-biased attention, and distribution/availability (R.E.A.D) - into a comprehensive model. The model's predictive capacity was evaluated using a large user survey, revealing that synonyms representative of core concepts, easy to process, emotionally resonant, and readily available, fostered increased user engagement. Importantly, our research provides a novel perspective on human-computer interaction, digital habits, and decision-making processes. Findings underscore the potential of cognitive biases as powerful predictors of user engagement, emphasizing their role in effective digital content design across education, marketing, and beyond.
https://tokenterminal.com/terms
Detailed code-commit metrics and analytics for Synonym Finance, including historical data and trends.
https://spdx.org/licenses/CC0-1.0.html
Examination of relevant type materials and living plants reveals that Rungia axilliflora, R. densiflora and R. evrardii are conspecific with R. stolonifera. Lectotypes are designated for the names R. evrardii and R. stolonifera.
The Camellia cultivar names were collected from books and journals worldwide, together with the new registrations received every year. Experts then reviewed them on the online working platform, the Database of International Camellia Register. After resolving several important issues in camellia names, notably the many re-used names and the diacritical marks, especially in Japanese cultivars, a dataset of Camellia names from around the world covering the years 1253 to 2019 was compiled. The data are provided as an Excel file (.xlsx) with two sheets, ‘Cultivars’ and ‘Synonyms’. The ‘Cultivars’ sheet mainly records the name and description of each cultivar, and the corresponding synonyms can be looked up in the ‘Synonyms’ sheet. The fields and their descriptions are given below.
Data fields in sheet ‘Cultivars’:
CultivarId: A unique number for each cultivar.
CultivarEpithet: The cultivar epithet of each cultivar.
ScientificName: The scientific name of each cultivar.
ChineseName: The Chinese name of each cultivar.
JapaneseName: The Japanese name of each cultivar.
Hiragana: The Japanese phonetic reading of each cultivar's name.
SpeciesOrCombination: The cultivar's origin or cross parentage.
Meaning: An explanation of the name.
CultivarType: The type of economic value: ornamental, tea or oil.
DescriptionEn: The English description of each cultivar.
DescriptionCn: The Chinese description of each cultivar.
DescriptionJp: The Japanese description of each cultivar.
YearPublished: The year of first publication.
Country: The country in which the cultivar was released.
DefaultPhoto: The type image.
DefaultPhotoChosenBy: The name of the specialist who chose the type image.
DefaultPhotoChosenDate: The date on which the specialist chose the type image.
IsExtinct: Whether the cultivar is extinct.
Data fields in sheet ‘Synonyms’:
SynonymId: A unique number for each cultivar synonym.
Synonym: The synonym used for the cultivar.
Reference: The reference in which the synonym was recorded.
CultivarEpithet: The cultivar epithet corresponding to the synonym.
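Because the ‘Synonyms’ sheet points back to cultivars through CultivarEpithet, the two sheets can be joined on that column. The minimal sketch below works on rows already loaded as dicts. One option for loading is pandas' `read_excel(path, sheet_name=None)`, which needs an Excel engine such as openpyxl, followed by `DataFrame.to_dict("records")` on each sheet. The function name and sample rows are hypothetical. The sketch also assumes epithets are unique. Given the re-used names mentioned above, that may not hold, and a join on epithet alone could then be ambiguous.

```python
from collections import defaultdict


def attach_synonyms(cultivars, synonyms):
    """Return {CultivarEpithet: cultivar row plus a 'Synonyms' list}.

    Both arguments are lists of dicts keyed by the sheet column names.
    Assumes each CultivarEpithet identifies exactly one cultivar.
    """
    by_epithet = defaultdict(list)
    for row in synonyms:
        by_epithet[row["CultivarEpithet"]].append(row["Synonym"])
    return {
        c["CultivarEpithet"]: {**c, "Synonyms": by_epithet.get(c["CultivarEpithet"], [])}
        for c in cultivars
    }


# Placeholder rows, not taken from the dataset.
cultivars = [
    {"CultivarId": 1, "CultivarEpithet": "Example A"},
    {"CultivarId": 2, "CultivarEpithet": "Example B"},
]
synonyms = [{"SynonymId": 10, "Synonym": "Example A Alias", "CultivarEpithet": "Example A"}]
merged = attach_synonyms(cultivars, synonyms)
print(merged["Example A"]["Synonyms"])  # ['Example A Alias']
```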
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository contains datasets for out-of-KB mention discovery from texts, described in the paper "Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking" (CIKM 2023), available on arXiv: https://arxiv.org/abs/2302.07189.
Each data setting (a sub-folder) contains train, valid and test files, plus a sample of 100 random instances from each split for debugging.
Data folder names ending in “syn_full” contain synonym-augmented data (each synonym treated as an entity) for that setting.
Each ontology .jsonl file comes in two versions: the "syn_attr" setting treats synonyms as attributes, and the "syn_full" setting treats synonyms as entities.
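The difference between the two ontology settings can be sketched as two ways of serialising the same concept. The record fields (`id`, `name`, `synonyms`), the helper names and the example concept are hypothetical and not taken from the released files. Keeping the original ID on every synonym record in `syn_full` is also an assumption, made so that a prediction on a synonym still maps back to its source concept.

```python
import json


def to_syn_attr(entity):
    """'syn_attr': one record per concept, synonyms kept as an attribute."""
    return [{"id": entity["id"], "name": entity["name"], "synonyms": list(entity["synonyms"])}]


def to_syn_full(entity):
    """'syn_full': one record per name, each synonym becoming its own entity.

    Every record keeps the concept's ID (an assumption of this sketch).
    """
    names = [entity["name"], *entity["synonyms"]]
    return [{"id": entity["id"], "name": name} for name in names]


# Hypothetical concept; "E1" is a placeholder ID.
entity = {"id": "E1", "name": "heart attack", "synonyms": ["myocardial infarction", "MI"]}
for record in to_syn_full(entity):
    print(json.dumps(record))  # one JSON object per line, as in a .jsonl file
```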
Data scripts are available at https://github.com/KRR-Oxford/BLINKout#data-scripts
Acknowledgement of the data sources below:
ShARe/CLEF 2013 dataset is from https://physionet.org/content/shareclefehealth2013/1.0/
MedMention dataset is from https://github.com/chanzuckerberg/MedMentions
UMLS (versions 2012AB, 2014AB, 2017AA) is from https://www.nlm.nih.gov/research/umls/index.html
SNOMED CT (corresponding versions) is from https://www.nlm.nih.gov/healthit/snomedct/index.html
NILK dataset is from https://zenodo.org/record/6607514
WikiData 2017 dump is from https://archive.org/download/enwiki-20170220/enwiki-20170220-pages-articles.xml.bz2