100+ datasets found

Data from: Slovenian datasets for contextual synonym and antonym detection
live.european-language-grid.eu
binary format
Updated Oct 25, 2022
Cite
(2022). Slovenian datasets for contextual synonym and antonym detection [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20526
Available download formats: binary format
Dataset updated
Oct 25, 2022
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
Slovenia
Description
Slovenian datasets for contextual synonym and antonym detection can be used for training machine learning classifiers as described in the MSc thesis of Jasmina Pegan "Semantic detection of synonyms and antonyms with contextual embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=141456). Datasets contain example pairs of synonyms and antonyms in contexts together with additional information on a sense pair. Candidates for synonyms and antonyms were retrieved from the dataset created in the BSc thesis of Jasmina Pegan "Antonym detection with word embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=110533). Example sentences were retrieved from The comprehensive Slovenian-Hungarian dictionary (VSMS) (https://www.clarin.si/repository/xmlui/handle/11356/1453). Each dataset is class-balanced and contains an equal number of examples and counterexamples. An example is a pair of example sentences where the two words are synonyms/antonyms. A counterexample is a pair of example sentences where the two words are not synonyms/antonyms. Note that the word pair in a counterexample can still be synonymous or antonymous in some other sense of the two words, just not in the given context.

Datasets are divided into two categories: datasets for synonyms and datasets for antonyms. Each category is further divided into base and updated datasets. Each of these contains three dataset files: train, validation and test. Base datasets include only manually reviewed sense pairs. These are generated from all pairs of VSMS sense examples for all confirmed pairs of antonym and synonym senses. Updated datasets include automatically generated sense pairs while constraining the maximal number of examples per word. In this way, the dataset is more balanced word-wise, but is not fully manually reviewed and contains less accurate data.

A single dataset entry contains the information on the base word, followed by data on synonym/antonym candidate. The last column discerns whether the sense pair is a pair of synonyms/antonyms or not. More details on this can be found inside the included README file.
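As a rough illustration of the class-balance property described above, the following Python sketch counts the values in the last column of a dataset file and checks that the two classes are equally frequent. The tab delimiter, the label values "1" and "0", and the sample rows are assumptions made for illustration only; the actual column layout is documented in the dataset's README.

```python
import csv
import io
from collections import Counter


def label_counts(rows):
    """Count the values found in the last column of each non-empty row."""
    return Counter(row[-1] for row in rows if row)


def is_balanced(rows):
    """Return True when exactly two labels occur and both occur equally often."""
    counts = label_counts(rows)
    return len(counts) == 2 and len(set(counts.values())) == 1


# Hypothetical rows: base word data, candidate data, label in the last column.
sample = (
    "word_a\tcandidate_a\t1\n"
    "word_b\tcandidate_b\t0\n"
    "word_c\tcandidate_c\t1\n"
    "word_d\tcandidate_d\t0\n"
)
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

print(label_counts(rows))
print(is_balanced(rows))
```

The same two functions could be applied to each of the train, validation and test files in turn once the real delimiter and label encoding are known.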
Data from: Synonym, new species and checklist of the genus Fissocantharis...
gbif.org
Updated Nov 28, 2024
Cite
Yu-Xia Yang; Yûichi Okushima; Xing-Ke Yang (2024). Synonym, new species and checklist of the genus Fissocantharis Pic from Taiwan (Coleoptera, Cantharidae) [Dataset]. http://doi.org/10.5281/zenodo.213031
Unique identifier
https://doi.org/10.5281/zenodo.213031
Dataset updated
Nov 28, 2024
Dataset provided by
Global Biodiversity Information Facility: https://www.gbif.org/
Plazi
Authors
Yu-Xia Yang; Yûichi Okushima; Xing-Ke Yang
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Taiwan
Description
This dataset contains the digitized treatments in Plazi based on the original journal article Yang, Yu-Xia, Okushima, Yûichi, Yang, Xing-Ke (2012): Synonym, new species and checklist of the genus Fissocantharis Pic from Taiwan (Coleoptera, Cantharidae). Zootaxa 3262 (1): 46-53, DOI: 10.11646/zootaxa.3262.1.4, URL: https://biotaxa.org/Zootaxa/article/view/zootaxa.3262.1.4
Noun Compound Synonym Substitution in Books – NCSSB datasets
orda.shef.ac.uk
txt
Updated Feb 26, 2024
Cite
Thomas Pickard; Aline Villavicencio; Agne Knietaite; Adam Allsebrook; Anton Minkov; Adam Tomaszewski; Norbert Slinko; Richard Johnson (2024). Noun Compound Synonym Substitution in Books – NCSSB datasets [Dataset]. http://doi.org/10.15131/shef.data.25259722.v1
Available download formats: txt
Unique identifier
https://doi.org/10.15131/shef.data.25259722.v1
Dataset updated
Feb 26, 2024
Dataset provided by
The University of Sheffield
Authors
Thomas Pickard; Aline Villavicencio; Agne Knietaite; Adam Allsebrook; Anton Minkov; Adam Tomaszewski; Norbert Slinko; Richard Johnson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Noun Compound Synonym Substitution in Books (NCSSB) datasets contain in-context instances of potentially idiomatic English noun compounds, obtained by substituting idioms for synonyms occurring in public domain books forming part of the Project Gutenberg corpus.
Data from: Three new synonyms of the genus Kamimuria (Plecoptera, Perlidae)
gbif.org
Updated May 31, 2025
Cite
Liang-Liang Zeng (2025). Three new synonyms of the genus Kamimuria (Plecoptera, Perlidae) [Dataset]. http://doi.org/10.3897/bdj.13.e153697
Unique identifier
https://doi.org/10.3897/bdj.13.e153697
Dataset updated
May 31, 2025
Dataset provided by
Global Biodiversity Information Facility: https://www.gbif.org/
Biodiversity Data Journal
Authors
Liang-Liang Zeng
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Currently, 11 species of Kamimuria have been reported in Guizhou Province, China. However, the original illustrations of Kamimuria magnimacula Du, 2005 and K. extremispina Du, 2006, lack the necessary detail to accurately assess the spine patterns on the endophallus, which is a key diagnostic feature. To resolve this issue, a re-examination of the type materials, complemented by high-resolution colour photographs, is crucial to ensure precise identification and reliable documentation of these species. Based on a detailed examination of the type materials of Kamimuria magnimacula Du, 2005 and K. extremispina Du, 2006, we propose that K. hunanensis Li & Li, 2022 be considered a synonym of K. magnimacula, K. circumspina Li, Mo & Yang, 2019 and K. dabieshana Yan, Kong & Li, 2021 be regarded as synonyms of K. extremispina. Additionally, we have provided holotype photographs of K. magnimacula and K. extremispina, along with a distribution map for both species in this paper.
Data from: Two new synonyms for Chaerophyllum bulbosum based on...
data.niaid.nih.gov
search.dataone.org
zip
Updated May 2, 2022
Cite
ÖZLEM ÇETİN; Mustafa Çelik (2022). Two new synonyms for Chaerophyllum bulbosum based on morphological, anatomical and molecular data [Dataset]. http://doi.org/10.5061/dryad.hqbzkh1jf
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.hqbzkh1jf
Dataset updated
May 2, 2022
Dataset provided by
Selçuk University
Authors
ÖZLEM ÇETİN; Mustafa Çelik
License
https://spdx.org/licenses/CC0-1.0.html
Description
The aim of the present study was to determine support for reducing Chaerophyllum karsianum and C. posofianum, both local endemics of northeast Anatolia, to synonyms of C. bulbosum. Chaerophyllum karsianum is closely related to C. bulbosum but distinguished from it by its pink petals, ciliate bracteoles, entire leaf segments, and 12–16 rays according to the protologue and Flora of Turkey and the East Aegean Islands. Chaerophyllum posofianum is also closely related to C. bulbosum but is distinguished from it by its entire leaf segments, ciliate bracteoles, and purple anthers. Flower color of C. bulbosum ranges from white to purple within the same populations or even within the same individuals. The bracteole margin ranges from entire to ciliate in C. bulbosum. Our field observations and examination of herbarium specimens showed that morphological characteristics overlap in all of the examined samples. We also investigated and compared the anatomical and micromorphological characteristics of C. bulbosum, C. karsianum, and C. posofianum fruit. The nucleotide sequence data reported in the present study showed that the internal transcribed spacer sequences of C. karsianum and C. posofianum were identical to that of C. bulbosum. Our results strongly support the conclusion that C. karsianum and C. posofianum are conspecific with C. bulbosum.
Bad Synonyms: bad synonyms
zenodo.org
bin
Updated Aug 6, 2024
Cite
script (2024). Bad Synonyms: bad synonyms [Dataset]. http://doi.org/10.5281/zenodo.13239699
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13239699
Dataset updated
Aug 6, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
script
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Dec 9, 2019
Description
taxonIDs of synonyms that should be removed from DH 1.1
Unfiltered Depositor-Provided Chemical Synonyms for Substance Records in...
zenodo.org
application/gzip
Updated May 28, 2025
Cite
Sunghwan Kim; Bo Yu; Qingliang Li; Evan E. Bolton (2025). Unfiltered Depositor-Provided Chemical Synonyms for Substance Records in PubChem [Dataset]. http://doi.org/10.5281/zenodo.11194943
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.11194943
Dataset updated
May 28, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Sunghwan Kim; Bo Yu; Qingliang Li; Evan E. Bolton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This gzipped text file contains a list of all (live) substance records in PubChem with their "unfiltered" depositor-provided chemical synonyms, downloaded from PubChem in June 2017. Each line has a Substance ID (SID) and its chemical synonym, separated by a tab. The SID-synonym pairs in this file were used in the paper “PubChem Synonym Filtering Process Using Crowdsourcing” by Sunghwan Kim et al., published in the Journal of Cheminformatics (https://doi.org/10.1186/s13321-024-00868-3). The up-to-date version of this file can be downloaded from the PubChem FTP Site (https://ftp.ncbi.nlm.nih.gov/pubchem/Substance/Extras/).
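The file layout described above (one SID and one synonym per line, separated by a tab, inside a gzipped text file) could be read along the following lines. This is a minimal sketch, not the authors' tooling: the function names, the UTF-8 encoding and the three sample lines are assumptions for illustration, and the demo uses an in-memory gzip stream rather than the real download.

```python
import gzip
import io
from collections import defaultdict


def read_sid_synonyms(lines):
    """Group synonyms by SID from an iterable of 'SID<TAB>synonym' text lines."""
    synonyms = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sid, _, synonym = line.partition("\t")
        synonyms[int(sid)].append(synonym)
    return dict(synonyms)


def read_gzipped(path):
    """Open a gzipped text file (encoding assumed to be UTF-8) and parse it."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return read_sid_synonyms(handle)


# In-memory demo with invented sample lines instead of the real file.
sample = "1\taspirin\n1\tacetylsalicylic acid\n2\tcaffeine\n"
buffer = io.BytesIO(gzip.compress(sample.encode("utf-8")))
with gzip.open(buffer, "rt", encoding="utf-8") as handle:
    demo = read_sid_synonyms(handle)

print(demo)
```

Reading line by line keeps memory use proportional to the parsed result rather than to the decompressed file, which matters for a file covering all live substance records.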
Data from: Lappula duplicicarpa var. brevispinula C.J.Wang (Boraginaceae) is...
datadryad.org
zip
Updated Mar 1, 2025
Cite
Danhui Liu (2025). Lappula duplicicarpa var. brevispinula C.J.Wang (Boraginaceae) is a synonym of L. macrantha based on morphological and molecular data [Dataset]. http://doi.org/10.5061/dryad.j9kd51cnz
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.j9kd51cnz
Dataset updated
Mar 1, 2025
Dataset provided by
Dryad
Authors
Danhui Liu
Description
Molecular and morphological data of Lappula duplicicarpa var. brevispinula

https://doi.org/10.5061/dryad.j9kd51cnz

Description of the data and file structure

Molecular and morphological data of Lappula duplicicarpa var. brevispinula.

Total genomic DNA was extracted from silica-gel dried leaves of L. duplicicarpa var. duplicicarpa, L. duplicicarpa var. brevispinula, and L. macrantha using a modified CTAB method (Li et al. 2013). High-quality genomic DNA was sent to Novogene Biotechnology in Tianjin for library construction and sequencing. Libraries were sequenced on the NovaSeq 6000 platform, generating paired-end reads of 2×150 bp, with approximately 10 Gb of raw data per sample. Plastome assembly was performed using GetOrganelle version 1.7.5 (Jin et al. 2020) with default parameters, and annotation was carried out using the online tool CpGAVAS2 (Shi et al. 2019), with Lappula lasiocarpa (Accession: NC_077516) as the refer...
Data from: Taxonomic notes on the genus Piper (Piperaceae)
data.niaid.nih.gov
search.dataone.org
zip
Updated Apr 27, 2016
Cite
Chalermpol Suwanphakdee; David A. Simpson; Trevor R. Hodkinson; Pranom Chantaranothai (2016). Taxonomic notes on the genus Piper (Piperaceae) [Dataset]. http://doi.org/10.5061/dryad.qp50f
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.qp50f
Dataset updated
Apr 27, 2016
Dataset provided by
Royal Botanic Gardens, Kew
Trinity College
Kasetsart University
Khon Kaen University
Authors
Chalermpol Suwanphakdee; David A. Simpson; Trevor R. Hodkinson; Pranom Chantaranothai
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Asia
Description
Sixteen lectotypifications of Asian Piper species are provided. Piper argyrites, P. baccatum, P. leptostachyum, P. majusculum, P. peepuloides, P. quinqueangulatum and P. sulcatum are accepted as species and many new synonyms are proposed. Useful diagnostic characters are described and geographical distribution data of each species are provided.
Data from: Thesaurus of Modern Slovene 2.0
live.european-language-grid.eu
binary format
Updated Nov 14, 2023
Cite
(2023). Thesaurus of Modern Slovene 2.0 [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/23182
Available download formats: binary format
Dataset updated
Nov 14, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Thesaurus of Modern Slovene is the largest automatically generated open-access collection of Slovene synonyms. It is sourced from the data in two principal language resources: The Oxford®-DZS Comprehensive English-Slovenian Dictionary and the Gigafida 1.0 corpus of written Slovene. The links identified between synonyms were additionally confirmed using the Dictionary of Standard Slovenian Language (SSKJ). The data extraction and structure for the Thesaurus were based on the frequency and manner in which words co-occur in translation strings of the Oxford-DZS Dictionary. This information is the basis for discriminating between ‘core’ and ‘near’ synonyms, with ‘core’ synonyms exhibiting a greater connection to the keyword. In the following step, an approach combining balanced co-occurrence graphs and the Personal PageRank algorithm automatically divides the synonyms into subgroups and ranks them according to the degree of semantic relatedness to the keyword, as well as their frequency in language use. For the creation methodology, see Krek et al. (2017) in the provided references.

The database includes dictionary entries: single- and multiword headwords, their part-of-speech and other linguistic features, as well as automatically extracted synonyms, their type (core or near) and relevancy rank. In version 2.0, 4,544 manually revised antonyms were added to the database. Additionally, for a part of the database, synonyms were distributed under the corresponding word senses. Pertaining to how much lexicographic revision was involved in their preparation, database entries can have one of the following three statuses: (a) ssss-automatic (96,064 entries): no manual revision was conducted; (b) ssss-manual (3,421 entries): word senses and semantic indicators were prepared by lexicographers, and synonyms were manually distributed under each corresponding sense; (c) ssss-hybrid (1,352 entries): manually revised senses are combined with data compiled automatically. For novelties of v2.0, see Arhar Holdt et al. (2023) in the provided references.
Data from: Knowledge Graph Consolidation by Unifying Synonymous...
figshare.com
bz2
Updated Sep 9, 2019
Cite
Jan-Christoph Kalo (2019). Knowledge Graph Consolidation by Unifying Synonymous Relationships [Dataset]. http://doi.org/10.6084/m9.figshare.8490134.v2
Available download formats: bz2
Unique identifier
https://doi.org/10.6084/m9.figshare.8490134.v2
Dataset updated
Sep 9, 2019
Dataset provided by
Figshare: http://figshare.com/
Authors
Jan-Christoph Kalo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets for reproducing the results of the paper "Knowledge Graph Consolidation by Unifying Synonymous Relationships" published at ISWC 2019.
Detecting Synonymous Relationships by Shared Data-driven Definitions
figshare.com
txt
Updated Dec 9, 2019
Cite
Jan-Christoph Kalo (2019). Detecting Synonymous Relationships by Shared Data-driven Definitions [Dataset]. http://doi.org/10.6084/m9.figshare.11343785.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11343785.v1
Dataset updated
Dec 9, 2019
Dataset provided by
Figshare: http://figshare.com/
Authors
Jan-Christoph Kalo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets that can be used together with the Code in: https://github.com/JanKalo/RuleAlign
Hard Synonyms MySQL dump 8.1
figshare.com
txt
Updated Jan 20, 2016
Cite
Ana Uban (2016). Hard Synonyms MySQL dump 8.1 [Dataset]. http://doi.org/10.6084/m9.figshare.1584665.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1584665.v1
Dataset updated
Jan 20, 2016
Dataset provided by
figshare
Authors
Ana Uban
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database dump containing words from 4 languages: English, Romanian, French and Spanish, and their translations.
Data from: A new combination and a new synonym of Gesneriaceae in China
gbif.org
Updated Nov 29, 2024
Cite
Zheng-Long Li; Zhang-Jie Huang; Da-Wei Chen; Xin Hong; Fang Wen (2024). A new combination and a new synonym of Gesneriaceae in China [Dataset]. http://doi.org/10.15468/3rtu4z
Unique identifier
https://doi.org/10.15468/3rtu4z
Dataset updated
Nov 29, 2024
Dataset provided by
Global Biodiversity Information Facility: https://www.gbif.org/
Plazi
Authors
Zheng-Long Li; Zhang-Jie Huang; Da-Wei Chen; Xin Hong; Fang Wen
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
China
Description
This dataset contains the digitized treatments in Plazi based on the original journal article Li, Zheng-Long, Huang, Zhang-Jie, Chen, Da-Wei, Hong, Xin, Wen, Fang (2023): A new combination and a new synonym of Gesneriaceae in China. PhytoKeys 232: 99-107, DOI: http://dx.doi.org/10.3897/phytokeys.232.108644, URL: http://dx.doi.org/10.3897/phytokeys.232.108644
Synonym Finance Price Metrics
tokenterminal.com
csv, json
Updated Apr 26, 2025
Cite
Token Terminal (2025). Synonym Finance Price Metrics [Dataset]. https://tokenterminal.com/explorer/projects/synonym-finance
Available download formats: csv, json
Dataset updated
Apr 26, 2025
Dataset authored and provided by
Token Terminal
License
https://tokenterminal.com/terms
Time period covered
2020 - Present
Variables measured
Price
Description
Detailed Price metrics and analytics for Synonym Finance, including historical data and trends.
Replication Data for: Words That Stick Predicting Decision Making and...
search.dataone.org
data.mendeley.com
Updated Nov 8, 2023
Cite
Dvir, Nimrod (2023). Replication Data for: Words That Stick Predicting Decision Making and Synonym Engagement Using Cognitive Biases and Computational Linguistics [Dataset]. http://doi.org/10.7910/DVN/J5LTYE
Unique identifier
https://doi.org/10.7910/DVN/J5LTYE
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Dvir, Nimrod
Description
This research utilizes cognitive neuroscience and information systems research to predict user engagement and decision-making in digital platforms. By applying Natural Language Processing (NLP) techniques and cognitive bias theories, we investigate user interactions with synonyms in digital content. Our approach incorporates four cognitive biases - representativeness, ease-of-use (processing fluency), affect-biased attention, and distribution/availability (R.E.A.D) - into a comprehensive model. The model's predictive capacity was evaluated using a large user survey, revealing that synonyms representative of core concepts, easy to process, emotionally resonant, and readily available, fostered increased user engagement. Importantly, our research provides a novel perspective on human-computer interaction, digital habits, and decision-making processes. Findings underscore the potential of cognitive biases as powerful predictors of user engagement, emphasizing their role in effective digital content design across education, marketing, and beyond.
Synonym Finance Code commits Metrics
tokenterminal.com
csv, json
Updated Feb 24, 2025
Cite
Token Terminal (2025). Synonym Finance Code commits Metrics [Dataset]. https://tokenterminal.com/explorer/projects/synonym-finance
Available download formats: csv, json
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Token Terminal
License
https://tokenterminal.com/terms
Time period covered
2020 - Present
Variables measured
Code commits
Description
Detailed Code commits metrics and analytics for Synonym Finance, including historical data and trends.
Data from: Three new synonyms of Rungia stolonifera (Acanthaceae) from China...
data.niaid.nih.gov
datadryad.org
zip
Updated Jan 30, 2020
Cite
Zheli Lin; Van Hai Do; Yunfei Deng (2020). Three new synonyms of Rungia stolonifera (Acanthaceae) from China and Vietnam [Dataset]. http://doi.org/10.5061/dryad.pc866t1k0
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.pc866t1k0
Dataset updated
Jan 30, 2020
Dataset provided by
South China Botanical Garden
Instituto de Ciencia y Tecnología de Alimentos y Nutrición
Authors
Zheli Lin; Van Hai Do; Yunfei Deng
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Vietnam, China
Description
Examination of relevant type materials and living plants reveals that Rungia axilliflora, R. densiflora and R. evrardii are conspecific with R. stolonifera. Lectotypes are designated for the names R. evrardii and R. stolonifera.
The Dataset of Camellia Cultivar Names in the World
explore.openaire.eu
data.niaid.nih.gov
Updated Nov 25, 2020
Cite
Yanan Wang; Huifu Zhuang; Yunguang Sheng; Yuhua Wang; Zhonglang Wang (2020). The Dataset of Camellia Cultivar Names in the World [Dataset]. http://doi.org/10.5281/zenodo.4289784
Unique identifier
https://doi.org/10.5281/zenodo.4289784
Dataset updated
Nov 25, 2020
Authors
Yanan Wang; Huifu Zhuang; Yunguang Sheng; Yuhua Wang; Zhonglang Wang
Area covered
World
Description
The Camellia cultivar names were collected from books, journals and new registrations throughout the world every year, then reviewed by experts on the online working platform, the Database of International Camellia Register. After resolving some important issues in camellia names, especially the many re-used names and diacritical marks (particularly in Japanese cultivars), a dataset of Camellia names covering the years 1253 to 2019 throughout the world was compiled. The data are contained in an Excel file (.xlsx format) with two sheets, entitled ‘Cultivars’ and ‘Synonyms’. The ‘Cultivars’ sheet mainly records the name and description of each cultivar, while the corresponding synonyms can be found in the ‘Synonyms’ sheet. The fields and their descriptions are given below.

Data fields in sheet ‘Cultivars’:
CultivarId: A unique number for each cultivar.
CultivarEpithet: The cultivar epithet for each cultivar.
ScientificName: The scientific name for each cultivar.
ChineseName: The Chinese name for each cultivar.
JapaneseName: The Japanese name for each cultivar.
Hiragana: The phonetic sounds in Japanese for each cultivar.
SpeciesOrCombination: The cultivar's origin or cross parentage.
Meaning: The explanation of the name.
CultivarType: The type of economic value: for ornamental, for tea, or for oil.
DescriptionEn: The English description for each cultivar.
DescriptionCn: The Chinese description for each cultivar.
DescriptionJp: The Japanese description for each cultivar.
YearPublished: The year of first publication.
Country: The name of the country that released the cultivar.
DefaultPhoto: The type image.
DefaultPhotoChosenBy: The name of the specialist who determined the type image.
DefaultPhotoChosenDate: The date on which the specialist chose the type image.
IsExtinct: Whether the cultivar is extinct or not.

Data fields in sheet ‘Synonyms’:
SynonymId: A unique number for each cultivar synonym.
Synonym: The synonym used for each cultivar.
Reference: The reference in which the synonym was recorded.
CultivarEpithet: The corresponding cultivar epithet for the synonym.
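Because both sheets carry a CultivarEpithet field, synonyms can presumably be attached to cultivars by matching on it. The Python sketch below shows one way to do that on rows already loaded as dictionaries. The helper name and all sample values are invented placeholders, and using CultivarEpithet as the join key is an assumption based only on the field list; loading the .xlsx file itself would need a spreadsheet library and is not shown.

```python
from collections import defaultdict


def synonyms_by_cultivar(cultivars, synonyms):
    """Map each CultivarId to the list of synonyms sharing its CultivarEpithet."""
    by_epithet = defaultdict(list)
    for row in synonyms:
        by_epithet[row["CultivarEpithet"]].append(row["Synonym"])
    return {
        cultivar["CultivarId"]: by_epithet.get(cultivar["CultivarEpithet"], [])
        for cultivar in cultivars
    }


# Placeholder rows using the documented field names (values are invented).
cultivars = [
    {"CultivarId": 1, "CultivarEpithet": "Example A"},
    {"CultivarId": 2, "CultivarEpithet": "Example B"},
]
synonyms = [
    {"SynonymId": 10, "Synonym": "Alias A1", "Reference": "placeholder",
     "CultivarEpithet": "Example A"},
    {"SynonymId": 11, "Synonym": "Alias A2", "Reference": "placeholder",
     "CultivarEpithet": "Example A"},
]

result = synonyms_by_cultivar(cultivars, synonyms)
print(result)
```

Cultivars without any recorded synonym map to an empty list, so every CultivarId appears in the result.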
Datasets for Out-of-KB Mention Discovery with Entity Linking
data.niaid.nih.gov
zenodo.org
Updated Aug 10, 2023
Cite
Dong, Hang (2023). Datasets for Out-of-KB Mention Discovery with Entity Linking [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8228370
Dataset updated
Aug 10, 2023
Dataset provided by
Horrocks, Ian
Yinan, Liu
He, Yuan
Chen, Jiaoyan
Dong, Hang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The repository contains datasets for out-of-KB mention discovery from texts, documented in the work, Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking, on arXiv: https://arxiv.org/abs/2302.07189 (CIKM 2023).

Each data setting (as a sub-folder) contains train, valid, and test files and also 100 random sample files for each data split for debugging.

Data folder names with “syn_full” at the end are synonym augmented data (each synonym as an entity) for the setting.

Ontology .jsonl files come in two versions each: the "syn_attr" setting treats synonyms as attributes, and the "syn_full" setting treats synonyms as entities.

Data scripts are available at https://github.com/KRR-Oxford/BLINKout#data-scripts

Acknowledgement of the data sources below:

ShARe/CLEF 2013 dataset is from https://physionet.org/content/shareclefehealth2013/1.0/

MedMention dataset is from https://github.com/chanzuckerberg/MedMentions

UMLS (versions 2012AB, 2014AB, 2017AA) is from https://www.nlm.nih.gov/research/umls/index.html

SNOMED CT (corresponding versions) is from https://www.nlm.nih.gov/healthit/snomedct/index.html

NILK dataset is from https://zenodo.org/record/6607514

WikiData 2017 dump is from https://archive.org/download/enwiki-20170220/enwiki-20170220-pages-articles.xml.bz2
