89 datasets found

1000+ Data Science Concepts
kaggle.com
zip
Updated Mar 23, 2024
serdar altan (2024). 1000+ Data Science Concepts [Dataset]. https://www.kaggle.com/datasets/hserdaraltan/1000-data-science-concepts
Available download formats: zip (121402 bytes)
Dataset updated
Mar 23, 2024
Authors
serdar altan
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset covers more than 1000 common data science concepts. It covers several topics related to statistics, machine learning, and artificial intelligence. It has two columns, one of which is questions or instructions, the other is responses to these instructions. The dataset can be used in Q&A and text generation.
National concept directory in National data catalogue
data.norge.no
json
Updated Oct 9, 2025
Digitaliseringsdirektoratet (2025). National concept directory in National data catalogue [Dataset]. https://data.norge.no/en/datasets/8fbe9c6d-4962-3362-9952-62d9d7ce17bf/national-concept-directory-in-national-data-catalogue
Available download formats: json
Dataset updated
Oct 9, 2025
Dataset provided by
Digitaliseringsdirektoratet
Description
The data set "National concept directory in National data catalogue" (Begrepskatalog i Felles datakatalog) contains all terms published in National concept directory in National data catalogue. Each term contains at least information about the recommended term, definition and source of definition. The terms may also include the following information if the owner of the concept has provided such information: additional information about the meaning of the term that does not belong in the definition field; permitted and advised term, example on use of the term, subject area the term belongs to, area of application, legal categories or value ranges of the term, the date the term is valid from, the date the term shall apply to and contact information by e-mail and telephone.

Objective: To make all concepts in the National concept directory in National data catalogue available for downloading
Multilingual Concept Dictionary
opendata.pku.edu.cn
Updated Jan 11, 2018
Peking University Open Research Data Platform (2018). Multilingual Concept Dictionary [Dataset]. http://doi.org/10.18170/DVN/JAU6RB
Available download formats: application/msaccess (2478080), pdf (122919)
Unique identifier
https://doi.org/10.18170/DVN/JAU6RB
Dataset updated
Jan 11, 2018
Dataset provided by
Peking University Open Research Data Platform
Description
(1) The Chinese Concept Dictionary (CCD) provides Chinese equivalents for the English concepts in WordNet 1.6. The total number of concepts is close to 100,000 (the total number of words far exceeds 100,000), including 66,025 noun concepts, 12,127 verb concepts, 17,915 adjective concepts and 3,575 adverb concepts. Use rights have been transferred to a number of research institutes and multinational corporations, which has promoted progress in Chinese-English semantic analysis. (2) The Multilingual Concept Dictionary (MCD), based on CCD, Japanese WordNet, Korean WordNet and CoreNet, is built by automatic methods followed by manual expert checking. Currently, the multilingual concept dictionary contains 8,400 Japanese concepts (mainly mid- and high-level concepts) and 9,700 Korean concepts (also mid- and high-level concepts), each linked to CCD concepts. Under the WordNet framework, the basic concepts of the East Asian languages (Chinese, Japanese and Korean) are broadly described. (3) Please log in to download the data files.
Data from: Terminological dictionary of artificial intelligence
live.european-language-grid.eu
binary format
Updated Nov 25, 2022
(2022). Terminological dictionary of artificial intelligence [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20833
Available download formats: binary format
Dataset updated
Nov 25, 2022
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The terminological dictionary was compiled within the framework of the project Development of Slovene in the Digital Environment. It is an example collection of 413 terms from the field of artificial intelligence, especially from the subfields of machine learning, computer vision, natural language processing, and fuzzy logic. Definitions, English equivalents, and possible synonyms are added to the terms. The dictionary is based on a conceptual approach, according to which terms are perceived as designations for concepts that are related to each other in the conceptual system of the subject field. Consequently, the terms are interrelated in the naming system of the subject field. The dictionary is distributed in XML using the TBX (TermBase eXchange) standard for representing and exchanging information from termbases.
Data from: TOWARDS A SYSTEMATIZED DIACHRONIC TERMINOLOGY: SOME BASIC...
scielo.figshare.com
tiff
Updated Jun 1, 2023
Beatriz Curti-Contessoto (2023). TOWARDS A SYSTEMATIZED DIACHRONIC TERMINOLOGY: SOME BASIC CONCEPTS IN FOCUS [Dataset]. http://doi.org/10.6084/m9.figshare.21835336.v1
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.21835336.v1
Dataset updated
Jun 1, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Beatriz Curti-Contessoto
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT Diachrony is not always an aspect discussed by terminological studies that can be based on different perspectives. Among them, Diachronic Terminology (DT), conceived as an approach that has the main characteristic of focusing on this aspect, has received more attention in recent decades by several works, whose common characteristic is the adoption of a terminological-diachronic approach. The diversity of this research is enormous, which, according to Dury and Picton (2009), generates vague and imprecise theoretical and methodological contours in Terminology. To contribute to an organization of these contours in Brazilian Portuguese, since, in our country, studies (especially the theoretical ones) in this regard are still incipient, this paper presents an overview of international and national research by highlighting their main characteristics in terms of contribution and basic conceptions. Based on this panorama, the use of some terms in these studies that refer to the phenomena analyzed, and their theoretical and methodological implications are discussed. Thus, it is hoped that this work may arouse more interest in this approach and, going further, it can serve as an initial guide, as it discusses some paths that can be followed by investigations to be developed especially in Brazil.
Armenian-Russian-English Dictionary of Forest Terminology - Dataset - Data...
data.opendata.am
Updated Jul 8, 2023
+ more versions
(2023). Armenian-Russian-English Dictionary of Forest Terminology - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/recc-95f59cfffe224b46a641db49186efe7e
Dataset updated
Jul 8, 2023
Area covered
Armenia
Description
The dictionary includes about 1650 terms and concepts (in Armenian, Russian and English) used in forest and landscaping sectors with a brief explanation in Armenian. Citation: J.H. Vardanyan, H.T. Sayadyan, Armenian-Russian-English Dictionary of Forest Terminology, Publishing House of the Institute of Botany of NAS RA, Yerevan, 2008.
RxNorm Attributes Data for Concepts and Atoms
johnsnowlabs.com
csv
Updated Jan 20, 2021
+ more versions
John Snow Labs (2021). RxNorm Attributes Data for Concepts and Atoms [Dataset]. https://www.johnsnowlabs.com/marketplace/rxnorm-attributes-data-for-concepts-and-atoms/
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
United States
Description
This dataset contains all of the attribute data. This includes RXNORM provided attributes, such as normalized 11-digit National Drug Codes (NDCs), UNII codes, and human or veterinary usage markers, and source-provided attributes, such as labeler, definition, and imprint information. Each attribute has an 'Attribute Name' (ATN) and 'Attribute Value' (ATV) combination. For example, NDCs have an ATN of 'NDC' and an ATV of the actual NDC value.
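The ATN/ATV pairing described above can be illustrated with a minimal sketch. The sample rows, the `RXCUI` column name, and the helper `values_for_attribute` are assumptions made for illustration; the actual file layout may differ.

```python
import csv
import io

# Hypothetical sample rows; the real column names and values may differ.
SAMPLE = """RXCUI,ATN,ATV
0001,NDC,00000000001
0001,LABELER,Example Labeler
0002,NDC,00000000002
"""


def values_for_attribute(csv_text, attribute_name):
    """Return (RXCUI, ATV) pairs for rows whose ATN equals attribute_name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["RXCUI"], row["ATV"])
        for row in reader
        if row["ATN"] == attribute_name
    ]


# Select only the National Drug Code attributes.
ndc_pairs = values_for_attribute(SAMPLE, "NDC")
```

Filtering on the attribute name keeps one generic table usable for every attribute type, which is the point of the name/value design.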
Sex Education Discourse: US UK Ngrams (1922-2022)
kaggle.com
zip
Updated Jul 2, 2025
Shah Mohammad Rizvi (2025). Sex Education Discourse: US UK Ngrams (1922-2022) [Dataset]. https://www.kaggle.com/datasets/shahmohammadrizvi/sex-education-discourse-us-uk-ngrams-1922-2022
Available download formats: zip (759733 bytes)
Dataset updated
Jul 2, 2025
Authors
Shah Mohammad Rizvi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States, United Kingdom
Description
A Century of Sex Education Discourse: Ngram Frequencies (US & UK, 1922-2022)

1. Dataset Overview

This dataset provides historical linguistic frequency data related to sex education discourse in British English and American English from 1922 to 2022. Frequencies were extracted from the Google Ngram Viewer (English-GB and English-US corpora, 2019 version) for terms systematically categorized into four distinct conceptual groups. This dataset aims to support research into the evolution of public discourse, pedagogical approaches, and cultural attitudes surrounding sex education over the past century.

2. Creator / Author(s)

Creator: Shah Mohammad Rizvi

Affiliation: N/A

Contact: smri29.ml@gmail.com

3. Date Information

Date Created: June 30, 2025

Last Updated: July 2, 2025

4. Data Source

The data in this dataset was extracted from the Google Ngram Viewer.

Corpora Version: 2019 (English-GB, English-US)

URL: https://books.google.com/ngrams

5. Data Collection Methodology

Ngram frequency data was programmatically extracted from the Google Ngram Viewer by accessing generated HTML pages, which contain embedded JSON data. A custom Python script was used to parse the HTML, extract the time-series frequency data for specific terms, and consolidate it into a structured CSV format. Ngram Viewer smoothing was uniformly set to 3 for all queries to mitigate year-to-year fluctuations.

6. Coverage

Time Period: 1922 - 2022 (inclusive)

Languages/Corpora:

British English (Google Ngram's English (GB) corpus)

American English (Google Ngram's English (US) corpus)

7. Term Groups and Specific Terms Queried

Terms were carefully selected and grouped to analyze different facets of sex education discourse. Each group's terms were queried individually or as grouped queries where indicated (e.g., using (All) quantifier in Ngram Viewer).

group01_Primary Discourse Terms (Foundational Concepts)

These terms represent the central, overarching, and foundational concepts that define or are core to the public conversation surrounding sex education.

sex education

reproductive health

sexual health

contraception

abstinence

consent

STD

STI

group02_Biological & Reproductive Terms

This group includes vocabulary related to human anatomy, physiological processes, and biological aspects often discussed in the context of sex education.

puberty

menstruation

vagina

penis

reproduction

sperm

ovulation

group03_Evolving Discourse Terms

These terms reflect contemporary understandings, progressive approaches, inclusivity, and specific modern public health concerns that have gained significant prominence in later decades of the discourse.

LGBTQ

gender identity

sexual orientation

body autonomy

safe sex

HIV prevention

AIDS education

group04_Historical Terms

This group contains vocabulary that was more prevalent in earlier periods, reflecting older approaches, euphemisms, or terms whose primary usage or connotations have significantly shifted over the past century.

venereal disease

chastity

morality

family planning

the pill

prophylactic

8. File Structure

The dataset is organized as follows:

sex_education_final_combined_dataset.csv: This file contains all Ngram frequency data for both British and American English, encompassing all terms from all four groups, consolidated into a single DataFrame.

Sex_ED_UK/: Directory containing individual CSV files for each term group relevant to the British English corpus.

group01_Primary Discourse Terms.csv

group02_Biological & Reproductive Terms.csv

group03_Evolving Discourse Terms.csv

group04_Historical Terms.csv

Sex_ED_USA/: Directory containing individual CSV files for each term group relevant to the American English corpus.

group01_Primary Discourse Terms.csv

group02_Biological & Reproductive Terms.csv

group03_Evolving Discourse Terms.csv

group04_Historical Terms.csv

README.md: This metadata file.

9. Column Descriptions

All CSV files (individual and combined) share the following columns:

Year: Integer - The year of publication of the texts from which the Ngram frequencies were calculated (ranging from 1922 to 2022).

Term: String - The specific Ngram term or phrase for which the frequency is provided.

Frequency: Float - The relative frequency of the Term in the Corpus for that Year. This is a proportion of the total number of Ngrams for that year.

Corpus: String - The Google Ngram corpus from which the data was extracted (British English or American English).

TermGroup: String - The conceptu...
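As a rough sketch of how the columns listed above might be used, the following averages a term's relative frequency per corpus. The sample rows and their values are invented for illustration (including the `TermGroup` values), and `mean_frequency_by_corpus` is a hypothetical helper, not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using the documented column names.
SAMPLE = """Year,Term,Frequency,Corpus,TermGroup
1922,sex education,0.000001,British English,group01
1922,sex education,0.000003,American English,group01
1923,sex education,0.000002,British English,group01
"""


def mean_frequency_by_corpus(csv_text, term):
    """Average the Frequency column per Corpus for one Term."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Term"] == term:
            totals[row["Corpus"]] += float(row["Frequency"])
            counts[row["Corpus"]] += 1
    return {corpus: totals[corpus] / counts[corpus] for corpus in totals}


means = mean_frequency_by_corpus(SAMPLE, "sex education")
```

To run this against the real data, the in-memory sample would be replaced by the contents of sex_education_final_combined_dataset.csv, assuming its header matches the column names documented above.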
Data from: TermFrame: Terms, definitions and semantic annotations for...
live.european-language-grid.eu
binary format
Updated Nov 17, 2021
(2021). TermFrame: Terms, definitions and semantic annotations for karstology [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20243
Available download formats: binary format
Dataset updated
Nov 17, 2021
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The resource contains several datasets containing domain-specific data in three languages, English, Slovenian and Croatian, which can be used for various knowledge extraction or knowledge modelling tasks. The resource represents knowledge for the domain of karstology, a subfield of geography studying karst and related phenomena. It contains:

Definitions Plain text files contain definitions of karst concepts from relevant glossaries and encyclopaedia, but also definitions which had been extracted from domain-specific corpora.

Annotated definitions Definitions were manually annotated and curated in the WebAnno tool. Annotations include several layers including definition elements, semantic relations following the frame-based theory of terminology (FBT), relation definitors which can be used for learning relation patterns, and semantic categories defined in the domain model.

Terms, definitions and sources The TermFrame knowledge base contains terms and their corresponding concept identifiers, definitions and definition sources.
Annotated Terms of Service of 100 Online Platforms
data.mendeley.com
Updated Dec 12, 2023
+ more versions
Przemyslaw Palka (2023). Annotated Terms of Service of 100 Online Platforms [Dataset]. http://doi.org/10.17632/dtbj87j937.3
Unique identifier
https://doi.org/10.17632/dtbj87j937.3
Dataset updated
Dec 12, 2023
Authors
Przemyslaw Palka
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains information about the contents of 100 Terms of Service (ToS) of online platforms. The documents were analyzed and evaluated from the point of view of the European Union consumer law. The main results have been presented in the table titled "Terms of Service Analysis and Evaluation_RESULTS." This table is accompanied by the instruction followed by the annotators, titled "Variables Definitions," allowing for the interpretation of the assigned values. In addition, we provide the raw data (analyzed ToS, in the folder "Clear ToS") and the annotated documents (in the folder "Annotated ToS," further subdivided).

SAMPLE: The sample contains 100 contracts of digital platforms operating in sixteen market sectors: Cloud storage, Communication, Dating, Finance, Food, Gaming, Health, Music, Shopping, Social, Sports, Transportation, Travel, Video, Work, and Various. The selected companies' main headquarters span four legal surroundings: the US, the EU, Poland specifically, and Other jurisdictions. The chosen platforms are both privately held and publicly listed and offer both fee-based and free services. Although the sample cannot be treated as representative of all online platforms, it nevertheless accounts for the most popular consumer services in the analyzed sectors and contains a diverse and heterogeneous set.

CONTENT: Each ToS has been assigned the following information: 1. Metadata: 1.1. the name of the service; 1.2. the URL; 1.3. the effective date; 1.4. the language of ToS; 1.5. the sector; 1.6. the number of words in ToS; 1.7–1.8. the jurisdiction of the main headquarters; 1.9. if the company is public or private; 1.10. if the service is paid or free. 2. Evaluative Variables: remedy clauses (2.1–2.5); dispute resolution clauses (2.6–2.10); unilateral alteration clauses (2.11–2.15); rights to police the behavior of users (2.16–2.17); regulatory requirements (2.18–2.20); and various (2.21–2.25). 3. Count Variables: the number of clauses seen as unclear (3.1) and the number of other documents referred to by the ToS (3.2). 4. Pull-out Text Variables: rights and obligations of the parties (4.1) and descriptions of the service (4.2).

ACKNOWLEDGEMENT: The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021, project no. 2020/37/K/HS5/02769, titled “Private Law of Data: Concepts, Practices, Principles & Politics.”
Data from: SSHOC Multilingual Data Stewardship Terminology
dspace-clarin-it.ilc.cnr.it
Updated Dec 31, 2021
Francesca Frontini; Federica Gamba; Monica Monachini; Daan Broeder (2021). SSHOC Multilingual Data Stewardship Terminology [Dataset]. https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-567
Dataset updated
Dec 31, 2021
Authors
Francesca Frontini; Federica Gamba; Monica Monachini; Daan Broeder
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The SSHOC Multilingual Data Stewardship Terminology is a multilingual terminology that collects terms specific to the domain of Data Stewardship, as well as their definitions. A list of domain-specific terms was automatically extracted from a corpus pertaining to the domain of Data Stewardship and Curation, validated by domain experts, assigned a definition, and linked to other existing terminologies (Loterre Open Science Thesaurus, terms4FAIRskills, Linked Open Vocabularies, ISO terms and definitions). Each term-definition pair was then automatically translated into multiple languages (Dutch, French, German, Greek, Italian, Slovenian) by employing Deep-L. The Multilingual Data Stewardship Terminology thus consists of 210 concepts available in Dutch, French, German, Greek, Italian, Slovenian. This resource was created within the frame of the SSHOC (Social Sciences and Humanities Open Cloud) project (H2020-INFRAEOSC-2018-2-823782). It is the result of the work of Task 3.1.2 "extraction of terminology from technical documentation about standards and interoperability", as described in D3.9, carried out jointly by ILC-CNR and CLARIN ERIC.
Public Understanding of Common Data Concepts
data.niaid.nih.gov
Updated Jun 30, 2024
O'Grady, Michael; Mangina, Eleni (2024). Public Understanding of Common Data Concepts [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11073209
Dataset updated
Jun 30, 2024
Dataset provided by
University College Dublin
Authors
O'Grady, Michael; Mangina, Eleni
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset reports on a survey on awareness amongst the European public of common data concepts, terms and principles.

Elements of this survey were utilised in the following publication -

O’Grady, M., Mangina, E. Citizen scientists—practices, observations, and experience. Humanit Soc Sci Commun 11, 469 (2024). https://doi.org/10.1057/s41599-024-02966-x
Data from: Slovenian Definition Extraction evaluation datasets RSDO-def 1.0
live.european-language-grid.eu
binary format
Updated May 18, 2023
(2023). Slovenian Definition Extraction evaluation datasets RSDO-def 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/21588
Available download formats: binary format
Dataset updated
May 18, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Slovene Definition Extraction evaluation datasets RSDO-def contain sentences extracted from the Corpus of term-annotated texts RSDO5 1.1 (http://hdl.handle.net/11356/1470), which contains texts with annotated terms from four different domains: biomechanics, linguistics, chemistry, and veterinary science. The file and sentence identifiers are the same as in the original RSDO corpus.

The labels added to the sentences included in the dataset denote:
0: Non-definition
1: Weak definition
2: Definition

The dataset consists of two parts: 1. RSDO-def-random employed a random sampling strategy, with 14 definitions, 98 weak-definitions and 849 non-definitions. 2. RSDO-def-larger added sentences to the random one by the pattern-based definition extraction as presented in Pollak et al. (2014). It contains 169 definitions, 214 weak-definitions and 872 non-definitions.
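The class balance implied by these counts can be worked out with a short sketch. The counts come from the description above; the dictionary keys and the helper `label_shares` are illustrative names only.

```python
# Label counts as stated in the dataset description.
COUNTS = {
    "RSDO-def-random": {"definition": 14, "weak_definition": 98, "non_definition": 849},
    "RSDO-def-larger": {"definition": 169, "weak_definition": 214, "non_definition": 872},
}


def label_shares(label_counts):
    """Return the total sentence count and each label's share of that total."""
    total = sum(label_counts.values())
    shares = {label: n / total for label, n in label_counts.items()}
    return total, shares


for part, label_counts in COUNTS.items():
    total, shares = label_shares(label_counts)
    summary = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
    print(f"{part}: {total} sentences ({summary})")
```

The random part totals 961 sentences and the larger part 1255, so full definitions are a small minority in the random sample and considerably more frequent after pattern-based extraction.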

Both parts were manually annotated by five terminographers. In case of discrepancies between annotators, a consensus was reached and the final label was confirmed by all five annotators. Duplicates were removed in both parts.

The criteria for annotation are based on the standard ISO 1087-1:2000 (E/F) Terminology Work - Vocabulary, Part 1, Theory and Application, which explains a definition as follows: "Representation of a concept by a descriptive statement which serves to differentiate it from related concepts". Weak definition labels were assigned if the extracted sentences contained a term and at least one delimiting feature without a superordinate concept, or sentences consisting of superordinate concepts without delimiting features but with some typical examples. Instances were labeled as Non-definition if the sentence with the extracted concept did not contain any information about the concept or its delimiting features.

The dataset is described in more detail in Tran et al. 2023, where it was used for evaluating definition extraction approaches. If you use this resource, please cite:

Tran, T.H.H., Podpečan, V., Jemec Tomazin, M., Pollak, Senja (2023). Definition Extraction for Slovene: Patterns, Transformer Classifiers and ChatGPT. Proceedings of the ELEX 2023: Electronic lexicography in the 21st century. Invisible lexicography: everywhere lexical data is used without users realizing they make use of a “dictionary” (accepted)

Reference to the pattern-based definition extraction method used for creating RSDO-def-larger: Pollak, S. (2014). Extracting definition candidates from specialized corpora. Slovenščina 2.0: empirical, applied and interdisciplinary research, 2(1), pp. 1–40. https://doi.org/10.4312/slo2.0.2014.1.1-40

Related resources:

Jemec Tomazin, M. et al. (2021). Corpus of term-annotated texts RSDO5 1.1, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1470.

Podpečan et al. (2023). DF_NDF_wiki_slo: Definition extraction training sets from Wikipedia, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1840.
ClinSpEn Data: Parallel English-Spanish COVID-19 Clinical Cases, Terminology...
zenodo.org
zip
Updated Mar 9, 2023
+ more versions
Salvador Lima; Darryl Johan; Martin Krallinger (2023). ClinSpEn Data: Parallel English-Spanish COVID-19 Clinical Cases, Terminology and Ontology Concepts [Dataset]. http://doi.org/10.5281/zenodo.7014645
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7014645
Dataset updated
Mar 9, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Salvador Lima; Darryl Johan; Martin Krallinger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ClinSpEn

This repository contains the sample, test and background data for the ClinSpEn track.

ClinSpEn is part of the Biomedical WMT 2022 shared task, having the aim to promote the development and evaluation of machine translation systems adapted to the medical domain with three highly relevant sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.

Data Description

ClinSpEn proposes three different sub-tracks, each based on a different type of clinical data:

- Clinical Cases:

Parallel EN-ES COVID-19 clinical cases. The direction of this sub-track is EN>ES.

The dataset’s case reports were carefully selected to cover a wide range of aspects related to the disease: different types of patients (children, adults, elderly and pregnant people, babies), different comorbidities (cancer, mental health issues, immunosuppressed patients) and symptomatology (mild and severe presentations, dermatologic, immunologic and psychiatric manifestations, thrombosis, …). The reports were translated from English to Spanish by a professional medical translator on a first step and revised by a clinical expert on a second step.

The sample set is made up of parallel txt files, with the Spanish files having a “.es” extension and the English files having a “.en” extension. Each report has been aligned so that every sentence has the same line number in both languages.
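Because the two versions of each report share line numbers, sentence pairs can be recovered by reading both files in step. The sketch below works on in-memory strings with invented example sentences; `pair_lines` is a hypothetical helper, and real use would read the “.en” and “.es” files from disk instead.

```python
def pair_lines(en_text, es_text):
    """Pair line i of the English text with line i of the Spanish text."""
    en_lines = en_text.splitlines()
    es_lines = es_text.splitlines()
    if len(en_lines) != len(es_lines):
        # Line-aligned files must have the same number of lines.
        raise ValueError(
            f"line count mismatch: {len(en_lines)} English vs {len(es_lines)} Spanish"
        )
    return list(zip(en_lines, es_lines))


# Invented two-sentence report used only to demonstrate the pairing.
EN_SAMPLE = "The patient had a fever.\nShe recovered after a week.\n"
ES_SAMPLE = "La paciente tenía fiebre.\nSe recuperó después de una semana.\n"

pairs = pair_lines(EN_SAMPLE, ES_SAMPLE)
```

Checking the line counts first is a cheap guard: a single missing line would otherwise shift every later pair silently.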

The test and background data is made up of a TSV file with three columns: document number, line number and English line. The clinical cases themselves include COVID-19 case reports as well as diverse content extracted from PubMed.

- Clinical Terminology:

Parallel EN-ES clinical terms extracted from medical literature and clinical records, with particular focus on diseases, symptoms, findings, procedures and professions and translated and revised by professional medical translators. The direction of this sub-track is ES>EN.

The sample set contains 7 000 terms as a tab-separated file (TSV), with the first column corresponding to English terms and the second column to Spanish terms.

The test and background data is made up of a TSV file with two columns: term number and Spanish term.

- Ontology Concepts:

Parallel EN-ES concepts extracted from various open biomedical ontologies and taxonomies and then manually translated by a professional medical translator. The direction of this sub-track is EN>ES.

The sample data includes 400 concepts. The terms are presented as tab-separated file (TSV), with the first column corresponding to English terms and the second column to Spanish terms. The third column includes the term’s origin ontology and its correspondent ID, while the fourth one includes a link to the concept in OBO Library.

The test and background data is made up of a TSV file with two columns: concept number and English concept.

Related Links:

- Sub-track website with more information: https://temu.bsc.es/clinspen/

- WMT website: https://www.statmt.org/wmt22/

- CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696
Data from: ANALYSIS OF CONCEPTUAL PATTERNS AND INTRATERM RELATIONSHIPS OF...
scielo.figshare.com
tiff
Updated May 30, 2023
Lucimara Alves da Conceição Costa (2023). ANALYSIS OF CONCEPTUAL PATTERNS AND INTRATERM RELATIONSHIPS OF TERMINOLOGICAL VARIANTS IN ECONOMICS [Dataset]. http://doi.org/10.6084/m9.figshare.21835266.v1
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.21835266.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Lucimara Alves da Conceição Costa
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT Denominative variation in terminology, that is, the use of different names to designate the same concept or nuances of the same conceptual reality, is often considered as a mere stylistic resource or a strategy of thematic progression. It can present, however, distinct conceptual patterns and distinct intra-term relations, which means that the units are not always semantically equivalent. Thus, much more than a thematic progression mechanism, variants act as a discursive and cognitive resource to highlight different conceptual nuances of terminological units. In this sense, in this article, based on the assumptions of modern trends in Terminology, in particular on the Communicative Theory of Terminology (CABRÉ, 1999, 2005) and on the classification of conceptual specification patterns by Kageura (2002), we aim to analyze conceptual patterns and intra-term relations present in terminological variants of Economics. Through this analysis, we intend to show which conceptual information is highlighted in these terminological units and how they can influence the understanding and construction of specialized knowledge.
Data from: Talking Glossary of Genetic Terms
neuinfo.org
dknet.org
+2 more
Updated Oct 21, 2009
(2009). Talking Glossary of Genetic Terms [Dataset]. http://identifiers.org/RRID:SCR_003215
Unique identifier
https://identifiers.org/RRID:SCR_003215
Dataset updated
Oct 21, 2009
Description
Glossary of Genetic Terms to help everyone understand the terms and concepts used in genetic research. In addition to definitions, specialists in the field of genetics share their descriptions of terms, and many terms include images, animation and links to related terms.
Conceptual novelty scores for PubMed articles
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Updated Feb 1, 2024
Shubhanshu Mishra; Vetle I. Torvik (2024). Conceptual novelty scores for PubMed articles [Dataset]. http://doi.org/10.13012/B2IDB-5060298_V1
Unique identifier
https://doi.org/10.13012/B2IDB-5060298_V1
Dataset updated
Feb 1, 2024
Authors
Shubhanshu Mishra; Vetle I. Torvik
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
U.S. National Institutes of Health (NIH)
U.S. National Science Foundation (NSF)
Description
Conceptual novelty analysis data based on PubMed Medical Subject Headings
----------------------------------------------------------------------
Created by Shubhanshu Mishra and Vetle I. Torvik on April 16th, 2018

## Introduction
This is a dataset created as part of the publication titled: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : the magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra. It contains final data generated as part of our experiments based on MEDLINE 2015 baseline and MeSH tree from 2015. The dataset is distributed in the form of the following tab separated text files:

* PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
  - PMID: PubMed ID
  - Year: year of publication
  - TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
  - VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
  - PairTimeNovelty: time novelty score of the paper based on pair of concepts (see paper)
  - PairVolumeNovelty: volume novelty score of the paper based on pair of concepts (see paper)
* mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
  - MeshTerm: Name of the MeSH term
  - Year: year
  - AbsVal: Total publications with that MeSH term in the given year
  - TimeNovelty: age (in years since first publication) of MeSH term in the given year
  - VolumeNovelty: age (in number of papers since first publication) of MeSH term in the given year
* meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each MeSH pair for all years, with the following columns:
  - Mesh1: Name of the first MeSH term (alphabetically sorted)
  - Mesh2: Name of the second MeSH term (alphabetically sorted)
  - Year: year
  - AbsVal: Total publications with that MeSH pair in the given year
  - TimeNovelty: age (in years since first publication) of MeSH pair in the given year
  - VolumeNovelty: age (in number of papers since first publication) of MeSH pair in the given year
* README.txt file

## Dataset creation
This dataset was constructed using multiple datasets described in the following locations:
* MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
* MeSH tree 2015: ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/
* Source code provided at: https://github.com/napsternxg/Novelty

Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information to get PubMed/MEDLINE, and NLM's data Terms and Conditions. Additional data related updates can be found at: Torvik Research Group

## Acknowledgments
This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License
Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/Novelty
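As a rough illustration of how the six-column layout of PubMed2015_NoveltyData.tsv might be read, here is a minimal Python sketch. It relies on several assumptions that the description does not confirm: that the file starts with a header row (controlled by the hypothetical `has_header` flag), that the columns appear in the order listed, and that values can be kept as plain strings. The function names are illustrative and the sample rows are invented, not taken from the dataset.

```python
import csv
import io

# Column order as listed in the dataset description (assumed to match the file).
NOVELTY_COLUMNS = [
    "PMID", "Year", "TimeNovelty", "VolumeNovelty",
    "PairTimeNovelty", "PairVolumeNovelty",
]


def read_novelty_rows(handle, has_header=True):
    """Yield one dict per data row of a tab-separated novelty file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: the first line holds column names
    for fields in reader:
        if len(fields) != len(NOVELTY_COLUMNS):
            continue  # skip blank or malformed lines
        yield dict(zip(NOVELTY_COLUMNS, fields))


def count_papers_by_year(rows):
    """Count rows per value of the Year column (kept as a string)."""
    counts = {}
    for row in rows:
        counts[row["Year"]] = counts.get(row["Year"], 0) + 1
    return counts


# Invented sample rows, for illustration only.
sample = io.StringIO(
    "PMID\tYear\tTimeNovelty\tVolumeNovelty\tPairTimeNovelty\tPairVolumeNovelty\n"
    "101\t1999\t3\t40\t1\t5\n"
    "102\t1999\t0\t0\t0\t0\n"
    "103\t2001\t7\t900\t2\t12\n"
)
print(count_papers_by_year(read_novelty_rows(sample)))
# → {'1999': 2, '2001': 1}
```

Because the real file is described as having 22,349,417 rows, the reader is written as a generator so rows can be processed one at a time instead of being loaded into memory at once.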
Data from: Leveraging Terminology Services for FAIR Semantic Data...
meta4ds.fokus.fraunhofer.de
pdf, unknown
Updated Sep 14, 2023
Cite
Zenodo (2023). Leveraging Terminology Services for FAIR Semantic Data Integration across NFDI Domains - How to Integrate Terminology Services Into Other Service Applications [Dataset]. https://meta4ds.fokus.fraunhofer.de/datasets/oai-zenodo-org-8342678?locale=en
Explore at:
Available download formats: pdf (4984427), unknown
Dataset updated
Sep 14, 2023
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Research Data Infrastructure (NFDI) strives to develop FAIR research data and data services for major scientific disciplines, using terminologies as a key factor for semantic annotations and semantic interoperability of data. Several NFDI consortia provide domain-specific terminologies through terminology services or registries, offering access, search capabilities, visualization, and downloads. Prioritizing user-friendly access, terminology services integrate semantic concepts into applications, often operating in the background to enable smooth semantic annotation and data interoperability. We present exemplary fields of application from selected disciplines and show how terminology services support semantic search, user experience, annotation workflows, terminology curation and design. This presentation is connected to the following conference paper: https://doi.org/10.52825/cordi.v1i.356

Medical Insurance Glossary dataset 💉💉

kaggle.com

zip

Updated Oct 17, 2023

Cite

Shiv_D24Coder (2023). Medical Insurance Glossary dataset 💉💉 [Dataset]. https://www.kaggle.com/datasets/shivd24coder/medical-insurance-glossary-dataset

Explore at:

Available download formats: zip (73527 bytes)

Dataset updated

Oct 17, 2023

Authors

Shiv_D24Coder

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Key Features

Column Name	Description
tags	Tags associated with the glossary entry.
categories	Categories of glossary entries.
topics	Topics related to the glossary entry.
title	The title of the glossary entry.
es-title	The Spanish translation of the title.
url	The URL or link to the glossary entry.
bite	A brief description or explanation of the term in English.
es-bite	The Spanish translation of the term's description.
audience	The intended audience for the glossary entry.
segment	The specific segment this entry relates to.
insurance-status	Information related to insurance status.
state	The state to which the entry pertains.
condition	Any specific conditions associated with the entry.

How to use this dataset

1. Understand Medical Insurance Terminology: Use the glossary to understand and explain common medical insurance terms and concepts.

2. Language Translation: If you're working in a bilingual setting or need translations of medical insurance terms, the Spanish translations provided in this dataset can be invaluable.

3. Educational Resources: Create educational resources, articles, or content related to medical insurance by using the glossary entries.

4. Data Enrichment: Enhance your medical insurance-related datasets or applications with standardized terminology using this glossary.

5. Reference for Medical Professionals: This glossary can serve as a reference for healthcare professionals, insurance agents, and researchers in the field.
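
The language-translation use listed above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it assumes the glossary is distributed as a CSV file with a header row that uses the column names from the table above (`title`, `es-title`, `bite`), which the description does not confirm. The function name `build_translation_lookup` is hypothetical, and the sample rows are invented for the example rather than copied from the dataset.

```python
import csv
import io


def build_translation_lookup(handle):
    """Map each English `title` (lowercased) to its Spanish `es-title`.

    Rows with a missing title or missing translation are skipped.
    """
    lookup = {}
    for row in csv.DictReader(handle):
        title = (row.get("title") or "").strip()
        es_title = (row.get("es-title") or "").strip()
        if title and es_title:
            lookup[title.lower()] = es_title
    return lookup


# Invented sample rows, for illustration only.
sample = io.StringIO(
    "title,es-title,bite\n"
    "Deductible,Deducible,Amount you pay before the plan starts to pay.\n"
    "Premium,Prima,Amount you pay for your plan.\n"
    "Copayment,,A fixed amount you pay for a covered service.\n"
)
lookup = build_translation_lookup(sample)
print(lookup.get("deductible"))  # → Deducible
print("copayment" in lookup)     # → False (no translation in the sample row)
```

Keys are lowercased so that lookups do not depend on how a term is capitalized in running text.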

Unified Medical Language System Terminology Services (UTS) API
data.amerigeoss.org
data.wu.ac.at
api
Updated Jul 28, 2019
Cite
United States (2019). Unified Medical Language System Terminology Services (UTS) API [Dataset]. https://data.amerigeoss.org/sr/dataset/unified-medical-language-system-terminology-services-uts-api
Explore at:
Available download formats: api
Dataset updated
Jul 28, 2019
Dataset provided by
United States
Description
The UTS API is intended for application developers to perform Web service calls and retrieve UMLS data within their own applications. The UTS API provides the ability to search, retrieve, and filter terms, concepts, attributes, relations, metadata and more from over 160 vocabularies of the UMLS Metathesaurus, as well as the Semantic Network. Paging, sorting and filtering (PSF) capabilities allow users to customize the results of Web service calls in many ways: choose to include or exclude specific criteria, sort results by fields, or specify the number of results displayed per page. The documentation provides a suite of Web Services Description Language (WSDL) files, API installation instructions, and sample code.

