89 datasets found
  1. 1000+ Data Science Concepts

    • kaggle.com
    zip
    Updated Mar 23, 2024
    Cite
    serdar altan (2024). 1000+ Data Science Concepts [Dataset]. https://www.kaggle.com/datasets/hserdaraltan/1000-data-science-concepts
    Available download formats: zip (121,402 bytes)
    Dataset updated
    Mar 23, 2024
    Authors
    serdar altan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset covers more than 1,000 common data science concepts across statistics, machine learning, and artificial intelligence. It has two columns: one contains questions or instructions, and the other contains the responses to them. The dataset can be used for question answering (Q&A) and text generation.
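
    The two-column layout can be turned into instruction/response pairs with a short Python sketch like the one below. It is only a sketch: the column names `Question` and `Answer`, the prompt template, and the sample row are hypothetical, so check the real header of the CSV inside the zip before using it.

```python
import csv
import io


def load_pairs(csv_text, question_col="Question", answer_col="Answer"):
    """Read a two-column CSV into (instruction, response) tuples.

    The default column names are hypothetical placeholders; pass the
    real header names from the downloaded file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        question = (row.get(question_col) or "").strip()
        answer = (row.get(answer_col) or "").strip()
        if question and answer:  # skip rows with an empty cell
            pairs.append((question, answer))
    return pairs


def to_prompt(question, answer):
    # Hypothetical instruction-style template; adapt it to your model.
    return f"### Instruction:\n{question}\n\n### Response:\n{answer}"


# Made-up row for illustration, not taken from the dataset.
sample = 'Question,Answer\n"What is overfitting?","Fitting noise in the training data."\n'
pairs = load_pairs(sample)
print(to_prompt(*pairs[0]))
```

    On the real data, pass the text of the extracted CSV and its actual header names as `question_col` and `answer_col`.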

  2. National concept directory in National data catalogue

    • data.norge.no
    json
    Updated Oct 9, 2025
    Cite
    Digitaliseringsdirektoratet (2025). National concept directory in National data catalogue [Dataset]. https://data.norge.no/en/datasets/8fbe9c6d-4962-3362-9952-62d9d7ce17bf/national-concept-directory-in-national-data-catalogue
    Available download formats: json
    Dataset updated
    Oct 9, 2025
    Dataset provided by
    Digitaliseringsdirektoratet
    Description

    The dataset "National concept directory in National data catalogue" (Begrepskatalog i Felles datakatalog) contains all terms published in the National concept directory in the National data catalogue. Each term includes at least the recommended term, its definition, and the source of the definition. Where the owner of the concept has provided it, a term may also include: additional information about the meaning of the term that does not belong in the definition field; permitted and discouraged (advised-against) terms; an example of the term's use; the subject area the term belongs to; the area of application; legal categories or value ranges of the term; the date from which the term is valid; the date until which the term applies; and contact information (e-mail and telephone).

    Objective: To make all concepts in the National concept directory in the National data catalogue available for download.

  3. Multilingual Concept Dictionary

    • opendata.pku.edu.cn
    Updated Jan 11, 2018
    Cite
    Peking University Open Research Data Platform (2018). Multilingual Concept Dictionary [Dataset]. http://doi.org/10.18170/DVN/JAU6RB
    Available download formats: application/msaccess (2478080), pdf (122919)
    Dataset updated
    Jan 11, 2018
    Dataset provided by
    Peking University Open Research Data Platform
    Description

    (1) The Chinese Concept Dictionary (CCD) provides Chinese equivalents for the English concepts in WordNet version 1.6. It contains close to 100,000 concepts (expressed by well over 100,000 words): 66,025 noun concepts, 12,127 verb concepts, 17,915 adjective concepts and 3,575 adverb concepts. Licensing the CCD to a number of research institutes and multinational corporations has helped advance Chinese-English semantic analysis. (2) The Multilingual Concept Dictionary (MCD), based on CCD, Japanese WordNet, Korean WordNet and CoreNet, was built automatically and then checked manually by experts. It currently contains 8,400 Japanese concepts (mainly mid- and high-level concepts in the language) and 9,700 Korean concepts (also mid- and high-level), linked to the corresponding CCD concepts. Within the WordNet framework, it broadly describes the basic concepts of the East Asian languages Chinese, Japanese and Korean. (3) Log in to download the data files.
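
    A quick calculation confirms that the part-of-speech counts add up to the "close to 100,000" total. This small Python check uses only the figures quoted above:

```python
# Concept counts per part of speech, as stated in the CCD description.
ccd_counts = {"noun": 66025, "verb": 12127, "adjective": 17915, "adverb": 3575}

total = sum(ccd_counts.values())
print(total)  # 99642

for pos, count in ccd_counts.items():
    # Share of all concepts as a percentage, rounded to one decimal place.
    print(f"{pos}: {100 * count / total:.1f}%")
```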

  4. Data from: Terminological dictionary of artificial intelligence

    • live.european-language-grid.eu
    binary format
    Updated Nov 25, 2022
    Cite
    (2022). Terminological dictionary of artificial intelligence [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20833
    Available download formats: binary format
    Dataset updated
    Nov 25, 2022
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The terminological dictionary was compiled within the framework of the project Development of Slovene in the Digital Environment. It is a sample collection of 413 terms from the field of artificial intelligence, especially from the subfields of machine learning, computer vision, natural language processing, and fuzzy logic. Each term is accompanied by its definition, its English equivalent, and any synonyms. The dictionary follows a conceptual approach, in which terms are treated as designations for concepts that are related to each other in the conceptual system of the subject field; consequently, the terms are interrelated in the naming system of the subject field. The dictionary is distributed in XML using the TBX (TermBase eXchange) standard for representing and exchanging information from termbases.
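
    A minimal Python sketch for reading TBX entries with the standard library follows. It assumes the TBX 2008 (TBX-Basic style) element layout `martif`/`termEntry`/`langSet`/`tig`/`term`. The actual file may instead use TBX v3 names such as `conceptEntry`, `langSec` and `termSec`, so inspect it first. The sample entry is invented.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample in TBX 2008 (martif) layout; not taken from the dataset.
SAMPLE_TBX = """<martif type="TBX" xml:lang="sl">
  <text><body>
    <termEntry id="c1">
      <langSet xml:lang="sl">
        <descrip type="definition">Placeholder definition.</descrip>
        <tig><term>strojno učenje</term></tig>
      </langSet>
      <langSet xml:lang="en">
        <tig><term>machine learning</term></tig>
      </langSet>
    </termEntry>
  </body></text>
</martif>"""


def read_tbx_entries(xml_text):
    """Return one dict per concept entry, with terms and definitions by language."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("termEntry"):
        terms, definitions = {}, {}
        for lang_set in entry.findall("langSet"):
            lang = lang_set.get(XML_LANG)
            terms[lang] = [t.text for t in lang_set.iter("term")]
            definition = lang_set.find("descrip[@type='definition']")
            if definition is not None:
                definitions[lang] = definition.text
        entries.append({"id": entry.get("id"), "terms": terms,
                        "definitions": definitions})
    return entries


for e in read_tbx_entries(SAMPLE_TBX):
    print(e["id"], e["terms"], e["definitions"])
```

    For the downloaded file, `ET.parse(path).getroot()` reads it directly and honours its XML encoding declaration.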

  5. Data from: TOWARDS A SYSTEMATIZED DIACHRONIC TERMINOLOGY: SOME BASIC...

    • scielo.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Beatriz Curti-Contessoto (2023). TOWARDS A SYSTEMATIZED DIACHRONIC TERMINOLOGY: SOME BASIC CONCEPTS IN FOCUS [Dataset]. http://doi.org/10.6084/m9.figshare.21835336.v1
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Beatriz Curti-Contessoto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Diachrony is not always addressed by terminological studies, which can adopt different perspectives. Among them, Diachronic Terminology (DT), an approach whose main characteristic is its focus on this aspect, has received more attention in recent decades in several works that share a terminological-diachronic approach. These studies are enormously diverse, which, according to Dury and Picton (2009), produces vague and imprecise theoretical and methodological contours in Terminology. To help organize these contours in Brazilian Portuguese, since studies on this subject in Brazil (especially theoretical ones) are still incipient, this paper presents an overview of international and national research, highlighting its main contributions and basic conceptions. Based on this overview, it discusses the use of some terms in these studies that refer to the phenomena analyzed, together with their theoretical and methodological implications. It is hoped that this work may arouse more interest in this approach and, going further, serve as an initial guide, as it discusses some paths that future investigations, especially in Brazil, can follow.

  6. Armenian-Russian-English Dictionary of Forest Terminology - Dataset - Data...

    • data.opendata.am
    Updated Jul 8, 2023
    Cite
    (2023). Armenian-Russian-English Dictionary of Forest Terminology - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/recc-95f59cfffe224b46a641db49186efe7e
    Dataset updated
    Jul 8, 2023
    Area covered
    Armenia
    Description

    The dictionary includes about 1,650 terms and concepts (in Armenian, Russian and English) used in the forestry and landscaping sectors, with a brief explanation in Armenian.

    Citation: J.H. Vardanyan, H.T. Sayadyan, Armenian-Russian-English Dictionary of Forest Terminology, Publishing House of the Institute of Botany of NAS RA, Yerevan, 2008.

  7. RxNorm Attributes Data for Concepts and Atoms

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). RxNorm Attributes Data for Concepts and Atoms [Dataset]. https://www.johnsnowlabs.com/marketplace/rxnorm-attributes-data-for-concepts-and-atoms/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset contains all of the RxNorm attribute data for concepts and atoms. This includes RxNorm-provided attributes, such as normalized 11-digit National Drug Codes (NDCs), UNII codes, and human or veterinary usage markers, as well as source-provided attributes, such as labeler, definition, and imprint information. Each attribute is stored as an 'Attribute Name' (ATN) and 'Attribute Value' (ATV) pair. For example, NDCs have an ATN of 'NDC' and an ATV of the actual NDC value.
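
    ATN/ATV pairs form a long, entity-attribute-value layout, which usually needs regrouping before analysis. A minimal Python sketch follows. It assumes the CSV has `ATN` and `ATV` columns plus an atom identifier column. The name `RXAUI`, borrowed from RxNorm's own RXNSAT file, is an assumption about this CSV, and the identifiers and NDC values are placeholders.

```python
from collections import defaultdict


def group_attributes(rows, id_col="RXAUI"):
    """Collect ATN/ATV pairs into {identifier: {ATN: [ATV, ...]}}.

    Values are kept in lists because one identifier can carry several
    values for the same attribute name (for example, several NDCs).
    The column names are assumptions about the CSV header.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row[id_col]][row["ATN"]].append(row["ATV"])
    # Convert the nested defaultdicts into plain dicts.
    return {key: dict(attrs) for key, attrs in grouped.items()}


# Placeholder rows; the identifiers and NDC values are not real.
rows = [
    {"RXAUI": "A1", "ATN": "NDC", "ATV": "00000000001"},
    {"RXAUI": "A1", "ATN": "NDC", "ATV": "00000000002"},
    {"RXAUI": "A2", "ATN": "NDC", "ATV": "00000000003"},
]
print(group_attributes(rows))
```

    With the real file, `csv.DictReader` over the opened CSV yields rows in exactly this dict form.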

  8. Sex Education Discourse: US UK Ngrams (1922-2022)

    • kaggle.com
    zip
    Updated Jul 2, 2025
    Cite
    Shah Mohammad Rizvi (2025). Sex Education Discourse: US UK Ngrams (1922-2022) [Dataset]. https://www.kaggle.com/datasets/shahmohammadrizvi/sex-education-discourse-us-uk-ngrams-1922-2022
    Available download formats: zip (759,733 bytes)
    Dataset updated
    Jul 2, 2025
    Authors
    Shah Mohammad Rizvi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, United Kingdom
    Description

    A Century of Sex Education Discourse: Ngram Frequencies (US & UK, 1922-2022)

    1. Dataset Overview

    This dataset provides historical linguistic frequency data related to sex education discourse in British English and American English from 1922 to 2022. Frequencies were extracted from the Google Ngram Viewer (English-GB and English-US corpora, 2019 version) for terms systematically categorized into four distinct conceptual groups. This dataset aims to support research into the evolution of public discourse, pedagogical approaches, and cultural attitudes surrounding sex education over the past century.

    2. Creator / Author(s)

    • Creator: Shah Mohammad Rizvi
    • Affiliation: N/A
    • Contact: smri29.ml@gmail.com

    3. Date Information

    • Date Created: June 30, 2025
    • Last Updated: July 2, 2025

    4. Data Source

    The data in this dataset was extracted from the Google Ngram Viewer.

    5. Data Collection Methodology

    Ngram frequency data was programmatically extracted from the Google Ngram Viewer by accessing generated HTML pages, which contain embedded JSON data. A custom Python script was used to parse the HTML, extract the time-series frequency data for specific terms, and consolidate it into a structured CSV format. Ngram Viewer smoothing was uniformly set to 3 for all queries to mitigate year-to-year fluctuations.
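
    The Ngram Viewer describes a smoothing of 3 as averaging each year's raw value with the 3 values on either side, weighted equally. The minimal Python sketch below implements that centered moving average. Truncating the window at the ends of the series is an assumption. Because the frequencies in this dataset are already smoothed, the function only illustrates the effect on raw series and should not be used to re-smooth these files.

```python
def ngram_smooth(values, smoothing=3):
    """Centered moving average in the style of the Ngram Viewer's smoothing.

    Each output point is the unweighted mean of the raw value and up to
    `smoothing` values on either side; near the ends of the series the
    window is simply truncated (an assumption, not Google's documented rule).
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - smoothing)
        hi = min(n, i + smoothing + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A single spike is averaged over its 7-year window when smoothing=3.
print(ngram_smooth([0, 0, 0, 7, 0, 0, 0], smoothing=3))
```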

    6. Coverage

    • Time Period: 1922 - 2022 (inclusive)
    • Languages/Corpora:
      • British English (Google Ngram's English (GB) corpus)
      • American English (Google Ngram's English (US) corpus)

    7. Term Groups and Specific Terms Queried

    Terms were carefully selected and grouped to analyze different facets of sex education discourse. Each group's terms were queried individually, or as grouped queries where indicated (e.g., using the (All) quantifier in the Ngram Viewer).

    group01_Primary Discourse Terms (Foundational Concepts)

    These terms represent the central, overarching, and foundational concepts that define or are core to the public conversation surrounding sex education.

    • sex education
    • reproductive health
    • sexual health
    • contraception
    • abstinence
    • consent
    • STD
    • STI

    group02_Biological & Reproductive Terms

    This group includes vocabulary related to human anatomy, physiological processes, and biological aspects often discussed in the context of sex education.

    • puberty
    • menstruation
    • vagina
    • penis
    • reproduction
    • sperm
    • ovulation

    group03_Evolving Discourse Terms

    These terms reflect contemporary understandings, progressive approaches, inclusivity, and specific modern public health concerns that have gained significant prominence in later decades of the discourse.

    • LGBTQ
    • gender identity
    • sexual orientation
    • body autonomy
    • safe sex
    • HIV prevention
    • AIDS education

    group04_Historical Terms

    This group contains vocabulary that was more prevalent in earlier periods, reflecting older approaches, euphemisms, or terms whose primary usage or connotations have significantly shifted over the past century.

    • venereal disease
    • chastity
    • morality
    • family planning
    • the pill
    • prophylactic

    8. File Structure

    The dataset is organized as follows:

    • sex_education_final_combined_dataset.csv: All Ngram frequency data for both British and American English, covering every term in all four groups, consolidated into a single table.
    • Sex_ED_UK/: Directory containing individual CSV files for each term group relevant to the British English corpus.
      • group01_Primary Discourse Terms.csv
      • group02_Biological & Reproductive Terms.csv
      • group03_Evolving Discourse Terms.csv
      • group04_Historical Terms.csv
    • Sex_ED_USA/: Directory containing individual CSV files for each term group relevant to the American English corpus.
      • group01_Primary Discourse Terms.csv
      • group02_Biological & Reproductive Terms.csv
      • group03_Evolving Discourse Terms.csv
      • group04_Historical Terms.csv
    • README.md: This metadata file.

    9. Column Descriptions

    All CSV files (individual and combined) share the following columns:

    • Year: Integer - The year of publication of the texts from which the Ngram frequencies were calculated (ranging from 1922 to 2022).
    • Term: String - The specific Ngram term or phrase for which the frequency is provided.
    • Frequency: Float - The relative frequency of the Term in the Corpus for that Year. This is a proportion of the total number of Ngrams for that year.
    • Corpus: String - The Google Ngram corpus from which the data was extracted (British English or American English).
    • TermGroup: String - The conceptu...
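
    Using the Year, Term, Frequency, Corpus and TermGroup columns described above, a minimal Python sketch can find each term's peak year per corpus. The exact Corpus and TermGroup strings in the file are assumptions, and the sample rows are made up.

```python
import csv
import io


def peak_years(csv_text, term_group=None):
    """Return {(corpus, term): (year, frequency)} at each term's peak year.

    Assumes the Year, Term, Frequency, Corpus and TermGroup columns;
    term_group optionally restricts the result to one group.
    """
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if term_group is not None and row["TermGroup"] != term_group:
            continue
        key = (row["Corpus"], row["Term"])
        year, freq = int(row["Year"]), float(row["Frequency"])
        if key not in best or freq > best[key][1]:
            best[key] = (year, freq)
    return best


# Made-up rows for illustration only (not values from the dataset).
sample = """Year,Term,Frequency,Corpus,TermGroup
1950,chastity,2.0e-6,British English,group04_Historical Terms
1990,chastity,1.0e-6,British English,group04_Historical Terms
1950,consent,1.0e-6,British English,group01_Primary Discourse Terms
1990,consent,3.0e-6,British English,group01_Primary Discourse Terms
"""
print(peak_years(sample, term_group="group04_Historical Terms"))
```

    For the real data, pass the text of sex_education_final_combined_dataset.csv, read with `newline=""`.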
  9. Data from: TermFrame: Terms, definitions and semantic annotations for...

    • live.european-language-grid.eu
    binary format
    Updated Nov 17, 2021
    Cite
    (2021). TermFrame: Terms, definitions and semantic annotations for karstology [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20243
    Available download formats: binary format
    Dataset updated
    Nov 17, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The resource contains several datasets of domain-specific data in three languages (English, Slovenian and Croatian) that can be used for various knowledge extraction or knowledge modelling tasks. It represents knowledge in the domain of karstology, a subfield of geography that studies karst and related phenomena. It contains:

    1. Definitions: plain-text files containing definitions of karst concepts from relevant glossaries and encyclopaedias, as well as definitions extracted from domain-specific corpora.

    2. Annotated definitions: definitions were manually annotated and curated in the WebAnno tool. The annotations comprise several layers: definition elements, semantic relations following the frame-based theory of terminology (FBT), relation definitors that can be used for learning relation patterns, and semantic categories defined in the domain model.

    3. Terms, definitions and sources: the TermFrame knowledge base contains terms and their corresponding concept identifiers, definitions and definition sources.

  10. Annotated Terms of Service of 100 Online Platforms

    • data.mendeley.com
    Updated Dec 12, 2023
    Cite
    Przemyslaw Palka (2023). Annotated Terms of Service of 100 Online Platforms [Dataset]. http://doi.org/10.17632/dtbj87j937.3
    Dataset updated
    Dec 12, 2023
    Authors
    Przemyslaw Palka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains information about the contents of 100 Terms of Service (ToS) of online platforms. The documents were analyzed and evaluated from the point of view of European Union consumer law. The main results are presented in the table titled "Terms of Service Analysis and Evaluation_RESULTS". This table is accompanied by the instructions followed by the annotators, titled "Variables Definitions", which explain how to interpret the assigned values. In addition, we provide the raw data (the analyzed ToS, in the folder "Clear ToS") and the annotated documents (in the folder "Annotated ToS", further subdivided).

    SAMPLE: The sample contains 100 contracts of digital platforms operating in sixteen market sectors: Cloud storage, Communication, Dating, Finance, Food, Gaming, Health, Music, Shopping, Social, Sports, Transportation, Travel, Video, Work, and Various. The main headquarters of the selected companies span four legal environments: the US, the EU, Poland specifically, and other jurisdictions. The chosen platforms are both privately held and publicly listed, and offer both fee-based and free services. Although the sample cannot be treated as representative of all online platforms, it covers the most popular consumer services in the analyzed sectors and is diverse and heterogeneous.

    CONTENT: Each ToS has been assigned the following information:
    1. Metadata: 1.1 the name of the service; 1.2 the URL; 1.3 the effective date; 1.4 the language of the ToS; 1.5 the sector; 1.6 the number of words in the ToS; 1.7–1.8 the jurisdiction of the main headquarters; 1.9 whether the company is public or private; 1.10 whether the service is paid or free.
    2. Evaluative variables: remedy clauses (2.1–2.5); dispute resolution clauses (2.6–2.10); unilateral alteration clauses (2.11–2.15); rights to police the behavior of users (2.16–2.17); regulatory requirements (2.18–2.20); and various (2.21–2.25).
    3. Count variables: the number of clauses seen as unclear (3.1) and the number of other documents referred to by the ToS (3.2).
    4. Pull-out text variables: rights and obligations of the parties (4.1) and descriptions of the service (4.2).

    ACKNOWLEDGEMENT: The research leading to these results has received funding from the Norwegian Financial Mechanism 2014-2021, project no. 2020/37/K/HS5/02769, titled “Private Law of Data: Concepts, Practices, Principles & Politics.”

  11. Data from: SSHOC Multilingual Data Stewardship Terminology

    • dspace-clarin-it.ilc.cnr.it
    Updated Dec 31, 2021
    Cite
    Francesca Frontini; Federica Gamba; Monica Monachini; Daan Broeder (2021). SSHOC Multilingual Data Stewardship Terminology [Dataset]. https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-567
    Dataset updated
    Dec 31, 2021
    Authors
    Francesca Frontini; Federica Gamba; Monica Monachini; Daan Broeder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SSHOC Multilingual Data Stewardship Terminology is a multilingual terminology that collects terms specific to the domain of Data Stewardship, together with their definitions. A list of domain-specific terms was automatically extracted from a corpus pertaining to the domain of Data Stewardship and Curation, validated by domain experts, assigned a definition, and linked to other existing terminologies (Loterre Open Science Thesaurus, terms4FAIRskills, Linked Open Vocabularies, ISO terms and definitions). Each term-definition pair was then automatically translated into multiple languages (Dutch, French, German, Greek, Italian, Slovenian) using DeepL. The Multilingual Data Stewardship Terminology thus consists of 210 concepts available in Dutch, French, German, Greek, Italian and Slovenian. This resource was created within the framework of the SSHOC (Social Sciences and Humanities Open Cloud) project (H2020-INFRAEOSC-2018-2-823782). It is the result of Task 3.1.2 "extraction of terminology from technical documentation about standards and interoperability", described in D3.9, carried out jointly by ILC-CNR and CLARIN ERIC.

  12. Public Understanding of Common Data Concepts

    • data.niaid.nih.gov
    Updated Jun 30, 2024
    Cite
    O'Grady, Michael; Mangina, Eleni (2024). Public Understanding of Common Data Concepts [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11073209
    Dataset updated
    Jun 30, 2024
    Dataset provided by
    University College Dublin
    Authors
    O'Grady, Michael; Mangina, Eleni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset reports the results of a survey of awareness among the European public of common data concepts, terms and principles.

    Elements of this survey were used in the following publication:

    O’Grady, M., Mangina, E. Citizen scientists—practices, observations, and experience. Humanit Soc Sci Commun 11, 469 (2024). https://doi.org/10.1057/s41599-024-02966-x

  13. Data from: Slovenian Definition Extraction evaluation datasets RSDO-def 1.0

    • live.european-language-grid.eu
    binary format
    Updated May 18, 2023
    Cite
    (2023). Slovenian Definition Extraction evaluation datasets RSDO-def 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/21588
    Available download formats: binary format
    Dataset updated
    May 18, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The Slovene Definition Extraction evaluation datasets RSDO-def contain sentences extracted from the Corpus of term-annotated texts RSDO5 1.1 (http://hdl.handle.net/11356/1470), which contains texts with annotated terms from four domains: biomechanics, linguistics, chemistry, and veterinary science. The file and sentence identifiers are the same as in the original RSDO corpus.

    The labels assigned to the sentences in the dataset are: 0 = non-definition, 1 = weak definition, 2 = definition.

    The dataset consists of two parts:
    1. RSDO-def-random was built with a random sampling strategy and contains 14 definitions, 98 weak definitions and 849 non-definitions.
    2. RSDO-def-larger extends the random part with sentences obtained by the pattern-based definition extraction method presented in Pollak (2014). It contains 169 definitions, 214 weak definitions and 872 non-definitions.
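
    The label scheme and per-part counts above reveal a strong class imbalance, which matters when choosing evaluation metrics. This small Python sketch computes each part's size and class shares from those counts:

```python
# Label scheme from the dataset description.
LABELS = {0: "Non-definition", 1: "Weak definition", 2: "Definition"}

# Per-label sentence counts for each part, as reported in the description.
PARTS = {
    "RSDO-def-random": {2: 14, 1: 98, 0: 849},
    "RSDO-def-larger": {2: 169, 1: 214, 0: 872},
}

for part, counts in PARTS.items():
    total = sum(counts.values())
    shares = ", ".join(
        f"{LABELS[label]} {100 * n / total:.1f}%" for label, n in sorted(counts.items())
    )
    print(f"{part}: {total} sentences ({shares})")
```

    Definitions make up roughly 1.5% of the random part but roughly 13.5% of the larger part. Per-class precision and recall are therefore likely more informative than plain accuracy.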

    Both parts were manually annotated by five terminographers. In case of discrepancies between annotators, a consensus was reached and the final label was confirmed by all five annotators. Duplicates were removed in both parts.

    The annotation criteria are based on the standard ISO 1087-1:2000 (E/F), Terminology work - Vocabulary - Part 1: Theory and application, which defines a definition as a "Representation of a concept by a descriptive statement which serves to differentiate it from related concepts". The weak-definition label was assigned to sentences that contained a term and at least one delimiting feature but no superordinate concept, or that contained a superordinate concept without delimiting features but with some typical examples. Sentences were labeled as non-definitions if they did not contain any information about the extracted concept or its delimiting features.

    The dataset is described in more detail in Tran et al. 2023, where it was used for evaluating definition extraction approaches. If you use this resource, please cite:

    Tran, T.H.H., Podpečan, V., Jemec Tomazin, M., Pollak, S. (2023). Definition Extraction for Slovene: Patterns, Transformer Classifiers and ChatGPT. Proceedings of the ELEX 2023: Electronic lexicography in the 21st century. Invisible lexicography: everywhere lexical data is used without users realizing they make use of a “dictionary” (accepted)

    Reference to the pattern-based definition extraction method used for creating RSDO-def-larger: Pollak, S. (2014). Extracting definition candidates from specialized corpora. Slovenščina 2.0: empirical, applied and interdisciplinary research, 2(1), pp. 1–40. https://doi.org/10.4312/slo2.0.2014.1.1-40

    Related resources:

    • Jemec Tomazin, M. et al. (2021). Corpus of term-annotated texts RSDO5 1.1, Slovenian language resource repository CLARIN.SI, ISSN 2820-4042, http://hdl.handle.net/11356/1470.
    • Podpečan et al. (2023). DF_NDF_wiki_slo: Definition extraction training sets from Wikipedia, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1840.
  14. ClinSpEn Data: Parallel English-Spanish COVID-19 Clinical Cases, Terminology...

    • zenodo.org
    zip
    Updated Mar 9, 2023
    Cite
    Salvador Lima; Darryl Johan; Martin Krallinger (2023). ClinSpEn Data: Parallel English-Spanish COVID-19 Clinical Cases, Terminology and Ontology Concepts [Dataset]. http://doi.org/10.5281/zenodo.7014645
    Available download formats: zip
    Dataset updated
    Mar 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Salvador Lima; Darryl Johan; Martin Krallinger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ClinSpEn

    This repository contains the sample, test and background data for the ClinSpEn track.

    ClinSpEn is part of the Biomedical WMT 2022 shared task, which aims to promote the development and evaluation of machine translation systems adapted to the medical domain, with three highly relevant sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.

    Data Description

    ClinSpEn proposes three different sub-tracks, each based on a different type of clinical data:

    - Clinical Cases:

    Parallel EN-ES COVID-19 clinical cases. The direction of this sub-track is EN>ES.

    The case reports were carefully selected to cover a wide range of aspects of the disease: different types of patients (children, adults, elderly and pregnant people, babies), different comorbidities (cancer, mental health issues, immunosuppressed patients) and symptomatology (mild and severe presentations, dermatologic, immunologic and psychiatric manifestations, thrombosis, …). The reports were first translated from English to Spanish by a professional medical translator and then revised by a clinical expert.

    The sample set consists of parallel .txt files, with Spanish files having a ".es" extension and English files a ".en" extension. The reports are sentence-aligned: each sentence has the same line number in both languages.
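
    Since alignment is by line number, a parallel corpus can be read by zipping the two files line by line. The minimal Python sketch below does this. The `<stem>.en`/`<stem>.es` naming is a hypothetical reading of the extension convention described above.

```python
from pathlib import Path


def pair_lines(en_lines, es_lines):
    """Zip line-aligned English and Spanish sentences into (en, es) pairs."""
    if len(en_lines) != len(es_lines):
        # Alignment is by line number, so the counts must match exactly.
        raise ValueError(f"misaligned: {len(en_lines)} EN vs {len(es_lines)} ES lines")
    return list(zip(en_lines, es_lines))


def read_parallel(stem):
    """Read '<stem>.en' and '<stem>.es' (hypothetical naming) as UTF-8 text."""
    base = Path(stem)
    en = base.with_name(base.name + ".en").read_text(encoding="utf-8").splitlines()
    es = base.with_name(base.name + ".es").read_text(encoding="utf-8").splitlines()
    return pair_lines(en, es)


# Made-up sentences for illustration.
print(pair_lines(["Fever.", "Cough."], ["Fiebre.", "Tos."]))
```

    Raising on a length mismatch, rather than silently truncating as `zip` would, surfaces files whose alignment has been broken.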

    The test and background data is made up of a TSV file with three columns: document number, line number and English line. The clinical cases themselves include COVID-19 case reports as well as diverse content extracted from PubMed.

    - Clinical Terminology:

    Parallel EN-ES clinical terms extracted from medical literature and clinical records, with particular focus on diseases, symptoms, findings, procedures and professions, translated and revised by professional medical translators. The direction of this sub-track is ES>EN.

    The sample set contains 7,000 terms in a tab-separated (TSV) file, with English terms in the first column and Spanish terms in the second.

    The test and background data is made up of a TSV file with two columns: term number and Spanish term.

    - Ontology Concepts:

    Parallel EN-ES concepts extracted from various open biomedical ontologies and taxonomies and then manually translated by a professional medical translator. The direction of this sub-track is EN>ES.

    The sample data includes 400 concepts, presented as a tab-separated (TSV) file: the first column contains English terms, the second Spanish terms, the third the term's source ontology and its ID in that ontology, and the fourth a link to the concept in the OBO Library.
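
    A minimal Python sketch that parses the four-column ontology TSV into typed records follows. It assumes the file has no header row; skip the first row if it does. The sample ontology ID and URL are invented placeholders.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class OntologyConcept:
    english: str
    spanish: str
    source: str    # column 3: origin ontology and concept ID
    obo_link: str  # column 4: link to the concept in the OBO Library


def read_concepts(tsv_text):
    """Parse four-column TSV text into OntologyConcept records.

    Assumes no header row; QUOTE_NONE keeps literal quote characters in terms.
    """
    concepts = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or incomplete lines
        concepts.append(OntologyConcept(*row[:4]))
    return concepts


# Placeholder row; the ontology ID and URL are invented.
sample = "fever\tfiebre\tEXAMPLE:0000001\thttps://example.org/EXAMPLE_0000001\n"
print(read_concepts(sample)[0])
```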

    The test and background data is made up of a TSV file with two columns: concept number and English concept.

    Related Links:

    - Sub-track website with more information: https://temu.bsc.es/clinspen/

    - WMT website: https://www.statmt.org/wmt22/

    - CodaLab: https://codalab.lisn.upsaclay.fr/competitions/6696/

  15. Data from: ANALYSIS OF CONCEPTUAL PATTERNS AND INTRATERM RELATIONSHIPS OF...

    • scielo.figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Lucimara Alves da Conceição Costa (2023). ANALYSIS OF CONCEPTUAL PATTERNS AND INTRATERM RELATIONSHIPS OF TERMINOLOGICAL VARIANTS IN ECONOMICS [Dataset]. http://doi.org/10.6084/m9.figshare.21835266.v1
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Lucimara Alves da Conceição Costa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Denominative variation in terminology, that is, the use of different names to designate the same concept or nuances of the same conceptual reality, is often regarded as a mere stylistic resource or a strategy of thematic progression. However, it can present distinct conceptual patterns and distinct intra-term relations, which means that the variants are not always semantically equivalent. Thus, much more than a thematic progression mechanism, variants act as a discursive and cognitive resource that highlights different conceptual nuances of terminological units. In this article, based on the assumptions of modern trends in Terminology, in particular the Communicative Theory of Terminology (CABRÉ, 1999, 2005) and Kageura's (2002) classification of conceptual specification patterns, we analyze the conceptual patterns and intra-term relations present in terminological variants in Economics. Through this analysis, we intend to show which conceptual information these terminological units highlight and how they can influence the understanding and construction of specialized knowledge.

  16. Data from: Talking Glossary of Genetic Terms

    • neuinfo.org
    • dknet.org
    • +2 more
    Updated Oct 21, 2009
    (2009). Talking Glossary of Genetic Terms [Dataset]. http://identifiers.org/RRID:SCR_003215
    Dataset updated
    Oct 21, 2009
    Description

    A glossary of genetic terms intended to help everyone understand the terms and concepts used in genetic research. In addition to definitions, specialists in genetics share their own descriptions of terms, and many entries include images, animations, and links to related terms.

  17. Conceptual novelty scores for PubMed articles

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Updated Feb 1, 2024
    Shubhanshu Mishra; Vetle I. Torvik (2024). Conceptual novelty scores for PubMed articles [Dataset]. http://doi.org/10.13012/B2IDB-5060298_V1
    Dataset updated
    Feb 1, 2024
    Authors
    Shubhanshu Mishra; Vetle I. Torvik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    U.S. National Science Foundation (NSF)
    Description

    Conceptual novelty analysis data based on PubMed Medical Subject Headings
    ----------------------------------------------------------------------
    Created by Shubhanshu Mishra and Vetle I. Torvik on April 16th, 2018

    ## Introduction

    This dataset was created as part of the publication: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib Magazine: The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra. It contains the final data generated in our experiments, based on the MEDLINE 2015 baseline and the 2015 MeSH tree.

    The dataset is distributed as the following tab-separated text files:

    * PubMed2015_NoveltyData.tsv - Novelty scores for each paper in PubMed. The file contains 22,349,417 rows and 6 columns, as follows:
      - PMID: PubMed ID
      - Year: year of publication
      - TimeNovelty: time novelty score of the paper based on individual concepts (see paper)
      - VolumeNovelty: volume novelty score of the paper based on individual concepts (see paper)
      - PairTimeNovelty: time novelty score of the paper based on pairs of concepts (see paper)
      - PairVolumeNovelty: volume novelty score of the paper based on pairs of concepts (see paper)
    * mesh_scores.tsv - Temporal profiles for each MeSH term for all years. The file contains 1,102,831 rows and 5 columns, as follows:
      - MeshTerm: name of the MeSH term
      - Year: year
      - AbsVal: total publications with that MeSH term in the given year
      - TimeNovelty: age (in years since first publication) of the MeSH term in the given year
      - VolumeNovelty: age (in number of papers since first publication) of the MeSH term in the given year
    * meshpair_scores.txt.gz (36 GB uncompressed) - Temporal profiles for each MeSH term pair for all years, with the following columns:
      - Mesh1: name of the first MeSH term (alphabetically sorted)
      - Mesh2: name of the second MeSH term (alphabetically sorted)
      - Year: year
      - AbsVal: total publications with that MeSH pair in the given year
      - TimeNovelty: age (in years since first publication) of the MeSH pair in the given year
      - VolumeNovelty: age (in number of papers since first publication) of the MeSH pair in the given year
    * README.txt

    ## Dataset creation

    This dataset was constructed from multiple datasets, described at the following locations:

    * MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
    * MeSH tree 2015: ftp://nlmpubs.nlm.nih.gov/online/mesh/2015/meshtrees/
    * Source code: https://github.com/napsternxg/Novelty

    Note: The dataset is based on a snapshot of PubMed (which includes MEDLINE and PubMed-not-MEDLINE records) taken in the first week of October 2016. See NLM for information on obtaining PubMed/MEDLINE data and for NLM's data Terms and Conditions. Additional data-related updates can be found at the Torvik Research Group website.

    ## Acknowledgments

    This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    ## License

    Conceptual novelty analysis data based on PubMed Medical Subject Headings by Shubhanshu Mishra and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/Novelty
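    The README defines TimeNovelty and VolumeNovelty in mesh_scores.tsv as the age of a MeSH term in years and in papers since its first publication. The Python sketch below recomputes both from the MeshTerm, Year, and AbsVal columns. It is an illustration under stated assumptions, not the authors' code (their source is in the linked repository): it assumes a header row with those column names and assumes VolumeNovelty counts only papers from earlier years, excluding the current year. The function name novelty_profiles is hypothetical.

```python
import csv
from collections import defaultdict


def novelty_profiles(lines, fieldnames=None):
    """Recompute (TimeNovelty, VolumeNovelty) for each (MeshTerm, Year).

    `lines` is any iterable of tab-separated text lines, e.g. an open file.
    Assumptions (hypothetical reading of the README, not the authors' code):
      * TimeNovelty   = Year - first year in which the term appears.
      * VolumeNovelty = papers with the term in all *earlier* years.
    Pass `fieldnames` if the file has no header row.
    """
    # term -> {year: number of papers with that term in that year}
    counts = defaultdict(dict)
    reader = csv.DictReader(lines, fieldnames=fieldnames,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        by_year = counts[row["MeshTerm"]]
        year = int(row["Year"])
        by_year[year] = by_year.get(year, 0) + int(row["AbsVal"])

    profiles = {}
    for term, by_year in counts.items():
        first_year = min(by_year)
        papers_so_far = 0  # papers in years strictly before `year`
        for year in sorted(by_year):
            profiles[(term, year)] = (year - first_year, papers_so_far)
            papers_so_far += by_year[year]
    return profiles
```

    For the real file, pass an open handle (opened with newline=""). Comparing the results against the file's own TimeNovelty and VolumeNovelty columns is a quick way to check whether the exclusive-count assumption matches the authors' definition. The 36 GB pair file would need a streaming approach (for example, reading it with gzip.open in text mode) rather than holding every pair in memory.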

  18. Data from: Leveraging Terminology Services for FAIR Semantic Data...

    • meta4ds.fokus.fraunhofer.de
    pdf, unknown
    Updated Sep 14, 2023
    Zenodo (2023). Leveraging Terminology Services for FAIR Semantic Data Integration across NFDI Domains - How to Integrate Terminology Services Into Other Service Applications [Dataset]. https://meta4ds.fokus.fraunhofer.de/datasets/oai-zenodo-org-8342678?locale=en
    Available download formats: pdf (4984427 bytes), unknown
    Dataset updated
    Sep 14, 2023
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Research Data Infrastructure (NFDI) aims to develop FAIR research data and data services for major scientific disciplines, with terminologies as a key factor for the semantic annotation and semantic interoperability of data. Several NFDI consortia provide domain-specific terminologies through terminology services or registries that offer access, search, visualization, and downloads. With a focus on user-friendly access, terminology services integrate semantic concepts into applications, often operating in the background to enable smooth semantic annotation and data interoperability. We present example applications from selected disciplines and show how terminology services support semantic search, user experience, annotation workflows, and terminology curation and design. This presentation accompanies the conference paper https://doi.org/10.52825/cordi.v1i.356

  19. Medical Insurance Glossary dataset 💉💉

    • kaggle.com
    zip
    Updated Oct 17, 2023
    Shiv_D24Coder (2023). Medical Insurance Glossary dataset 💉💉 [Dataset]. https://www.kaggle.com/datasets/shivd24coder/medical-insurance-glossary-dataset
    Available download formats: zip (73527 bytes)
    Dataset updated
    Oct 17, 2023
    Authors
    Shiv_D24Coder
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Key Features

    Column name        Description
    tags               Tags associated with the glossary entry.
    categories         Categories of glossary entries.
    topics             Topics related to the glossary entry.
    title              The title of the glossary entry.
    es-title           The Spanish translation of the title.
    url                The URL or link to the glossary entry.
    bite               A brief description or explanation of the term in English.
    es-bite            The Spanish translation of the term's description.
    audience           The intended audience for the glossary entry.
    segment            The specific segment this entry relates to.
    insurance-status   Information related to insurance status.
    state              The state to which the entry pertains.
    condition          Any specific conditions associated with the entry.

    How to use this dataset

    1. Understand Medical Insurance Terminology: Use the glossary to understand and explain common medical insurance terms and concepts.

    2. Language Translation: If you're working in a bilingual setting or need translations of medical insurance terms, the Spanish translations provided in this dataset can be invaluable.

    3. Educational Resources: Create educational resources, articles, or content related to medical insurance by using the glossary entries.

    4. Data Enrichment: Enhance your medical insurance-related datasets or applications with standardized terminology using this glossary.

    5. Reference for Medical Professionals: This glossary can serve as a reference for healthcare professionals, insurance agents, and researchers in the field.
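    The "Language Translation" use case above amounts to pairing the title and es-title columns. A minimal Python sketch, assuming the zip archive holds a CSV file whose header row uses the column names listed above (the archive's file name and internal format are not stated in this listing); bilingual_titles is a hypothetical helper name.

```python
import csv


def bilingual_titles(lines, audience=None):
    """Pair each English glossary title with its Spanish translation.

    `lines` is any iterable of CSV text lines (e.g. a file opened with
    newline=""). Assumes the header uses the column names `title`,
    `es-title`, and `audience`. Rows lacking either title are skipped;
    if `audience` is given, only rows whose audience matches are kept.
    """
    pairs = []
    for row in csv.DictReader(lines):
        if audience is not None and row.get("audience") != audience:
            continue
        title = (row.get("title") or "").strip()
        es_title = (row.get("es-title") or "").strip()
        if title and es_title:
            pairs.append((title, es_title))
    return pairs
```

    Skipping rows with an empty translation, rather than emitting blank pairs, keeps the output directly usable as a lookup table; the same pattern extends to the bite and es-bite columns.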

  20. Unified Medical Language System Terminology Services (UTS) API

    • data.amerigeoss.org
    • data.wu.ac.at
    api
    Updated Jul 28, 2019
    United States (2019). Unified Medical Language System Terminology Services (UTS) API [Dataset]. https://data.amerigeoss.org/sr/dataset/unified-medical-language-system-terminology-services-uts-api
    Available download formats: api
    Dataset updated
    Jul 28, 2019
    Dataset provided by
    United States
    Description

    The UTS API is intended for application developers who want to make Web service calls and retrieve UMLS data within their own applications. It can search, retrieve, and filter terms, concepts, attributes, relations, metadata, and more from over 160 vocabularies in the UMLS Metathesaurus, as well as from the Semantic Network. Paging, sorting, and filtering (PSF) capabilities allow users to customize the results of Web service calls in many ways: including or excluding specific criteria, sorting results by field, or setting the number of results displayed per page. The documentation provides a suite of Web Services Description Language (WSDL) files, API installation instructions, and sample code.
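    As a rough illustration of a paged search call, the Python sketch below builds a UTS search request and extracts concept identifiers (CUIs) and names from the JSON response. It assumes NLM's current REST interface (endpoint https://uts-ws.nlm.nih.gov/rest/search/current with the string, apiKey, pageNumber, pageSize, and sabs query parameters) rather than the WSDL-based interface described above, and it assumes an API key from a UMLS license account. Verify the parameters against NLM's current documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: NLM's REST search endpoint, not the WSDL/SOAP interface.
UTS_SEARCH = "https://uts-ws.nlm.nih.gov/rest/search/current"


def search_url(term, api_key, page_number=1, page_size=25, sabs=None):
    """Build one page of a UTS search; `sabs` optionally limits vocabularies."""
    params = {
        "string": term,
        "apiKey": api_key,
        "pageNumber": page_number,
        "pageSize": page_size,
    }
    if sabs:
        params["sabs"] = ",".join(sabs)
    return UTS_SEARCH + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Return (CUI, name) pairs from a search response body (JSON text)."""
    results = json.loads(payload).get("result", {}).get("results", [])
    # Some responses signal "no match" with a single placeholder row.
    return [(r["ui"], r["name"]) for r in results if r.get("ui") != "NONE"]


def search(term, api_key, **kwargs):
    """Fetch and parse one page of results (performs a network request)."""
    with urllib.request.urlopen(search_url(term, api_key, **kwargs),
                                timeout=30) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

    Keeping URL construction and response parsing separate from the network call makes both testable offline; paging through all results would mean incrementing page_number until parse_results returns an empty list.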

