The Data Glossary dataset includes common data terms and definitions that have been standardized within Maryland State Government. This dataset will continue to evolve as new terms come into the landscape.
The data lexicon is an application that promotes transparency across all of FEMA by defining common data terms and providing additional context. The data lexicon contains descriptive information on key attributes of datasets such as:
- Title of requested term
- Reason for requested term
- Status of requested term
- User who requested the term
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LScDC (Leicester Scientific Dictionary-Core Dictionary)
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes
[Version 3] The third version of LScDC (Leicester Scientific Dictionary-Core) is formed using the updated LScD (Leicester Scientific Dictionary) - Version 3*. All steps applied to build the new version of the core dictionary are the same as in Version 2** and can be found in the description of Version 2 below; the explanation is not repeated here. The files provided with this description are also the same as those described for LScDC Version 2. The numbers of words in the third versions of LScD and LScDC are summarized below.
Dictionary: # of words
LScD (v3): 972,060
LScDC (v3): 103,998
* Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v3
** Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v2
[Version 2] Getting Started
This file describes a sorted and cleaned list of words from LScD (Leicester Scientific Dictionary), and explains the steps for sub-setting the LScD and the basic statistics of words in the LSC (Leicester Scientific Corpus), to be found in [1, 2]. The LScDC (Leicester Scientific Dictionary-Core) is a list of words ordered by the number of documents containing the words, and is available in the published CSV file. There are 104,223 unique words (lemmas) in the LScDC. This dictionary is created to be used in future work on the quantification of the sense of research texts.
The objective of sub-setting the LScD is to discard words which appear too rarely in the corpus. In text mining, using an enormous amount of text data challenges the performance and accuracy of data mining applications. The performance and accuracy of models depend heavily on the type of words (such as stop words and content words) and the number of words in the corpus. Words that occur rarely in a collection are not useful for discriminating texts in large corpora, as rare words are likely to be non-informative signals (or noise) and redundant in the collection of texts. The selection of relevant words also holds out the possibility of more effective and faster operation of text mining algorithms.
To build the LScDC, we decided on the following process for LScD: removing words that appear in no more than 10 documents (
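The document-frequency cut-off described above can be illustrated with a minimal Python sketch. It assumes each document is already a list of lemmas and applies only the one rule that survives in this description (drop words found in no more than 10 documents); the function names and the toy corpus are hypothetical and are not taken from the LScDC files.

```python
from collections import Counter


def document_frequencies(documents):
    """Count, for each word, the number of documents that contain it."""
    df = Counter()
    for words in documents:
        df.update(set(words))  # set(): count a word once per document
    return df


def core_words(documents, min_docs=11):
    """Keep words found in at least `min_docs` documents.

    min_docs=11 mirrors "remove words that appear in no more than
    10 documents". The result is ordered by document count (descending),
    with ties broken alphabetically.
    """
    df = document_frequencies(documents)
    kept = [(word, n) for word, n in df.items() if n >= min_docs]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus: 12 documents, each a list of lemmas.
toy_corpus = [["cell", "protein"]] * 11 + [["cell", "quasar"]]
print(core_words(toy_corpus))
```

In this toy corpus, "quasar" occurs in a single document and is discarded, while the two remaining words are listed by the number of documents containing them.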
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
Since COVID-19 was first identified in December 2019, the number of countries affected by this disease has been increasing; the World Health Organization declared it a pandemic in March 2020. The current global situation requires highly effective communication. The vocabulary used must be understood by everyone, and it is important that all documents have consistent terminology. This glossary is designed as a tool for language professionals as well as those responsible for disseminating information in the context of this pandemic. In it, you will find terms in the fields of medicine, sociology and politics, among others. Please note that some records in this data set may have been updated after the extraction date for this data set. To find the most recent terminology data including textual supports beyond the definitions present in the open data file, consult TERMIUM Plus® or check the Glossary on the COVID-19 pandemic. Please also note that, as a result of technical constraints, some abbreviations may not immediately follow the terms they abbreviate. This dataset is no longer being updated. COVID-19 terminology can be found in the TERMIUM Plus® dataset and in TERMIUM Plus®, the Government of Canada's terminology data bank.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
glosario is an open-source glossary of terms used in data science that is available online and also as a library in both R and Python. By adding glossary keys to a lesson's metadata, authors can indicate what the lesson teaches, what learners ought to know before they start, and where they can go to find that knowledge. Authors can also use the library's functions to insert consistent hyperlinks for terms and definitions in their lessons in any of several languages. The master copy of the glossary lives in the glossary.yml file.
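As a rough illustration of what "consistent hyperlinks for terms" could look like, here is a minimal Python sketch. It does not use the glosario library or reproduce its API: the `GLOSSARY` dictionary is a hypothetical in-memory stand-in for entries read from glossary.yml, and `BASE_URL`, the entry layout, and `term_link` are all assumptions made for the example.

```python
# Hypothetical stand-in for entries loaded from glossary.yml:
# one key per term, with a display name for each language.
GLOSSARY = {
    "data_frame": {
        "en": {"term": "data frame"},
        "es": {"term": "marco de datos"},
    },
}

# Placeholder address, not the real glossary site.
BASE_URL = "https://example.org/glossary"


def term_link(key, lang="en"):
    """Return an HTML link for a glossary key in the requested language.

    Every lesson that calls this helper gets the same URL pattern and
    the same display text for a given key, which is the point of
    generating links from one master glossary.
    """
    entry = GLOSSARY[key][lang]
    return f'<a href="{BASE_URL}/{lang}/#{key}">{entry["term"]}</a>'


print(term_link("data_frame"))
print(term_link("data_frame", lang="es"))
```

Keeping the key separate from the displayed term lets the same lesson metadata resolve to a link in whichever language the lesson is written in.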
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
A dataset for the first task, "Explain or teach basic data science concepts", of the competition "Google – AI Assistants for Data Tasks with Gemma". This dataset contains several data science glossaries; every sample contains two keys: term (the vocabulary name) and definition.
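A minimal Python sketch of working with such samples follows. The two key names come from the description above; the JSON Lines storage format, the function name, and the sample entry are assumptions made for illustration and may not match the actual files.

```python
import json

REQUIRED_KEYS = {"term", "definition"}


def parse_samples(lines):
    """Build a term -> definition lookup from JSON Lines text.

    Assumes one JSON object per line (an assumption about the file
    format). Raises ValueError if a sample lacks a required key.
    """
    lookup = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        sample = json.loads(line)
        missing = REQUIRED_KEYS - set(sample)
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        lookup[sample["term"]] = sample["definition"]
    return lookup


# Illustrative sample, not copied from the dataset.
SAMPLES = [
    '{"term": "overfitting", "definition": "Fitting training data so closely that the model generalizes poorly to new data."}',
]
glossary = parse_samples(SAMPLES)
print(glossary["overfitting"])
```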
A table of the values and definitions of fields used in Austin Police Department datasets. City of Austin Open Data Terms of Use - https://data.austintexas.gov/stories/s/ranj-cccq
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Metadata in ClinicalTrials.gov
Description of files
Original data files:
- AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
- public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.
BioPortal API Query Results:
- condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns={filename, condition, url, bioportal term, cuis, tuis}.
- intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns={filename, intervention, url, bioportal term, cuis, tuis}.
Data Element Definitions:
- supplementary_table_1.xlsx: Mapping of element names, element types, and whether elements are required in ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.
Column and value definitions:
- CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html)
- CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html) and (https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html)
- CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value, "Group Heading" if the element is a group heading for several sub-fields, but is not in itself associated with a user-provided value.
- Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to interventional records (only observational or expanded access)
- Required for CT.gov for Observational Records: "Required" if the element is required for observational records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to observational records (only interventional or expanded access)
- Required in CT.gov for Expanded Access Records?: "Required" if the element is required for expanded access records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" if the element is not applicable to expanded access records (only interventional or observational)
- CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
- Required in XSD?: "Yes" if the element is required according to public.XSD, "No" if the element is optional, "-" if the element is not made public or included in the XSD
- Type in XSD: "text" if the XSD type was "xs:string" or "textblock", name of enum given if type was enum, "integer" if type was "xs:integer" or "xs:integer" extended with the "type" attribute, "struct" if the type was a struct defined in the XSD
- PRS Element Name: Name of the corresponding entry field in the PRS system
- PRS Entry Type: Entry type in the PRS system. This column contains some free text explanations/observations
- FDAAA801 Final Rule Field Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA
- WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)
Analytical Results:
- EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives filename, criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
- completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
- industry_completeness.xlsx contains percentages of interventional records missing required fields, broken down by agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
- location_completeness.xlsx contains percentages of interventional records missing required fields, broken down by whether the record listed at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.
Intermediate Results:
- cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running analysis steps from the Jupyter notebooks in our GitHub repository.
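The relationship between the abbreviated xpaths in the "CT.gov XSD Element Definition" column and the full paths can be sketched in Python with the standard library. This is an illustrative sketch, not the authors' code: the embedded record is a tiny hand-written stand-in rather than a real ClinicalTrials.gov record, the element names are used only as examples, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for one record (not real data).
SAMPLE_RECORD = """<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <condition>Asthma</condition>
  <condition>Allergy</condition>
</clinical_study>"""


def full_xpath(abbreviated):
    """Prefix an abbreviated path with the single top-level element."""
    return "clinical_study/" + abbreviated


def find_values(root, abbreviated):
    """Return the text of every element matching an abbreviated path.

    `root` is the <clinical_study> element itself, so ElementTree
    resolves the abbreviated path relative to it and the prefix is
    not needed for the lookup.
    """
    return [element.text for element in root.findall(abbreviated)]


root = ET.fromstring(SAMPLE_RECORD)
print(full_xpath("id_info/nct_id"))
print(find_values(root, "id_info/nct_id"))
print(find_values(root, "condition"))
```

Because every record has the same top-level element, the abbreviated form is enough when querying from the record root; the full form matters when a tool resolves paths from the document level.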
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Introduction: This dataset contains the terms and definitions included on the UKPN Open Data Portal Glossary Page.
Methodological Approach: This dataset is sourced from the UK Power Networks internal business glossary.
Quality Control Statement: Quality control measures include:
- Manual review and correction of data inconsistencies
- Use of additional verification steps to ensure accuracy in the methodology
Assurance Statement: The Open Data Team and Data Governance Team worked together to ensure data accuracy and consistency.
Other: The UKPN Open Data Portal Glossary helps ensure a common understanding of terms used in, or related to, the datasets published on the UKPN Open Data Portal. Download dataset information: Metadata (JSON). Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The resource contains several datasets containing domain-specific data in three languages, English, Slovenian and Croatian, which can be used for various knowledge extraction or knowledge modelling tasks. The resource represents knowledge for the domain of karstology, a subfield of geography studying karst and related phenomena. It contains:
- Definitions: Plain text files contain definitions of karst concepts from relevant glossaries and encyclopaedia, but also definitions which had been extracted from domain-specific corpora.
- Annotated definitions: Definitions were manually annotated and curated in the WebAnno tool. Annotations include several layers including definition elements, semantic relations following the frame-based theory of terminology (FBT), relation definitors which can be used for learning relation patterns, and semantic categories defined in the domain model.
- Terms, definitions and sources: The TermFrame knowledge base contains terms and their corresponding concept identifiers, definitions and definition sources.
County spending glossary (definitions of financial terms) on dataMontgomery. These definitions appear when a user hovers over this term in spendingMontgomery.
https://data.go.kr/ugs/selectPortalPolicyView.do
This CSV file organizes the standard terminology dictionary used in the homepage system. It contains a total of 363 terms, each described by the following fields:
- Term name: The name of the term used in the system.
- Physical name: The physical field name used when implementing a system such as a database.
- Domain: The logical data category to which the term belongs.
- Info type: The type of information, providing data classification criteria.
- Data type: The data storage format of the term (e.g. VARCHAR).
- Code name: The name used when the term is managed as a code value; mostly blank.
- Definition: An explanation of the meaning of the term.
- Personal information type: Whether the item corresponds to personal information.
- Public/private status: Whether the information may be disclosed.
This data can be used to unify terms between systems, standardize data, and establish personal information protection and information disclosure standards.
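A short Python sketch shows how a file with these fields might be used to unify naming between systems and to flag personal information. The column headers, the two sample rows, and the helper names are hypothetical: the real file's headers and values are not shown in this description and may differ.

```python
import csv
import io

# Hypothetical excerpt; real headers and values may differ.
SAMPLE_CSV = """term_name,physical_name,domain,info_type,data_type,code_name,definition,personal_info_type,public_status
User name,USER_NM,Name,Text,VARCHAR,,Name of the registered user,Personal,Private
Post title,POST_TTL,Title,Text,VARCHAR,,Title of a bulletin board post,None,Public
"""


def load_terms(text):
    """Parse the dictionary into one dict per term, keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def physical_name_map(rows):
    """Map each term name to the physical field name used in a database."""
    return {row["term_name"]: row["physical_name"] for row in rows}


def personal_information_terms(rows):
    """List terms flagged as personal information ("None" = not personal)."""
    return [row["term_name"] for row in rows
            if row["personal_info_type"] != "None"]


rows = load_terms(SAMPLE_CSV)
print(physical_name_map(rows))
print(personal_information_terms(rows))
```

A lookup like `physical_name_map` is one way two systems could agree on column names, and the personal-information filter is a starting point for disclosure checks.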
The goal of the CODATA Research Data Management Terminology is to gather the key terms needed for a common understanding of the research data management domain. The RDMT was revised by the CODATA RDM Terminology Working Group, shared for public review, and then confirmed and finalised in 2023.
The RDMT grew out of the CASRAI Research Data Management Glossary, which was intended as a practical reference for individuals and groups concerned with the improvement of research data management (RDM). In 2020, CASRAI requested that CODATA assume responsibility for the curation of this valued resource.
To that end, the RDM Terminology Working Group uses a lightweight and pragmatic biennial process to review the resource, now restructured as the CODATA RDM Terminology, and to suggest any edits, additions and removals required to develop and improve this important reference resource.
The ASR Glossary provides definitions of all the terminology used throughout the Annual Statistical Report. It can also be found in Appendices B and C of the ASR Documentation.
Terminology Services provides tools and services that enable vocabulary development, maintenance and provisioning for the enterprise.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
17 multilingual medical terminologies from Termcat in the following domains:
- Anatomy (3610 terms; languages: es, en, ca)
- Integrated care (75 terms; languages: es, en, ca)
- Pathophysiology (330 terms; languages: es, en, ca)
- Chronicity (80 terms; languages: es, en, ca)
- Physiotherapy (1818 terms; languages: es, en, ca)
- Homeopathy (331 terms; languages: es, en, fr, ca)
- Stroke and spinal cord injuries (212 terms; languages: es, en, fr, ca)
- Immunology (1312 terms; languages: es, en, fr, ca)
- Nursing (904 terms; languages: es, en, fr, ca)
- Metabolic disorders (183 terms; languages: es, en, fr, ca)
- Neuroscience (1974 terms; languages: es, en, ca)
- Ophthalmology (1178 terms; languages: es, en, ca)
- Otorhinolaryngology (1079 terms; languages: es, en, ca)
- Psychiatry (915 terms; languages: es, en, fr, de, ca)
- Clinical Drug Research (710 terms; languages: es, en, ca)
- Semiology (1062 terms; languages: es, en, fr, it, ca)
- Vaccination (135 terms; languages: es, en, fr, ca)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) actions SMART 2014/1074 and SMART 2015/1091. For further information on the project: http://lr-coordination.eu.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
Since COVID-19 was first identified in December 2019, the number of countries affected by this disease has been increasing; the World Health Organization declared it a pandemic in March 2020. The current global situation requires highly effective communication. The vocabulary used must be understood by everyone, and it is important that all documents have consistent terminology. This glossary is designed as a tool for language professionals as well as those responsible for disseminating information in the context of this pandemic. In it, you will find terms in the fields of medicine, sociology and politics, among others. Please note that some records in this data set may have been updated after the extraction date for this data set. To find the most recent terminology data including textual supports beyond the definitions present in the open data file, consult TERMIUM Plus® or check the Glossary on the COVID-19 pandemic. Please also note that, as a result of technical constraints, some abbreviations may not immediately follow the terms they abbreviate.
Trilingual glossary (EN-LT-DA) of the English terms referring to personal data and their equivalents in the Lithuanian and Danish languages.
Learn about local government finance terms and retirement system terms commonly used on this website.
MIT License: https://opensource.org/licenses/MIT
The GeoNet Aotearoa New Zealand Glossary of Data-related terms is a glossary used within the GeoNet programme. The Glossary is a set of terminology adapted and used to define, in generalized form and plain language, the terms and concepts associated with the generation of GeoNet data products.
This dataset is funded through https://www.geonet.org.nz/sponsors
DOI: https://doi.org/10.21420/XQS0-0Z48
Cite as: GNS Science. (2021). GeoNet Aotearoa New Zealand Glossary of Data-related terms [Data set]. GNS Science. https://doi.org/10.21420/XQS0-0Z48