100+ datasets found
  1. Data Dictionary

    • mcri.figshare.com
    txt
    Updated Sep 6, 2018
    Cite
    Jennifer Piscionere (2018). Data Dictionary [Dataset]. http://doi.org/10.25374/MCRI.7039280.v1
    Available download formats: txt
    Dataset updated
    Sep 6, 2018
    Dataset provided by
    Murdoch Children's Research Institute, http://www.mcri.edu.au/
    Authors
    Jennifer Piscionere
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is a data dictionary example we will use in the MVP presentation. It can be deleted after 13/9/18.

  2. Data from: Data Dictionary Template

    • data.tempe.gov
    • data-academy.tempe.gov
    • +8 more
    Updated Jun 5, 2020
    Cite
    City of Tempe (2020). Data Dictionary Template [Dataset]. https://data.tempe.gov/documents/f97e93ac8d324c71a35caf5a295c4c1e
    Dataset updated
    Jun 5, 2020
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Dictionary template for Tempe Open Data.

  3. Open Data Dictionary Template Individual

    • catalog.data.gov
    • hub.arcgis.com
    Updated Feb 4, 2025
    Cite
    Office of the Chief Technology Officer (2025). Open Data Dictionary Template Individual [Dataset]. https://catalog.data.gov/dataset/open-data-dictionary-template-individual
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Office of the Chief Technology Officer
    Description

    This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.

  4. Superstore

    • kaggle.com
    zip
    Updated Oct 3, 2022
    Cite
    Ibrahim Elsayed (2022). Superstore [Dataset]. https://www.kaggle.com/datasets/ibrahimelsayed182/superstore
    Available download formats: zip (167457 bytes)
    Dataset updated
    Oct 3, 2022
    Authors
    Ibrahim Elsayed
    Description

    Context

    Superstore in the USA; the data contains about 10,000 rows.

    Data Dictionary

    Attribute      Definition               Example
    Ship Mode      -                        Second Class
    Segment        Segment Category         Consumer
    Country        -                        United State
    City           -                        Los Angeles
    State          -                        California
    Postal Code    -                        90032
    Region         -                        West
    Category       Categories of product    Technology
    Sub-Category   -                        Phones
    Sales          number of sales          114.9
    Quantity       -                        3
    Discount       -                        0.45
    Profit         -                        14.1694

    Acknowledgements

    All thanks to The Sparks Foundation for making this data set.

    Inspiration

    Get the data and try to draw insights from it. Good luck ❤️

    Don't forget to Upvote😊🥰

  5. Viking II Data Dictionary

    • find.data.gov.scot
    • dtechtive.com
    csv, docx, pdf, txt +1
    Updated Oct 8, 2021
    Cite
    University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit (2021). Viking II Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3145
    Available download formats: csv(0.0071 MB), csv(0.0012 MB), csv(0.0063 MB), csv(0.004 MB), csv(0.0043 MB), csv(0.0068 MB), csv(0.0042 MB), csv(0.0051 MB), csv(0.0029 MB), csv(0.0038 MB), csv(0.0065 MB), csv(0.0015 MB), csv(0.0021 MB), csv(0.01 MB), txt(0.0166 MB), csv(0.0008 MB), pdf(1.215 MB), xlsx(0.0923 MB), csv(0.0098 MB), docx(0.015 MB), csv(0.007 MB)
    Dataset updated
    Oct 8, 2021
    Dataset provided by
    University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    VIKING II was made possible thanks to Medical Research Council (MRC) funding. We aim to better understand what might cause diseases such as heart disease, eye disease, stroke, diabetes and others by inviting 4,000 people with 2 or more grandparents from Orkney and Shetland to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail viking@ed.ac.uk.

  6. Energy Performance of Buildings Certificates: Data dictionary and glossary

    • gov.uk
    Updated Oct 23, 2023
    Cite
    Department for Levelling Up, Housing and Communities (2023). Energy Performance of Buildings Certificates: Data dictionary and glossary [Dataset]. https://www.gov.uk/government/statistics/energy-performance-of-buildings-certificates-data-dictionary-and-glossary
    Dataset updated
    Oct 23, 2023
    Dataset provided by
    GOV.UK
    Authors
    Department for Levelling Up, Housing and Communities
    Description

    EPC statistics data dictionary:

    • definitions of data variables included in the EPC statistics release
    • limitations of data variables
    • suggested usage of data variables

    EPC statistics glossary:

    • a consolidated glossary of all the terms used in EPC statistics releases
  7. Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Craig S. Mayer; Nick Williams; Vojtech Huser (2023). Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’). [Dataset]. http://doi.org/10.1371/journal.pone.0240047.t001
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS, http://plos.org/
    Authors
    Craig S. Mayer; Nick Williams; Vojtech Huser
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’).

  8. Data from: Delta Neighborhood Physical Activity Study

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

    Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note: title changed 9/4/2020 to reflect study name]

    Resources in this dataset:

    • Resource Title: Dataset One RALA PPA Data Dictionary. File Name: RALA PPA Data Dictionary.csv. Resource Description: Data dictionary for dataset one collected using the RALA PPA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Two RALA TWA Data Dictionary. File Name: RALA TWA Data Dictionary.csv. Resource Description: Data dictionary for dataset two collected using the RALA TWA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Three RALA SSA Data Dictionary. File Name: RALA SSA Data Dictionary.csv. Resource Description: Data dictionary for dataset three collected using the RALA SSA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Four CPAT Data Dictionary. File Name: CPAT Data Dictionary.csv. Resource Description: Data dictionary for dataset four collected using the CPAT. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset One RALA PPA. File Name: RALA PPA Data.csv. Resource Description: Data collected using the RALA PPA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Two RALA TWA. File Name: RALA TWA Data.csv. Resource Description: Data collected using the RALA TWA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Three RALA SSA. File Name: RALA SSA Data.csv. Resource Description: Data collected using the RALA SSA tool. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Dataset Four CPAT. File Name: CPAT Data.csv. Resource Description: Data collected using the CPAT. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
    • Resource Title: Data Dictionary. File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv. Resource Description: This is a combined data dictionary from each of the 4 dataset files in this set.

  9. Commercial Fishing Regulations Data Dictionary

    • hub.arcgis.com
    Updated Jan 10, 2020
    Cite
    Ministry for Primary Industries (2020). Commercial Fishing Regulations Data Dictionary [Dataset]. https://hub.arcgis.com/documents/MPI::commercial-fishing-regulations-data-dictionary-1/about
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Ministry for Primary Industries
    Description

    This data dictionary describes the field names, expected data, examples of data and field types (schema) of the commercial fishing regulations data set.

  10. Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA...

    • data.usgs.gov
    • s.cnmilf.com
    • +1 more
    Cite
    Heather Lowers, Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA Software Package Developed by Probe Software [Dataset]. http://doi.org/10.5066/P91HKRPM
    Dataset provided by
    United States Geological Survey, http://www.usgs.gov/
    Authors
    Heather Lowers
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 1995 - Dec 31, 2050
    Description

    This data dictionary describes most of the possible output options given in the Probe for EPMA software package developed by Probe Software. Examples of the data output options include sample identification, analytical conditions, elemental weight percents, atomic percents, detection limits, and stage coordinates. Many more options are available and the data that is output will depend upon the end use.

  11. APAC Data Suite | 4M+ Translations | 1.6M+ Words | Natural Language...

    • datarade.ai
    Updated Oct 1, 2025
    Cite
    Oxford Languages (2025). APAC Data Suite | 4M+ Translations | 1.6M+ Words | Natural Language Processing Data | Dictionary Display | Translations | APAC Coverage [Dataset]. https://datarade.ai/data-products/apac-data-suite-4m-translations-1-6m-words-natural-la-oxford-languages
    Available download formats: .json, .xml, .csv, .txt, .mp3, .wav
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Oxford Languages, https://lexico.com/es
    Area covered
    Marshall Islands, Papua New Guinea, China, Thailand, Taiwan, Philippines, Australia, Kiribati, Vietnam, Fiji
    Description

    APAC Data Suite offers high-quality language datasets. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.

    Discover our expertly curated language datasets in the APAC Data Suite. Compiled and annotated by language and linguistic experts, this suite offers high-quality resources tailored to your needs. This suite includes:

    • Monolingual and Bilingual Dictionary Data
      Featuring headwords, definitions, word senses, part-of-speech (POS) tags, and semantic metadata.

    • Semi-bilingual Dictionary Data
      Each entry features a headword with definitions and/or usage examples in Language 1, followed by a translation of the headword and/or definition in Language 2, enabling efficient cross-lingual mapping.

    • Sentence Corpora
      Curated examples of real-world usage with contextual annotations for training and evaluation.

    • Synonyms & Antonyms
      Lexical relations to support semantic search, paraphrasing, and language understanding.

    • Audio Data
      Native speaker recordings for speech recognition, TTS, and pronunciation modeling.

    • Word Lists
      Frequency-ranked and thematically grouped lists for vocabulary training and NLP tasks. The word list data can cover one language or two, such as Tamil words with English translations.

    Each language may contain one or more types of language data. Depending on the dataset, we can provide these in formats such as XML, JSON, TXT, XLSX, CSV, WAV, MP3, and more. Delivery is currently available via email (link-based sharing) or REST API.

    If you require more information about a specific dataset, please contact us at Growth.OL@oup.com.

    Below are the different types of datasets available for each language, along with their key features and approximate metrics. If you have any questions or require additional assistance, please don't hesitate to contact us.

    1. Assamese Semi-bilingual Dictionary Data: 72,200 words | 83,700 senses | 83,800 translations.

    2. Bengali Bilingual Dictionary Data: 161,400 translations | 71,600 senses.

    3. Bengali Semi-bilingual Dictionary Data: 28,300 words | 37,700 senses | 62,300 translations.

    4. British English Monolingual Dictionary Data: 146,000 words | 230,000 senses | 149,000 example sentences.

    5. British English Synonyms and Antonyms Data: 600,000 synonyms | 22,000 antonyms.

    6. British English Pronunciations with Audio: 250,000 transcriptions (IPA) | 180,000 audio files.

    7. French Monolingual Dictionary Data: 42,000 words | 56,000 senses | 43,000 example sentences.

    8. French Bilingual Dictionary Data: 380,000 translations | 199,000 senses | 146,000 example translations.

    9. Gujarati Monolingual Dictionary Data: 91,800 words | 131,500 senses.

    10. Gujarati Bilingual Dictionary Data: 171,800 translations | 158,200 senses.

    11. Hindi Monolingual Dictionary Data: 46,200 words | 112,700 senses.

    12. Hindi Bilingual Dictionary Data: 263,400 translations | 208,100 senses | 18,600 example translations.

    13. Hindi Synonyms and Antonyms Dictionary Data: 478,100 synonyms | 18,800 antonyms.

    14. Hindi Sentence Data: 216,000 sentences.

    15. Hindi Audio data: 68,000 audio files.

    16. Indonesian Bilingual Dictionary Data: 36,000 translations | 23,700 senses | 12,700 example translations.

    17. Indonesian Monolingual Dictionary Data: 120,000 words | 140,000 senses | 30,000 example sentences.

    18. Korean Monolingual Dictionary Data: 596,100 words | 386,600 senses | 91,700 example sentences.

    19. Korean Bilingual Dictionary Data: 952,500 translations | 449,700 senses | 227,800 example translations.

    20. Mandarin Chinese (simplified) Monolingual Dictionary Data: 81,300 words | 162,400 senses | 80,700 example sentences.

    21. Mandarin Chinese (traditional) Monolingual Dictionary Data: 60,100 words | 144,700 senses | 29,900 example sentences.

    22. Mandarin Chinese (simplified) Bilingual Dictionary Data: 367,600 translations | 204,500 senses | 150,900 example translations.

    23. Mandarin Chinese (traditional) Bilingual Dictionary Data: 215,600 translations | 202,800 senses | 149,700 example translations.

    24. Mandarin Chinese (simplified) Synonyms and Antonyms Data: 3,800 synonyms | 3,180 antonyms.

    25. Malay Bilingual Dictionary Data: 106,100 translations | 53,500 senses.

    26. Malay Monolingual Dictionary Data: 39,800 words | 40,600 senses | 21,100 example sentences.

    27. Malayalam Monolingual Dictionary Data: 91,300 words | 159,200 senses.

    28. Malayalam Bilingual Word List Data: 76,200 translation pairs.

    29. Marathi Bilingual Dictionary Data: 45,400 translations | 32,800 senses | 3,600 example translations.

    30. Nepali Bilingual Dictionary Data: 350,000 translations | 264,200 senses | 1,300 example translations.

    31. New Zealand English Monolingual Dictionary Data: 100,000 words.

    32. Odia Semi-bilingual Dictionary Data: 30,700 words | 69,300 senses | 69,200 translations.

    33. Punjabi ...

  12. Database Creation Description and Data Dictionaries

    • figshare.com
    txt
    Updated Aug 11, 2016
    Cite
    Jordan Kempker; John David Ike (2016). Database Creation Description and Data Dictionaries [Dataset]. http://doi.org/10.6084/m9.figshare.3569067.v3
    Available download formats: txt
    Dataset updated
    Aug 11, 2016
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Jordan Kempker; John David Ike
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    There are several Microsoft Word documents here detailing data creation methods, along with various dictionaries describing the included and derived variables. The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code with this project. The alphabetical list of variables is provided because copying and pasting variable names from it is sometimes easier than retyping them in coding steps. The NIS Data Dictionary contains a general dataset description as well as each variable's responses.

  13. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 9, 2022
    Cite
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Available download formats: zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Massachusetts General Hospital
    Harvard Medical School
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

    Methods

    eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
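
    The key-value remapping described above can be illustrated with a minimal Python sketch. This is not the eLAB source (which is written in R); the subtype strings are taken from the description, while the DD code "potassium", the unit "mmol/L", and the names LAB_LOOKUP, DD_UNITS, and remap_labs are assumptions made for illustration.

```python
# Hypothetical lookup table: raw EHR lab subtype -> data dictionary (DD) code.
# The subtype names come from the description; the code "potassium" is assumed.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}

# Assumed unit accepted by the DD for each code (illustrative only).
DD_UNITS = {"potassium": "mmol/L"}


def remap_labs(rows):
    """Keep rows whose lab and unit are defined by the DD, renaming the lab to its DD code."""
    remapped = []
    for row in rows:
        code = LAB_LOOKUP.get(row["lab"].strip())
        if code is None:
            continue  # subtype is not in the lookup table
        if row["unit"] != DD_UNITS[code]:
            continue  # unit is not pre-defined by the DD
        remapped.append({**row, "lab": code})
    return remapped


example = [
    {"lab": "Potassium(POC)", "value": 4.1, "unit": "mmol/L"},
    {"lab": "Potassium,venous", "value": 3.9, "unit": "mg/dL"},
    {"lab": "Sodium", "value": 140, "unit": "mmol/L"},
]
print(remap_labs(example))
```

    In this sketch only the first example row survives: the second has a unit the assumed DD does not accept, and the third is a lab that is absent from the lookup table.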

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
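
    As a rough illustration of why a shared DD makes multi-site aggregation simple, the Python sketch below checks each site's CSV against one dictionary and then concatenates the rows. The field names and types in DATA_DICTIONARY are hypothetical and are not taken from DataDictionary_eLAB.csv; REDCap itself is not involved here.

```python
import csv
import io

# Hypothetical DD: field name -> expected type (not the real eLAB dictionary).
DATA_DICTIONARY = {"record_id": int, "potassium": float, "lab_date": str}


def read_site_csv(text):
    """Parse one site's CSV text, coercing each field to the type the DD defines."""
    reader = csv.DictReader(io.StringIO(text))
    if set(reader.fieldnames or []) != set(DATA_DICTIONARY):
        raise ValueError(f"columns do not match the DD: {reader.fieldnames}")
    return [
        {name: DATA_DICTIONARY[name](value) for name, value in row.items()}
        for row in reader
    ]


def combine_sites(site_texts):
    """Concatenate rows from all sites; this works because every site uses the same DD."""
    combined = []
    for text in site_texts:
        combined.extend(read_site_csv(text))
    return combined


site_a = "record_id,potassium,lab_date\n1,4.1,2020-01-05\n"
site_b = "record_id,potassium,lab_date\n7,3.8,2020-02-11\n"
rows = combine_sites([site_a, site_b])
print(len(rows), rows[0]["potassium"])
```

    A file whose columns differ from the dictionary, or whose values cannot be coerced to the declared type, raises a ValueError instead of being merged silently.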

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
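
    The OS definition and censoring rule above can be written as a short Python sketch. The function name overall_survival and its signature are hypothetical; the study's own analysis used R, and this shows only how the time and event indicator would be derived before model fitting.

```python
from datetime import date


def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (days, event): event is 1 for death, 0 when censored at last follow-up."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("need a death date or a last follow-up date")
    return (last_follow_up - diagnosis).days, 0


# Death observed: time runs from diagnosis to death.
print(overall_survival(date(2017, 1, 1), death=date(2017, 1, 31)))
# No death event: time is censored at the last follow-up visit.
print(overall_survival(date(2017, 1, 1), last_follow_up=date(2017, 1, 11)))
```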

  14. Traveller Genes Data Dictionary

    • dtechtive.com
    • find.data.gov.scot
    csv, docx, pdf, txt +1
    Updated Oct 25, 2021
    Cite
    University of Edinburgh. Usher Institute (2021). Traveller Genes Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3155
    Available download formats: csv(0.001 MB), txt(0.0166 MB), docx(0.0127 MB), xlsx(0.0469 MB), csv(0.0026 MB), csv(0.0008 MB), csv(0.0025 MB), csv(0.0039 MB), csv(0.0101 MB), csv(0.0011 MB), pdf(0.4028 MB), csv(0.0022 MB), csv(0.0061 MB), csv(0.0009 MB)
    Dataset updated
    Oct 25, 2021
    Dataset provided by
    University of Edinburgh. Usher Institute
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Traveller Genes is a research study supported by the Traveller community. We're looking at the genetics, origins and health of over 200 volunteers who have at least two grandparents who are or were Travellers. This includes Scottish Travellers, Irish Travellers, Romanichal or Romany, or Welsh Kale. We aim to identify the genetic origins and relationships of the Scottish Traveller community e.g. Highland Travellers, Lowland Travellers, Borders Romanichal Travellers. We also want to understand how Scottish Travellers are related to other communities and their overall patterns of health. Participants are asked to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail travellergenes@ed.ac.uk.

  15. Data Dictionary/README files

    • figshare.com
    xlsx
    Updated Mar 11, 2025
    Cite
    Camille Jaime; Taejoon Won (2025). Data Dictionary/README files [Dataset]. http://doi.org/10.6084/m9.figshare.28527137.v1
    Available download formats: xlsx
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Camille Jaime; Taejoon Won
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Data Dictionary contains all info for each sample/mouse of each experiment. All README files are also included, each with a brief experimental description.

  16. data dictionary

    • health.data.ny.gov
    csv, xlsx, xml
    Updated Aug 23, 2022
    + more versions
    Cite
    Center for Environmental Health (2022). data dictionary [Dataset]. https://health.data.ny.gov/Health/data-dictionary/3tsn-2bah
    Available download formats: xlsx, csv, xml
    Dataset updated
    Aug 23, 2022
    Authors
    Center for Environmental Health
    Description

    This data includes the location of cooling towers registered with New York State. The data is self-reported by owners/property managers of cooling towers in service in New York State. In August 2015 the New York State Department of Health released emergency regulations requiring the owners of cooling towers to register them with New York State. In addition the regulation includes requirements: regular inspection; annual certification; obtaining and implementing a maintenance plan; record keeping; reporting of certain information; and sample collection and culture testing. All cooling towers in New York State, including New York City, need to be registered in the NYS system. Registration is done through an electronic database found at: www.ny.gov/services/register-cooling-tower-and-submit-reports. For more information, check http://www.health.ny.gov/diseases/communicable/legionellosis/, or go to the “About” tab.

  17. NZ Addresses Data Dictionary

    • data.linz.govt.nz
    Updated Jan 26, 2023
    Cite
    Land Information New Zealand (2023). NZ Addresses Data Dictionary [Dataset]. https://data.linz.govt.nz/document/24489-nz-addresses-data-dictionary/
    Dataset updated
    Jan 26, 2023
    Dataset authored and provided by
    Land Information New Zealand (https://www.linz.govt.nz/)
    License

    https://data.linz.govt.nz/license/attribution-4-0-international/

    Area covered
    New Zealand
    Description

    This document provides detailed metadata (data dictionary) and model diagrams for NZ Addresses and full AIMS Street Address datasets published on the LINZ Data Service. These datasets are derived from LINZ’s Address Information Management System (AIMS) and Comprehensive Address Data Store (CADS).

  18. Ecological Concerns Data Dictionary - Ecological Concerns data dictionary

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated May 24, 2025
    Cite
    (Point of Contact, Custodian) (2025). Ecological Concerns Data Dictionary - Ecological Concerns data dictionary [Dataset]. https://catalog.data.gov/dataset/ecological-concerns-data-dictionary-ecological-concerns-data-dictionary2
    Dataset updated
    May 24, 2025
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    Evaluating the status of threatened and endangered salmonid populations requires information on the current status of the threats (e.g., habitat, hatcheries, hydropower, and invasives) and the risk of extinction (e.g., status and trend in the Viable Salmonid Population criteria). For salmonids in the Pacific Northwest, threats generally result in changes to physical and biological characteristics of freshwater habitat. These changes are often described by terms like "limiting factors" or "habitat impairment." For example, the condition of freshwater habitat directly impacts salmonid abundance and population spatial structure by affecting carrying capacity and the variability and accessibility of rearing and spawning areas. Thus, one way to assess or quantify threats to ESUs and populations is to evaluate whether the ecological conditions on which fish depend are improving, becoming more degraded, or remaining unchanged. In the attached spreadsheets, we have attempted to record limiting factors and threats consistently across all populations and ESUs to enable comparison with other datasets (e.g., restoration projects). Limiting factors and threats (LF/T) identified in salmon recovery plans were translated into a common language using an ecological concerns data dictionary (see the "Ecological Concerns" tab in the attached spreadsheets); a data dictionary defines the wording, meaning, and scope of categories. The ecological concerns data dictionary defines how different elements are related, such as the relationships between threats, ecological concerns, and life history stages. The data dictionary includes categories for ecological dynamics and population-level effects such as "reduced genetic fitness" and "behavioral changes." The data dictionary categories are meant to encompass the ecological conditions that directly impact salmonids and can be addressed directly or indirectly by management actions (habitat restoration, hatchery reform, etc.). Using the ecological concerns data dictionary enables us to more fully capture the range of effects of hydro, hatchery, and invasive threats as well as habitat threat categories. The organization and format of the data dictionary were also chosen so that the information we record can be easily related to datasets we already possess (e.g., restoration data).

  19. New Oxford Dictionary of English, 2nd Edition

    • live.european-language-grid.eu
    • catalog.elra.info
    Updated Dec 6, 2005
    Cite
    (2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
    Dataset updated
    Dec 6, 2005
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications. It is available in XML or SGML.

    • Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material.
    • Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English.
    • Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc.
    • Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval.
    • Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference.

    Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.

  20. Portuguese Language Datasets | 300K Translations | Natural Language...

    • datarade.ai
    .json, .xml
    Updated Jul 11, 2025
    Cite
    Oxford Languages (2025). Portuguese Language Datasets | 300K Translations | Natural Language Processing (NLP) Data | Dictionary Display | Translation | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/portuguese-language-datasets-140k-words-300k-translations-oxford-languages
    Available download formats: .json, .xml
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Oxford Languages (https://lexico.com/es)
    Area covered
    Timor-Leste, Portugal, Angola, Sao Tome and Principe, Guinea-Bissau, Macao, Cabo Verde, Brazil, Mozambique
    Description

    Comprehensive Portuguese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. Perfect for powering dictionary platforms, NLP, AI models, and translation systems.

    Our Portuguese language datasets are carefully compiled and annotated by language and linguistic experts. The following Portuguese datasets are available for license:

    1. Portuguese Monolingual Dictionary Data
    2. Portuguese Bilingual Dictionary Data

    Key Features (approximate numbers):

    1. Portuguese Monolingual Dictionary Data

    Our Portuguese monolingual data covers both EU and LATAM varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.

    • Words: 143,600
    • Senses: 285,500
    • Example sentences: 69,300
    • Format: XML format
    • Delivery: Email (link-based file sharing)
    2. Portuguese Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, providing a substantial volume of high-quality translated words that span both EU and LATAM Portuguese varieties.

    • Translations: 300,000
    • Senses: 158,000
    • Example translations: 117,800
    • Format: XML and JSON format
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

    About the sample:

    The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our dataset, we provide a sample in CSV format for preview purposes only.

    If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
