100+ datasets found
  1. Data Dictionary

    • mcri.figshare.com
    txt
    Updated Sep 6, 2018
    Cite
    Jennifer Piscionere (2018). Data Dictionary [Dataset]. http://doi.org/10.25374/MCRI.7039280.v1
    Available download formats: txt
    Dataset updated
    Sep 6, 2018
    Dataset provided by
    Murdoch Children's Research Institute (http://www.mcri.edu.au/)
    Authors
    Jennifer Piscionere
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a data dictionary example we will use in the MVP presentation. It can be deleted after 13/9/18.

  2. Data from: Data Dictionary Template

    • data.tempe.gov
    • data-academy.tempe.gov
    • +8 more
    Updated Jun 5, 2020
    Cite
    City of Tempe (2020). Data Dictionary Template [Dataset]. https://data.tempe.gov/documents/f97e93ac8d324c71a35caf5a295c4c1e
    Dataset updated
    Jun 5, 2020
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data Dictionary template for Tempe Open Data.

  3. Open Data Dictionary Template Individual

    • catalog.data.gov
    • hub.arcgis.com
    Updated Feb 4, 2025
    Cite
    Office of the Chief Technology Officer (2025). Open Data Dictionary Template Individual [Dataset]. https://catalog.data.gov/dataset/open-data-dictionary-template-individual
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Office of the Chief Technology Officer
    Description

    This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.
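    The template's two requirements, a description for every column and optional code values (domains) that let end-users decode stored values, can be pictured with a minimal Python sketch. The `FieldEntry` class, the `missing_descriptions` helper, and the example columns `WARD` and `STATUS` are hypothetical illustrations, not part of the DC template.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldEntry:
    """One data dictionary row: column name, description, optional domain."""
    name: str
    description: str
    # Code values (domain): stored code -> human-readable label.
    domain: dict[str, str] = field(default_factory=dict)

    def translate(self, code: str) -> str:
        # Fall back to the raw value when the code is not in the domain
        # (or the column has no domain at all).
        return self.domain.get(code, code)


# Hypothetical entries for illustration only; not taken from the template.
dictionary = {
    "WARD": FieldEntry("WARD", "Ward in which the facility is located"),
    "STATUS": FieldEntry(
        "STATUS",
        "Operating status of the facility",
        {"A": "Active", "C": "Closed"},
    ),
}


def missing_descriptions(entries: dict[str, FieldEntry]) -> list[str]:
    """Columns that would fail the 'every column needs a description' rule."""
    return [name for name, e in entries.items() if not e.description.strip()]


print(dictionary["STATUS"].translate("C"))  # Closed
print(dictionary["WARD"].translate("6"))    # 6
print(missing_descriptions(dictionary))     # []
```

    Keeping the domain next to the description means no separate lookup table is needed, which matches the template's advice for columns where lookup tables do not exist.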

  4. Viking II Data Dictionary

    • dtechtive.com
    • find.data.gov.scot
    csv, docx, pdf, txt +1
    Updated Oct 8, 2021
    Cite
    University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit (2021). Viking II Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3145
    Available download formats: csv(0.0038 MB), csv(0.0065 MB), csv(0.0012 MB), docx(0.015 MB), csv(0.0098 MB), csv(0.0063 MB), csv(0.007 MB), csv(0.004 MB), csv(0.0042 MB), csv(0.0029 MB), csv(0.0068 MB), csv(0.01 MB), xlsx(0.0923 MB), csv(0.0008 MB), csv(0.0015 MB), pdf(1.215 MB), csv(0.0043 MB), csv(0.0021 MB), csv(0.0071 MB), csv(0.0051 MB), txt(0.0166 MB)
    Dataset updated
    Oct 8, 2021
    Dataset provided by
    University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    VIKING II was made possible thanks to Medical Research Council (MRC) funding. We aim to better understand what might cause diseases such as heart disease, eye disease, stroke, diabetes and others by inviting 4,000 people with 2 or more grandparents from Orkney and Shetland to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail viking@ed.ac.uk.

  5. Superstore

    • kaggle.com
    zip
    Updated Oct 3, 2022
    Cite
    Ibrahim Elsayed (2022). Superstore [Dataset]. https://www.kaggle.com/datasets/ibrahimelsayed182/superstore
    Available download formats: zip(167457 bytes)
    Dataset updated
    Oct 3, 2022
    Authors
    Ibrahim Elsayed
    Description

    Context

    A superstore in the USA; the data contain about 10,000 rows.

    Data Dictionary

    Attribute      Definition              Example
    Ship Mode                              Second Class
    Segment        Segment Category        Consumer
    Country                                United States
    City                                   Los Angeles
    State                                  California
    Postal Code                            90032
    Region                                 West
    Category       Categories of product   Technology
    Sub-Category                           Phones
    Sales          Number of sales         114.9
    Quantity                               3
    Discount                               0.45
    Profit                                 14.1694

    Acknowledgements

    All thanks to The Sparks Foundation for making this dataset.

    Inspiration

    Get the data and try to draw insights. Good luck ❤️

    Don't forget to upvote 😊🥰

  6. Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Craig S. Mayer; Nick Williams; Vojtech Huser (2023). Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’). [Dataset]. http://doi.org/10.1371/journal.pone.0240047.t001
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Craig S. Mayer; Nick Williams; Vojtech Huser
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’).

  7. Commercial Fishing Regulations Data Dictionary

    • hub.arcgis.com
    Updated Jan 10, 2020
    Cite
    Ministry for Primary Industries (2020). Commercial Fishing Regulations Data Dictionary [Dataset]. https://hub.arcgis.com/documents/MPI::commercial-fishing-regulations-data-dictionary-1/about
    Dataset updated
    Jan 10, 2020
    Dataset authored and provided by
    Ministry for Primary Industries
    Description

    This data dictionary describes the field names, expected data, examples of data and field types (schema) of the commercial fishing regulations data set.
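    A data dictionary that records field names and field types can double as a simple schema check on incoming records. The sketch below assumes a hypothetical three-field schema (`rule_id`, `species_code`, `min_size_cm`) invented for illustration; it does not reproduce the Ministry's actual fields, and `validate` is an illustrative helper, not part of any published tooling.

```python
# Hypothetical schema in the spirit of "field names, expected data, examples
# and field types"; these field names are invented for illustration.
SCHEMA = {
    "rule_id": {"type": int, "example": 101},
    "species_code": {"type": str, "example": "SNA"},
    "min_size_cm": {"type": float, "example": 25.0},
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, spec in SCHEMA.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], spec["type"]):
            # Strict check: an int is not accepted where a float is expected.
            problems.append(
                f"{name}: expected {spec['type'].__name__}, "
                f"got {type(record[name]).__name__}"
            )
    # Fields present in the record but absent from the dictionary.
    for name in sorted(record.keys() - SCHEMA.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


print(validate({"rule_id": "101", "species_code": "SNA"}))
```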

  8. Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA...

    • data.usgs.gov
    • s.cnmilf.com
    • +1 more
    Cite
    Heather Lowers, Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA Software Package Developed by Probe Software [Dataset]. http://doi.org/10.5066/P91HKRPM
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Heather Lowers
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Time period covered
    Jan 1, 1995 - Dec 31, 2050
    Description

    This data dictionary describes most of the possible output options given in the Probe for EPMA software package developed by Probe Software. Examples of the data output options include sample identification, analytical conditions, elemental weight percents, atomic percents, detection limits, and stage coordinates. Many more options are available and the data that is output will depend upon the end use.

  9. Energy Performance of Buildings Certificates: Data dictionary and glossary

    • gov.uk
    Updated Oct 23, 2023
    Cite
    Department for Levelling Up, Housing and Communities (2023). Energy Performance of Buildings Certificates: Data dictionary and glossary [Dataset]. https://www.gov.uk/government/statistics/energy-performance-of-buildings-certificates-data-dictionary-and-glossary
    Dataset updated
    Oct 23, 2023
    Dataset provided by
    GOV.UK
    Authors
    Department for Levelling Up, Housing and Communities
    Description

    EPC statistics data dictionary:

    • definitions of data variables included in the EPC statistics release
    • limitations of data variables
    • suggested usage of data variables

    EPC statistics glossary:

    • a consolidated glossary of all the terms used in EPC statistics releases

  10. APAC Data Suite | 4M+ Translations | 1.6M+ Words | Natural Language...

    • datarade.ai
    Updated Oct 1, 2025
    Cite
    Oxford Languages (2025). APAC Data Suite | 4M+ Translations | 1.6M+ Words | Natural Language Processing Data | Dictionary Display | Translations | APAC Coverage [Dataset]. https://datarade.ai/data-products/apac-data-suite-4m-translations-1-6m-words-natural-la-oxford-languages
    Available download formats: .json, .xml, .csv, .txt, .mp3, .wav
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Oxford Languages (https://lexico.com/es)
    Area covered
    Marshall Islands, Fiji, Papua New Guinea, China, Australia, Thailand, Taiwan, Philippines, Vietnam, Kiribati
    Description

    APAC Data Suite offers high-quality language datasets. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.

    Discover our expertly curated language datasets in the APAC Data Suite. Compiled and annotated by language and linguistic experts, this suite offers high-quality resources tailored to your needs. This suite includes:

    • Monolingual and Bilingual Dictionary Data
      Featuring headwords, definitions, word senses, part-of-speech (POS) tags, and semantic metadata.

    • Semi-bilingual Dictionary Data
      Each entry features a headword with definitions and/or usage examples in Language 1, followed by a translation of the headword and/or definition in Language 2, enabling efficient cross-lingual mapping.

    • Sentence Corpora
      Curated examples of real-world usage with contextual annotations for training and evaluation.

    • Synonyms & Antonyms
      Lexical relations to support semantic search, paraphrasing, and language understanding.

    • Audio Data
      Native speaker recordings for speech recognition, TTS, and pronunciation modeling.

    • Word Lists
      Frequency-ranked and thematically grouped lists for vocabulary training and NLP tasks. The word list data can cover one language or two, such as Tamil words with English translations.

    Each language may contain one or more types of language data. Depending on the dataset, we can provide these in formats such as XML, JSON, TXT, XLSX, CSV, WAV, MP3, and more. Delivery is currently available via email (link-based sharing) or REST API.

    If you require more information about a specific dataset, please contact us at Growth.OL@oup.com.

    Below are the different types of datasets available for each language, along with their key features and approximate metrics. If you have any questions or require additional assistance, please don't hesitate to contact us.

    1. Assamese Semi-bilingual Dictionary Data: 72,200 words | 83,700 senses | 83,800 translations.

    2. Bengali Bilingual Dictionary Data: 161,400 translations | 71,600 senses.

    3. Bengali Semi-bilingual Dictionary Data: 28,300 words | 37,700 senses | 62,300 translations.

    4. British English Monolingual Dictionary Data: 146,000 words | 230,000 senses | 149,000 example sentences.

    5. British English Synonyms and Antonyms Data: 600,000 synonyms | 22,000 antonyms.

    6. British English Pronunciations with Audio: 250,000 transcriptions (IPA) | 180,000 audio files.

    7. French Monolingual Dictionary Data: 42,000 words | 56,000 senses | 43,000 example sentences.

    8. French Bilingual Dictionary Data: 380,000 translations | 199,000 senses | 146,000 example translations.

    9. Gujarati Monolingual Dictionary Data: 91,800 words | 131,500 senses.

    10. Gujarati Bilingual Dictionary Data: 171,800 translations | 158,200 senses.

    11. Hindi Monolingual Dictionary Data: 46,200 words | 112,700 senses.

    12. Hindi Bilingual Dictionary Data: 263,400 translations | 208,100 senses | 18,600 example translations.

    13. Hindi Synonyms and Antonyms Dictionary Data: 478,100 synonyms | 18,800 antonyms.

    14. Hindi Sentence Data: 216,000 sentences.

    15. Hindi Audio data: 68,000 audio files.

    16. Indonesian Bilingual Dictionary Data: 36,000 translations | 23,700 senses | 12,700 example translations.

    17. Indonesian Monolingual Dictionary Data: 120,000 words | 140,000 senses | 30,000 example sentences.

    18. Korean Monolingual Dictionary Data: 596,100 words | 386,600 senses | 91,700 example sentences.

    19. Korean Bilingual Dictionary Data: 952,500 translations | 449,700 senses | 227,800 example translations.

    20. Mandarin Chinese (simplified) Monolingual Dictionary Data: 81,300 words | 162,400 senses | 80,700 example sentences.

    21. Mandarin Chinese (traditional) Monolingual Dictionary Data: 60,100 words | 144,700 senses | 29,900 example sentences.

    22. Mandarin Chinese (simplified) Bilingual Dictionary Data: 367,600 translations | 204,500 senses | 150,900 example translations.

    23. Mandarin Chinese (traditional) Bilingual Dictionary Data: 215,600 translations | 202,800 senses | 149,700 example translations.

    24. Mandarin Chinese (simplified) Synonyms and Antonyms Data: 3,800 synonyms | 3,180 antonyms.

    25. Malay Bilingual Dictionary Data: 106,100 translations | 53,500 senses.

    26. Malay Monolingual Dictionary Data: 39,800 words | 40,600 senses | 21,100 example sentences.

    27. Malayalam Monolingual Dictionary Data: 91,300 words | 159,200 senses.

    28. Malayalam Bilingual Word List Data: 76,200 translation pairs.

    29. Marathi Bilingual Dictionary Data: 45,400 translations | 32,800 senses | 3,600 example translations.

    30. Nepali Bilingual Dictionary Data: 350,000 translations | 264,200 senses | 1,300 example translations.

    31. New Zealand English Monolingual Dictionary Data: 100,000 words.

    32. Odia Semi-bilingual Dictionary Data: 30,700 words | 69,300 senses | 69,200 translations.

    33. Punjabi ...

  11. Data from: Delta Neighborhood Physical Activity Study

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi.

    Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed in SAS using the scoring algorithms provided for these tools. The Street Segment Assessment and CPAT do not have associated scoring algorithms, so no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

    Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note: title changed 9/4/2020 to reflect study name]

    Resources in this dataset:

    • Dataset One RALA PPA Data Dictionary (RALA PPA Data Dictionary.csv): data dictionary for dataset one collected using the RALA PPA tool.
    • Dataset Two RALA TWA Data Dictionary (RALA TWA Data Dictionary.csv): data dictionary for dataset two collected using the RALA TWA tool.
    • Dataset Three RALA SSA Data Dictionary (RALA SSA Data Dictionary.csv): data dictionary for dataset three collected using the RALA SSA tool.
    • Dataset Four CPAT Data Dictionary (CPAT Data Dictionary.csv): data dictionary for dataset four collected using the CPAT.
    • Dataset One RALA PPA (RALA PPA Data.csv): data collected using the RALA PPA tool.
    • Dataset Two RALA TWA (RALA TWA Data.csv): data collected using the RALA TWA tool.
    • Dataset Three RALA SSA (RALA SSA Data.csv): data collected using the RALA SSA tool.
    • Dataset Four CPAT (CPAT Data.csv): data collected using the CPAT.
    • Data Dictionary (DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv): a combined data dictionary from each of the 4 dataset files in this set.

    Recommended software for the first eight resources: Microsoft Excel (https://products.office.com/en-us/excel).

  12. Database Creation Description and Data Dictionaries

    • figshare.com
    txt
    Updated Aug 11, 2016
    Cite
    Jordan Kempker; John David Ike (2016). Database Creation Description and Data Dictionaries [Dataset]. http://doi.org/10.6084/m9.figshare.3569067.v3
    Available download formats: txt
    Dataset updated
    Aug 11, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jordan Kempker; John David Ike
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    There are several Microsoft Word documents here detailing data creation methods, along with various dictionaries describing the included and derived variables. The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code for this project. The alphabetical list of variables is intended for users who find it easier to copy and paste variable names from this list than to retype them. The NIS Data Dictionary contains a general dataset description as well as each variable's responses.

  13. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 9, 2022
    Cite
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Available download formats: zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Harvard Medical School
    Massachusetts General Hospital
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate the feasibility of the facile eLAB workflow. EHR data are successfully transformed and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

    Methods eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.
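    The MRN-to-`record_id` conversion amounts to a dictionary lookup that also drops direct identifiers before import. This is a hypothetical Python illustration, not eLAB's R code: the column names `mrn` and `name`, the `deidentify` function, the example `record_id` format, and the choice to skip patients with no assigned `record_id` are all assumptions.

```python
# Direct identifiers that must not reach the registry (assumed column names).
IDENTIFIERS = {"mrn", "name"}


def deidentify(rows, mrn_to_record_id):
    """Swap each row's MRN for its registry-assigned record_id.

    mrn_to_record_id stands in for the registry's own ID assignment. Rows
    whose MRN has no assigned record_id are skipped (a sketch assumption).
    """
    out = []
    for row in rows:
        record_id = mrn_to_record_id.get(row["mrn"])
        if record_id is None:
            continue  # patient not enrolled: do not import
        # Copy every non-identifying column, then attach the registry ID.
        clean = {k: v for k, v in row.items() if k not in IDENTIFIERS}
        clean["record_id"] = record_id
        out.append(clean)
    return out


rows = [
    {"mrn": "123", "name": "Doe, J", "lab": "Potassium", "value": 4.1},
    {"mrn": "999", "name": "Roe, R", "lab": "Sodium", "value": 139.0},
]
print(deidentify(rows, {"123": "MCC-0001"}))
```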

    Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or an institutional EDW such as the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R Markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
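    Reshaping such an "untidy" cell into one row per lab result might look like the following. The cell layout assumed here (semicolon-separated `name value unit` entries) is hypothetical, since the description does not specify it, and `to_long` is an illustrative Python function rather than part of eLAB.

```python
import re

# Assumed (hypothetical) cell layout: "Potassium 4.1 mmol/L; Sodium 139 mmol/L"
# -> lab name, then a numeric value, then an optional unit token.
ENTRY = re.compile(
    r"^\s*(?P<lab>.+?)\s+(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>\S*)\s*$"
)


def to_long(rows):
    """Split each multi-lab results cell into one dict per lab result.

    rows: iterable of (mrn, collection_date, collection_time, results_cell).
    """
    long_rows = []
    for mrn, date, time, results in rows:
        for chunk in results.split(";"):
            m = ENTRY.match(chunk)
            if m is None:
                # e.g. "<2.0" or free text; a real pipeline would log these.
                continue
            long_rows.append({
                "mrn": mrn, "date": date, "time": time,
                "lab": m["lab"], "value": float(m["value"]), "unit": m["unit"],
            })
    return long_rows


for r in to_long([("123", "2020-01-02", "08:15",
                   "Potassium 4.1 mmol/L; Sodium 139 mmol/L")]):
    print(r)
```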

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
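    The key-value remapping and unit filtering described above can be sketched in a few lines of Python. The potassium subtype strings come from the description; the target code `potassium`, the single allowed unit `mmol/L`, and the `remap` function are assumptions standing in for the registry DD. This sketch only filters on units; it does not perform the unit adjustment eLAB also supports.

```python
# Excerpt in the spirit of eLAB's ~300-entry lookup table. Keys are raw EHR
# lab names from the description; the target code is hypothetical.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}
# Unit pre-defined by the (hypothetical) DD for each code.
ALLOWED_UNITS = {"potassium": "mmol/L"}


def remap(results):
    """Map raw (lab, value, unit) tuples to DD codes; drop everything else."""
    kept = []
    for lab, value, unit in results:
        code = LAB_LOOKUP.get(lab)
        if code is None or ALLOWED_UNITS.get(code) != unit:
            continue  # not a lab of interest, or not in the DD's unit
        kept.append((code, value, unit))
    return kept


print(remap([
    ("Potassium(POC)", 4.1, "mmol/L"),
    ("Potassium,venous", 4.4, "mEq/L"),
    ("Sodium", 139.0, "mmol/L"),
]))
```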

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
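    Because every site uploads the same DD, combining site exports reduces to concatenating files that share one header. A minimal Python sketch using the standard `csv` module; the site names and columns are hypothetical, and rejecting mismatched headers is an assumed safeguard rather than a step the description names.

```python
import csv
import io


def combine_sites(site_csvs):
    """Concatenate per-site CSV exports that all use the same data dictionary.

    site_csvs maps a site name to that site's CSV text. With a shared DD the
    header rows should be identical; a mismatch suggests a site deviated.
    """
    header = None
    rows = []
    for site, text in site_csvs.items():
        reader = csv.reader(io.StringIO(text))
        site_header = next(reader, None)
        if site_header is None:
            raise ValueError(f"{site}: empty export")
        if header is None:
            header = site_header
        elif site_header != header:
            raise ValueError(f"{site}: header does not match the shared DD")
        rows.extend(reader)  # remaining lines are data rows
    return header, rows


header, rows = combine_sites({
    "site_a": "record_id,potassium\n1,4.1\n",
    "site_b": "record_id,potassium\n2,3.9\n",
})
print(header, rows)
```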

    Study Cohort

    This study was approved by the MGB IRB. A search of the EHR was performed to identify patients diagnosed with MCC between 1975 and 2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016 and 2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
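    The OS definition translates into a (time, event) pair per patient, the input that Cox proportional hazards models, such as `coxph` with `Surv(time, event)` in the R `survival` package listed above, consume. A minimal Python sketch, assuming dates are available as `datetime.date` values and time is measured in days; `overall_survival` is a hypothetical helper.

```python
from datetime import date
from typing import Optional


def overall_survival(diagnosis: date, death: Optional[date],
                     last_follow_up: date):
    """Return (time_in_days, event) for overall survival.

    event is 1 when a death is recorded; otherwise the patient is censored
    at the last follow-up visit and event is 0.
    """
    if death is not None:
        return (death - diagnosis).days, 1
    return (last_follow_up - diagnosis).days, 0


print(overall_survival(date(2016, 1, 1), None, date(2016, 12, 31)))
```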

  14. Traveller Genes Data Dictionary

    • dtechtive.com
    • find.data.gov.scot
    csv, docx, pdf, txt +1
    Updated Oct 25, 2021
    Cite
    University of Edinburgh. Usher Institute (2021). Traveller Genes Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3155
    Available download formats: csv(0.001 MB), txt(0.0166 MB), docx(0.0127 MB), xlsx(0.0469 MB), csv(0.0026 MB), csv(0.0008 MB), csv(0.0025 MB), csv(0.0039 MB), csv(0.0101 MB), csv(0.0011 MB), pdf(0.4028 MB), csv(0.0022 MB), csv(0.0061 MB), csv(0.0009 MB)
    Dataset updated
    Oct 25, 2021
    Dataset provided by
    University of Edinburgh. Usher Institute
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Traveller Genes is a research study supported by the Traveller community. We're looking at the genetics, origins and health of over 200 volunteers who have at least two grandparents who are or were Travellers. This includes Scottish Travellers, Irish Travellers, Romanichal or Romany, or Welsh Kale. We aim to identify the genetic origins and relationships of the Scottish Traveller community e.g. Highland Travellers, Lowland Travellers, Borders Romanichal Travellers. We also want to understand how Scottish Travellers are related to other communities and their overall patterns of health. Participants are asked to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail travellergenes@ed.ac.uk.

  15. data dictionary

    • health.data.ny.gov
    csv, xlsx, xml
    Updated Aug 23, 2022
    + more versions
    Cite
    Center for Environmental Health (2022). data dictionary [Dataset]. https://health.data.ny.gov/Health/data-dictionary/3tsn-2bah
    Available download formats: xlsx, csv, xml
    Dataset updated
    Aug 23, 2022
    Authors
    Center for Environmental Health
    Description

    This data includes the location of cooling towers registered with New York State. The data is self-reported by owners/property managers of cooling towers in service in New York State. In August 2015, the New York State Department of Health released emergency regulations requiring the owners of cooling towers to register them with New York State. The regulation also includes requirements for regular inspection, annual certification, obtaining and implementing a maintenance plan, record keeping, reporting of certain information, and sample collection and culture testing. All cooling towers in New York State, including New York City, need to be registered in the NYS system. Registration is done through an electronic database found at: www.ny.gov/services/register-cooling-tower-and-submit-reports. For more information, check http://www.health.ny.gov/diseases/communicable/legionellosis/, or go to the “About” tab.

  16. Portuguese Language Datasets | 300K Translations | Natural Language...

    • datarade.ai
    .json, .xml
    Updated Jul 11, 2025
    Cite
    Oxford Languages (2025). Portuguese Language Datasets | 300K Translations | Natural Language Processing (NLP) Data | Dictionary Display | Translation | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/portuguese-language-datasets-140k-words-300k-translations-oxford-languages
    Available download formats: .json, .xml
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Oxford Languages (https://lexico.com/es)
    Area covered
    Brazil, Timor-Leste, Mozambique, Angola, Portugal, Sao Tome and Principe, Cabo Verde, Macao, Guinea-Bissau
    Description

    Comprehensive Portuguese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. Perfect for powering dictionary platforms, NLP, AI models, and translation systems.

    Our Portuguese language datasets are carefully compiled and annotated by language experts and linguists. The following Portuguese datasets are available for license:

    1. Portuguese Monolingual Dictionary Data
    2. Portuguese Bilingual Dictionary Data

    Key Features (approximate numbers):

    1. Portuguese Monolingual Dictionary Data

    Our Portuguese monolingual dictionary data covers both EU and LATAM varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.

    • Words: 143,600
    • Senses: 285,500
    • Example sentences: 69,300
    • Format: XML
    • Delivery: Email (link-based file sharing)

    2. Portuguese Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, with a substantial volume of high-quality translations spanning both EU and LATAM Portuguese varieties.

    • Translations: 300,000
    • Senses: 158,000
    • Example translations: 117,800
    • Format: XML and JSON
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, text-to-speech (TTS), dictionary display tools, games, translation, word embeddings, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

    About the sample:

    The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our datasets, we provide a sample in CSV format for preview purposes only.

    If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.

  17. Ecological Concerns Data Dictionary - Ecological Concerns data dictionary

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated May 24, 2025
    Cite
    (Point of Contact, Custodian) (2025). Ecological Concerns Data Dictionary - Ecological Concerns data dictionary [Dataset]. https://catalog.data.gov/dataset/ecological-concerns-data-dictionary-ecological-concerns-data-dictionary2
    Dataset updated
    May 24, 2025
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    Evaluating the status of threatened and endangered salmonid populations requires information on the current status of the threats (e.g., habitat, hatcheries, hydropower, and invasive species) and on the risk of extinction (e.g., status and trend in the Viable Salmonid Population criteria). For salmonids in the Pacific Northwest, threats generally result in changes to the physical and biological characteristics of freshwater habitat. These changes are often described by terms like "limiting factors" or "habitat impairment." For example, the condition of freshwater habitat directly affects salmonid abundance and population spatial structure by affecting carrying capacity and the variability and accessibility of rearing and spawning areas. Thus, one way to assess or quantify threats to ESUs and populations is to evaluate whether the ecological conditions on which fish depend are improving, becoming more degraded, or remaining unchanged.

    In the attached spreadsheets, we have attempted to record limiting factors and threats consistently across all populations and ESUs so that they can be compared with other datasets (e.g., restoration projects). Limiting factors and threats (LF/T) identified in salmon recovery plans were translated into a common language using an ecological concerns data dictionary (see the "Ecological Concerns" tab in the attached spreadsheets). A data dictionary defines the wording, meaning, and scope of categories. The ecological concerns data dictionary defines how different elements are related, such as the relationships between threats, ecological concerns, and life history stages. It includes categories for ecological dynamics and population-level effects such as "reduced genetic fitness" and "behavioral changes." The categories are meant to encompass the ecological conditions that directly affect salmonids and that management actions (habitat restoration, hatchery reform, etc.) can address directly or indirectly.

    Using the ecological concerns data dictionary lets us capture more fully the range of effects of hydro, hatchery, and invasive threats as well as habitat threat categories. The organization and format of the data dictionary were also chosen so that the information we record can be easily related to datasets we already possess (e.g., restoration data).

  18. Data Dictionary/README files

    • figshare.com
    xlsx
    Updated Mar 11, 2025
    Cite
    Camille Jaime; Taejoon Won (2025). Data Dictionary/README files [Dataset]. http://doi.org/10.6084/m9.figshare.28527137.v1
    Available download formats: xlsx
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Camille Jaime; Taejoon Won
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data dictionary contains all information for each sample/mouse in each experiment. All README files are also included, each with a brief description of the experiment.

  19. Data from: Pesticide Data Program (PDP)

    • agdatacommons.nal.usda.gov
    txt
    Updated Dec 2, 2025
    Cite
    U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS) (2025). Pesticide Data Program (PDP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1520764
    Available download formats: txt
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS)
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on commodities highly consumed by infants and children. This dataset provides information on where each tested sample was collected, where the product originated, what type of product it was, and what residues were found on it, for calendar years 1992 through 2023. The data can be used to measure residues of individual compounds and classes of compounds. They also show the geographic distribution of sample origins, from growers, packers, and distributors. The dataset also records where the samples were taken, which laboratory tested them, and all testing procedures (recorded by sample, so they can be linked to the compound identified). It also contains a reference variable for each compound that denotes the limit of detection for a pesticide/commodity pair (LOD variable). The metadata also includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data are released.

    Resources in this dataset:

    • CSV Data Dictionary for PDP. File name: PDP_DataDictionary.csv. Machine-readable comma-separated values (CSV) data dictionary for the PDP Database zip files. Defines variables for the sample identity and analytical results data tables/files. The ## characters in the table and text data file names refer to the 2-digit year of the PDP survey, e.g. 97 for 1997 or 01 for 2001. For details on table linking, see the PDF. Recommended software: Microsoft Excel (https://www.microsoft.com/en-us/microsoft-365/excel).
    • Data dictionary for Pesticide Data Program. File name: PDP DataDictionary.pdf. Data dictionary for the PDP Database zip files. Recommended software: Adobe Acrobat (https://www.adobe.com).
    • PDP Database Zip Files. There is one file per survey year from 1992 through 2023, named <year>PDPDatabase.zip (1992PDPDatabase.zip, 1993PDPDatabase.zip, and so on through 2023PDPDatabase.zip).
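    The description says that the ## characters in PDP table and text file names are the 2-digit survey year. It also shows that each annual archive is named <year>PDPDatabase.zip. The Python sketch below shows one way to decode such a code and build the matching archive name. It assumes codes only need to cover surveys from 1992 onward, so 92 through 99 map to the 1900s and everything else maps to the 2000s. The function names and the `latest` parameter are hypothetical helpers, not part of the PDP distribution.

```python
# Hypothetical helpers for working with PDP file naming conventions.
# Assumption: PDP surveys start in 1992, so 2-digit codes 92-99 belong to the
# 1900s and codes 00-91 belong to the 2000s.


def pdp_year(code: str) -> int:
    """Convert a 2-digit PDP survey code (e.g. '97', '01') to a 4-digit year."""
    if len(code) != 2 or not code.isdigit():
        raise ValueError(f"expected a 2-digit year code, got {code!r}")
    yy = int(code)
    return 1900 + yy if yy >= 92 else 2000 + yy


def pdp_zip_name(year: int, latest: int = 2023) -> str:
    """Return the annual archive name, e.g. 2023 -> '2023PDPDatabase.zip'.

    `latest` is an assumed upper bound (the newest survey listed here);
    raise it as new annual files are added.
    """
    if not 1992 <= year <= latest:
        raise ValueError(f"no PDP database zip listed for {year}")
    return f"{year}PDPDatabase.zip"


if __name__ == "__main__":
    print(pdp_year("97"), pdp_year("01"))   # 1997 2001
    print(pdp_zip_name(pdp_year("05")))     # 2005PDPDatabase.zip
```

    The `latest` default is a parameter rather than a fixed bound because the description says a new data file is added each year.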

  20. Dictionary of English Words and Definitions

    • kaggle.com
    zip
    Updated Sep 22, 2024
    Cite
    AnthonyTherrien (2024). Dictionary of English Words and Definitions [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/dictionary-of-english-words-and-definitions
    Available download formats: zip (6,401,928 bytes)
    Dataset updated
    Sep 22, 2024
    Authors
    AnthonyTherrien
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Overview

    This dataset consists of 42,052 English words and their corresponding definitions. It is a comprehensive collection of words ranging from common terms to more obscure vocabulary. The dataset is ideal for Natural Language Processing (NLP) tasks, educational tools, and various language-related applications.

    Key Features:

    • Words: A diverse set of English words, including both rare and frequently used terms.
    • Definitions: Each word is accompanied by a detailed definition that explains its meaning and contextual usage.

    Total Number of Words: 42,052

    Applications

    This dataset is well-suited for a range of use cases, including:

    • Natural Language Processing (NLP): Enhance text understanding models by providing contextual meaning and word associations.
    • Vocabulary Building: Create educational tools or games that help users expand their vocabulary.
    • Lexical Studies: Perform academic research on word usage, trends, and lexical semantics.
    • Dictionary and Thesaurus Development: Serve as a resource for building dictionary or thesaurus applications, where users can search for words and definitions.

    Data Structure

    • Word: The column containing the English word.
    • Definition: The column providing a comprehensive definition of the word.
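    Given the two-column structure above (Word, Definition), a dictionary-lookup tool starts by loading the rows into a word-to-definition map. The Python sketch below does this with the standard `csv` module. It assumes the file inside the zip is a CSV with a header row naming the `Word` and `Definition` columns. The actual file name and encoding are not stated here and should be checked. The function name `load_definitions` is hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, TextIO


def load_definitions(f: TextIO) -> Dict[str, str]:
    """Build a lowercase word -> definition lookup from a Word/Definition CSV.

    Assumes a header row with 'Word' and 'Definition' columns. If a word
    appears more than once (after lowercasing), later definitions are
    appended to the earlier one, separated by '; '.
    """
    lookup: Dict[str, str] = {}
    for row in csv.DictReader(f):
        word = (row.get("Word") or "").strip().lower()
        definition = (row.get("Definition") or "").strip()
        if not word:
            continue  # skip rows with an empty Word field
        if word in lookup:
            lookup[word] += "; " + definition
        else:
            lookup[word] = definition
    return lookup


if __name__ == "__main__":
    # Illustrative in-memory rows; these are made up, not taken from the dataset.
    sample = io.StringIO(
        "Word,Definition\n"
        "apple,A round fruit.\n"
        "Apple,A tree bearing such fruit.\n"
    )
    defs = load_definitions(sample)
    print(defs["apple"])  # A round fruit.; A tree bearing such fruit.
```

    For the real file, open it with `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends, and pass the file object to `load_definitions`. Lowercasing merges entries that differ only in case; drop `.lower()` if case distinctions matter for your application.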

    Potential Use Cases

    • Language Learning: This dataset can be used to develop applications or tools aimed at enhancing vocabulary acquisition for language learners.
    • NLP Model Training: Useful for tasks such as word embeddings, definition generation, and contextual learning.
    • Research: Analyze word patterns, rare vocabulary, and trends in the English language.


