100+ datasets found

Data Dictionary
mcri.figshare.com
txt
Updated Sep 6, 2018
Cite
Jennifer Piscionere (2018). Data Dictionary [Dataset]. http://doi.org/10.25374/MCRI.7039280.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.25374/MCRI.7039280.v1
Dataset updated
Sep 6, 2018
Dataset provided by
Murdoch Children's Research Institute http://www.mcri.edu.au/
Authors
Jennifer Piscionere
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
This is a data dictionary example we will use in the MVP presentation. It can be deleted after 13/9/18.
Data from: Data Dictionary Template
data.tempe.gov
data-academy.tempe.gov
+8 more
Updated Jun 5, 2020
Cite
City of Tempe (2020). Data Dictionary Template [Dataset]. https://data.tempe.gov/documents/f97e93ac8d324c71a35caf5a295c4c1e
Explore at:
Dataset updated
Jun 5, 2020
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Dictionary template for Tempe Open Data.
Open Data Dictionary Template Individual
catalog.data.gov
hub.arcgis.com
Updated Feb 4, 2025
Cite
Office of the Chief Technology Officer (2025). Open Data Dictionary Template Individual [Dataset]. https://catalog.data.gov/dataset/open-data-dictionary-template-individual
Explore at:
Dataset updated
Feb 4, 2025
Dataset provided by
Office of the Chief Technology Officer
Description
This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.
Viking II Data Dictionary
dtechtive.com
find.data.gov.scot
csv, docx, pdf, txt +1
Updated Oct 8, 2021
Cite
University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit (2021). Viking II Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3145
Explore at:
Available download formats: csv(0.0038 MB), csv(0.0065 MB), csv(0.0012 MB), docx(0.015 MB), csv(0.0098 MB), csv(0.0063 MB), csv(0.007 MB), csv(0.004 MB), csv(0.0042 MB), csv(0.0029 MB), csv(0.0068 MB), csv(0.01 MB), xlsx(0.0923 MB), csv(0.0008 MB), csv(0.0015 MB), pdf(1.215 MB), csv(0.0043 MB), csv(0.0021 MB), csv(0.0071 MB), csv(0.0051 MB), txt(0.0166 MB)
Unique identifier
https://doi.org/10.7488/ds/3145
Dataset updated
Oct 8, 2021
Dataset provided by
University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
VIKING II was made possible thanks to Medical Research Council (MRC) funding. We aim to better understand what might cause diseases such as heart disease, eye disease, stroke, diabetes and others by inviting 4,000 people with 2 or more grandparents from Orkney and Shetland to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail viking@ed.ac.uk.

Superstore

kaggle.com

zip

Updated Oct 3, 2022

Cite

Ibrahim Elsayed (2022). Superstore [Dataset]. https://www.kaggle.com/datasets/ibrahimelsayed182/superstore

Explore at:

Available download formats: zip(167457 bytes)

Dataset updated

Oct 3, 2022

Authors

Ibrahim Elsayed

Description

Context

A superstore in the USA; the data contains about 10,000 rows.

Data Dictionary

Attribute	Definition	Example
Ship Mode		Second Class
Segment	Segment category	Consumer
Country		United States
City		Los Angeles
State		California
Postal Code		90032
Region		West
Category	Category of product	Technology
Sub-Category		Phones
Sales	Number of sales	114.9
Quantity		3
Discount		0.45
Profit		14.1694

Acknowledgements

All thanks to The Sparks Foundation for making this dataset.

Inspiration

Get the data and try to draw insights from it. Good luck ❤️

Don't forget to Upvote😊🥰

Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal...
plos.figshare.com
xls
Updated Jun 14, 2023
Cite
Craig S. Mayer; Nick Williams; Vojtech Huser (2023). Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’). [Dataset]. http://doi.org/10.1371/journal.pone.0240047.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0240047.t001
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Craig S. Mayer; Nick Williams; Vojtech Huser
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’).
Commercial Fishing Regulations Data Dictionary
hub.arcgis.com
Updated Jan 10, 2020
Cite
Ministry for Primary Industries (2020). Commercial Fishing Regulations Data Dictionary [Dataset]. https://hub.arcgis.com/documents/MPI::commercial-fishing-regulations-data-dictionary-1/about
Explore at:
Dataset updated
Jan 10, 2020
Dataset authored and provided by
Ministry for Primary Industries
Description
This data dictionary describes the field names, expected data, examples of data and field types (schema) of the commercial fishing regulations data set.
Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA...
data.usgs.gov
s.cnmilf.com
+1 more
Cite
Heather Lowers, Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA Software Package Developed by Probe Software [Dataset]. http://doi.org/10.5066/P91HKRPM
Explore at:
Unique identifier
https://doi.org/10.5066/P91HKRPM
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Authors
Heather Lowers
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Jan 1, 1995 - Dec 31, 2050
Description
This data dictionary describes most of the possible output options given in the Probe for EPMA software package developed by Probe Software. Examples of the data output options include sample identification, analytical conditions, elemental weight percents, atomic percents, detection limits, and stage coordinates. Many more options are available and the data that is output will depend upon the end use.
Energy Performance of Buildings Certificates: Data dictionary and glossary
gov.uk
Updated Oct 23, 2023
Cite
Department for Levelling Up, Housing and Communities (2023). Energy Performance of Buildings Certificates: Data dictionary and glossary [Dataset]. https://www.gov.uk/government/statistics/energy-performance-of-buildings-certificates-data-dictionary-and-glossary
Explore at:
Dataset updated
Oct 23, 2023
Dataset provided by
GOV.UK
Authors
Department for Levelling Up, Housing and Communities
Description
EPC statistics data dictionary:

definitions of data variables included in the EPC statistics release

limitations of data variables

suggested usage of data variables

EPC statistics glossary:

a consolidated glossary of all the terms used in EPC statistics releases
APAC Data Suite | 4M+ Translations | 1.6M+ Words | Natural Language...
datarade.ai
Updated Oct 1, 2025
Cite
Oxford Languages (2025). APAC Data Suite | 4M+ Translations | 1.6M+ Words | Natural Language Processing Data | Dictionary Display | Translations | APAC Coverage [Dataset]. https://datarade.ai/data-products/apac-data-suite-4m-translations-1-6m-words-natural-la-oxford-languages
Explore at:
Available download formats: .json, .xml, .csv, .txt, .mp3, .wav
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Oxford Languages https://lexico.com/es
Area covered
Marshall Islands, Fiji, Papua New Guinea, China, Australia, Thailand, Taiwan, Philippines, Vietnam, Kiribati
Description
APAC Data Suite offers high-quality language datasets. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.

Discover our expertly curated language datasets in the APAC Data Suite. Compiled and annotated by language and linguistic experts, this suite offers high-quality resources tailored to your needs. This suite includes:

Monolingual and Bilingual Dictionary Data
Featuring headwords, definitions, word senses, part-of-speech (POS) tags, and semantic metadata.

Semi-bilingual Dictionary Data
Each entry features a headword with definitions and/or usage examples in Language 1, followed by a translation of the headword and/or definition in Language 2, enabling efficient cross-lingual mapping.

Sentence Corpora
Curated examples of real-world usage with contextual annotations for training and evaluation.

Synonyms & Antonyms
Lexical relations to support semantic search, paraphrasing, and language understanding.

Audio Data
Native speaker recordings for speech recognition, TTS, and pronunciation modeling.

Word Lists
Frequency-ranked and thematically grouped lists for vocabulary training and NLP tasks. The word list data can cover one language or two, such as Tamil words with English translations.

Each language may contain one or more types of language data. Depending on the dataset, we can provide these in formats such as XML, JSON, TXT, XLSX, CSV, WAV, MP3, and more. Delivery is currently available via email (link-based sharing) or REST API.

If you require more information about a specific dataset, please contact us at Growth.OL@oup.com.

Below are the different types of datasets available for each language, along with their key features and approximate metrics. If you have any questions or require additional assistance, please don't hesitate to contact us.

Assamese Semi-bilingual Dictionary Data: 72,200 words | 83,700 senses | 83,800 translations.

Bengali Bilingual Dictionary Data: 161,400 translations | 71,600 senses.

Bengali Semi-bilingual Dictionary Data: 28,300 words | 37,700 senses | 62,300 translations.

British English Monolingual Dictionary Data: 146,000 words | 230,000 senses | 149,000 example sentences.

British English Synonyms and Antonyms Data: 600,000 synonyms | 22,000 antonyms.

British English Pronunciations with Audio: 250,000 transcriptions (IPA) | 180,000 audio files.

French Monolingual Dictionary Data: 42,000 words | 56,000 senses | 43,000 example sentences.

French Bilingual Dictionary Data: 380,000 translations | 199,000 senses | 146,000 example translations.

Gujarati Monolingual Dictionary Data: 91,800 words | 131,500 senses.

Gujarati Bilingual Dictionary Data: 171,800 translations | 158,200 senses.

Hindi Monolingual Dictionary Data: 46,200 words | 112,700 senses.

Hindi Bilingual Dictionary Data: 263,400 translations | 208,100 senses | 18,600 example translations.

Hindi Synonyms and Antonyms Dictionary Data: 478,100 synonyms | 18,800 antonyms.

Hindi Sentence Data: 216,000 sentences.

Hindi Audio data: 68,000 audio files.

Indonesian Bilingual Dictionary Data: 36,000 translations | 23,700 senses | 12,700 example translations.

Indonesian Monolingual Dictionary Data: 120,000 words | 140,000 senses | 30,000 example sentences.

Korean Monolingual Dictionary Data: 596,100 words | 386,600 senses | 91,700 example sentences.

Korean Bilingual Dictionary Data: 952,500 translations | 449,700 senses | 227,800 example translations.

Mandarin Chinese (simplified) Monolingual Dictionary Data: 81,300 words | 162,400 senses | 80,700 example sentences.

Mandarin Chinese (traditional) Monolingual Dictionary Data: 60,100 words | 144,700 senses | 29,900 example sentences.

Mandarin Chinese (simplified) Bilingual Dictionary Data: 367,600 translations | 204,500 senses | 150,900 example translations.

Mandarin Chinese (traditional) Bilingual Dictionary Data: 215,600 translations | 202,800 senses | 149,700 example translations.

Mandarin Chinese (simplified) Synonyms and Antonyms Data: 3,800 synonyms | 3,180 antonyms.

Malay Bilingual Dictionary Data: 106,100 translations | 53,500 senses.

Malay Monolingual Dictionary Data: 39,800 words | 40,600 senses | 21,100 example sentences.

Malayalam Monolingual Dictionary Data: 91,300 words | 159,200 senses.

Malayalam Bilingual Word List Data: 76,200 translation pairs.

Marathi Bilingual Dictionary Data: 45,400 translations | 32,800 senses | 3,600 example translations.

Nepali Bilingual Dictionary Data: 350,000 translations | 264,200 senses | 1,300 example translations.

New Zealand English Monolingual Dictionary Data: 100,000 words.

Odia Semi-bilingual Dictionary Data: 30,700 words | 69,300 senses | 69,200 translations.

Punjabi ...
Data from: Delta Neighborhood Physical Activity Study
catalog.data.gov
agdatacommons.nal.usda.gov
+1 more
Updated Jun 5, 2025
+ more versions
Cite
Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note: title changed 9/4/2020 to reflect study name]

Resources in this dataset:

Resource Title: Dataset One RALA PPA Data Dictionary | File Name: RALA PPA Data Dictionary.csv | Resource Description: Data dictionary for dataset one collected using the RALA PPA tool. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset Two RALA TWA Data Dictionary | File Name: RALA TWA Data Dictionary.csv | Resource Description: Data dictionary for dataset two collected using the RALA TWA tool. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset Three RALA SSA Data Dictionary | File Name: RALA SSA Data Dictionary.csv | Resource Description: Data dictionary for dataset three collected using the RALA SSA tool. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset Four CPAT Data Dictionary | File Name: CPAT Data Dictionary.csv | Resource Description: Data dictionary for dataset four collected using the CPAT. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset One RALA PPA | File Name: RALA PPA Data.csv | Resource Description: Data collected using the RALA PPA tool. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset Two RALA TWA | File Name: RALA TWA Data.csv | Resource Description: Data collected using the RALA TWA tool. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset Three RALA SSA | File Name: RALA SSA Data.csv | Resource Description: Data collected using the RALA SSA tool. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Dataset Four CPAT | File Name: CPAT Data.csv | Resource Description: Data collected using the CPAT. | Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

Resource Title: Data Dictionary | File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csv | Resource Description: This is a combined data dictionary from each of the 4 dataset files in this set.
Database Creation Description and Data Dictionaries
figshare.com
txt
Updated Aug 11, 2016
Cite
Jordan Kempker; John David Ike (2016). Database Creation Description and Data Dictionaries [Dataset]. http://doi.org/10.6084/m9.figshare.3569067.v3
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3569067.v3
Dataset updated
Aug 11, 2016
Dataset provided by
figshare
Figshare http://figshare.com/
Authors
Jordan Kempker; John David Ike
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
There are several Microsoft Word documents here detailing data creation methods, along with various dictionaries describing the included and derived variables. The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code with this project. The alphabetical list of variables is intended for users who find it easier to copy and paste variable names from the list than to retype them while coding. The NIS Data Dictionary contains a general description of the dataset as well as each variable's responses.
Data from: Generalizable EHR-R-REDCap pipeline for a national...
data.niaid.nih.gov
datadryad.org
zip
Updated Jan 9, 2022
Cite
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.rjdfn2zcm
Dataset updated
Jan 9, 2022
Dataset provided by
Harvard Medical School
Massachusetts General Hospital
Authors
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
License
https://spdx.org/licenses/CC0-1.0.html
Description
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

Methods

eLAB Development and Source Code (R statistical software):

eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or an institutional EDW such as the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
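As a minimal sketch of the key-value remapping described above, the Python snippet below maps raw lab labels to a single code and drops rows with unknown labels or units. It is not the eLAB source (which is written in R); the names `LAB_LOOKUP`, `ALLOWED_UNITS`, and `remap_labs`, the code `potassium`, and the unit `mmol/L` are assumptions made for illustration.

```python
# Hypothetical lookup table: raw EHR lab label -> data dictionary (DD) code.
# The labels come from the description above; the code "potassium" is assumed.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}

# Hypothetical set of units the DD accepts for each code.
ALLOWED_UNITS = {"potassium": {"mmol/L"}}


def remap_labs(rows):
    """Return rows renamed to their DD code, keeping only known labs and units."""
    remapped = []
    for row in rows:
        code = LAB_LOOKUP.get(row["lab"].strip())
        if code is None:
            continue  # lab subtype is not in the lookup table
        if row["unit"] not in ALLOWED_UNITS.get(code, set()):
            continue  # unit is not one the DD pre-defines
        remapped.append(
            {
                "record_id": row["record_id"],
                "code": code,
                "value": float(row["value"]),
                "unit": row["unit"],
            }
        )
    return remapped


raw_rows = [
    {"record_id": 1, "lab": "Potassium(POC)", "value": "4.1", "unit": "mmol/L"},
    {"record_id": 1, "lab": "Sodium", "value": "140", "unit": "mmol/L"},
    {"record_id": 2, "lab": "Potassium,venous", "value": "3.9", "unit": "mg/dL"},
]
clean_rows = remap_labs(raw_rows)
```

Here only the first row survives: the second has a label missing from the lookup table, and the third has a unit the assumed dictionary does not accept.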

Data Dictionary (DD)

EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
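The aggregation step mentioned above can be sketched as follows, assuming each site exports a CSV whose header row matches the shared data dictionary. The function name `combine_site_csvs` and the column names `record_id` and `potassium` are hypothetical, not taken from the registry.

```python
import csv
import io


def combine_site_csvs(csv_texts):
    """Concatenate CSV exports that share one header; raise if a header differs."""
    header = None
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            # A site that did not use the shared DD would show up here.
            raise ValueError("site export does not match the shared data dictionary")
        rows.extend(reader)
    return header, rows


# Hypothetical exports from two sites using the same field names.
site_a = "record_id,potassium\n1,4.1\n"
site_b = "record_id,potassium\n2,3.9\n"
header, combined = combine_site_csvs([site_a, site_b])
```

Because the field names are fixed by the shared dictionary, combining sites reduces to appending rows; the header check is the only guard needed in this simplified setting.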

Study Cohort

This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

Statistical Analysis

OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
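The OS definition and censoring rule above can be illustrated with a short Python sketch. The function name `overall_survival` and the example dates are hypothetical, and measuring time in days is an assumption; the description does not state the time unit used.

```python
from datetime import date


def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (days, event): event is 1 for death, 0 when censored at last follow-up."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("need a death date or a last follow-up date")
    return (last_follow_up - diagnosis).days, 0


# A patient who died: time runs from diagnosis to death.
died = overall_survival(date(2016, 1, 1), death=date(2017, 1, 1))

# A patient with no death event: time is censored at the last follow-up visit.
censored = overall_survival(date(2018, 3, 1), last_follow_up=date(2018, 3, 31))
```

The resulting (time, event) pairs are the usual inputs to a Cox proportional hazards fit, which in the original work was done in R with the survival package.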
Traveller Genes Data Dictionary
dtechtive.com
find.data.gov.scot
csv, docx, pdf, txt +1
Updated Oct 25, 2021
Cite
University of Edinburgh. Usher Institute (2021). Traveller Genes Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3155
Explore at:
Available download formats: csv(0.001 MB), txt(0.0166 MB), docx(0.0127 MB), xlsx(0.0469 MB), csv(0.0026 MB), csv(0.0008 MB), csv(0.0025 MB), csv(0.0039 MB), csv(0.0101 MB), csv(0.0011 MB), pdf(0.4028 MB), csv(0.0022 MB), csv(0.0061 MB), csv(0.0009 MB)
Unique identifier
https://doi.org/10.7488/ds/3155
Dataset updated
Oct 25, 2021
Dataset provided by
University of Edinburgh. Usher Institute
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Traveller Genes is a research study supported by the Traveller community. We're looking at the genetics, origins and health of over 200 volunteers who have at least two grandparents who are or were Travellers. This includes Scottish Travellers, Irish Travellers, Romanichal or Romany, or Welsh Kale. We aim to identify the genetic origins and relationships of the Scottish Traveller community e.g. Highland Travellers, Lowland Travellers, Borders Romanichal Travellers. We also want to understand how Scottish Travellers are related to other communities and their overall patterns of health. Participants are asked to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail travellergenes@ed.ac.uk.
data dictionary
health.data.ny.gov
csv, xlsx, xml
Updated Aug 23, 2022
+ more versions
Cite
Center for Environmental Health (2022). data dictionary [Dataset]. https://health.data.ny.gov/Health/data-dictionary/3tsn-2bah
Explore at:
Available download formats: xlsx, csv, xml
Dataset updated
Aug 23, 2022
Authors
Center for Environmental Health
Description
This data includes the location of cooling towers registered with New York State. The data is self-reported by owners/property managers of cooling towers in service in New York State. In August 2015 the New York State Department of Health released emergency regulations requiring the owners of cooling towers to register them with New York State. In addition the regulation includes requirements: regular inspection; annual certification; obtaining and implementing a maintenance plan; record keeping; reporting of certain information; and sample collection and culture testing. All cooling towers in New York State, including New York City, need to be registered in the NYS system. Registration is done through an electronic database found at: www.ny.gov/services/register-cooling-tower-and-submit-reports. For more information, check http://www.health.ny.gov/diseases/communicable/legionellosis/, or go to the “About” tab.
Portuguese Language Datasets | 300K Translations | Natural Language...
datarade.ai
.json, .xml
Updated Jul 11, 2025
Cite
Oxford Languages (2025). Portuguese Language Datasets | 300K Translations | Natural Language Processing (NLP) Data | Dictionary Display | Translation | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/portuguese-language-datasets-140k-words-300k-translations-oxford-languages
Explore at:
Available download formats: .json, .xml
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Oxford Languages https://lexico.com/es
Area covered
Brazil, Timor-Leste, Mozambique, Angola, Portugal, Sao Tome and Principe, Cabo Verde, Macao, Guinea-Bissau
Description
Comprehensive Portuguese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. Perfect for powering dictionary platforms, NLP, AI models, and translation systems.

Our Portuguese language datasets are carefully compiled and annotated by language and linguistic experts. The below datasets in Portuguese are available for license:

Portuguese Monolingual Dictionary Data

Portuguese Bilingual Dictionary Data

Key Features (approximate numbers):

Portuguese Monolingual Dictionary Data

Our Portuguese monolingual covers both EU and LATAM varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.

Words: 143,600

Senses: 285,500

Example sentences: 69,300

Format: XML format

Delivery: Email (link-based file sharing)

Portuguese Bilingual Dictionary Data

The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is annually reviewed and updated by our in-house team of language experts. Offers comprehensive coverage of the language, providing a substantial volume of translated words of excellent quality that span both EU and LATAM Portuguese varieties.

Translations: 300,000

Senses: 158,000

Example translations: 117,800

Format: XML and JSON format

Delivery: Email (link-based file sharing) and REST API

Update frequency: annually

Use Cases:

We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD).

If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:

Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

About the sample:

The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our dataset, we provide a sample in CSV format for preview purposes only.

If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
Ecological Concerns Data Dictionary - Ecological Concerns data dictionary
catalog.data.gov
fisheries.noaa.gov
Updated May 24, 2025
(Point of Contact, Custodian) (2025). Ecological Concerns Data Dictionary - Ecological Concerns data dictionary [Dataset]. https://catalog.data.gov/dataset/ecological-concerns-data-dictionary-ecological-concerns-data-dictionary2
Dataset updated
May 24, 2025
Dataset provided by
(Point of Contact, Custodian)
Description
Evaluating the status of threatened and endangered salmonid populations requires information on the current status of the threats (e.g., habitat, hatcheries, hydropower, and invasives) and the risk of extinction (e.g., status and trend in the Viable Salmonid Population criteria). For salmonids in the Pacific Northwest, threats generally result in changes to physical and biological characteristics of freshwater habitat. These changes are often described by terms like "limiting factors" or "habitat impairment." For example, the condition of freshwater habitat directly impacts salmonid abundance and population spatial structure by affecting carrying capacity and the variability and accessibility of rearing and spawning areas. Thus, one way to assess or quantify threats to ESUs and populations is to evaluate whether the ecological conditions on which fish depend are improving, becoming more degraded, or remaining unchanged. In the attached spreadsheets, we have attempted to consistently record limiting factors and threats across all populations and ESUs to enable comparison to other datasets (e.g., restoration projects) in a consistent way. Limiting factors and threats (LF/T) identified in salmon recovery plans were translated into a common language using an ecological concerns data dictionary (see the "Ecological Concerns" tab in the attached spreadsheets); a data dictionary defines the wording, meaning, and scope of categories. The ecological concerns data dictionary defines how different elements are related, such as the relationships between threats, ecological concerns, and life history stages. The data dictionary includes categories for ecological dynamics and population-level effects such as "reduced genetic fitness" and "behavioral changes." The data dictionary categories are meant to encompass the ecological conditions that directly impact salmonids and can be addressed directly or indirectly by management actions (habitat restoration, hatchery reform, etc.).
Using the ecological concerns data dictionary enables us to more fully capture the range of effects of hydro, hatchery, and invasive threats as well as habitat threat categories. The organization and format of the data dictionary were also chosen so the information we record can be easily related to datasets we already possess (e.g., restoration data).
Data Dictionary/README files
figshare.com
xlsx
Updated Mar 11, 2025
Camille Jaime; Taejoon Won (2025). Data Dictionary/README files [Dataset]. http://doi.org/10.6084/m9.figshare.28527137.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28527137.v1
Dataset updated
Mar 11, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Camille Jaime; Taejoon Won
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The Data Dictionary contains all information for each sample/mouse of each experiment. All README files are also included, with a brief experimental description.
Data from: Pesticide Data Program (PDP)
agdatacommons.nal.usda.gov
txt
Updated Dec 2, 2025
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS) (2025). Pesticide Data Program (PDP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1520764
Available download formats: txt
Unique identifier
https://doi.org/10.15482/USDA.ADC/1520764
Dataset updated
Dec 2, 2025
Dataset provided by
Ag Data Commons
Authors
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS)
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on those commodities highly consumed by infants and children.

This dataset provides information on where each tested sample was collected, where the product originated from, what type of product it was, and what residues were found on the product, for calendar years 1992 through 2023. The data can measure residues of individual compounds and classes of compounds, as well as provide information about the geographic distribution of the origin of samples, from growers, packers and distributors. The dataset also includes information on where the samples were taken, what laboratory was used to test them, and all testing procedures (by sample, so can be linked to the compound that is identified). The dataset also contains a reference variable for each compound that denotes the limit of detection for a pesticide/commodity pair (LOD variable). The metadata also includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data is released.

Resources in this dataset:

Resource Title: CSV Data Dictionary for PDP. File Name: PDP_DataDictionary.csv. Resource Description: Machine-readable Comma Separated Values (CSV) format data dictionary for PDP Database Zip files. Defines variables for the sample identity and analytical results data tables/files. The ## characters in the Table and Text Data File name refer to the 2-digit year for the PDP survey, like 97 for 1997 or 01 for 2001. For details on table linking, see PDF. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Data dictionary for Pesticide Data Program. File Name: PDP DataDictionary.pdf. Resource Description: Data dictionary for PDP Database Zip files. Resource Software Recommended: Adobe Acrobat, url: https://www.adobe.com

Resource Title: 2023 PDP Database Zip File. File Name: 2023PDPDatabase.zip
Resource Title: 2022 PDP Database Zip File. File Name: 2022PDPDatabase.zip
Resource Title: 2021 PDP Database Zip File. File Name: 2021PDPDatabase.zip
Resource Title: 2020 PDP Database Zip File. File Name: 2020PDPDatabase.zip
Resource Title: 2019 PDP Database Zip File. File Name: 2019PDPDatabase.zip
Resource Title: 2018 PDP Database Zip File. File Name: 2018PDPDatabase.zip
Resource Title: 2017 PDP Database Zip File. File Name: 2017PDPDatabase.zip
Resource Title: 2016 PDP Database Zip File. File Name: 2016PDPDatabase.zip
Resource Title: 2015 PDP Database Zip File. File Name: 2015PDPDatabase.zip
Resource Title: 2014 PDP Database Zip File. File Name: 2014PDPDatabase.zip
Resource Title: 2013 PDP Database Zip File. File Name: 2013PDPDatabase.zip
Resource Title: 2012 PDP Database Zip File. File Name: 2012PDPDatabase.zip
Resource Title: 2011 PDP Database Zip File. File Name: 2011PDPDatabase.zip
Resource Title: 2010 PDP Database Zip File. File Name: 2010PDPDatabase.zip
Resource Title: 2009 PDP Database Zip File. File Name: 2009PDPDatabase.zip
Resource Title: 2008 PDP Database Zip File. File Name: 2008PDPDatabase.zip
Resource Title: 2007 PDP Database Zip File. File Name: 2007PDPDatabase.zip
Resource Title: 2006 PDP Database Zip File. File Name: 2006PDPDatabase.zip
Resource Title: 2005 PDP Database Zip File. File Name: 2005PDPDatabase.zip
Resource Title: 2004 PDP Database Zip File. File Name: 2004PDPDatabase.zip
Resource Title: 2003 PDP Database Zip File. File Name: 2003PDPDatabase.zip
Resource Title: 2002 PDP Database Zip File. File Name: 2002PDPDatabase.zip
Resource Title: 2001 PDP Database Zip File. File Name: 2001PDPDatabase.zip
Resource Title: 2000 PDP Database Zip File. File Name: 2000PDPDatabase.zip
Resource Title: 1999 PDP Database Zip File. File Name: 1999PDPDatabase.zip
Resource Title: 1998 PDP Database Zip File. File Name: 1998PDPDatabase.zip
Resource Title: 1997 PDP Database Zip File. File Name: 1997PDPDatabase.zip
Resource Title: 1996 PDP Database Zip File. File Name: 1996PDPDatabase.zip
Resource Title: 1995 PDP Database Zip File. File Name: 1995PDPDatabase.zip
Resource Title: 1994 PDP Database Zip File. File Name: 1994PDPDatabase.zip
Resource Title: 1993 PDP Database Zip File. File Name: 1993PDPDatabase.zip
Resource Title: 1992 PDP Database Zip File. File Name: 1992PDPDatabase.zip
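
The PDP naming conventions described above (a 2-digit year in place of "##", and one zip archive per calendar year) can be illustrated with a minimal Python sketch. The function names and the `PDP_YEARS` constant are hypothetical helpers introduced here for illustration; only the "97 for 1997 or 01 for 2001" rule and the `<year>PDPDatabase.zip` pattern come from the listing itself.

```python
def two_digit_year(year: int) -> str:
    """Return the 2-digit suffix that stands in for '##' (e.g. 1997 -> '97')."""
    return f"{year % 100:02d}"


def zip_file_name(year: int) -> str:
    """Build the yearly archive name following the pattern in the resource list."""
    return f"{year}PDPDatabase.zip"


# Calendar years covered by the listing (1992 through 2023); hypothetical constant.
PDP_YEARS = range(1992, 2024)
zip_names = [zip_file_name(year) for year in PDP_YEARS]

print(two_digit_year(1997), two_digit_year(2001))
print(zip_names[0], zip_names[-1], len(zip_names))
```

The actual table and text file names inside each archive are not given in the listing, so this sketch stops at the suffix and the archive name; the PDF data dictionary is the place to confirm the rest.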
Dictionary of English Words and Definitions
kaggle.com
zip
Updated Sep 22, 2024
AnthonyTherrien (2024). Dictionary of English Words and Definitions [Dataset]. https://www.kaggle.com/datasets/anthonytherrien/dictionary-of-english-words-and-definitions
Available download formats: zip (6401928 bytes)
Dataset updated
Sep 22, 2024
Authors
AnthonyTherrien
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Dataset Overview

This dataset consists of 42,052 English words and their corresponding definitions. It is a comprehensive collection of words ranging from common terms to more obscure vocabulary. The dataset is ideal for Natural Language Processing (NLP) tasks, educational tools, and various language-related applications.

Key Features:

Words: A diverse set of English words, including both rare and frequently used terms.

Definitions: Each word is accompanied by a detailed definition that explains its meaning and contextual usage.

Total Number of Words: 42,052

Applications

This dataset is well-suited for a range of use cases, including:

Natural Language Processing (NLP): Enhance text understanding models by providing contextual meaning and word associations.

Vocabulary Building: Create educational tools or games that help users expand their vocabulary.

Lexical Studies: Perform academic research on word usage, trends, and lexical semantics.

Dictionary and Thesaurus Development: Serve as a resource for building dictionary or thesaurus applications, where users can search for words and definitions.

Data Structure

Word: The column containing the English word.

Definition: The column providing a comprehensive definition of the word.
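
As a rough illustration of the two-column structure described above, the following Python sketch reads a CSV with `Word` and `Definition` columns into a lookup table. The sample rows and the `load_dictionary` helper are hypothetical and stand in for the real file, whose name inside the zip archive is not stated in the listing.

```python
import csv
import io

# Hypothetical two-row sample; the real dataset is described as having
# 42,052 rows with the same two columns.
SAMPLE_CSV = """Word,Definition
apple,"A round fruit that grows on a tree."
brisk,"Quick and energetic."
"""


def load_dictionary(handle):
    """Map each lower-cased word to its definition, assuming Word/Definition headers."""
    reader = csv.DictReader(handle)
    return {row["Word"].strip().lower(): row["Definition"].strip() for row in reader}


entries = load_dictionary(io.StringIO(SAMPLE_CSV))
print(entries.get("apple"))
```

To use this with the downloaded file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the extracted CSV was saved.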

Potential Use Cases

Language Learning: This dataset can be used to develop applications or tools aimed at enhancing vocabulary acquisition for language learners.

NLP Model Training: Useful for tasks such as word embeddings, definition generation, and contextual learning.

Research: Analyze word patterns, rare vocabulary, and trends in the English language.


