100+ datasets found

t
Data from: Data Dictionary Template
data.tempe.gov
data-academy.tempe.gov
+9more
Updated Jun 5, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Tempe (2020). Data Dictionary Template [Dataset]. https://data.tempe.gov/documents/f97e93ac8d324c71a35caf5a295c4c1e
Explore at:
Dataset updated
Jun 5, 2020
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Dictionary template for Tempe Open Data.
d
Open Data Dictionary Template Individual
opendata.dc.gov
catalog.data.gov
+1more
Updated Jan 5, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Washington, DC (2023). Open Data Dictionary Template Individual [Dataset]. https://opendata.dc.gov/documents/cb6a686b1e344eeb8136d0103c942346
Explore at:
Dataset updated
Jan 5, 2023
Dataset authored and provided by
City of Washington, DC
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.
f
Data Dictionary
mcri.figshare.com
txt
Updated Sep 6, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jennifer Piscionere (2018). Data Dictionary [Dataset]. http://doi.org/10.25374/MCRI.7039280.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.25374/MCRI.7039280.v1
Dataset updated
Sep 6, 2018
Dataset provided by
Murdoch Childrens Research Institute
Authors
Jennifer Piscionere
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This is a data dictionary example we will use in the MVP presentation. It can be deleted after 13/9/18.
g
Data Dictionary Template | gimi9.com
gimi9.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Dictionary Template | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_data-dictionary-template-2e170
Explore at:
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
🇺🇸 미국
Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal...
plos.figshare.com
xls
Updated Jun 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Craig S. Mayer; Nick Williams; Vojtech Huser (2023). Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’). [Dataset]. http://doi.org/10.1371/journal.pone.0240047.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0240047.t001
Dataset updated
Jun 14, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Craig S. Mayer; Nick Williams; Vojtech Huser
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example data elements (for trial NCT00099359: ‘Trial of Three Neonatal Antiretroviral Regimens for Prevention of Intrapartum HIV Transmission’).
v
Ecological Concerns Data Dictionary - Ecological Concerns data dictionary
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
fisheries.noaa.gov
+1more
Updated May 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(Point of Contact, Custodian) (2025). Ecological Concerns Data Dictionary - Ecological Concerns data dictionary [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/ecological-concerns-data-dictionary-ecological-concerns-data-dictionary2
Explore at:
Dataset updated
May 24, 2025
Dataset provided by
(Point of Contact, Custodian)
Description
Evaluating the status of threatened and endangered salmonid populations requires information on the current status of the threats (e.g., habitat, hatcheries, hydropower, and invasives) and the risk of extinction (e.g., status and trend in the Viable Salmonid Population criteria). For salmonids in the Pacific Northwest, threats generally result in changes to physical and biological characteristics of freshwater habitat. These changes are often described by terms like "limiting factors" or "habitat impairment." For example, the condition of freshwater habitat directly impacts salmonid abundance and population spatial structure by affecting carrying capacity and the variability and accessibility of rearing and spawning areas. Thus, one way to assess or quantify threats to ESUs and populations is to evaluate whether the ecological conditions on which fish depend is improving, becoming more degraded, or remains unchanged. In the attached spreadsheets, we have attempted to consistently record limiting factors and threats across all populations and ESUs to enable comparison to other datasets (e.g., restoration projects) in a consistent way. Limiting factors and threats (LF/T) identified in salmon recovery plans were translated in a common language using an ecological concerns data dictionary (see "Ecological Concerns" tab in the attached spreadsheets) (a data dictionaries defines the wording, meaning and scope of categories). The ecological concerns data dictionary defines how different elements are related, such as the relationships between threats, ecological concerns and life history stages. The data dictionary includes categories for ecological dynamics and population level effects such as "reduced genetic fitness" and "behavioral changes." The data dictionary categories are meant to encompass the ecological conditions that directly impact salmonids and can be addressed directly or indirectly by management (habitat restoration, hatchery reform, etc.) actions. Using the ecological concerns data dictionary enables us to more fully capture the range of effects of hydro, hatchery, and invasive threats as well as habitat threat categories. The organization and format of the data dictionary was also chosen so the information we record can be easily related to datasets we already posses (e.g., restoration data). Data Dictionary.
Data dictionary and brochure from: REAP (Resilient Economic Agricultural...
figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ag Data Commons (2023). Data dictionary and brochure from: REAP (Resilient Economic Agricultural Practices) [Dataset]. http://doi.org/10.6084/m9.figshare.19100243.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.19100243.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Ag Data Commons
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data dictionary and brochure for REAP (Resilient Economic Agricultural Practices). https://data.nal.usda.gov/node/5594

Data Entry Template 2017 includes

Excel templates for Experiment description worksheets, Site characterization worksheets, Management worksheets, Measurement worksheets where experimental unit data are reported, and Information that may be useful to the user, including drop down lists of treatment specific information and ranges of expected values. General and introductory instructions, as well as a Data Validation check are also included.A data dictionary typically provides a detailed description for each element or variable in a dataset or data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description.Dataset citation: (dataset) USDA Agricultural Research Service. (2017). REAP (Resilient Economic Agricultural Practices). Agricultural Research Service. https://doi.org/10.15482/USDA.ADC/1372394.
H
Replication Code for: Proxy Advisory Firms and Corporate Shareholder...
dataverse.harvard.edu
search.dataone.org
Updated Sep 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joshua White (2024). Replication Code for: Proxy Advisory Firms and Corporate Shareholder Engagement [Dataset]. http://doi.org/10.7910/DVN/ABLKE4
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ABLKE4
Dataset updated
Sep 5, 2024
Dataset provided by
Harvard Dataverse
Authors
Joshua White
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Contains compressed file, "Proxy Advisory Firms and Corporate Shareholder Engagement.zip", which contains Stata code, Stata pseudo-datasets (to demonstrate format of data), and a data dictionary. Review of Financial Studies, forthcoming. (2024)
S
APD Data Dictionary
splitgraph.com
datahub.austintexas.gov
+3more
Updated Jun 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
datahub-austintexas-gov (2024). APD Data Dictionary [Dataset]. https://www.splitgraph.com/datahub-austintexas-gov/apd-data-dictionary-6w8q-suwv/
Explore at:
json, application/openapi+json, application/vnd.splitgraph.imageAvailable download formats
Dataset updated
Jun 17, 2024
Authors
datahub-austintexas-gov
Description
A table of the values and definitions of fields used in Austin Police Department datasets.

City of Austin Open Data Terms of Use - https://data.austintexas.gov/stories/s/ranj-cccq

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

See the Splitgraph documentation for more information.
l
LScD (Leicester Scientific Dictionary)
figshare.le.ac.uk
docx
Updated Apr 15, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.25392/leicester.data.9746900.v3
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
LScD (Leicester Scientific Dictionary)April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk/suzenneslihan@hotmail.com)Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes[Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus) - Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in description of Version 2 below. We did not repeat the explanation. After pre-processing steps, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are also same as described as for LScD Version 2 below.* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2[Version 2] Getting StartedThis document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from LSC and instructions for usage of the code are available in [2]. The code can be also used for list of texts from other sources, amendments to the code may be required.LSC is a collection of abstracts of articles and proceeding papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.LScD is an ordered list of words from texts of abstracts in LSC.The dictionary stores 974,238 unique words, is sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form of words. The LScD contains the following information:1.Unique words in abstracts2.Number of documents containing each word3.Number of appearance of a word in the entire corpusProcessing the LSCStep 1.Downloading the LSC Online: Use of the LSC is subject to acceptance of request of the link by email. To access the LSC for research purposes, please email to ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.Step 2.Importing the Corpus to R: The full R code for processing the corpus can be found in the GitHub [2].All following steps can be applied for arbitrary list of texts from any source with changes of parameter. The structure of the corpus such as file format and names (also the position) of fields should be taken into account to apply our code. The organisation of CSV files of LSC is described in README file for LSC [1].Step 3.Extracting Abstracts and Saving Metadata: Metadata that include all fields in a document excluding abstracts and the field of abstracts are separated. Metadata are then saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.Step 4.Text Pre-processing Steps on the Collection of Abstracts: In this section, we presented our approaches to pre-process abstracts of the LSC.1.Removing punctuations and special characters: This is the process of substitution of all non-alphanumeric characters by space. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose the actual meaning of such words. A processing of uniting prefixes with words are performed in later steps of pre-processing.2.Lowercasing the text data: Lowercasing is performed to avoid considering same words like “Corpus”, “corpus” and “CORPUS” differently. Entire collection of texts are converted to lowercase.3.Uniting prefixes of words: Words containing prefixes joined with character “-” are united as a word. The list of prefixes united for this research are listed in the file “list_of_prefixes.csv”. The most of prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.4.Substitution of words: Some of words joined with “-” in the abstracts of the LSC require an additional process of substitution to avoid losing the meaning of the word before removing the character “-”. Some examples of such words are “z-test”, “well-known” and “chi-square”. These words have been substituted to “ztest”, “wellknown” and “chisquare”. Identification of such words is done by sampling of abstracts form LSC. The full list of such words and decision taken for substitution are presented in the file “list_of_substitution.csv”.5.Removing the character “-”: All remaining character “-” are replaced by space.6.Removing numbers: All digits which are not included in a word are replaced by space. All words that contain digits and letters are kept because alphanumeric characters such as chemical formula might be important for our analysis. Some examples are “co2”, “h2o” and “21st”.7.Stemming: Stemming is the process of converting inflected words into their word stem. This step results in uniting several forms of words with similar meaning into one form and also saving memory space and time [5]. All words in the LScD are stemmed to their word stem.8.Stop words removal: Stop words are words that are extreme common but provide little value in a language. Some common stop words in English are ‘I’, ‘the’, ‘a’ etc. We used ‘tm’ package in R to remove stop words [6]. There are 174 English stop words listed in the package.Step 5.Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written in the file “LScD.csv”.The Organisation of the LScDThe total number of words in the file “LScD.csv” is 974,238. Each field is described below:Word: It contains unique words from the corpus. All words are in lowercase and their stem forms. The field is sorted by the number of documents that contain words in descending order.Number of Documents Containing the Word: In this content, binary calculation is used: if a word exists in an abstract then there is a count of 1. If the word exits more than once in a document, the count is still 1. Total number of document containing the word is counted as the sum of 1s in the entire corpus.Number of Appearance in Corpus: It contains how many times a word occurs in the corpus when the corpus is considered as one large document.Instructions for R CodeLScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as RData file and in CSV format. Outputs of the code are:Metadata File: It includes all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.File of Abstracts: It contains all abstracts after pre-processing steps defined in the step 4.DTM: It is the Document Term Matrix constructed from the LSC[6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.LScD: An ordered list of words from LSC as defined in the previous section.The code can be used by:1.Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’2.Open LScD_Creation.R script3.Change parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files4.Run the full code.References[1]N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1[2]N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION[3]Web of Science. (15 July). Available: https://apps.webofknowledge.com/[4]A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.[5]C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved porter’s stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.[6]I. Feinerer, "Introduction to the tm Package Text Mining in R," Accessible en ligne: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
d
Data from: Development of Data Dictionary for neonatal intensive care unit:...
datadryad.org
search.dataone.org
zip
Updated Dec 27, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Harpreet Singh; Ravneet Kaur; Satish Saluja; Su Cho; Avneet Kaur; Ashish Pandey; Shubham Gupta; Ritu Das; Praveen Kumar; Jonathan Palma; Gautam Yadav; Yao Sun (2020). Development of Data Dictionary for neonatal intensive care unit: advancement towards a better critical care unit [Dataset]. http://doi.org/10.5061/dryad.zkh18936f
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.zkh18936f
Dataset updated
Dec 27, 2020
Dataset provided by
Dryad
Authors
Harpreet Singh; Ravneet Kaur; Satish Saluja; Su Cho; Avneet Kaur; Ashish Pandey; Shubham Gupta; Ritu Das; Praveen Kumar; Jonathan Palma; Gautam Yadav; Yao Sun
Time period covered
Dec 20, 2019
Description
Supplementary_Data_Dictionary_Sheet_v1.0.xls

The data dictionary Excel sheet is the main supporting document for the paper.

DD_-_Neonatal_Data.csv

The patient dataset is provided as a format for capturing data with respect to data dictionary.
Meta data and supporting documentation
catalog.data.gov
Updated Nov 12, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available on line here: https://github.com/warrenjl/SpGPCW. Format: Abstract The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. File format: R workspace file. Metadata (including data dictionary) • y: Vector of binary responses (1: preterm birth, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate). This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
d
LNWB Ch03 Data Processes
search.dataone.org
hydroshare.org
+1more
Updated Apr 15, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christina Bandaragoda; Joanne Greenberg; Peter Gill; Bracken Capen; Mary Dumas (2022). LNWB Ch03 Data Processes [Dataset]. https://search.dataone.org/view/sha256%3A2a8103e6f0e432948dd223f69ee2ce60f9611139cdfae7b8dab0b800e6f2526f
Explore at:
Dataset updated
Apr 15, 2022
Dataset provided by
Hydroshare
Authors
Christina Bandaragoda; Joanne Greenberg; Peter Gill; Bracken Capen; Mary Dumas
Description
Overview: The Lower Nooksack Water Budget Project involved assembling a wide range of existing data related to WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. This Data Management Plan provides an overview of the data sets, formats and collaboration environment that was used to develop the project. Use of a plan during development of the technical work products provided a forum for the data development and management to be conducted with transparent methods and processes. At project completion, the Data Management Plan provides an accessible archive of the data resources used and supporting information on the data storage, intended access, sharing and re-use guidelines.

One goal of the Lower Nooksack Water Budget project is to make this “usable technical information” as accessible as possible across technical, policy and general public users. The project data, analyses and documents will be made available through the WRIA 1 Watershed Management Project website http://wria1project.org. This information is intended for use by the WRIA 1 Joint Board and partners working to achieve the adopted goals and priorities of the WRIA 1 Watershed Management Plan.

Model outputs for the Lower Nooksack Water Budget are summarized by sub-watersheds (drainages) and point locations (nodes). In general, due to changes in land use over time and changes to available streamflow and climate data, the water budget for any watershed needs to be updated periodically. Further detailed information about data sources is provided in review packets developed for specific technical components including climate, streamflow and groundwater level, soils and land cover, and water use.

Purpose: This project involves assembling a wide range of existing data related to the WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. Data will be used as input to various hydrologic, climatic and geomorphic components of the Topnet-Water Management (WM) model, but will also be available to support other modeling efforts in WRIA 1. Much of the data used as input to the Topnet model is publicly available and maintained by others, (i.e., USGS DEMs and streamflow data, SSURGO soils data, University of Washington gridded meteorological data). Pre-processing is performed to convert these existing data into a format that can be used as input to the Topnet model. Post-processing of Topnet model ASCII-text file outputs is subsequently combined with spatial data to generate GIS data that can be used to create maps and illustrations of the spatial distribution of water information. Other products generated during this project will include documentation of methods, input by WRIA 1 Joint Board Staff Team during review and comment periods, communication tools developed for public engagement and public comment on the project.

In order to maintain an organized system of developing and distributing data, Lower Nooksack Water Budget project collaborators should be familiar with standards for data management described in this document, and the following issues related to generating and distributing data: 1. Standards for metadata and data formats 2. Plans for short-term storage and data management (i.e., file formats, local storage and back up procedures and security) 3. Legal and ethical issues (i.e., intellectual property, confidentiality of study participants) 4. Access policies and provisions (i.e., how the data will be made available to others, any restrictions needed) 5. Provisions for long-term archiving and preservation (i.e., establishment of a new data archive or utilization of an existing archive) 6. Assigned data management responsibilities (i.e., persons responsible for ensuring data Management, monitoring compliance with the Data Management Plan)

This resource is a subset of the Lower Nooksack Water Budget (LNWB) Collection Resource.
n
Data from: Generalizable EHR-R-REDCap pipeline for a national...
data.niaid.nih.gov
zenodo.org
+1more
zip
Updated Jan 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.rjdfn2zcm
Dataset updated
Jan 9, 2022
Dataset provided by
Massachusetts General Hospital
Harvard Medical School
Authors
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

Methods eLAB Development and Source Code (R statistical software):

eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown ((https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.

Data Dictionary (DD)

EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.

Study Cohort

This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

Statistical Analysis

OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
S
Live Well San Diego Data Dictionary
splitgraph.com
data.sandiegocounty.gov
Updated Oct 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
internal-sandiegocounty-data-socrata (2024). Live Well San Diego Data Dictionary [Dataset]. https://www.splitgraph.com/internal-sandiegocounty-data-socrata/live-well-san-diego-data-dictionary-37vr-nftn/
Explore at:
json, application/vnd.splitgraph.image, application/openapi+jsonAvailable download formats
Dataset updated
Oct 10, 2024
Authors
internal-sandiegocounty-data-socrata
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Area covered
San Diego
Description
This is the Data Dictionary for the Live Well San Diego Database. Each variable is defined, given pertinent notes, and sourced.

Prepared by: County of San Diego, Health & Human Services Agency, Public Health Services Division, Community Health Statistics Unit.

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

See the Splitgraph documentation for more information.
Data dictionary.docx
figshare.com
docx
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Monica Panca (2019). Data dictionary.docx [Dataset]. http://doi.org/10.6084/m9.figshare.7708016.v1
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7708016.v1
Dataset updated
Feb 12, 2019
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Monica Panca
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Healthcare resource utilisation and costs of agitation in people with dementia living in care homes in England
Supplementary data files for manuscript titled "From spreadsheet lab data...
zenodo.org
zip
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gesa Witt; Gesa Witt; Yojana Gadiya; Yojana Gadiya; Tooba Abbassi-Daloii; Tooba Abbassi-Daloii; Vassilios Ioannidis; Vassilios Ioannidis; Nick Juty; Nick Juty; Claus Stie Kallesøe; Claus Stie Kallesøe; Marie Attwood; Manfred Kohler; Manfred Kohler; Philip Gribbon; Marie Attwood; Philip Gribbon (2025). Supplementary data files for manuscript titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research" [Dataset]. http://doi.org/10.5281/zenodo.15234457
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.15234457
Dataset updated
Apr 17, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gesa Witt; Gesa Witt; Yojana Gadiya; Yojana Gadiya; Tooba Abbassi-Daloii; Tooba Abbassi-Daloii; Vassilios Ioannidis; Vassilios Ioannidis; Nick Juty; Nick Juty; Claus Stie Kallesøe; Claus Stie Kallesøe; Marie Attwood; Manfred Kohler; Manfred Kohler; Philip Gribbon; Marie Attwood; Philip Gribbon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 17, 2025
Description
This data repository contains all the necessary supplementary files for the manuscript titled "From spreadsheet lab data templates to knowledge graphs: A FAIR data journey in the domain of AMR research."

The repository is a copy of the GitHub page with the source code used to generate the graph and additional files required for the Lab Data Template.

Below we provide a brief overview of the data files in the `additional folder` and their underlying purpose:

The Data Survey collects relevant project and data set information to set up a Data Management Plan. It can serve as an input for Lab Data Template development.

The Lab Data Templates facilitate the collection of AMR research data (in vivo and in vitro) in several sub-tables. The Excel format is compatible with upload procedures into the data repository 'grit' and serves as input for a knowledge graph workflow.

The Data dictionary is connected to the Lab Data Templates and ensures harmonized data entries. In addition, the dictionaries collect metadata beyond the content of the Lab Data Template (e.g. bacterial strain information or compound information) and link to ontologies where possible.

The FAIR assessments have been used as a primer for improving the template. This report is generated using the FAIR-DSM model.

The templates have been used during the IMI2 GNA NOW project to collect information and have been improved according to FAIR standards in collaboration with the IMI FAIRplus project ("post FAIRification").
E
New Oxford Dictionary of English, 2nd Edition
live.european-language-grid.eu
catalog.elra.info
Updated Dec 6, 2005
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
Explore at:
Dataset updated
Dec 6, 2005
License
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttp://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in XML or SGML. - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material. - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English. - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc. - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval. - Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference. - Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.
Data from: US Federal LCA Commons Life Cycle Inventory Unit Process Template...
catalog.data.gov
Updated Apr 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). US Federal LCA Commons Life Cycle Inventory Unit Process Template [Dataset]. https://catalog.data.gov/dataset/us-federal-lca-commons-life-cycle-inventory-unit-process-template-3cc7d
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Area covered
United States
Description
An excel template with data elements and conventions corresponding to the openLCA unit process data model. Includes LCA Commons data and metadata guidelines and definitions Resources in this dataset:Resource Title: READ ME - data dictionary. File Name: lcaCommonsSubmissionGuidelines_FINAL_2014-09-22.pdfResource Title: US Federal LCA Commons Life Cycle Inventory Unit Process Template. File Name: FedLCA_LCI_template_blank EK 7-30-2015.xlsxResource Description: Instructions: This template should be used for life cycle inventory (LCI) unit process development and is associated with an openLCA plugin to import these data into an openLCA database. See www.openLCA.org to download the latest release of openLCA for free, and to access available plugins.
a
v8 USAR Data Dictionary Long Form
napsg.hub.arcgis.com
Updated Dec 15, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NAPSG Foundation (2020). v8 USAR Data Dictionary Long Form [Dataset]. https://napsg.hub.arcgis.com/documents/38d6053c996c4a1d89ee245cc1f82f12
Explore at:
Dataset updated
Dec 15, 2020
Dataset authored and provided by
NAPSG Foundation
Area covered

Description
v8 USAR waypoint data dictionary. It is all content on one long pdf page. It can not easily be printed. For a printable v8 USAR Data Dictionary, please see v8 USAR Data Dictionary 8.5 x 11.On 5/14/2021 this item was deprecated in favor of v8 USAR Data Dictionary 8.5 x 11 to simplify updating documentation.To suggest edits or revisions, please contact jdoke@publicsafetygis.org