Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand its contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a dataset. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search engine indexing to reach a broader audience of interested parties. This tutorial first explains the terminology and standards surrounding data dictionaries and codebooks. We then present a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared dataset accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we explain how to use freely available web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable (FAIR; Wilkinson et al., 2016).
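To make the idea concrete, the sketch below derives a bare-bones data dictionary from a tabular dataset using pandas: one row per variable with its type, missingness, and a description to be filled in by the researcher. This is an illustration only, not code from the tutorial; the variable names are hypothetical.

```python
# Minimal sketch (not from the tutorial) of building a starter data dictionary
# from a tabular dataset with pandas. Column names below are hypothetical.
import pandas as pd

data = pd.DataFrame({
    "participant_id": [1, 2, 3],
    "age": [24, 31, None],
    "condition": ["control", "treatment", "treatment"],
})

dictionary = pd.DataFrame({
    "variable": data.columns,
    "type": [str(t) for t in data.dtypes],
    "percent_missing": (data.isna().mean() * 100).round(1).values,
    "description": ["" for _ in data.columns],   # to be written by the researcher
})

dictionary.to_csv("data_dictionary.csv", index=False)
print(dictionary)
```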
An in-depth description of the Building Footprint GIS data layer outlining terms of use, update frequency, attribute explanations, and more.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This file contains the data dictionary for the Street Centerline (Native) dataset. To leave feedback or ask a question about this dataset, please fill out the following form: Street Centerline (Native) Schema Data Dictionary feedback form.
Tags: buildings, data-dictionary, lake-county-illinois, planimetrics, planimetrics-and-landmarks, readme
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Very common form names and the number and percentage of studies they are used in.
A data dictionary for the TSS Summarized Reports at the building and individual levels.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LScDC (Leicester Scientific Dictionary-Core), April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com), supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

[Version 3] The third version of LScDC (Leicester Scientific Dictionary-Core) is formed using the updated LScD (Leicester Scientific Dictionary), Version 3*. All steps applied to build the new version of the core dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation here. The files provided with this description are also the same as those described for LScDC Version 2. The numbers of words in the third versions of LScD and LScDC are summarized below.

LScD (v3): 972,060 words
LScDC (v3): 103,998 words

* Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v3
** Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v2

[Version 2] Getting Started. This file describes a sorted and cleaned list of words from the LScD (Leicester Scientific Dictionary), explains the steps for sub-setting the LScD, and gives basic statistics of words in the LSC (Leicester Scientific Corpus), to be found in [1, 2]. The LScDC (Leicester Scientific Dictionary-Core) is a list of words ordered by the number of documents containing them, and is available in the published CSV file. There are 104,223 unique words (lemmas) in the LScDC. This dictionary was created to be used in future work on the quantification of the sense of research texts. The objective of sub-setting the LScD is to discard words that appear too rarely in the corpus. In text mining algorithms, using an enormous amount of text data challenges the performance and accuracy of data mining applications. The performance and accuracy of models depend heavily on the type of words (such as stop words and content words) and the number of words in the corpus. Rarely occurring words are not useful for discriminating texts in large corpora, as rare words are likely to be non-informative signals (or noise) and redundant in the collection of texts. Selecting relevant words also holds out the possibility of more effective and faster operation of text mining algorithms. To build the LScDC, we applied the following process to the LScD: removing words that appear in no more than 10 documents (
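As a hedged illustration of the document-frequency cut described above (keep only words appearing in more than 10 documents), here is a minimal sketch. The CSV layout (word, document_count) and file names are assumptions for the example, not the actual LScD/LScDC formats.

```python
# Minimal sketch of the document-frequency cut: discard words that appear in
# no more than 10 documents, then order the rest by document count, descending.
# Input/output column names and file names are illustrative assumptions.
import csv

DOC_FREQ_THRESHOLD = 10  # words in <= 10 documents are discarded

def build_core_dictionary(lscd_csv_path, core_csv_path):
    with open(lscd_csv_path, newline="", encoding="utf-8") as f:
        rows = [(r["word"], int(r["document_count"])) for r in csv.DictReader(f)]

    core = [r for r in rows if r[1] > DOC_FREQ_THRESHOLD]
    core.sort(key=lambda r: r[1], reverse=True)

    with open(core_csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "document_count"])
        writer.writerows(core)

build_core_dictionary("LScD.csv", "LScDC.csv")
```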
Overview: The Lower Nooksack Water Budget Project involved assembling a wide range of existing data related to WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. This Data Management Plan provides an overview of the data sets, formats and collaboration environment that was used to develop the project. Use of a plan during development of the technical work products provided a forum for the data development and management to be conducted with transparent methods and processes. At project completion, the Data Management Plan provides an accessible archive of the data resources used and supporting information on the data storage, intended access, sharing and re-use guidelines.
One goal of the Lower Nooksack Water Budget project is to make this “usable technical information” as accessible as possible across technical, policy and general public users. The project data, analyses and documents will be made available through the WRIA 1 Watershed Management Project website http://wria1project.org. This information is intended for use by the WRIA 1 Joint Board and partners working to achieve the adopted goals and priorities of the WRIA 1 Watershed Management Plan.
Model outputs for the Lower Nooksack Water Budget are summarized by sub-watersheds (drainages) and point locations (nodes). In general, due to changes in land use over time and changes to available streamflow and climate data, the water budget for any watershed needs to be updated periodically. Further detailed information about data sources is provided in review packets developed for specific technical components including climate, streamflow and groundwater level, soils and land cover, and water use.
Purpose: This project involves assembling a wide range of existing data related to WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. Data will be used as input to various hydrologic, climatic and geomorphic components of the Topnet-Water Management (WM) model, but will also be available to support other modeling efforts in WRIA 1. Much of the data used as input to the Topnet model is publicly available and maintained by others (e.g., USGS DEMs and streamflow data, SSURGO soils data, University of Washington gridded meteorological data). Pre-processing is performed to convert these existing data into a format that can be used as input to the Topnet model. Post-processed Topnet model ASCII-text outputs are subsequently combined with spatial data to generate GIS data that can be used to create maps and illustrations of the spatial distribution of water information. Other products generated during this project include documentation of methods, input by the WRIA 1 Joint Board Staff Team during review and comment periods, and communication tools developed for public engagement and public comment on the project.
To maintain an organized system for developing and distributing data, Lower Nooksack Water Budget project collaborators should be familiar with the standards for data management described in this document and with the following issues related to generating and distributing data:
1. Standards for metadata and data formats
2. Plans for short-term storage and data management (e.g., file formats, local storage and backup procedures, and security)
3. Legal and ethical issues (e.g., intellectual property, confidentiality of study participants)
4. Access policies and provisions (e.g., how the data will be made available to others, any restrictions needed)
5. Provisions for long-term archiving and preservation (e.g., establishment of a new data archive or use of an existing archive)
6. Assigned data management responsibilities (e.g., persons responsible for ensuring data management and monitoring compliance with the Data Management Plan)
This resource is a subset of the Lower Nooksack Water Budget (LNWB) Collection Resource.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of closed cases that resulted in penalty assessments by EBSA since 2000. It provides information on EBSA's enforcement of ERISA's Form 5500 Annual Return/Report filing requirement, focusing on deficient filers, late filers, and non-filers.
Dataset tables: EBSA Data Dictionary, EBSA Metadata, and EBSA OCATS.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the survey of income and program participation (sipp) with r. if the census bureau's budget was gutted and only one complex sample survey survived, pray it's the survey of income and program participation (sipp). it's giant. it's rich with variables. it's monthly. it follows households over three, four, now five year panels. the congressional budget office uses it for their health insurance simulation. analysts read that sipp has person-month files, get scurred, and retreat to inferior options. the american community survey may be the mount everest of survey data, but sipp is most certainly the amazon. questions swing wild and free through the jungle canopy i mean core data dictionary. legend has it that there are still species of topical module variables that scientists like you have yet to analyze. ponce de león would've loved it here. ponce. what a name. what a guy.

the sipp 2008 panel data started from a sample of 105,663 individuals in 42,030 households. once the sample gets drawn, the census bureau surveys one-fourth of the respondents every four months, over four or five years (panel durations vary). you absolutely must read and understand pdf pages 3, 4, and 5 of this document before starting any analysis (start at the header 'waves and rotation groups'). if you don't comprehend what's going on, try their survey design tutorial. since sipp collects information from respondents regarding every month over the duration of the panel, you'll need to be hyper-aware of whether you want your results to be point-in-time, annualized, or specific to some other period. the analysis scripts below provide examples of each. at every four-month interview point, every respondent answers every core question for the previous four months. after that, wave-specific addenda (called topical modules) get asked, but generally only regarding a single prior month. to repeat: core wave files contain four records per person, topical modules contain one. if you stacked every core wave, you would have one record per person per month for the duration of the panel. mmmassive. ~100,000 respondents x 12 months x ~4 years. have an analysis plan before you start writing code so you extract exactly what you need, nothing more. better yet, modify something of mine. cool? this new github repository contains eight, you read me, eight scripts:

1996 panel - download and create database.R
2001 panel - download and create database.R
2004 panel - download and create database.R
2008 panel - download and create database.R
- since some variables are character strings in one file and integers in another, initiate an r function to harmonize variable class inconsistencies in the sas importation scripts
- properly handle the parentheses seen in a few of the sas importation scripts, because the SAScii package currently does not
- create an rsqlite database, initiate a variant of the read.SAScii function that imports ascii data directly into a sql database (.db)
- download each microdata file - weights, topical modules, everything - then read 'em into sql

2008 panel - full year analysis examples.R
- define which waves and specific variables to pull into ram, based on the year chosen
- loop through each of twelve months, constructing a single-year temporary table inside the database
- read that twelve-month file into working memory, then save it for faster loading later if you like
- read the main and replicate weights columns into working memory too, merge everything
- construct a few annualized and demographic columns using all twelve months' worth of information
- construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half, again save it for faster loading later, only if you're so inclined
- reproduce census-published statistics, not precisely (due to topcoding described here on pdf page 19)

2008 panel - point-in-time analysis examples.R
- define which wave(s) and specific variables to pull into ram, based on the calendar month chosen
- read that interview point (srefmon)- or calendar month (rhcalmn)-based file into working memory
- read the topical module and replicate weights files into working memory too, merge it like you mean it
- construct a few new, exciting variables using both core and topical module questions
- construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half
- reproduce census-published statistics, not exactly cuz the authors of this brief used the generalized variance formula (gvf) to calculate the margin of error - see pdf page 4 for more detail - the friendly statisticians at census recommend using the replicate weights whenever possible. oh hayy, now it is.

2008 panel - median value of household assets.R
- define which wave(s) and specific variables to pull into ram, based on the topical module chosen
- read the topical module and replicate weights files into working memory too, merge once again
- construct a replicate-weighted complex sample design with a...
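The replicate-weighted designs mentioned above rely on Fay's balanced repeated replication with a Fay coefficient of one-half. As a hedged illustration of the underlying variance formula only (simulated data, an assumed weighted-mean estimator, and a made-up replicate count; not code from the repository above):

```python
# Minimal sketch of Fay's balanced repeated replication variance estimate,
# Var(theta) = 1 / (R * (1 - k)^2) * sum_r (theta_r - theta)^2, with Fay
# coefficient k = 0.5. Data, weights, and the weighted-mean estimator are
# simulated placeholders, not the actual SIPP file layout.
import numpy as np

def fay_variance(theta_full, theta_reps, fay_k=0.5):
    theta_reps = np.asarray(theta_reps, dtype=float)
    r = theta_reps.size
    return np.sum((theta_reps - theta_full) ** 2) / (r * (1.0 - fay_k) ** 2)

def weighted_mean(values, weights):
    return np.average(values, weights=weights)

rng = np.random.default_rng(0)
y = rng.normal(50_000, 20_000, size=1_000)                    # analysis variable
w_full = rng.uniform(500, 1_500, size=1_000)                  # main weight
w_reps = w_full[:, None] * rng.uniform(0.5, 1.5, size=(1_000, 120))  # replicate weights

estimate = weighted_mean(y, w_full)
replicate_estimates = [weighted_mean(y, w_reps[:, r]) for r in range(w_reps.shape[1])]
se = np.sqrt(fay_variance(estimate, replicate_estimates, fay_k=0.5))
print(f"estimate = {estimate:,.0f}, standard error = {se:,.0f}")
```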
The downloadall extension for CKAN enhances dataset accessibility by adding a "Download all" button to dataset pages. This feature enables users to download a single zip file containing all resource files associated with a dataset, along with a datapackage.json file that provides machine-readable metadata. The extension streamlines the data packaging and distribution process, ensuring data and its documentation are kept together.

Key Features:
- Single-Click Download: Adds a "Download all" button to dataset pages, allowing users to download all resources and metadata in one go.
- Data Package Creation: Generates a datapackage.json file conforming to the Frictionless Data standard, including dataset metadata.
- Comprehensive Data Packaging: Packages all data files and datapackage.json into a single zip file to ensure usability.
- Data Dictionary Inclusion: If resources are stored in the DataStore (using xloader or datapusher), the datapackage.json will include the data dictionary (schema) of the data, specifying column types.
- Background Zip Creation: Uses a CKAN background job to (re)create the zip file when a dataset is created or updated, or when the data dictionary changes. Changes to uploaded data are detected only when the dataset itself is updated.
- Command-Line Interface: Includes a command-line interface for various operations.

Technical Integration: The downloadall extension integrates into CKAN as a plugin, adding a new button to the dataset view. It depends on the CKAN background job worker to generate the zip files, and if used with DataStore and xloader (or datapusher), incorporates the data dictionary into the datapackage.json. The extension requires activation in the CKAN configuration file (production.ini). Specific CKAN versions are supported, primarily 2.7 and 2.8.

Benefits & Impact: Implementing the downloadall extension can improve data accessibility and usability by providing a convenient way to download datasets and their associated metadata. It streamlines workflows for data analysts, researchers, and others who need comprehensive access to datasets and their documentation. The inclusion of machine-readable metadata in the form of a datapackage.json facilitates automation and standardisation in data processing and validation.
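For orientation, here is a minimal sketch of the kind of bundle the extension produces: a zip containing the data files plus a Frictionless-style datapackage.json whose resource "schema" acts as the data dictionary. This is not the extension's own code, and all names, paths, and field types below are invented for the example.

```python
# Illustrative sketch (not downloadall's code) of a zip bundle with a
# Frictionless-style datapackage.json. All names and types are placeholders.
import json
import zipfile

with open("observations.csv", "w", encoding="utf-8") as f:
    f.write("site,price\nA,1.25\nB,2.50\n")          # stand-in resource file

datapackage = {
    "name": "example-dataset",                        # hypothetical dataset name
    "resources": [
        {
            "name": "observations",
            "path": "observations.csv",
            "format": "csv",
            "schema": {                               # the data dictionary: column names and types
                "fields": [
                    {"name": "site", "type": "string"},
                    {"name": "price", "type": "number"},
                ]
            },
        }
    ],
}

with zipfile.ZipFile("example-dataset.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("observations.csv")
    zf.writestr("datapackage.json", json.dumps(datapackage, indent=2))
```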
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Resource Description: The dataset contains variables corresponding to availability, source (country, state and town if country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns.
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Resource Title: Delta Produce Sources Study data dictionary. File Name: DPS Data Dictionary Public.csv
Resource Description: This file is the data dictionary corresponding to the Delta Produce Sources Study dataset.
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
https://ora.ox.ac.uk/terms_of_use
The DMLBS is distinctive not only for the breadth of its coverage but also for the fact that it is wholly based on original research, i.e. on a fresh reading of medieval Latin texts for this specific purpose, where possible in the best available source, whether that be original manuscripts or modern critical editions. (The method is that used by other major dictionaries, such as the monumental Oxford English Dictionary and, for Latin, the Oxford Latin Dictionary and the Thesaurus Linguae Latinae.) In the nearly 50 years of drafting the Dictionary, different editorial practices and conventions have inevitably created a text that varies significantly from the earliest fascicules to the final ones while remaining recognizably the same underlying work. Many of these variations have been the result of conscious decisions, others simply the result of the Dictionary being the work of many people over many years.
Work on digitizing the Dictionary began in earnest in 2009, with a move from a traditional print-based workflow to an electronic XML-based workflow, first for material already drafted on paper slips but not yet keyed as electronic data, and subsequently with the introduction of full ab initio electronic drafting.
However, even then the majority of the dictionary's content still existed only in print — in the thirteen fascicules (more than 2,500 three-column pages containing nearly 65,000 entries) published since 1965. Once the new workflow for the remaining material to be published was fully established within the project, work began on digitizing earlier fascicules; this work was undertaken by a specialist outside contractor, which captured these printed pages and tagged the material in accordance with the Dictionary schema. The captured material was then evaluated and corrected within the project. Plans for the project itself developing and hosting an online platform for the full dataset were discontinued in 2014 due to lack of technical support and funding, but partnerships have been established to ensure that online publication is achieved.
Technical Overview:
The DMLBS is held in XML according to customized XSD schemas. All data is held in Unicode encoding.
Data structure: At the heart of the DMLBS XML workflow sit the data schemas which describe and are used to constrain the structure of the data. The DMLBS uses XSD schemas. The Dictionary data is represented essentially in the form in which it has been published in print. In addition to the schema for the Dictionary text, there is a further schema for the Dictionary's complex bibliography, which is also held in XML form. The schemas in use were custom-built for the DMLBS in order to match the project's very specific needs, ensuring that the drafted or captured text always complies with the long-standing structures and conventions of the printed dictionary by requiring, allowing, or prohibiting elements as necessary. (Although the use of TEI encoding was seriously considered, it was clear from initial exploration that the level of customization and optimization required to bring the TEI in line with the practical production needs of the dictionary was too great to be feasible.)
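As a hedged illustration of schema-constrained drafting in general, the sketch below validates an XML entry against an XSD schema using lxml. The DMLBS project's actual tooling is not described here, and the file names are placeholders.

```python
# Minimal sketch of validating an XML document against an XSD schema with
# lxml (an assumption for illustration; not the DMLBS production toolchain).
# File names are placeholders.
from lxml import etree

schema = etree.XMLSchema(etree.parse("dictionary-entry.xsd"))
entry = etree.parse("entry.xml")

if schema.validate(entry):
    print("entry.xml conforms to the schema")
else:
    for error in schema.error_log:
        print(error.line, error.message)   # e.g. a prohibited or missing element
```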
Data encoding and entry: The encoding chosen for all DMLBS data is Unicode. In addition to the Roman alphabet, with the full range of diacritics (including the macron and breve to mark vowel length), the Dictionary regularly uses Anglo-Saxon letters (such as thorn, wynn, and yogh) and polytonic Greek, along with assorted other letters and symbols.

The ‘Dictionary of Medieval Latin from British Sources’ (DMLBS) was prepared by a project team of specialist researchers as a research project of the British Academy, overseen by a committee appointed by the Academy to direct its work. Initially based in London at the Public Record Office, the editorial team moved to Oxford in the early 1980s and since the late 1990s has formed part of the Faculty of Classics at Oxford University. The main aim of the DMLBS project has been to create a successor to the previous standard dictionary of medieval Latin, the Glossarium ... mediae et infimae Latinitatis, first compiled in the seventeenth century by the French scholar Du Cange (Charles du Fresne); a history of the project is available at http://www.dmlbs.ox.ac.uk/about-us/history-of-the-project and in Richard Ashdowne ‘Dictionary of Medieval Latin from British Sources’, British Academy Review 24 (2014), 46–53. The project has been supported financially by major research grants from the Arts & Humanities Research Council, the Packard Humanities Institute, and the OUP John Fell Research Fund, and by a small annual grant from the British Academy. It also received institutional support from the British Academy and the University of Oxford.
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to the source of fresh produce and for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi.

Resources in this dataset:
Resource Title: Delta Produce Sources Study dataset. File Name: DPS Data Public.csv
Resource Description: The dataset contains variables corresponding to availability, source (country, state and town if country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns.
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Resource Title: Delta Produce Sources Study data dictionary. File Name: DPS Data Dictionary Public.csv
Resource Description: This file is the data dictionary corresponding to the Delta Produce Sources Study dataset.
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool (external link). Resources 2-8 are generated using the Flatterer (external link) utility.

Description of resources:
1. Dataset is a JSON Lines (external link) file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON.
2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
3. Datasets Metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
4. Resources Metadata contains the metadata for the resources contained within each dataset.
5. Resource Views Metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
6. Datastore Fields Metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
8. Data Package Entity Relation Diagram displays the title and format for each column, in each table in the Data Package, in the form of an ERD diagram. The Data Package resource offers a text-based version.
9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
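For the gzipped JSON Lines resource described in item 1, the sketch below shows one way to stream the records in Python: one JSON object per line. The file name is a placeholder; the actual download name may differ.

```python
# Minimal sketch of reading a gzipped JSON Lines file: one JSON object per
# line, each a Dataset/Open Information Record. File name is a placeholder.
import gzip
import json

def iter_records(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

records = iter_records("open-data-catalogue.jsonl.gz")   # hypothetical file name
first = next(records)
print(sorted(first.keys())[:10])                         # a few top-level metadata keys
print("remaining records:", sum(1 for _ in records) + 1)
```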
https://hdl.handle.net/20.500.14106/licence-ota
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Resource Title: FDH Data Dictionary. File Name: FDH_Data_Dictionary.csv
Resource Description: Data dictionary for the data compiled as a result of the efforts described in Ashworth et al. (2023) - Framework to Develop an Open-Source Forage Data Network to Improve Primary Productivity and Enhance System Resiliency (in review). Includes descriptions for the data fields in the FDH Data data file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description: This is an online edition of An Anglo-Saxon Dictionary, or a dictionary of "Old English". The dictionary records the state of the English language as it was used between ca. 700-1100 AD by the Anglo-Saxon inhabitants of the British Isles.

This project is based on a digital edition of An Anglo-Saxon Dictionary, based on the manuscript collections of the late Joseph Bosworth (the so-called Main Volume, first edition 1898) and its Supplement (first edition 1921), edited by Joseph Bosworth and T. Northcote Toller, today the largest complete dictionary of Old English (one day, it is hoped, to be supplanted by the DOE). Alistair Campbell's "enlarged addenda and corrigenda" from 1972 are not public domain and are therefore not part of the online dictionary. Please see the front and back matter of the paper dictionary for further information, prefaces, and lists of references and contractions.

The digitization project was initiated by Sean Crist in 2001 as a part of his Germanic Lexicon Project, and many individuals and institutions have contributed to it. Check out the original GLP webpage and the old Bosworth-Toller offline application webpage (to be updated). Currently the project is hosted by the Faculty of Arts, Charles University. In 2010, the data from the GLP were converted to create the current site. Care was taken to preserve the typography of the original dictionary, but also to provide a modern, user-friendly interface for contemporary users. In 2013, the entries were structurally re-tagged and the original typography was abandoned, though immediate access to the scans of the paper dictionary was preserved.

Our aim is to reach beyond a simple digital edition and create an online environment dedicated to all interested in Old English and Anglo-Saxon culture. Feel free to join in the editing of the Dictionary, commenting on its numerous entries, or participating in the discussions at our forums. We hope that by drawing the attention of the community of Anglo-Saxonists to our site and joining our resources, we may create a more useful tool for everybody. The most immediate project to draw on the corrected and tagged data of the Dictionary is a Morphological Analyzer of Old English (currently under development). We are grateful for the generous support of the Charles University Grant Agency and for the free hosting at the Faculty of Arts at Charles University. The site is currently maintained and developed by Ondrej Tichy et al. at the Department of English Language and ELT Methodology, Faculty of Arts, Charles University in Prague (Czech Republic).
The Department of Housing Preservation and Development (HPD) reports on buildings, units, and projects that began after January 1, 2014 and are counted towards the Housing New York plan. The Housing New York Units by Building file presents this data by building, and includes building-level data, such as house number, street name, BBL, and BIN for each building in a project. The unit counts are provided by building. For additional documentation, including a data dictionary, review the attachments in the “About this Dataset” section of the Primer landing page.
This digital data release presents contour data from multiple subsurface geologic horizons as presented in previously published summaries of the regional subsurface configuration of the Michigan and Illinois Basins. The original maps that served as the source of the digital data within this geodatabase are from the Geological Society of America’s Decade of North American Geology project series, “The Geology of North America” volume D-2, chapter 13 “The Michigan Basin” and chapter 14 “Illinois Basin Region”. Contour maps in the original published chapters were generated from geophysical well logs (generally gamma-ray) and adapted from previously published contour maps. The published contour maps illustrated the distribution of sedimentary strata within the Illinois and Michigan Basins in the context of the broad 1st-order supercycles of L.L. Sloss, including the Sauk, Tippecanoe, Kaskaskia, Absaroka, Zuni, and Tejas supersequences. Because these maps represent time-transgressive surfaces, contours frequently delineate the composite of multiple named sedimentary formations at once. Structure contour maps on the top of the Precambrian basement surface in both the Michigan and Illinois Basins illustrate the general structural geometry which undergirds the sedimentary cover. Isopach maps of the Sauk 2 and 3, Tippecanoe 1 and 2, Kaskaskia 1 and 2, Absaroka, and Zuni sequences illustrate the broad distribution of sedimentary units in the Michigan Basin, as do isopach maps of the Sauk, Upper Sauk, Tippecanoe 1 and 2, Lower Kaskaskia 1, Upper Kaskaskia 1-Lower Kaskaskia 2, Kaskaskia 2, and Absaroka supersequences in the Illinois Basin.

Isopach contours and structure contours were formatted and attributed as GIS data sets for use in digital form as part of the U.S. Geological Survey’s ongoing effort to inventory, catalog, and release subsurface geologic data in geospatial form. This effort is part of a broad directive to develop 2D and 3D geologic information at detailed, national, and continental scales. This data approximates, but does not strictly follow, the USGS National Cooperative Geologic Mapping Program's GeMS data structure schema for geologic maps. Structure contour lines and isopach contours for each supersequence are stored within separate “IsoValueLine” feature classes. These are distributed within a geographic information system geodatabase and are also saved as shapefiles. Contour data are provided in both feet and meters to maintain consistency with the original publication and for ease of use. Nonspatial tables define the data sources used, define terms used in the dataset, and describe the geologic units referenced herein. A tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and accompanying nonspatial tables.
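As a hedged sketch of working with the shapefile form of these contour layers, the example below uses geopandas (an assumption; any GIS package that reads shapefiles would do). The file name and attribute field name are placeholders, not the actual schema of the published geodatabase.

```python
# Minimal sketch of loading a contour shapefile with geopandas.
# File name and attribute field are hypothetical placeholders.
import geopandas as gpd

contours = gpd.read_file("sauk_isopach_contours_ft.shp")   # hypothetical file name
print(contours.crs)          # coordinate reference system of the layer
print(contours.columns)      # attribute fields, e.g. the contour value in feet

# Select thick intervals by an assumed thickness attribute (feet)
thick = contours[contours["CONTOUR_FT"] >= 1000]
print(len(thick), "contour lines at or above 1000 ft")
```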