64 datasets found
  1. Open Data Dictionary Template Individual

    • opendata.dc.gov
    • catalog.data.gov
    • +1 more
    Updated Jan 5, 2023
    Cite
    City of Washington, DC (2023). Open Data Dictionary Template Individual [Dataset]. https://opendata.dc.gov/documents/cb6a686b1e344eeb8136d0103c942346
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    City of Washington, DC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.
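
    To make the field-level requirement concrete, the sketch below shows one way a single column's documentation could be captured as structured data. This is an illustrative Python example only; the field name, description, and domain values are hypothetical and are not drawn from the DC template itself.

      # Hypothetical data-dictionary entry for one field column (attribute),
      # with the description and code values (domains) the handbook calls for.
      field_entry = {
          "field_name": "WARD",  # column name as it appears in the dataset
          "description": "City ward in which the facility is located.",
          "data_type": "text",
          "domain": {  # code values that translate the column contents for end-users
              "1": "Ward 1",
              "2": "Ward 2",
              "3": "Ward 3",
          },
      }

      # A quick completeness check a data originator might run before publication:
      assert field_entry["description"], "every column needs a description"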

  2. Data from: Data Dictionary Template

    • catalog.data.gov
    • data.amerigeoss.org
    • +6 more
    Updated Mar 18, 2023
    Cite
    City of Tempe (2023). Data Dictionary Template [Dataset]. https://catalog.data.gov/dataset/data-dictionary-template-2e170
    Dataset updated
    Mar 18, 2023
    Dataset provided by
    City of Tempe
    Description

    Data Dictionary template for Tempe Open Data.

  3. Data Dictionary

    • mcri.figshare.com
    txt
    Updated Sep 6, 2018
    Cite
    Jennifer Piscionere (2018). Data Dictionary [Dataset]. http://doi.org/10.25374/MCRI.7039280.v1
    Available download formats: txt
    Dataset updated
    Sep 6, 2018
    Dataset provided by
    Murdoch Childrens Research Institute
    Authors
    Jennifer Piscionere
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is a data dictionary example we will use in the MVP presentation. It can be deleted after 13/9/18.

  4. Replication Code for: Proxy Advisory Firms and Corporate Shareholder...

    • dataverse.harvard.edu
    Updated Sep 5, 2024
    Cite
    Replication Code for: Proxy Advisory Firms and Corporate Shareholder Engagement [Dataset]. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ABLKE4
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 5, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Joshua White
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Contains the compressed file "Proxy Advisory Firms and Corporate Shareholder Engagement.zip", which holds Stata code, Stata pseudo-datasets (demonstrating the format of the data), and a data dictionary. Review of Financial Studies, forthcoming (2024).

  5. LScD (Leicester Scientific Dictionary)

    • figshare.le.ac.uk
    docx
    Updated Apr 15, 2020
    + more versions
    Cite
    Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
    Available download formats: docx
    Dataset updated
    Apr 15, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Leicester
    Description

    LScD (Leicester Scientific Dictionary), April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

    [Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation here. After pre-processing, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are the same as those described for LScD Version 2 below.

    * Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    ** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

    [Version 2] Getting Started

    This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). The dictionary was created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from LSC, and instructions for using the code, are available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.

    LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains a title, a list of authors, a list of categories, a list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

    LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words, sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
    1. Unique words in abstracts
    2. Number of documents containing each word
    3. Number of appearances of a word in the entire corpus

    Processing the LSC

    Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of a request for the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

    Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All of the following steps can be applied to an arbitrary list of texts from any source with changes of parameters. The structure of the corpus, such as file format and the names (and positions) of fields, should be taken into account when applying the code. The organisation of the CSV files of LSC is described in the README file for LSC [1].

    Step 3. Extracting Abstracts and Saving Metadata: Metadata, comprising all fields in a document except the abstract, are separated from the abstracts and saved as MetaData.R. The metadata fields are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.

    Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section we present our approaches to pre-processing the abstracts of the LSC.
    1. Removing punctuation and special characters: all non-alphanumeric characters are substituted by a space. We did not substitute the character "-" in this step, because we need to keep words like "z-score", "non-payment" and "pre-processing" in order not to lose their actual meaning; uniting prefixes with words is performed in later steps of pre-processing.
    2. Lowercasing the text data: lowercasing is performed to avoid treating words like "Corpus", "corpus" and "CORPUS" differently. The entire collection of texts is converted to lowercase.
    3. Uniting prefixes of words: words containing prefixes joined with the character "-" are united into a single word. The list of prefixes united for this research is in the file "list_of_prefixes.csv". Most of the prefixes were extracted from [4]; we also added the commonly used prefixes 'e', 'extra', 'per', 'self' and 'ultra'.
    4. Substitution of words: some words joined with "-" in the abstracts of the LSC require an additional substitution step to avoid losing their meaning before the character "-" is removed. Examples are "z-test", "well-known" and "chi-square", which are substituted with "ztest", "wellknown" and "chisquare". Such words were identified by sampling abstracts from LSC. The full list of such words and the substitution decisions are in the file "list_of_substitution.csv".
    5. Removing the character "-": all remaining "-" characters are replaced by space.
    6. Removing numbers: all digits not included in a word are replaced by space. Words containing both digits and letters are kept, because alphanumeric tokens such as chemical formulas may be important for the analysis; examples are "co2", "h2o" and "21st".
    7. Stemming: stemming converts inflected words into their word stem. This unites several forms of words with similar meaning into one form, and also saves memory and time [5]. All words in the LScD are stemmed.
    8. Stop word removal: stop words are extremely common words that provide little value in a language, such as 'I', 'the' and 'a'. We used the 'tm' package in R to remove stop words [6]; the package lists 174 English stop words.

    Step 5. Writing the LScD into CSV Format: there are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file "LScD.csv".

    The Organisation of the LScD

    The total number of words in the file "LScD.csv" is 974,238. Each field is described below.
    Word: unique words from the corpus, in lowercase and stemmed form. The field is sorted by the number of documents containing the word, in descending order.
    Number of Documents Containing the Word: a binary count is used: if a word exists in an abstract, it counts 1; if the word occurs more than once in the same document, the count is still 1. The total number of documents containing the word is the sum of these 1s over the entire corpus.
    Number of Appearances in Corpus: how many times a word occurs in the corpus when the corpus is considered as one large document.

    Instructions for R Code

    LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. The outputs are:
    Metadata File: all fields in a document excluding abstracts (List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection).
    File of Abstracts: all abstracts after the pre-processing steps defined in Step 4.
    DTM: the Document Term Matrix constructed from the LSC [6]; each entry of the matrix is the number of times the word occurs in the corresponding document.
    LScD: an ordered list of words from LSC as defined in the previous section.

    To use the code:
    1. Download the folder 'LSC', 'list_of_prefixes.csv' and 'list_of_substitution.csv'
    2. Open the LScD_Creation.R script
    3. Change the parameters in the script: replace them with the full path of the directory containing the source files and the full path of the directory for the output files
    4. Run the full code.

    References
    [1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
    [2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
    [5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved Porter's stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
    [6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
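
    The pre-processing chain in Step 4 is easy to prototype. Below is a minimal Python sketch of the same sequence; the original pipeline is in R (LScD_Creation.R), so this is an illustration only, and the prefix list, substitution table, and stop-word list are small stand-ins for the files the authors provide (stemming, Step 4.7, is omitted here).

      import re

      # Stand-ins for list_of_prefixes.csv, list_of_substitution.csv and the
      # 174-entry stop-word list of the R 'tm' package (all illustrative).
      PREFIXES = {"pre", "non", "self"}
      SUBSTITUTIONS = {"z-test": "ztest", "well-known": "wellknown", "chi-square": "chisquare"}
      STOP_WORDS = {"i", "the", "a", "an", "of", "and", "in", "from"}

      def preprocess(abstract: str) -> list[str]:
          text = abstract.lower()                                   # step 2: lowercase
          for src, dst in SUBSTITUTIONS.items():                    # step 4: substitutions
              text = text.replace(src, dst)
          text = re.sub(r"[^a-z0-9-]", " ", text)                   # step 1: keep "-" for now
          text = re.sub(r"\b(" + "|".join(PREFIXES) + r")-(\w+)", r"\1\2", text)  # step 3: unite prefixes
          text = text.replace("-", " ")                             # step 5: drop remaining "-"
          text = re.sub(r"\b\d+\b", " ", text)                      # step 6: standalone numbers out, "co2" stays
          return [t for t in text.split() if t not in STOP_WORDS]   # step 8: stop words

      print(preprocess("The well-known z-test uses CO2 data from 2014 in pre-processing."))
      # ['wellknown', 'ztest', 'uses', 'co2', 'data', 'preprocessing']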

  6. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +2 more
    zip
    Updated Jan 9, 2022
    Cite
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Available download formats: zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Massachusetts General Hospital
    Harvard Medical School
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate the feasibility of the facile eLAB workflow. EHR data are successfully transformed and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and support interoperability.

    Methods eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or institutional EDWs such as the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data-wrangling script based on the input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where and when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
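
    The key-value remapping that eLAB performs is easy to picture with a small sketch. The fragment below is Python rather than the authors' R implementation; the potassium subtype names are quoted from the description above, and the DD code is invented for illustration.

      # Hypothetical lab-subtype -> Data Dictionary (DD) code lookup, in the spirit
      # of eLAB's ~300-entry key-value table; unmapped labs are filtered out.
      LAB_LOOKUP = {
          "Potassium": "potassium",
          "Potassium-External": "potassium",
          "Potassium(POC)": "potassium",
          "Potassium,whole-bld": "potassium",
          "Potassium-Level-External": "potassium",
          "Potassium,venous": "potassium",
          "Potassium-whole-bld/plasma": "potassium",
      }

      def remap(raw_lab_name: str) -> str | None:
          """Return the DD code for a raw EHR lab name, or None to drop it."""
          return LAB_LOOKUP.get(raw_lab_name)

      rows = [("Potassium(POC)", 4.1), ("Sodium-External", 140.0)]
      kept = [(remap(name), value) for name, value in rows if remap(name) is not None]
      print(kept)  # [('potassium', 4.1)]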

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.

  7. Data dictionary.docx

    • figshare.com
    docx
    Updated Feb 12, 2019
    Cite
    Monica Panca (2019). Data dictionary.docx [Dataset]. http://doi.org/10.6084/m9.figshare.7708016.v1
    Available download formats: docx
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Monica Panca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Healthcare resource utilisation and costs of agitation in people with dementia living in care homes in England

  8. Dictionary of Titles

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 6, 2022
    Cite
    Shahad Althobaiti; Ahmad Alabdulkareem; Judy Hanwen Shen; Iyad Rahwan; Esteban Moro; Alex Rutherford (2022). Dictionary of Titles [Dataset]. http://doi.org/10.7910/DVN/DQW8IP
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Shahad Althobaiti; Ahmad Alabdulkareem; Judy Hanwen Shen; Iyad Rahwan; Esteban Moro; Alex Rutherford
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Hand-transcribed content from the United States Bureau of Labor Statistics Dictionary of Titles (DoT). The DoT is a record of occupations and a description of the tasks performed. Five editions exist, from 1939, 1949, 1965, 1977 and 1991. The DoT was replaced by O*NET, structured data on jobs, workers and their characteristics. Apart from the 1991 edition, however, the data in the DoT are not easily ingestible, existing only in scanned PDF documents, and attempts at Optical Character Recognition led to low accuracy. For that reason we present here hand-transcribed textual data from these documents. Various data are available for each occupation, e.g. numerical codes and references to other occupations, as well as the free-text description. The data for each edition are therefore presented in 'long' format with a variable number of lines per occupation and a blank line between occupations; consult the transcription instructions for more details. Structured metadata (see here) on occupations is also available for the 1965, 1977 and 1991 editions; this metadata can be extracted from the numerical codes within the occupational entries, and the key for these codes is found in separate tables in the 1965 edition, which were also transcribed. The instructions provided to transcribers for this edition are likewise added to the repository. The original documents are freely available in PDF format (e.g. here). This data accompanies the paper 'Longitudinal Complex Dynamics of Labour Markets Reveal Increasing Polarisation' by Althobaiti et al.
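
    Because each edition is distributed in this blank-line-delimited 'long' format, occupation records can be recovered with a simple split. A minimal Python sketch follows; the file name is hypothetical and the layout is assumed only from the description above.

      # Read the 'long' format: one occupation per block, a variable number of
      # lines per block (codes, cross-references, free-text description),
      # blocks separated by a blank line. File name is a hypothetical example.
      def read_occupations(path: str) -> list[list[str]]:
          with open(path, encoding="utf-8") as f:
              text = f.read()
          records = []
          for block in text.split("\n\n"):
              lines = [ln.rstrip() for ln in block.splitlines() if ln.strip()]
              if lines:
                  records.append(lines)
          return records

      occupations = read_occupations("dot_1977_long_format.txt")
      print(len(occupations), "occupations transcribed")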

  9. Data from: US Federal LCA Commons Life Cycle Inventory Unit Process Template...

    • catalog.data.gov
    • gimi9.com
    Updated Mar 30, 2024
    Cite
    Agricultural Research Service (2024). US Federal LCA Commons Life Cycle Inventory Unit Process Template [Dataset]. https://catalog.data.gov/dataset/us-federal-lca-commons-life-cycle-inventory-unit-process-template-3cc7d
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    Agricultural Research Service: https://www.ars.usda.gov/
    Area covered
    United States
    Description

    An Excel template with data elements and conventions corresponding to the openLCA unit process data model. Includes LCA Commons data and metadata guidelines and definitions.
    Resources in this dataset:
    Resource Title: READ ME - data dictionary. File Name: lcaCommonsSubmissionGuidelines_FINAL_2014-09-22.pdf
    Resource Title: US Federal LCA Commons Life Cycle Inventory Unit Process Template. File Name: FedLCA_LCI_template_blank EK 7-30-2015.xlsx
    Resource Description: Instructions: This template should be used for life cycle inventory (LCI) unit process development and is associated with an openLCA plugin that imports these data into an openLCA database. See www.openLCA.org to download the latest release of openLCA for free and to access available plugins.

  10. Data Dictionary Migration Chain

    • ckan.mobidatalab.eu
    Updated Jul 13, 2023
    + more versions
    Cite
    OverheidNl (2023). Data Dictionary Migration Chain [Dataset]. https://ckan.mobidatalab.eu/dataset/gegevenswoordenboek-migratieketen
    Available download formats: zip (http://publications.europa.eu/resource/authority/file-type/zip)
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    OverheidNl
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Since 2013, the Dutch Migration Chain has had a chain-wide data dictionary, the Data Dictionary Migration Chain (GMK). The Migration Chain consists of the following organisations:
    - Central Agency for the Reception of Asylum Seekers
    - Correctional Institutions Agency, Ministry of Justice and Security
    - Repatriation and Departure Service, Ministry of Justice and Security
    - Directorate-General for Migration, Ministry of Justice and Security
    - Immigration and Naturalization Service, Ministry of Justice and Security
    - International Organization for Migration
    - Royal Netherlands Marechaussee
    - Ministry of Foreign Affairs
    - National Police
    - Council of State
    - Council for the Judiciary
    - Netherlands Council for Refugees
    - Seaport Police

    One of the principles in the basic starting architecture of the migration chain is that there is no difference of opinion about the meaning of the information that can be extracted from an integrated customer view. A uniform conceptual framework goes further than a glossary of the most important concepts: each shared data element can be related to a concept in the conceptual framework, and the descriptions of the concepts name their relations to each other. Chain parties have aligned their own conceptual frameworks with the uniform conceptual framework in the migration chain. The GMK is an overview of the common terminology used within the migration chain. This promotes a correct interpretation of the information exchanged within, or reported on, the processes of the migration chain; a correct interpretation of information prevents miscommunication, mistakes and errors. For users in the migration chain, the GMK is available on the non-public Rijksweb (gmk.vk.rijksweb.nl). In the context of openness and transparency, it has been decided to make the description of concepts and management information from the GMK accessible as open data, which means the data are available via Data.overheid.nl and reusable by everyone. By making the data transparent, the ministry also hopes that publications by and about the work in the migration chain, for example the State of Migration, are easier to explain and to place in context.

  11. Soil Survey Geographic Database (SSURGO)

    • agdatacommons.nal.usda.gov
    pdf
    Updated Feb 8, 2024
    + more versions
    Cite
    USDA Natural Resources Conservation Service (2024). Soil Survey Geographic Database (SSURGO) [Dataset]. http://doi.org/10.15482/USDA.ADC/1242479
    Available download formats: pdf
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Natural Resources Conservation Service: http://www.nrcs.usda.gov/
    United States Department of Agriculture: http://usda.gov/
    Authors
    USDA Natural Resources Conservation Service
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The SSURGO database contains information about soil as collected by the National Cooperative Soil Survey over the course of a century. The information can be displayed in tables or as maps and is available for most areas in the United States and the Territories, Commonwealths, and Island Nations served by the USDA-NRCS (Natural Resources Conservation Service). The information was gathered by walking over the land and observing the soil. Many soil samples were analyzed in laboratories. The maps outline areas called map units. The map units describe soils and other components that have unique properties, interpretations, and productivity. The information was collected at scales ranging from 1:12,000 to 1:63,360. More details were gathered at a scale of 1:12,000 than at a scale of 1:63,360. The mapping is intended for natural resource planning and management by landowners, townships, and counties. Some knowledge of soils data and map scale is necessary to avoid misunderstandings. The maps are linked in the database to information about the component soils and their properties for each map unit. Each map unit may contain one to three major components and some minor components. The map units are typically named for the major components. Examples of information available from the database include available water capacity, soil reaction, electrical conductivity, and frequency of flooding; yields for cropland, woodland, rangeland, and pastureland; and limitations affecting recreational development, building site development, and other engineering uses. SSURGO datasets consist of map data, tabular data, and information about how the maps and tables were created. The extent of a SSURGO dataset is a soil survey area, which may consist of a single county, multiple counties, or parts of multiple counties. SSURGO map data can be viewed in the Web Soil Survey or downloaded in ESRI® Shapefile format. The coordinate systems are Geographic. Attribute data can be downloaded in text format that can be imported into a Microsoft® Access® database. A complete SSURGO dataset consists of:

    • GIS data (as ESRI® Shapefiles)
    • attribute data (dbf files - a multitude of separate tables)
    • database template (MS Access format - this helps with understanding the structure and linkages of the various tables)
    • metadata

    Resources in this dataset:
    Resource Title: SSURGO Metadata - Tables and Columns Report. File Name: SSURGO_Metadata_-_Tables_and_Columns.pdf
    Resource Description: This report contains a complete listing of all columns in each database table. Please see the SSURGO Metadata - Table Column Descriptions Report for more detailed descriptions of each column.
    Resource Title: SSURGO Metadata - Table Column Descriptions Report. File Name: SSURGO_Metadata_-_Table_Column_Descriptions.pdf
    Resource Description: This report contains the descriptions of all columns in each database table. Please see the SSURGO Metadata - Tables and Columns Report for a complete listing of all columns in each database table.
    Resource Title: SSURGO Data Dictionary. File Name: SSURGO 2.3.2 Data Dictionary.csv
    Resource Description: CSV version of the data dictionary.
    Find the Soil Survey Geographic (SSURGO) web site at https://www.nrcs.usda.gov/wps/portal/nrcs/detail/vt/soils/?cid=nrcs142p2_010596#Datamart
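
    Because the data dictionary ships as a plain CSV, it can be queried programmatically before any data are imported. The sketch below uses the file name from the resource listing above; the header names ("Table Physical Name", "Column Physical Name", "Column Label") are assumptions, since the authoritative layout is documented in the PDF reports.

      import csv

      # Look up the documented columns of one SSURGO table ("mapunit") in the
      # data dictionary CSV; the header names used here are assumed, not verified.
      with open("SSURGO 2.3.2 Data Dictionary.csv", newline="", encoding="utf-8") as f:
          rows = list(csv.DictReader(f))

      columns = [r for r in rows if r.get("Table Physical Name") == "mapunit"]
      for col in columns[:5]:
          print(col.get("Column Physical Name"), "-", col.get("Column Label"))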

  12. Marine Pigments, Productivity, and Associated Chemistry Data Dictionary...

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Mar 1, 2025
    Cite
    (Point of Contact) (2025). Marine Pigments, Productivity, and Associated Chemistry Data Dictionary Release 3.0 from 1956-09-11 to 1992-04-30 (NCEI Accession 9400133) [Dataset]. https://catalog.data.gov/dataset/marine-pigments-productivity-and-associated-chemistry-data-dictionary-release-3-0-from-1956-09-
    Dataset updated
    Mar 1, 2025
    Dataset provided by
    (Point of Contact)
    Description

    These data constitute Release 3.0 of the Marine Pigments, Productivity, and Associated Chemistry Data Dictionary, developed by Dr. William Balch and Mr. Charles Byrne of the Rosenstiel School of Marine and Atmospheric Science, University of Miami, FL. The data format is designed for archiving phytoplankton pigment and productivity data in association with the Joint Global Ocean Flux Study (JGOFS). This version supersedes previous releases of these data. Data were collected during multiple studies worldwide. Parameters include: date, position, originator, nitrate, phosphate, carbon, chlorophyll, phaeophytin and other pigments, fluorescence and/or light transmission measurements, methodology, and climate. A complete file listing and the record format are included with the data. 192 files containing 18,357 profiles of phytoplankton data were received from NODC's Ocean Climate Laboratory.

  13. GlobalPhone Portuguese (Brazilian) Pronunciation Dictionary

    • live.european-language-grid.eu
    • catalogue.elra.info
    audio format
    Cite
    GlobalPhone Portuguese (Brazilian) Pronunciation Dictionary [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2389
    Available download formats: audio format
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Area covered
    Brazil
    Description

    The GlobalPhone pronunciation dictionaries, created within the framework of the multilingual speech and language corpus GlobalPhone, were developed in collaboration with the Karlsruhe Institute of Technology (KIT).

    The GlobalPhone pronunciation dictionaries contain the pronunciations of all word forms found in the transcription data of the GlobalPhone speech & text database. The pronunciation dictionaries are currently available in 18 languages: Arabic (29230 entries/27059 words), Bulgarian (20193 entries), Croatian (23497 entries/20628 words), Czech (33049 entries/32942 words), French (36837 entries/20710 words), German (48979 entries/46035 words), Hausa (42662 entries/42079 words), Japanese (18094 entries), Polish (36484 entries), Portuguese (Brazilian) (54146 entries/54130 words), Russian (28818 entries/27667 words), Spanish (Latin American) (43264 entries/33960 words), Swedish (about 25000 entries), Turkish (31330 entries/31087 words), Vietnamese (38504 entries/29974 words), Chinese-Mandarin (73388 pronunciations), Korean (3500 syllables), and Thai (a small set with 12,420 pronunciation entries for 12,420 different words, without pronunciation variants, and a larger set with 25,570 pronunciation entries for 22,462 different word units, including 3,108 entries with up to four pronunciation variants).

    1) Dictionary Encoding: The pronunciation dictionary entries consist of full word forms and are either given in the original script of that language, mostly in UTF-8 encoding (Bulgarian, Croatian, Czech, French, Polish, Russian, Spanish, Thai) corresponding to the trl-files of the GlobalPhone transcriptions or in Romanized script (Arabic, German, Hausa, Japanese, Korean, Mandarin, Portuguese, Swedish, Turkish, Vietnamese) corresponding to the rmn-files of the GlobalPhone transcriptions, respectively. In the latter case the documentation mostly provides a mapping from the Romanized to the original script.

    2) Dictionary Phone set: The phone sets for each language were derived individually from the literature following best practices for automatic speech processing. Each phone set is explained and described in the documentation using the international standards of the International Phonetic Alphabet (IPA). For most languages a mapping to the language independent GlobalPhone naming conventions (indicated by “M_”) is provided for the purpose of data sharing across languages to build multilingual acoustic models.

    3) Dictionary Generation: Whenever the grapheme-to-phoneme relationship allowed, the dictionaries were created semi-automatically in a rule-based fashion using a set of grapheme-to-phoneme mapping rules. The number of rules depends strongly on the language. After the automatic creation process, all dictionaries were manually cross-checked by native speakers, correcting potential errors of the automatic pronunciation generation process. Most of the dictionaries have been applied to large vocabulary speech recognition. In many cases the GlobalPhone dictionaries were compared to straightforward grapheme-based speech recognition and to alternative sources such as Wiktionary, and usually demonstrated to be superior in terms of quality, coverage, and accuracy.

    4) Format: The format of the dictionaries is the same across languages and is straightforward. Each line consists of one word form and its pronunciation, separated by a blank. The pronunciation consists of a concatenation of phone symbols separated by blanks. Both words and their pronunciations are given in tcl-script list format, i.e. enclosed in "{}", since phones can carry tags indicating the tone and length of a vowel, or the word boundary tag "WB", indicating the boundary of a dictionary unit. The WB tag can, for example, be included as a standard question in the decision tree questions for capturing crossword models in context-dependent modeling. Pronunciation variants are indicated by (
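
    A dictionary line in this layout is simple to parse. The Python sketch below strips the tcl-style braces and tokenizes the phone string; the sample line is invented for illustration and is not taken from the dictionary itself.

      # Parse one dictionary line: a word form, then its pronunciation, both in
      # tcl-list braces; phone tags such as "WB" appear inside nested braces.
      def parse_entry(line: str):
          word_part, pron_part = line.split(None, 1)
          word = word_part.strip("{}")
          phones = pron_part.replace("{", " ").replace("}", " ").split()
          return word, phones

      print(parse_entry("{falar} {f a {l WB} a r}"))
      # ('falar', ['f', 'a', 'l', 'WB', 'a', 'r'])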

    5) Documentation: The pronunciation dictionaries for each language are complemented by a documentation that describes the format of the dictionary, the phone set including its mapping to the International Phonetic Alphabet (IPA), and the frequency distribution of the phones in the dictionary. Most of the pronunciation dictionaries have been successfully applied to large vocabulary speech recognition and references to publications are given when available.

  14. Messy Spreadsheet Example for Instruction

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jun 28, 2024
    Cite
    Renata Gonçalves Curty; Renata Gonçalves Curty (2024). Messy Spreadsheet Example for Instruction [Dataset]. http://doi.org/10.5281/zenodo.12586563
    Available download formats: bin
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Renata Gonçalves Curty; Renata Gonçalves Curty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 28, 2024
    Description

    A disorganized toy spreadsheet used for teaching good data organization. Learners are tasked with identifying as many errors as possible before creating a data dictionary and reconstructing the spreadsheet according to best practices.

  15. GlobalPhone Japanese Pronunciation Dictionary

    • live.european-language-grid.eu
    • catalogue.elra.info
    audio format
    Updated Nov 24, 2014
    + more versions
    Cite
    GlobalPhone Japanese Pronunciation Dictionary [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2372
    Available download formats: audio format
    Dataset updated
    Nov 24, 2014
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The GlobalPhone pronunciation dictionaries, created within the framework of the multilingual speech and language corpus GlobalPhone, were developed in collaboration with the Karlsruhe Institute of Technology (KIT).

    The GlobalPhone pronunciation dictionaries contain the pronunciations of all word forms found in the transcription data of the GlobalPhone speech & text database. The pronunciation dictionaries are currently available in 18 languages: Arabic (29230 entries/27059 words), Bulgarian (20193 entries), Croatian (23497 entries/20628 words), Czech (33049 entries/32942 words), French (36837 entries/20710 words), German (48979 entries/46035 words), Hausa (42662 entries/42079 words), Japanese (18094 entries), Polish (36484 entries), Portuguese (Brazilian) (54146 entries/54130 words), Russian (28818 entries/27667 words), Spanish (Latin American) (43264 entries/33960 words), Swedish (about 25000 entries), Turkish (31330 entries/31087 words), Vietnamese (38504 entries/29974 words), Chinese-Mandarin (73388 pronunciations), Korean (3500 syllables), and Thai (a small set with 12,420 pronunciation entries for 12,420 different words, without pronunciation variants, and a larger set with 25,570 pronunciation entries for 22,462 different word units, including 3,108 entries with up to four pronunciation variants).

    1) Dictionary Encoding: The pronunciation dictionary entries consist of full word forms and are either given in the original script of that language, mostly in UTF-8 encoding (Bulgarian, Croatian, Czech, French, Polish, Russian, Spanish, Thai) corresponding to the trl-files of the GlobalPhone transcriptions or in Romanized script (Arabic, German, Hausa, Japanese, Korean, Mandarin, Portuguese, Swedish, Turkish, Vietnamese) corresponding to the rmn-files of the GlobalPhone transcriptions, respectively. In the latter case the documentation mostly provides a mapping from the Romanized to the original script.

    2) Dictionary Phone set: The phone sets for each language were derived individually from the literature following best practices for automatic speech processing. Each phone set is explained and described in the documentation using the international standards of the International Phonetic Alphabet (IPA). For most languages a mapping to the language independent GlobalPhone naming conventions (indicated by “M_”) is provided for the purpose of data sharing across languages to build multilingual acoustic models.

    3) Dictionary Generation: Whenever the grapheme-to-phoneme relationship allowed, the dictionaries were created semi-automatically in a rule-based fashion using a set of grapheme-to-phoneme mapping rules. The number of rules depends strongly on the language. After the automatic creation process, all dictionaries were manually cross-checked by native speakers, correcting potential errors of the automatic pronunciation generation process. Most of the dictionaries have been applied to large vocabulary speech recognition. In many cases the GlobalPhone dictionaries were compared to straightforward grapheme-based speech recognition and to alternative sources such as Wiktionary, and usually demonstrated to be superior in terms of quality, coverage, and accuracy.

    4) Format: The format of the dictionaries is the same across languages and is straightforward. Each line consists of one word form and its pronunciation, separated by a blank. The pronunciation consists of a concatenation of phone symbols separated by blanks. Both words and their pronunciations are given in tcl-script list format, i.e. enclosed in "{}", since phones can carry tags indicating the tone and length of a vowel, or the word boundary tag "WB", indicating the boundary of a dictionary unit. The WB tag can, for example, be included as a standard question in the decision tree questions for capturing crossword models in context-dependent modeling. Pronunciation variants are indicated by (

    5) Documentation: The pronunciation dictionaries for each language are complemented by a documentation that describes the format of the dictionary, the phone set including its mapping to the International Phonetic Alphabet (IPA), and the frequency distribution of the phones in the dictionary. Most of the pronunciation dictionaries have been successfully applied to large vocabulary speech recognition and references to publications are given when available.

  16. Digital database of a 3D Geological Model of the Powder River Basin and...

    • s.cnmilf.com
    • data.usgs.gov
    • +1 more
    Updated Oct 4, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Digital database of a 3D Geological Model of the Powder River Basin and Williston Basin Regions, USA [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/digital-database-of-a-3d-geological-model-of-the-powder-river-basin-and-williston-basin-re
    Dataset updated
    Oct 4, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Area covered
    Powder River Basin, United States
    Description

    This digital GIS dataset and accompanying nonspatial files synthesize model outputs from a regional-scale volumetric 3-D geologic model that portrays the generalized subsurface geology of the Powder River Basin and Williston Basin regions from a wide variety of input data sources. The study area includes the Hartville Uplift, Laramie Range, Bighorn Mountains, Powder River Basin, and Williston Basin. The model data released here consist of the stratigraphic contact elevation of major Phanerozoic sedimentary units that broadly define the geometry of the subsurface, the elevation of Tertiary intrusive and Precambrian basement rocks, and point data that illustrate an estimation of the three-dimensional geometry of fault surfaces. The presence of folds and unconformities is implied by the 3D geometry of the stratigraphic units, but these are not included as discrete features in this data release.

    The 3D geologic model was constructed from a wide variety of publicly available surface and subsurface geologic data; none of these input data are part of this data release, but the data sources are thoroughly documented so that a user could obtain them from other sources if desired.

    The PowderRiverWilliston3D geodatabase contains 40 subsurface horizons in raster format that represent the tops of modeled subsurface units, and a feature dataset "GeologicModel". The GeologicModel feature dataset contains a feature class of 30 estimated faults served in elevation grid format (FaultPoints), a feature class illustrating the spatial extent of 22 fault blocks (FaultBlockFootprints), and a feature class containing a polygon delineating the study areas (ModelBoundary). Nonspatial tables define the data sources used (DataSources), define terms used in the dataset (Glossary), and describe the modeled surfaces (DescriptionOfModelUnits). Separate file folders contain the vector data in shapefile format, the raster data in ASCII format, and the tables as comma-separated values. In addition, a tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and the accompanying nonspatial tables (EntityAndAttributes).

    An included READ_ME file documents the process of manipulating and interpreting publicly available surface and subsurface geologic data to create the model. It also contains critical information about the model units and the uncertainty in their ability to predict true ground conditions. Accompanying this data release is "PowderRiverWillistonInputSummaryTable.csv", which tabulates the global settings for each fault block, the stratigraphic horizons modeled in each fault block, the types and quantity of data inputs for each stratigraphic horizon, and the settings associated with each data input.
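
    Since the rasters are delivered in ASCII format alongside CSV tables, the horizon grids can be inspected without GIS software. The Python sketch below assumes the widely used ESRI ASCII grid layout (six header lines, then rows of values); the file name is a hypothetical example, not a file named in the release.

      import numpy as np

      # Read an ESRI ASCII grid: ncols, nrows, xllcorner, yllcorner, cellsize,
      # NODATA_value header lines, then elevation rows. Layout is assumed here.
      def read_ascii_grid(path: str):
          header = {}
          with open(path) as f:
              for _ in range(6):
                  key, value = f.readline().split()
                  header[key.lower()] = float(value)
              grid = np.loadtxt(f)
          nodata = header.get("nodata_value")
          if nodata is not None:
              grid[grid == nodata] = np.nan
          return header, grid

      header, elev = read_ascii_grid("stratigraphic_horizon_top.asc")
      print(int(header["ncols"]), "cols;", np.nanmin(elev), "to", np.nanmax(elev))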

  17. Generation Scotland SFHS Data Dictionary

    • find.data.gov.scot
    • dtechtive.com
    csv, jpg, pdf, txt +2
    Updated Jan 5, 2018
    Cite
    University of Edinburgh. School of Molecular, Genetic and Population Health Sciences. Institute of Genetics and Molecular Medicine (2018). Generation Scotland SFHS Data Dictionary [Dataset]. http://doi.org/10.7488/ds/2277
    Available download formats: jpg (1.082 MB), xlsx (0.0731 MB), csv (0.0003 MB), csv (0.0033 MB), csv (0.0008 MB), txt (0.0166 MB), pdf (0.1808 MB), txt (0.0002 MB), txt (0.0021 MB), xls (0.2178 MB), csv (0.1004 MB)
    Dataset updated
    Jan 5, 2018
    Dataset provided by
    University of Edinburgh. School of Molecular, Genetic and Population Health Sciences. Institute of Genetics and Molecular Medicine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    The GS:SFHS Data Dictionary is a set of information describing the contents, format, and structure of the phenotype data collected during recruitment (2006-2011) to the Generation Scotland Scottish Family Health Study (GS:SFHS), or derived subsequently from study data collected during recruitment. This dataset replaces the one at https://datashare.is.ed.ac.uk/handle/10283/2724

  18. Data from: ESS-DIVE Reporting Format for File-level Metadata

    • search.dataone.org
    • data.ess-dive.lbl.gov
    • +2 more
    Updated Oct 8, 2021
    Cite
    Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas (2021). ESS-DIVE Reporting Format for File-level Metadata [Dataset]. https://search.dataone.org/view/ess-dive-a95fac98da3b481-20210928T175904096
    Dataset updated
    Oct 8, 2021
    Dataset provided by
    ESS-DIVE
    Authors
    Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas
    Time period covered
    Jan 1, 2020 - Sep 30, 2021
    Description

    The ESS-DIVE reporting format for file-level metadata (FLMD) provides granular information at the data file level to describe the contents, scope, and structure of the data file and to enable comparison of data files within a data package. The FLMD are fully consistent with, and augment, the metadata collected at the data package level. We developed the FLMD template based on a review of a small number of existing FLMD in use at other agencies and repositories, with valuable input from the Environmental Systems Science (ESS) community. Also included is a template for a CSV Data Dictionary where users can provide file-level information about the contents of a CSV data file (e.g., define column names, provide units). Files are in .csv, .xlsx, and .md. Templates are in both .csv and .xlsx (open with e.g. Microsoft Excel, LibreOffice, or Google Sheets); open the .md files by downloading them and using a text editor (e.g. Notepad or TextEdit). Though we provide Excel templates for the file-level metadata reporting format, our instructions encourage users to save the FLMD template as a CSV following the CSV Reporting Format guidance. In addition, we developed the ESS-DIVE File Level Metadata Extractor, a lightweight Python script that can extract some FLMD fields following the recommended FLMD format and structure.
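
    The CSV Data Dictionary idea is straightforward to automate: for each column of a data file, record its name and leave the definition and units for the data provider to fill in. The Python sketch below is not the ESS-DIVE extractor; the file names and the exact dictionary field names are illustrative assumptions.

      import csv

      # Write a skeleton CSV data dictionary for a data file: one row per column.
      # "Column_or_Row_Name", "Definition" and "Unit" are illustrative headers.
      def skeleton_dictionary(data_csv: str, out_csv: str) -> None:
          with open(data_csv, newline="", encoding="utf-8") as f:
              columns = next(csv.reader(f))  # header row of the data file
          with open(out_csv, "w", newline="", encoding="utf-8") as f:
              writer = csv.writer(f)
              writer.writerow(["Column_or_Row_Name", "Definition", "Unit"])
              for name in columns:
                  writer.writerow([name, "", ""])

      skeleton_dictionary("soil_measurements.csv", "soil_measurements_flmd_dd.csv")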

  19. Asset database for the Hunter subregion on 24 February 2016

    • demo.dev.magda.io
    • researchdata.edu.au
    • +2 more
    Updated Aug 8, 2023
    + more versions
    Cite
    Bioregional Assessment Program (2023). Asset database for the Hunter subregion on 24 February 2016 [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-7674a664-16fd-4ebc-b560-e34ba5e910c4
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple datasets. You can find a link to the parent datasets in the Lineage field of this metadata statement. The History field of this metadata statement describes how this dataset was derived.

    Asset database for the Hunter subregion on 24 February 2016 (V2.5) supersedes the previous version of the HUN Asset database, V2.4 (Asset database for the Hunter subregion on 20 November 2015, GUID: 0bbcd7f6-2d09-418c-9549-8cbd9520ce18). It contains the Asset database (HUN_asset_database_20160224.mdb), a Geodatabase version for GIS mapping purposes (HUN_asset_database_20160224_GISOnly.gdb), the draft Water Dependent Asset Register spreadsheet (BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20160224.xlsx), a data dictionary (HUN_asset_database_doc_20160224.doc), and a folder (NRM_DOC) containing documentation associated with the Water Asset Information Tool (WAIT) process as outlined below. This version should be used for the second Materiality Test (M2).

    The Asset database is registered to the BA repository as an ESRI personal geodatabase (.mdb, doubling as an MS Access database) that can store, query, and manage the non-spatial data, while the spatial data is held in a separate file geodatabase joined by AID/ElementID (a minimal join sketch follows this entry).

    Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset. Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets, which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list.

    Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database. Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level; asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "HUN_asset_database_doc_20160224.doc", located in this file. The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset. Detailed information describing the database structure and content can be found in "HUN_asset_database_doc_20160224.doc", located in this file.

    Some of the source data used in the compilation of this dataset is restricted. The public version of this asset database can be accessed via the following dataset: Asset database for the Hunter subregion on 24 February 2016 Public 20170112 v02 (https://data.gov.au/data/dataset/9d16592c-543b-42d9-a1f4-0f6d70b9ffe7)

    Dataset History (OBJECTID | VersionID | Date | Notes)

    1 | 1 | 29/08/2014: Initial database.
    3 | 1.1 | 16/09/2014: Updated the classification for seven identical assets from the Gloucester subregion.
    4 | 1.2 | 28/01/2015: Added NSW GDEs from the Hunter - Central Rivers GDE mapping from NSW DPI (50 635 polygons).
    5 | 1.3 | 12/02/2015: New AIDs assigned to NSW GDE assets (existing AID + 20000) to avoid duplication of AIDs assigned in other databases.
    6 | 1.4 | 16/06/2015: (1) Added 20 additional datasets required by the HUN assessment project team after the HUN community workshop. (2) Turned off previous GW point assets (AIDs 7717-7810 inclusive). (3) Turned off new GW point asset (AID: 0). (4) Assets (AIDs: 8023-8026) are duplicates of 4 assets (AIDs: 4747, 4745, 4744, 4743 respectively) in the NAM subregion; their AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry use values from those NAM assets. (5) Asset (AID: 8595) is a duplicate of 1 asset (AID: 57) in the GLO subregion; its AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry use values from that GLO asset. (6) 39 assets (AIDs from 2969 to 5040) are from the NAM asset database; their attributes were updated to the latest attributes from the NAM asset database. (7) The databases, especially the spatial database, were changed: duplicated attribute fields in the spatial data were removed and only the ID field is kept, so the user needs to join the AssetList or ElementList table to the spatial data.
    7 | 2 | 20/07/2015: (1) Updated 131 new GW point assets with previous AIDs; some may include different element numbers due to the change of 77 FTypes requested by the Hunter assessment project team. (2) Added 104 EPBC assets, which were assessed and excluded by ERIN. (3) Merged 30 Darling Hardyhead assets into one (AID: 60140) and deleted the other 29. (4) Turned off 5 assets from the community workshop (AIDs: 60358-60362), as they duplicate 5 of the 104 excluded EPBC assets. (5) Updated M2 test results. (6) Asset names (AIDs: 4743 and 4747) were changed as requested by the Hunter assessment project team (4 lower-case characters to 4 upper-case only); these two assets are from the Namoi asset database and their names may not match the original names in that database. (7) One NSW WSP asset (AID: 60814) was added as requested by the Hunter assessment project team; the processing method for this asset (which does not consider the 1:M relation) is not robust and differs from other NSW WSP assets, so it should NOT be used for other subregions. (8) The queries Find_All_Used_Assets and Find_All_WD_Assets in the asset database can be used to extract all used assets and all water dependent assets.
    8 | 2.1 | 27/08/2015: (1) Six assets in the HUN subregion are the same as six assets in the GIP subregion; their AID, Asset Name, Group, SubGroup, Depth, Source and ListDate use values from the GIP assets. AIDs from AID_from_HUN no longer appear anywhere in the HUN asset database and spreadsheet; only AIDs from AID_from_GIP appear (in fact, (a) AID 11636 is GIP, obtained from MBC, and (b) only AID, Asset Name and ListDate are different and changed). (2) For BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx: (a) extracted long (>255 characters) WD rationales for 19 assets (AIDs: 8682, 9065, 9073, 9087, 9088, 9100, 9102, 9103, 60000, 60001, 60792, 60793, 60801, 60713, 60739, 60751, 60764, 60774, 60812) in the "Water-dependent asset register" tab and for 37 assets (AIDs: 5040, 8651, 8677, 8682, 8650, 8686, 8687, 8718, 8762, 9094, 9065, 9067, 9073, 9077, 9081, 9086, 9087, 9088, 9100, 9102, 9103, 60000, 60001, 60739, 60742, 60751, 60713, 60764, 60771, 60774, 60792, 60793, 60798, 60801, 60809, 60811, 60812) in the "Asset list" tab of the 1.30 Excel file; (b) recreated the draft BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx. (3) Modified the queries Find_All_Asset_List and Find_Waterdependent_asset_register to support (2)(a).
    9 | 2.2 | 8/09/2015: (1) Updated M2 results from the internal review for 386 Sociocultural assets. (2) Updated the class to Ecological/Vegetation/Habitat (potential species distribution) for assets/elements from the sources WAIT_ALA_ERIN, NSW_TSEC and NSW_DPI_Fisheries_DarlingHardyhead.
    10 | 2.3 | 22/09/2015: (1) Updated M2 results from the internal review: changed "Assessment team do not say No" to "All economic assets are by definition water dependent"; changed "Assessment team say No" to "These are water dependent, but excluded by the project team because their intersection with the PAE is negligible"; changed "Rivertyles" to "RiverStyles".
    11 | 2.4 | 20/11/2015: (1) Updated M2 test results for 86 assets following the external review. (2) Updated asset names for two assets (AIDs: 8642 and 8643) as required by the external review. (3) Created the draft Water Dependent Asset Register file using template V5.
    12 | 2.5 | 24/02/2016: The total number of registered water assets increased by 1 (= +2 - 1): two assets changed their M2 test result from "No" to "Yes", while one asset changed from "Yes" to "No", following the review by the Ecologist group.

    Dataset Citation

    Bioregional Assessment Programme (2015) Asset database for the Hunter subregion on 24 February 2016. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/a39290ac-3925-4abc-9ecb-b91e911f008f.
    Dataset Ancestors

    Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
    Derived From Travelling Stock Route Conservation Values
    Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
    Derived From NSW Wetlands
    Derived From Climate Change Corridors Coastal North East NSW
    Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
    Derived From Climate Change Corridors for Nandewar and New England Tablelands
    Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
    Derived From Asset database for the Hunter subregion on 27 August 2015
    Derived From Birds Australia - Important Bird Areas (IBA) 2009
    Derived From Estuarine Macrophytes of Hunter Subregion NSW DPI Hunter 2004
    Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
    Derived From Camerons Gorge Grassy White Box Endangered Ecological Community (EEC) 2008
    Derived From NSW Office of Water Surface Water Licences Processed for Hunter v1 20140516
    Derived From Fauna Corridors for North East NSW
    Derived From Asset database for the Hunter subregion on 12 February 2015
    Derived From New South Wales NSW Regional CMA Water Asset Information WAIT tool databases,
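    Because the .mdb holds the attribute tables and the .gdb holds the geometries, a typical GIS workflow joins the two on AID before mapping. Below is a minimal sketch of that join in Python; it is not taken from the dataset documentation. The layer name "Assets" and the exact column names are assumptions (the real names are listed in HUN_asset_database_doc_20160224.doc), and reading the .mdb through ODBC requires the Microsoft Access driver, typically on Windows.

        # Minimal sketch (assumed names): join the non-spatial AssetList table
        # from the personal geodatabase (.mdb) to the spatial layer in the
        # file geodatabase (.gdb) on AID.
        import geopandas as gpd
        import pandas as pd
        import pyodbc

        # Spatial features (geometry plus ID fields only); "Assets" is an
        # assumed layer name.
        assets_gdf = gpd.read_file(
            "HUN_asset_database_20160224_GISOnly.gdb", layer="Assets"
        )

        # Non-spatial attributes, read from the .mdb as an MS Access database
        # (requires the Access ODBC driver; Windows only).
        conn = pyodbc.connect(
            r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
            r"DBQ=HUN_asset_database_20160224.mdb;"
        )
        asset_list = pd.read_sql("SELECT * FROM AssetList", conn)
        conn.close()

        # Attach the BA classification attributes to the geometries via AID.
        assets = assets_gdf.merge(asset_list, on="AID", how="left")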

  20. Open Data Inventory

    • open.canada.ca
    • ouvert.canada.ca
    • +1more
    csv, html, xls
    Updated Dec 9, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Treasury Board of Canada Secretariat (2024). Open Data Inventory [Dataset]. https://open.canada.ca/data/en/dataset/4ed351cf-95d8-4c10-97ac-6b3511f359b7
    Explore at:
    csv, html, xlsAvailable download formats
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Treasury Board of Canadahttps://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html
    Treasury Board of Canada Secretariathttp://www.tbs-sct.gc.ca/
    License

    Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Building a comprehensive data inventory is required by section 6.3 of the Directive on Open Government: "Establishing and maintaining comprehensive inventories of data and information resources of business value held by the department to determine their eligibility and priority, and to plan for their effective release." Creating a data inventory is among the first steps in identifying federal data that is eligible for release.

    Departmental data inventories have been published on the Open Government portal, Open.Canada.ca, so that Canadians can see what federal data is collected and have the opportunity to indicate which data is of most interest to them, helping departments to prioritize data releases based on both external demand and internal capacity. The objective of the inventory is to provide a landscape of all federal data. While it is recognized that not all data is eligible for release due to the nature of the content, departments are responsible for identifying and including all datasets of business value as part of the inventory exercise, with the exception of datasets whose titles contain information that should not be released to the public due to security or privacy concerns; these titles have been excluded from the inventory.

    Departments were provided with an open data inventory template with standardized elements to populate and upload to the metadata catalogue, the Open Government Registry. These elements are described in the data dictionary file. Departments are responsible for maintaining up-to-date data inventories that reflect significant additions to their data holdings. For the purposes of this open data inventory exercise, a dataset is defined as "an organized collection of data used to carry out the business of a department or agency, that can be understood alone or in conjunction with other datasets".

    Please note that the Open Data Inventory is no longer being maintained by Government of Canada organizations and is therefore not being updated. However, we will continue to provide access to the dataset for review and analysis.
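    For readers who want to explore the published inventory programmatically, the sketch below loads a downloaded copy of the inventory (csv format) and tallies records per department. This is illustrative only: the file name and the column name 'owner_org' are hypothetical; the actual standardized element names are defined in the data dictionary file that accompanies the inventory.

        # Minimal sketch (hypothetical file and column names): summarize a
        # downloaded copy of the Open Data Inventory per department.
        import pandas as pd

        # Assumed local download of the inventory resource in csv format.
        inventory = pd.read_csv("open_data_inventory.csv")

        # 'owner_org' is an assumed column name for the owning department;
        # check the data dictionary file for the real element names.
        per_dept = inventory.groupby("owner_org").size().sort_values(ascending=False)
        print(per_dept.head(10))  # ten departments with the most inventoried records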
