An Excel template with data elements and conventions corresponding to the openLCA unit process data model. Includes LCA Commons data and metadata guidelines and definitions.

Resources in this dataset:

Resource Title: READ ME - data dictionary. File Name: lcaCommonsSubmissionGuidelines_FINAL_2014-09-22.pdf

Resource Title: US Federal LCA Commons Life Cycle Inventory Unit Process Template. File Name: FedLCA_LCI_template_blank EK 7-30-2015.xlsx
Resource Description: Instructions: This template should be used for life cycle inventory (LCI) unit process development and is associated with an openLCA plugin to import these data into an openLCA database. See www.openLCA.org to download the latest release of openLCA for free and to access available plugins.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LScD (Leicester Scientific Dictionary)

April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

[Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation here. After the pre-processing steps, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are the same as those described for LScD Version 2 below.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

[Version 2] Getting Started

This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from the LSC, and instructions for usage of the code, are available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.

LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains a title, a list of authors, a list of categories, a list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words and is sorted by the number of documents containing each word, in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
1. Unique words in abstracts
2. Number of documents containing each word
3. Number of appearances of a word in the entire corpus

Processing the LSC

Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of a request for the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All following steps can be applied to an arbitrary list of texts from any source, with changes of parameters. The structure of the corpus, such as the file format and the names (and positions) of fields, should be taken into account when applying our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].

Step 3. Extracting Abstracts and Saving Metadata: The metadata (all fields in a document excluding the abstract) and the field of abstracts are separated. Metadata are then saved as MetaData.R.
Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.

Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section, we present our approaches to pre-processing the abstracts of the LSC (a short code sketch of these steps follows the list).
1. Removing punctuation and special characters: This is the process of substituting all non-alphanumeric characters by a space. We did not substitute the character "-" in this step, because we need to keep words like "z-score", "non-payment" and "pre-processing" in order not to lose the actual meaning of such words. A process of uniting prefixes with words is performed in later steps of pre-processing.
2. Lowercasing the text data: Lowercasing is performed to avoid treating words like "Corpus", "corpus" and "CORPUS" differently. The entire collection of texts is converted to lowercase.
3. Uniting prefixes of words: Words containing prefixes joined with the character "-" are united as one word. The prefixes united for this research are listed in the file "list_of_prefixes.csv". Most of the prefixes are extracted from [4]. We also added commonly used prefixes: 'e', 'extra', 'per', 'self' and 'ultra'.
4. Substitution of words: Some words joined with "-" in the abstracts of the LSC require an additional process of substitution to avoid losing the meaning of the word before removing the character "-". Some examples of such words are "z-test", "well-known" and "chi-square". These words have been substituted by "ztest", "wellknown" and "chisquare". Identification of such words was done by sampling abstracts from the LSC. The full list of such words and the decisions taken for substitution are presented in the file "list_of_substitution.csv".
5. Removing the character "-": All remaining "-" characters are replaced by a space.
6. Removing numbers: All digits that are not included in a word are replaced by a space. All words that contain both digits and letters are kept, because alphanumeric tokens such as chemical formulas might be important for our analysis. Some examples are "co2", "h2o" and "21st".
7. Stemming: Stemming is the process of converting inflected words into their word stem. This step unites several forms of words with similar meaning into one form, and also saves memory space and time [5]. All words in the LScD are stemmed to their word stem.
8. Stop word removal: Stop words are words that are extremely common but provide little value in a language. Some common stop words in English are 'I', 'the', 'a', etc. We used the 'tm' package in R to remove stop words [6]. There are 174 English stop words listed in the package.
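For concreteness, here is a minimal R sketch of the Step 4 operations on a toy abstract, assuming the 'tm' and 'SnowballC' packages; the prefix and substitution rules shown are single illustrative entries, and the full pipeline in LScD_Creation.R [2] should be used for reproduction.

```r
# Minimal sketch of the Step 4 pre-processing; the prefix and substitution
# rules below are single illustrative entries from the published CSV lists.
library(tm)  # stemDocument() also requires the SnowballC package

abstracts <- c("The well-known z-score: CO2 and H2O data from 2014, pre-processing.")

clean_text <- function(x) {
  x <- gsub("[^[:alnum:]-]", " ", x)             # 1. drop punctuation, keep "-"
  x <- tolower(x)                                # 2. lowercase
  x <- gsub("\\bpre-", "pre", x)                 # 3. unite prefixes (one example)
  x <- gsub("\\bwell-known\\b", "wellknown", x)  # 4. substitution (one example)
  x <- gsub("-", " ", x)                         # 5. remove remaining "-"
  gsub("\\b[0-9]+\\b", " ", x)                   # 6. drop standalone numbers
}

corpus <- VCorpus(VectorSource(clean_text(abstracts)))
corpus <- tm_map(corpus, stemDocument)                       # 7. stemming
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # 8. stop words
as.character(corpus[[1]])
```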
Step 5. Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file "LScD.csv".

The Organisation of the LScD

The total number of words in the file "LScD.csv" is 974,238. Each field is described below:

Word: Contains the unique words from the corpus. All words are in lowercase, stemmed form. The field is sorted by the number of documents that contain the word, in descending order.

Number of Documents Containing the Word: A binary count is used here: if a word exists in an abstract, it counts as 1 for that document; if the word occurs more than once in a document, the count is still 1. The total number of documents containing the word is the sum of these 1s over the entire corpus.

Number of Appearances in Corpus: How many times a word occurs in the corpus when the corpus is considered as one large document.

Instructions for R Code

LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:

Metadata File: Includes all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
File of Abstracts: Contains all abstracts after the pre-processing steps defined in Step 4.
DTM: The Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.
LScD: An ordered list of words from LSC as defined in the previous section.

The code can be used as follows:
1. Download the folder 'LSC', 'list_of_prefixes.csv' and 'list_of_substitution.csv'
2. Open the LScD_Creation.R script
3. Change the parameters in the script: replace with the full path of the directory with the source files and the full path of the directory to write output files
4. Run the full code.

References
[1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
[2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
[5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved Porter's stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
[6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," 2013. Available: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
Overview: The Lower Nooksack Water Budget Project involved assembling a wide range of existing data related to WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. This Data Management Plan provides an overview of the data sets, formats and collaboration environment that was used to develop the project. Use of a plan during development of the technical work products provided a forum for the data development and management to be conducted with transparent methods and processes. At project completion, the Data Management Plan provides an accessible archive of the data resources used and supporting information on the data storage, intended access, sharing and re-use guidelines.
One goal of the Lower Nooksack Water Budget project is to make this “usable technical information” as accessible as possible across technical, policy and general public users. The project data, analyses and documents will be made available through the WRIA 1 Watershed Management Project website http://wria1project.org. This information is intended for use by the WRIA 1 Joint Board and partners working to achieve the adopted goals and priorities of the WRIA 1 Watershed Management Plan.
Model outputs for the Lower Nooksack Water Budget are summarized by sub-watersheds (drainages) and point locations (nodes). In general, due to changes in land use over time and changes to available streamflow and climate data, the water budget for any watershed needs to be updated periodically. Further detailed information about data sources is provided in review packets developed for specific technical components including climate, streamflow and groundwater level, soils and land cover, and water use.
Purpose: This project involves assembling a wide range of existing data related to WRIA 1 and specifically the Lower Nooksack Subbasin, updating existing data sets and generating new data sets. Data will be used as input to various hydrologic, climatic and geomorphic components of the Topnet-Water Management (WM) model, but will also be available to support other modeling efforts in WRIA 1. Much of the data used as input to the Topnet model is publicly available and maintained by others (e.g., USGS DEMs and streamflow data, SSURGO soils data, University of Washington gridded meteorological data). Pre-processing is performed to convert these existing data into a format that can be used as input to the Topnet model. Topnet model ASCII-text file outputs are then post-processed and combined with spatial data to generate GIS data that can be used to create maps and illustrations of the spatial distribution of water information (a sketch of this post-processing join appears below). Other products generated during this project include documentation of methods, input by the WRIA 1 Joint Board Staff Team during review and comment periods, and communication tools developed for public engagement and public comment on the project.
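To illustrate that post-processing step, a hedged R sketch follows; the file names and the join field "DrainageID" are hypothetical stand-ins, not the actual Topnet-WM output schema.

```r
# Hedged sketch: join tabular model output to drainage polygons and export
# a GIS layer. File names and fields ("DrainageID", "flow_mm") are hypothetical.
library(sf)

drainages <- st_read("lower_nooksack_drainages.shp")         # sub-watershed polygons
budget    <- read.table("topnet_output.txt", header = TRUE)  # ASCII model output

water_budget_map <- merge(drainages, budget, by = "DrainageID")  # attach results
st_write(water_budget_map, "water_budget_by_drainage.shp")       # map-ready output
```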
In order to maintain an organized system of developing and distributing data, Lower Nooksack Water Budget project collaborators should be familiar with the standards for data management described in this document and with the following issues related to generating and distributing data:
1. Standards for metadata and data formats
2. Plans for short-term storage and data management (i.e., file formats, local storage and backup procedures, and security)
3. Legal and ethical issues (i.e., intellectual property, confidentiality of study participants)
4. Access policies and provisions (i.e., how the data will be made available to others, any restrictions needed)
5. Provisions for long-term archiving and preservation (i.e., establishment of a new data archive or utilization of an existing archive)
6. Assigned data management responsibilities (i.e., persons responsible for ensuring data management and monitoring compliance with the Data Management Plan)
This resource is a subset of the LNWB Ch03 Data Processes Collection Resource.
https://spdx.org/licenses/CC0-1.0.html
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate the feasibility of the facile eLAB workflow. EHR data are successfully transformed and bulk-loaded/imported into a REDCap-based national registry, enabling real-world data analysis and interoperability.
Methods: eLAB Development and Source Code (R statistical software)
eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).
eLAB reformats EHR data abstracted for an identified population of patients (e.g., a medical record number (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names, and eLAB converts these to MCCPR-assigned record identification numbers (record_id) before import, for de-identification.
Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or institutional EDWs such as the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.
The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
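To make the expected input concrete, the sketch below builds a one-row data frame in this untidy format and passes it to ehr_format(); the values are fabricated stand-ins, and the exact cell formatting of real exports may differ.

```r
# Hedged sketch of the 4-column untidy input expected by ehr_format(); values
# are fabricated stand-ins, and eLAB must first be sourced/loaded from
# https://github.com/TheMillerLab/eLAB.
library(tibble)

dt <- tibble(
  `Patient Name (MRN)` = "DOE,JANE (0001234)",
  `Collection Date`    = "2021-03-01",
  `Collection Time`    = "08:15",
  `Lab Results`        = "Sodium 140 mmol/L; Potassium 4.1 mmol/L"  # one panel per cell
)

formatted <- ehr_format(dt)  # reshapes panels into one row per lab result
```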
Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
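The remap itself is a key-value join against that lookup table; a minimal sketch follows, with a three-row lookup standing in for the ~300-subtype table and illustrative column names.

```r
# Minimal sketch of the key-value remap; this 3-row lookup stands in for the
# ~300-subtype table shipped with eLAB, and column names are illustrative.
library(tidyverse)

lab_lookup <- tribble(
  ~raw_name,             ~dd_code,
  "Potassium",           "potassium",
  "Potassium(POC)",      "potassium",
  "Potassium,whole-bld", "potassium"
)

labs_raw <- tribble(
  ~record_id, ~raw_name,            ~value,
  1,          "Potassium",           4.1,
  1,          "Potassium(POC)",      3.9,
  2,          "Potassium,whole-bld", 4.4
)

labs_remapped <- labs_raw %>%
  inner_join(lab_lookup, by = "raw_name") %>%  # keep only DD-defined labs/units
  select(record_id, dd_code, value)
```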
Data Dictionary (DD)
EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and the associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry in each data field, such as string or numeric. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
Study Cohort
This study was approved by the MGB IRB. A search of the EHR was performed to identify patients diagnosed with MCC between 1975 and 2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016 and 2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.
Statistical Analysis
OS is defined as the time from the date of MCC diagnosis to the date of death. Data were censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazards modeling was performed across all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
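A minimal sketch of one such univariable model, assuming a cohort data frame with hypothetical columns (os_months, death_event, and one baseline lab), is:

```r
# Hedged sketch of univariable Cox PH modeling for one lab predictor.
# The 6-patient cohort and column names are fabricated for illustration only.
library(survival)

cohort <- data.frame(
  os_months   = c(12, 30, 7, 45, 22, 18),
  death_event = c(1, 0, 1, 0, 1, 0),
  potassium   = c(4.1, 3.9, 5.2, 4.4, 4.8, 4.0)
)

fit <- coxph(Surv(os_months, death_event) ~ potassium, data = cohort)
summary(fit)  # hazard ratio and exploratory (uncorrected) p-value
```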
Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple datasets. Links to the parent datasets are given in the Lineage field in this metadata statement. The History field in this metadata statement describes how this dataset was derived.

Asset database for the Hunter subregion on 24 February 2016 (V2.5) supersedes the previous version of the HUN Asset database, V2.4 (Asset database for the Hunter subregion on 20 November 2015, GUID: 0bbcd7f6-2d09-418c-9549-8cbd9520ce18). It contains the Asset database (HUN_asset_database_20160224.mdb), a Geodatabase version for GIS mapping purposes (HUN_asset_database_20160224_GISOnly.gdb), the draft Water Dependent Asset Register spreadsheet (BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20160224.xlsx), a data dictionary (HUN_asset_database_doc_20160224.doc), and a folder (NRM_DOC) containing documentation associated with the Water Asset Information Tool (WAIT) process as outlined below. This version should be used for the Materiality Test 2 (M2).

The Asset database is registered to the BA repository as an ESRI personal geodatabase (.mdb, doubling as an MS Access database) that can store, query, and manage non-spatial data, while the spatial data are held in a separate file geodatabase joined by AID/ElementID.

Under the BA programme, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.

Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database. Assets are the spatial features used by project teams to model scenarios under the BA programme. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "HUN_asset_database_doc_20160224.doc", located in this file. The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset. Detailed information describing the database structure and content can be found in the document "HUN_asset_database_doc_20160224.doc" located in this file.

Some of the source data used in the compilation of this dataset is restricted. The public version of this asset database can be accessed via the following dataset: Asset database for the Hunter subregion on 24 February 2016 Public 20170112 v02 (https://data.gov.au/data/dataset/9d16592c-543b-42d9-a1f4-0f6d70b9ffe7)

Dataset History (OBJECTID | VersionID | Notes | Date)

1 | 1 | Initial database. | 29/08/2014
3 | 1.1 | Updated the classification for seven identical assets from the Gloucester subregion. | 16/09/2014
4 | 1.2 | Added in NSW GDEs from Hunter - Central Rivers GDE mapping from NSW DPI (50 635 polygons). | 28/01/2015
5 | 1.3 | New AIDs assigned to NSW GDE assets (existing AID + 20000) to avoid duplication of AIDs assigned in other databases. | 12/02/2015
6 | 1.4 | (1) Added 20 additional datasets required by the HUN assessment project team after the HUN community workshop. (2) Turned off previous GW point assets (AIDs 7717-7810 inclusive). (3) Turned off new GW point asset (AID: 0). (4) Assets (AIDs: 8023-8026) are duplicated to 4 assets (AIDs: 4747, 4745, 4744, 4743 respectively) in the NAM subregion; their AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry use values from the NAM assets. (5) Asset (AID 8595) is duplicated to 1 asset (AID 57) in the GLO subregion; its AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry use values from the GLO asset. (6) 39 assets (AIDs from 2969 to 5040) are from the NAM Asset database and their attributes were updated to use the latest attributes from the NAM asset database. (7) The databases, especially the spatial database, were changed: duplicated attribute fields in the spatial data were removed and only the ID field is kept; the user needs to join the table AssetList or ElementList to the spatial data. | 16/06/2015
7 | 2 | (1) Updated 131 new GW point assets with previous AIDs; some may include different element numbers due to the change of 77 FTypes requested by the Hunter assessment project team. (2) Added 104 EPBC assets, which were assessed and excluded by ERIN. (3) Merged 30 Darling Hardyhead assets into one (asset AID 60140) and deleted the other 29. (4) Turned off 5 assets from the community workshop (60358-60362) as they duplicate 5 of the 104 excluded EPBC assets. (5) Updated M2 test results. (6) Asset names (AIDs: 4743 and 4747) were changed as requested by the Hunter assessment project team (4 lower-case characters changed to upper case only); those two assets are from the Namoi asset database and their asset names may not match the original names in the Namoi asset database. (7) One NSW WSP asset (AID: 60814) was added as requested by the Hunter assessment project team; the process method (without considering the 1:M relation) for this asset is not robust and differs from other NSW WSP assets, and should NOT be used for other subregions. (8) The queries Find_All_Used_Assets and Find_All_WD_Assets in the asset database can be used to extract all used assets and all water dependent assets. | 20/07/2015
8 | 2.1 | (1) The following six assets in the HUN subregion are the same as six assets in the GIP subregion; their AID, Asset Name, Group, SubGroup, Depth, Source and ListDate use values from the GIP assets. AIDs from AID_from_HUN no longer appear in the HUN asset database and spreadsheet; only AIDs from AID_from_GIP appear (in fact (a) AID 11636 is GIP, obtained from MBC, and (b) only AID, Asset Name and ListDate are different and changed). (2) For BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx: (a) extracted long (>255 characters) WD rationale for 19 assets (AIDs: 8682, 9065, 9073, 9087, 9088, 9100, 9102, 9103, 60000, 60001, 60792, 60793, 60801, 60713, 60739, 60751, 60764, 60774, 60812) in tab "Water-dependent asset register" and 37 assets (AIDs: 5040, 8651, 8677, 8682, 8650, 8686, 8687, 8718, 8762, 9094, 9065, 9067, 9073, 9077, 9081, 9086, 9087, 9088, 9100, 9102, 9103, 60000, 60001, 60739, 60742, 60751, 60713, 60764, 60771, 60774, 60792, 60793, 60798, 60801, 60809, 60811, 60812) in tab "Asset list" in the 1.30 Excel file; (b) recreated the draft BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx. (3) Modified queries (Find_All_Asset_List and Find_Waterdependent_asset_register) for (2)(a). | 27/08/2015
9 | 2.2 | (1) Updated M2 results from the internal review for 386 Sociocultural assets. (2) Updated the class to Ecological/Vegetation/Habitat (potential species distribution) for assets/elements from the sources WAIT_ALA_ERIN, NSW_TSEC and NSW_DPI_Fisheries_DarlingHardyhead. | 8/09/2015
10 | 2.3 | (1) Updated M2 results from the internal review: changed "Assessment team do not say No" to "All economic assets are by definition water dependent"; changed "Assessment team say No" to "These are water dependent, but excluded by the project team because their intersection with the PAE is negligible"; changed "Rivertyles" to "RiverStyles". | 22/09/2015
11 | 2.4 | (1) Updated M2 test results for 86 assets from the external review. (2) Updated asset names for two assets (AIDs: 8642 and 8643) as required by the external review. (3) Created the draft Water Dependent Asset Register file using template V5. | 20/11/2015
12 | 2.5 | The total number of registered water assets increased by 1 (= +2 - 1): two assets changed M2 test from "No" to "Yes", but one asset changed M2 test from "Yes" to "No", following the review by the ecologist group. | 24/02/2016

Dataset Citation

Bioregional Assessment Programme (2015) Asset database for the Hunter subregion on 24 February 2016. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/a39290ac-3925-4abc-9ecb-b91e911f008f.
Dataset Ancestors

Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From Travelling Stock Route Conservation Values
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From NSW Wetlands
Derived From Climate Change Corridors Coastal North East NSW
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From Climate Change Corridors for Nandewar and New England Tablelands
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From Asset database for the Hunter subregion on 27 August 2015
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Estuarine Macrophytes of Hunter Subregion NSW DPI Hunter 2004
Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
Derived From Camerons Gorge Grassy White Box Endangered Ecological Community (EEC) 2008
Derived From NSW Office of Water Surface Water Licences Processed for Hunter v1 20140516
Derived From Fauna Corridors for North East NSW
Derived From Asset database for the Hunter subregion on 12 February 2015
Derived From New South Wales NSW Regional CMA Water Asset Information WAIT tool databases,
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SSURGO database contains information about soil as collected by the National Cooperative Soil Survey over the course of a century. The information can be displayed in tables or as maps and is available for most areas in the United States and the Territories, Commonwealths, and Island Nations served by the USDA-NRCS (Natural Resources Conservation Service). The information was gathered by walking over the land and observing the soil. Many soil samples were analyzed in laboratories. The maps outline areas called map units. The map units describe soils and other components that have unique properties, interpretations, and productivity. The information was collected at scales ranging from 1:12,000 to 1:63,360. More details were gathered at a scale of 1:12,000 than at a scale of 1:63,360. The mapping is intended for natural resource planning and management by landowners, townships, and counties. Some knowledge of soils data and map scale is necessary to avoid misunderstandings. The maps are linked in the database to information about the component soils and their properties for each map unit. Each map unit may contain one to three major components and some minor components. The map units are typically named for the major components. Examples of information available from the database include available water capacity, soil reaction, electrical conductivity, and frequency of flooding; yields for cropland, woodland, rangeland, and pastureland; and limitations affecting recreational development, building site development, and other engineering uses. SSURGO datasets consist of map data, tabular data, and information about how the maps and tables were created. The extent of a SSURGO dataset is a soil survey area, which may consist of a single county, multiple counties, or parts of multiple counties. SSURGO map data can be viewed in the Web Soil Survey or downloaded in ESRI® Shapefile format. The coordinate systems are Geographic. Attribute data can be downloaded in text format that can be imported into a Microsoft® Access® database. A complete SSURGO dataset consists of:
1) GIS data (as ESRI® Shapefiles)
2) attribute data (dbf files - a multitude of separate tables)
3) database template (MS Access format - this helps with understanding the structure and linkages of the various tables)
4) metadata
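For users working outside MS Access, the following hedged R sketch reads one tabular text file; it assumes the pipe-delimited, header-less layout used by SSURGO tabular exports, and the file/column names shown are illustrative.

```r
# Hedged sketch: read a SSURGO tabular text file, assuming the pipe-delimited,
# header-less export format; file and column names are illustrative.
mapunit <- read.delim("tabular/mapunit.txt", sep = "|", header = FALSE,
                      stringsAsFactors = FALSE)
names(mapunit)[1:2] <- c("musym", "muname")  # names per the Tables and Columns report
head(mapunit)
```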
Resources in this dataset:

Resource Title: SSURGO Metadata - Tables and Columns Report. File Name: SSURGO_Metadata_-_Tables_and_Columns.pdf
Resource Description: This report contains a complete listing of all columns in each database table. Please see the SSURGO Metadata - Table Column Descriptions Report for more detailed descriptions of each column. Find the Soil Survey Geographic (SSURGO) web site at https://www.nrcs.usda.gov/wps/portal/nrcs/detail/vt/soils/?cid=nrcs142p2_010596#Datamart

Resource Title: SSURGO Metadata - Table Column Descriptions Report. File Name: SSURGO_Metadata_-_Table_Column_Descriptions.pdf
Resource Description: This report contains the descriptions of all columns in each database table. Please see the SSURGO Metadata - Tables and Columns Report for a complete listing of all columns in each database table. Find the Soil Survey Geographic (SSURGO) web site at https://www.nrcs.usda.gov/wps/portal/nrcs/detail/vt/soils/?cid=nrcs142p2_010596#Datamart

Resource Title: SSURGO Data Dictionary. File Name: SSURGO 2.3.2 Data Dictionary.csv
Resource Description: CSV version of the data dictionary
This digital GIS dataset and accompanying nonspatial files synthesize model outputs from a regional-scale volumetric 3-D geologic model that portrays the generalized subsurface geology of the Powder River Basin and Williston Basin regions from a wide variety of input data sources. The study area includes the Hartville Uplift, Laramie Range, Bighorn Mountains, Powder River Basin, and Williston Basin. The model data released here consist of the stratigraphic contact elevation of major Phanerozoic sedimentary units that broadly define the geometry of the subsurface, the elevation of Tertiary intrusive and Precambrian basement rocks, and point data that illustrate an estimation of the three-dimensional geometry of fault surfaces. The presence of folds and unconformities is implied by the 3D geometry of the stratigraphic units, but these are not included as discrete features in this data release. The 3D geologic model was constructed from a wide variety of publicly available surface and subsurface geologic data; none of these input data are part of this data release, but data sources are thoroughly documented such that a user could obtain these data from other sources if desired. The PowderRiverWilliston3D geodatabase contains 40 subsurface horizons in raster format that represent the tops of modeled subsurface units, and a feature dataset "GeologicModel". The GeologicModel feature dataset contains a feature class of 30 estimated faults represented as elevation points (FaultPoints), a feature class illustrating the spatial extent of 22 fault blocks (FaultBlockFootprints), and a feature class containing a polygon delineating the study area (ModelBoundary). Nonspatial tables define the data sources used (DataSources), define terms used in the dataset (Glossary), and provide a description of the modeled surfaces (DescriptionOfModelUnits). Separate file folders contain the vector data in shapefile format, the raster data in ASCII format, and the tables as comma-separated values. In addition, a tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and the accompanying nonspatial tables (EntityAndAttributes). An included READ_ME file documents the process of manipulating and interpreting publicly available surface and subsurface geologic data to create the model. It additionally contains critical information about model units and uncertainty regarding their ability to predict true ground conditions. Accompanying this data release is the "PowderRiverWillistonInputSummaryTable.csv", which tabulates the global settings for each fault block, the stratigraphic horizons modeled in each fault block, the types and quantity of data inputs for each stratigraphic horizon, and the settings associated with each data input.
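As a hedged illustration of consuming this release in R, the sketch below loads one horizon raster from the ASCII folder and the fault-block footprints from the shapefile folder; the file names are placeholders, not the release's actual names.

```r
# Hedged sketch: load one horizon grid (ASCII raster) and one vector layer.
# File names are placeholders for the actual names in the release folders.
library(terra)  # rast() reads Arc/Info ASCII grids via GDAL
library(sf)

horizon_top  <- rast("raster/horizon_top_example.asc")      # one of the 40 horizons
fault_blocks <- st_read("vector/FaultBlockFootprints.shp")  # 22 fault-block extents

plot(horizon_top)
plot(st_geometry(fault_blocks), add = TRUE)
```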
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a list containing sense IDs from Open Slovene WordNet 1.0 (OSWN; http://hdl.handle.net/11356/1888) and the Digital Dictionary Database of Slovene (DDDS) developed by the Centre for Language Resources and Technologies of the University of Ljubljana.
The file consists of four columns containing the following data:
The list allows the end user to access OSWN data through the DDDS API (documented at https://wiki.cjvt.si/books/digital-dictionary-database/chapter/rest-api), i.e. to look up which senses and lexical units from the DDDS are assigned to a given synset ID in OSWN.
This digital data release presents contour data from multiple subsurface geologic horizons as presented in previously published summaries of the regional subsurface configuration of the Michigan and Illinois Basins. The original maps that served as the source of the digital data within this geodatabase are from the Geological Society of America's Decade of North American Geology project series, "The Geology of North America" volume D-2, chapter 13 "The Michigan Basin" and chapter 14 "Illinois Basin Region". Contour maps in the original published chapters were generated from geophysical well logs (generally gamma-ray) and adapted from previously published contour maps. The published contour maps illustrated the distribution of sedimentary strata within the Illinois and Michigan Basins in the context of the broad 1st-order supercycles of L.L. Sloss, including the Sauk, Tippecanoe, Kaskaskia, Absaroka, Zuni, and Tejas supersequences. Because these maps represent time-transgressive surfaces, contours frequently delineate the composite of multiple named sedimentary formations at once. Structure contour maps on the top of the Precambrian basement surface in both the Michigan and Illinois Basins illustrate the general structural geometry which undergirds the sedimentary cover. Isopach maps of the Sauk 2 and 3, Tippecanoe 1 and 2, Kaskaskia 1 and 2, Absaroka, and Zuni sequences illustrate the broad distribution of sedimentary units in the Michigan Basin, as do isopach maps of the Sauk, Upper Sauk, Tippecanoe 1 and 2, Lower Kaskaskia 1, Upper Kaskaskia 1-Lower Kaskaskia 2, Kaskaskia 2, and Absaroka supersequences in the Illinois Basin. Isopach contours and structure contours were formatted and attributed as GIS data sets for use in digital form as part of the U.S. Geological Survey's ongoing effort to inventory, catalog, and release subsurface geologic data in geospatial form. This effort is part of a broad directive to develop 2D and 3D geologic information at detailed, national, and continental scales. These data approximate, but do not strictly follow, the USGS National Cooperative Geologic Mapping Program's GeMS data structure schema for geologic maps. Structure contour lines and isopach contours for each supersequence are stored within separate "IsoValueLine" feature classes. These are distributed within a geographic information system geodatabase and are also saved as shapefiles. Contour data are provided in both feet and meters to maintain consistency with the original publication and for ease of use. Nonspatial tables define the data sources used, define terms used in the dataset, and describe the geologic units referenced herein. A tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and accompanying nonspatial tables.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2-8 are generated using the Flatterer utility.

Description of resources:
1. Dataset is a JSON Lines file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and is recommended for users familiar with working with nested JSON.
2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
3. datasets metadata contains metadata at the dataset level. This is also referred to as the "package" in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
4. resources metadata contains the metadata for the resources contained within each dataset.
5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as a count of the number of records each table contains.
8. data package entity relation diagram displays the title and format of each column, in each table in the Data Package, in the form of an ERD diagram. The Data Package resource offers a text-based version.
9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
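A minimal sketch of loading resource 1 in R, assuming the gzipped JSON Lines file has been downloaded locally (the file name below is a placeholder):

```r
# Minimal sketch: stream the gzipped JSON Lines catalogue into a data frame.
# The local file name is a placeholder for the downloaded resource.
library(jsonlite)

records <- stream_in(gzfile("od-do-canada.jsonl.gz"))  # one record per JSON line
str(records, max.level = 1)  # heavily nested, as noted above
```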
This data set holds the publicly-available version of the database of water-dependent assets that was compiled for the bioregional assessment (BA) of the Galilee subregion as part of the Bioregional Assessment Technical Programme. Though all life is dependent on water, for the purposes of a bioregional assessment, a water-dependent asset is an asset potentially impacted by changes in the groundwater and/or surface water regime due to coal resource development. The water must be other than local rainfall. Examples include wetlands, rivers, bores and groundwater dependent ecosystems.
The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets including Natural Resource Management regions, and Australian and state and territory government databases. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived. A single asset is represented spatially in the asset database by single or multiple spatial features (point, line or polygon). Individual points, lines or polygons are termed elements.
This dataset contains the unrestricted publicly-available components of spatial and non-spatial (attribute) data of the (restricted) Asset database for the Galilee subregion on 04 January 2016 (12ff5782-a3d9-40e8-987c-520d5fa366dd). The database is provided primarily as an ESRI File geodatabase (.gdb), which is able to be opened in readily available open source software such as QGIS. Other formats include the Microsoft Access database (.mdb in ESRI Personal Geodatabase format), industry-standard ESRI Shapefiles and tab-delimited text files of all the attribute tables.
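As a brief, hedged example of opening the primary File geodatabase in open source tools, R's sf package (backed by GDAL's OpenFileGDB driver) can list and read its layers; the .gdb and layer names below are placeholders.

```r
# Hedged sketch: list and read layers from the ESRI File geodatabase using
# sf/GDAL's OpenFileGDB driver. The .gdb and layer names are placeholders.
library(sf)

st_layers("GAL_asset_database_public.gdb")   # inspect available layers/tables
assets <- st_read("GAL_asset_database_public.gdb",
                  layer = "AssetList")       # read one attribute table
```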
The restricted version of the Galilee Asset database has a total count of 403 918 Elements and 4 426 Assets. In the public version of the Galilee Asset database, 13 759 spatial Element features (~3%) have been removed from the Element List and Element Layer(s), and 352 spatial Assets (~8%) have been removed from the spatial Asset Layer(s).
The elements/assets removed from the restricted Asset Database are from the following data sources:
1) Environmental Asset Database - Commonwealth Environmental Water Office - RESTRICTED (Metadata only) (29fd1654-8aa1-4cb3-b65e-0b37698ac9a6)
2) Key Environmental Assets - KEA - of the Murray Darling Basin RESTRICTED (Metadata only) (9948195e-3d3b-49dc-96d2-ea7765297308)
3) Species Profile and Threats Database (SPRAT) - RESTRICTED - Metadata only (7276dd93-cc8c-4c01-8df0-cef743c72112)
4) Australia, Register of the National Estate (RNE) - Spatial Database (RNESDB) (Internal 878f6780-be97-469b-8517-54bd12a407d0)
5) Communities of National Environmental Significance Database - RESTRICTED - Metadata only (c01c4693-0a51-4dbc-bbbd-7a07952aa5f6)
These important assets are included in the bioregional assessment, but are unable to be publicly distributed by the Bioregional Assessment Programme due to restrictions in their licensing conditions. Please note that many of these data sets are available directly from their custodian. For more precise details please see the associated explanatory Data Dictionary document enclosed with this dataset.
The public version of the asset database retains all of the unrestricted components of the Asset database for the Galilee subregion on 04 January 2016; any material that is unable to be published or redistributed to a third party by the BA Programme has been removed from the database. The data presented correspond to the assets published in product 1.3: Description of the water-dependent asset register and asset list for the Galilee subregion on 04 January 2016, and the associated Water-dependent asset register and asset list for the Galilee subregion on 04 January 2016.
Individual spatial features or elements are initially included in the database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). In accordance with BA submethodology M02: Compiling water-dependent assets, individual spatial elements are then grouped into assets, which are evaluated by project teams to determine whether they meet Materiality Test 2 (M2), i.e. whether they are considered to be water dependent.
Following delivery of the first pass asset list, project teams make a determination as to whether an asset (comprised of one or more elements) is water dependent, as assessed against the materiality tests detailed in the BA Methodology. These decisions are provided to ERIN by the assessment team and incorporated into the AssetList table in the Asset database.
Development of the Asset Register from the Asset database:
Decisions for M0 (fit for BA purpose), M1 (PAE) and M2 (water dependent) determine which assets are included in the "asset list" and "water-dependent asset register" which are published as Product 1.3.
The rule sets are applied as follows:
M0     | M1  | M2  | Result
No     | n/a | n/a | Asset is not included in the asset list or the water-dependent asset register
(≠ No) | No  | n/a | Asset is not included in the asset list or the water-dependent asset register
(≠ No) | Yes | No  | Asset is included in the published asset list but not in the water-dependent asset register
(≠ No) | Yes | Yes | Asset is included in both the asset list and the water-dependent asset register
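These rules reduce to two boolean tests; a hedged R sketch encoding them, with hypothetical column names (m0, m1, m2), is given below.

```r
# Hedged sketch of the M0/M1/M2 rule set; column names are hypothetical.
assets <- data.frame(
  aid = c(1, 2, 3, 4),
  m0  = c("No", "Yes", "Yes", "Yes"),
  m1  = c("n/a", "No", "Yes", "Yes"),
  m2  = c("n/a", "n/a", "No", "Yes")
)

assets$in_asset_list  <- assets$m0 != "No" & assets$m1 == "Yes"
assets$in_wd_register <- assets$in_asset_list & assets$m2 == "Yes"
assets
```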
Assessment teams are then able to use the database to assign receptors and impact variables to water-dependent assets and to develop a receptor register, as detailed in BA submethodology M03: Assigning receptors to water-dependent assets. The receptor register is then incorporated into the asset database.
At this stage of its development, the asset database for the Galilee subregion on 04 January 2016, which this document describes, does not contain any receptor information.
Bioregional Assessment Programme (2013) Asset database for the Galilee subregion on 04 January 2016 Public. Bioregional Assessment Derived Dataset. Viewed 10 December 2018, http://data.bioregionalassessments.gov.au/dataset/eb4cf797-9b8f-4dff-9d7a-a5dfbc8d2bed.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From Queensland QLD - Regional - NRM - Water Asset Information Tool - WAIT - databases
Derived From Matters of State environmental significance (version 4.1), Queensland
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From South Australia SA - Regional - NRM Board - Water Asset Information Tool - WAIT - databases
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From National Groundwater Information System (NGIS) v1.1
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Queensland QLD Regional CMA Water Asset Information WAIT tool databases RESTRICTED Includes ALL Reports
Derived From Asset database for the Galilee subregion on 04 January 2016
Derived From Environmental Asset Database - Commonwealth Environmental Water Office
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores v3 03122014
Derived From QLD Dept of Natural Resources and Mines, Surface Water Entitlements 131204
Derived From Ramsar Wetlands of Australia
Derived From Permanent and Semi-Permanent Waterbodies of the Lake Eyre Basin (Queensland and South Australia) (DRAFT)
Derived From Asset database for the Galilee subregion on 2 December 2014
Derived From Key Environmental Assets - KEA - of the Murray Darling Basin
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores and NGIS v4 28072014
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From Great Artesian Basin and Laura Basin groundwater recharge areas
Derived From QLD DNRM Licence Locations
This digital dataset was created as part of a U.S. Geological Survey study, done in cooperation with the Monterey County Water Resource Agency, to conduct a hydrologic resource assessment and develop an integrated numerical hydrologic model of the hydrologic system of Salinas Valley, CA. As part of this larger study, the USGS developed this digital dataset of geologic data and three-dimensional hydrogeologic framework models, referred to here as the Salinas Valley Geological Framework (SVGF), that define the elevation, thickness, extent, and lithology-based texture variations of nine hydrogeologic units in Salinas Valley, CA. The digital dataset includes a geospatial database that contains two main elements as GIS feature datasets: (1) input data to the 3D framework and textural models, within a feature dataset called "ModelInput"; and (2) interpolated elevation, thicknesses, and textural variability of the hydrogeologic units stored as arrays of polygonal cells, within a feature dataset called "ModelGrids". The model input data in this data release include stratigraphic and lithologic information from water, monitoring, and oil and gas wells, as well as data from selected published cross sections, point data derived from geologic maps and geophysical data, and data sampled from parts of previous framework models. Input surface and subsurface data have been reduced to points that define the elevation of the top of each hydrogeologic unit at x,y locations; these point data, stored in a GIS feature class named "ModelInputData", serve as digital input to the framework models. The locations of wells used as sources of subsurface stratigraphic and lithologic information are stored within the GIS feature class "ModelInputData", but are also provided as separate point feature classes in the geospatial database. Faults that offset hydrogeologic units are provided as a separate line feature class. Borehole data are also released as a set of tables, each of which may be joined or related to well location through a unique well identifier present in each table. Tables are in Excel and ASCII comma-separated value (CSV) format and include separate but related tables for well location, stratigraphic information of the depths to top and base of hydrogeologic units intercepted downhole, downhole lithologic information reported at 10-foot intervals, and information on how lithologic descriptors were classed as sediment texture. Two types of geologic frameworks were constructed and released within a GIS feature dataset called "ModelGrids": (1) a hydrostratigraphic framework where the elevation, thickness, and spatial extent of the nine hydrogeologic units were defined based on interpolation of the input data, and (2) a textural model for each hydrogeologic unit based on interpolation of classed downhole lithologic data. Each framework is stored as an array of polygonal cells: essentially a "flattened", two-dimensional representation of a digital 3D geologic framework. The elevation and thickness of the hydrogeologic units are contained within a single polygon feature class, SVGF_3DHFM, which contains a mesh of polygons representing model cells, each with multiple attributes including XY location and the elevation and thickness of each hydrogeologic unit. Textural information for each hydrogeologic unit is stored in a second array of polygonal cells called SVGF_TextureModel.
The spatial data are accompanied by non-spatial tables that describe the sources of geologic information, a glossary of terms, and a description of the nine hydrogeologic units modeled in this study. A data dictionary defines the structure of the dataset, defines all fields in all spatial data attribute tables and all columns in all nonspatial tables, and duplicates the Entity and Attribute information contained in the metadata file. Spatial data are also presented as shapefiles. Downhole data from boreholes are released as a set of tables related by a unique well identifier; the tables are in Excel and ASCII comma-separated value (CSV) format.
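Since the borehole tables share a unique well identifier, they can be related with an ordinary join; a hedged sketch with hypothetical file and column names follows.

```r
# Hedged sketch: relate two of the released borehole tables on the shared
# well identifier. File and column names here are hypothetical.
wells <- read.csv("WellLocations.csv")      # one row per well, with WellID
lith  <- read.csv("DownholeLithology.csv")  # 10-ft interval records, with WellID

lith_located <- merge(lith, wells, by = "WellID")  # attach x,y to each interval
head(lith_located)
```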
This digital data release contains gridded elevation surfaces for twenty-six (26) subsurface horizons, a grid of the estimated thickness of strata eroded during the Cenozoic, and fault traces at the level of the Precambrian surface from a previously published 3D geologic model of the Anadarko Basin Province (Higley and others, 2014). In the original release of the 3D model, elevation surfaces were exported to a Zmap interchange file format, potentially limiting access to the data for users without access to specialized software. In this digital data release, elevation surfaces are provided in more readily accessible formats and modeled horizons are given more thorough stratigraphic descriptions than provided in the original model documentation. Within the AnadarkoBasin_Higley geodatabase, the GeologicMap feature dataset contains a line feature class (ContactsAndFaults) containing fault traces at the level of the Precambrian surface, a polyline representing the approximate Anadarko Basin boundary, and model area boundary digitized from the original publication; a polygon feature dataset (MapUnitPolys) with the approximate Anadarko Basin boundary and the model area boundary; and raster datasets for the 26 subsurface horizons and a single thickness grid representing the estimated eroded thickness of strata. Nonspatial tables define the data sources used (DataSources), define terms used in the dataset (Glossary), and provide a description of the modeled surfaces (DescriptionOfMapUnits) that provides the user with far greater stratigraphic detail than the original publication. Separate file folders contain the vector data in shapefile format, the raster data in ASCII and GeoTiff file formats, and the tables as comma-separated values file format. In addition, a tabular data dictionary describes the entity and attribute information for all attributes of the geospatial data and the accompanying nonspatial tables (EntityAndAttributes). Elevation surfaces exported from the 3D model in Zmap interchange file format and additional datasets are available through the original publication (Higley and others, 2014: https://pubs.usgs.gov/dds/dds-069/dds-069-ee/).
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Abstract

This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.

Hydstra Water Levels 9-6-12: The Hydstra data for the NSW Office of Water (NoW) were extracted on 9 June 2012, although the latest water level reading date is April 2012. The Hydstra groundwater database was provided by NoW and was imported into MS Access from the HYDMEAS database file (dbf). The Hydstra database stores groundwater level information from ~11,800 bores. This includes the NSW monitoring network of bores and private bores (e.g. stock bores with a water level reading). Site information on the bores has not been included, as these data are available from the NoW NGIS. The NGIS has been quality checked and should be used for all site data. The Hydstra data should only be used for bores that are not included in the WDTF. Of the 4,000 extra bores in Hydstra, many water level readings were taken at the time of drilling and many are one-off readings. These data have not been validated. Any information obtained from the Hydstra data should be assessed for its quality prior to inclusion in a Bioregional Assessment. There is no data dictionary that came with the Hydstra data. Groundwater levels are variable 110, where variable 110 = depth to groundwater from the measurement point. The measurement point can then be derived from the NGIS, which has an elevation field = TSRefElev.

WDTF NoW 19-11-12: Through the Water Regulations, a full extract of NSW monitoring data was provided in WDTF in November 2012, although the latest water level reading date is September 2012. WDTF was provided for ~7,800 monitoring bores across NSW. These data were extracted from the original WDTF, using Python scripts, into the MS Access database. Site information on the bores has not been included, as these data are available from the NoW NGIS. The NGIS has been quality checked and should be used for all site data. The WDTF is provided to the Bureau by NoW as "validated" data. Therefore, WDTF should be used as the primary source for groundwater level data.

Dataset History

Hydstra Water Levels 9-6-12: The Hydstra data for the NSW Office of Water (NoW) were extracted on 9 June 2012 from the NSW corporate groundwater database, although the latest water level reading date is April 2012. The Hydstra groundwater database was provided by NoW and was imported into MS Access from the HYDMEAS database file (dbf).

WDTF NoW 19-11-12: Through the Water Regulations, a full extract of NSW monitoring data was provided in WDTF in November 2012, although the latest water level reading date is September 2012.

Dataset Citation

NSW Office of Water (2012) NSW Groundwater Database (Water Data Transfer Format and Hydstra). Bioregional Assessment Source Dataset. Viewed 28 September 2017, http://data.bioregionalassessments.gov.au/dataset/75390050-09da-46ad-b342-dbc98982aafd.
The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.
The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.
Survey and Biomeasures Data (GN 33004):
To date there have been nine attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 Birth Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137) and the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669).
Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.
From 2002-2004, a Biomedical Survey was completed and is available under End User Licence (EUL) (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.
Linked Geographical Data (GN 33497):
A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.
Linked Administrative Data (GN 33396):
A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.
Additional Sub-Studies (GN 33562):
In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.
How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.
Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.
The NCDS linked Scottish Medical Records (SMR) datasets include data files from the Information Services Division (ISD) Scotland SMR database for those cohort members who provided consent to health data linkage in the Age 50 sweep and had ever lived in Scotland.
The SMR database contains information about all hospital admissions in Scotland. Several SMR datasets are available; researchers who require access to more than one dataset need to apply for each individually.
Further information about the SMR database can be found on the Information Services Division Scotland SMR Datasets webpage (https://www.ndc.scot.nhs.uk/Data-Dictionary/SMR-Datasets/).
CLS/SMR Digital Sub-licence agreement:
The linked SMR data have been processed by CLS and supplied to the UK Data Service (UKDS) under Secure Access Licence. Applicants wishing to access these data need to establish the necessary agreement with the UKDS and abide by the terms and conditions of the UKDS Secure Access licence. An additional condition of the licensing is that it is not permitted to link SMR data to NCDS data that include Scottish geographies.
Non-straightforward requests to include additional data not held by UKDS would be handled by the CLS Data Access Committee and referred to the Public Benefit and Privacy Panel (PBPP) if necessary.
The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets including Natural Resource Management regions, and Australian and state and territory government databases. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.
This data set holds the publicly-available version of the database of water-dependent assets that was compiled for the bioregional assessment (BA) of the Clarence-Moreton subregion as part of the Bioregional Assessment Technical Programme. Though all life is dependent on water, for the purposes of a bioregional assessment, a water-dependent asset is an asset potentially impacted by changes in the groundwater and/or surface water regime due to coal resource development. The water must be other than local rainfall. Examples include wetlands, rivers, bores and groundwater dependent ecosystems.
A single asset is represented spatially in the asset database by single or multiple spatial features (point, line or polygon). Individual points, lines or polygons are termed elements.
This dataset contains the unrestricted publicly-available components of spatial and non-spatial (attribute) data of the (restricted) Asset database for the Clarence-Moreton bioregion on 24 February 2016 (6d11ffbc-ea57-49cb-8e00-f97761e0c5d6). The database is provided primarily as an ESRI File geodatabase (.gdb), which is able to be opened in readily available open source software such as QGIS. Other formats include the Microsoft Access database (.mdb in ESRI Personal Geodatabase format), industry-standard ESRI Shapefiles and tab-delimited text files of all the attribute tables.
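As the description notes, the File geodatabase can be opened in open source software. A minimal sketch using Python's fiona and geopandas libraries follows; the geodatabase path and layer name are illustrative assumptions, so enumerate the layers first and substitute real names.

import fiona
import geopandas as gpd

gdb = "AssetDatabase_ClarenceMoreton.gdb"   # hypothetical path to the .gdb

# List every feature class and attribute table in the geodatabase
for name in fiona.listlayers(gdb):
    print(name)

# Load one layer into a GeoDataFrame (substitute a real name from the listing)
assets = gpd.read_file(gdb, layer="AssetList")   # hypothetical layer name
print(assets.head())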
The restricted version of the Clarence-Moreton Asset database has a total count of 294,961 elements and 2,708 assets. In the public version of the Clarence-Moreton Asset database, 60,074 spatial Element features (~19%) have been removed from the Element List and Element Layer(s), and 729 spatial Assets (~24%) have been removed from the spatial Asset Layer(s).
The elements/assets removed from the restricted Asset Database are from the following data sources:
1) Species Profile and Threats Database (SPRAT) - RESTRICTED - Metadata only (7276dd93-cc8c-4c01-8df0-cef743c72112)
2) Australia, Register of the National Estate (RNE) - Spatial Database (RNESDB) (Internal 878f6780-be97-469b-8517-54bd12a407d0)
3) Communities of National Environmental Significance Database - RESTRICTED - Metadata only (c01c4693-0a51-4dbc-bbbd-7a07952aa5f6)
These important assets are included in the bioregional assessment, but are unable to be publicly distributed by the Bioregional Assessment Programme due to restrictions in their licensing conditions. Please note that many of these data sets are available directly from their custodian. For more precise details please see the associated explanatory Data Dictionary document enclosed with this dataset.
The public version of the asset database retains all of the unrestricted components of the Asset database for the Clarence-Moreton bioregion on 24 February 2016; any material that cannot be published or redistributed to a third party by the BA Programme has been removed from the database. The data presented correspond to the assets published in the Clarence-Moreton bioregion product 1.3: Description of the water-dependent asset register and asset list for the Clarence-Moreton bioregion on 24 February 2016, and the associated Water-dependent asset register and asset list for the Clarence-Moreton bioregion on 24 February 2016.
Individual spatial features or elements are initially included in the database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). In accordance with BA submethodology M02: Compiling water-dependent assets, individual spatial elements are then grouped into assets, which project teams evaluate against materiality test 2 (M2) to determine whether they are water dependent.
Following delivery of the first pass asset list, project teams make a determination as to whether an asset (comprising one or more elements) is water dependent, as assessed against the materiality tests detailed in the BA Methodology. These decisions are provided to ERIN by the assessment team and incorporated into the AssetList table in the Asset database.
Development of the Asset Register from the Asset database:
Decisions for M0 (fit for BA purpose), M1 (PAE) and M2 (water dependent) determine which assets are included in the "asset list" and "water-dependent asset register" which are published as Product 1.3.
The rule sets are applied as follows:
M0        M1     M2     Result
No        n/a    n/a    Asset is not included in the asset list or the water-dependent asset register
(≠ No)    No     n/a    Asset is not included in the asset list or the water-dependent asset register
(≠ No)    Yes    No     Asset is included in the published asset list but not in the water-dependent asset register
(≠ No)    Yes    Yes    Asset is included in both the asset list and the water-dependent asset register
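A minimal sketch of this rule set in Python, assuming the M0/M1/M2 decisions are recorded as the "Yes"/"No"/"n/a" strings shown in the table:

def asset_status(m0, m1, m2):
    """Return (in_asset_list, in_water_dependent_register) per the rule set."""
    if m0 == "No":
        return (False, False)   # fails the fit-for-BA-purpose test
    if m1 != "Yes":
        return (False, False)   # outside the preliminary assessment extent
    if m2 == "Yes":
        return (True, True)     # water dependent: in the list and the register
    return (True, False)        # listed, but not in the register

assert asset_status("Yes", "Yes", "No") == (True, False)
assert asset_status("No", "n/a", "n/a") == (False, False)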
Assessment teams are then able to use the database to assign receptors and impact variables to water-dependent assets and the development of a receptor register as detailed in BA submethodology M03: Assigning receptors to water-dependent assets and the receptor register is then incorporated into the asset database.
At this stage of its development, the Asset database for the Clarence-Moreton bioregion on 24 February 2016, which this document describes, does contain receptor information; however, that receptor information has been removed from this public version.
Bioregional Assessment Programme (2014) Asset database for the Clarence-Moreton bioregion on 24 February 2016 Public. Bioregional Assessment Derived Dataset. Viewed 10 July 2017, http://data.bioregionalassessments.gov.au/dataset/ba1d4c6f-e657-4e42-bd3c-413c21c7b735.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From Combined Surface Waterbodies for the Clarence-Moreton bioregion
Derived From Queensland QLD - Regional - NRM - Water Asset Information Tool - WAIT - databases
Derived From Version 02 Asset list for Clarence Morton 8/8/2014 - ERIN ORIGINAL DATA
Derived From CLM16swo NSW Office of Water Surface Water Offtakes processed for Clarence Moreton v3 12032014
Derived From Asset database for the Clarence-Moreton bioregion on 11 December 2014, minor version v20150220
Derived From Matters of State environmental significance (version 4.1), Queensland
Derived From Geofabric Surface Network - V2.1
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From Geofabric Surface Catchments - V2.1
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From CLM - 16swo NSW Office of Water Surface Water Offtakes - Clarence Moreton v1 24102013
Derived From Multi-resolution Valley Bottom Flatness MrVBF at three second resolution CSIRO 20000211
Derived From National Groundwater Information System (NGIS) v1.1
Derived From Mitchell Landscapes NSW OEH v3 2011
Derived From Asset database for the Clarence-Moreton bioregion on 24 February 2016
Derived From Geofabric Surface Network - V2.1.1
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Australia - Species of National Environmental Significance Database
Derived From Multi-resolution Ridge Top Flatness at 3 second resolution CSIRO 20000211
Derived From South East Queensland GDE (draft)
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Version 01 Asset list for Clarence Morton 10/3/2014 - ERIN ORIGINAL DATA
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From QLD Dept of Natural Resources and Mines,
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The GlobalPhone pronunciation dictionaries, created within the framework of the multilingual speech and language corpus GlobalPhone, were developed in collaboration with the Karlsruhe Institute of Technology (KIT).
The GlobalPhone pronunciation dictionaries contain the pronunciations of all word forms found in the transcription data of the GlobalPhone speech & text database. The pronunciation dictionaries are currently available in 18 languages: Arabic (29,230 entries/27,059 words), Bulgarian (20,193 entries), Croatian (23,497 entries/20,628 words), Czech (33,049 entries/32,942 words), French (36,837 entries/20,710 words), German (48,979 entries/46,035 words), Hausa (42,662 entries/42,079 words), Japanese (18,094 entries), Polish (36,484 entries), Portuguese (Brazilian) (54,146 entries/54,130 words), Russian (28,818 entries/27,667 words), Spanish (Latin American) (43,264 entries/33,960 words), Swedish (about 25,000 entries), Turkish (31,330 entries/31,087 words), Vietnamese (38,504 entries/29,974 words), Chinese-Mandarin (73,388 pronunciations), Korean (3,500 syllables), and Thai (a small set of 12,420 pronunciation entries for 12,420 different words with no pronunciation variants, and a larger set of 25,570 pronunciation entries for 22,462 different word units, including 3,108 entries with up to four pronunciation variants).
1) Dictionary Encoding: The pronunciation dictionary entries consist of full word forms and are given either in the original script of that language, mostly in UTF-8 encoding (Bulgarian, Croatian, Czech, French, Polish, Russian, Spanish, Thai), corresponding to the trl-files of the GlobalPhone transcriptions, or in Romanized script (Arabic, German, Hausa, Japanese, Korean, Mandarin, Portuguese, Swedish, Turkish, Vietnamese), corresponding to the rmn-files of the GlobalPhone transcriptions. In the latter case the documentation mostly provides a mapping from the Romanized to the original script.
2) Dictionary Phone Set: The phone sets for each language were derived individually from the literature, following best practices for automatic speech processing. Each phone set is explained and described in the documentation using the international standards of the International Phonetic Alphabet (IPA). For most languages, a mapping to the language-independent GlobalPhone naming conventions (indicated by “M_”) is provided for the purpose of data sharing across languages to build multilingual acoustic models.
3) Dictionary Generation: Whenever the grapheme-to-phoneme relationship allowed, the dictionaries were created semi-automatically in a rule-based fashion using a set of grapheme-to-phoneme mapping rules; the number of rules depends strongly on the language. After the automatic creation process, all dictionaries were manually cross-checked by native speakers to correct potential errors of the automatic pronunciation generation process. Most of the dictionaries have been applied to large vocabulary speech recognition. In many cases the GlobalPhone dictionaries were compared to straightforward grapheme-based speech recognition and to alternative sources, such as Wiktionary, and usually proved superior in terms of quality, coverage, and accuracy.
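To illustrate the rule-based generation step, here is a minimal grapheme-to-phoneme sketch in Python. The rules are invented toy examples, not GlobalPhone's actual mapping for any language:

# Longest-match rules first, so multi-letter graphemes win over single letters
G2P_RULES = [
    ("sch", "SH"),
    ("ch", "CH"),
    ("a", "A"),
    ("e", "E"),
    ("n", "N"),
]

def g2p(word):
    """Map a word to a phone sequence using ordered grapheme rules."""
    phones, i = [], 0
    while i < len(word):
        for graph, phone in G2P_RULES:
            if word.startswith(graph, i):
                phones.append(phone)
                i += len(graph)
                break
        else:
            i += 1  # skip unmapped characters; real systems flag these for review
    return phones

print(g2p("schane"))  # ['SH', 'A', 'N', 'E']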
4) Format: The format of the dictionaries is the same across languages and is straightforward. Each line consists of one word form and its pronunciation, separated by a blank. The pronunciation consists of a concatenation of phone symbols separated by blanks. Both words and their pronunciations are given in tcl-script list format, i.e. enclosed in “{}”, since phones can carry tags indicating the tone and length of a vowel, or the word boundary tag “WB”, indicating the boundary of a dictionary unit. The WB tag can, for example, be included as a standard question in the decision tree questions for capturing crossword models in context-dependent modeling. Pronunciation variants are indicated by a variant number in parentheses appended to the word form.
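A minimal sketch of parsing one such dictionary line in Python; the sample line and phone symbols are invented for illustration, and real entries may carry additional tags:

import re

line = "{hello} {{HH WB} EH L {OW WB}}"

# Split the line into the word part and the braced pronunciation part
word, pron = re.match(r"\{(.*?)\}\s+\{(.*)\}$", line).groups()

# Tokenize the pronunciation, keeping braced tagged phones (e.g. "{HH WB}") whole
tokens = re.findall(r"\{[^}]*\}|\S+", pron)
phones = [t.strip("{}").split()[0] for t in tokens]  # drop tags such as WB

print(word, phones)  # hello ['HH', 'EH', 'L', 'OW']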
5) Documentation: The pronunciation dictionaries for each language are complemented by a documentation that describes the format of the dictionary, the phone set including its mapping to the International Phonetic Alphabet (IPA), and the frequency distribution of the phones in the dictionary. Most of the pronunciation dictionaries have been successfully applied to large vocabulary speech recognition and references to publications are given when available.
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Abstract: This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied. These data represent the OZMIN Oracle relational database containing geological and resource information for Australian mineral deposits. OZMIN has been compiled from published references and has been designed so that attribute information can be retrieved and analysed in relation to spatial data contained in geographic information systems. The national mineral deposits dataset contains data on over one thousand major and historically significant mineral deposits for 60 mineral commodities (including coal). Data available via mapping interfaces on the Geoscience Australia website are updated weekly, whilst data available via download are a snapshot at the "Ending Date" of the current database entries. Full metadata available at: http://www.ga.gov.au/meta/ANZCW0703003393.html

Dataset History: The data within this dataset are derived directly from the corporate ORACLE OZMIN Mineral Deposits database. An ASCII extraction of the Geoscience Australia ORACLE database is generated as ASCII comma-delimited files for each table that is part of or used by the OZMIN database. Only data that are part of the current release of OZMIN (Release 3 - October 2000) are included. An MS ACCESS database format is also replicated from the ORACLE database and uses the same table structure. The spatial representation of this database (in ArcView and MapInfo format) is extracted and generated using ArcInfo GIS software to meet the published data standard within the Geoscience Australia data dictionary. The extraction of the spatial GIS datasets is done within ArcInfo using advanced AML code (ORACOV.AML) developed by Dmitar Butrovski, Geoscience Australia. Further information can be found at http://www.ga.gov.au/metadata-gateway/metadata/record/gcat_a05f7892-b68d-7506-e044-00144fdd4fa6/OZMIN+Mineral+Deposits+Database

Dataset Citation: Geoscience Australia (2013) OZMIN Mineral Deposits Database. Bioregional Assessment Source Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/34247a24-d3cf-4a98-bb9d-81671ddb99de.
Data from this project focuses on the evaluation of breeding lines. Significant progress was made in advancing breeding populations directed towards release of improved varieties in Tanzania. Thirty promising F4:7, 1st generation 2014 PIC (Phaseolus Improvement Cooperative) and ~100 F4:6, 2nd generation 2015 PIC breeding lines were selected. In addition, ~300 F4:5, 3rd generation 2016 PIC single plant selections were completed in Arusha and Mbeya. These breeding lines, derived from 109 PIC populations specifically developed to combine abiotic and biotic stress tolerance, showed superior agronomic potential compared with checks and local landraces. The diversity, scale, and potential of the material in the PIC breeding pipeline are invaluable and require continued support to ensure the release of varieties that promise to increase the productivity of common bean in the E. African region. Data available includes databases, spreadsheets, and images related to the project. Resources in this dataset:Resource Title: Data Dictionary. File Name: ADP-1_DD.pdfResource Title: ADP-1 Database. File Name: ADP1-DB.zipResource Description: This file is a link to a draft version of the development and characterization of the common bean diversity panel (ADP) database in Microsoft Access. Preliminary information is provided in this database, while the full version is being prepared. In order to use the database you’ll need to download the complete file, extract it and open the MS Access file. You must allow active content when opening the database for it to work properly. Downloaded on November 17, 2017.Resource Title: Anthracnose Screening of Andean Diversity Panel (ADP). File Name: Anthracnose-screening-of-ADP.pdfResource Description: Approximately 230 lines of the ADP were screened with 8 races of anthracnose under controlled conditions at Michigan State University. Dr. James Kelly has provided this valuable dataset for sharing in light of the Open Data policy of the US government. This dataset represents the first comprehensive screening of the ADP with a broad set of races of a specific pathogen.Resource Title: ARS - Feed the Future Shared Data. File Name: ARS-FtF-Data-Sharing.zipResource Description: The data provided herein is an early draft version of the data that has been generated by the ARS Feed-the-Future Grain Legumes Project that is focused on common bean research. Resource Title: PIC (Phaseolus Improvement Cooperative) Populations. File Name: PIC-breeding-populations.xlsxResource Description: The complete list of PIC breeding populations (Excel Format). PIC (Phaseolus Improvement Cooperative) populations are bulked populations for improvement of common bean in Feed the Future Countries, with a principal focus on sub-Saharan Africa. These populations are for distribution to collaborators, are segregating for key biotic and abiotic stress constraints, and can be used for selection and release of improved cultivars/germplasm. Many of these populations are derived from crosses between ADP landraces and cultivars from sub-Saharan Africa and other improved genotypes with key biotic or abiotic stress tolerance. Phenotypic and genotypic information related to the parents of the crosses can be found in the ADP Database.