53 datasets found
  1. Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Laura Miron; Rafael Gonçalves; Mark A. Musen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Metadata in ClinicalTrials.gov.

    Description of files

    Original data files:
    - AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
    - public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

    BioPortal API query results:
    - condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns = {filename, condition, url, bioportal term, cuis, tuis}.
    - intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns = {filename, intervention, url, bioportal term, cuis, tuis}.

    Data element definitions:
    - supplementary_table_1.xlsx maps element names, element types, and whether elements are required in the ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.

    Column and value definitions:
    - CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html).
    - CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html).
    - CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value; "Group Heading" if the element is a group heading for several sub-fields but is not itself associated with a user-provided value.
    - Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to interventional records (only to observational or expanded access records).
    - Required for CT.gov for Observational Records: "Required" if the element is required for observational records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to observational records (only to interventional or expanded access records).
    - Required in CT.gov for Expanded Access Records?: "Required" if the element is required for expanded access records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to expanded access records (only to interventional or observational records).
    - CT.gov XSD Element Definition: Abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.)
    - Required in XSD?: "Yes" if the element is required according to public.XSD; "No" if the element is optional; "-" if the element is not made public or included in the XSD.
    - Type in XSD: "text" if the XSD type was "xs:string" or "textblock"; the name of the enum if the type was an enum; "integer" if the type was "xs:integer" or "xs:integer" extended with the "type" attribute; "struct" if the type was a struct defined in the XSD.
    - PRS Element Name: Name of the corresponding entry field in the PRS system.
    - PRS Entry Type: Entry type in the PRS system. This column contains some free-text explanations/observations.
    - FDAAA801 Final Rule Field Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA.
    - WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf).

    Analytical results:
    - EC_human_review.csv contains the results of a manual review of the eligibility criteria of a random sample of 400 CT.gov records. The table gives filename, criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.
    - completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
    - industry_completeness.xlsx contains percentages of interventional records missing required fields, broken down by agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
    - location_completeness.xlsx contains percentages of interventional records missing required fields, broken down by whether the record listed at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.

    Intermediate results:
    - cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running the analysis steps from the Jupyter notebooks in our GitHub repository.

  2. Dataset: effect estimates obtained from randomised and non-randomised...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 24, 2024
    Cite
    Maximilian Salcher-Konrad (2024). Dataset: effect estimates obtained from randomised and non-randomised studies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4958221
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Huseyin Naci
    Mary Nguyen
    Maximilian Salcher-Konrad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides study-level information on matched sets of published randomised and non-randomised studies answering the same clinical questions. The dataset includes 346 meta-analyses, and data on over 2,700 unique studies.

    The dataset includes the following information (definitions and coding are provided in the spreadsheet):

    IDs for clinical topics and source meta-analyses

    IDs for individual studies

    Brief descriptions of clinical topics

    Therapeutic area (ATC Code first level)

    Topic narrowness (categorical)

    Indicators for high-quality source meta-analyses (Cochrane review; top journal)

    Time period

    Comparator type

    Outcome type

    Meta-analysis-level median risk of bias for NRS and RCT

    Publication year of individual studies

    Risk of Bias in individual studies

    Standardised effect estimates (logOR) and variance for each study

    Summary logOR and variance for NRS and RCT, respectively, within each meta-analysis for the full sample
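    Each study row carries a standardised logOR and its variance, and each meta-analysis has a summary logOR and variance for NRS and for RCTs. To illustrate how such per-study values can be combined, the Python sketch below uses inverse-variance fixed-effect pooling and then expresses the NRS-versus-RCT difference as a ratio of odds ratios (ROR). The pooling model is an assumption chosen for simplicity: the description does not say how the dataset's summary estimates were obtained, and a random-effects model may have been used. The function names and example values are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def pool_fixed_effect(log_or: Sequence[float], variance: Sequence[float]) -> tuple[float, float]:
    """Inverse-variance fixed-effect pooled logOR and its variance."""
    if not log_or or len(log_or) != len(variance):
        raise ValueError("need non-empty, equal-length inputs")
    weights = [1.0 / v for v in variance]  # weight = 1 / within-study variance
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_or)) / total
    return pooled, 1.0 / total


def ratio_of_odds_ratios(nrs: tuple[float, float], rct: tuple[float, float],
                         z: float = 1.96) -> tuple[float, float, float]:
    """ROR of NRS relative to RCT with a Wald interval.

    Each argument is (summary logOR, variance); the two summaries are treated
    as independent, which is an assumption of this sketch.
    """
    diff = nrs[0] - rct[0]
    se = math.sqrt(nrs[1] + rct[1])
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)


if __name__ == "__main__":
    nrs = pool_fixed_effect([0.2, 0.4], [0.04, 0.04])  # hypothetical NRS studies
    rct = pool_fixed_effect([0.1], [0.02])             # hypothetical single RCT
    print(ratio_of_odds_ratios(nrs, rct))
```

    An ROR above 1 under this orientation would mean the NRS summary odds ratio exceeds the RCT one; the opposite orientation (RCT over NRS) is equally common in meta-epidemiological work.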

    The protocol for this study was registered on PROSPERO:

    Maximilian Salcher-Konrad. Comparison of treatment effects in randomised vs. non-randomised studies and the role of analytical methods to control for confounding: a meta-epidemiological study. PROSPERO 2018 CRD42018062204 Available from: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42018062204

  3. Data from: Observation definitions and their implications in machine...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 7, 2024
    Cite
    Aaron Hill; Russ Schumacher; Mitchell Green (2024). Observation definitions and their implications in machine learning-based predictions of excessive rainfall [Dataset]. http://doi.org/10.5061/dryad.kwh70rzdx
    Available download formats: zip
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    Dryad
    Authors
    Aaron Hill; Russ Schumacher; Mitchell Green
    Description

    Data from: Observation definitions and their implications in machine learning-based predictions of excessive rainfall

    Hill, Aaron J., Russ S. Schumacher, and Mitchell L. Green, Jr. "Observation Definitions and their Implications in Machine Learning-based Predictions of Excessive Rainfall", Weather and Forecasting (published online ahead of print 2024), https://doi.org/10.1175/WAF-D-24-0033.1

    This dataset contains Day 1, 2, and 3 forecasts from the machine learning-based prediction system detailed in the associated manuscript (cited above), together with those from the Weather Prediction Center (WPC) and observations of excessive rainfall hazards from the Unified Flood Verification System (UFVS). Forecasts, outlooks, and observations for each forecast day are contained in a single netCDF file and labeled accordingly (e.g., day1_csu_mlp_20201005_20231003.nc). An additional forecast file (i.e., day1_exps_20201005_20231003.nc) contains a number of experimental machine learnin...
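    The file names appear to encode the forecast day, the forecast source, and two dates. The Python sketch below splits a name into those parts. The regular expression is inferred from the two example names only, and reading the two 8-digit fields as the first and last dates covered is an assumption. Opening the netCDF contents would need a library such as xarray or netCDF4; the variable names inside the files are not given here.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Pattern inferred from the two example file names only (assumption).
FILENAME_RE = re.compile(
    r"^day(?P<day>\d)_(?P<label>.+)_(?P<start>\d{8})_(?P<end>\d{8})\.nc$"
)


class ForecastFile(NamedTuple):
    lead_day: int   # forecast day (1, 2 or 3)
    label: str      # source label, e.g. "csu_mlp" or "exps"
    start: date     # first 8-digit date in the name
    end: date       # second 8-digit date in the name


def _to_date(yyyymmdd: str) -> date:
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


def parse_filename(name: str) -> Optional[ForecastFile]:
    """Split a name like 'day1_csu_mlp_20201005_20231003.nc' into its parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    # The greedy label group backtracks so that the last two 8-digit
    # fields are always captured as dates.
    return ForecastFile(
        lead_day=int(match["day"]),
        label=match["label"],
        start=_to_date(match["start"]),
        end=_to_date(match["end"]),
    )


if __name__ == "__main__":
    for name in ["day1_csu_mlp_20201005_20231003.nc", "day1_exps_20201005_20231003.nc"]:
        print(parse_filename(name))
```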

  4. Standard terms and definitions applicable to the quality assurance of...

    • data.aeronomie.be
    • data.europa.eu
    pdf
    Updated Jan 30, 2025
    + more versions
    Cite
    Royal Belgian Institute for Space Aeronomy (2025). Standard terms and definitions applicable to the quality assurance of Essential Climate Variable data records [Dataset]. https://data.aeronomie.be/dataset/standard-terms-and-definitions-applicable-to-the-quality-assurance-of-essential-climate-variable-da
    Available download formats: pdf, pdf (863196)
    Dataset updated
    Jan 30, 2025
    Dataset authored and provided by
    Royal Belgian Institute for Space Aeronomy
    License

    http://publications.europa.eu/resource/authority/licence/CC_BY_4_0

    Description

    This document contains a selection of standard terms and definitions relevant to the quality assurance of Essential Climate Variable (ECV) data records. It reproduces appropriate terms and definitions published by standardization bodies, mainly by BIPM/JCGM/ISO in their International Vocabulary of Metrology (VIM) and Guide to the Expression of Uncertainty in Measurement (GUM). It also reproduces selected terms and definitions related to the quality assurance and validation of Earth Observation (EO) data, available publicly on the ISO website and on the Cal/Val portal of the Committee on Earth Observation Satellites (CEOS).

    Several of these terms have been recommended by CEOS in the GEO-CEOS Quality Assurance framework for Earth Observation (QA4EO) and, as such, are applicable to virtually all Copernicus data sets of EO origin. Terms and definitions are expected to evolve as standards organisations regularly update their standards.

  5. Data from: Combining Broad and Narrow Case Definitions in Matched...

    • tandf.figshare.com
    bin
    Updated Apr 11, 2025
    Cite
    Ting Ye; Kan Chen; Dylan Small (2025). Combining Broad and Narrow Case Definitions in Matched Case-Control Studies: Firearms in the Home and Suicide Risk [Dataset]. http://doi.org/10.6084/m9.figshare.28057249.v3
    Available download formats: bin
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Ting Ye; Kan Chen; Dylan Small
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Does having firearms in the home increase suicide risk? To test this hypothesis, a matched case-control study can be performed, in which suicide cases are compared with living controls who are similar in observed covariates, in terms of their retrospective exposure to firearms at home. In this application, cases can be defined using a broad case definition (suicide) or a narrow case definition (suicide occurring at home). The broad case definition offers a larger number of cases, but the narrow case definition may offer a larger effect size, which can reduce sensitivity to bias from unmeasured confounding. However, when the goal is to test whether there is a treatment effect based on the broad case definition, restricting to the narrow case definition may introduce selection bias (i.e., bias due to selecting samples based on characteristics affected by the treatment), because exposure to firearms in the home may affect the location of suicide and thus the type of case a subject is. We propose a new sensitivity analysis framework for combining broad and narrow case definitions in matched case-control studies that considers unmeasured confounding bias and selection bias simultaneously. We develop a valid randomization-based testing procedure using only the narrow case matched sets when the effect of the unmeasured confounder on receiving treatment and the effect of the treatment on case definition among the always-cases are controlled by sensitivity parameters. We then use the Bonferroni method to combine the testing procedures using the broad and narrow case definitions. With the proposed methods, we find robust evidence that having firearms at home increases suicide risk. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
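    The final step described above combines the broad-case and narrow-case tests with the Bonferroni method. Assuming each procedure yields a p-value (the sensitivity-analysis tests themselves are not reproduced here), the combination can be sketched in Python as follows; the function names and the p-values in the usage line are hypothetical.

```python
from typing import Sequence


def bonferroni_combined_p(p_values: Sequence[float]) -> float:
    """Bonferroni-adjusted p-value for a rule that rejects if any of k tests rejects."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return min(1.0, len(p_values) * min(p_values))


def reject(p_values: Sequence[float], alpha: float = 0.05) -> bool:
    # Equivalent to: at least one p-value is at most alpha / k,
    # which keeps the family-wise error rate at or below alpha.
    return bonferroni_combined_p(p_values) <= alpha


if __name__ == "__main__":
    p_broad, p_narrow = 0.03, 0.01  # hypothetical p-values from the two procedures
    print(bonferroni_combined_p([p_broad, p_narrow]), reject([p_broad, p_narrow]))
```

    With these two p-values the combined p-value is 2 × 0.01 = 0.02, so the null hypothesis of no effect would be rejected at α = 0.05.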

  6. DataSheet1_Applying the estimand and target trial frameworks to external...

    • frontiersin.figshare.com
    docx
    Updated Jan 26, 2024
    Cite
    Letizia Polito; Qixing Liang; Navdeep Pal; Philani Mpofu; Ahmed Sawas; Olivier Humblet; Kaspar Rufibach; Dominik Heinzmann (2024). DataSheet1_Applying the estimand and target trial frameworks to external control analyses using observational data: a case study in the solid tumor setting.DOCX [Dataset]. http://doi.org/10.3389/fphar.2024.1223858.s001
    Available download formats: docx
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    Frontiers
    Authors
    Letizia Polito; Qixing Liang; Navdeep Pal; Philani Mpofu; Ahmed Sawas; Olivier Humblet; Kaspar Rufibach; Dominik Heinzmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: In causal inference, the correct formulation of the scientific question of interest is a crucial step. The purpose of this study was to apply causal inference principles to external control analysis using observational data and to illustrate the process of defining the estimand attributes.

    Methods: This study compared long-term survival outcomes of a pooled set of three previously reported randomized phase 3 trials studying patients with metastatic non-small cell lung cancer receiving front-line chemotherapy with those of similar patients treated with front-line chemotherapy as part of routine clinical care. Causal inference frameworks were applied to define the estimand aligned with the research question and to select the estimator for the estimand of interest.

    Results: The estimand attributes of the ideal trial were defined using the estimand framework. The target trial framework was used to address specific issues in defining the estimand attributes using observational data from a nationwide electronic health record-derived de-identified database. Combined, the two frameworks allow the estimand and the aligned estimator to be clearly defined while accounting for key baseline confounders, index date, and receipt of subsequent therapies. The hazard ratio estimate (point estimate with 95% confidence interval) comparing the randomized clinical trial pooled control arm with the external control was close to 1, indicating similar survival between the two arms.

    Discussion: The proposed combined framework provides clarity on the causal contrast of interest and the estimator to adopt, and thus facilitates the design and interpretation of the analyses.
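    The result above is reported as a hazard ratio with a 95% confidence interval. As a reminder of how such an interval is usually formed, the Python sketch below builds a Wald interval on the log-hazard-ratio scale and exponentiates it back. The log-HR and standard error are hypothetical inputs, the function names are illustrative, and the sketch does not reproduce the study's estimator or its adjustment for baseline confounders.

```python
from __future__ import annotations

import math


def hazard_ratio_with_ci(log_hr: float, se: float,
                         z: float = 1.959964) -> tuple[float, float, float]:
    """Return (HR, lower, upper): a Wald interval built on the log-HR scale
    and exponentiated, as is commonly reported for Cox models."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))


def covers_no_difference(lower: float, upper: float) -> bool:
    """True if the interval includes HR = 1 (no difference in hazard)."""
    return lower <= 1.0 <= upper


if __name__ == "__main__":
    hr, lo, hi = hazard_ratio_with_ci(log_hr=0.02, se=0.08)  # hypothetical values
    print(f"HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f}); includes 1: {covers_no_difference(lo, hi)}")
```

    Working on the log scale keeps the interval symmetric around log(HR), which is why the exponentiated interval is asymmetric around the HR itself.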

  7. ‘Coho Distribution [ds326]’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jul 13, 2007
    + more versions
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2007). ‘Coho Distribution [ds326]’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-coho-distribution-ds326-e46b/a917556b/?iid=000-775&v=presentation
    Dataset updated
    Jul 13, 2007
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Coho Distribution [ds326]’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/b1cc7bc9-0960-4008-a7e6-ffbae224a88e on 27 January 2022.

    --- Dataset description provided by original source is as follows ---

    June 2016 Version

    This dataset represents the "Observed Distribution" for coho salmon in California, using only observations made between 1990 and the present. It was developed for the express purpose of assisting with species recovery planning efforts. The process for developing this dataset was to collect as many observations of the species as possible and derive the stream-based geographic distribution for the species based solely on these positive observations.

    For the purpose of this dataset, an observation is defined as a report of a sighting or other evidence of the presence of the species at a given place and time. As such, observations are modeled by year observed as point locations in the GIS. All such observations were collected with information regarding who reported the observation, their agency/organization/affiliation, the date on which they observed the species, who compiled the information, etc. This information is maintained in the developer's file geodatabase (©Environmental Systems Research Institute (ESRI) 2016).

    To develop this distribution dataset, the species observations were applied to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography. For each observation, a path was traced down the hydrography from the point of observation to the ocean, thereby deriving the shortest migration route from the point of observation to the sea. By appending all of these migration paths together, the "Observed Distribution" for the species is developed.

    It is important to note that this layer does not attempt to model the entire possible distribution of the species. Rather, it only represents the known distribution based on where the species has been observed and reported. While some observations do represent the upstream extent of the species (e.g., an observation made at a hard barrier), the majority of observations only indicate where the species was sampled for or otherwise observed. Because of this, this dataset likely underestimates the absolute geographic distribution of the species.

    It is also important to note that the species may not be found on an annual basis in all indicated reaches due to natural variations in run size, water conditions, and other environmental factors. As such, the information in this dataset should not be used to verify that the species is currently present in a given stream. Conversely, the absence of distribution linework for a given stream does not necessarily indicate that the species does not occur in that stream. The observation data were compiled from a variety of disparate sources including, but not limited to, CDFW, USFS, NMFS, timber companies, and the public. Forms of documentation include CDFW administrative reports, personal communications with biologists, observation reports, and literature reviews. The source of each feature (to the best available knowledge) is included in the data attributes for the observations in the geodatabase, but not for the resulting linework. The spatial data have been referenced to California Streams, a CDFW derivative of USGS National Hydrography Dataset (NHD) High Resolution hydrography.

    Usage of this dataset:

    Examples of appropriate uses include:
    - Species recovery planning
    - Evaluation of future survey sites for the species
    - Validating species distribution models

    Examples of inappropriate uses include:
    - Assuming that the absence of a line feature means that the species is not present in that stream.
    - Using this data to make parcel or ground level land use management decisions.
    - Using this dataset to prove or support non-existence of the species at any spatial scale.
    - Assuming that the line feature represents the maximum possible extent of species distribution.

    All users of this data should seek the assistance of qualified professionals such as surveyors, hydrologists, or fishery biologists as needed to ensure that they possess complete, precise, and up-to-date information on species distribution and water body location. Any copy of this dataset is considered to be a snapshot of the species distribution at the time of release. It is incumbent upon the user to ensure that they have the most recent version prior to making management or planning decisions. Please refer to the "Use Constraints" section below.

    --- Original source retains full ownership of the source dataset ---
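    The "Observed Distribution" procedure described in this entry (trace each observation downstream to the ocean, then take the union of the traced paths) can be illustrated on a toy stream network. In the Python sketch below, the network is a hypothetical dictionary mapping each reach to the single reach it drains into, and all names are illustrative; with exactly one downstream link per reach, the traced path is unique and therefore also the shortest. The real dataset is built from GIS linework on California Streams, and channels that split would need a proper shortest-path search.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical stream network: reach ID -> ID of the reach it drains into,
# or None where the reach empties into the ocean.
Network = Dict[str, Optional[str]]


def path_to_ocean(network: Network, start: str) -> List[str]:
    """Follow downstream links from an observed reach to the sea."""
    path: List[str] = []
    seen: Set[str] = set()
    reach: Optional[str] = start
    while reach is not None:
        if reach in seen:  # guard against malformed, cyclic networks
            raise ValueError(f"cycle detected at reach {reach!r}")
        seen.add(reach)
        path.append(reach)
        reach = network[reach]
    return path


def observed_distribution(network: Network, observed: Iterable[str]) -> Set[str]:
    """Union of all observation-to-ocean paths."""
    distribution: Set[str] = set()
    for reach in observed:
        distribution.update(path_to_ocean(network, reach))
    return distribution


if __name__ == "__main__":
    net = {"A": "C", "B": "C", "C": "D", "D": None, "E": "D"}
    print(sorted(observed_distribution(net, ["A", "B"])))
```

    In this toy network, observations at A and B yield the reaches A, B, C and D. Reach E also drains into D but has no observation, so it stays outside the distribution, which mirrors the caveat that missing linework does not imply the species is absent.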

  8. Maximum Likelihood Estimates of Temperatures using Data from the Hadley...

    • data.europa.eu
    Updated Feb 20, 2021
    + more versions
    Cite
    (2021). Maximum Likelihood Estimates of Temperatures using Data from the Hadley Centre and the Climate Research Unit (version 1.0) [Dataset]. https://data.europa.eu/data/datasets/de-dkrz-wdcc-iso3881426?locale=en
    Dataset updated
    Feb 20, 2021
    Description

    HadCRU_MLE_v1.0 is a dataset of monthly gridded surface temperatures for the Earth during the instrumental period (since 1850). The name ‘HadCRU_MLE_v1.0’ reflects the dataset’s use of maximum likelihood estimation and of observational data primarily from the Met Office Hadley Centre and the Climate Research Unit of the University of East Anglia. Source datasets used to create HadCRU_MLE_v1.0 include land surface air temperature anomalies of HadCRUT4, sea surface temperature anomalies of HadSST4, sea ice coverage of HadISST2, the surface temperature climatology of Jones et al. (1999), the sea surface temperature climatology of HadSST3, land mask data of OSTIA, surface elevation data of GMTED2010, and climate model output of CCSM4 for a pre-industrial control scenario. HadCRU_MLE_v1.0 was generated using information from the Met Office Hadley Centre, the Climate Research Unit of the University of East Anglia, the E.U. Copernicus Marine Service, the U.S. Geological Survey, and the University Corporation for Atmospheric Research.

    The primary motivation to develop HadCRU_MLE_v1.0 was to correct for two biases that may exist in global instrumental temperature datasets. The first is an amplification bias caused by not adequately accounting for the tendency of different regions of the planet to warm at different rates. The second is a sea ice bias caused by not adequately accounting for changes in sea ice coverage during the instrumental period. Corrections for these biases increased the estimate of global mean surface temperature change during the instrumental period. The new dataset has improvements compared to the Cowtan and Way version 2 dataset, including an improved statistical foundation for estimating model parameters, taking advantage of temporal correlations of observations and of correlations between land and sea observations, and accounting for more sources of uncertainty. To properly correct for amplification bias, HadCRU_MLE_v1.0 incorporates the behaviour of the El Niño Southern Oscillation.

    HadCRU_MLE_v1.0 includes mean surface temperature anomalies for each month from 1850 to 2018 and for each 5° latitude by 5° longitude grid cell. Future versions of HadCRU_MLE may extend the temporal coverage beyond 2018. The maximum likelihood estimation approach allows the estimated field of surface temperature anomalies to be temporally and spatially complete for the entire instrumental period and for the entire surface of the Earth. A 5° by 5° gridded 1961-1990 temperature climatology for HadCRU_MLE_v1.0 is available, although caution is advised when interpreting it, since the source datasets used for the temperature climatologies do not correspond perfectly with those used for the temperature anomalies. Other information about HadCRU_MLE_v1.0 is also available, including the estimated local amplification factors, the magnitude of the corrections for sea ice bias, and the impact of the El Niño Southern Oscillation on surface temperature anomalies.

    The surface temperature of HadCRU_MLE_v1.0 combines land surface air temperatures 2 m above the surface with sea surface temperatures 0.2 m below the surface. The dataset is consistent with the definition of surface temperature used in empirical datasets according to NOAA.
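    Because the anomalies are given on a regular 5° by 5° grid (36 latitude rows by 72 longitude columns), a global mean for one month has to weight each row by its area. The Python sketch below uses the exact relative area of each latitude band, sin(north edge) − sin(south edge). The nested-list layout, the south-to-north row order, and None for missing cells are assumptions (the dataset itself is described as spatially complete), the function names are hypothetical, and the dataset's own global mean series may be computed differently.

```python
import math
from typing import Optional, Sequence

N_LAT, N_LON = 36, 72  # 180 / 5 latitude rows, 360 / 5 longitude columns


def band_weight(row: int) -> float:
    """Relative area of a 5-degree latitude row (row 0 = 90S-85S, assumed).

    The weights are symmetric about the equator, so reversing the row order
    does not change a full-globe mean.
    """
    south = -90.0 + 5.0 * row
    return math.sin(math.radians(south + 5.0)) - math.sin(math.radians(south))


def global_mean(field: Sequence[Sequence[Optional[float]]]) -> float:
    """Area-weighted mean of one month's 36 x 72 anomaly field; None = no data."""
    if len(field) != N_LAT or any(len(r) != N_LON for r in field):
        raise ValueError("expected a 36 x 72 grid")
    num = den = 0.0
    for i, row in enumerate(field):
        w = band_weight(i)  # all cells in a row share the same area
        for value in row:
            if value is None:
                continue
            num += w * value
            den += w
    if den == 0.0:
        raise ValueError("field contains no data")
    return num / den
```

    Weighting by band area rather than taking a plain average keeps the many small polar cells from dominating the mean.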

  9. ESA Biomass Climate Change Initiative (Biomass_cci): Global datasets of...

    • catalogue.ceda.ac.uk
    Updated Apr 17, 2025
    + more versions
    Cite
    Maurizio Santoro; Oliver Cartus (2025). ESA Biomass Climate Change Initiative (Biomass_cci): Global datasets of forest above-ground biomass for the years 2007, 2010, 2015, 2016, 2017, 2018, 2019, 2020, 2021 and 2022, v6.0 [Dataset]. https://catalogue.ceda.ac.uk/uuid/95913ffb6467447ca72c4e9d8cf30501
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Maurizio Santoro; Oliver Cartus
    License

    https://artefacts.ceda.ac.uk/licences/specific_licences/esacci_biomass_terms_and_conditions_v2.pdf

    Time period covered
    Jan 1, 2007 - Dec 31, 2022
    Area covered
    Earth
    Description

    This dataset comprises estimates of forest above-ground biomass (AGB) for the years 2007, 2010, 2015, 2016, 2017, 2018, 2019, 2020, 2021 and 2022. They are derived from a combination of Earth observation data, depending on the year, from the Copernicus Sentinel-1 mission, Envisat’s ASAR (Advanced Synthetic Aperture Radar) instrument and JAXA’s (Japan Aerospace Exploration Agency) Advanced Land Observing Satellite (ALOS-1 and ALOS-2), along with additional information from Earth observation sources. The data has been produced as part of the European Space Agency's (ESA's) Climate Change Initiative (CCI) programme by the Biomass CCI team.

    This release of the data is version 6. Compared with version 5, version 6 updates the AGB maps for the years 2010, 2015, 2016, 2017, 2018, 2019, 2020 and 2021 and adds new AGB maps for 2007 and 2022. AGB change maps have been created for consecutive years (e.g., 2020-2019), for a decadal interval (2020-2010), and for the interval 2010-2007. The pool of remote sensing data includes multi-temporal L-band observations for all biomes and all years, as well as extended ICESat-2 observations used to calibrate the retrieval models. A cost function that preserves the temporal features expressed in the remote sensing data has been refined to limit biases between the 2007-2010 and the 2015+ maps.

    The data products consist of two global layers that include estimates of:
    1) above-ground biomass (AGB, unit: tons/ha, i.e., Mg/ha) (raster dataset), defined as the mass, expressed as oven-dry weight, of the woody parts (stem, bark, branches and twigs) of all living trees, excluding stump and roots, per unit area;
    2) per-pixel estimates of above-ground biomass uncertainty, expressed as the standard deviation in Mg/ha (raster dataset).

    Additionally provided in this release are aggregated data products. These aggregated products of the AGB and AGB change data layers are available at coarser resolutions (1, 10, 25 and 50 km).
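    The description does not say how the coarser layers are derived. As a minimal sketch only, assuming a plain mean of valid (non-NaN) pixels over non-overlapping blocks, aggregation by an integer factor could look like this; the function name and the NaN no-data convention are assumptions, not part of the product.

```python
import numpy as np


def aggregate_block_mean(agb: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks, ignoring NaN pixels.

    Assumption: a plain mean of valid pixels; the aggregation method used
    for the released 1, 10, 25 and 50 km products is not stated here.
    """
    rows, cols = agb.shape
    # Trim trailing rows/columns so the grid divides evenly into blocks.
    trimmed = agb[: rows - rows % factor, : cols - cols % factor].astype(float)
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    valid = ~np.isnan(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))
    # Blocks with no valid pixel become NaN instead of raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)
```

    For example, `aggregate_block_mean(np.arange(16.0).reshape(4, 4), 2)` averages each 2 x 2 block and returns `[[2.5, 4.5], [10.5, 12.5]]`.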

    In addition, files describing the AGB change between two consecutive years (i.e., 2016-2015, 2017-2016, 2018-2017, 2019-2018, 2020-2019, 2021-2020, 2022-2021), over a decade (2020-2010) and over 2010-2007 are provided. Each AGB change product consists of two sets of maps: the standard deviation of the AGB change and a quality flag for the AGB change. Note that the change itself can be computed simply as the difference between two AGB maps and is therefore not provided directly.
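    Because the change layer itself is not distributed, a user computes it from two AGB maps on the same grid. A minimal sketch, assuming both maps are already loaded as arrays with no-data pixels converted to NaN (the files' actual fill value is not stated here), following the "later minus earlier" naming used above (e.g., 2020-2019):

```python
import numpy as np


def agb_change(agb_later: np.ndarray, agb_earlier: np.ndarray) -> np.ndarray:
    """Per-pixel AGB change in Mg/ha, computed as later minus earlier.

    NaN in either input propagates to NaN in the output, so pixels that
    lack a valid estimate in both years stay undefined.
    """
    if agb_later.shape != agb_earlier.shape:
        raise ValueError("AGB maps must share the same grid")
    return agb_later.astype(float) - agb_earlier.astype(float)
```

    The standard deviation of the change is provided as its own layer, so the sketch does not attempt to recompute it.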

    Data are provided in both NetCDF and GeoTIFF formats.

  10. ‘Coho Distribution [ds326]’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jul 12, 2007
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2007). ‘Coho Distribution [ds326]’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-coho-distribution-ds326-8ce8/latest
    Dataset updated
    Jul 12, 2007
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Coho Distribution [ds326]’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/56847514-cf82-4bbe-809b-05499d165c9a on 26 January 2022.

    --- Dataset description provided by original source is as follows ---

    June 2016 Version

    This dataset represents the "Observed Distribution" for coho salmon in California, using only observations made between 1990 and the present. It was developed for the express purpose of assisting with species recovery planning efforts. The process for developing this dataset was to collect as many observations of the species as possible and derive the stream-based geographic distribution for the species based solely on these positive observations.

    For the purpose of this dataset, an observation is defined as a report of a sighting or other evidence of the presence of the species at a given place and time. As such, observations are modeled by year observed as point locations in the GIS. All such observations were collected with information regarding who reported the observation, their agency/organization/affiliation, the date that they observed the species, who compiled the information, etc. This information is maintained in the developer's file geodatabase (©Environmental Systems Research Institute (ESRI) 2016).

    To develop this distribution dataset, the species observations were applied to California Streams, a CDFW derivative of the USGS National Hydrography Dataset (NHD) High Resolution hydrography. For each observation, a path was traced down the hydrography from the point of observation to the ocean, thereby deriving the shortest migration route from the point of observation to the sea. By appending all of these migration paths together, the "Observed Distribution" for the species is developed.

    It is important to note that this layer does not attempt to model the entire possible distribution of the species. Rather, it only represents the known distribution based on where the species has been observed and reported. While some observations indeed represent the upstream extent of the species (e.g., an observation made at a hard barrier), the majority of observations only indicate where the species was sampled for or otherwise observed. Because of this, this dataset likely underestimates the absolute geographic distribution of the species.

    It is also important to note that the species may not be found on an annual basis in all indicated reaches due to natural variations in run size, water conditions, and other environmental factors. As such, the information in this dataset should not be used to verify that the species is currently present in a given stream. Conversely, the absence of distribution linework for a given stream does not necessarily indicate that the species does not occur in that stream. The observation data were compiled from a variety of disparate sources including, but not limited to, CDFW, USFS, NMFS, timber companies, and the public. Forms of documentation include CDFW administrative reports, personal communications with biologists, observation reports, and literature reviews. The source of each feature (to the best available knowledge) is included in the data attributes for the observations in the geodatabase, but not for the resulting linework. The spatial data have been referenced to California Streams, a CDFW derivative of the USGS National Hydrography Dataset (NHD) High Resolution hydrography.

    Usage of this dataset:

    Examples of appropriate uses include:
    - Species recovery planning
    - Evaluation of future survey sites for the species
    - Validating species distribution models

    Examples of inappropriate uses include:
    - Assuming absence of a line feature means that the species is not present in that stream.
    - Using this data to make parcel or ground-level land use management decisions.
    - Using this dataset to prove or support non-existence of the species at any spatial scale.
    - Assuming that the line feature represents the maximum possible extent of species distribution.

    All users of this data should seek the assistance of qualified professionals such as surveyors, hydrologists, or fishery biologists as needed to ensure that such users possess complete, precise, and up-to-date information on species distribution and water body location. Any copy of this dataset is considered to be a snapshot of the species distribution at the time of release. It is incumbent upon the user to ensure that they have the most recent version prior to making management or planning decisions. Please refer to the "Use Constraints" section below.
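    The downstream tracing described above can be sketched as a walk over a stream network. This is a minimal illustration, not the CDFW workflow: it assumes a hypothetical `downstream` mapping in which each segment ID points to exactly one downstream segment (or `None` where the segment reaches the ocean). That assumption ignores braided channels and diversions, where more than one downstream route would exist.

```python
from typing import Dict, Iterable, Optional, Set


def observed_distribution(
    downstream: Dict[str, Optional[str]], observed_segments: Iterable[str]
) -> Set[str]:
    """Union of the downstream paths from each observed segment to the ocean.

    `downstream` (hypothetical structure) maps a segment ID to the next
    segment downstream, or to None where the segment drains to the ocean.
    """
    distribution: Set[str] = set()
    for start in observed_segments:
        segment: Optional[str] = start
        # Walk toward the ocean; stop early at a segment already included,
        # because its whole downstream path was added when it was first seen.
        while segment is not None and segment not in distribution:
            distribution.add(segment)
            segment = downstream[segment]
    return distribution
```

    With a network `{"T1": "M2", "T2": "M2", "M2": "M1", "M1": None}` and one observation on `"T1"`, the result is `{"T1", "M2", "M1"}`: tributary `"T2"` is excluded because nothing was observed on it, which mirrors why the layer underestimates the true distribution.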

    --- Original source retains full ownership of the source dataset ---

  11. Dataset for modeling spatial and temporal variation in natural background...

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset for modeling spatial and temporal variation in natural background specific conductivity [Dataset]. https://catalog.data.gov/dataset/dataset-for-modeling-spatial-and-temporal-variation-in-natural-background-specific-conduct
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This file contains the data set used to develop a random forest model to predict background specific conductivity (SC) for stream segments in the contiguous United States. This Excel-readable file contains 56 columns of parameters evaluated during development. The data dictionary provides the definitions of the abbreviations and the measurement units. Each row is a unique sample identified as R** (which indicates the NHD Hydrologic Unit), an underscore, a COMID of up to 7 digits, another underscore, and the sequential sample month. To develop models that make stream-specific predictions across the contiguous United States, we used the StreamCat data set and process (Hill et al. 2016; https://github.com/USEPA/StreamCat). The StreamCat data set is based on a network of stream segments from NHD+ (McKay et al. 2012). These stream segments drain an average area of 3.1 km² and thus define the spatial grain size of this data set. The data set consists of minimally disturbed sites representing the natural variation in environmental conditions that occur in the contiguous 48 United States. More than 2.4 million SC observations were obtained from STORET (USEPA 2016b), state natural resource agencies, the U.S. Geological Survey (USGS) National Water Information System (NWIS) (USGS 2016), and data used in Olson and Hawkins (2012) (Table S1). Data include observations made between 1 January 2001 and 31 December 2015 and are thus coincident with Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data (https://modis.gsfc.nasa.gov/data/). Each observation was related to the nearest stream segment in the NHD+. Data were limited to one observation per stream segment per month; SC observations with ambiguous locations and repeat measurements along a stream segment in the same month were discarded. Using estimates of anthropogenic stress derived from the StreamCat database (Hill et al. 2016), segments were selected with minimal amounts of human activity (Stoddard et al. 2006) using criteria developed for each Level II Ecoregion (Omernik and Griffith 2014). Segments were considered potentially minimally stressed where watersheds had 0–0.5% impervious surface, 0–5% urban, 0–10% agriculture, and population densities from 0.8–30 people/km² (Table S3). Watersheds with observations that had large residuals in initial models were identified and inspected for evidence of other human activities not represented in StreamCat (e.g., mining, logging, grazing, or oil/gas extraction). Observations were removed from watersheds that were disturbed, had a tidal influence, or had unusual geologic conditions such as hot springs. About 5% of SC observations in each National Rivers and Streams Assessment (NRSA) region were then randomly selected as independent validation data. The remaining observations became the large training data set for model calibration. This dataset is associated with the following publication: Olson, J., and S. Cormier. Modeling spatial and temporal variation in natural background specific conductivity. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 53(8): 4316-4325, (2019).
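    The validation split described above (about 5% of observations drawn at random within each NRSA region, the rest kept for training) is a stratified random holdout. A minimal sketch, assuming each observation is a dict with a hypothetical `"nrsa_region"` field and that "about 5%" means rounding 5% of each region's count; the seed is only there to make the illustration reproducible and is not taken from the source.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_by_region(
    records: Sequence[dict],
    region_key: str = "nrsa_region",  # hypothetical field name
    fraction: float = 0.05,
    seed: int = 0,
) -> Tuple[List[dict], List[dict]]:
    """Randomly hold out about `fraction` of the records in each region."""
    by_region: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        by_region[rec[region_key]].append(rec)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    training: List[dict] = []
    validation: List[dict] = []
    for region in sorted(by_region):
        group = by_region[region]
        n_val = round(len(group) * fraction)
        held_out = set(rng.sample(range(len(group)), n_val))
        for i, rec in enumerate(group):
            (validation if i in held_out else training).append(rec)
    return training, validation
```

    Sampling within each region, rather than from the pooled data, keeps every region represented in the validation set in proportion to its size.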

  12. Coho Distribution [ds326]

    • catalog.data.gov
    • data.ca.gov
    • +4 more
    Updated Nov 27, 2024
    Cite
    California Department of Fish and Wildlife (2024). Coho Distribution [ds326] [Dataset]. https://catalog.data.gov/dataset/coho-distribution-ds326-cc8ae
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Department of Fish and Wildlife
    Description

    November 2022 Version

    This dataset represents the "Observed Distribution" for coho salmon in California, using only observations made between 1990 and the present. It was developed for the express purpose of assisting with species recovery planning efforts. The process for developing this dataset was to collect as many observations of the species as possible and derive the stream-based geographic distribution for the species based solely on these positive observations.

    For the purpose of this dataset, an observation is defined as a report of a sighting or other evidence of the presence of the species at a given place and time. As such, observations are modeled by year observed as point locations in the GIS. All such observations were collected with information regarding who reported the observation, their agency/organization/affiliation, the date that they observed the species, who compiled the information, etc. This information is maintained in the developer's file geodatabase (©Environmental Systems Research Institute (ESRI) 2016).

    To develop this distribution dataset, the species observations were applied to California Streams, a CDFW derivative of the USGS National Hydrography Dataset (NHD) High Resolution hydrography. For each observation, a path was traced down the hydrography from the point of observation to the ocean, thereby deriving the shortest migration route from the point of observation to the sea. By appending all of these migration paths together, the "Observed Distribution" for the species is developed.

    It is important to note that this layer does not attempt to model the entire possible distribution of the species. Rather, it only represents the known distribution based on where the species has been observed and reported. While some observations indeed represent the upstream extent of the species (e.g., an observation made at a hard barrier), the majority of observations only indicate where the species was sampled for or otherwise observed. Because of this, this dataset likely underestimates the absolute geographic distribution of the species.

    It is also important to note that the species may not be found on an annual basis in all indicated reaches due to natural variations in run size, water conditions, and other environmental factors. As such, the information in this dataset should not be used to verify that the species is currently present in a given stream. Conversely, the absence of distribution linework for a given stream does not necessarily indicate that the species does not occur in that stream. The observation data were compiled from a variety of disparate sources including, but not limited to, CDFW, USFS, NMFS, timber companies, and the public. Forms of documentation include CDFW administrative reports, personal communications with biologists, observation reports, and literature reviews. The source of each feature (to the best available knowledge) is included in the data attributes for the observations in the geodatabase, but not for the resulting linework. The spatial data have been referenced to California Streams, a CDFW derivative of the USGS National Hydrography Dataset (NHD) High Resolution hydrography.

    Usage of this dataset:

    Examples of appropriate uses include:
    - Species recovery planning
    - Evaluation of future survey sites for the species
    - Validating species distribution models

    Examples of inappropriate uses include:
    - Assuming absence of a line feature means that the species is not present in that stream.
    - Using this data to make parcel or ground-level land use management decisions.
    - Using this dataset to prove or support non-existence of the species at any spatial scale.
    - Assuming that the line feature represents the maximum possible extent of species distribution.

    All users of this data should seek the assistance of qualified professionals such as surveyors, hydrologists, or fishery biologists as needed to ensure that such users possess complete, precise, and up-to-date information on species distribution and water body location. Any copy of this dataset is considered to be a snapshot of the species distribution at the time of release. It is incumbent upon the user to ensure that they have the most recent version prior to making management or planning decisions. Please refer to the "Use Constraints" section below.

  13. LENS2 ensemble data for selected variables in monthly means (1990-2014)

    • zenodo.org
    bin, text/x-python
    Updated Mar 23, 2025
    Cite
    Marta Alerany Solé; Kai Keller (2025). LENS2 ensemble data for selected variables in monthly means (1990-2014) [Dataset]. http://doi.org/10.5281/zenodo.15045356
    Available download formats: bin, text/x-python
    Dataset updated
    Mar 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marta Alerany Solé; Kai Keller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains selected variables as monthly means, extracted from the original Community Earth System Model 2 (CESM2) Large Ensemble Community Project (LENS2) dataset. Additionally, it contains observational reference datasets for the respective variables. The data in this bucket are lossy-compressed with zfp. We provide full ensembles with 100 members for the variables listed in the table below. The 3D winds and temperature (ua, va, ta) are provided at pressure levels 200 and 850 hPa. Specific humidity is provided at levels 300 and 850 hPa. The full set comprises 4 repositories:

    • LENS2 data repository (1960-1990) (DOI: 10.5281/zenodo.15049672)
    • LENS2 data repository (1990-2014) (DOI: 10.5281/zenodo.15045356)
    • LENS2 data repository (1850-1880) (DOI: 10.5281/zenodo.15049720)
    • Script repository (DOI: 10.5281/zenodo.15052431)

    The LENS2 project provides open access to multi-decadal climate simulation data at 1-degree horizontal resolution, conducted with a large ensemble comprising 100 members (Rodgers et al., 2021). The simulation covers a historical period (1850-2014) and a future projection (2015-2100), following the Shared Socioeconomic Pathways (SSP) scenario SSP3-7.0.

    For a full description of the dataset, we refer to the webpage of the LENS2 project: https://www.cesm.ucar.edu/community-projects/lens2, and to the article from Rodgers et al., 2021 (DOI: 10.5194/esd-12-1393-2021).

    Variable | Description | Units | Observations 1850-1880 | Observations 1960-1990 | Observations 1990-2014
    tas | 2 m air temperature | K | NOAA 20th Century Reanalysis (V3) | ERA5 | ERA5
    psl | Sea level pressure | Pa | NOAA 20th Century Reanalysis (V3) | ERA5 | ERA5
    ta | Air temperature | K | NOAA 20th Century Reanalysis (V3) | ERA5 | ERA5
    ua | Zonal wind | m/s | NOAA 20th Century Reanalysis (V3) | ERA5 | ERA5
    va | Meridional wind | m/s | NOAA 20th Century Reanalysis (V3) | ERA5 | ERA5
    tauu | Zonal surface stress | Pa | - | ERA5 | ERA5
    tauv | Meridional surface stress | Pa | - | ERA5 | ERA5
    hus | Specific humidity | kg/kg | NOAA 20th Century Reanalysis (V3) | ERA5 | ERA5
    tos | Sea surface temperature | K | - | ERA5 | ERA5

  14. Data for Analysis of features in a sliding threshold of observation for...

    • deepblue.lib.umich.edu
    Cite
    Liemohn, Michael W; Adam, Joshua G; Ganushkina, Natalia Y, Data for Analysis of features in a sliding threshold of observation for numeric evaluation (STONE) curve [Dataset]. http://doi.org/10.7302/2mcx-5749
    Dataset provided by
    Deep Blue Data
    Authors
    Liemohn, Michael W; Adam, Joshua G; Ganushkina, Natalia Y
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Sep 20, 2013
    Description

    Many statistical tools have been developed to aid in the assessment of a numerical model’s quality at reproducing observations. Some of these techniques focus on the identification of events within the data set: times when the observed value is beyond some threshold value that defines it as a value of keen interest. An example is whether it will rain, in which events are defined as any precipitation above some defined amount. A method called the sliding threshold of observation for numeric evaluation (STONE) curve sweeps the event definition threshold of both the model output and the observations, resulting in the identification of threshold intervals for which the model does well at sorting the observations into events and nonevents. An excellent data-model comparison will have a smooth STONE curve, but the STONE curve can have wiggles and ripples in it. These features reveal clusters of points where the model systematically overestimates or underestimates the observations. This study establishes the connection between features in the STONE curve and attributes of the data-model relationship. The method is applied to a space weather example.
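    The threshold sweep can be illustrated with a small sketch. It assumes, as in a ROC-style analysis, that each point of the curve is the probability of detection (POD) and the probability of false detection (POFD) from a 2 x 2 contingency table, and that "event" means strictly above the threshold; the function name and these choices are illustrative, not taken from the dataset's own code.

```python
import numpy as np


def stone_curve(observed, modeled, thresholds):
    """Return (POD, POFD) arrays with one point per event threshold.

    The same threshold defines events in the observations AND in the model
    output; a standard ROC curve would instead hold the observed event
    definition fixed and sweep only the model threshold.
    """
    obs = np.asarray(observed, dtype=float)
    mod = np.asarray(modeled, dtype=float)
    pod, pofd = [], []
    for t in thresholds:
        obs_event = obs > t  # "above some defined amount"
        mod_event = mod > t
        hits = np.sum(obs_event & mod_event)
        misses = np.sum(obs_event & ~mod_event)
        false_alarms = np.sum(~obs_event & mod_event)
        correct_nulls = np.sum(~obs_event & ~mod_event)
        # Undefined ratios (no observed events or no nonevents) become NaN.
        pod.append(hits / (hits + misses) if hits + misses else np.nan)
        pofd.append(
            false_alarms / (false_alarms + correct_nulls)
            if false_alarms + correct_nulls
            else np.nan
        )
    return np.array(pod), np.array(pofd)
```

    A model that systematically overestimates, e.g. `stone_curve([1, 2, 3, 4], [2, 3, 4, 5], [2.5])`, still detects every observed event (POD 1.0) but raises a false alarm on half of the nonevents (POFD 0.5); plotting POD against POFD over many thresholds is where such bias would appear as a departure from a smooth curve.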

  15. JRA-55: Japanese 55-year Reanalysis, Monthly Means and Variances

    • data.ucar.edu
    • rda-web-prod.ucar.edu
    • +2 more
    grib
    Updated Aug 4, 2024
    Cite
    Japan Meteorological Agency, Japan (2024). JRA-55: Japanese 55-year Reanalysis, Monthly Means and Variances [Dataset]. http://doi.org/10.5065/D60G3H5B
    Available download formats: grib
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
    Authors
    Japan Meteorological Agency, Japan
    Time period covered
    Jan 1, 1958 - Jan 1, 2024
    Area covered
    Earth
    Description

    The Japan Meteorological Agency (JMA) conducted JRA-55, the second Japanese global atmospheric reanalysis project. It covers 55 years, extending back to 1958, coinciding with the establishment of the global radiosonde observing system. Compared with its predecessor, JRA-25, JRA-55 is based on a new data assimilation and prediction system (DA) that improves on many deficiencies found in the first Japanese reanalysis. These improvements came about by implementing higher spatial resolution (TL319L60), a new radiation scheme, four-dimensional variational data assimilation (4D-Var) with Variational Bias Correction (VarBC) for satellite radiances, and the introduction of greenhouse gases with time-varying concentrations. The entire JRA-55 production was completed in 2013 and is to be continued thereafter on a real-time basis. Early results of quality assessment of JRA-55 indicate that a large temperature bias in the lower stratosphere has been significantly reduced compared with JRA-25, through a combination of the new radiation scheme and the application of VarBC (which also reduces unrealistic temperature variations). In addition, a dry land-surface anomaly in the Amazon basin has been mitigated, and overall forecast scores are much improved over JRA-25. Most of the observational data employed in JRA-55 are those used in JRA-25. Additionally, newly reprocessed METEOSAT and GMS data were supplied by EUMETSAT and MSC/JMA, respectively. Snow depth data over the United States, Russia and Mongolia were supplied by UCAR, RIHMI and IMH, respectively. The Data Support Section (DSS) at NCAR has processed the 1.25-degree version of JRA-55 with the RDA (Research Data Archive) archiving and metadata system. The model-resolution data have also been acquired, archived and processed, including transformation of the TL319L60 grid to a regular latitude-longitude Gaussian grid (320 latitudes by 640 longitudes, nominally 0.5625 degree).
All RDA JRA-55 data is available for internet...
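    The "nominally 0.5625 degree" spacing follows from the grid dimensions: 360° / 640 longitudes = 0.5625°, and 180° / 320 latitudes is also 0.5625°, although Gaussian latitudes are only approximately equally spaced. A sketch of how such a grid's coordinates can be generated, assuming the standard definition of Gaussian latitudes as the arcsines of the Gauss-Legendre quadrature nodes and longitudes starting at 0°; for real work, compare against the coordinates stored in the RDA files rather than relying on this reconstruction.

```python
import numpy as np


def gaussian_grid(n_lat: int = 320, n_lon: int = 640):
    """Latitudes and longitudes (degrees) of a regular Gaussian grid."""
    # Gaussian latitudes are the arcsines of the Gauss-Legendre nodes.
    nodes, _weights = np.polynomial.legendre.leggauss(n_lat)
    lats = np.degrees(np.arcsin(nodes))[::-1]  # ordered north to south
    lons = np.arange(n_lon) * (360.0 / n_lon)  # equally spaced, from 0
    return lats, lons


lats, lons = gaussian_grid()
print(f"lon spacing: {lons[1] - lons[0]} deg")  # 360 / 640
print(f"first latitudes: {lats[:2]}")
```

    The latitudes never reach the poles exactly; the northernmost row sits a little below 90°N, which is one way a Gaussian grid differs from an equal-angle grid of the same nominal resolution.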

  16. Defining Representativeness Heuristic in Trauma Triage Data File

    • figshare.com
    txt
    Updated Nov 19, 2018
    Cite
    Shreyus Kulkarni; Barry Dewitt; Matthew R Rosengart; Baruch Fischhoff; Derek C Angus; Donald M. Yealy; Melissa Saul; Deepika Mohan (2018). Defining Representativeness Heuristic in Trauma Triage Data File [Dataset]. http://doi.org/10.6084/m9.figshare.7359527.v1
    Available download formats: txt
    Dataset updated
    Nov 19, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shreyus Kulkarni; Barry Dewitt; Matthew R Rosengart; Baruch Fischhoff; Derek C Angus; Donald M. Yealy; Melissa Saul; Deepika Mohan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data set used to define the representativeness heuristic in trauma triage. We performed a retrospective observational cohort study of moderately to severely injured patients who presented to non-trauma centers at UPMC from 2010 to 2014. We identified these patients using a validated algorithm that converts ICD-9 discharge codes into Abbreviated Injury Scale scores and Injury Severity Scores. We then abstracted initial encounter notes from the UPMC medical record for these patients and coded them for evidence of "representative" characteristics. Within injury subgroups, we looked for differences in the presence of these characteristics between patients who were appropriately transferred to a trauma center and those who were not. We then performed a multivariate logistic regression with random effects for hospital to estimate the effect of having any representative characteristic at all on the odds of transfer, adjusting for other covariates.

  17. Evolution of cohort selection methods and procedures in the CTS from its...

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 13, 2025
    Cite
    James V. Lacey Jr; Emma S. Spielfogel; Jennifer L. Benbow; Kristen E. Savage; Kai Lin; Cheryl A. M. Anderson; Jessica Clague-DeHart; Christine N. Duffy; Maria Elena Martinez; Hannah Lui Park; Caroline A. Thompson; Sophia S. Wang; Sandeep Chandra (2025). Evolution of cohort selection methods and procedures in the CTS from its beginning, in 1995–1996, through the CTS Researcher Platform. [Dataset]. http://doi.org/10.1371/journal.pone.0296611.t001
    Available download formats: xls
    Dataset updated
    May 13, 2025
    Dataset provided by
    PLOS ONE
    Authors
    James V. Lacey Jr; Emma S. Spielfogel; Jennifer L. Benbow; Kristen E. Savage; Kai Lin; Cheryl A. M. Anderson; Jessica Clague-DeHart; Christine N. Duffy; Maria Elena Martinez; Hannah Lui Park; Caroline A. Thompson; Sophia S. Wang; Sandeep Chandra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Evolution of cohort selection methods and procedures in the CTS from its beginning, in 1995–1996, through the CTS Researcher Platform.

  18. Observational Dataset for "Constraining Global Coronal Models with Multiple...

    • data.niaid.nih.gov
    Updated Mar 17, 2022
    Cite
    Bale, S. D. (2022). Observational Dataset for "Constraining Global Coronal Models with Multiple Independent Observables", Badman et al. (2022). Arxiv : https://arxiv.org/abs/2201.11818 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6342186
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Petrie, G. J.
    Rouillard, A. P.
    Warren, H. P.
    Jones, S. I.
    Poirier, N
    Wallace, S
    Harra, L.
    Bale, S. D.
    Brooks, D. H.
    Arge, C. N.
    Kouloumvakos, A
    Velli, M
    Panasenco, O.
    Badman, S. T.
    de Pablos Aguero, D.
    Riley, P.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Observational Dataset for "Constraining Global Coronal Models with Multiple Independent Observables", Badman et al. (2022). Arxiv : https://arxiv.org/abs/2201.11818

    Contact : Samuel T. Badman (he/him) samuel_badman@berkeley.edu, Space Sciences Lab, UC Berkeley.

    License : Creative Commons Attribution 4.0 International

    Research Goal of Dataset : Data supports the above titled work in defining a framework for evaluating the magnetic structure of global coronal models via the evaluation of three single valued metrics. This repository contains observational data products used as input for the studies described in this work with the aim to allow external coronal modelers to reproduce and evaluate their own work against the same dataset we used.

    Structure of files : This repository contains three subfolders, each containing observational data relating to one of the three metrics defined in Badman et al. (2022). These are :

    1) ``Metric1_EUVCarringtonMaps''

    Content :

    Carrington maps of extreme ultraviolet (EUV) emission as observed by SDO/AIA. Each file contains slices at several wavelengths, saved together in HDF5 format. Maps for Carrington rotations (#### = 2210, 2215, 2216, 2221) span the time intervals of interest in the associated work. The 193 angstrom wavelength slice from these maps was used as input to the EZSEG algorithm (see manuscript text) to generate ``observations'' of coronal hole boundaries, which can then be compared via binary classification to modeled open-field boundaries.

    A Python script that demonstrates reading the HDF5 files, viewing the names of the different slices, and plotting the 193 slice. The slice of primary interest is 193A ('map_0193'), but slices at 171 and 211 angstrom, and a magnetogram, are also included.

    2) ``Metric2_StreamerBelt''
    A Python script that demonstrates reading and plotting an example white-light Carrington map from this data set, as well as overplotting the downstream data extraction of the streamer maximum brightness (SMB) line.

    2a) ``Metric2_StreamerBelt/WL_CarringtonMaps''

    Content :

    Carrington maps of white-light intensity extracted at 5.0 Rs altitude from coronagraph images taken by SOHO/LASCO, using the method described in the manuscript and Poirier et al. (2021). Maps at a daily cadence over each 60-day time interval studied in the manuscript are included here, each incorporating the new data that became available as the Sun rotated. These are saved as FITS files.

    Carrington maps as above but saved in .mat format (MATLAB).

    2b) ``Metric2_StreamerBelt/SMB_Line_Extractions''

    Downstream processed versions of the relevant white-light Carrington maps, from which the line of maximum brightness (SMB line) has been extracted, along with the streamer belt "thickness" at each longitude. This is tabulated as a 3D coordinate gridded evenly in longitude, and each SMB grid point has a northward and a southward thickness, tabulated in degrees. These data are described in the header of each file, and the extraction process is described in detail in the manuscript.

    3) ``Metric3_InSituTimeSeries''

    Content :

    In situ polarity time series for 60-day intervals at 1-hour cadence during PSP encounters ## = [01,02,03], measured by spacecraft XYZ = [PSP,STA,OMN]: Parker Solar Probe, STEREO-A and OMNI (Earth-L1 dataset). Data values are +/- 1, indicating whether the magnetic field vector is directed sunward or antisunward during each hour. As described in the main text, this value is determined by finding the peak of a histogram of the 1D B_R values over that hour and taking its sign.
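    The histogram-peak rule can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' script: the bin count (50), the handling of gaps (non-finite samples dropped, an empty hour returning 0) and the tie-break at exactly zero are all choices made here. In the usual RTN convention a positive B_R points away from the Sun, so +1 would correspond to antisunward; check the dataset's own files for the mapping it uses.

```python
import numpy as np


def hourly_polarity(br_values, bins: int = 50) -> int:
    """Return +1 or -1 from the sign of the histogram peak of one hour of B_R.

    Returns 0 when the hour has no finite samples. A peak centred exactly on
    zero is reported as -1 here, an arbitrary tie-break.
    """
    values = np.asarray(br_values, dtype=float)
    values = values[np.isfinite(values)]  # drop data gaps (NaN/inf)
    if values.size == 0:
        return 0
    counts, edges = np.histogram(values, bins=bins)
    peak = int(np.argmax(counts))
    centre = 0.5 * (edges[peak] + edges[peak + 1])
    return 1 if centre > 0 else -1
```

    Using the histogram peak rather than the hourly mean means a few large excursions of the opposite sign, such as `[5.0] * 20 + [-30.0] * 3`, do not flip the result: the function still returns +1.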

    A Python script that demonstrates reading in the in situ time series for encounter 1 and plotting them.

    A Python script that demonstrates an open-source method to produce source-surface footpoints for a given spacecraft (here PSP). These footpoints can be used to sub-sample an HCS map provided by a modeler, generating a modeled time series that can be scored for metric 3 as described in the associated manuscript.

    Python scripts included in this dataset use the following Python packages:

    astropy - https://github.com/astropy/astropy
    h5py - https://github.com/h5py/h5py
    astrospice - https://github.com/dstansby/astrospice
    matplotlib - https://github.com/matplotlib/matplotlib
    sunpy - https://github.com/sunpy/sunpy

  19. Streamflow Observation Points in the Upper Missouri River Basin, 1973-2018

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Streamflow Observation Points in the Upper Missouri River Basin, 1973-2018 [Dataset]. https://catalog.data.gov/dataset/streamflow-observation-points-in-the-upper-missouri-river-basin-1973-2018
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Missouri River
    Description

    This dataset includes spatially aggregated records of streamflow measurements and observations from public and private organizations across the Upper Missouri River Basin. For this dataset, the Upper Missouri River Basin is defined as Hydrologic Unit Codes 1002-1013 and includes portions of Montana, Wyoming, North Dakota, and South Dakota. Streamflow observations, defined in this dataset as the identification of flowing, dry, or pooled streamflow conditions, are an essential part of understanding the relationship between streamflow permanence and climatic and physical factors. For this investigation, all streamflow observations were classified as perennial, non-perennial, or pooled for use in the PROSPER (PRObability of Streamflow PERmanence) model.
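Hydrologic unit codes are hierarchical: the first two digits identify the region (10 is the Missouri Region) and the first four identify the subregion. Reading "1002-1013" as a range of four-digit subregions, a record can be assigned to the study area with a prefix check like the sketch below. The function name and record layout are hypothetical; codes are handled as strings because HUCs in regions 01-09 carry leading zeros.

```python
def in_upper_missouri(huc):
    """True if a hydrologic unit code lies in subregions 1002 through 1013."""
    code = str(huc).strip()
    # The first four digits of any HUC (HUC4 and finer) identify the subregion.
    if len(code) < 4 or not code[:4].isdigit():
        raise ValueError(f"not a hydrologic unit code: {huc!r}")
    return 1002 <= int(code[:4]) <= 1013


# Hypothetical observation records keyed by a 12-digit HUC stored as a string.
observations = [
    {"site": "A", "huc12": "100200010101", "condition": "flowing"},
    {"site": "B", "huc12": "101300020304", "condition": "dry"},
    {"site": "C", "huc12": "101400000000", "condition": "pooled"},
]
upper = [obs["site"] for obs in observations if in_upper_missouri(obs["huc12"])]
print(upper)  # ['A', 'B']
```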

  20. ECMWF Reanalysis v5

    • ecmwf.int
    application/x-grib
    Updated Dec 31, 1969
    European Centre for Medium-Range Weather Forecasts (1969). ECMWF Reanalysis v5 [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
    Available download formats: application/x-grib (1 dataset)
    Dataset updated
    Dec 31, 1969
    Dataset authored and provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    License

    http://apps.ecmwf.int/datasets/licences/copernicus

    Description

    ERA5 provides hourly estimates of a large number of atmospheric, land and oceanic climate variables. The data cover the Earth on a 31 km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80 km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.

Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2

Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"

Available download formats: zip
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Laura Miron; Rafael Gonçalves; Mark A. Musen
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Gonçalves and Mark A. Musen. Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov.

Description of files

Original data files:
- AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. The set contains 302,091 records downloaded on April 3, 2019.
- public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.

BioPortal API Query Results:
- condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns={filename, condition, url, bioportal term, cuis, tuis}.
- intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns={filename, intervention, url, bioportal term, cuis, tuis}.

Data Element Definitions:
- supplementary_table_1.xlsx: Mapping of element names, element types, and whether elements are required in the ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.

Column and value definitions:
- CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html).
- CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html and https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html).
- CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value; "Group Heading" if the element is a group heading for several sub-fields but is not itself associated with a user-provided value.
- Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to interventional records (only to observational or expanded access records).
- Required for CT.gov for Observational Records: "Required" if the element is required for observational records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to observational records (only to interventional or expanded access records).
- Required in CT.gov for Expanded Access Records?: "Required" if the element is required for expanded access records according to the data dictionary; "CR" if the element is conditionally required; "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule; "-" if the element is not applicable to expanded access records (only to interventional or observational records).
- CT.gov XSD Element Definition: Abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element, because a single top-level element called "clinical_study" contains all other elements.
- Required in XSD?: "Yes" if the element is required according to public.XSD; "No" if the element is optional; "-" if the element is not made public or not included in the XSD.
- Type in XSD: "text" if the XSD type was "xs:string" or "textblock"; the name of the enum if the type was an enum; "integer" if the type was "xs:integer" or "xs:integer" extended with the "type" attribute; "struct" if the type was a struct defined in the XSD.
- PRS Element Name: Name of the corresponding entry field in the PRS system.
- PRS Entry Type: Entry type in the PRS system. This column contains some free-text explanations/observations.
- FDAAA801 Final Rule Field Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA.
- WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf).

Analytical Results:
- EC_human_review.csv contains the results of a manual review of a random sample of eligibility criteria from 400 CT.gov records. The table gives the filename, the criteria, and whether the manual review determined that the criteria contain criteria for "multiple subgroups" of participants.
- completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.
- industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by the agency class of the trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule.
- location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether the record listed at least one location in the United States or only international locations (excluding trials with no listed location), and before and after the effective date of the Final Rule.

Intermediate Results:
- cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running the analysis steps from the Jupyter notebooks in our GitHub repository.
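The requirement codes above combine a per-element code with a study start date. A minimal sketch of how the "Required" and "Jan 2017" codes could be applied when tallying missing fields before and after the Final Rule effective date is given below; the record layout (dicts with a `start_date` key), the function names, and the choice to skip "CR" elements are assumptions for illustration, not the authors' notebook code.

```python
from datetime import date

# Effective date of the FDAAA801 Final Rule, as given in the column definitions.
FINAL_RULE_EFFECTIVE = date(2017, 1, 18)


def is_required(code, start_date):
    """Interpret a requirement code for a study with the given start date.

    Returns True or False, or None for "CR" (conditionally required), whose
    condition cannot be resolved from the code alone.
    """
    code = code.strip()
    if code == "Required":
        return True
    if code == "Jan 2017":
        return start_date >= FINAL_RULE_EFFECTIVE
    if code == "CR":
        return None
    return False  # "-" (not applicable) or any other value


def missing_percent(records, field, code):
    """Percent of records missing `field`, split before/after the Final Rule.

    Records for which the field is not (or only conditionally) required are
    skipped; a period with no applicable records maps to None.
    """
    tallies = {"before": [0, 0], "after": [0, 0]}  # [missing, applicable]
    for rec in records:
        start = rec["start_date"]
        if is_required(code, start) is not True:
            continue
        period = "after" if start >= FINAL_RULE_EFFECTIVE else "before"
        tallies[period][1] += 1
        if not rec.get(field):
            tallies[period][0] += 1
    return {p: (100.0 * m / n if n else None) for p, (m, n) in tallies.items()}
```

The same tally could be keyed on the lead sponsor's agency class or on US versus international locations to mirror the breakdowns in industry_completeness.xlsx and location_completeness.xlsx.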
