17 datasets found
  1. Example subjects for Mobilise-D data standardization

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 11, 2022
    Cite
    Palmerini, Luca; Reggi, Luca; Bonci, Tecla; Del Din, Silvia; Micó-Amigo, Encarna; Salis, Francesca; Bertuletti, Stefano; Caruso, Marco; Cereatti, Andrea; Gazit, Eran; Paraschiv-Ionescu, Anisoara; Soltani, Abolfazl; Kluge, Felix; Küderle, Arne; Ullrich, Martin; Kirk, Cameron; Hiden, Hugo; D'Ascanio, Ilaria; Hansen, Clint; Rochester, Lynn; Mazzà, Claudia; Chiari, Lorenzo; on behalf of the Mobilise-D consortium (2022). Example subjects for Mobilise-D data standardization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7185428
    Explore at:
    Dataset updated
    Oct 11, 2022
    Dataset provided by
    Newcastle University, Translational and Clinical Research Institute, Faculty of Medical Sciences, UK. The Newcastle upon Tyne NHS Foundation Trust, UK.
    Newcastle University, Translational and Clinical Research Institute, Faculty of Medical Sciences, UK.
    Neurogeriatrics Kiel, Department of Neurology, University Hospital Schleswig-Holstein, Germany.
    The University of Sheffield, INSIGNEO Institute for in silico Medicine, UK. The University of Sheffield, Department of Mechanical Engineering, UK
    University of Bologna, Department of Electrical, Electronic and Information Engineering 'Guglielmo Marconi', Italy. University of Bologna, Health Sciences and Technologies—Interdepartmental Center for Industrial Research (CIRI-SDV), Italy
    Laboratory of Movement Analysis and Measurement, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland.
    Newcastle University, School of Computing, UK.
    Machine Learning and Data Analytics Lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Germany.
    Tel Aviv Sourasky Medical Center, Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Israel.
    University of Sassari, Department of Biomedical Sciences, Italy.
    University of Bologna, Health Sciences and Technologies—Interdepartmental Center for Industrial Research (CIRI-SDV), Italy
    Politecnico di Torino, Department of Electronics and Telecommunications, Italy. Politecnico di Torino, PolitoBIOMed Lab – Biomedical Engineering Lab, Italy.
    https://www.mobilise-d.eu/partners
    University of Bologna, Department of Electrical, Electronic and Information Engineering 'Guglielmo Marconi', Italy.
    Politecnico di Torino, Department of Electronics and Telecommunications, Italy.
    Authors
    Palmerini, Luca; Reggi, Luca; Bonci, Tecla; Del Din, Silvia; Micó-Amigo, Encarna; Salis, Francesca; Bertuletti, Stefano; Caruso, Marco; Cereatti, Andrea; Gazit, Eran; Paraschiv-Ionescu, Anisoara; Soltani, Abolfazl; Kluge, Felix; Küderle, Arne; Ullrich, Martin; Kirk, Cameron; Hiden, Hugo; D'Ascanio, Ilaria; Hansen, Clint; Rochester, Lynn; Mazzà, Claudia; Chiari, Lorenzo; on behalf of the Mobilise-D consortium
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standardized data from Mobilise-D participants (YAR dataset) and from pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as examples of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review at Scientific Data. Please refer to that publication for further information, and cite it if you use these data.

    The code to standardize an example subject (from the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
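    For a quick look at a standardized file from R rather than Matlab, a minimal sketch along the following lines should work; the file name is a placeholder, and the repository above holds the reference loading code.

      # Load a standardized Mobilise-D .mat file into R (file name is a placeholder)
      # install.packages("R.matlab")
      library(R.matlab)
      standardized <- readMat("example_subject.mat")
      # Inspect the nested structure (sensor streams, gold-standard references, etc.)
      str(standardized, max.level = 2)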

  2. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.

    Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
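    Once access is granted, inspecting the workspace in R would look roughly like this; the file name is hypothetical, but the object names follow the data dictionary above.

      # Load the restricted-access workspace (hypothetical file name)
      load("simulated_data.RData")
      ls()       # should list y, x, z, n, m, p, alpha_true per the data dictionary
      table(y)   # class balance of the binary response (1: preterm birth, 0: control)
      dim(x)     # n rows of covariates
      dim(z)     # n x m matrix of standardized weekly exposures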

  3. Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 18, 2024
    Cite
    Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou (2024). Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3974999
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    University of Florida
    Yale University Peabody Museum of Natural History
    American Museum of Natural History
    Agriculture and Agri-Food Canada
    Florida State University
    Arizona State University
    Authors
    Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou
    License

    https://creativecommons.org/licenses/publicdomain/

    Area covered
    World
    Description

    This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2033973). Specifically, this repository contains (1) raw data from iDigBio (http://portal.idigbio.org) and GBIF (https://www.gbif.org), (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository (https://github.com/iDigBio/Biospex). Long-term data management of the enhanced specimen data created by this project is expected to be accomplished by the natural history collections curating the physical specimens, a list of which can be found in this Zenodo resource.
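    As a sketch of how the raw archives listed under "Files included in this resource" below can be opened in R, assuming the standard GBIF Darwin Core Archive layout with a tab-delimited occurrence.txt core at the archive root:

      # Read the occurrence core from one of the raw GBIF DwC-A zips in this resource
      occ <- read.delim(unz("0067804-200613084148143.zip", "occurrence.txt"),
                        quote = "", stringsAsFactors = FALSE)
      # Tally records by genus (e.g., Rhinolophus, Hipposideros)
      head(sort(table(occ$genus), decreasing = TRUE), 10)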

    Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities are mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-COV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in opensource journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2. The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; project protocols and code via GitHub and described in a peer-reviewed paper, and; sustained engagement with biodiversity collections throughout the project for reintegration of improved data into their local specimen data management systems improving long-term curation.

    This RAPID award will produce and deliver a georeferenced, vetted and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses, a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data that are relevant to understanding emergent and other properties the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

    This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

    Files included in this resource

    9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip: Raw data from iDigBio, DwC-A format

    0067804-200613084148143.zip: Raw data from GBIF, DwC-A format

    0067806-200613084148143.zip: Raw data from GBIF, DwC-A format

    1623690110.zip: Full export of this project's data (enhanced and raw) from BIOSPEX, CSV format

    bionomia-datasets-attributions.zip: Directory containing 103 Frictionless Data packages for datasets that have attributions made containing Rhinolophids or Hipposiderids, each package also containing a CSV file for mismatches in person date of birth/death and specimen eventDate. File bionomia-datasets-attributions-key_2021-02-25.csv included in this directory provides a key between dataset identifier (how the Frictionless Data package files are named) and dataset name.

    bionomia-problem-dates-all-datasets_2021-02-25.csv: List of 21 Hipposiderid or Rhinolophid records whose eventDate or dateIdentified mismatches a wikidata recipient’s date of birth or death across all datasets.

    flagEventDate.txt: file containing term definition to reference in DwC-A

    flagExclude.txt: file containing term definition to reference in DwC-A

    flagGeoreference.txt: file containing term definition to reference in DwC-A

    flagTaxonomy.txt: file containing term definition to reference in DwC-A

    georeferencedByID.txt: file containing term definition to reference in DwC-A

    identifiedByNames.txt: file containing term definition to reference in DwC-A

    instructions-to-get-people-data-from-bionomia-via-datasetKey: instructions given to data providers

    RAPID-code_collection-date.R: code associated with enhancing collection dates

    RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data

    RAPID-code_external-linkages-bold.R: code associated with enhancing external linkages

    RAPID-code_external-linkages-genbank.R: code associated with enhancing external linkages

    RAPID-code_external-linkages-standardize.R: code associated with enhancing external linkages

    RAPID-code_people.R: code associated with enhancing data about people

    RAPID-code_standardize-country.R: code associated with standardizing country data

    RAPID-data-dictionary.pdf: metadata about terms included in this project’s data, in PDF format

    RAPID-data-dictionary.xlsx: metadata about terms included in this project’s data, in spreadsheet format

    rapid-data-providers_2021-05-03.csv: list of data providers and number of records provided to rapid-joined-records_country-cleanup_2020-09-23.csv

    rapid-final-data-product_2021-06-29.zip: Enhanced data from BIOSPEX, DwC-A format

    rapid-final-gazetteer.zip: Gazetteer providing georeference data and metadata for 10,341 localities assessed as part of this project

    rapid-joined-records_country-cleanup_2020-09-23.csv: data product initial version where raw data has been compiled and deduplicated, and country data has been standardized

    RAPID-protocol_collection-date.pdf: protocol associated with enhancing collection dates

    RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data

    RAPID-protocol_external-linkages.pdf: protocol associated with enhancing external linkages

    RAPID-protocol_georeference.pdf: protocol associated with georeferencing

    RAPID-protocol_people.pdf: protocol associated with enhancing data about people

    RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data

    RAPID-protocol_taxonomic-names.pdf: protocol associated with enhancing taxonomic name data

    RAPIDAgentStrings1_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

    recordedByNames.txt: file containing term definition to reference in DwC-A

    Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

    wikidata-notes-for-bat-collectors_leachman_2020: please see https://zenodo.org/record/4724139 for this resource

  4. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    File format: R workspace file; “Simulated_Dataset.RData”.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code:

    “CWVS_LMC.txt”: R statistical software code to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. Once the “Simulated_Dataset.RData” workspace has been loaded into R, this code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

    “Results_Summary.txt”: R statistical software code to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript), once the “CWVS_LMC.txt” code has been applied to the simulated dataset and the program has completed.

    Required R packages:
    • For running “CWVS_LMC.txt”: msm (sampling from the truncated normal distribution); mnormt (sampling from the multivariate normal distribution); BayesLogit (sampling from the Polya-Gamma distribution)
    • For running “Results_Summary.txt”: plotrix (plotting the posterior means and credible intervals)

    Reproducibility: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information:
    • Load the “Simulated_Dataset.RData” workspace
    • Run the code contained in “CWVS_LMC.txt”
    • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”

    Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
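    Based on the instructions above, the replication workflow in R would look roughly like this; a sketch, assuming the two .txt files are standalone R scripts that can be source()d.

      # Packages required by the scripts, per the documentation above
      # install.packages(c("msm", "mnormt", "BayesLogit", "plotrix"))
      load("Simulated_Dataset.RData")   # provides y, x, z, n, m, p, alpha_true
      source("CWVS_LMC.txt")            # fit the LMC version of CWVS (MCMC; may run long)
      source("Results_Summary.txt")     # summarize/plot critical windows and inclusion probabilities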

  5. CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Thilde Terkelsen; Anders Krogh; Elena Papaleo (2023). CAncer bioMarker Prediction Pipeline (CAMPP)—A standardized framework for the analysis of quantitative biological data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007665
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Thilde Terkelsen; Anders Krogh; Elena Papaleo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variances has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline-CAMPP) intended to aid bioinformatic software users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R package updates, an renv.lock file is provided to ensure R package stability. Data management includes missing-value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
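    Before a first run, restoring the pinned package library would look something like this; a sketch, assuming the renv.lock file sits in the repository root.

      # Recreate the package library pinned by the repository's lock file
      # install.packages("renv")
      renv::restore(lockfile = "renv.lock")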

  6. Data from: hccTAAb Atlas: An Integrated Knowledge Database for...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Dec 29, 2023
    Cite
    Tiandong Li; Peng Wang; Guiying Sun; Yuanlin Zou; Yifan Cheng; Han Wang; Yin Lu; Jianxiang Shi; Keyan Wang; Qiang Zhang; Hua Ye (2023). hccTAAb Atlas: An Integrated Knowledge Database for Tumor-Associated Autoantibodies in Hepatocellular Carcinoma [Dataset]. http://doi.org/10.1021/acs.jproteome.3c00579.s001
    Explore at:
    zipAvailable download formats
    Dataset updated
    Dec 29, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tiandong Li; Peng Wang; Guiying Sun; Yuanlin Zou; Yifan Cheng; Han Wang; Yin Lu; Jianxiang Shi; Keyan Wang; Qiang Zhang; Hua Ye
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Tumor-associated autoantibodies (TAAbs) have demonstrated potential as biomarkers for cancer detection. However, the understanding of their role in hepatocellular carcinoma (HCC) remains limited. In this study, we aimed to systematically collect and standardize information about these TAAbs and establish a comprehensive database as a platform for in-depth research. A total of 170 TAAbs were identified from published papers retrieved from PubMed, Web of Science, and Embase. Following normative reannotation, these TAAbs were consolidated under 162 official symbols. The hccTAAb (tumor-associated autoantibodies in hepatocellular carcinoma) atlas was developed using the R Shiny framework, incorporating literature-based and multiomics data sets. This comprehensive online resource provides key information such as sensitivity and specificity, along with additional details (official symbols, official full names, UniProt, NCBI, HPA, neXtProt, and aliases) through hyperlinks. Additionally, hccTAAb offers six analytical modules for visualizing expression profiles, survival analysis, immune infiltration, similarity analysis, DNA methylation, and DNA mutation analysis. Overall, the hccTAAb Atlas provides valuable insights into the mechanisms underlying TAAbs and has the potential to enhance the diagnosis and treatment of HCC using autoantibodies. The hccTAAb Atlas is freely accessible at https://nscc.v.zzu.edu.cn/hccTAAb/.
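    A minimal illustration of the kind of R Shiny lookup interface described; the table and its columns are purely hypothetical stand-ins, not the actual hccTAAb schema.

      library(shiny)
      # Toy stand-in for the TAAb annotation table (hypothetical columns and values)
      taab <- data.frame(symbol = c("TP53", "ENO1"), sensitivity = c(0.25, 0.31),
                         specificity = c(0.92, 0.88))
      ui <- fluidPage(textInput("gene", "Official symbol"), tableOutput("hits"))
      server <- function(input, output) {
        # Show the annotation rows matching the queried symbol
        output$hits <- renderTable(taab[taab$symbol == input$gene, ])
      }
      shinyApp(ui, server)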

  7. Standardized NEON organismal data (neonDivData)

    • portal.edirepository.org
    bin, csv
    Updated Apr 12, 2022
    Cite
    Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske (2022). Standardized NEON organismal data (neonDivData) [Dataset]. http://doi.org/10.6073/pasta/c28dd4f6e7989003505ea02e9a92afbf
    Explore at:
    csv(67793652 bytes), csv(266884330 bytes), csv(4643854 bytes), csv(12011 bytes), csv(944312 bytes), csv(6879 bytes), csv(25181268 bytes), csv(1949590 bytes), csv(375200 bytes), csv(3062147 bytes), csv(35160044 bytes), csv(738408 bytes), csv(18427828 bytes), csv(604110 bytes), csv(35684117 bytes), csv(86101256 bytes), bin(20729 bytes), bin(4674 bytes)Available download formats
    Dataset updated
    Apr 12, 2022
    Dataset provided by
    EDI
    Authors
    Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske
    Time period covered
    Jun 5, 2013 - Jul 28, 2020
    Area covered
    Variables measured
    sex, unit, year, State, endRH, month, sites, units, value, boutID, and 113 more
    Description

    To standardize NEON organismal data for major taxonomic groups, we first systematically reviewed NEON’s documentations for each taxonomic group. We then discussed as a group and with NEON staff to decide how to wrangle and standardize NEON organismal data. See Li et al. 2022 for more details. All R code to process NEON data products can be obtained through the R package ‘ecocomDP’. Once the data are in ecocomDP format, we further processed them to convert them into long data frames with code on Github (https://github.com/daijiang/neonDivData/tree/master/data-raw), which is also archived here.
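    A sketch of the ecocomDP route described above; the dataset id and site are illustrative, and current argument names should be checked against the package documentation.

      # install.packages("ecocomDP")
      library(ecocomDP)
      # Search for NEON data products available in the ecocomDP format
      candidates <- search_data(text = "NEON")
      # Read one standardized product (id illustrative), then inspect its tables
      d <- read_data(id = "neon.ecocomdp.20120.001.001", site = "ARIK")
      str(d, max.level = 2)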

  8. Dataset: A three-dimensional approach to general plant fire syndromes

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 27, 2023
    Cite
    Pedro Jaureguiberry; Sandra Díaz (2023). Dataset: A three-dimensional approach to general plant fire syndromes [Dataset]. http://doi.org/10.5061/dryad.j6q573njb
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Instituto Multidisciplinario de Biología Vegetal (CONICET-Universidad Nacional de Córdoba) and FCEFyN
    Authors
    Pedro Jaureguiberry; Sandra Díaz
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Plant fire syndromes are usually defined as combinations of fire response traits, the most common being resprouting (R) and seeding (S). Plant flammability (F), on the other hand, refers to a plant’s effects on communities and ecosystems. Despite its important ecological and evolutionary implications, F has rarely been considered to define plant fire syndromes and, if so, usually separated from response syndromes.
    2. We propose a three-dimensional model that combines R, S and F, encapsulating both plant response to fire regimes and the capacity to promote them. Each axis is divided into three possible standardized categories, reflecting low, medium and high values of each variable, with a total of 27 possible combinations of R, S and F.
    3. We hypothesized that different fire histories should be reflected in the position of species within the three-dimensional space and that this should help assess the importance of fire as an evolutionary force in determining R-S-F syndromes.
    4. To illustrate our approach we compiled information on the fire syndromes of 24 dominant species of different growth forms from the Chaco seasonally-dry forest of central Argentina, and we compared them to 33 species from different Mediterranean-type climate ecosystems (MTCEs) of the world.
    5. Chaco and MTCEs species differed in the range (seven syndromes vs. thirteen syndromes, respectively) and proportion of extreme syndromes (i.e. species with extreme values of R, S and/or F) representing 29% of species in the Chaco vs. 45% in the MTCEs.
    6. Additionally, we explored the patterns of R, S and F of 4032 species from seven regions with contrasting fire histories, and found significantly higher frequencies of extreme values (predominantly high) of all three variables in MTCEs compared to the other regions, where intermediate and low values predominated, broadly supporting our general hypothesis.
    7. The proposed three-dimensional approach should help standardize comparisons of fire syndromes across taxa, growth forms and regions with different fire histories. This will contribute to the understanding of the role of fire in the evolution of plant traits and assist vegetation modelling in the face of changes in fire regimes.

    Methods

    Data collection for Chaco species: From previous studies, we compiled data on post-fire resprouting (R) (Jaureguiberry 2012; Jaureguiberry et al. 2020), germination capacity after heat shock treatments (S) (Jaureguiberry & Díaz) and flammability (F) (Jaureguiberry et al. 2011) of 24 dominant species of the seasonally-dry Chaco forest of central Argentina (hereafter Chaco). We then transformed the original data from those studies into three possible categorical ordinal values: 1, 2 or 3, indicating low, medium and high values of each variable, respectively. To do so, we used the following criteria: 1) For R data: we focused on the survival percentage recorded for each species (Jaureguiberry et al., 2020) as a proxy for resprouting capacity (Pérez-Harguindeguy et al., 2013), because this variable is widely used in fire studies and has a standard scale and range of values, facilitating comparisons between species from different regions. Survival percentages were assigned to one of three intervals (0 to 33%, 34 to 66%, and 67 to 100%), and each interval was assigned the value 1, 2 or 3 respectively, indicating low, medium and high resprouting capacity (see the short sketch after this paragraph). 2) For S data: based on germination response to heat shock treatments, we classified species as heat-sensitive (germination lower than the control), heat-tolerant (germination similar to the control) or heat-stimulated (germination higher than the control) (see details in Jaureguiberry and Díaz 2015). Each of these categories was respectively assigned a value of 1, 2 or 3. 3) For F data: while the original measurements included burning rate, maximum temperature and biomass consumed (see details in Jaureguiberry et al. 2011), for the purpose of comparing Chaco species with species from other regions, and considering that burning rate is rarely reported, data on the two latter variables were collected from studies that followed Jaureguiberry et al. (2011). A PCA followed by cluster analysis allowed classifying species into the following categories: 1 = low flammability; 2 = moderate flammability; 3 = high flammability.
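    The interval-based binning in criterion 1 can be expressed compactly in R; a sketch with made-up survival percentages.

      # Assign survival percentages to ordinal categories 1 (low), 2 (medium), 3 (high)
      survival <- c(5, 40, 72, 98, 33, 67)   # made-up percentages
      R.category <- cut(survival, breaks = c(-Inf, 33, 66, Inf), labels = 1:3)
      data.frame(survival, R.category)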

    Data collection for other regions: We performed an unstructured literature review of fire-related traits relevant to our model. Whenever possible, we searched for the same or similar variables to those used for the Chaco, namely survival percentage, germination response to heat shock, and variables related to flammability (e.g. maximum temperature, biomass consumed and burning rate), as proxies for R, S and F, respectively. Classification into different R intervals was based either on quantitative data on survival percentage, or on qualitative information from major databases. For example, resprouting capacity reported as “low” or “high” (e.g. Tavşanoğlu & Pausas, 2018) was assigned R values of 1 and 3, respectively. For Southern Australian species, those reported as “fire killed” and “weak resprouting” (Falster et al., 2021) were assigned a value of 1, while those reported as “intermediate resprouting” and “strong resprouting” were assigned values of 2 and 3, respectively. The vast majority of records in our dataset refer to resprouting of individuals one growing season after the fire. Flammability data for most of the species were based on quantitative measurements that used the method of Jaureguiberry et al. (2011), which was standardised following the criteria explained earlier. However, for some species, classification was based either on other quantitative measures that followed other methodologies (e.g. measures based on plant parts such as twigs or leaves, or fuel beds) or on qualitative classifications reported in the literature (most of which are in turn based on reviews of quantitative measurements from previous studies).

    We standardised the original data collected for the other regions following the same approach as for the Chaco. We then built contingency tables to analyse each region and to compare between regions. The curated total number of records from our literature review was 4411 (records for R, S and F were 3399, 678 and 334, respectively) for 4,032 species (many species had information on two variables, and very few on all three variables). The database covers a wide taxonomic range, encompassing species from approximately 1,250 genera and 180 botanical families, belonging to ten different growth forms, and coming from seven major regions with a wide range of evolutionary histories of fire, from long and intense (Mediterranean-Type Climate Ecosystems) to very recent (New Zealand).

  9. Data for: Patel et al. Carbon flux estimates are sensitive to data source: A...

    • osti.gov
    • knb.ecoinformatics.org
    • +1more
    Updated Dec 31, 2021
    Cite
    Environmental System Science Data Infrastructure for a Virtual Ecosystem (2021). Data for: Patel et al. Carbon flux estimates are sensitive to data source: A comparison of field and lab temperature sensitivity data [Dataset]. http://doi.org/10.15485/1889750
    Explore at:
    Dataset updated
    Dec 31, 2021
    Dataset provided by
    Department of Energy Biological and Environmental Research Program
    Office of Science (http://www.er.doe.gov/)
    Environmental System Science Data Infrastructure for a Virtual Ecosystem
    Soil Carbon Biogeochemistry
    Description

    This dataset contains data and code used for the paper "Carbon flux estimates are sensitive to data source: A comparison of field and lab temperature sensitivity data" [DOI COMING SOON].

    A large literature exists on mechanisms driving soil production of the greenhouse gases CO2 and CH4. Measurements of these gases’ fluxes are often performed using closed-chamber incubations in the laboratory or in situ, i.e., in the field. Although it is common knowledge that measurements obtained through field studies vs. laboratory incubations can diverge because of the vastly different conditions of these environments, few studies have systematically examined these patterns. It is crucial to understand the magnitude and reasons for any differences, as these data are used to parametrize and benchmark ecosystem- to global-scale models, which are then susceptible to the biases of the source data. Here, we specifically examine how greenhouse gas measurements may be influenced by whether the measurement/incubation was conducted in the field vs. laboratory, focusing on CO2 and CH4 measurements. We use Q10 of greenhouse gas flux (temperature sensitivity) for our analyses, because of the ubiquity of this metric in biological and Earth system sciences and its importance to many modeling frameworks. We predicted that laboratory measurements would be less variable, but also less representative of true field conditions. However, there was greater variability in the Q10 values calculated from lab-based measurements of CO2 fluxes, because lab experiments explore extremes rarely seen in situ, and reflect the physical and chemical disturbances occurring during sampling, transport, and incubation. Overall, respiration Q10 values were significantly greater in laboratory incubations (mean = 4.19) than field measurements (mean = 3.05), with strong influences of incubation temperature and climate region/biome. However, this was in part because field measurements typically represent total respiration (Rs), whereas lab incubations typically represent heterotrophic respiration (Rh), making direct comparisons difficult to interpret. Focusing only on Rh-derived Q10, these values showed almost identical distributions across laboratory (n = 1110) and field (n = 581) experiments, providing strong support for using the former as an experimental proxy for the latter, although we caution that geographic biases in the extant data make this conclusion tentative. Due to a smaller sample size of CH4 Q10 data, we were unable to perform a comparable robust analysis, but we expect similar interactions with soil temperature, moisture, and environmental/climatic variables. Our results here suggest the need for more concerted efforts to document and standardize these data, including sample and site metadata.

    This dataset contains a compressed (.zip) archive of the data and R scripts used for this manuscript. The dataset includes files in .csv format, which can be accessed and processed using MS Excel or R. This archive can also be accessed on GitHub at https://github.com/kaizadp/field_lab_q10 (DOI: 10.5281/zenodo.7106554).
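    The Q10 metric used throughout is the standard temperature-sensitivity quotient; a minimal R rendering of the definition follows (not the archived scripts' exact code).

      # Q10: factor by which flux increases per 10 degree C rise in temperature
      q10 <- function(flux1, flux2, temp1, temp2) {
        (flux2 / flux1)^(10 / (temp2 - temp1))
      }
      q10(flux1 = 2.0, flux2 = 4.2, temp1 = 15, temp2 = 25)   # = 2.1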

  10. Methods for normalizing microbiome data: an ecological perspective

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 30, 2018
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    University of New England
    James Cook University
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data, instead advocating alternatives such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs.
    2. We tested these theoretical predictions via simulations and a real-world data set.
    3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
    4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
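    Both of the favoured normalizations are simple to apply; a sketch on a toy OTU table, using vegan's rrarefy for the rarefying step.

      # Toy OTU table: 4 samples (rows) x 10 OTUs (columns)
      # install.packages("vegan")
      library(vegan)
      set.seed(1)
      otu <- matrix(rpois(40, lambda = 20), nrow = 4)
      props <- otu / rowSums(otu)                           # normalize to proportions
      rarefied <- rrarefy(otu, sample = min(rowSums(otu)))  # rarefy to the smallest read depth
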
  11. New Homeowner Data | USA Coverage | 74% Right Party Contact Rate | BatchData...

    • datarade.ai
    Updated Jun 11, 2023
    Cite
    BatchData (2023). New Homeowner Data | USA Coverage | 74% Right Party Contact Rate | BatchData [Dataset]. https://datarade.ai/data-products/new-homeowner-data-usa-coverage-74-right-party-contact-r-batchdata
    Explore at:
    .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Jun 11, 2023
    Dataset authored and provided by
    BatchData
    Area covered
    United States
    Description

    Set Up

    We’ll help ensure you’re set up to get the data you need, how you need it. We’ll help you through provisioning the extraction, enrichment, formatting, delivery/update schedule, and reporting around your data. With hundreds of unique data points available, the information you need to find leads fast is at your fingertips - new homeowner data, home ownership data, B2C contact data and more, built for professional services companies.

    Custom Development

    We provide technical resources to support integration and delivery requirements specific to your business needs, augmenting developer resources to keep your team focused on other tasks.

    Enrichment Services

    Enrichment services improve the accuracy, completeness, and depth of your dataset by regularly filling in blank values and updating outdated records. We’ll help ensure that the specific data points, update cadences, and replacement rules fit your GTM strategy.

    Analysis Healthcheck

    We’ll audit your organization’s data health and usage strategy, and make sure you’re focused on the right KPIs and performance metrics.

    Implementation Support

    From technical architecture to scheduled and flexible delivery of data in multiple formats, we make it easy to realize the value of better data.

    Data Blending & Enhancement

    Combine multiple data sources to create a single, new dataset to standardize operations and enable better reporting.

  12. 2014 Wind Turbine Gearbox Damage Distribution based on the NREL Gearbox...

    • gimi9.com
    Updated Nov 6, 2024
    + more versions
    Cite
    (2024). 2014 Wind Turbine Gearbox Damage Distribution based on the NREL Gearbox Reliability Database | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_2014-wind-turbine-gearbox-damage-distribution-based-on-the-nrel-gearbox-reliability-databa/
    Explore at:
    Dataset updated
    Nov 6, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite the improvements in wind turbine gearbox design and manufacturing practices, the wind industry is still challenged by premature wind turbine gearbox failures. To help address this industry-wide challenge, a consortium called the Gearbox Reliability Collaborative (GRC) was launched by the U.S. Department of Energy's (DOE's) National Renewable Energy Laboratory (NREL) in 2007. It brings together the parties involved in the wind turbine gearbox supply chain to investigate possible root causes of premature wind turbine gearbox failures and research improvement strategies. One area of work under the GRC is a failure data collection and analysis effort titled the Gearbox Reliability Database (GRD), which was started in 2009. The main objectives of the GRD are to categorize top wind turbine gearbox failure modes, identify possible root causes, and direct future wind turbine gearbox reliability research and development (R&D) activities.

    The GRD project has made substantial progress in developing wind turbine gearbox reliability data collection tools, standardizing data recording practices, populating the database, and informing the industry with up-to-date R&D findings and statistics. The project currently has more than 20 data-sharing partners, including wind turbine and wind turbine gearbox manufacturers and owners/operators, gearbox rebuild shops, and operation and maintenance service providers. The assets represented by the owner/operator partners alone comprised approximately 34% of U.S. capacity at the end of 2013 (according to the American Wind Energy Association annual market report). The number and variety of partners and the assets they represent demonstrate the value and need of major wind turbine component data collection and analysis to the industry.

    The attached image shows the distribution of the damaged component locations based on approximately 320 confirmable wind turbine gearbox damage records stored in the database. Wind turbine gearboxes can fail in drastically different ways. The majority of the damage occurs to bearings (64%), followed by gears (25%); the other components account for 11% of the failures. Among the other components, lubrication and filtration system problems are dominant. Both bearing and gear faults are concentrated in the parallel section, which aligns with field observations made by wind turbine owner/operator partners. The top gearbox failure is axial cracks that occur in bearings located at the high- or intermediate-speed stage. This identification confirms the value of and need for wind turbine gearbox R&D on bearing axial crack root causes and mitigation methods, a joint research effort by Argonne National Laboratory and NREL funded by DOE's Wind and Water Power Program.

    The data-sharing partners highly value this project and recommend that NREL generate industry-wide reliability benchmarking statistics from the information contained in the database, which is currently not publicly available. Frequently, these reliability statistics are distorted and kept internal when generated by wind turbine original equipment manufacturers or owners/operators, which do not normally have a balanced representation of wind turbine makes and models. The GRD experience provides the industry with valuable support to standardize reliability data collection for major wind turbine components and subsystems.

  13. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Apr 20, 2021
    + more versions
    Cite
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.3
    Explore at:
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision and taking no medication. Each movie was stopped in 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10 minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
    • The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

    v2.0 Changes

    • Overview
      • We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.
    • Normalization

      • Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:
      # Generate a resting state (rs) timeseries (ts)
      # Install / load package to make fake fMRI ts
      # install.packages("neuRosim")
      library(neuRosim)
      # Generate a ts
      ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
      # 3dDetrend -normalize
      # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
      # Do for the full timeseries
      ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
      # Do this again for a shorter version of the same timeseries
      ts.shorter.length <- length(ts.normalised.long)/4
      ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
      # By looking at the summaries, it can be seen that the median values become larger
      summary(ts.normalised.long)
      summary(ts.normalised.short)
      # Plot results for the long and short ts
      # Truncate the longer ts for plotting only
      ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
      # Give the plot a title
      title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
      plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
      # Add zero line
      lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
      # 3dDetrend -normalize -polort 0 for long timeseries
      lines(ts.normalised.long.made.shorter, col='blue');
      # 3dDetrend -normalize -polort 0 for short timeseries
      lines(ts.normalised.short, col='red');
      
    • Standardization/modernization

      • The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewer steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analysis practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
    • New afni_proc.py command line

      • The following is the afni_proc.py command line that we used to generate the blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples, so you can quickly understand our preprocessing decisions by scrutinising the command below. It is most similar to Example 11 for 'Resting state analysis' in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html):

        afni_proc.py \
            -subj_id "$sub_id_name_1" \
            -blocks despike tshift align tlrc volreg mask blur scale regress \
            -radial_correlate_blocks tcat volreg \
            -copy_anat anatomical_warped/anatSS.1.nii.gz \
            -anat_has_skull no \
            -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
            -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
            -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
            -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
            -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
            -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
            -anat_follower_erode fsvent fswm \
            -dsets media_?.nii.gz \
            -tcat_remove_first_trs 8 \
            -tshift_opts_ts -tpattern alt+z2 \
            -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
            -tlrc_base "$basedset" \
            -tlrc_NL_warp \
            -tlrc_NL_warped_dsets \
                anatomical_warped/anatQQ.1.nii.gz \
                anatomical_warped/anatQQ.1.aff12.1D \
                anatomical_warped/anatQQ.1_WARP.nii.gz \
            -volreg_align_to MIN_OUTLIER \
            -volreg_post_vr_allin yes \
            -volreg_pvra_base_index MIN_OUTLIER \
            -volreg_align_e2a \
            -volreg_tlrc_warp \
            -mask_opts_automask -clfrac 0.10 \
            -mask_epi_anat yes \
            -blur_to_fwhm -blur_size $blur \
            -regress_motion_per_run \
            -regress_ROI_PC fsvent 3 \
            -regress_ROI_PC_per_run fsvent \
            -regress_make_corr_vols aeseg fsvent \
            -regress_anaticor_fast \
            -regress_anaticor_label fswm \
            -regress_censor_motion 0.3 \
            -regress_censor_outliers 0.1 \
            -regress_apply_mot_types demean deriv \
            -regress_est_blur_epits \
            -regress_est_blur_errts \
            -regress_run_clustsim no \
            -regress_polort 2 \
            -regress_bandpass 0.01 1 \
            -html_review_style pythonic

        We used similar command lines to generate the 'blurred and not censored' and the 'not blurred and not censored' timeseries files (described more fully below). We will make the code used to generate all derivative files available on our github site (https://github.com/lab-lab/nndb).

      We made one choice above that differs enough from our original pipeline to be worth mentioning here. Specifically, our runs are quite long, averaging ~40 minutes, but run length is variable (hence the above issue with 3dDetrend's -normalize). A discussion on the AFNI message board with one of our team (starting here: https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256) led to the suggestion that '-regress_polort 2' be used together with '-regress_bandpass 0.01 1' for long runs. We had previously used only a variable polort, following the suggested 1 + int(D/150) formula (for a 40-minute run, D = 2400 s, giving polort = 17). Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

      Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul's own words:

      • Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
      • Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
      • For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
      • For censored data:
        • Performing ISC requires users to take the union of the censoring patterns during the correlation calculation (a sketch follows below).
        • If you want to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might still do for naturalistic tasks), standard FT-based methods can't be used because the sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC parameters) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and affects, say, only up to about 10-15% of the data.

      In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.
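      As a minimal sketch of the censoring-union point above (the filenames are hypothetical; AFNI censor files are plain text with one value per TR, 1 = keep, 0 = censored):

      # Hypothetical censor and timeseries files for two subjects
      cens.a <- scan("censor_subj_a.1D")
      cens.b <- scan("censor_subj_b.1D")
      ts.a <- scan("ts_subj_a.1D")
      ts.b <- scan("ts_subj_b.1D")
      # Union of the censoring patterns: keep only TRs retained in BOTH subjects
      keep <- (cens.a == 1) & (cens.b == 1)
      # Inter-subject correlation computed over the mutually retained TRs
      isc <- cor(ts.a[keep], ts.b[keep])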

    • Effect on results

      • From numerous tests on our own analyses, we have qualitatively found that results using our old vs. the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an…
  14. Water Rights Demand Analysis Methodology Datasets

    • data.ca.gov
    • data.cnra.ca.gov
    csv, xlsx
    Updated Apr 7, 2022
    California State Water Resources Control Board (2022). Water Rights Demand Analysis Methodology Datasets [Dataset]. https://data.ca.gov/dataset/water-rights-demand-analysis-methodology-datasets
    Available download formats: xlsx(22642), csv(30703430), xlsx(24426), xlsx(20310), xlsx(21845), csv(396514022), csv(20471058), csv(16822525)
    Dataset updated
    Apr 7, 2022
    Dataset authored and provided by
    California State Water Resources Control Board
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    The following datasets are used for the Water Rights Demand Analysis project and are formatted for use in its calculations. The State Water Resources Control Board Division of Water Rights (Division) has developed a methodology to standardize and improve the accuracy of the water diversion and use data that are used to determine water availability and inform water management and regulatory decisions. The Water Rights Demand Data Analysis Methodology (https://www.waterboards.ca.gov/drought/drought_tools_methods/demandanalysis.html) is a series of data pre-processing steps, R scripts, and data processing modules that identify and help address data quality issues in both the self-reported water diversion and use data from water right holders or their agents and the Division's electronic water rights data.

  15. Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods

    • frontiersin.figshare.com
    application/cdfv2
    Updated Jun 1, 2023
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_1_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.doc [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s001
    Available download formats: application/cdfv2
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best using one dataset is evaluated as the poorest using another. This raises an open question: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it, together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study have been included in an R package named NormExpression. NormExpression provides a framework and a fast and simple way for researchers to select the best method for the normalization of their gene expression data, based on the evaluation of different methods (particularly some data-driven methods or their own methods) under the principles of the consistency of metrics and the consistency of datasets.
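    As a rough illustration of the CV-threshold idea behind AUCVC (this is our reading of the metric's description above, with a simulated matrix; the authoritative implementation is in the NormExpression package itself):

    # Illustrative AUCVC-style computation in R (a sketch, not the package code)
    set.seed(1)
    # Hypothetical normalized genes-x-samples expression matrix
    expr <- matrix(rpois(1000 * 20, lambda = 10), nrow = 1000, ncol = 20)
    # Coefficient of variation (sd / mean) per gene
    mu <- rowMeans(expr)
    cv <- apply(expr, 1, sd) / mu
    cv <- cv[is.finite(cv)]
    # Fraction of genes whose CV falls under each normalized threshold
    thr <- seq(0, max(cv), length.out = 100)
    frac.under <- sapply(thr, function(t) mean(cv <= t))
    # Area under the normalized CV-threshold curve (trapezoidal rule); a higher
    # value means more genes show low variability after normalization
    aucvc <- sum(diff(thr / max(cv)) * (head(frac.under, -1) + tail(frac.under, -1)) / 2)
    aucvc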

  16. Annual Survey of State Government Finances 1992-2018

    • search.datacite.org
    • openicpsr.org
    Updated 2021
    Jacob Kaplan (2021). Annual Survey of State Government Finances 1992-2018 [Dataset]. http://doi.org/10.3886/e101880
    Dataset updated
    2021
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    DataCite (https://www.datacite.org/)
    Authors
    Jacob Kaplan
    Description

    Version 4 release notes: Changes the release notes description; does not change the data.
    Version 3 release notes: Adds 2018 data. Renames some columns so that all column names are <= 32 characters, to fit Stata's limit (a sketch of such a renaming follows below).
    Version 2 release notes: Adds 2017 data. R and Stata files now available.

    The .csv file includes data from the years 1992-2016. No data was changed; only column names were changed to standardize them across years. Some columns (e.g. Population) that are not present in all years are removed. Amounts are in thousands of dollars.
    The zip file includes all raw (completely untouched) files for years 1992-2016.
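    For illustration only, a minimal R sketch of the column-renaming idea mentioned in the release notes, under the assumption that renaming simply truncates and de-duplicates (this is not the release's actual code):

    # Truncate column names to Stata's 32-character limit, then de-duplicate
    fix_stata_names <- function(df) {
      names(df) <- make.unique(substr(names(df), 1, 32), sep = "_")
      df
    }
    # Hypothetical example
    df <- data.frame(x = 1)
    names(df) <- "total_expenditure_on_parks_and_recreation_1992"
    names(fix_stata_names(df))  # "total_expenditure_on_parks_and_r"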

    From the Census, "The Annual Survey of State Government Finances provides a comprehensive summary of the annual survey findings for state governments, as well as data for individual states. The tables contain detail of revenue by source, expenditure by object and function, indebtedness by term, and assets by purpose." (link to this quote is below)

    Information from the U.S. Census about the data is available at https://www.census.gov/programs-surveys/state/about.html

  17. Data repository.

    • plos.figshare.com
    xlsx
    Updated Mar 27, 2024
    Kayleigh R. Cook; Zebenay B. Zeleke; Ephrem Gebrehana; Daniel Burssa; Bantalem Yeshanew; Atkilt Michael; Yoseph Tediso; Taylor Jaraczewski; Chris Dodgion; Andualem Beyene; Katherine R. Iverson (2024). Data repository. [Dataset]. http://doi.org/10.1371/journal.pgph.0002600.s002
    Available download formats: xlsx
    Dataset updated
    Mar 27, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Kayleigh R. Cook; Zebenay B. Zeleke; Ephrem Gebrehana; Daniel Burssa; Bantalem Yeshanew; Atkilt Michael; Yoseph Tediso; Taylor Jaraczewski; Chris Dodgion; Andualem Beyene; Katherine R. Iverson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In 2015, the Ethiopian Federal Ministry of Health (FMOH) developed the Saving Lives through Safe Surgery (SaLTS) initiative to improve national surgical care. Previous work led to the development and implementation of 15 surgical key performance indicators (KPIs) to standardize surgical data practices. The objective of this project is to investigate current practices of KPI data collection and assess data quality in order to improve data management and strengthen surgical systems. The first portion of the study documented the surgical data collection process, including methods, instruments, and effectiveness, at 10 hospitals across 2 regions in Ethiopia. Second, data for the KPIs of focus [1. Surgical Volume, 2. Perioperative Mortality Rate (POMR), 3. Adverse Anesthetic Outcome (AAO), 4. Surgical Site Infection (SSI), and 5. Safe Surgery Checklist (SSC) Utilization] were compared between registries, KPI reporting forms, and the DHIS2 (district health information system) electronic database for a 6-month period (January-June 2022). Quality was assessed based on data completeness and consistency. The data collection process involved hospital staff recording data elements in registries, and quality officers calculating KPIs, completing monthly KPI reporting forms, and submitting data into DHIS2 for the national and regional health bureaus. Data quality verifications revealed consistency discrepancies at all hospitals, ranging from 1-3 indicators. Across all hospitals, average monthly surgical volume was 57 cases, POMR was 0.38% (13/3399), the inpatient SSI rate was 0.79% (27/3399), the AAO rate was 0.15% (5/3399), and mean monthly SSC utilization was 93% (median 100%). Half of the hospitals had incomplete data within the registries, ranging from 2-5 indicators; AAO, SSC, and SSI were the indicators most commonly missing data in registries. Non-standardized KPI reporting forms contributed significantly to these findings. Facilitators of quality data collection included continued use of registries from previous interventions and use of a separate logbook to document specific KPIs. Delayed rollout of these indicators in each region contributed to issues in data quality. Barriers included variable indicator recording by different personnel, data collection tools that generate false positives (e.g. completeness of the SSC defined as the paper form being filled out prior to patient discharge) or missing data because of the reporting time period (e.g. monthly SSI reporting may miss infections that present outside the one-month window), inadequate data elements in registries, and the lack of standardized monthly KPI reporting forms. As the FMOH introduces new indicators and changes, we recommend continuous and consistent quality checks and data capacity building, including the use of routinely generated health information for quality improvement projects at the department level.


A further note on dataset 1 above (Example subjects for Mobilise-D data standardization): the code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available on github (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
