R and Python libraries for the standardization of data extraction and analysis from NHANES.
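This entry does not name the specific libraries; purely as an illustration, standardized NHANES extraction in R can look like the following sketch using the CRAN package nhanesA (table names shown are from the 2017-2018 cycle).

```r
# Illustration only: nhanesA is one R option for standardized NHANES extraction;
# it is not necessarily the library this entry refers to.
library(nhanesA)

demo <- nhanes("DEMO_J")              # 2017-2018 demographics table
bpx  <- nhanes("BPX_J")               # 2017-2018 blood pressure examination table
dat  <- merge(demo, bpx, by = "SEQN") # join on the respondent sequence number
```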
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review at Scientific Data. Please refer to that publication for further information, and please cite it if using these data.
The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
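A minimal sketch of opening one of the standardized Matlab files in R; the file name is hypothetical, and this assumes the files are saved in a .mat version supported by R.matlab (the repository above contains the reference code).

```r
library(R.matlab)

# hypothetical file name; readMat cannot read files saved as MATLAB v7.3
mat <- readMat("standardized_subject.mat")
str(mat, max.level = 2)  # inspect the standardized data structure
```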
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Fisheries management is generally based on age-structured models; fish ageing data are therefore collected by experts who analyze and interpret calcified structures (scales, vertebrae, fin rays, otoliths, etc.) through a visual process. The otolith, in the inner ear of the fish, is the most commonly used calcified structure because it is metabolically inert and was historically one of the first proxies developed. It contains information covering the whole life of the fish and provides age structure data for stock assessments of all commercial species. The traditional human reading method for age determination is very time-consuming. Automated image analysis can be a low-cost alternative; however, the first step is the transformation of routinely taken otolith images into standardized images within a database so that machine learning techniques can be applied to the ageing data. Otolith shape, resulting from the combination of genetic heritage and environmental effects, is a useful tool to identify stock units, so a database of standardized images could also serve this aim. Using the routinely measured otolith data of plaice (Pleuronectes platessa; Linnaeus, 1758) and striped red mullet (Mullus surmuletus; Linnaeus, 1758) in the eastern English Channel and north-east Arctic cod (Gadus morhua; Linnaeus, 1758), a matrix of greyscale images was generated from the raw images in different formats. Contour detection was then applied to identify broken otoliths, the orientation of each otolith, and the number of otoliths per image. To finalize this standardization process, all images were resized and binarized. Several mathematical morphology tools were developed from these new images to align and orient them, placing the otoliths in the same layout in every image. For this study, we used three databases from two different laboratories covering three species (cod, plaice and striped red mullet). The method proved suitable for these three species and could be applied to other species for age determination and stock identification.
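The publication describes the pipeline conceptually rather than as code; the sketch below, using the imager package and a hypothetical file name, only illustrates the greyscale conversion, binarization, otolith counting and resizing steps.

```r
library(imager)

standardize_otolith <- function(path, out_size = 256) {
  im   <- load.image(path)        # raw otolith image in any common format
  gray <- grayscale(im)           # greyscale image matrix
  bin  <- threshold(gray)         # automatic binarization (pixset)
  n_otoliths <- length(split_connected(bin))  # connected components per image
  std  <- resize(gray, size_x = out_size, size_y = out_size)  # common size
  list(image = std, n_otoliths = n_otoliths)
}

# result <- standardize_otolith("otolith_001.png")
```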
We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: the R code is available online at https://github.com/warrenjl/SpGPCW.

Format:

Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description and permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary):
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
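A short sketch of loading the workspace and checking that its contents match the data dictionary above; the file name is assumed to be the same "Simulated_Dataset.RData" used in the companion entry below.

```r
load("Simulated_Dataset.RData")  # assumed file name

length(y)               # n binary responses
dim(x)                  # n x p covariate matrix
dim(z)                  # n x m standardized exposure matrix
c(n = n, m = m, p = p)  # should agree with the dimensions above
alpha_true              # "true" critical window locations/magnitudes
```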
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and for testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variance has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline-CAMPP) intended to aid bioinformatics software users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R package updates, an renv.lock file is provided to ensure R package stability. Data management includes missing-value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
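CAMPP's own implementation lives in the repository above; the following base-R sketch, with a hypothetical expression matrix, only illustrates the kind of data-management steps it automates (missing-value imputation, normalization, and a distributional check).

```r
# hypothetical genes-x-samples matrix with a few missing values
set.seed(1)
expr <- matrix(rlnorm(200), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:10)))
expr[sample(length(expr), 5)] <- NA

# impute missing values with the per-gene median
expr_imp <- t(apply(expr, 1, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}))

# log-transform and scale each sample (column)
expr_norm <- scale(log2(expr_imp + 1))

# distributional check before downstream analysis
boxplot(expr_norm, las = 2, main = "Per-sample distributions after normalization")
```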
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:

File format: R workspace file; "Simulated_Dataset.RData".

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description:
• "CWVS_LMC.txt": This code is delivered to the user as a .txt file containing R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
• "Results_Summary.txt": This code is also delivered as a .txt file containing R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Required R packages:
• For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
• For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

Instructions for use and reproducibility:
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the "Simulated_Dataset.RData" workspace.
• Run the code contained in "CWVS_LMC.txt".
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt".

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set.

Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

Description and permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
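The replication steps above amount to a short R session (a sketch assuming the three files are in the working directory and that the .txt files contain plain R code).

```r
# one-time installation of the packages listed above
install.packages(c("msm", "mnormt", "BayesLogit", "plotrix"))

load("Simulated_Dataset.RData")  # provides y, x, z, n, m, p, alpha_true
source("CWVS_LMC.txt")           # fit the LMC version of CWVS
source("Results_Summary.txt")    # summarize/plot critical windows and inclusion probabilities
```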
The Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image metadata such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. The dataset is designed to be exploratory and open-ended, with multiple use cases and no established training/validation/test splits. It is the largest public repository of fundus images with glaucoma.
Please cite at least the first work in academic publications:
1. Kiefer, Riley, et al. "A Catalog of Public Glaucoma Datasets for Machine Learning Applications: A detailed description and analysis of public glaucoma datasets available to machine learning engineers tackling glaucoma-related problems using retinal fundus images and OCT images." Proceedings of the 2023 7th International Conference on Information System and Data Mining, 2023.
2. R. Kiefer, M. Abid, M. R. Ardali, J. Steen and E. Amjadian, "Automated Fundus Image Standardization Using a Dynamic Global Foreground Threshold Algorithm," 2023 8th International Conference on Image, Vision and Computing (ICIVC), Dalian, China, 2023, pp. 460-465, doi: 10.1109/ICIVC58118.2023.10270429.
3. R. Kiefer, J. Steen, M. Abid, M. R. Ardali and E. Amjadian, "A Survey of Glaucoma Detection Algorithms using Fundus and OCT Images," 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 2022, pp. 0191-0196, doi: 10.1109/IEMCON56893.2022.9946629.
Please also see the following optometry abstract publications:
1. A Comprehensive Survey of Publicly Available Glaucoma Datasets for Automated Glaucoma Detection; AAO 2022; https://aaopt.org/past-meeting-abstract-archives/?SortBy=ArticleYear&ArticleType=&ArticleYear=2022&Title=&Abstract=&Authors=&Affiliation=&PROGRAMNUMBER=225129
2. Standardized and Open-Access Glaucoma Dataset for Artificial Intelligence Applications; ARVO 2023; https://iovs.arvojournals.org/article.aspx?articleid=2790420
3. Ground truth validation of publicly available datasets utilized in artificial intelligence models for glaucoma detection; ARVO 2023; https://iovs.arvojournals.org/article.aspx?articleid=2791017
Please also see the DOI citations for this and related datasets:
1. SMDG: @dataset{smdg, title={SMDG, A Standardized Fundus Glaucoma Dataset}, url={https://www.kaggle.com/ds/2329670}, DOI={10.34740/KAGGLE/DS/2329670}, publisher={Kaggle}, author={Riley Kiefer}, year={2023} }
2. EyePACS-light-v1: @dataset{eyepacs-light-v1, title={Glaucoma Dataset: EyePACS AIROGS - Light}, url={https://www.kaggle.com/ds/3222646}, DOI={10.34740/KAGGLE/DS/3222646}, publisher={Kaggle}, author={Riley Kiefer}, year={2023} }
3. EyePACS-light-v2: @dataset{eyepacs-light-v2, title={Glaucoma Dataset: EyePACS-AIROGS-light-V2}, url={https://www.kaggle.com/dsv/7300206}, DOI={10.34740/KAGGLE/DSV/7300206}, publisher={Kaggle}, author={Riley Kiefer}, year={2023} }
The objective of this dataset is to provide a machine-learning-ready resource for glaucoma-related applications. With the help of the community, new open-source glaucoma datasets will be reviewed for standardization and inclusion in this dataset.
The dataset documentation also includes a table of example instances (e.g., sjchoi86-HRF and BEH) showing each original fundus image next to its standardized version; the images are not reproduced here.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
The large-scale analysis of thousands of proteins under various experimental conditions or in mutant lines has gained more and more importance in hypothesis-driven scientific research and systems biology in the past years. Quantitative analysis by large-scale proteomics using modern mass spectrometry usually results in long lists of peptide ion intensities. The main interest for most researchers, however, is to draw conclusions on the protein level. Postprocessing and combining peptide intensities of a proteomic data set requires expert knowledge, and the often repetitive and standardized manual calculations can be time-consuming. The analysis of complex samples can result in very large data sets (lists with several thousand to 100,000 entries of different peptides) that cannot easily be analyzed using standard spreadsheet programs. To improve the speed and consistency of the analysis of LC-MS-derived proteomic data, we developed cRacker. cRacker is an R-based program for automated downstream proteomic data analysis, including data normalization strategies for metabolic labeling and label-free quantitation. In addition, cRacker includes basic statistical analysis, such as clustering of data, or ANOVA and t-tests for comparison between treatments. Results are presented in editable graphic formats and in list files.
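cRacker's code is not reproduced here; as a sketch of the downstream steps it automates, the base-R example below (with a hypothetical long-format peptide table) aggregates peptide intensities to the protein level and compares treatments with a t-test.

```r
# hypothetical peptide-level intensities for two proteins across four samples
peptides <- data.frame(
  protein   = rep(c("P1", "P2"), each = 8),
  sample    = rep(paste0("s", 1:4), times = 4),
  group     = rep(rep(c("ctrl", "trt"), each = 2), times = 4),
  intensity = rlnorm(16)
)

# combine peptide intensities to the protein level (median per protein and sample)
prot <- aggregate(intensity ~ protein + sample + group, data = peptides, FUN = median)

# simple per-protein comparison between treatments
by(prot, prot$protein, function(d) t.test(intensity ~ group, data = d))
```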
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.
A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we propose a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and the bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, the relative sum of weights (SWi) and the standardized beta (β*), to compare their ability with that of WiBB to rank predictor importance under various scenarios. We further applied WiBB to an empirical dataset of the plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results on the simulated datasets showed that the WiBB method outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB on the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance, and hence in reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy addition to the statistical toolbox.
Methods: To simulate independent datasets (size = 1000), we adopted the approach of Galipaud et al. (2014) with custom modifications of the data.simulation function, which uses the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5; Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero. We simulated datasets with three levels of difference in the correlation coefficients of consecutive predictors, ∆r = 0.1, 0.2, and 0.3. These three levels of ∆r resulted in three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0). We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total) for later LM fitting. For GLM fitting, we modified the simulation procedure with additional steps, converting the continuous response into binary data O (e.g., occurrence data with 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, the relative sum of weights (SWi) and the standardized beta (β*), to evaluate their ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was assembled from occurrence coordinates and corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors of their geographical distributions.
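A sketch of this simulation design for the ∆r = 0.2 structure (for simplicity the predictors are left mutually uncorrelated, which may differ from the authors' exact data.simulation settings).

```r
library(mvtnorm)

r <- c(0.6, 0.4, 0.2, 0.0)            # preset correlations of x1..x4 with y
sigma <- diag(5)                      # order of variables: y, x1, x2, x3, x4
sigma[1, 2:5] <- sigma[2:5, 1] <- r

set.seed(1)
dat <- as.data.frame(rmvnorm(1000, mean = rep(0, 5), sigma = sigma))
names(dat) <- c("y", "x1", "x2", "x3", "x4")

fit_lm  <- lm(y ~ x1 + x2 + x3 + x4, data = dat)                       # LM case
dat$occ <- rbinom(nrow(dat), 1, plogis(dat$y))                         # binarized response
fit_glm <- glm(occ ~ x1 + x2 + x3 + x4, family = binomial, data = dat) # GLM case
```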
This dataset includes the following files:
A PDF file containing the method naming standards survey questions we used in Qualtrics to survey professional developers. The file contains the Likert-scale questions and source code examples used in the survey.
A CSV file containing professional developers' responses to the Likert-scale questions and their feedback about each method naming standard, as well as their answers to the demographic questions.
A PDF copy of the survey paper (preprint).
Survey paper citation: Alsuhaibani, R., Newman, C., Decker, M., Collard, M.L., Maletic, J.I., "On the Naming of Methods: A Survey of Professional Developers," in Proceedings of the 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, May 25-28, 2021, 12 pages.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This repository is for: "Meta-analysis of variation suggests that embracing variability improves both replicability and generalizability in preclinical research".
1. The Main analysis folder contains data (.rds) and R code (.R) for the meta-regressions of lnCV, lnRR and lnCVR reported in the main manuscript (Figs 1-3).
2. The Supplementary folder contains data (.rds) and R code (.R) for: i) the second-order meta-regression of lnH; ii) the arm-based meta-regression of lnSD; iii) sensitivity analyses of lnCV, lnRR and lnCVR; and iv) the raw data for plotting the mean-variance relationship.
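The repository's own scripts are in the .R files; as an illustration of this type of analysis (not the authors' exact models), lnRR and lnCVR effect sizes and a meta-regression can be computed with the metafor package on a hypothetical two-arm summary table.

```r
library(metafor)

# hypothetical per-study means, SDs and sample sizes for treatment and control arms
dat <- data.frame(
  m1i = c(5.2, 6.1, 4.9, 7.3, 5.5, 6.8), sd1i = c(1.1, 1.4, 0.9, 1.8, 1.2, 1.5),
  n1i = c(12, 15, 10, 20, 14, 16),
  m2i = c(4.8, 5.0, 4.5, 6.0, 5.1, 5.9), sd2i = c(1.0, 1.2, 0.8, 1.5, 1.1, 1.3),
  n2i = c(12, 15, 10, 20, 14, 16),
  moderator = c(0, 1, 0, 1, 0, 1)
)

rr  <- escalc(measure = "ROM", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)  # lnRR
cvr <- escalc(measure = "CVR", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)  # lnCVR

rma(yi, vi, mods = ~ moderator, data = rr)  # random-effects meta-regression
```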
License: MIT License, https://opensource.org/licenses/MIT
• This dataset contains expression matrix handling and normalization results derived from GEO dataset GSE32138.
• It includes raw gene expression values processed using standardized bioinformatics workflows.
• The dataset demonstrates quantile normalization applied to microarray-based expression data.
• It provides visualization outputs used to assess data distribution before and after normalization.
• The goal of this dataset is to support reproducible analysis of GSE32138 preprocessing and quality control.
• Researchers can use the files for practice in normalization, exploratory data analysis, and visualization.
• This dataset is useful for learning microarray preprocessing techniques in R or Python.
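The exact scripts behind these files are not included in this description; the sketch below illustrates the same kind of preprocessing with GEOquery and limma (whether the extra log2 step is needed depends on how the series matrix was processed).

```r
library(GEOquery)
library(Biobase)
library(limma)

gse  <- getGEO("GSE32138", GSEMatrix = TRUE)[[1]]  # ExpressionSet for the series
expr <- exprs(gse)                                 # expression matrix

expr_norm <- normalizeBetweenArrays(log2(expr + 1), method = "quantile")

# assess distributions before and after normalization
boxplot(log2(expr + 1), las = 2, main = "Before quantile normalization")
boxplot(expr_norm,      las = 2, main = "After quantile normalization")
```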
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Introduction: Behavioral and Psychological Symptoms of Dementia (BPSD) are a heterogeneous set of psychological reactions and abnormal behaviors in people with dementia (PwD). Current assessment tools, like the Neuropsychiatric Inventory (NPI), rely only on caregiver assessment of BPSD and are therefore prone to bias.
Materials and methods: A multidisciplinary team developed the BPSD-SINDEM scale as a three-part instrument, with two questionnaires administered to the caregiver (evaluating BPSD extent and caregiver distress) and a clinician-rated observational scale. This first instrument was tested on a sample of 33 dyads of PwD and their caregivers, and the results were qualitatively appraised in order to revise the tool through a modified Delphi method. During this phase, the wording of the questions was slightly changed, and the distress scale was changed into a coping scale based on the high correlation between extent and distress (r = 0.94). The final version consisted of three 17-item subscales, evaluating BPSD extent and caregiver coping, and the unchanged clinician-rated observational scale.
Results: This tool was quantitatively validated in a sample of 208 dyads. It demonstrated good concurrent validity, with the extent subscale correlating positively with NPI scores (r = 0.64, p
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
User-modifiable specifications read by R to create interactive graphics, referred to as Panel 1, Panels 2–3, and Maps 1–2.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
BIEN data validation and standardization tools.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The data source soilmap_simple is a simplified and standardized derived form of the 'digital soil map of the Flemish Region' (the shapefile of which we named soilmap, for analytical workflows in R) published by 'Databank Ondergrond Vlaanderen' (DOV). It is a GeoPackage that contains a spatial polygon layer 'soilmap_simple' in the Belgian Lambert 72 coordinate reference system (EPSG code 31370), plus a non-spatial table 'explanations' with the meaning of category codes that occur in the spatial layer. Further documentation about the digital soil map of the Flemish Region is available in Van Ranst & Sys (2000) and Dudal et al. (2005).
This version of soilmap_simple was derived from version 'soilmap_2017-06-20' (Zenodo DOI) as follows:
all attribute variables received English names (purpose of standardization), starting with prefix bsm_ (referring to the 'Belgian soil map');
attribute variables were reordered;
the values of the morphogenetic substrate, texture and drainage variables (bsm_mo_substr, bsm_mo_tex and bsm_mo_drain + their _explan counterparts) were filled for most features in the 'coastal plain' area.
To derive morphogenetic texture and drainage levels from the geomorphological soil types, a conversion table by Bruno De Vos & Carole Ampe was applied (for earlier work on this, see Ampe 2013).
Substrate classes were copied over from bsm_ge_substr into bsm_mo_substr (bsm_ge_substr already followed the categories of bsm_mo_substr).
These steps coincide with the approach that had been taken to construct the Unitype variable in the soilmap data source;
only a minimal number of variables were selected: those that are most useful for analytical work.
See the R code in the GitHub repository 'n2khab-preprocessing' at commit b3c6696 for the creation of this data source from the soilmap data source.
A reading function that returns soilmap_simple (this data source) or soilmap into the R environment in a standardized way is provided by the R package n2khab.
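Besides the n2khab reading function, the GeoPackage can also be opened directly with sf (a minimal sketch; the file path is hypothetical).

```r
library(sf)

soilmap_simple <- st_read("soilmap_simple.gpkg", layer = "soilmap_simple")
explanations   <- st_read("soilmap_simple.gpkg", layer = "explanations")  # non-spatial table

st_crs(soilmap_simple)             # should report Belgian Lambert 72 (EPSG:31370)
table(soilmap_simple$bsm_mo_tex)   # texture categories; decode with `explanations`
```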
Attributes of the spatial polygon layer soilmap_simple carry mo_ in their name when they refer to the Belgian Morphogenetic System:
bsm_poly_id: unique polygon ID (numeric)
bsm_region: name of the region
bsm_converted: boolean. Were morphogenetic texture and drainage variables (bsm_mo_tex and bsm_mo_drain) derived from a conversion table (see above)? Value TRUE is largely confined to the 'coastal plain' areas.
bsm_mo_soilunitype: code of the soil type (applying morphogenetic codes within the coastal plain areas when possible, just as for the following three variables)
bsm_mo_substr: code of the soil substrate
bsm_mo_tex: code of the soil texture category
bsm_mo_drain: code of the soil drainage category
bsm_mo_prof: code of the soil profile category
bsm_mo_parentmat: code of a variant regarding the parent material
bsm_mo_profvar: code of a variant regarding the soil profile
The non-spatial table explanations has the following variables:
subject: attribute name of the spatial layer: either bsm_mo_substr, bsm_mo_tex, bsm_mo_drain, bsm_mo_prof, bsm_mo_parentmat or bsm_mo_profvar
code: category code that occurs as value for the corresponding attribute in the spatial layer
name: explanation of the value of code
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Description of the general genotyping sheet variables.
Correlation results include means from 23 cultivars, across eight pair combinations of location (Maine/Oregon), season (Fall/Spring) and management system (Conventional/Organic), 2006–2008. For empty cells, r is not significantly different from zero (P < 0.05).
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Mortality site investigations of telemetered wildlife are important for cause-specific survival analyses and understanding underlying causes of observed population dynamics. Yet eroding ecoliteracy and a lack of quality control in data collection can lead researchers to make incorrect conclusions, which may negatively impact management decisions for wildlife populations. We reviewed a random sample of 50 peer-reviewed studies published between 2000 and 2019 on survival and cause-specific mortality of ungulates monitored with telemetry devices. This concise review revealed extensive variation in reporting of field procedures, with many studies omitting critical information for cause of mortality inference. Field protocols used to investigate mortality sites and ascertain the cause of mortality are often minimally described and frequently fail to address how investigators dealt with uncertainty. We outline a step-by-step procedure for mortality site investigations of telemetered ungulates, including evidence that should be documented in the field. Specifically, we highlight data that can be useful to differentiate predation from scavenging and more conclusively identify the predator species that killed the ungulate. We also outline how uncertainty in identifying the cause of mortality could be acknowledged and reported. We demonstrate the importance of rigorous protocols and prompt site investigations using data from our 5-year study on survival and cause-specific mortality of telemetered mule deer (Odocoileus hemionus) in northern California. Over the course of our study, we visited mortality sites of neonates (n = 91) and adults (n = 23) to ascertain the cause of mortality. Rapid site visitations significantly improved the successful identification of the cause of mortality and confidence levels for neonates. We discuss the need for rigorous and standardized protocols that include measures of confidence for mortality site investigations. We invite reviewers and journal editors to encourage authors to provide supportive information associated with the identification of causes of mortality, including uncertainty.

Methods: Three datasets on neonate and adult mule deer (Odocoileus hemionus) mortality site investigations were generated through ecological fieldwork in northern California, USA (2015-2020). The datasets in Dryad are: Does.csv (for use with R); Fawns.csv (for use with R); and Full_data.xlsx (which combines the two .csv files and includes additional information). Two R code files associated with the two .csv datasets above are available in Zenodo: RScript_Does.R and RScript_Fawns.R. The data were analyzed using RStudio v.1.1.447 and a variety of packages, including broom, caret, ciTools, effects, lattice, modEvA, nnet, and tidyverse. The data are associated with the publication "Standardizing protocols for determining the cause of mortality in wildlife studies" in Ecology and Evolution.
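The authors' full analysis is in RScript_Does.R and RScript_Fawns.R on Zenodo; the minimal sketch below only loads the Dryad files alongside the packages listed above (the column contents are not described in this entry, so the code just reads and inspects the data).

```r
# install.packages(c("broom", "caret", "ciTools", "effects",
#                    "lattice", "modEvA", "nnet", "tidyverse"))
library(tidyverse)

does  <- read_csv("Does.csv")    # adult female mortality-site data
fawns <- read_csv("Fawns.csv")   # neonate mortality-site data

glimpse(does)
glimpse(fawns)
```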
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
A variant of the description of the data structure for modeling tree mortality is given in Kachaev (2020).
Time series of primary tree-ring measurements are used directly to compute logistic regression models of tree mortality (Cailleret et al., 2016).
To expand the number of variables available for modeling tree mortality, we introduce derived time series, calculated using tree-ring standardization methods (Bunn, 2010) and empirical mode decomposition (Donghoh and Hee-Seok, 2018).
There are currently six standardization (detrending) methods available in the dplR library (Bunn, 2010): smoothing spline (Spline), modified negative exponential curve (ModNegExp), mean (Mean), autoregressive model residuals (Ar), Friedman smoothing (Friedman), and modified Hugershoff curve (ModHugershoff). Standardized time series are inserted into the tree data structure with the addition of the "Tdetr" object: ["Spline", "ModNegExp", "Mean", "Ar", "Friedman", "ModHugershoff"].
The empirical mode decomposition method is implemented in the EMD library (Donghoh and Hee-Seok, 2018). The algorithm decomposes the original time series into a set of time series IMFn (empirical modes) plus a residual series; the sum of the empirical modes and the residual series reproduces the original series. The set of time series (empirical modes with the residual series) is inserted into the tree data structure with the addition of the "Temd" object: ["imf1", "imf2", "imf3", "imf4", "res", "low", "high"]. Denoting the original series as Series, low = Series - (imf1 + imf2) and high = Series - (imf3 + imf4); these are the series obtained by low-frequency and high-frequency filtering of the original series, respectively.
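A sketch of producing these derived series in R (the ring-width object comes from dplR's bundled ca533 example, not from this archive, and the number of IMFs returned by emd depends on the series).

```r
library(dplR)
library(EMD)

data(ca533)                                  # example ring-width data shipped with dplR
rwi <- detrend(ca533, method = "ModNegExp")  # any of the six methods listed above

series <- as.numeric(na.omit(ca533[[1]]))    # one raw ring-width series
dec    <- emd(series)                        # empirical mode decomposition
imfs   <- dec$imf                            # matrix of IMFs (imf1, imf2, ...)
res    <- dec$residue                        # residual series

# low-frequency filtered series as defined above (assumes at least two IMFs)
low <- series - rowSums(imfs[, 1:2, drop = FALSE])
```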
References:
Bunn A.G. (2010). Statistical and visual crossdating in R using the dplR library. Dendrochronologia, 28(4), 251-258. ISSN 1125-7865. doi:10.1016/j.dendro.2009.12.001. URL http://linkinghub.elsevier.com/retrieve/pii/S1125786510000172.
Cailleret, Maxime et al. (2016), Data from: Towards a common methodology for developing logistic tree mortality models based on ring-width data, Dryad, Dataset, https://doi.org/10.5061/dryad.1bv6n
Donghoh Kim and Hee-Seok Oh (2018) EMD: Empirical Mode Decomposition and Hilbert Spectral Analysis. R package version 1.5.8.
Kachaev, Alexander (2020), “Tree ring growth data in Json format for the development of logistic tree mortality models.”, Mendeley Data, V1, doi: 10.17632/3vht95njg3.1