R and Python libraries for the standardization of data extraction and analysis from NHANES.
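This entry does not name the specific libraries; purely as an illustration, standardized NHANES extraction in R can look like the following sketch using the CRAN package nhanesA (table names shown are from the 2017-2018 cycle).

```r
# Illustration only: nhanesA is one R option for standardized NHANES extraction;
# it is not necessarily the library this entry refers to.
library(nhanesA)

demo <- nhanes("DEMO_J")              # 2017-2018 demographics table
bpx  <- nhanes("BPX_J")               # 2017-2018 blood pressure examination table
dat  <- merge(demo, bpx, by = "SEQN") # join on the respondent sequence number
```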
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review at Scientific Data. Please refer to that publication for further information, and please cite it if using these data.
The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
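A minimal sketch of opening one of the standardized Matlab files in R; the file name is hypothetical, and this assumes the files are saved in a .mat version supported by R.matlab (the repository above contains the reference code).

```r
library(R.matlab)

# hypothetical file name; readMat cannot read files saved as MATLAB v7.3
mat <- readMat("standardized_subject.mat")
str(mat, max.level = 2)  # inspect the standardized data structure
```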
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Fisheries management is generally based on age-structured models; fish ageing data are therefore collected by experts who analyze and interpret calcified structures (scales, vertebrae, fin rays, otoliths, etc.) through a visual process. The otolith, in the inner ear of the fish, is the most commonly used calcified structure because it is metabolically inert and was historically one of the first proxies developed. It contains information covering the whole life of the fish and provides age structure data for stock assessments of all commercial species. The traditional human reading method for age determination is very time-consuming. Automated image analysis can be a low-cost alternative; however, the first step is the transformation of routinely taken otolith images into standardized images within a database so that machine learning techniques can be applied to the ageing data. Otolith shape, resulting from the combination of genetic heritage and environmental effects, is a useful tool to identify stock units, so a database of standardized images could also serve this aim. Using the routinely measured otolith data of plaice (Pleuronectes platessa; Linnaeus, 1758) and striped red mullet (Mullus surmuletus; Linnaeus, 1758) in the eastern English Channel and north-east Arctic cod (Gadus morhua; Linnaeus, 1758), a matrix of greyscale images was generated from the raw images in different formats. Contour detection was then applied to identify broken otoliths, the orientation of each otolith, and the number of otoliths per image. To finalize this standardization process, all images were resized and binarized. Several mathematical morphology tools were developed from these new images to align and orient them, placing the otoliths in the same layout in every image. For this study, we used three databases from two different laboratories covering three species (cod, plaice and striped red mullet). The method proved suitable for these three species and could be applied to other species for age determination and stock identification.
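The publication describes the pipeline conceptually rather than as code; the sketch below, using the imager package and a hypothetical file name, only illustrates the greyscale conversion, binarization, otolith counting and resizing steps.

```r
library(imager)

standardize_otolith <- function(path, out_size = 256) {
  im   <- load.image(path)        # raw otolith image in any common format
  gray <- grayscale(im)           # greyscale image matrix
  bin  <- threshold(gray)         # automatic binarization (pixset)
  n_otoliths <- length(split_connected(bin))  # connected components per image
  std  <- resize(gray, size_x = out_size, size_y = out_size)  # common size
  list(image = std, n_otoliths = n_otoliths)
}

# result <- standardize_otolith("otolith_001.png")
```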
We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: the R code is available online at https://github.com/warrenjl/SpGPCW.

Format:

Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description and permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary):
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
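A short sketch of loading the workspace and checking that its contents match the data dictionary above; the file name is assumed to be the same "Simulated_Dataset.RData" used in the companion entry below.

```r
load("Simulated_Dataset.RData")  # assumed file name

length(y)               # n binary responses
dim(x)                  # n x p covariate matrix
dim(z)                  # n x m standardized exposure matrix
c(n = n, m = m, p = p)  # should agree with the dimensions above
alpha_true              # "true" critical window locations/magnitudes
```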
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
With the improvement of -omics and next-generation sequencing (NGS) methodologies, along with the lowered cost of generating these types of data, the analysis of high-throughput biological data has become standard both for forming and for testing biomedical hypotheses. Our knowledge of how to normalize datasets to remove latent undesirable variance has grown extensively, making for standardized data that are easily compared between studies. Here we present the CAncer bioMarker Prediction Pipeline (CAMPP), an open-source R-based wrapper (https://github.com/ELELAB/CAncer-bioMarker-Prediction-Pipeline-CAMPP) intended to aid bioinformatics software users with data analyses. CAMPP is called from a terminal command line and is supported by a user-friendly manual. The pipeline may be run on a local computer and requires little or no knowledge of programming. To avoid issues relating to R package updates, an renv.lock file is provided to ensure R package stability. Data management includes missing-value imputation, data normalization, and distributional checks. CAMPP performs (I) k-means clustering, (II) differential expression/abundance analysis, (III) elastic-net regression, (IV) correlation and co-expression network analyses, (V) survival analysis, and (VI) protein-protein/miRNA-gene interaction networks. The pipeline returns tabular files and graphical representations of the results. We hope that CAMPP will assist in streamlining bioinformatic analysis of quantitative biological data, whilst ensuring an appropriate bio-statistical framework.
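CAMPP's own implementation lives in the repository above; the following base-R sketch, with a hypothetical expression matrix, only illustrates the kind of data-management steps it automates (missing-value imputation, normalization, and a distributional check).

```r
# hypothetical genes-x-samples matrix with a few missing values
set.seed(1)
expr <- matrix(rlnorm(200), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:10)))
expr[sample(length(expr), 5)] <- NA

# impute missing values with the per-gene median
expr_imp <- t(apply(expr, 1, function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}))

# log-transform and scale each sample (column)
expr_norm <- scale(log2(expr_imp + 1))

# distributional check before downstream analysis
boxplot(expr_norm, las = 2, main = "Per-sample distributions after normalization")
```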
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:

File format: R workspace file; "Simulated_Dataset.RData".

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description:
• "CWVS_LMC.txt": This code is delivered to the user as a .txt file containing R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the code in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
• "Results_Summary.txt": This code is also delivered as a .txt file containing R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Required R packages:
• For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
• For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

Instructions for use and reproducibility:
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the "Simulated_Dataset.RData" workspace.
• Run the code contained in "CWVS_LMC.txt".
• Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt".

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set.

Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

Description and permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics, Oxford University Press, Oxford, UK, 1-30 (2019).
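The replication steps above amount to a short R session (a sketch assuming the three files are in the working directory and that the .txt files contain plain R code).

```r
# one-time installation of the packages listed above
install.packages(c("msm", "mnormt", "BayesLogit", "plotrix"))

load("Simulated_Dataset.RData")  # provides y, x, z, n, m, p, alpha_true
source("CWVS_LMC.txt")           # fit the LMC version of CWVS
source("Results_Summary.txt")    # summarize/plot critical windows and inclusion probabilities
```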
The Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image metadata such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. The dataset is designed to be exploratory and open-ended, with multiple use cases and no established training/validation/test splits. It is the largest public repository of fundus images with glaucoma.
Please cite at least the first work in academic publications:
1. Kiefer, Riley, et al. "A Catalog of Public Glaucoma Datasets for Machine Learning Applications: A detailed description and analysis of public glaucoma datasets available to machine learning engineers tackling glaucoma-related problems using retinal fundus images and OCT images." Proceedings of the 2023 7th International Conference on Information System and Data Mining, 2023.
2. R. Kiefer, M. Abid, M. R. Ardali, J. Steen and E. Amjadian, "Automated Fundus Image Standardization Using a Dynamic Global Foreground Threshold Algorithm," 2023 8th International Conference on Image, Vision and Computing (ICIVC), Dalian, China, 2023, pp. 460-465, doi: 10.1109/ICIVC58118.2023.10270429.
3. R. Kiefer, J. Steen, M. Abid, M. R. Ardali and E. Amjadian, "A Survey of Glaucoma Detection Algorithms using Fundus and OCT Images," 2022 IEEE 13th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 2022, pp. 0191-0196, doi: 10.1109/IEMCON56893.2022.9946629.
Please also see the following optometry abstract publications:
1. A Comprehensive Survey of Publicly Available Glaucoma Datasets for Automated Glaucoma Detection; AAO 2022; https://aaopt.org/past-meeting-abstract-archives/?SortBy=ArticleYear&ArticleType=&ArticleYear=2022&Title=&Abstract=&Authors=&Affiliation=&PROGRAMNUMBER=225129
2. Standardized and Open-Access Glaucoma Dataset for Artificial Intelligence Applications; ARVO 2023; https://iovs.arvojournals.org/article.aspx?articleid=2790420
3. Ground truth validation of publicly available datasets utilized in artificial intelligence models for glaucoma detection; ARVO 2023; https://iovs.arvojournals.org/article.aspx?articleid=2791017
Please also see the DOI citations for this and related datasets:
1. SMDG: @dataset{smdg, title={SMDG, A Standardized Fundus Glaucoma Dataset}, url={https://www.kaggle.com/ds/2329670}, DOI={10.34740/KAGGLE/DS/2329670}, publisher={Kaggle}, author={Riley Kiefer}, year={2023} }
2. EyePACS-light-v1: @dataset{eyepacs-light-v1, title={Glaucoma Dataset: EyePACS AIROGS - Light}, url={https://www.kaggle.com/ds/3222646}, DOI={10.34740/KAGGLE/DS/3222646}, publisher={Kaggle}, author={Riley Kiefer}, year={2023} }
3. EyePACS-light-v2: @dataset{eyepacs-light-v2, title={Glaucoma Dataset: EyePACS-AIROGS-light-V2}, url={https://www.kaggle.com/dsv/7300206}, DOI={10.34740/KAGGLE/DSV/7300206}, publisher={Kaggle}, author={Riley Kiefer}, year={2023} }
The objective of this dataset is to provide a machine-learning-ready resource for glaucoma-related applications. With the help of the community, new open-source glaucoma datasets will be reviewed for standardization and inclusion in this dataset.
The dataset documentation also includes a table of example instances (e.g., sjchoi86-HRF and BEH) showing each original fundus image next to its standardized version; the images are not reproduced here.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
The large-scale analysis of thousands of proteins under various experimental conditions or in mutant lines has gained more and more importance in hypothesis-driven scientific research and systems biology in the past years. Quantitative analysis by large-scale proteomics using modern mass spectrometry usually results in long lists of peptide ion intensities. The main interest for most researchers, however, is to draw conclusions on the protein level. Postprocessing and combining peptide intensities of a proteomic data set requires expert knowledge, and the often repetitive and standardized manual calculations can be time-consuming. The analysis of complex samples can result in very large data sets (lists with several thousand to 100,000 entries of different peptides) that cannot easily be analyzed using standard spreadsheet programs. To improve the speed and consistency of the analysis of LC-MS-derived proteomic data, we developed cRacker. cRacker is an R-based program for automated downstream proteomic data analysis, including data normalization strategies for metabolic labeling and label-free quantitation. In addition, cRacker includes basic statistical analysis, such as clustering of data, or ANOVA and t-tests for comparison between treatments. Results are presented in editable graphic formats and in list files.
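cRacker's code is not reproduced here; as a sketch of the downstream steps it automates, the base-R example below (with a hypothetical long-format peptide table) aggregates peptide intensities to the protein level and compares treatments with a t-test.

```r
# hypothetical peptide-level intensities for two proteins across four samples
peptides <- data.frame(
  protein   = rep(c("P1", "P2"), each = 8),
  sample    = rep(paste0("s", 1:4), times = 4),
  group     = rep(rep(c("ctrl", "trt"), each = 2), times = 4),
  intensity = rlnorm(16)
)

# combine peptide intensities to the protein level (median per protein and sample)
prot <- aggregate(intensity ~ protein + sample + group, data = peptides, FUN = median)

# simple per-protein comparison between treatments
by(prot, prot$protein, function(d) t.test(intensity ~ group, data = d))
```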
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.
A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we propose a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and the bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, the relative sum of weights (SWi) and the standardized beta (β*), to compare their ability with that of WiBB to rank predictor importance under various scenarios. We further applied WiBB to an empirical dataset of the plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results on the simulated datasets showed that the WiBB method outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When testing WiBB on the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance, and hence in reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy addition to the statistical toolbox.
Methods: To simulate independent datasets (size = 1000), we adopted the approach of Galipaud et al. (2014) with custom modifications of the data.simulation function, which uses the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5; Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero. We simulated datasets with three levels of difference in the correlation coefficients of consecutive predictors, ∆r = 0.1, 0.2, and 0.3. These three levels of ∆r resulted in three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0). We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total) for later LM fitting. For GLM fitting, we modified the simulation procedure with additional steps, converting the continuous response into binary data O (e.g., occurrence data with 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, the relative sum of weights (SWi) and the standardized beta (β*), to evaluate their ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was assembled from occurrence coordinates and corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors of their geographical distributions.
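A sketch of this simulation design for the ∆r = 0.2 structure (for simplicity the predictors are left mutually uncorrelated, which may differ from the authors' exact data.simulation settings).

```r
library(mvtnorm)

r <- c(0.6, 0.4, 0.2, 0.0)            # preset correlations of x1..x4 with y
sigma <- diag(5)                      # order of variables: y, x1, x2, x3, x4
sigma[1, 2:5] <- sigma[2:5, 1] <- r

set.seed(1)
dat <- as.data.frame(rmvnorm(1000, mean = rep(0, 5), sigma = sigma))
names(dat) <- c("y", "x1", "x2", "x3", "x4")

fit_lm  <- lm(y ~ x1 + x2 + x3 + x4, data = dat)                       # LM case
dat$occ <- rbinom(nrow(dat), 1, plogis(dat$y))                         # binarized response
fit_glm <- glm(occ ~ x1 + x2 + x3 + x4, family = binomial, data = dat) # GLM case
```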
This dataset includes the following files:
A PDF file containing the method naming standards survey questions we used in Qualtrics to survey professional developers. The file contains the Likert-scale questions and source code examples used in the survey.
A CSV file containing professional developers' responses to the Likert-scale questions and their feedback about each method naming standard, as well as their answers to the demographic questions.
A PDF copy of the survey paper (preprint).
Survey paper citation: Alsuhaibani, R., Newman, C., Decker, M., Collard, M.L., Maletic, J.I., "On the Naming of Methods: A Survey of Professional Developers," in Proceedings of the 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, May 25-28, 2021, 12 pages.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This repository is for: "Meta-analysis of variation suggests that embracing variability improves both replicability and generalizability in preclinical research".
1. The Main analysis folder contains data (.rds) and R code (.R) for the meta-regressions of lnCV, lnRR and lnCVR reported in the main manuscript (Figs 1-3).
2. The Supplementary folder contains data (.rds) and R code (.R) for: i) the second-order meta-regression of lnH; ii) the arm-based meta-regression of lnSD; iii) sensitivity analyses of lnCV, lnRR and lnCVR; and iv) the raw data for plotting the mean-variance relationship.
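The repository's own scripts are in the .R files; as an illustration of this type of analysis (not the authors' exact models), lnRR and lnCVR effect sizes and a meta-regression can be computed with the metafor package on a hypothetical two-arm summary table.

```r
library(metafor)

# hypothetical per-study means, SDs and sample sizes for treatment and control arms
dat <- data.frame(
  m1i = c(5.2, 6.1, 4.9, 7.3, 5.5, 6.8), sd1i = c(1.1, 1.4, 0.9, 1.8, 1.2, 1.5),
  n1i = c(12, 15, 10, 20, 14, 16),
  m2i = c(4.8, 5.0, 4.5, 6.0, 5.1, 5.9), sd2i = c(1.0, 1.2, 0.8, 1.5, 1.1, 1.3),
  n2i = c(12, 15, 10, 20, 14, 16),
  moderator = c(0, 1, 0, 1, 0, 1)
)

rr  <- escalc(measure = "ROM", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)  # lnRR
cvr <- escalc(measure = "CVR", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)  # lnCVR

rma(yi, vi, mods = ~ moderator, data = rr)  # random-effects meta-regression
```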
License: MIT License, https://opensource.org/licenses/MIT
• This dataset contains expression matrix handling and normalization results derived from GEO dataset GSE32138.
• It includes raw gene expression values processed using standardized bioinformatics workflows.
• The dataset demonstrates quantile normalization applied to microarray-based expression data.
• It provides visualization outputs used to assess data distribution before and after normalization.
• The goal of this dataset is to support reproducible analysis of GSE32138 preprocessing and quality control.
• Researchers can use the files for practice in normalization, exploratory data analysis, and visualization.
• This dataset is useful for learning microarray preprocessing techniques in R or Python.
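The exact scripts behind these files are not included in this description; the sketch below illustrates the same kind of preprocessing with GEOquery and limma (whether the extra log2 step is needed depends on how the series matrix was processed).

```r
library(GEOquery)
library(Biobase)
library(limma)

gse  <- getGEO("GSE32138", GSEMatrix = TRUE)[[1]]  # ExpressionSet for the series
expr <- exprs(gse)                                 # expression matrix

expr_norm <- normalizeBetweenArrays(log2(expr + 1), method = "quantile")

# assess distributions before and after normalization
boxplot(log2(expr + 1), las = 2, main = "Before quantile normalization")
boxplot(expr_norm,      las = 2, main = "After quantile normalization")
```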
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Introduction: Behavioral and Psychological Symptoms of Dementia (BPSD) are a heterogeneous set of psychological reactions and abnormal behaviors in people with dementia (PwD). Current assessment tools, like the Neuropsychiatric Inventory (NPI), rely only on caregiver assessment of BPSD and are therefore prone to bias.
Materials and methods: A multidisciplinary team developed the BPSD-SINDEM scale as a three-part instrument, with two questionnaires administered to the caregiver (evaluating BPSD extent and caregiver distress) and a clinician-rated observational scale. This first instrument was tested on a sample of 33 dyads of PwD and their caregivers, and the results were qualitatively appraised in order to revise the tool through a modified Delphi method. During this phase, the wording of the questions was slightly changed, and the distress scale was changed into a coping scale based on the high correlation between extent and distress (r = 0.94). The final version consisted of three 17-item subscales, evaluating BPSD extent and caregiver coping, and the unchanged clinician-rated observational scale.
Results: This tool was quantitatively validated in a sample of 208 dyads. It demonstrated good concurrent validity, with the extent subscale correlating positively with NPI scores (r = 0.64, p
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
User-modifiable specifications read by R to create interactive graphics, referred to as Panel 1, Panels 2–3, and Maps 1–2.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
BIEN data validation and standardization tools.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The data source soilmap_simple is a simplified and standardized derived form of the 'digital soil map of the Flemish Region' (the shapefile of which we named soilmap, for analytical workflows in R) published by 'Databank Ondergrond Vlaanderen' (DOV). It is a GeoPackage that contains a spatial polygon layer 'soilmap_simple' in the Belgian Lambert 72 coordinate reference system (EPSG code 31370), plus a non-spatial table 'explanations' with the meaning of category codes that occur in the spatial layer. Further documentation about the digital soil map of the Flemish Region is available in Van Ranst & Sys (2000) and Dudal et al. (2005).
This version of soilmap_simple was derived from version 'soilmap_2017-06-20' (Zenodo DOI) as follows:
all attribute variables received English names (purpose of standardization), starting with prefix bsm_ (referring to the 'Belgian soil map');
attribute variables were reordered;
the values of the morphogenetic substrate, texture and drainage variables (bsm_mo_substr, bsm_mo_tex and bsm_mo_drain + their _explan counterparts) were filled for most features in the 'coastal plain' area.
To derive morphogenetic texture and drainage levels from the geomorphological soil types, a conversion table by Bruno De Vos & Carole Ampe was applied (for earlier work on this, see Ampe 2013).
Substrate classes were copied over from bsm_ge_substr into bsm_mo_substr (bsm_ge_substr already followed the categories of bsm_mo_substr).
These steps coincide with the approach that had been taken to construct the Unitype variable in the soilmap data source;
only a minimal number of variables were selected: those that are most useful for analytical work.
See the R code in the GitHub repository 'n2khab-preprocessing' at commit b3c6696 for the creation of this data source from the soilmap data source.
A reading function that returns soilmap_simple (this data source) or soilmap into the R environment in a standardized way is provided by the R package n2khab.
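Besides the n2khab reading function, the GeoPackage can also be opened directly with sf (a minimal sketch; the file path is hypothetical).

```r
library(sf)

soilmap_simple <- st_read("soilmap_simple.gpkg", layer = "soilmap_simple")
explanations   <- st_read("soilmap_simple.gpkg", layer = "explanations")  # non-spatial table

st_crs(soilmap_simple)             # should report Belgian Lambert 72 (EPSG:31370)
table(soilmap_simple$bsm_mo_tex)   # texture categories; decode with `explanations`
```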
Attributes of the spatial polygon layer soilmap_simple carry mo_ in their name when they refer to the Belgian Morphogenetic System:
bsm_poly_id: unique polygon ID (numeric)
bsm_region: name of the region
bsm_converted: boolean. Were morphogenetic texture and drainage variables (bsm_mo_tex and bsm_mo_drain) derived from a conversion table (see above)? Value TRUE is largely confined to the 'coastal plain' areas.
bsm_mo_soilunitype: code of the soil type (applying morphogenetic codes within the coastal plain areas when possible, just as for the following three variables)
bsm_mo_substr: code of the soil substrate
bsm_mo_tex: code of the soil texture category
bsm_mo_drain: code of the soil drainage category
bsm_mo_prof: code of the soil profile category
bsm_mo_parentmat: code of a variant regarding the parent material
bsm_mo_profvar: code of a variant regarding the soil profile
The non-spatial table explanations has the following variables:
subject: attribute name of the spatial layer: either bsm_mo_substr, bsm_mo_tex, bsm_mo_drain, bsm_mo_prof, bsm_mo_parentmat or bsm_mo_profvar
code: category code that occurs as value for the corresponding attribute in the spatial layer
name: explanation of the value of code
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Description of the general genotyping sheet variables.
Correlation results include means from 23 cultivars, across eight pair combinations of location (Maine/Oregon), season (Fall/Spring) and management system (Conventional/Organic), 2006–2008. For empty cells, r is not significantly different from zero (P < 0.05).
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Mortality site investigations of telemetered wildlife are important for cause-specific survival analyses and understanding underlying causes of observed population dynamics. Yet eroding ecoliteracy and a lack of quality control in data collection can lead researchers to make incorrect conclusions, which may negatively impact management decisions for wildlife populations. We reviewed a random sample of 50 peer-reviewed studies published between 2000 and 2019 on survival and cause-specific mortality of ungulates monitored with telemetry devices. This concise review revealed extensive variation in reporting of field procedures, with many studies omitting critical information for cause of mortality inference. Field protocols used to investigate mortality sites and ascertain the cause of mortality are often minimally described and frequently fail to address how investigators dealt with uncertainty. We outline a step-by-step procedure for mortality site investigations of telemetered ungulates, including evidence that should be documented in the field. Specifically, we highlight data that can be useful to differentiate predation from scavenging and more conclusively identify the predator species that killed the ungulate. We also outline how uncertainty in identifying the cause of mortality could be acknowledged and reported. We demonstrate the importance of rigorous protocols and prompt site investigations using data from our 5-year study on survival and cause-specific mortality of telemetered mule deer (Odocoileus hemionus) in northern California. Over the course of our study, we visited mortality sites of neonates (n = 91) and adults (n = 23) to ascertain the cause of mortality. Rapid site visitations significantly improved the successful identification of the cause of mortality and confidence levels for neonates. We discuss the need for rigorous and standardized protocols that include measures of confidence for mortality site investigations. We invite reviewers and journal editors to encourage authors to provide supportive information associated with the identification of causes of mortality, including uncertainty.

Methods: Three datasets on neonate and adult mule deer (Odocoileus hemionus) mortality site investigations were generated through ecological fieldwork in northern California, USA (2015-2020). The datasets in Dryad are: Does.csv (for use with R); Fawns.csv (for use with R); and Full_data.xlsx (which combines the two .csv files and includes additional information). Two R code files associated with the two .csv datasets above are available in Zenodo: RScript_Does.R and RScript_Fawns.R. The data were analyzed using RStudio v.1.1.447 and a variety of packages, including broom, caret, ciTools, effects, lattice, modEvA, nnet, and tidyverse. The data are associated with the publication "Standardizing protocols for determining the cause of mortality in wildlife studies" in Ecology and Evolution.
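The authors' full analysis is in RScript_Does.R and RScript_Fawns.R on Zenodo; the minimal sketch below only loads the Dryad files alongside the packages listed above (the column contents are not described in this entry, so the code just reads and inspects the data).

```r
# install.packages(c("broom", "caret", "ciTools", "effects",
#                    "lattice", "modEvA", "nnet", "tidyverse"))
library(tidyverse)

does  <- read_csv("Does.csv")    # adult female mortality-site data
fawns <- read_csv("Fawns.csv")   # neonate mortality-site data

glimpse(does)
glimpse(fawns)
```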
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
A variant of the description of the data structure for modeling tree mortality is given in Kachaev (2020).
Time series of primary tree-ring measurements are used directly to compute logistic regression models of tree mortality (Cailleret et al., 2016).
To expand the number of variables available for modeling tree mortality, we introduce derived time series, calculated using tree-ring standardization methods (Bunn, 2010) and empirical mode decomposition (Donghoh and Hee-Seok, 2018).
There are currently six standardization (detrending) methods available in the dplR library (Bunn, 2010): smoothing spline (Spline), modified negative exponential curve (ModNegExp), mean (Mean), autoregressive model residuals (Ar), Friedman smoothing (Friedman), and modified Hugershoff curve (ModHugershoff). Standardized time series are inserted into the tree data structure with the addition of the "Tdetr" object: ["Spline", "ModNegExp", "Mean", "Ar", "Friedman", "ModHugershoff"].
The empirical mode decomposition method is implemented in the EMD library (Donghoh and Hee-Seok, 2018). The algorithm decomposes the original time series into a set of time series IMFn (empirical modes) plus a residual series; the sum of the empirical modes and the residual series reproduces the original series. The set of time series (empirical modes with the residual series) is inserted into the tree data structure with the addition of the "Temd" object: ["imf1", "imf2", "imf3", "imf4", "res", "low", "high"]. Denoting the original series as Series, low = Series - (imf1 + imf2) and high = Series - (imf3 + imf4); these are the series obtained by low-frequency and high-frequency filtering of the original series, respectively.
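A sketch of producing these derived series in R (the ring-width object comes from dplR's bundled ca533 example, not from this archive, and the number of IMFs returned by emd depends on the series).

```r
library(dplR)
library(EMD)

data(ca533)                                  # example ring-width data shipped with dplR
rwi <- detrend(ca533, method = "ModNegExp")  # any of the six methods listed above

series <- as.numeric(na.omit(ca533[[1]]))    # one raw ring-width series
dec    <- emd(series)                        # empirical mode decomposition
imfs   <- dec$imf                            # matrix of IMFs (imf1, imf2, ...)
res    <- dec$residue                        # residual series

# low-frequency filtered series as defined above (assumes at least two IMFs)
low <- series - rowSums(imfs[, 1:2, drop = FALSE])
```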
References:
Bunn A.G. (2010). Statistical and visual crossdating in R using the dplR library. Dendrochronologia, 28(4), 251-258. ISSN 1125-7865. doi:10.1016/j.dendro.2009.12.001. URL http://linkinghub.elsevier.com/retrieve/pii/S1125786510000172.
Cailleret, Maxime et al. (2016), Data from: Towards a common methodology for developing logistic tree mortality models based on ring-width data, Dryad, Dataset, https://doi.org/10.5061/dryad.1bv6n
Donghoh Kim and Hee-Seok Oh (2018) EMD: Empirical Mode Decomposition and Hilbert Spectral Analysis. R package version 1.5.8.
Kachaev, Alexander (2020), “Tree ring growth data in Json format for the development of logistic tree mortality models.”, Mendeley Data, V1, doi: 10.17632/3vht95njg3.1