45 datasets found

f
Evaluating Functional Diversity: Missing Trait Data and the Importance of...
plos.figshare.com
docx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello (2023). Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation [Dataset]. http://doi.org/10.1371/journal.pone.0149270
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0149270
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
Supplement 1. R code demonstrating how to fit a logistic regression model,...
wiley.figshare.com
figshare.com
html
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David I. Warton; Francis K. C. Hui (2023). Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference. [Dataset]. http://doi.org/10.6084/m9.figshare.3550407.v1
Explore at:
htmlAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3550407.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Wileyhttps://www.wiley.com/
Authors
David I. Warton; Francis K. C. Hui
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
File List glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data. boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic. Description glmm.R is some example R code which show how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on some randomly generated data, which are generated in such a way that overdispersion is evident. This code could be directly applied for your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (for number of binomial successes and failures), “species” (a label for the different rows in the dataset), and where we want to test for the effect of some predictor variable called “location”. In other cases, just change the labels and formula as appropriate. boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.
Statistical analysis for: Mode I fracture of beech-adhesive bondline at...
zenodo.org
data.niaid.nih.gov
bin, csv, html, txt
Updated Oct 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael Burnard; Michael Burnard; Jaka Gašper Pečnik; Jaka Gašper Pečnik (2022). Statistical analysis for: Mode I fracture of beech-adhesive bondline at three different temperatures [Dataset]. http://doi.org/10.5281/zenodo.6839197
Explore at:
csv, html, bin, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6839197
Dataset updated
Oct 4, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Michael Burnard; Michael Burnard; Jaka Gašper Pečnik; Jaka Gašper Pečnik
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset collects a raw dataset and a processed dataset derived from the raw dataset. There is a document containing the analytical code for statistical analysis of the processed dataset in .Rmd format and .html format.

The study examined some aspects of mechanical performance of solid wood composites. We were interested in certain properties of solid wood composites made using different adhesives with different grain orientations at the bondline, then treated at different temperatures prior to testing.

Performance was tested by assessing fracture energy and critical fracture energy, lap shear strength, and compression strength of the composites. This document concerns only the fracture properties, which are the focus of the related paper.

Notes:

* the raw data is provided in this upload, but the processing is not addressed here.
* the authors of this document are a subset of the authors of the related paper.
* this document and the related data files were uploaded at the time of submission for review. An update providing the doi of the related paper will be provided when it is available.
n
Data from: Generalizable EHR-R-REDCap pipeline for a national...
data.niaid.nih.gov
explore.openaire.eu
+2more
zip
Updated Jan 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.rjdfn2zcm
Dataset updated
Jan 9, 2022
Dataset provided by
Massachusetts General Hospital
Harvard Medical School
Authors
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

Methods eLAB Development and Source Code (R statistical software):

eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown ((https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.

Data Dictionary (DD)

EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.

Study Cohort

This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

Statistical Analysis

OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
t
Solar self-sufficient households as a driving factor for sustainability...
service.tib.eu
Updated Nov 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Solar self-sufficient households as a driving factor for sustainability transformation - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/luh-solar-self-sufficient-households-as-a-driving-factor-for-sustainability-transformation
Explore at:
Dataset updated
Nov 14, 2024
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
To get the consumption model from Section 3.1, one needs load execute the file consumption_data.R. Load the data for the 3 Phases ./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv, transform the data and build the model (starting line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,). To reproduce the figures one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures. To calculate the solar radiation that is needed in the Section Production Data, follow file calculate_total_radiation.R. To reproduce the radiation data from from ERA5, that can be found in data.zip, do the following steps: 1. ERA5 - download the reanalysis datasets as GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo". 2. convert GRIB to csv with the file era5toGRID.sh 3. convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R
F
Data from: Solar self-sufficient households as a driving factor for...
data.uni-hannover.de
.zip, r, rdata +2
Updated Dec 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institut für Kartographie und Geoinformatik (2024). Solar self-sufficient households as a driving factor for sustainability transformation [Dataset]. https://data.uni-hannover.de/eu/dataset/19503682-5752-4352-97f6-511ae31d97df
Explore at:
rdata(426), rdata(1024592), r(21968), txt(1431), rdata(408277), text/x-sh(183), .zip, r(63854), r(24773), r(3406), r(6280)Available download formats
Dataset updated
Dec 12, 2024
Dataset authored and provided by
Institut für Kartographie und Geoinformatik
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
To get the consumption model from Section 3.1, one needs load execute the file consumption_data.R. Load the data for the 3 Phases ./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv, transform the data and build the model (starting line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata

To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,).

To reproduce the figures one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures.

To calculate the solar radiation that is needed in the Section Production Data, follow file calculate_total_radiation.R.

To reproduce the radiation data from from ERA5, that can be found in data.zip, do the following steps: 1. ERA5 - download the reanalysis datasets as GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo". 2. convert GRIB to csv with the file era5toGRID.sh 3. convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R
Z
Transformations in PubChem - Full Dataset
data.niaid.nih.gov
Updated Mar 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cheng, Tiejun (2025). Transformations in PubChem - Full Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5644560
Explore at:
Dataset updated
Mar 14, 2025
Dataset provided by
Cheng, Tiejun
Helmus, Rick
Zhang, Jian (Jeff)
Blanke, Gerd
Schymanski, Emma
Thiessen, Paul
Bolton, Evan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is an archive of the data contained in the "Transformations" section in PubChem for integration into patRoon and other workflows.

For further details see the ECI GitLab site: README and main "tps" folder.

Credits:

Concepts: E Schymanski, E Bolton, J Zhang, T Cheng;

Code (in R): E Schymanski, R Helmus, P Thiessen

Transformations: E Schymanski, J Zhang, T Cheng and many contributors to various lists!

PubChem infrastructure: PubChem team

Reaction InChI (RInChI) calculations (v1.0): Gerd Blanke (previous versions of these files)

Acknowledgements: ECI team who contributed to related efforts, especially: J. Krier, A. Lai, M. Narayanan, T. Kondic, P. Chirsir, E. Palm. All contributors to the NORMAN-SLE transformations!

March 2025 released as v0.2.0 since the dataset grew by >3000 entries! The stats are:

14 March 2025

Unique Transformation Entries: 10904# Unique Reactions by CID: 9152# Unique Reactions by IK: 9139# Unique Reactions by IKFB: 8574# Unique NORMAN-SLE Compounds by CID: 8207# Unique ChEMBL Compounds by CID: 1419# Unique Compounds (all) by CID: 9267# Unique Predecessors (all) by CID: 3724# Unique Successors (all) by CID: 7331# Range of XlogP Differences: -9.9,10# Range of Mass Differences: -957.97490813,820.227106427
D
The Response Scale Transformation Project
ssh.datastations.nl
datacatalogue.cessda.eu
ods, odt +3
Updated Dec 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
de . de Jonge; de . de Jonge; R. Veenhoven; R. Veenhoven (2020). The Response Scale Transformation Project [Dataset]. http://doi.org/10.17026/DANS-ZX5-P7PE
Explore at:
odt(29645), text/x-fixed-field(22452), ods(11942), ods(11139), ods(12335), ods(13098), text/x-fixed-field(59602), ods(11639), text/x-fixed-field(132938), ods(17048), ods(11734), ods(28694), text/x-fixed-field(1966), ods(14745), ods(11407), ods(14469), ods(12777), ods(12209), text/x-fixed-field(24328), ods(11706), text/x-fixed-field(50572), text/x-fixed-field(18563), odt(9924), text/x-fixed-field(10883), text/x-fixed-field(202843), text/x-fixed-field(2386), ods(97328), text/x-fixed-field(11385), ods(102829), tsv(72547), ods(12341), ods(10873), ods(14458), text/x-fixed-field(10660), ods(38470), text/x-fixed-field(20572), ods(11725), text/x-fixed-field(8864), text/x-fixed-field(165989), ods(14478), text/x-fixed-field(152269), ods(28686), text/x-fixed-field(24767), text/x-fixed-field(2484), text/x-fixed-field(2201), zip(75713), ods(16139), text/x-fixed-field(4196), text/x-fixed-field(14402), odt(923518), ods(11649), text/x-fixed-field(141193), ods(12861), ods(11589), ods(12903), ods(11757), text/x-fixed-field(49805), text/x-fixed-field(12389), text/x-fixed-field(16732), text/x-fixed-field(195127), text/x-fixed-field(122450), ods(11357), text/x-fixed-field(2288), ods(110889), text/x-fixed-field(4853), odt(14649), text/x-fixed-field(1928), ods(12818), ods(12681), ods(11897), text/x-fixed-field(20730), text/x-fixed-field(82219), ods(12707), ods(12159), ods(12189), text/x-fixed-field(12852), odt(866474), ods(12251), text/x-fixed-field(110342), ods(12822), ods(11213), text/x-fixed-field(56990), ods(11821), ods(11480), ods(103685), ods(11803), odt(128432), text/x-fixed-field(19990), ods(12672), ods(12570), text/x-fixed-field(15210), ods(12086), text/x-fixed-field(27258), odt(48839), text/x-fixed-field(3925), ods(12771), tsv(8015), tsv(1382), tsv(1156), tsv(55966), tsv(15059), tsv(22016), tsv(108268), tsv(39979), tsv(117623), tsv(176555), tsv(96630), tsv(13894), tsv(8531), tsv(10586), tsv(2795), tsv(1843), tsv(145500), tsv(879), tsv(9895), tsv(3784), tsv(123192), tsv(136784), tsv(12070), tsv(11112), tsv(17100), tsv(15037), tsv(15159), tsv(1230), tsv(938), tsv(10114), tsv(17282), tsv(3351), tsv(18397), tsv(24102), tsv(43030), tsv(20907), tsv(47877), tsv(173744)Available download formats
Unique identifier
https://doi.org/10.17026/DANS-ZX5-P7PE
Dataset updated
Dec 9, 2020
Dataset provided by
DANS Data Station Social Sciences and Humanities
Authors
de . de Jonge; de . de Jonge; R. Veenhoven; R. Veenhoven
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In this project we have reviewed existing methods used to homogenize data and developed several new methods for dealing with this diversity in survey questions on the same subject. The project is a spin-off from the World Database of Happiness, the main aim of which is to collate and make available research findings on the subjective enjoyment of life and to prepare these data for research synthesis. The first methods we discuss were proposed in the book ‘Happiness in Nations’ and which were used at the inception of the World Database of Happiness. Some 10 years later a new method was introduced: the International Happiness Scale Interval Study (HSIS). Taking the HSIS as a basis the Continuum Approach was developed. Then, building on this approach, we developed the Reference Distribution Method.
g
R-scripts for uncertainty analysis v01
gimi9.com
researchdata.edu.au
+2more
Updated Apr 13, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). R-scripts for uncertainty analysis v01 [Dataset]. https://gimi9.com/dataset/au_322c38ef-272f-4e77-964c-a14259abe9cf/
Explore at:
Dataset updated
Apr 13, 2022
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Abstract This dataset was created within the Bioregional Assessment Programme. Data has not been derived from any source datasets. Metadata has been compiled by the Bioregional Assessment Programme. This dataset contains a set of generic R scripts that are used in the propagation of uncertainty through numerical models. ## Dataset History The dataset contains a set of R scripts that are loaded as a library. The R scripts are used to carry out the propagation of uncertainty through numerical models. The scripts contain the functions to create the statistical emulators and do the necessary data transformations and backtransformations. The scripts are self-documenting and created by Dan Pagendam (CSIRO) and Warren Jin (CSIRO). ## Dataset Citation Bioregional Assessment Programme (2016) R-scripts for uncertainty analysis v01. Bioregional Assessment Source Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/322c38ef-272f-4e77-964c-a14259abe9cf.
Data from: Data and scripts associated with a manuscript investigating...
osti.gov
Updated Feb 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Arnon, Shai; Bar-Zeev, Edo; Borton, Mikayla A.; Brooks, Scott; Chu, Rosalie; Danczak, Robert E.; Garayburu-Caruso, Vanessa A.; Goldman, Amy E.; Graham, Emily B.; Jones, Michael; Jones, Nikki; Lewandowski, Jorg; Meile, Christof; Morad, Joseph W.; Muller, Birgit M.; Powers-McCormack, Beck; Renteria, Lupita; Schalles, John; Schulz, Hanna; Stegen, James C.; Toyoda, Jason G.; Ward, Adam; Wells, Jacqueline R. (2024). Data and scripts associated with a manuscript investigating dissolved organic matter and microbial community linkages across seven globally distributed rivers [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2319037
Explore at:
Dataset updated
Feb 20, 2024
Dataset provided by
Office of Sciencehttp://www.er.doe.gov/
United States Department of Energyhttp://energy.gov/
33.1805,35.6156|33.1805,35.6156|33.1805,35.6156|33.1805,35.6156|33.1805,35.615652.4764,13.6257|52.4764,13.6257|52.4764,13.6257|52.4764,13.6257|52.4764,13.625744.2065,-122.2566|44.2065,-122.2566|44.2065,-122.2566|44.2065,-122.2566|44.2065,-122.256631.3346,-81.4793|31.3346,-81.4793|31.3346,-81.4793|31.3346,-81.4793|31.3346,-81.479346.373,-119.272|46.373,-119.272|46.373,-119.272|46.373,-119.272|46.373,-119.27246.7386,-121.9181|46.7386,-121.9181|46.7386,-121.9181|46.7386,-121.9181|46.7386,-121.918135.9662,-84.3584|35.9662,-84.3584|35.9662,-84.3584|35.9662,-84.3584|35.9662,-84.3584
Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) (United States)
Authors
Arnon, Shai; Bar-Zeev, Edo; Borton, Mikayla A.; Brooks, Scott; Chu, Rosalie; Danczak, Robert E.; Garayburu-Caruso, Vanessa A.; Goldman, Amy E.; Graham, Emily B.; Jones, Michael; Jones, Nikki; Lewandowski, Jorg; Meile, Christof; Morad, Joseph W.; Muller, Birgit M.; Powers-McCormack, Beck; Renteria, Lupita; Schalles, John; Schulz, Hanna; Stegen, James C.; Toyoda, Jason G.; Ward, Adam; Wells, Jacqueline R.
Description
This data package is associated with the publication “Meta-metabolome ecology reveals that geochemistry and microbial functional potential are linked to organic matter development across seven rivers” submitted to Science of the Total Environment. This data package includes the data necessary to replicate the analyses presented within the manuscript to investigate dissolved organic matter (DOM) development across broad spatial distances and within divergent biomes. Specifically, we included the Fourier transform ion cyclotron mass spectrometry (FTICR-MS) data, geochemistry data, annotated metagenomic data, and results from ecological null modeling analyses in this data package. Additionally, we included the scripts necessary to generate the figures from the manuscript.Complete metagenomic data associated with this data package can be found at the National Center for Biotechnology (NCBI) under Bioproject PRJNA946291.This dataset consists of (1) four folders; (2) a file-level metadata (flmd) file; (3) a data dictionary (dd) file; (4) a factor sheet describing samples; and (5) a readme. The FTICR Data folder contains (1) the processed Fourier transform ion cyclotron mass spectrometry (FTICR-MS) data; (2) a transformation-weighted characteristics dendrogram generated from the FTICR-MS data; and (3) the script used to generate all FTICR-MS related figures. The Geochemical Data folder contains (1) the single geochemistry data filemore » and (2) the R script responsible for generating associated figures. The Metagenomic Data folder contains (1) annotation information across different levels; (2) carbohydrate active enzyme (CAZyme) information from the dbCAN database (Yin et al., 2012); (3) phylogenetic tree data (FASTAs, alignments, and tree file); and (4) the scripts necessary to analyze all of these data and generate figures. The Null Modeling Data folder contains (1) data generated during null modeling for each river and all rivers combined and (2) the R scripts necessary to process the data. All files are .csv, .pdf, .tsv, .tre, .faa, .afa, .tree, or .R.« less
f
Comparing spatial regression to random forests for large environmental data...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen (2023). Comparing spatial regression to random forests for large environmental data sets [Dataset]. http://doi.org/10.1371/journal.pone.0229509
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0229509
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates with nonlinear relationships, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records that are spatially autocorrelated. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. A primary application is mapping MMI predictions and prediction errors at 1.1 million perennial stream reaches across the conterminous United States. For the spatial regression model, we develop a novel transformation procedure that estimates Box-Cox transformations to linearize covariate relationships and handles possibly zero-inflated covariates. We find that the spatial regression model with transformations, and a subsequent selection of significant covariates, has cross-validation performance comparable to random forests. We also find that prediction interval coverage is close to nominal for each method, but that spatial regression prediction intervals tend to be narrower and have less variability than quantile regression forest prediction intervals. A simulation study is used to generalize results and clarify advantages of each modeling approach.
C2Metadata test files
openicpsr.org
spss, zip
Updated Aug 16, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
George Alter (2020). C2Metadata test files [Dataset]. http://doi.org/10.3886/E120642V1
Explore at:
spss, zipAvailable download formats
Unique identifier
https://doi.org/10.3886/E120642V1
Dataset updated
Aug 16, 2020
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
George Alter
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The C2Metadata (“Continuous Capture of Metadata”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. This repository provides examples of scripts and metadata for use in testing C2Metadata tools.
d
Data from: Cooperation and coexpression: how coexpression networks shift in...
datadryad.org
data.niaid.nih.gov
+3more
zip
Updated Mar 19, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sathvik X. Palakurty; John R. Stinchcombe; Michelle E. Afkhami (2018). Cooperation and coexpression: how coexpression networks shift in response to multiple mutualists [Dataset]. http://doi.org/10.5061/dryad.2hj343f
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.2hj343f
Dataset updated
Mar 19, 2018
Dataset provided by
Dryad
Authors
Sathvik X. Palakurty; John R. Stinchcombe; Michelle E. Afkhami
Time period covered
2018
Description
Differential Coexpression ScriptThis script contains the use of previously normalized data to execute the DiffCoEx computational pipeline on an experiment with four treatment groups.differentialCoexpression.rNormalized Transformed Expression Count DataNormalized, transformed expression count data of Medicago truncatula and mycorrhizal fungi is given as an R data frame where the columns denote different genes and rows denote different samples. This data is used for downstream differential coexpression analyses.Expression_Data.zipNormalization and Transformation of Raw Count Data ScriptRaw count data is transformed and normalized with available R packages and RNA-Seq best practices.dataPrep.rRaw_Count_Data_Mycorrhizal_FungiRaw count data from HtSeq for mycorrhizal fungi reads are later transformed and normalized for use in differential coexpression analysis. 'R+' indicates that the sample was obtained from a plant grown in the presence of both mycorrhizal fungi and rhizobia. 'R-' indicate...
Z
Data and R-scripts for "Land-use trajectories for sustainable land system...
data.niaid.nih.gov
Updated Oct 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dominic A. Martin (2021). Data and R-scripts for "Land-use trajectories for sustainable land system transformations: identifying leverage points in a global biodiversity hotspot" (V2) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4601599
Explore at:
Dataset updated
Oct 14, 2021
Dataset authored and provided by
Dominic A. Martin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sustainable land system transformations are necessary to avert biodiversity and climate collapse. However, it remains unclear where entry points for transformations exist in complex land systems. Here, we conceptualize land systems along land-use trajectories, which allows us to identify and evaluate leverage points; i.e., entry points on the trajectory where targeted interventions have particular leverage to influence land-use decisions. We apply this framework in the biodiversity hotspot Madagascar. In the Northeast, smallholder agriculture results in a land-use trajectory originating in old-growth forests, spanning forest fragments, and reaching shifting hill rice cultivation and vanilla agroforests. Integrating interdisciplinary empirical data on seven taxa, five ecosystem services, and three measures of agricultural productivity, we assess trade-offs and co-benefits of land-use decisions at three leverage points along the trajectory. These trade-offs and co-benefits differ between leverage points: two leverage points are situated at the conversion of old-growth forests and forest fragments to shifting cultivation and agroforestry, resulting in considerable trade-offs, especially between endemic biodiversity and agricultural productivity. Here, interventions enabling smallholders to conserve forests are necessary. This is urgent since ongoing forest loss threatens to eliminate these leverage points due to path-dependency. The third leverage point allows for the restoration of land under shifting cultivation through vanilla agroforests and offers co-benefits between restoration goals and agricultural productivity. The co-occurring leverage points highlight that conservation and restoration are simultaneously necessary. Methodologically, the framework shows how leverage points can be identified, evaluated, and harnessed for land system transformations under the consideration of path-dependency along trajectories.
t
Raw data, R scripts and R datasets for statistical analyses from the...
researchdata.tuwien.ac.at
bin, txt
Updated Oct 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hester Sheehan; Hester Sheehan; Negin Afsharzadeh; Negin Afsharzadeh (2024). Raw data, R scripts and R datasets for statistical analyses from the research article 'Advancing Glycyrrhiza glabra L. cultivation and hairy root transformation and elicitation for future metabolite overexpression' [Dataset]. http://doi.org/10.48436/jczhc-srh29
Explore at:
txt, binAvailable download formats
Unique identifier
https://doi.org/10.48436/jczhc-srh29
Dataset updated
Oct 31, 2024
Dataset provided by
TU Wien
Authors
Hester Sheehan; Hester Sheehan; Negin Afsharzadeh; Negin Afsharzadeh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset description

This dataset was created during the research carrried out for the PhD of Negin Afsharzadeh and the subsequent manuscript arising from this research. The main purpose of this dataset is to create a record of the raw data that was used in the analyses in the manuscript.

This dataset includes:

raw data generated from experiments stored in an Excel spreadsheet with each sheet corresponding to a specific experiment or part of an experiment (Afsharzadeh_et_al_2024.xlsx)

R script used to analyse the raw data in the software, R (Afsharzadeh_et_al.R)

datasets that were used to analyse the data in the statistical software, R (germindata.txt, light.txt)

Context and methodology

Brief description of experiments:

In this study, we aimed to optimize approaches to improve the biotechnological production of important metabolites in G. glabra. The study is made up of four experiments that correspond to particular figures/tables in the manuscript and data, as described below.

Experiment 1:

We tested approaches for the cultivation of G. glabra, specifically the breaking of seed dormancy, to ensure timely and efficient seed germination. To do this, we tested the effect of different pretreatments, sterilization treatments and growth media on the germination success of G. glabra.

This experiment corresponds to:

Manuscript: Table 1 and Figure 1

Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_1'); Afsharzadeh_et_al.R; germindata.txt

Experiment 2 (Table 2):

We aimed to optimize the induction of hairy roots in G. glabra. Four strains of R. rhizogenes were tested to identify the most effective strain for inducing hairy root formation and we tested different tissue explants (cotyledons/hypocotyls) and methods of R. rhizogenes infection (injection or soaking for different durations) in these tissues.

This experiment corresponds to:

Manuscript: Table 2

Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_2')

Experiment 3 (Figure 2):

Eight distinct hairy root lines were established and the growth rate of these lines was measured over 40 days.

This experiment corresponds to:

Manuscript: Figure 2, Table S2

Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Figure_2')

Experiment 4 (Figure 3):

We aimed to test different qualities of light on hairy root cultures in order to induce higher growth and possible enhanced metabolite production. A line with a high growth rate from experiment 3, line S, was selected for growth under different light treatments: red light, blue light, and a combination of blue and red light. To assess the overall impact of these treatments, the growth of line S, as well as the increase in antioxidant capacity and total phenolic content, were tracked over this induction period.

This experiment corresponds to:

Manuscript: Figure 3, Figure S4

Data: Afsharzadeh_et_al_2024.xlsx (Sheets 'Figure_3_FW', 'Figure_3_FRAP', 'Figure_3_Phenol'); Afsharzadeh_et_al.R; light.txt

Technical details

To work with the .R file and the R datasets, it is necessary to use R: A Language and Environment for Statistical Computing and a package within R, aDHARMA. The versions used for the analyses are R version 4.4.1 and aDHARMA version 0.4.6.

The references for these are:

R Core Team, R: A Language and Environment for Statistical Computing 2024. https://www.R-project.org/

Hartig F, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models 2022. https://CRAN.R-project.org/package=DHARMa
Data applied to automatic method to transform routine otolith images for a...
seanoe.org
image/*
Updated 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicolas Andrialovanirina; Alizee Hache; Kelig Mahe; Sébastien Couette; Emilie Poisson Caillault (2022). Data applied to automatic method to transform routine otolith images for a standardized otolith database using R [Dataset]. http://doi.org/10.17882/91023
Explore at:
image/*Available download formats
Unique identifier
https://doi.org/10.17882/91023
Dataset updated
2022
Dataset provided by
SEANOE
Authors
Nicolas Andrialovanirina; Alizee Hache; Kelig Mahe; Sébastien Couette; Emilie Poisson Caillault
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
fisheries management is generally based on age structure models. thus, fish ageing data are collected by experts who analyze and interpret calcified structures (scales, vertebrae, fin rays, otoliths, etc.) according to a visual process. the otolith, in the inner ear of the fish, is the most commonly used calcified structure because it is metabolically inert and historically one of the first proxies developed. it contains information throughout the whole life of the fish and provides age structure data for stock assessments of all commercial species. the traditional human reading method to determine age is very time-consuming. automated image analysis can be a low-cost alternative method, however, the first step is the transformation of routinely taken otolith images into standardized images within a database to apply machine learning techniques on the ageing data. otolith shape, resulting from the synthesis of genetic heritage and environmental effects, is a useful tool to identify stock units, therefore a database of standardized images could be used for this aim. using the routinely measured otolith data of plaice (pleuronectes platessa; linnaeus, 1758) and striped red mullet (mullus surmuletus; linnaeus, 1758) in the eastern english channel and north-east arctic cod (gadus morhua; linnaeus, 1758), a greyscale images matrix was generated from the raw images in different formats. contour detection was then applied to identify broken otoliths, the orientation of each otolith, and the number of otoliths per image. to finalize this standardization process, all images were resized and binarized. several mathematical morphology tools were developed from these new images to align and to orient the images, placing the otoliths in the same layout for each image. for this study, we used three databases from two different laboratories using three species (cod, plaice and striped red mullet). this method was approved to these three species and could be applied for others species for age determination and stock identification.
U
Data sets for the Journal of Non-Crystalline Solids X: Article entitled...
researchdata.bath.ac.uk
search.datacite.org
c
Updated May 10, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Philip Salmon; Anita Zeidler (2019). Data sets for the Journal of Non-Crystalline Solids X: Article entitled "Pressure induced structural transformations in amorphous MgSiO_3 and CaSiO_3" [Dataset]. http://doi.org/10.15125/BATH-00601
Explore at:
cAvailable download formats
Unique identifier
https://doi.org/10.15125/BATH-00601
Dataset updated
May 10, 2019
Dataset provided by
University of Bath
Authors
Philip Salmon; Anita Zeidler
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
University of Bath
Institut Laue-Langevin
Royal Society
United States Department of Energy
Engineering and Physical Sciences Research Council
Japan Society for the Promotion of Science
Atomic Weapons Establishment
Description
Data sets used to prepare Figures 1-14 in the Journal of Non-Crystalline Solids X article entitled "Pressure induced structural transformations in amorphous MgSiO_3 and CaSiO_3." The files are labelled according to the figure numbers. The data sets were created using the methodology described in the manuscript. Each of the plots was drawn using QtGrace (https://sourceforge.net/projects/qtgrace/). The data set corresponding to a plotted curve within an QtGrace file can be identified by clicking on that curve. The units for each axis are identified on the plots.

Figure 1 shows the pressure-volume EOS at room temperature for amorphous and crystalline (a) MgSiO_3 and (b) CaSiO_3.

Figure 2 shows the pressure dependence of the neutron total structure factor S_{N}(k) for amorphous (a) MgSiO_3 and (b) CaSiO_3.

Figure 3 shows the pressure dependence of the neutron total pair-distribution function G_{N}(r) for amorphous (a) MgSiO_3 and (b) CaSiO_3.

Figure 4 shows the pressure dependence of several D′_{N}(r) functions for amorphous MgSiO_3 measured using the D4c diffractometer.

Figure 5 shows the pressure dependence of the Si-O coordination number in amorphous (a) MgSiO_3 and (b) CaSiO_3, the Si-O bond length in amorphous (c) MgSiO_3 and (d) CaSiO_3, and (e) the fraction of n-fold (n = 4, 5, or 6) coordinated Si atoms in these materials.

Figure 6 shows the pressure dependence of the M-O (a) coordination number and (b) bond length for amorphous MgSiO_3 and CaSiO_3.

Figure 7 shows the S_{N}(k) or S_{X}(k) functions for (a) MgSiO_3 and (b) CaSiO_3 after recovery from a pressure of 8.2 or 17.5 GPa.

Figure 8 shows the G_{N}(r) or G_{X}(r) functions for (a) MgSiO_3 and (b) CaSiO_3 after recovery from a pressure of 8.2 or 17.5 GPa.

Figure 9 shows the pressure dependence of the Q^n speciation for fourfold coordinated Si atoms in amorphous (a) MgSiO_3 and (b) CaSiO_3.

Figure 10 shows the pressure dependence in amorphous MgSiO_3 and CaSiO_3 of (a) the overall M-O coordination number and its contributions from M-BO and M-NBO connections, (b) the fractions of M-BO and M-NBO bonds, and (c) the associated M-BO and M-NBO bond distances.

Figure 11 shows the pressure dependence of the fraction of n-fold (n = 4, 5, 6, 7, 8, or 9) coordinated M atoms in amorphous (a) MgSiO_3 and (b) CaSiO_3.

Figure 12 shows the pressure dependence of the O-Si-O, Si-O-Si, Si-O-M, O-M-O and M-O-M bond angle distributions (M = Mg or Ca) for amorphous MgSiO_3 (left hand column) and CaSiO_3 (right hand column).

Figure 13 shows the pressure dependence of the q-parameter distributions for n-fold (n = 4, 5, or 6) coordinated Si atoms in amorphous (a) MgSiO_3 and (b) CaSiO_3.

Figure 14 shows the pressure dependence of the q-parameter distributions for the M atoms in amorphous MgSiO_3 (left hand column) and CaSiO_3 (right hand column).
R script with .csv file for the analysis of data for the paper titled...
figshare.com
txt
Updated Oct 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Macaulay Winter (2023). R script with .csv file for the analysis of data for the paper titled "Effect of Chemotherapeutic Agents on Natural Transformation Frequency in Acinetobacter baylyi." [Dataset]. http://doi.org/10.6084/m9.figshare.24468091.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24468091.v1
Dataset updated
Oct 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Macaulay Winter
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This research tested the effects of six chemotherapeutic drugs on the transformation frequency and growth rate of A. baylyi. Binomial models were constructed for each drug with emmeans postdoc tests to infer effect of drug concentration on A. baylyi behaviour. Significances suggested by the emmeans tests were annotated onto ggplot boxplots and assembled into one panel graph.Supplemental PDF is provided which displays relevant statistical outputs from data analysis.
Source data for Brabham et al. 2024 bioRxiv
figshare.com
application/x-gzip
Updated Mar 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Moscou; Helen Brabham (2025). Source data for Brabham et al. 2024 bioRxiv [Dataset]. http://doi.org/10.6084/m9.figshare.28680800.v1
Explore at:
application/x-gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28680800.v1
Dataset updated
Mar 27, 2025
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Matthew Moscou; Helen Brabham
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Source data for Brabham et al. (2024) bioRxiv that includes raw data, uncropped images, and scripts used for data analysis and figure preparation. https://doi.org/10.1101/2024.06.25.599845
f
Data from: Monte Carlo Inference for Semiparametric Bayesian Regression
tandf.figshare.com
zip
Updated Oct 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel R. Kowal; Bohan Wu (2024). Monte Carlo Inference for Semiparametric Bayesian Regression [Dataset]. http://doi.org/10.6084/m9.figshare.26863261.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.26863261.v2
Dataset updated
Oct 1, 2024
Dataset provided by
Taylor & Francis
Authors
Daniel R. Kowal; Bohan Wu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data transformations are essential for broad applicability of parametric regression models. However, for Bayesian analysis, joint inference of the transformation and model parameters typically involves restrictive parametric transformations or nonparametric representations that are computationally inefficient and cumbersome for implementation and theoretical analysis, which limits their usability in practice. This article introduces a simple, general, and efficient strategy for joint posterior inference of an unknown transformation and all regression model parameters. The proposed approach directly targets the posterior distribution of the transformation by linking it with the marginal distributions of the independent and dependent variables, and then deploys a Bayesian nonparametric model via the Bayesian bootstrap. Crucially, this approach delivers (a) joint posterior consistency under general conditions, including multiple model misspecifications, and (b) efficient Monte Carlo (not Markov chain Monte Carlo) inference for the transformation and all parameters for important special cases. These tools apply across a variety of data domains, including real-valued, positive, and compactly-supported data. Simulation studies and an empirical application demonstrate the effectiveness and efficiency of this strategy for semiparametric Bayesian analysis with linear models, quantile regression, and Gaussian processes. The R package SeBR is available on CRAN. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

Facebook

Twitter

Click to copy link

Link copied

Cite

Maria Májeková; Taavi Paal; Nichola S. Plowman; Michala Bryndová; Liis Kasari; Anna Norberg; Matthias Weiss; Tom R. Bishop; Sarah H. Luke; Katerina Sam; Yoann Le Bagousse-Pinguet; Jan Lepš; Lars Götzenberger; Francesco de Bello (2023). Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation [Dataset]. http://doi.org/10.1371/journal.pone.0149270

Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation

Explore at:

83 scholarly articles cite this dataset (View in Google Scholar)

docxAvailable download formats

Unique identifier

https://doi.org/10.1371/journal.pone.0149270

Dataset updated

May 30, 2023

Dataset provided by

PLOS ONE

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.

Clear search

Close search

Google apps

Main menu

Evaluating Functional Diversity: Missing Trait Data and the Importance of...

Supplement 1. R code demonstrating how to fit a logistic regression model,...

Statistical analysis for: Mode I fracture of beech-adhesive bondline at...

Data from: Generalizable EHR-R-REDCap pipeline for a national...

Solar self-sufficient households as a driving factor for sustainability...

Data from: Solar self-sufficient households as a driving factor for...

Transformations in PubChem - Full Dataset

14 March 2025

The Response Scale Transformation Project

R-scripts for uncertainty analysis v01

Data from: Data and scripts associated with a manuscript investigating...

Comparing spatial regression to random forests for large environmental data...

C2Metadata test files

Data from: Cooperation and coexpression: how coexpression networks shift in...

Data and R-scripts for "Land-use trajectories for sustainable land system...

Raw data, R scripts and R datasets for statistical analyses from the...

Dataset description

Context and methodology

Brief description of experiments:

Experiment 1:

Experiment 2 (Table 2):

Experiment 3 (Figure 2):

Experiment 4 (Figure 3):

Technical details

Data applied to automatic method to transform routine otolith images for a...

Data sets for the Journal of Non-Crystalline Solids X: Article entitled...

R script with .csv file for the analysis of data for the paper titled...

Source data for Brabham et al. 2024 bioRxiv

Data from: Monte Carlo Inference for Semiparametric Bayesian Regression

Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation