90 datasets found
  1. Data from: Evaluating Functional Diversity: Missing Trait Data and the...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 17, 2016
    Cite
    Bryndová, Michala; de Bello, Francesco; Lepš, Jan; Sam, Katerina; Weiss, Matthias; Paal, Taavi; Májeková, Maria; Bishop, Tom R.; Kasari, Liis; Luke, Sarah H.; Götzenberger, Lars; Norberg, Anna; Plowman, Nichola S.; Le Bagousse-Pinguet, Yoann (2016). Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001507382
    Authors
    Bryndová, Michala; de Bello, Francesco; Lepš, Jan; Sam, Katerina; Weiss, Matthias; Paal, Taavi; Májeková, Maria; Bishop, Tom R.; Kasari, Liis; Luke, Sarah H.; Götzenberger, Lars; Norberg, Anna; Plowman, Nichola S.; Le Bagousse-Pinguet, Yoann
    Description

    Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data is incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. But, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
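
    The procedure described above (rank plots by an FD value from complete data, remove trait values, re-rank, and compare the two rankings) can be sketched as follows. This is only a minimal illustration and not the "traitor" package: the FD index used here (abundance-weighted variance of a single trait), the random toy community, and all function names are assumptions made for the example.

```python
import random
from statistics import mean

def weighted_variance(traits, abundances):
    """Abundance-weighted variance of one trait; species with a missing (None) trait are skipped."""
    pairs = [(t, a) for t, a in zip(traits, abundances) if t is not None and a > 0]
    total = sum(a for _, a in pairs)
    if total == 0:
        return 0.0
    mu = sum(t * a for t, a in pairs) / total
    return sum(a * (t - mu) ** 2 for t, a in pairs) / total

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def remove_traits(traits, fraction, rng):
    """Pool-wise removal: set the trait of a random fraction of species to None."""
    n_missing = round(fraction * len(traits))
    missing = set(rng.sample(range(len(traits)), n_missing))
    return [None if i in missing else t for i, t in enumerate(traits)]

# Toy community (assumed data): 30 species, 8 plots, one skewed trait.
rng = random.Random(1)
traits = [rng.lognormvariate(0, 1) for _ in range(30)]
plots = [[rng.randint(0, 20) for _ in range(30)] for _ in range(8)]

full_fd = [weighted_variance(traits, plot) for plot in plots]
for fraction in (0.1, 0.3, 0.5):
    reduced = remove_traits(traits, fraction, rng)
    reduced_fd = [weighted_variance(reduced, plot) for plot in plots]
    print(fraction, round(spearman(full_fd, reduced_fd), 3))
```

    A rank correlation close to 1 means the incomplete data still order the plots as the full data do; repeating the loop on log-transformed traits would be one way to explore the effect of transformation.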

  2. Supplement 1. R code for performing nonlinear regression, with data...

    • wiley.figshare.com
    html
    Updated May 30, 2023
    Cite
    E. Carol Adair; Sarah E. Hobbie; Russell K. Hobbie (2023). Supplement 1. R code for performing nonlinear regression, with data (embedded in the R code), and a short description of the program. [Dataset]. http://doi.org/10.6084/m9.figshare.3544664.v1
    Available download formats: html
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    E. Carol Adair; Sarah E. Hobbie; Russell K. Hobbie
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    File List: nonlinear_regression.R

    Description: The "nonlinear_regression.R" program provides a short example (with data) of one way to perform nonlinear regression in R (version 2.8.1). This example is not meant to provide extensive information on or training in programming in R, but rather to serve as a starting point for performing nonlinear regression in R. R is a free statistical computing and graphics program that may be run on a variety of UNIX platforms, Windows and MacOS. R may be downloaded here: http://www.r-project.org/.

    There are several good resources for learning how to program and perform extensive statistical analyses in R, including:

    Benjamin M. Bolker. Ecological Models and Data in R. Princeton University Press, 2008. ISBN 978-0-691-12522-0. [ http://www.zoology.ufl.edu/bolker/emdbook/ ]

    Other references are provided at http://www.r-project.org/ under “Documentation” and “Books”.
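
    The supplement itself is R code. As a language-neutral illustration of what a nonlinear least-squares fit does, here is a minimal Python sketch. The model (exponential decay y = a * exp(-k * t)), the synthetic data, the search interval, and the function names are all assumptions for this example and are not taken from nonlinear_regression.R.

```python
import math

def best_scale(k, t, y):
    """For a fixed rate k, the least-squares amplitude a has a closed form."""
    e = [math.exp(-k * ti) for ti in t]
    return sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)

def sse(k, t, y):
    """Sum of squared errors at rate k, using the best amplitude for that k."""
    a = best_scale(k, t, y)
    return sum((yi - a * math.exp(-k * ti)) ** 2 for ti, yi in zip(t, y))

def fit_exponential(t, y, k_low=1e-6, k_high=5.0, tol=1e-10):
    """Golden-section search over k; assumes the SSE has a single minimum in [k_low, k_high]."""
    phi = (math.sqrt(5) - 1) / 2
    lo, hi = k_low, k_high
    while hi - lo > tol:
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if sse(c, t, y) < sse(d, t, y):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    k = (lo + hi) / 2
    return best_scale(k, t, y), k

# Synthetic, noise-free data generated with a = 100 and k = 0.4 (assumed values).
t = [0, 1, 2, 3, 4, 5, 6]
y = [100 * math.exp(-0.4 * ti) for ti in t]
a_hat, k_hat = fit_exponential(t, y)
print(round(a_hat, 3), round(k_hat, 3))
```

    Because the amplitude enters the model linearly, only the rate needs a numerical search; general-purpose routines such as R's nls handle models where every parameter is nonlinear.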

  3. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 9, 2022
    Cite
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Available download formats: zip
    Dataset provided by
    Harvard Medical School
    Massachusetts General Hospital
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

    Methods

    eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
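
    The key-value remapping described above can be illustrated with a short Python sketch (eLAB itself is written in R). The dictionary below covers only the potassium subtypes listed in the text; the code name "potassium", the accepted unit, and the function name are hypothetical and do not come from the eLAB lookup table or Data Dictionary.

```python
# Hypothetical lookup table: EHR lab subtype -> data dictionary (DD) code.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}

# Hypothetical set of units accepted for each DD code.
ALLOWED_UNITS = {"potassium": {"mmol/L"}}

def remap_labs(rows):
    """Keep rows whose lab subtype and unit are defined, renaming the lab to its DD code."""
    remapped = []
    for row in rows:
        code = LAB_LOOKUP.get(row["lab"].strip())
        if code is None or row["unit"] not in ALLOWED_UNITS[code]:
            continue  # lab or unit is not pre-defined, so the row is filtered out
        remapped.append({**row, "lab": code})
    return remapped

rows = [
    {"lab": "Potassium(POC)", "value": 4.2, "unit": "mmol/L"},
    {"lab": "Sodium", "value": 140, "unit": "mmol/L"},
    {"lab": "Potassium,venous", "value": 4.0, "unit": "mg/dL"},
]
print(remap_labs(rows))
```

    Only the first row survives: the second has no entry in the lookup table and the third reports a unit the sketch does not accept.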

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
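
    Because every site shares one data dictionary, combining site exports reduces to concatenating files with identical headers. A minimal Python sketch of that idea follows; the column names, the sample values, and the function name are assumptions for illustration and are not taken from DataDictionary_eLAB.csv.

```python
import csv
import io

def combine_site_exports(csv_texts):
    """Concatenate CSV exports that share one header; reject a file whose header differs."""
    combined, header = [], None
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("site export does not match the shared data dictionary")
        combined.extend(reader)
    return header, combined

# Two hypothetical site exports with the same (assumed) columns.
site_a = "record_id,potassium\n1,4.1\n2,3.9\n"
site_b = "record_id,potassium\n3,4.4\n"
header, rows = combine_site_exports([site_a, site_b])
print(header, len(rows))
```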

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975 and 2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016 and 2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
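
    The OS definition and censoring rule above translate into a (time, event) pair per patient, which is the usual input to survival models such as Cox regression. A minimal Python sketch, in which the function name and the example dates are hypothetical:

```python
from datetime import date

def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (days, event): event is 1 for death, 0 when censored at the last follow-up."""
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("need a death date or a last follow-up date")
    return (last_follow_up - diagnosis).days, 0

# Hypothetical patients: one with a death event, one censored.
print(overall_survival(date(2017, 3, 1), death=date(2018, 3, 1)))
print(overall_survival(date(2018, 6, 15), last_follow_up=date(2019, 6, 15)))
```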

  4. C2Metadata test files

    • openicpsr.org
    spss, zip
    Updated Aug 16, 2020
    Cite
    George Alter (2020). C2Metadata test files [Dataset]. http://doi.org/10.3886/E120642V1
    Available download formats: spss, zip
    Dataset provided by
    ICPSR
    Authors
    George Alter
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The C2Metadata (“Continuous Capture of Metadata”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. This repository provides examples of scripts and metadata for use in testing C2Metadata tools.

  5. Transformations in PubChem - Full Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 14, 2025
    Cite
    Schymanski, Emma; Bolton, Evan; Cheng, Tiejun; Thiessen, Paul; Zhang, Jian (Jeff); Helmus, Rick; Blanke, Gerd (2025). Transformations in PubChem - Full Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5644560
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    StructurePendium Technologies GmbH
    LCSB, Uni Luxembourg
    University of Amsterdam
    Authors
    Schymanski, Emma; Bolton, Evan; Cheng, Tiejun; Thiessen, Paul; Zhang, Jian (Jeff); Helmus, Rick; Blanke, Gerd
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is an archive of the data contained in the "Transformations" section in PubChem for integration into patRoon and other workflows.

    For further details see the ECI GitLab site: README and main "tps" folder.

    Credits:

    Concepts: E Schymanski, E Bolton, J Zhang, T Cheng;

    Code (in R): E Schymanski, R Helmus, P Thiessen

    Transformations: E Schymanski, J Zhang, T Cheng and many contributors to various lists!

    PubChem infrastructure: PubChem team

    Reaction InChI (RInChI) calculations (v1.0): Gerd Blanke (previous versions of these files)

    Acknowledgements: ECI team who contributed to related efforts, especially: J. Krier, A. Lai, M. Narayanan, T. Kondic, P. Chirsir, E. Palm. All contributors to the NORMAN-SLE transformations!

    March 2025 released as v0.2.0 since the dataset grew by >3000 entries! The stats are:

    14 March 2025

    • Unique Transformation Entries: 10904
    • Unique Reactions by CID: 9152
    • Unique Reactions by IK: 9139
    • Unique Reactions by IKFB: 8574
    • Unique NORMAN-SLE Compounds by CID: 8207
    • Unique ChEMBL Compounds by CID: 1419
    • Unique Compounds (all) by CID: 9267
    • Unique Predecessors (all) by CID: 3724
    • Unique Successors (all) by CID: 7331
    • Range of XlogP Differences: -9.9, 10
    • Range of Mass Differences: -957.97490813, 820.227106427

  6. Solar self-sufficient households as a driving factor for sustainability...

    • service.tib.eu
    Updated Nov 14, 2024
    + more versions
    Cite
    (2024). Solar self-sufficient households as a driving factor for sustainability transformation - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/luh-solar-self-sufficient-households-as-a-driving-factor-for-sustainability-transformation
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    To get the consumption model from Section 3.1, execute the file consumption_data.R. Load the data for the 3 phases (./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv), transform the data and build the model (starting at line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata.

    To get the results for the optimization problem, execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,).

    To reproduce the figures, execute the file visualize_results.R. It provides the functions to reproduce the figures.

    To calculate the solar radiation that is needed in the Section Production Data, follow the file calculate_total_radiation.R.

    To reproduce the radiation data from ERA5, which can be found in data.zip, do the following steps:

    1. ERA5: download the reanalysis datasets as a GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo".
    2. Convert GRIB to csv with the file era5toGRID.sh.
    3. Convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R.

  7. Data from: Cooperation and coexpression: how coexpression networks shift in...

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Mar 19, 2018
    Cite
    Sathvik X. Palakurty; John R. Stinchcombe; Michelle E. Afkhami (2018). Cooperation and coexpression: how coexpression networks shift in response to multiple mutualists [Dataset]. http://doi.org/10.5061/dryad.2hj343f
    Available download formats: zip
    Dataset provided by
    Dryad
    Authors
    Sathvik X. Palakurty; John R. Stinchcombe; Michelle E. Afkhami
    Time period covered
    Mar 1, 2018
    Description

    Differential Coexpression Script: This script uses previously normalized data to execute the DiffCoEx computational pipeline on an experiment with four treatment groups. File: differentialCoexpression.r

    Normalized Transformed Expression Count Data: Normalized, transformed expression count data of Medicago truncatula and mycorrhizal fungi is given as an R data frame where the columns denote different genes and rows denote different samples. This data is used for downstream differential coexpression analyses. File: Expression_Data.zip

    Normalization and Transformation of Raw Count Data Script: Raw count data is transformed and normalized with available R packages and RNA-Seq best practices. File: dataPrep.r

    Raw_Count_Data_Mycorrhizal_Fungi: Raw count data from HtSeq for mycorrhizal fungi reads are later transformed and normalized for use in differential coexpression analysis. 'R+' indicates that the sample was obtained from a plant grown in the presence of both mycorrhizal fungi and rhizobia. 'R-' indicate...

  8. Data from: Pesticide and transformation product concentrations and risk...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Pesticide and transformation product concentrations and risk quotients in U.S. headwater streams [Dataset]. https://catalog.data.gov/dataset/pesticide-and-transformation-product-concentrations-and-risk-quotients-in-u-s-headwater-st
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    This dataset includes a subset of previously released pesticide data (Morace and others, 2020) from the U.S. Geological Survey (USGS) National Water Quality Assessment Program (NAWQA) Regional Stream Quality Assessment (RSQA) project and the corresponding hazard index results calculated using the R package toxEval, which are relevant to Mahler and others, 2020. Pesticide and transformation products were analyzed at the USGS National Water Quality Laboratory in Denver, Colorado. Files are grouped as pesticides (parent compounds), transformation products (degradate compounds), compounds with no Acute Invertebrate (AI) benchmarks, compounds with no Acute Non-Vascular Plant (ANVP) benchmarks, and compounds not evaluated through the toxEval R program. See Morace and others, 2020 for corresponding quality assurance or quality control data.

  9. R-scripts for uncertainty analysis v01

    • gimi9.com
    • researchdata.edu.au
    • +1more
    Updated Apr 13, 2022
    Cite
    (2022). R-scripts for uncertainty analysis v01 [Dataset]. https://gimi9.com/dataset/au_322c38ef-272f-4e77-964c-a14259abe9cf/
    License

    Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
    License information was derived automatically

    Description

    Abstract

    This dataset was created within the Bioregional Assessment Programme. Data has not been derived from any source datasets. Metadata has been compiled by the Bioregional Assessment Programme. This dataset contains a set of generic R scripts that are used in the propagation of uncertainty through numerical models.

    Dataset History

    The dataset contains a set of R scripts that are loaded as a library. The R scripts are used to carry out the propagation of uncertainty through numerical models. The scripts contain the functions to create the statistical emulators and do the necessary data transformations and backtransformations. The scripts are self-documenting and created by Dan Pagendam (CSIRO) and Warren Jin (CSIRO).

    Dataset Citation

    Bioregional Assessment Programme (2016) R-scripts for uncertainty analysis v01. Bioregional Assessment Source Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/322c38ef-272f-4e77-964c-a14259abe9cf.

  10. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. The dataset I was given contains a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rules are most often used to find associations between different objects in a set, for example frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat and 8 bought both.

    • Rule: bought computer mouse => bought mouse mat
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support/P(computer mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(mouse mat) = 0.80/0.09 = 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
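
    The three measures can be checked with a few lines of Python. This sketch uses the standard definitions, in which confidence divides the support by the probability of the antecedent (the left-hand side of the rule) and lift divides the confidence by the probability of the consequent; the function name is hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift

# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.08 0.8 8.89
```

    Lift is symmetric in the two items, so the reverse rule (mat => mouse) has the same lift but a confidence of 8/9.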

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    [image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]

    Libraries in R

    First, we need to load the required libraries. Each is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science. The package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    [image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    [image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
    [image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]

    Next, we clean the data frame by removing missing values.

    [image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]

    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
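
    The original notebook does this conversion in R; its code is not reproduced in the listing. As a rough illustration of the idea only, the Python sketch below groups item names by invoice number. The column names BillNo and Itemname come from the dataset description above, while the function name and the sample rows are made up for the example.

```python
from collections import defaultdict

def to_transactions(rows):
    """Group item names by invoice number, skipping rows with a missing bill or item."""
    baskets = defaultdict(set)
    for row in rows:
        bill, item = row.get("BillNo"), row.get("Itemname")
        if not bill or not item:
            continue  # treat None or empty values as missing
        baskets[bill].add(item)
    return {bill: sorted(items) for bill, items in baskets.items()}

# Made-up rows in the shape of the retail data.
rows = [
    {"BillNo": "536365", "Itemname": "Computer Mouse"},
    {"BillNo": "536365", "Itemname": "Mouse Mat"},
    {"BillNo": "536366", "Itemname": "Mouse Mat"},
    {"BillNo": "536367", "Itemname": None},
]
print(to_transactions(rows))
```

    Each value in the result is one basket, which is the shape an association rule algorithm such as Apriori expects as input.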

  11. Data from: Reproduction Package for the Dissertation on Building...

    • radar.kit.edu
    • radar-service.eu
    tar
    Updated Jun 21, 2023
    + more versions
    Cite
    Heiko Klare (2023). Reproduction Package for the Dissertation on Building Transformation Networks for Consistent Evolution of Interrelated Models [Dataset]. http://doi.org/10.35097/1281
    Available download formats: tar (1534837248 bytes)
    Dataset provided by
    Karlsruhe Institute of Technology
    Klare, Heiko
    Authors
    Heiko Klare
    Description

    Instructions on how to use the data can be found within the repository.

  12. Assessment of data transformations for model-based clustering of RNA-Seq...

    • plos.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley (2023). Assessment of data transformations for model-based clustering of RNA-Seq data [Dataset]. http://doi.org/10.1371/journal.pone.0191758
    Available download formats: xlsx
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Quality control, global biases, normalization, and analysis methods for RNA-Seq data are quite different than those for microarray-based studies. The assumption of normality is reasonable for microarray based gene expression data; however, RNA-Seq data tend to follow an over-dispersed Poisson or negative binomial distribution. Little research has been done to assess how data transformations impact Gaussian model-based clustering with respect to clustering performance and accuracy in estimating the correct number of clusters in RNA-Seq data. In this article, we investigate Gaussian model-based clustering performance and accuracy in estimating the correct number of clusters by applying four data transformations (i.e., naïve, logarithmic, Blom, and variance stabilizing transformation) to simulated RNA-Seq data. To do so, an extensive simulation study was carried out in which the scenarios varied in terms of: how genes were selected to be included in the clustering analyses, size of the clusters, and number of clusters. Following the application of the different transformations to the simulated data, Gaussian model-based clustering was carried out. To assess clustering performance for each of the data transformations, the adjusted rand index, clustering error rate, and concordance index were utilized. As expected, our results showed that clustering performance was gained in scenarios where data transformations were applied to make the data appear “more” Gaussian in distribution.
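
    Two of the transformations named above can be sketched in a few lines of Python. The Blom transformation replaces each value by a normal quantile of its rank, using (r - 3/8) / (n + 1/4). The log2(x + 1) form of the logarithmic transformation, the tie handling, and the function names are assumptions for this sketch; the article may use different conventions.

```python
import math
from statistics import NormalDist

def log_transform(counts):
    """Logarithmic transformation with a pseudocount of 1 (an assumed convention)."""
    return [math.log2(c + 1) for c in counts]

def blom_transform(values):
    """Rank-based inverse normal transform: Phi^-1((r - 3/8) / (n + 1/4))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    standard_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        # Ties are broken by position here, which is a simplification.
        scores[i] = standard_normal.inv_cdf((rank - 0.375) / (n + 0.25))
    return scores

# Made-up, right-skewed counts for one gene across five samples.
counts = [0, 3, 15, 120, 2047]
print(log_transform(counts))
print([round(s, 3) for s in blom_transform(counts)])
```

    Both functions pull in the long right tail of count data, which is the sense in which the transformed values look "more" Gaussian.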

  13. Data from: Attention Allocation to Projection Level Alleviates...

    • data.mendeley.com
    Updated May 28, 2024
    Cite
    Yang Cai (2024). Attention Allocation to Projection Level Alleviates Overconfidence in Situation Awareness [Dataset]. http://doi.org/10.17632/jb5j2rczjz.1
    Authors
    Yang Cai
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains several files related to our research paper titled "Attention Allocation to Projection Level Alleviates Overconfidence in Situation Awareness". These files are intended to provide a comprehensive overview of the data analysis process and the presentation of results. Below is a list of the files included and a brief description of each:

    R Scripts: These are scripts written in the R programming language for data processing and analysis. The scripts detail the steps for data cleaning, transformation, statistical analysis, and the visualization of results. To replicate the study findings or to conduct further analyses on the dataset, users should run these scripts.

    R Markdown File: Offers a dynamic document that combines R code with rich text elements such as paragraphs, headings, and lists. This file is designed to explain the logic and steps of the analysis in detail, embedding R code chunks and the outcomes of code execution. It serves as a comprehensive guide to understanding the analytical process behind the study.

    HTML File: Generated from the R Markdown file, this file provides an interactive report of the results that can be viewed in any standard web browser. For those interested in browsing the study's findings without delving into the specifics of the analysis, this HTML file is the most convenient option. It presents the final analysis outcomes in an intuitive and easily understandable manner. For optimal viewing, we recommend opening the HTML file with the latest version of Google Chrome or any other modern web browser. This approach ensures that all interactive functionalities are fully operational.

    Together, these files form a complete framework for the research analysis, aimed at enhancing the transparency and reproducibility of the study.

  14. e

    Eximpedia Export Import Trade

    • eximpedia.app
    Updated Oct 13, 2025
    Cite
    Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Oct 13, 2025
    Dataset provided by
    Eximpedia PTE LTD
    Eximpedia Export Import Trade Data
    Authors
    Seair Exim
    Area covered
    San Marino, Nauru, Saint Vincent and the Grenadines, Tajikistan, Kazakhstan, Palestine, Finland, Iran (Islamic Republic of), Bahrain, Denmark
    Description

    Eximpedia export-import trade data lets you search trade data and active exporters, importers, buyers, suppliers, and manufacturer-exporters from over 209 countries.

  15. Cyclist Google Data Analytics Capstone Project R

    • kaggle.com
    zip
    Updated Oct 17, 2022
    Cite
    ShrutiJainn (2022). Cyclist Google Data Analytics Capstone Project R [Dataset]. https://www.kaggle.com/datasets/shrutijainn/cyclist-google-data-analytics-capstone-project-r
    Explore at:
    Available download formats: zip (13225 bytes)
    Dataset updated
    Oct 17, 2022
    Authors
    ShrutiJainn
    Description

    Scenario

    The director of marketing at Cyclistic, a bike-share company in Chicago, believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently and design a new marketing strategy to convert casual riders into annual members.

    Objective/Purpose

    Design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. The manager and her team are interested in analyzing Cyclistic's historical bike trip data to identify trends.

    Business Task

    Identify trends to better understand user purchase behavior and recommend marketing strategies to convert casual riders into annual members.

    Data sources used

    We have used Cyclistic’s historical trip data to analyze and identify trends. We will use 12 months of Cyclistic trip data from January 2021 to December 2021. This is public data that we will use to explore how different customer types are using Cyclistic bikes.

    Documentation of any cleaning or manipulation of data

    The dataset from January 2021 to December 2021 is more than 1 GB in size, which makes data manipulation and transformation in spreadsheets difficult. SQL or R can handle files of this size more comfortably, so we use R for these steps. The analysis here covers only the first quarter, i.e., January to March 2021.

  16. A

    On-road Emissions and Chemical Transformation of Nitrogen Oxides

    • data.amerigeoss.org
    • datasets.ai
    • +1more
    xls
    Updated Jul 26, 2019
    Cite
    United States[old] (2019). On-road Emissions and Chemical Transformation of Nitrogen Oxides [Dataset]. https://data.amerigeoss.org/en_AU/dataset/on-road-emissions-and-chemical-transformation-of-nitrogen-oxides
    Explore at:
    Available download formats: xls
    Dataset updated
    Jul 26, 2019
    Dataset provided by
    United States[old]
    Description

    On-road chase and PEMS measurement data while following traffic. Time-averaged to assess the emission rate of the followed vehicle, including the presence of a PEMS to measure direct tailpipe exhaust.

    This dataset is associated with the following publication: Snow, R., J. Faircloth, R. Baldauf, B. Yand, M. Zhang, P. Deshmukh, and X. Zhang. On-road Emissions and Chemical Transformation of Nitrogen Oxides. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 22, (2017).

  17. t

    Raw data, R scripts and R datasets for statistical analyses from the...

    • researchdata.tuwien.ac.at
    bin, txt
    Updated Oct 31, 2024
    Cite
    Hester Sheehan; Negin Afsharzadeh (2024). Raw data, R scripts and R datasets for statistical analyses from the research article 'Advancing Glycyrrhiza glabra L. cultivation and hairy root transformation and elicitation for future metabolite overexpression' [Dataset]. http://doi.org/10.48436/jczhc-srh29
    Explore at:
    Available download formats: txt, bin
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    TU Wien
    Authors
    Hester Sheehan; Negin Afsharzadeh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description

    This dataset was created during the research carried out for the PhD of Negin Afsharzadeh and the subsequent manuscript arising from this research. The main purpose of this dataset is to create a record of the raw data that was used in the analyses in the manuscript.

    This dataset includes:

    • raw data generated from experiments stored in an Excel spreadsheet with each sheet corresponding to a specific experiment or part of an experiment (Afsharzadeh_et_al_2024.xlsx)
    • R script used to analyse the raw data in the software, R (Afsharzadeh_et_al.R)
    • datasets that were used to analyse the data in the statistical software, R (germindata.txt, light.txt)

    Context and methodology

    Brief description of experiments:

    In this study, we aimed to optimize approaches to improve the biotechnological production of important metabolites in G. glabra. The study is made up of four experiments that correspond to particular figures/tables in the manuscript and data, as described below.

    Experiment 1:

    We tested approaches for the cultivation of G. glabra, specifically the breaking of seed dormancy, to ensure timely and efficient seed germination. To do this, we tested the effect of different pretreatments, sterilization treatments and growth media on the germination success of G. glabra.

    This experiment corresponds to:

    • Manuscript: Table 1 and Figure 1
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_1'); Afsharzadeh_et_al.R; germindata.txt

    Experiment 2 (Table 2):

    We aimed to optimize the induction of hairy roots in G. glabra. Four strains of R. rhizogenes were tested to identify the most effective strain for inducing hairy root formation and we tested different tissue explants (cotyledons/hypocotyls) and methods of R. rhizogenes infection (injection or soaking for different durations) in these tissues.

    This experiment corresponds to:

    • Manuscript: Table 2
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Table_2')

    Experiment 3 (Figure 2):

    Eight distinct hairy root lines were established and the growth rate of these lines was measured over 40 days.

    This experiment corresponds to:

    • Manuscript: Figure 2, Table S2
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheet 'Figure_2')

    Experiment 4 (Figure 3):

    We aimed to test different qualities of light on hairy root cultures in order to induce higher growth and possible enhanced metabolite production. A line with a high growth rate from experiment 3, line S, was selected for growth under different light treatments: red light, blue light, and a combination of blue and red light. To assess the overall impact of these treatments, the growth of line S, as well as the increase in antioxidant capacity and total phenolic content, were tracked over this induction period.

    This experiment corresponds to:

    • Manuscript: Figure 3, Figure S4
    • Data: Afsharzadeh_et_al_2024.xlsx (Sheets 'Figure_3_FW', 'Figure_3_FRAP', 'Figure_3_Phenol'); Afsharzadeh_et_al.R; light.txt

    Technical details

    To work with the .R file and the R datasets, it is necessary to use R: A Language and Environment for Statistical Computing and a package within R, DHARMa. The versions used for the analyses are R version 4.4.1 and DHARMa version 0.4.6.

    The references for these are:

    R Core Team, R: A Language and Environment for Statistical Computing 2024. https://www.R-project.org/

    Hartig F, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models 2022. https://CRAN.R-project.org/package=DHARMa

  18. Bellabeat Case Study: Data Insights

    • kaggle.com
    zip
    Updated Nov 5, 2025
    Cite
    Meaad Farag (2025). Bellabeat Case Study: Data Insights [Dataset]. https://www.kaggle.com/datasets/miadmmm/bellabeat-case-study-data-insights
    Explore at:
    Available download formats: zip (153579048 bytes)
    Dataset updated
    Nov 5, 2025
    Authors
    Meaad Farag
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Bellabeat Case Study – Data Analysis & Marketing Insights

    This project explores Fitbit smart device data to identify user behavior trends and uncover actionable marketing insights for Bellabeat, a wellness technology company for women.

    The analysis was conducted using R and Excel Power Query, following a complete data analytics workflow — from data cleaning and transformation to visualization and marketing recommendations.

    Objectives

    • Detect behavioral patterns in user activity, calories, and sleep data.
    • Identify the most meaningful engagement indicators.
    • Segment users by activity levels for personalized marketing.
    • Provide data-driven recommendations for Bellabeat’s campaigns.

    Data

    • Raw data: FitBit Fitness Tracker Data (original)
    • Cleaned data: Cleaned_Bellabeat_Data_Final.xlsx
    • Data cleaning involved merging multiple daily/hourly datasets, removing duplicates, and filtering non-wear days (steps < 1000).
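
The duplicate-removal and non-wear filtering steps described in this list could look roughly like the sketch below. The field names (`user_id`, `date`, `steps`), the sample rows, and the rule of keeping the first record per user and day are all assumptions for illustration; the project's actual R code is not shown in this description, and the merging step is omitted here.

```python
def clean_daily_activity(rows, min_steps=1000):
    """Keep the first record per (user, date), then drop days below min_steps."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["user_id"], row["date"])
        if key in seen:
            continue  # duplicate user-day record
        seen.add(key)
        if row["steps"] < min_steps:
            continue  # treated as a non-wear day
        cleaned.append(row)
    return cleaned


# Made-up example rows, not taken from the dataset.
rows = [
    {"user_id": 1, "date": "2016-04-12", "steps": 13162},
    {"user_id": 1, "date": "2016-04-12", "steps": 13162},  # duplicate
    {"user_id": 1, "date": "2016-04-13", "steps": 240},    # non-wear day
    {"user_id": 2, "date": "2016-04-12", "steps": 8563},
]
cleaned = clean_daily_activity(rows)
```

A fixed threshold is a simple heuristic: it also discards genuinely sedentary days, so the cutoff is worth revisiting if low-activity users matter to the analysis.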

    Key Findings

    • Strong correlation: Steps ↔ Very Active Minutes (r ≈ 0.69).
    • Three user segments: Low, Medium, High activity.
    • Peak activity hours: 12 PM & 6–7 PM — best times for engagement.
    • Minimal difference: Weekday vs weekend steps.
    • Weak link: Sleep vs steps (r ≈ -0.17).
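
For readers unfamiliar with the r values quoted in these findings, here is a minimal sketch of the Pearson correlation coefficient. The sample numbers are invented and will not reproduce the reported values; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up daily totals for five user-days.
steps = [4200, 7600, 10100, 12800, 15300]
very_active_minutes = [2, 15, 22, 48, 51]
r = pearson_r(steps, very_active_minutes)
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one, which is how the 0.69 and -0.17 figures above are being read.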

    Marketing Recommendations

    1. Focus messaging on “30 minutes active per day.”
    2. Personalize campaigns by user segment.
    3. Schedule push notifications around peak activity times.
    4. Build dashboards to monitor retention and engagement.

    Files Included

    • Bellabeat_CaseStudy_RMarkdown.Rmd – R analysis script
    • Bellabeat_CaseStudy_RMarkdown.html – rendered report
    • Bellabeat Case Study Report.docx – written case report
    • Bellabeat Case Study.pptx – presentation slides
    • Cleaned_Bellabeat_Data_Final.xlsx – cleaned dataset
    • FitBit Fitness Tracker Data_original data before cleaning.xlsx – raw dataset
    • /outputs – intermediate tables and analysis results
    • /plots – visualization charts
    • README.txt – detailed project documentation
  19. t

    BIOGRID CURATED DATA FOR PUBLICATION: M-Ras/R-Ras3, a transforming ras...

    • thebiogrid.org
    zip
    Updated Aug 20, 1999
    Cite
    BioGRID Project (1999). BIOGRID CURATED DATA FOR PUBLICATION: M-Ras/R-Ras3, a transforming ras protein regulated by Sos1, GRF1, and p120 Ras GTPase-activating protein, interacts with the putative Ras effector AF6. [Dataset]. https://thebiogrid.org/5251/publication/m-rasr-ras3-a-transforming-ras-protein-regulated-by-sos1-grf1-and-p120-ras-gtpase-activating-protein-interacts-with-the-putative-ras-effector-af6.html
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 20, 1999
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Quilliam LA (1999): M-Ras/R-Ras3, a transforming ras protein regulated by Sos1, GRF1, and p120 Ras GTPase-activating protein, interacts with the putative Ras effector AF6. Curated by BioGRID (https://thebiogrid.org). ABSTRACT: M-Ras is a Ras-related protein that shares approximately 55% identity with K-Ras and TC21. The M-Ras message was widely expressed but was most predominant in ovary and brain. Similarly to Ha-Ras, expression of mutationally activated M-Ras in NIH 3T3 mouse fibroblasts or C2 myoblasts resulted in cellular transformation or inhibition of differentiation, respectively. M-Ras only weakly activated extracellular signal-regulated kinase 2 (ERK2), but it cooperated with Raf, Rac, and Rho to induce transforming foci in NIH 3T3 cells, suggesting that M-Ras signaled via alternate pathways to these effectors. Although the mitogen-activated protein kinase/ERK kinase inhibitor, PD98059, blocked M-Ras-induced transformation, M-Ras was more effective than an activated mitogen-activated protein kinase/ERK kinase mutant at inducing focus formation. These data indicate that multiple pathways must contribute to M-Ras-induced transformation. M-Ras interacted poorly in a yeast two-hybrid assay with multiple Ras effectors, including c-Raf-1, A-Raf, B-Raf, phosphoinositol-3 kinase delta, RalGDS, and Rin1. Although M-Ras coimmunoprecipitated with AF6, a putative regulator of cell junction formation, overexpression of AF6 did not contribute to fibroblast transformation, suggesting the possibility of novel effector proteins. The M-Ras GTP/GDP cycle was sensitive to the Ras GEFs, Sos1, and GRF1 and to p120 Ras GAP. Together, these findings suggest that while M-Ras is regulated by similar upstream stimuli to Ha-Ras, novel targets may be responsible for its effects on cellular transformation and differentiation.

  20. D

    A&R Data Platforms Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). A&R Data Platforms Market Research Report 2033 [Dataset]. https://dataintelo.com/report/ar-data-platforms-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    A&R Data Platforms Market Outlook

    According to our latest research, the global A&R Data Platforms market size reached USD 715 million in 2024, reflecting robust expansion driven by digital transformation in the music industry. The market is projected to grow at a CAGR of 15.2% from 2025 to 2033, culminating in a forecasted market size of USD 2.7 billion by 2033. This growth is primarily attributed to the increasing demand for data-driven decision-making in artist and repertoire (A&R) processes, the proliferation of streaming services, and the integration of artificial intelligence and machine learning technologies into music management and discovery platforms.
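
The compound-growth arithmetic behind figures like these can be checked with a short sketch. The base value and rate come from the paragraph above; treating 2024 as the compounding base (nine annual steps to 2033) is an assumption, since the report does not state its base year or rounding, and the function names are hypothetical.

```python
def project(value, rate, years):
    """Compound a starting value forward at a constant annual rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Constant annual rate that grows start into end over the given years."""
    return (end / start) ** (1 / years) - 1


base_2024 = 715.0      # USD million, from the text
rate = 0.152           # 15.2% CAGR, from the text
years = 2033 - 2024    # assumption: compounding starts from the 2024 figure

projected_2033 = project(base_2024, rate, years)       # USD million
rate_needed = implied_cagr(base_2024, 2700.0, years)   # rate implied by USD 2.7 billion
```

Under this assumption the projection comes to roughly USD 2.55 billion, and reaching USD 2.7 billion from the 2024 figure would imply a rate of about 15.9%. Differences of this size are consistent with rounding or a different base year in the report.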

    One of the most significant growth factors for the A&R Data Platforms market is the rapid digitization of the music industry. The shift from physical sales to digital distribution has created a massive influx of data, including streaming metrics, social media engagement, and fan demographics. This wealth of information has become invaluable for record labels, music publishers, and independent artists seeking to identify emerging trends, discover new talent, and optimize marketing strategies. As the volume and complexity of music-related data continue to rise, the need for sophisticated A&R data platforms capable of aggregating, analyzing, and visualizing these datasets has become more pronounced, driving market expansion.

    Another critical driver is the growing reliance on artificial intelligence and advanced analytics in the music sector. AI-powered A&R Data Platforms are revolutionizing the way industry professionals scout talent, manage artists, and negotiate contracts. By leveraging predictive analytics, sentiment analysis, and machine learning algorithms, these platforms can identify promising artists earlier in their careers, forecast potential hit songs, and assess market receptivity to new releases. This technological advancement not only enhances the accuracy of A&R decisions but also reduces the time and resources required for traditional talent scouting, thus accelerating the adoption of data-driven platforms across the industry.

    Moreover, the increasing competition among record labels and the rise of independent artists are fueling demand for comprehensive A&R Data Platforms. As the music landscape becomes more fragmented, with artists able to distribute and promote their work independently, the need for platforms that provide actionable insights into market trends, audience preferences, and royalty management has intensified. These platforms offer a competitive edge by enabling users to make informed decisions regarding artist signings, contract negotiations, and promotional strategies. As a result, both established industry players and emerging artists are investing heavily in A&R data solutions to stay ahead in a rapidly evolving market.

    Regionally, North America continues to dominate the A&R Data Platforms market, driven by the presence of major record labels, a vibrant independent music scene, and advanced technological infrastructure. Europe follows closely, benefiting from a strong tradition of music innovation and significant investment in digital transformation. The Asia Pacific region is emerging as a key growth area, fueled by increasing internet penetration, the popularity of streaming services, and a burgeoning youth population with a strong appetite for music consumption. Latin America and the Middle East & Africa are also witnessing steady growth, although market maturity and adoption rates vary across countries. Overall, the global outlook for the A&R Data Platforms market remains highly positive, with substantial opportunities for expansion in both mature and emerging regions.

    Component Analysis

    The A&R Data Platforms market is segmented by component into software and services, with each segment playing a crucial role in the overall value proposition. The software segment encompasses a wide array of solutions, including data aggregation tools, analytics dashboards, and AI-driven recommendation engines. These platforms are designed to streamline the A&R process by providing real-time access to critical data points such as streaming numbers, social media trends, and fan engagement metrics. The increasing sophistication of software solutions, including the integration of machine learning and natural language processing, has significantly enhanced the ability of users to identify emerging a
