72 datasets found
  1. Module M.1 R basics for data exploration and management

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland (2023). Module M.1 R basics for data exploration and management [Dataset]. http://doi.org/10.25334/M9B9-8073
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques. Module M.1 introduces basic functions from R, as well as from its package tidyverse, for data exploration and management.
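
    Not part of the dataset record, but a minimal sketch of the kind of tidyverse exploration such a module introduces might look as follows; the file name macaques.csv and its columns are hypothetical:

    library(tidyverse)

    # Hypothetical file and columns, for illustration only.
    macaques <- read_csv("macaques.csv")
    glimpse(macaques)                     # inspect structure and column types
    macaques %>%
      filter(!is.na(body_mass_kg)) %>%    # drop records with missing mass
      group_by(sex) %>%
      summarise(mean_mass = mean(body_mass_kg), n = n())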

  2. Additional file 1 of cytoviewer: an R/Bioconductor package for interactive...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    html
    Updated Aug 14, 2024
    Cite
    Lasse Meyer; Nils Eling; Bernd Bodenmiller (2024). Additional file 1 of cytoviewer: an R/Bioconductor package for interactive visualization and exploration of highly multiplexed imaging data [Dataset]. http://doi.org/10.6084/m9.figshare.26660383.v1
    Available download formats: html
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lasse Meyer; Nils Eling; Bernd Bodenmiller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: Publication analysis code. Analysis code to reproduce the present study.

  3. R Package History on CRAN

    • kaggle.com
    zip
    Updated Jul 18, 2022
    Cite
    Heads or Tails (2022). R Package History on CRAN [Dataset]. https://www.kaggle.com/datasets/headsortails/r-package-history-on-cran/code
    Available download formats: zip (5637913 bytes)
    Dataset updated
    Jul 18, 2022
    Authors
    Heads or Tails
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Comprehensive R Archive Network (CRAN) is the central repository for software packages in the powerful R programming language for statistical computing. It describes itself as "a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R." If you install an R package in the standard way, it is provided by one of the CRAN mirrors.

    The ecosystem of R packages continues to grow at an accelerated pace, covering a multitude of aspects of statistics, machine learning, data visualisation, and many other areas. This dataset provides monthly updates of all the packages available through CRAN, as well as their release histories. Explore the evolution of the R multiverse and all of its facets through this comprehensive data.

    Content

    I'm providing two CSV tables that describe the current set of R packages on CRAN, as well as the version history of these packages. To derive the data, I made use of the fantastic functionality of the tools package, via the CRAN_package_db function, and the equally wonderful packageRank package and its packageHistory function. The results from those functions were slightly adjusted and formatted. I might add further related tables over time.
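
    A hedged sketch of the derivation described above, using the two functions named; the light post-processing applied to the published tables is not reproduced here:

    library(tools)
    library(packageRank)

    # Current CRAN state: one row per package, many metadata columns.
    overview_raw <- CRAN_package_db()

    # Release history for a single package; the published table applies
    # this call to every package in the overview.
    ggplot2_history <- packageHistory("ggplot2")
    head(ggplot2_history)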

    See the associated blog post for how the data was derived, and for some ideas on how to explore this dataset.

    These are the tables contained in this dataset:

    • cran_package_overview.csv: all R packages currently available through CRAN, with (usually) 1 row per package. (At the time of the creation of this Kaggle dataset there were a few packages with 2 entries and different dependencies. Feel free to contribute some EDA investigating those.) Packages are listed in alphabetical order according to their names.

    • cran_package_history.csv: version history of virtually all packages in the previous table. This table has one row for each combination of package name and version number, which in most cases leads to multiple rows per package. Packages are listed in alphabetical order according to their names.

    I will update this dataset on a roughly monthly cadence by checking which packages have a newer version in the overview table, and then replacing their entries with the updated data.

    Column Description

    Table cran_package_overview.csv: I decided to simplify the large number of columns provided by CRAN and tools::CRAN_package_db into a smaller set of more focused features. All columns are formatted as strings, except for the boolean feature needs_compilation; the date_published column can be read as a ymd date (see the parsing sketch after this list):

    • package: package name following the official spelling and capitalisation. Table is sorted alphabetically according to this column.
    • version: current version.
    • depends: package depends on which other packages.
    • imports: package imports which other packages.
    • licence: the licence under which the package is distributed (e.g. GPL versions)
    • needs_compilation: boolean feature describing whether the package needs to be compiled.
    • author: package author.
    • bug_reports: where to send bugs.
    • url: where to read more.
    • date_published: when the current version of the package was published. Note: this is not the date of the initial package release. See the package history table for that.
    • description: relatively detailed description of what the package is doing.
    • title: the title and tagline of the package.
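
    A minimal parsing sketch for the overview table; the file path and the on-disk encoding of the boolean column are assumptions:

    library(readr)
    library(lubridate)

    # Read everything as character first, then coerce the two special columns.
    overview <- read_csv("cran_package_overview.csv",
                         col_types = cols(.default = "c"))
    # The stored encoding of needs_compilation is assumed ("yes"/"no" or TRUE/FALSE).
    overview$needs_compilation <- tolower(overview$needs_compilation) %in% c("true", "yes")
    overview$date_published <- ymd(overview$date_published)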

    Table cran_package_history.csv: The output of packageRank::packageHistory for each package from the overview table. Almost all of those packages have a match in this table and can be joined by package and version (a join sketch follows this list). All columns are strings, and the date can again be parsed as a ymd date:

    • package: package name. Joins to the feature of the same name in the overview table. Table is sorted alphabetically according to this column.
    • version: historical or current package version. Also joins. Secondary sorting column within each package name.
    • date: when this version was published. Should sort in the same way as the version does.
    • repository: on CRAN or in the Archive.
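
    And the join sketch referenced above, matching history rows to current releases (assumes the overview object from the previous sketch):

    library(dplyr)

    history <- read_csv("cran_package_history.csv",
                        col_types = cols(.default = "c"))
    history$date <- ymd(history$date)

    # Each current release should find its row in the version history;
    # "almost all" packages are expected to match.
    current_releases <- overview %>%
      inner_join(history, by = c("package", "version"))
    nrow(current_releases)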

    Acknowledgements

    All data is being made publicly available by the Comprehensive R Archive Network (CRAN). I'm grateful to the authors and maintainers of the packages tools and packageRank for providing the functionality to query CRAN packages smoothly and easily.

    The vignette photo is the official logo for the R language © 2016 The R Foundation. You can distribute the logo under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license...

  4. Data from: Chronospaces: an R package for the statistical exploration of...

    • datadryad.org
    • search.dataone.org
    • +2 more
    zip
    Updated Jul 24, 2024
    Cite
    Nicolás Mongiardino Koch; Pablo Milla Carmona (2024). Chronospaces: an R package for the statistical exploration of divergence times promotes the assessment of methodological sensitivity [Dataset]. http://doi.org/10.5061/dryad.cfxpnvxdn
    Available download formats: zip
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Dryad
    Authors
    Nicolás Mongiardino Koch; Pablo Milla Carmona
    Time period covered
    Feb 19, 2024
    Description

    Data from: Chronospaces: an R package for the statistical exploration of divergence times reveals extreme dependence on molecular clocks and gene choice

    https://doi.org/10.5061/dryad.cfxpnvxdn

    The data contained in this repository supports the results presented in Mongiardino Koch & Milla Carmona (2024), introducing the R package chronospace, and exploring its use to understand sources of uncertainty in divergence time estimation.

    Description of the data and file structure

    The repository contains two folders, which have been zipped for convenience.

    The first of these, 'Datasets', includes in turn three subfolders containing the data obtained from three publications dealing with the diversification of three clades; the subfolder names denote the focal clade (i.e., 'Curculionoidea', 'Decapoda', and 'Eukaryota'). Each of these folders contains the same set of files:

    1. 'all_gene_trees.tre': A tree file containing all gene trees, ordered as in the phylogenomic dataset (see below)...
  5. mixOmics: An R package for ‘omics feature selection and multiple data...

    • plos.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Florian Rohart; Benoît Gautier; Amrit Singh; Kim-Anh Lê Cao (2023). mixOmics: An R package for ‘omics feature selection and multiple data integration [Dataset]. http://doi.org/10.1371/journal.pcbi.1005752
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Florian Rohart; Benoît Gautier; Amrit Singh; Kim-Anh Lê Cao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.
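
    The package is distributed via Bioconductor; a minimal integration sketch using the nutrimouse example data shipped with mixOmics (the choice of blocks and components here is illustrative, not the paper's analysis):

    library(mixOmics)

    data(nutrimouse)
    X <- list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
    Y <- nutrimouse$diet

    # PLS-DA across both 'omics blocks at once.
    result <- block.plsda(X, Y, ncomp = 2)
    plotIndiv(result, legend = TRUE)   # samples projected block by block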

  6. Palmer Penguins

    • kaggle.com
    zip
    Updated Jul 4, 2021
    Cite
    Maladeep (2021). Palmer Penguins [Dataset]. https://www.kaggle.com/malanep/palmer-penguine
    Available download formats: zip (8998 bytes)
    Dataset updated
    Jul 4, 2021
    Authors
    Maladeep
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Original dataset: https://github.com/allisonhorst/palmerpenguins

    palmerpenguins


    The goal of palmerpenguins is to provide a great dataset for data exploration & visualization, as an alternative to iris.

    Installation

    You can install the released version of palmerpenguins from CRAN with:

    install.packages("palmerpenguins")
    

    To install the development version from GitHub use:

    # install.packages("remotes")
    remotes::install_github("allisonhorst/palmerpenguins")
    

    About the data

    Data were collected and made available by Dr. Kristen Gorman (https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network (https://lternet.edu/).

    The palmerpenguins package contains two datasets.

    library(palmerpenguins)
    data(package = 'palmerpenguins')
    

    One is called penguins, and is a simplified version of the raw data; see ?penguins for more info:

    head(penguins)
    #> # A tibble: 6 x 8
    #>  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex 
    #>
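
    A minimal exploration sketch using the columns shown in the head() output above:

    library(palmerpenguins)
    library(ggplot2)

    ggplot(na.omit(penguins),
           aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
      geom_point() +
      labs(x = "Flipper length (mm)", y = "Body mass (g)")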
    
  7. A unified framework for unconstrained and constrained ordination of...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Stijn Hawinkel; Frederiek-Maarten Kerckhof; Luc Bijnens; Olivier Thas (2023). A unified framework for unconstrained and constrained ordination of microbiome read count data [Dataset]. http://doi.org/10.1371/journal.pone.0205474
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Stijn Hawinkel; Frederiek-Maarten Kerckhof; Luc Bijnens; Olivier Thas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Explorative visualization techniques provide a first summary of microbiome read count datasets through dimension reduction. A plethora of dimension reduction methods exists, but many of them focus primarily on sample ordination, failing to elucidate the role of the bacterial species. Moreover, implicit but often unrealistic assumptions underlying these methods fail to account for overdispersion and differences in sequencing depth, which are two typical characteristics of sequencing data. We combine log-linear models with a dispersion estimation algorithm and flexible response function modelling into a framework for unconstrained and constrained ordination. The method is able to cope with differences in dispersion between taxa and varying sequencing depths, to yield meaningful biological patterns. Moreover, it can correct for observed technical confounders, whereas other methods are adversely affected by these artefacts. Unlike distance-based ordination methods, the assumptions underlying our method are stated explicitly and can be verified using simple diagnostics. The combination of unconstrained and constrained ordination in the same framework is unique in the field and facilitates microbiome data exploration. We illustrate the advantages of our method on simulated and real datasets, while pointing out flaws in existing methods. The algorithms for fitting and plotting are available in the R-package RCM.
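
    A minimal sketch, assuming the Bioconductor RCM package's RCM() entry point and the Zeller example data from its vignette; the arguments shown are illustrative only:

    library(RCM)

    data(Zeller)               # example microbiome data (a phyloseq object)
    fit <- RCM(Zeller, k = 2)  # unconstrained ordination in two dimensions
    plot(fit)                  # joint sample/taxon biplot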

  8. Slakestable: R package to explore raw data from the Slakes app

    • entrepot.recherche.data.gouv.fr
    application/x-gzip
    Updated May 25, 2022
    Cite
    Thomas Chalaux; Marine Lacoste; Nicolas Saby (2022). Slakestable: R package to explore raw data from the Slakes app [Dataset]. http://doi.org/10.15454/BGSMUE
    Available download formats: application/x-gzip (18335)
    Dataset updated
    May 25, 2022
    Dataset provided by
    Recherche Data Gouv
    Authors
    Thomas Chalaux; Marine Lacoste; Nicolas Saby
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    The slakestable package quickly formats the raw data produced by the "Slakes" smartphone app (Fajardo et al., 2016). The "tablecourbe" function creates a single table containing the coefficients a, b, and c from the Gompertz fit of the raw data, together with the SI600 for each aggregate. The data can be aggregated by site/location using a mean or a median, either before or after fitting the Gompertz equation; in that case two independent tables are created, which can be combined with the "jointurefeuilles" function.
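
    A generic sketch of the kind of Gompertz fit the package automates; this is plain nls() on simulated readings, not the slakestable API, and the column names and starting values are assumptions:

    set.seed(1)
    time_s <- seq(0, 600, by = 10)
    slaking_index <- 1.2 * exp(-4 * exp(-0.02 * time_s)) +
      rnorm(length(time_s), sd = 0.02)
    aggregate_data <- data.frame(time_s, slaking_index)

    # Gompertz form: SI(t) = a * exp(-b * exp(-c * t))
    fit <- nls(slaking_index ~ a * exp(-b * exp(-c * time_s)),
               data = aggregate_data,
               start = list(a = 1, b = 5, c = 0.01))
    coef(fit)                                         # the a, b, c coefficients
    predict(fit, newdata = data.frame(time_s = 600))  # fitted value at 600 s (cf. SI600)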

  9. IEAtools R package

    • pigma.org
    • sextant.ifremer.fr
    Updated Dec 13, 2024
    Cite
    (2024). IEAtools R package [Dataset]. https://www.pigma.org/geonetwork/srv/search?orgName=Institute%20for%20Hydrobiology%20and%20Fisheries%20Science,%20University%20of%20Hamburg
    Dataset updated
    Dec 13, 2024
    Description

    An R package that provides supporting functions for conducting Integrated Ecosystem Assessments (IEA), developed in the framework of Mission Atlantic. The package includes methods for data exploration and for assessing current ecosystem status. This is a forked repository within Mission Atlantic; for the latest version, check the original repository.

  10. Module M.3 Visualizing data with ggplot2

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland (2023). Module M.3 Visualizing data with ggplot2 [Dataset]. http://doi.org/10.25334/DH54-TQ31
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques. Module M.3 introduces basic functions from R package ggplot2 with the purpose of exploring data and generating publication-quality figures.
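
    The module's materials are not reproduced here; a sketch of the kind of publication-quality figure it teaches, on hypothetical stand-in data:

    library(ggplot2)

    # Hypothetical records standing in for the Cayo Santiago data.
    set.seed(7)
    macaques <- data.frame(age_years = runif(100, 0, 25),
                           sex = sample(c("F", "M"), 100, replace = TRUE))
    macaques$body_mass_kg <- 3 + 0.25 * macaques$age_years + rnorm(100)

    ggplot(macaques, aes(age_years, body_mass_kg, colour = sex)) +
      geom_point(alpha = 0.6) +
      geom_smooth(method = "lm") +
      labs(x = "Age (years)", y = "Body mass (kg)") +
      theme_minimal()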

  11. WIDEa: a Web Interface for big Data exploration, management and analysis

    • entrepot.recherche.data.gouv.fr
    Updated Sep 12, 2021
    Cite
    Philippe Santenoise (2021). WIDEa: a Web Interface for big Data exploration, management and analysis [Dataset]. http://doi.org/10.15454/AGU4QE
    Dataset updated
    Sep 12, 2021
    Dataset provided by
    Recherche Data Gouv
    Authors
    Philippe Santenoise
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15454/AGU4QE

    Description

    WIDEa is R-based software aiming to provide users with a range of functionalities to explore, manage, clean and analyse "big" environmental and (in/ex situ) experimental data. These functionalities are the following:

    1. Loading/reading different data types: basic (called normal), temporal, infrared spectra of the mid/near region (called IR) with frequency (wavenumber) used as the unit (in cm-1);
    2. Interactive data visualization from a multitude of graph representations: 2D/3D scatter-plot, box-plot, hist-plot, bar-plot, correlation matrix;
    3. Manipulation of variables: concatenation of qualitative variables, transformation of quantitative variables by generic functions in R;
    4. Application of mathematical/statistical methods;
    5. Creation/management of data considered atypical (named flag data);
    6. Study of normal distribution model results for different strategies: calibration (checking assumptions on residuals), validation (comparison between measured and fitted values). The model form can be more or less complex: mixed effects, main/interaction effects, weighted residuals.

  12. Child 1: Nutrient and streamflow model-input data

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 8, 2025
    Cite
    U.S. Geological Survey (2025). Child 1: Nutrient and streamflow model-input data [Dataset]. https://catalog.data.gov/dataset/child-1-nutrient-and-streamflow-model-input-data
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    Trends in nutrient fluxes and streamflow for selected tributaries in the Lake Erie watershed were calculated using monitoring data at 10 locations. Trends in flow-normalized nutrient fluxes were determined by applying a weighted regression approach called WRTDS (Weighted Regressions on Time, Discharge, and Season).

    Site information and streamflow and water-quality records are contained in 3 zipped files: INFO (site information), Daily (daily streamflow records), and Sample (water-quality records). The INFO, Daily (flow), and Sample files contain the input data, by water-quality parameter and by site, as .csv files used to run the trend analyses. These files were generated by the R (version 3.1.2) software package EGRET - Exploration and Graphics for River Trends (version 2.5.1) (Hirsch and De Cicco, 2015), and can be used directly as input to run graphical procedures and WRTDS trend analyses using the EGRET R software.

    The .csv files are identified according to water-quality parameter (TP, SRP, TN, NO23, and TKN) and site reference number (e.g. TPfiles.1.INFO.csv, SRPfiles.1.INFO.csv, TPfiles.2.INFO.csv, etc.). Water-quality parameter abbreviations and site reference numbers are defined in the file "Site-summary_table.csv" on the landing page, where there is also a site-location map ("Site_map.pdf"). Parameter information details, including abbreviation definitions, appear in the abstract on the landing page. SRP data records were available at only 6 of the 10 trend sites, identified in "Site-summary_table.csv" as monitored by the National Center for Water Quality Research (NCWQR). The SRP sites are: RAIS, MAUW, SAND, HONE, ROCK, and CUYA.

    The model-input dataset is presented in 3 parts:

    1. INFO.zip (site information)
    2. Daily.zip (daily streamflow records)
    3. Sample.zip (water-quality records)

    Reference: Hirsch, R.M., and De Cicco, L.A., 2015 (revised). User Guide to Exploration and Graphics for RivEr Trends (EGRET) and dataRetrieval: R Packages for Hydrologic Data, Version 2.0, U.S. Geological Survey Techniques and Methods, 4-A10. U.S. Geological Survey, Reston, VA., 93 p. (at: http://dx.doi.org/10.3133/tm4A10).
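
    A hedged sketch of loading one parameter/site combination into EGRET and refitting WRTDS; readUserInfo, readUserDaily, readUserSample, mergeReport, and modelEstimation are standard EGRET functions, but the exact Daily and Sample file names below are assumptions based on the naming pattern described above:

    library(EGRET)

    INFO   <- readUserInfo("data", "TPfiles.1.INFO.csv")
    Daily  <- readUserDaily("data", "TPfiles.1.Daily.csv")    # file name assumed
    Sample <- readUserSample("data", "TPfiles.1.Sample.csv")  # file name assumed

    eList <- mergeReport(INFO, Daily, Sample)
    eList <- modelEstimation(eList)   # fit the WRTDS model
    plotConcHist(eList)               # flow-normalized concentration history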

  13. Code and data: Exploring congruent diversification histories with...

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Sep 28, 2023
    Cite
    Anonymous (2023). Code and data: Exploring congruent diversification histories with flexibility and parsimony [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8091720
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    Anonymous
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the code and data for the article "Exploring congruent diversification histories with flexibility and parsimony" (abstract below).

    Data:

    4705sp_mammal-time.tree: Species-level calibrated mammalian phylogeny from Alvarez-Carretero et al. (https://doi.org/10.6084/m9.figshare.14885691)

    mammals_samplingfraction.csv: Clade-specific sampling fractions from Quintero et al. (https://www.biorxiv.org/content/10.1101/2022.08.09.503355v1.full).

    Code:

    CRABS-v1.1.0.9004.zip: Archived version of the CRABS package with our extension.

    Mammalian_rates_EBD_HSMRF.rev: Rev script for the mammalian diversification analysis in RevBayes with regularized priors on diversification rates.

    Mammalian_rates_EBD_independent.rev: Rev script for the mammalian diversification analysis in RevBayes with independent diversification rates at each interval.

    Mammals_proccess_RevBayes_outputs.Rmd: R notebook for processing the outputs from the RevBayes mammalian diversification analysis, plotting the rates through time, and saving the median trajectories used for further analyses.

    Exploring_congruent_diversification_histories_with_flexibility_and_parsimony.Rmd: R notebook for comparing the initial CRABS features and our new extensions. It enables replicating the figures in the article.

    Outputs:

    output_inferredIntervals_fixedRhp_HSMRF.zip & output_inferredIntervals_fixedRhp_independent.zip: The raw traces from the RevBayes analysis, and the resulting median rate trajectories that are used to construct the congruence class illustrated in the article.

    Abstract

    Using phylogenies of present-day species to estimate diversification rate trajectories -- speciation and extinction rates over time -- is a challenging task due to non-identifiability issues. Given a phylogeny, there exists an infinite set of trajectories that result in the same likelihood; this set has been coined a congruence class. Previous work has developed approaches for sampling trajectories within a given congruence class, with the aim of assessing the extent to which congruent scenarios can vary from one another. Based on this sampling approach, it has been suggested that rapid changes in speciation or extinction rates are conserved across the class. Reaching such conclusions requires sampling the broadest possible set of distinct trajectories.
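
    For context, the non-identifiability result behind congruence classes (Louca & Pennell 2020) can be stated compactly: two rate trajectories are congruent precisely when they share the same present-day sampled speciation rate and the same pulled diversification rate,

    r_p(t) = \lambda(t) - \mu(t) + \frac{1}{\lambda(t)} \frac{\mathrm{d}\lambda(t)}{\mathrm{d}t}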

    We introduce a new method for exploring congruence classes, which we implement in the R package CRABS. Whereas existing methods constrain either the speciation rate or the extinction rate trajectory, ours provides more flexibility by sampling congruent speciation and extinction rate trajectories simultaneously. This allows covering a more representative set of distinct diversification rate trajectories. We also implement a filtering step that allows selecting the most parsimonious trajectories within a class.

    We demonstrate the utility of our new sampling strategy using a simulated scenario. Next, we apply our approach to the study of mammalian diversification history. We show that rapid changes in speciation and extinction rates need not be conserved across a congruence class, but that selecting the most parsimonious trajectories shrinks the class to concordant scenarios.

    Our approach opens new avenues both to truly explore the myriad of potential diversification histories consistent with a given phylogeny, embracing the uncertainty inherent to phylogenetic diversification models, and to select among these different histories. This should help refine our inference of diversification trajectories from extant data.

  14. Additional file 4 of tRigon: an R package and Shiny App for integrative...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    html
    Updated Aug 15, 2024
    Cite
    David L. Hölscher; Michael Goedertier; Barbara M. Klinkhammer; Patrick Droste; Ivan G. Costa; Peter Boor; Roman D. Bülow (2024). Additional file 4 of tRigon: an R package and Shiny App for integrative (path-)omics data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26689220.v1
    Available download formats: html
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David L. Hölscher; Michael Goedertier; Barbara M. Klinkhammer; Patrick Droste; Ivan G. Costa; Peter Boor; Roman D. Bülow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4. tRigon session report in HTML format for processing omics datasets, including a detailed description of input files, processing settings and the processed data frame.

  15. Introduction to Primate Data Exploration and Linear Modeling with R

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland; Alexis Diaz; Alexandra Rosati; Stephanie Gonzalez (2023). Introduction to Primate Data Exploration and Linear Modeling with R [Dataset]. http://doi.org/10.25334/T0ZY-PK40
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland; Alexis Diaz; Alexandra Rosati; Stephanie Gonzalez
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology research students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques.

  16. Data from: Streamflow, Dissolved Organic Carbon, and Nitrate Input Datasets...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Streamflow, Dissolved Organic Carbon, and Nitrate Input Datasets and Model Results Using the Weighted Regressions on Time, Discharge, and Season (WRTDS) Model for Buck Creek Watersheds, Adirondack Park, New York, 2001 to 2021 [Dataset]. https://catalog.data.gov/dataset/streamflow-dissolved-organic-carbon-and-nitrate-input-datasets-and-model-results-using-the
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release supports an analysis of changes in dissolved organic carbon (DOC) and nitrate concentrations in the Buck Creek watershed near Inlet, New York, 2001 to 2021. The Buck Creek watershed is a 310-hectare forested watershed that is recovering from acidic deposition within the Adirondack region.

    The data release includes pre-processed model inputs and model outputs for the Weighted Regressions on Time, Discharge, and Season (WRTDS) model (Hirsch and others, 2010) to estimate daily flow-normalized concentrations of DOC and nitrate during a 20-year period of analysis. WRTDS uses daily discharge and concentration observations, implemented through the Exploration and Graphics for River Trends R package (EGRET), to predict solute concentration using decimal time and discharge as explanatory variables (Hirsch and De Cicco, 2015; Hirsch and others, 2010). Discharge and concentration data are available from the U.S. Geological Survey National Water Information System (NWIS) database (U.S. Geological Survey, 2016). The time series data were analyzed for the entire period, water years 2001 (WY2001) to WY2021, where WY2001 is the period from October 1, 2000 to September 30, 2001.

    This data release contains 5 comma-separated values (CSV) files, one R script, and one XML metadata file. The four input files ("Daily.csv", "INFO.csv", "Sample_doc.csv", and "Sample_nitrate.csv") contain site information, daily mean discharge, and mean daily DOC or nitrate concentrations. The R script ("Buck Creek WRTDS R script.R") uses the four input datasets and functions from the EGRET R package to generate estimates of flow-normalized concentrations. The output file ("WRTDS_results.csv") contains model output at daily time steps for each sub-watershed and for each solute. Files are automatically associated with the R script when opened in RStudio using the provided R project file ("Files.Rproj"). All input, output, and R files are in the "Files.zip" folder.

  17. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business and to make itemset suggestions to customers, so as to increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most useful when you want to discover associations between different objects in a set, i.e., frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat": support = P(mouse & mat) = 8/100 = 0.08; confidence = support / P(computer mouse) = 0.08/0.10 = 0.80; lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
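
    Recomputing the toy example in R (numbers from the text):

    n_customers <- 100
    p_mouse <- 10 / n_customers   # P(computer mouse)
    p_mat   <- 9 / n_customers    # P(mouse mat)
    p_both  <- 8 / n_customers    # P(both)

    support    <- p_both              # 0.08
    confidence <- p_both / p_mouse    # 0.80
    lift       <- confidence / p_mat  # ~8.9
    c(support = support, confidence = confidence, lift = lift)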

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    (image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png)

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    (image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png)

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    (images: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png, https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png)

    Then we clean the data frame by removing missing values.

    (image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png)

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
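
    A sketch of that conversion and the subsequent mining, using the libraries listed above; the column names follow the dataset description (BillNo, Itemname), while the intermediate file name and thresholds are assumptions:

    library(readxl)
    library(plyr)
    library(arules)

    retail <- read_excel("Assignment-1_Data.xlsx")
    retail <- retail[complete.cases(retail), ]        # drop missing values

    # Collapse each invoice into one comma-separated basket of item names.
    baskets <- ddply(retail, "BillNo",
                     function(df) paste(df$Itemname, collapse = ","))
    write.table(baskets$V1, "baskets.csv", quote = FALSE,
                row.names = FALSE, col.names = FALSE)

    # Re-read as transactions and mine rules with apriori().
    trans <- read.transactions("baskets.csv", format = "basket", sep = ",")
    rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8))
    inspect(head(sort(rules, by = "lift"), 5))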

  18. funspace: an R package to build, analyze and plot functional trait spaces

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 28, 2024
    Cite
    Carlos Perez Carmona; Nicola Pavanetto; Giacomo Puglielli (2024). funspace: an R package to build, analyze and plot functional trait spaces [Dataset]. http://doi.org/10.5061/dryad.4tmpg4fg6
    Available download formats: zip
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Universidad de Sevilla
    University of Tartu
    Estonian University of Life Sciences
    Authors
    Carlos Perez Carmona; Nicola Pavanetto; Giacomo Puglielli
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Functional trait space analyses are pivotal to describe and compare organisms’ functional diversity across the tree of life. Yet, there is no single application that streamlines the many sometimes-troublesome steps needed to build and analyze functional trait spaces. To fill this gap, we propose funspace, an R package to easily handle bivariate and multivariate (PCA-based) functional trait space analyses. The six functions that constitute the package can be grouped in three modules: ‘Building and exploring’, ‘Mapping’, and ‘Plotting’. The building and exploring module defines the main features of a functional trait space (e.g., functional diversity metrics) by leveraging kernel density-based methods. The mapping module uses general additive models to map how a target variable distributes within a trait space. The plotting module provides many options for creating flexible and high-quality figures representing the outputs obtained from previous modules. We provide a worked example to demonstrate a complete funspace workflow. funspace will provide researchers working with functional traits across the tree of life with an indispensable asset to easily explore: (i) the main features of any functional trait space, (ii) the relationship between a functional trait space and any other biological or non-biological factor that might contribute to shaping species’ functional diversity.
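
    A generic sketch of a PCA-based trait space with a kernel density over the first two components; this illustrates the idea only and is not the funspace API (trait names and data are simulated):

    library(MASS)

    set.seed(42)
    traits <- matrix(rnorm(200 * 4), ncol = 4,
                     dimnames = list(NULL, c("SLA", "height", "seed_mass", "LDMC")))
    pca  <- prcomp(traits, scale. = TRUE)
    dens <- kde2d(pca$x[, 1], pca$x[, 2], n = 100)   # occupied trait space
    contour(dens, xlab = "PC1", ylab = "PC2")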

  19. Data from: Vertical exploration and dimensional modularity in mice

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +2 more
    zip
    Updated Feb 16, 2018
    Cite
    Yair Wexler; Yoav Benjamini; Ilan Golani (2018). Vertical exploration and dimensional modularity in mice [Dataset]. http://doi.org/10.5061/dryad.t29p3
    Available download formats: zip
    Dataset updated
    Feb 16, 2018
    Dataset provided by
    Dryad
    Authors
    Yair Wexler; Yoav Benjamini; Ilan Golani
    Time period covered
    Jun 1, 2017
    Description

    Exploration is a central component of animal behaviour studied extensively in rodents. Previous tests of free exploration limited vertical movement to rearing and jumping. Here we attach a wire mesh to the arena wall, allowing vertical exploration. This provides an opportunity to study the morphogenesis of behaviour along the vertical dimension, and to examine the context in which it is performed. In the current setup, the mice first use the doorway as a point reference for establishing a borderline linear path along the circumference of the arena floor, and then use this path as a linear reference for performing horizontal forays towards the center (incursions) and vertical forays on the wire mesh (ascents). Vertical movement starts with rearing on the wall, and continues with straight vertical ascents that increase in extent and complexity. The mice first reach the top of the wall, then mill about within circumscribed horizontal sections, and then progress horizontally for increasingly l...

  20. Input Files and WRTDS Model Output for the two major tributaries of Lake...

    • datasets.ai
    • s.cnmilf.com
    • +1 more
    55
    Updated May 31, 2023
    Cite
    Department of the Interior (2023). Input Files and WRTDS Model Output for the two major tributaries of Lake Koocanusa: Mass Removal [Dataset]. https://datasets.ai/datasets/input-files-and-wrtds-model-output-for-the-two-major-tributaries-of-lake-koocanusa-mass-re
    Available download formats: 55
    Dataset updated
    May 31, 2023
    Dataset authored and provided by
    Department of the Interior
    Area covered
    Lake Koocanusa
    Description

    Canadian discrete water-quality data and daily streamflow records were evaluated using the Weighted Regressions on Time, Discharge, and Season (WRTDS) model implemented with the EGRET R package (Hirsch et al. 2010, Hirsch and De Cicco 2015). Models were used to estimate loads of solutes and evaluate trends for three constituents of interest (selenium, nitrogen, and sulfate). Six models were generated: one for each of the three constituents of interest, in each of the two major tributaries to Lake Koocanusa, the Kootenay River at Fenwick (BC08NG0009) and the Elk River above Highway 93 near Elko (BC08NK0003). Data were obtained by downloading data from the British Columbia Water Tool (https://kwt.bcwatertool.ca/surface-water-quality, https://kwt.bcwatertool.ca/streamflow) and Environment and Climate Change Canada (https://open.canada.ca/data/en/dataset/c2adcb27-6d7e-4e97-b546-b8ee3d586aa4/resource/7bb8d1ff-f446-494f-8f3d-ad252162eef5?inner_span=True).

    This data release consists of two input data files and one output file from the EGRET model estimation (eList), which contains the WRTDS model, for each site and constituent. The input datasets include a daily discharge data file and a measured concentration data file. The period of record for the water-quality data varies among the constituents and sites; likewise, the output file time period aligns with the input files and varies among the 6 models. Nitrate in the Elk River at Highway 93 has the longest period of record, from 1979 to 2022. Water-quality sampling at the Fenwick station was discontinued in 2019, so all models for the Kootenay end after 2019. This data release also contains mass-removal data provided by Teck Coal Limited, which were incorporated into a sub-analysis that used the WRTDS selenium model for the Elk River. This child item contains only the mass-removal files.

    The WRTDS model was run at a daily time step. Model performance evaluations, including a visual assessment of model fit and residuals and bias correction factors, were completed. Model output for each parameter at each site (6 total) is published here as an eList (.rds file). The format of each eList is standardized per EGRET processing; see Hirsch and De Cicco (2015) for a description of these files. WRTDS_Kalman estimates can also be obtained by running additional functions on the published eLists; to prevent redundancy they were excluded from this output. For the Kalman models, nitrate specified a rho of 0.95 while the other models used the default (0.9).

    Citations: Hirsch, R.M., and De Cicco, L.A., 2015, User guide to Exploration and Graphics for RivEr Trends (EGRET) and dataRetrieval—R packages for hydrologic data (version 2.0, February 2015): U.S. Geological Survey Techniques and Methods book 4, chap. A10, 93 p., http://dx.doi.org/10.3133/tm4A10. Hirsch, R.M., Moyer, D.L., and Archfield, S.A., 2010, Weighted Regressions on Time, Discharge, and Season (WRTDS), With an Application to Chesapeake Bay River Inputs: Journal of the American Water Resources Association (JAWRA), v. 46, no. 5, 857-880 p., DOI: http://dx.doi.org/10.1111/j.1752-1688.2010.00482.x.
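
    A hedged sketch of re-running the Kalman estimates that were excluded from this release; WRTDSKalman() and plotWRTDSKalman() are EGRET functions, but the .rds file name below is an assumption, and rho = 0.95 follows the nitrate setting described above:

    library(EGRET)

    eList <- readRDS("ElkRiver_nitrate_eList.rds")   # file name assumed
    eList_k <- WRTDSKalman(eList, rho = 0.95, niter = 200)
    plotWRTDSKalman(eList_k)   # compare WRTDS and WRTDS_Kalman estimates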
