72 datasets found
  1. Module M.1 R basics for data exploration and management

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland (2023). Module M.1 R basics for data exploration and management [Dataset]. http://doi.org/10.25334/M9B9-8073
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques. Module M.1 introduces basic functions from R, as well as from its package tidyverse, for data exploration and management.
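
    Not part of the dataset record, but a minimal sketch of the kind of tidyverse exploration such a module introduces might look as follows; the file name macaques.csv and its columns are hypothetical:

    library(tidyverse)

    # Hypothetical file and columns, for illustration only.
    macaques <- read_csv("macaques.csv")
    glimpse(macaques)                     # inspect structure and column types
    macaques %>%
      filter(!is.na(body_mass_kg)) %>%    # drop records with missing mass
      group_by(sex) %>%
      summarise(mean_mass = mean(body_mass_kg), n = n())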

  2. Additional file 1 of cytoviewer: an R/Bioconductor package for interactive...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    html
    Updated Aug 14, 2024
    Cite
    Lasse Meyer; Nils Eling; Bernd Bodenmiller (2024). Additional file 1 of cytoviewer: an R/Bioconductor package for interactive visualization and exploration of highly multiplexed imaging data [Dataset]. http://doi.org/10.6084/m9.figshare.26660383.v1
    Available download formats: html
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lasse Meyer; Nils Eling; Bernd Bodenmiller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: Publication analysis code. Analysis code to reproduce the present study.

  3. R Package History on CRAN

    • kaggle.com
    zip
    Updated Jul 18, 2022
    Cite
    Heads or Tails (2022). R Package History on CRAN [Dataset]. https://www.kaggle.com/datasets/headsortails/r-package-history-on-cran/code
    Available download formats: zip (5637913 bytes)
    Dataset updated
    Jul 18, 2022
    Authors
    Heads or Tails
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Comprehensive R Archive Network (CRAN) is the central repository for software packages in the powerful R programming language for statistical computing. It describes itself as "a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R." If you install an R package in the standard way, it is provided by one of the CRAN mirrors.

    The ecosystem of R packages continues to grow at an accelerated pace, covering a multitude of aspects of statistics, machine learning, data visualisation, and many other areas. This dataset provides monthly updates of all the packages available through CRAN, as well as their release histories. Explore the evolution of the R multiverse and all of its facets through this comprehensive data.

    Content

    I'm providing two CSV tables that describe the current set of R packages on CRAN, as well as the version history of these packages. To derive the data, I made use of the fantastic functionality of the tools package, via the CRAN_package_db function, and the equally wonderful packageRank package and its packageHistory function. The results from those functions were slightly adjusted and formatted. I might add further related tables over time.
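
    A hedged sketch of the derivation described above, using the two functions named; the light post-processing applied to the published tables is not reproduced here:

    library(tools)
    library(packageRank)

    # Current CRAN state: one row per package, many metadata columns.
    overview_raw <- CRAN_package_db()

    # Release history for a single package; the published table applies
    # this call to every package in the overview.
    ggplot2_history <- packageHistory("ggplot2")
    head(ggplot2_history)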

    See the associated blog post for how the data was derived, and for some ideas on how to explore this dataset.

    These are the tables contained in this dataset:

    • cran_package_overview.csv: all R packages currently available through CRAN, with (usually) 1 row per package. (At the time of the creation of this Kaggle dataset there were a few packages with 2 entries and different dependencies. Feel free to contribute some EDA investigating those.) Packages are listed in alphabetical order according to their names.

    • cran_package_history.csv: version history of virtually all packages in the previous table. This table has one row for each combination of package name and version number, which in most cases leads to multiple rows per package. Packages are listed in alphabetical order according to their names.

    I will update this dataset on a roughly monthly cadence by checking which packages have a newer version in the overview table, and then replacing their entries with the updated data.

    Column Description

    Table cran_package_overview.csv: I decided to simplify the large number of columns provided by CRAN and tools::CRAN_package_db into a smaller set of more focused features. All columns are formatted as strings, except for the boolean feature needs_compilation; the date_published column can be read as a ymd date (see the parsing sketch after this list):

    • package: package name following the official spelling and capitalisation. Table is sorted alphabetically according to this column.
    • version: current version.
    • depends: package depends on which other packages.
    • imports: package imports which other packages.
    • licence: the licence under which the package is distributed (e.g. GPL versions)
    • needs_compilation: boolean feature describing whether the package needs to be compiled.
    • author: package author.
    • bug_reports: where to send bugs.
    • url: where to read more.
    • date_published: when the current version of the package was published. Note: this is not the date of the initial package release. See the package history table for that.
    • description: relatively detailed description of what the package is doing.
    • title: the title and tagline of the package.
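
    A minimal parsing sketch for the overview table; the file path and the on-disk encoding of the boolean column are assumptions:

    library(readr)
    library(lubridate)

    # Read everything as character first, then coerce the two special columns.
    overview <- read_csv("cran_package_overview.csv",
                         col_types = cols(.default = "c"))
    # The stored encoding of needs_compilation is assumed ("yes"/"no" or TRUE/FALSE).
    overview$needs_compilation <- tolower(overview$needs_compilation) %in% c("true", "yes")
    overview$date_published <- ymd(overview$date_published)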

    Table cran_package_history.csv: The output of packageRank::packageHistory for each package from the overview table. Almost all of those packages have a match in this table and can be joined by package and version (a join sketch follows this list). All columns are strings, and the date can again be parsed as a ymd date:

    • package: package name. Joins to the feature of the same name in the overview table. Table is sorted alphabetically according to this column.
    • version: historical or current package version. Also joins. Secondary sorting column within each package name.
    • date: when this version was published. Should sort in the same way as the version does.
    • repository: on CRAN or in the Archive.
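
    And the join sketch referenced above, matching history rows to current releases (assumes the overview object from the previous sketch):

    library(dplyr)

    history <- read_csv("cran_package_history.csv",
                        col_types = cols(.default = "c"))
    history$date <- ymd(history$date)

    # Each current release should find its row in the version history;
    # "almost all" packages are expected to match.
    current_releases <- overview %>%
      inner_join(history, by = c("package", "version"))
    nrow(current_releases)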

    Acknowledgements

    All data is being made publicly available by the Comprehensive R Archive Network (CRAN). I'm grateful to the authors and maintainers of the packages tools and packageRank for providing the functionality to query CRAN packages smoothly and easily.

    The vignette photo is the official logo for the R language © 2016 The R Foundation. You can distribute the logo under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license...

  4. Data from: Chronospaces: an R package for the statistical exploration of...

    • datadryad.org
    • search.dataone.org
    • +2 more
    zip
    Updated Jul 24, 2024
    Cite
    Nicolás Mongiardino Koch; Pablo Milla Carmona (2024). Chronospaces: an R package for the statistical exploration of divergence times promotes the assessment of methodological sensitivity [Dataset]. http://doi.org/10.5061/dryad.cfxpnvxdn
    Available download formats: zip
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Dryad
    Authors
    Nicolás Mongiardino Koch; Pablo Milla Carmona
    Time period covered
    Feb 19, 2024
    Description

    Data from: Chronospaces: an R package for the statistical exploration of divergence times reveals extreme dependence on molecular clocks and gene choice

    https://doi.org/10.5061/dryad.cfxpnvxdn

    The data contained in this repository supports the results presented in Mongiardino Koch & Milla Carmona (2024), introducing the R package chronospace, and exploring its use to understand sources of uncertainty in divergence time estimation.

    Description of the data and file structure

    The repository contains two folders, which have been zipped for convenience.

    The first of these, 'Datasets', includes in turn three subfolders containing the data obtained from three publications dealing with the diversification of three clades; the subfolder names denote the focal clade (i.e., 'Curculionoidea', 'Decapoda', and 'Eukaryota'). Each of these folders contains the same set of files:

    1. 'all_gene_trees.tre': A tree file containing all gene trees, ordered as in the phylogenomic dataset (see below)...
  5. mixOmics: An R package for ‘omics feature selection and multiple data...

    • plos.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Florian Rohart; Benoît Gautier; Amrit Singh; Kim-Anh Lê Cao (2023). mixOmics: An R package for ‘omics feature selection and multiple data integration [Dataset]. http://doi.org/10.1371/journal.pcbi.1005752
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Florian Rohart; Benoît Gautier; Amrit Singh; Kim-Anh Lê Cao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The advent of high throughput technologies has led to a wealth of publicly available ‘omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a ‘molecular signature’) to explain or predict biological conditions, but mainly for a single type of ‘omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous ‘omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple ‘omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of ‘omics data available from the package.
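
    The package is distributed via Bioconductor; a minimal integration sketch using the nutrimouse example data shipped with mixOmics (the choice of blocks and components here is illustrative, not the paper's analysis):

    library(mixOmics)

    data(nutrimouse)
    X <- list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
    Y <- nutrimouse$diet

    # PLS-DA across both 'omics blocks at once.
    result <- block.plsda(X, Y, ncomp = 2)
    plotIndiv(result, legend = TRUE)   # samples projected block by block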

  6. Palmer Penguins

    • kaggle.com
    zip
    Updated Jul 4, 2021
    Cite
    Maladeep (2021). Palmer Penguins [Dataset]. https://www.kaggle.com/malanep/palmer-penguine
    Available download formats: zip (8998 bytes)
    Dataset updated
    Jul 4, 2021
    Authors
    Maladeep
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Original dataset: https://github.com/allisonhorst/palmerpenguins

    palmerpenguins


    The goal of palmerpenguins is to provide a great dataset for data exploration & visualization, as an alternative to iris.

    Installation

    You can install the released version of palmerpenguins from CRAN with:

    install.packages("palmerpenguins")
    

    To install the development version from GitHub use:

    # install.packages("remotes")
    remotes::install_github("allisonhorst/palmerpenguins")
    

    About the data

    Data were collected and made available by Dr. Kristen Gorman (https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network (https://lternet.edu/).

    The palmerpenguins package contains two datasets.

    library(palmerpenguins)
    data(package = 'palmerpenguins')
    

    One is called penguins, and is a simplified version of the raw data; see ?penguins for more info:

    head(penguins)
    #> # A tibble: 6 x 8
    #>  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex 
    #>
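
    A minimal exploration sketch using the columns shown in the head() output above:

    library(palmerpenguins)
    library(ggplot2)

    ggplot(na.omit(penguins),
           aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
      geom_point() +
      labs(x = "Flipper length (mm)", y = "Body mass (g)")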
    
  7. A unified framework for unconstrained and constrained ordination of...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Stijn Hawinkel; Frederiek-Maarten Kerckhof; Luc Bijnens; Olivier Thas (2023). A unified framework for unconstrained and constrained ordination of microbiome read count data [Dataset]. http://doi.org/10.1371/journal.pone.0205474
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Stijn Hawinkel; Frederiek-Maarten Kerckhof; Luc Bijnens; Olivier Thas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Explorative visualization techniques provide a first summary of microbiome read count datasets through dimension reduction. A plethora of dimension reduction methods exists, but many of them focus primarily on sample ordination, failing to elucidate the role of the bacterial species. Moreover, implicit but often unrealistic assumptions underlying these methods fail to account for overdispersion and differences in sequencing depth, which are two typical characteristics of sequencing data. We combine log-linear models with a dispersion estimation algorithm and flexible response function modelling into a framework for unconstrained and constrained ordination. The method is able to cope with differences in dispersion between taxa and varying sequencing depths, to yield meaningful biological patterns. Moreover, it can correct for observed technical confounders, whereas other methods are adversely affected by these artefacts. Unlike distance-based ordination methods, the assumptions underlying our method are stated explicitly and can be verified using simple diagnostics. The combination of unconstrained and constrained ordination in the same framework is unique in the field and facilitates microbiome data exploration. We illustrate the advantages of our method on simulated and real datasets, while pointing out flaws in existing methods. The algorithms for fitting and plotting are available in the R-package RCM.
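
    A minimal sketch, assuming the Bioconductor RCM package's RCM() entry point and the Zeller example data from its vignette; the arguments shown are illustrative only:

    library(RCM)

    data(Zeller)               # example microbiome data (a phyloseq object)
    fit <- RCM(Zeller, k = 2)  # unconstrained ordination in two dimensions
    plot(fit)                  # joint sample/taxon biplot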

  8. Slakestable: R package to explore raw data from the Slakes app

    • entrepot.recherche.data.gouv.fr
    application/x-gzip
    Updated May 25, 2022
    Cite
    Thomas Chalaux; Marine Lacoste; Nicolas Saby (2022). Slakestable: R package to explore raw data from the Slakes app [Dataset]. http://doi.org/10.15454/BGSMUE
    Available download formats: application/x-gzip (18335)
    Dataset updated
    May 25, 2022
    Dataset provided by
    Recherche Data Gouv
    Authors
    Thomas Chalaux; Marine Lacoste; Nicolas Saby
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    The slakestable package quickly formats the raw data produced by the "Slakes" smartphone app (Fajardo et al., 2016). The "tablecourbe" function creates a single table containing the coefficients a, b, and c from the Gompertz fit of the raw data, together with the SI600 for each aggregate. The data can be aggregated by site/location using a mean or a median, either before or after fitting the Gompertz equation; in that case two independent tables are created, which can be combined with the "jointurefeuilles" function.
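
    A generic sketch of the kind of Gompertz fit the package automates; this is plain nls() on simulated readings, not the slakestable API, and the column names and starting values are assumptions:

    set.seed(1)
    time_s <- seq(0, 600, by = 10)
    slaking_index <- 1.2 * exp(-4 * exp(-0.02 * time_s)) +
      rnorm(length(time_s), sd = 0.02)
    aggregate_data <- data.frame(time_s, slaking_index)

    # Gompertz form: SI(t) = a * exp(-b * exp(-c * t))
    fit <- nls(slaking_index ~ a * exp(-b * exp(-c * time_s)),
               data = aggregate_data,
               start = list(a = 1, b = 5, c = 0.01))
    coef(fit)                                         # the a, b, c coefficients
    predict(fit, newdata = data.frame(time_s = 600))  # fitted value at 600 s (cf. SI600)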

  9. IEAtools R package

    • pigma.org
    • sextant.ifremer.fr
    Updated Dec 13, 2024
    Cite
    (2024). IEAtools R package [Dataset]. https://www.pigma.org/geonetwork/srv/search?orgName=Institute%20for%20Hydrobiology%20and%20Fisheries%20Science,%20University%20of%20Hamburg
    Dataset updated
    Dec 13, 2024
    Description

    An R package that provides supporting functions for conducting Integrated Ecosystem Assessments (IEA), developed in the framework of Mission Atlantic. The package includes methods for data exploration and for assessing current ecosystem status. This is a forked repository within Mission Atlantic; for the latest version, check the original repository.

  10. Module M.3 Visualizing data with ggplot2

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland (2023). Module M.3 Visualizing data with ggplot2 [Dataset]. http://doi.org/10.25334/DH54-TQ31
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques. Module M.3 introduces basic functions from R package ggplot2 with the purpose of exploring data and generating publication-quality figures.
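
    The module's materials are not reproduced here; a sketch of the kind of publication-quality figure it teaches, on hypothetical stand-in data:

    library(ggplot2)

    # Hypothetical records standing in for the Cayo Santiago data.
    set.seed(7)
    macaques <- data.frame(age_years = runif(100, 0, 25),
                           sex = sample(c("F", "M"), 100, replace = TRUE))
    macaques$body_mass_kg <- 3 + 0.25 * macaques$age_years + rnorm(100)

    ggplot(macaques, aes(age_years, body_mass_kg, colour = sex)) +
      geom_point(alpha = 0.6) +
      geom_smooth(method = "lm") +
      labs(x = "Age (years)", y = "Body mass (kg)") +
      theme_minimal()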

  11. WIDEa: a Web Interface for big Data exploration, management and analysis

    • entrepot.recherche.data.gouv.fr
    Updated Sep 12, 2021
    Cite
    Philippe Santenoise (2021). WIDEa: a Web Interface for big Data exploration, management and analysis [Dataset]. http://doi.org/10.15454/AGU4QE
    Dataset updated
    Sep 12, 2021
    Dataset provided by
    Recherche Data Gouv
    Authors
    Philippe Santenoise
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15454/AGU4QE

    Description

    WIDEa is R-based software aiming to provide users with a range of functionalities to explore, manage, clean and analyse "big" environmental and (in/ex situ) experimental data. These functionalities are the following:

    1. Loading/reading different data types: basic (called normal), temporal, infrared spectra of the mid/near region (called IR) with frequency (wavenumber) used as the unit (in cm-1);
    2. Interactive data visualization from a multitude of graph representations: 2D/3D scatter-plot, box-plot, hist-plot, bar-plot, correlation matrix;
    3. Manipulation of variables: concatenation of qualitative variables, transformation of quantitative variables by generic functions in R;
    4. Application of mathematical/statistical methods;
    5. Creation/management of data considered atypical (named flag data);
    6. Study of normal distribution model results for different strategies: calibration (checking assumptions on residuals), validation (comparison between measured and fitted values). The model form can be more or less complex: mixed effects, main/interaction effects, weighted residuals.

  12. Child 1: Nutrient and streamflow model-input data

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 8, 2025
    Cite
    U.S. Geological Survey (2025). Child 1: Nutrient and streamflow model-input data [Dataset]. https://catalog.data.gov/dataset/child-1-nutrient-and-streamflow-model-input-data
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    Trends in nutrient fluxes and streamflow for selected tributaries in the Lake Erie watershed were calculated using monitoring data at 10 locations. Trends in flow-normalized nutrient fluxes were determined by applying a weighted regression approach called WRTDS (Weighted Regressions on Time, Discharge, and Season).

    Site information and streamflow and water-quality records are contained in 3 zipped files: INFO (site information), Daily (daily streamflow records), and Sample (water-quality records). The INFO, Daily (flow), and Sample files contain the input data, by water-quality parameter and by site, as .csv files used to run the trend analyses. These files were generated by the R (version 3.1.2) software package EGRET - Exploration and Graphics for River Trends (version 2.5.1) (Hirsch and De Cicco, 2015), and can be used directly as input to run graphical procedures and WRTDS trend analyses using the EGRET R software.

    The .csv files are identified according to water-quality parameter (TP, SRP, TN, NO23, and TKN) and site reference number (e.g. TPfiles.1.INFO.csv, SRPfiles.1.INFO.csv, TPfiles.2.INFO.csv, etc.). Water-quality parameter abbreviations and site reference numbers are defined in the file "Site-summary_table.csv" on the landing page, where there is also a site-location map ("Site_map.pdf"). Parameter information details, including abbreviation definitions, appear in the abstract on the landing page. SRP data records were available at only 6 of the 10 trend sites, identified in "Site-summary_table.csv" as monitored by the National Center for Water Quality Research (NCWQR). The SRP sites are: RAIS, MAUW, SAND, HONE, ROCK, and CUYA.

    The model-input dataset is presented in 3 parts:

    1. INFO.zip (site information)
    2. Daily.zip (daily streamflow records)
    3. Sample.zip (water-quality records)

    Reference: Hirsch, R.M., and De Cicco, L.A., 2015 (revised). User Guide to Exploration and Graphics for RivEr Trends (EGRET) and dataRetrieval: R Packages for Hydrologic Data, Version 2.0, U.S. Geological Survey Techniques and Methods, 4-A10. U.S. Geological Survey, Reston, VA., 93 p. (at: http://dx.doi.org/10.3133/tm4A10).
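
    A hedged sketch of loading one parameter/site combination into EGRET and refitting WRTDS; readUserInfo, readUserDaily, readUserSample, mergeReport, and modelEstimation are standard EGRET functions, but the exact Daily and Sample file names below are assumptions based on the naming pattern described above:

    library(EGRET)

    INFO   <- readUserInfo("data", "TPfiles.1.INFO.csv")
    Daily  <- readUserDaily("data", "TPfiles.1.Daily.csv")    # file name assumed
    Sample <- readUserSample("data", "TPfiles.1.Sample.csv")  # file name assumed

    eList <- mergeReport(INFO, Daily, Sample)
    eList <- modelEstimation(eList)   # fit the WRTDS model
    plotConcHist(eList)               # flow-normalized concentration history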

  13. Code and data: Exploring congruent diversification histories with...

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Sep 28, 2023
    Cite
    Anonymous (2023). Code and data: Exploring congruent diversification histories with flexibility and parsimony [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8091720
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    Anonymous
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the code and data for the article "Exploring congruent diversification histories with flexibility and parsimony" (abstract below).

    Data:

    4705sp_mammal-time.tree: Species-level calibrated mammalian phylogeny from Alvarez-Carretero et al. (https://doi.org/10.6084/m9.figshare.14885691)

    mammals_samplingfraction.csv: Clade-specific sampling fractions from Quintero et al. (https://www.biorxiv.org/content/10.1101/2022.08.09.503355v1.full).

    Code:

    CRABS-v1.1.0.9004.zip: Archived version of the CRABS package with our extension.

    Mammalian_rates_EBD_HSMRF.rev: Rev script for the mammalian diversification analysis in RevBayes with regularized priors on diversification rates.

    Mammalian_rates_EBD_independent.rev: Rev script for the mammalian diversification analysis in RevBayes with independent diversification rates at each interval.

    Mammals_proccess_RevBayes_outputs.Rmd: R notebook for processing the outputs from the RevBayes mammalian diversification analysis, plotting the rates through time, and saving the median trajectories used for further analyses.

    Exploring_congruent_diversification_histories_with_flexibility_and_parsimony.Rmd: R notebook for comparing the initial CRABS features and our new extensions. It enables replicating the figures in the article.

    Outputs:

    output_inferredIntervals_fixedRhp_HSMRF.zip & output_inferredIntervals_fixedRhp_independent.zip: The raw traces from the RevBayes analysis, and the resulting median rate trajectories that are used to construct the congruence class illustrated in the article.

    Abstract

    Using phylogenies of present-day species to estimate diversification rate trajectories -- speciation and extinction rates over time -- is a challenging task due to non-identifiability issues. Given a phylogeny, there exists an infinite set of trajectories that result in the same likelihood; this set has been coined a congruence class. Previous work has developed approaches for sampling trajectories within a given congruence class, with the aim of assessing the extent to which congruent scenarios can vary from one another. Based on this sampling approach, it has been suggested that rapid changes in speciation or extinction rates are conserved across the class. Reaching such conclusions requires sampling the broadest possible set of distinct trajectories.
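
    For context, the non-identifiability result behind congruence classes (Louca & Pennell 2020) can be stated compactly: two rate trajectories are congruent precisely when they share the same present-day sampled speciation rate and the same pulled diversification rate,

    r_p(t) = \lambda(t) - \mu(t) + \frac{1}{\lambda(t)} \frac{\mathrm{d}\lambda(t)}{\mathrm{d}t}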

    We introduce a new method for exploring congruence classes, which we implement in the R package CRABS. Whereas existing methods constrain either the speciation rate or the extinction rate trajectory, ours provides more flexibility by sampling congruent speciation and extinction rate trajectories simultaneously. This allows covering a more representative set of distinct diversification rate trajectories. We also implement a filtering step that allows selecting the most parsimonious trajectories within a class.

    We demonstrate the utility of our new sampling strategy using a simulated scenario. Next, we apply our approach to the study of mammalian diversification history. We show that rapid changes in speciation and extinction rates need not be conserved across a congruence class, but that selecting the most parsimonious trajectories shrinks the class to concordant scenarios.

    Our approach opens new avenues both to truly explore the myriad of potential diversification histories consistent with a given phylogeny, embracing the uncertainty inherent to phylogenetic diversification models, and to select among these different histories. This should help refine our inference of diversification trajectories from extant data.

  14. Additional file 4 of tRigon: an R package and Shiny App for integrative...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    html
    Updated Aug 15, 2024
    Cite
    David L. Hölscher; Michael Goedertier; Barbara M. Klinkhammer; Patrick Droste; Ivan G. Costa; Peter Boor; Roman D. Bülow (2024). Additional file 4 of tRigon: an R package and Shiny App for integrative (path-)omics data analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26689220.v1
    Available download formats: html
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David L. Hölscher; Michael Goedertier; Barbara M. Klinkhammer; Patrick Droste; Ivan G. Costa; Peter Boor; Roman D. Bülow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4. tRigon session report in HTML format for processing omics datasets, including a detailed description of input files, processing settings and the processed data frame.

  15. Introduction to Primate Data Exploration and Linear Modeling with R

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland; Alexis Diaz; Alexandra Rosati; Stephanie Gonzalez (2023). Introduction to Primate Data Exploration and Linear Modeling with R [Dataset]. http://doi.org/10.25334/T0ZY-PK40
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland; Alexis Diaz; Alexandra Rosati; Stephanie Gonzalez
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology research students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques.

  16. Data from: Streamflow, Dissolved Organic Carbon, and Nitrate Input Datasets...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Streamflow, Dissolved Organic Carbon, and Nitrate Input Datasets and Model Results Using the Weighted Regressions on Time, Discharge, and Season (WRTDS) Model for Buck Creek Watersheds, Adirondack Park, New York, 2001 to 2021 [Dataset]. https://catalog.data.gov/dataset/streamflow-dissolved-organic-carbon-and-nitrate-input-datasets-and-model-results-using-the
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release supports an analysis of changes in dissolved organic carbon (DOC) and nitrate concentrations in the Buck Creek watershed near Inlet, New York, 2001 to 2021. The Buck Creek watershed is a 310-hectare forested watershed that is recovering from acidic deposition within the Adirondack region.

    The data release includes pre-processed model inputs and model outputs for the Weighted Regressions on Time, Discharge, and Season (WRTDS) model (Hirsch and others, 2010) to estimate daily flow-normalized concentrations of DOC and nitrate during a 20-year period of analysis. WRTDS uses daily discharge and concentration observations, implemented through the Exploration and Graphics for River Trends R package (EGRET), to predict solute concentration using decimal time and discharge as explanatory variables (Hirsch and De Cicco, 2015; Hirsch and others, 2010). Discharge and concentration data are available from the U.S. Geological Survey National Water Information System (NWIS) database (U.S. Geological Survey, 2016). The time series data were analyzed for the entire period, water years 2001 (WY2001) to WY2021, where WY2001 is the period from October 1, 2000 to September 30, 2001.

    This data release contains 5 comma-separated values (CSV) files, one R script, and one XML metadata file. The four input files ("Daily.csv", "INFO.csv", "Sample_doc.csv", and "Sample_nitrate.csv") contain site information, daily mean discharge, and mean daily DOC or nitrate concentrations. The R script ("Buck Creek WRTDS R script.R") uses the four input datasets and functions from the EGRET R package to generate estimates of flow-normalized concentrations. The output file ("WRTDS_results.csv") contains model output at daily time steps for each sub-watershed and for each solute. Files are automatically associated with the R script when opened in RStudio using the provided R project file ("Files.Rproj"). All input, output, and R files are in the "Files.zip" folder.

  17. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business and to make itemset suggestions to customers, so as to increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rule mining is most useful when you want to discover associations between different objects in a set, i.e., frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat": support = P(mouse & mat) = 8/100 = 0.08; confidence = support / P(computer mouse) = 0.08/0.10 = 0.80; lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
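
    Recomputing the toy example in R (numbers from the text):

    n_customers <- 100
    p_mouse <- 10 / n_customers   # P(computer mouse)
    p_mat   <- 9 / n_customers    # P(mouse mat)
    p_both  <- 8 / n_customers    # P(both)

    support    <- p_both              # 0.08
    confidence <- p_both / p_mouse    # 0.80
    lift       <- confidence / p_mat  # ~8.9
    c(support = support, confidence = confidence, lift = lift)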

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    (image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png)

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    (image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png)

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    (images: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png, https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png)

    Then we clean the data frame by removing missing values.

    (image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png)

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
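
    A sketch of that conversion and the subsequent mining, using the libraries listed above; the column names follow the dataset description (BillNo, Itemname), while the intermediate file name and thresholds are assumptions:

    library(readxl)
    library(plyr)
    library(arules)

    retail <- read_excel("Assignment-1_Data.xlsx")
    retail <- retail[complete.cases(retail), ]        # drop missing values

    # Collapse each invoice into one comma-separated basket of item names.
    baskets <- ddply(retail, "BillNo",
                     function(df) paste(df$Itemname, collapse = ","))
    write.table(baskets$V1, "baskets.csv", quote = FALSE,
                row.names = FALSE, col.names = FALSE)

    # Re-read as transactions and mine rules with apriori().
    trans <- read.transactions("baskets.csv", format = "basket", sep = ",")
    rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8))
    inspect(head(sort(rules, by = "lift"), 5))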

  18. funspace: an R package to build, analyze and plot functional trait spaces

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 28, 2024
    Cite
    Carlos Perez Carmona; Nicola Pavanetto; Giacomo Puglielli (2024). funspace: an R package to build, analyze and plot functional trait spaces [Dataset]. http://doi.org/10.5061/dryad.4tmpg4fg6
    Available download formats: zip
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Universidad de Sevilla
    University of Tartu
    Estonian University of Life Sciences
    Authors
    Carlos Perez Carmona; Nicola Pavanetto; Giacomo Puglielli
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Functional trait space analyses are pivotal to describe and compare organisms’ functional diversity across the tree of life. Yet, there is no single application that streamlines the many sometimes-troublesome steps needed to build and analyze functional trait spaces. To fill this gap, we propose funspace, an R package to easily handle bivariate and multivariate (PCA-based) functional trait space analyses. The six functions that constitute the package can be grouped in three modules: ‘Building and exploring’, ‘Mapping’, and ‘Plotting’. The building and exploring module defines the main features of a functional trait space (e.g., functional diversity metrics) by leveraging kernel density-based methods. The mapping module uses general additive models to map how a target variable distributes within a trait space. The plotting module provides many options for creating flexible and high-quality figures representing the outputs obtained from previous modules. We provide a worked example to demonstrate a complete funspace workflow. funspace will provide researchers working with functional traits across the tree of life with an indispensable asset to easily explore: (i) the main features of any functional trait space, (ii) the relationship between a functional trait space and any other biological or non-biological factor that might contribute to shaping species’ functional diversity.
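
    A generic sketch of a PCA-based trait space with a kernel density over the first two components; this illustrates the idea only and is not the funspace API (trait names and data are simulated):

    library(MASS)

    set.seed(42)
    traits <- matrix(rnorm(200 * 4), ncol = 4,
                     dimnames = list(NULL, c("SLA", "height", "seed_mass", "LDMC")))
    pca  <- prcomp(traits, scale. = TRUE)
    dens <- kde2d(pca$x[, 1], pca$x[, 2], n = 100)   # occupied trait space
    contour(dens, xlab = "PC1", ylab = "PC2")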

  19. Data from: Vertical exploration and dimensional modularity in mice

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +2 more
    zip
    Updated Feb 16, 2018
    Cite
    Yair Wexler; Yoav Benjamini; Ilan Golani (2018). Vertical exploration and dimensional modularity in mice [Dataset]. http://doi.org/10.5061/dryad.t29p3
    Available download formats: zip
    Dataset updated
    Feb 16, 2018
    Dataset provided by
    Dryad
    Authors
    Yair Wexler; Yoav Benjamini; Ilan Golani
    Time period covered
    Jun 1, 2017
    Description

    Exploration is a central component of animal behaviour studied extensively in rodents. Previous tests of free exploration limited vertical movement to rearing and jumping. Here we attach a wire mesh to the arena wall, allowing vertical exploration. This provides an opportunity to study the morphogenesis of behaviour along the vertical dimension, and to examine the context in which it is performed. In the current setup, the mice first use the doorway as a point reference for establishing a borderline linear path along the circumference of the arena floor, and then use this path as a linear reference for performing horizontal forays towards the center (incursions) and vertical forays on the wire mesh (ascents). Vertical movement starts with rearing on the wall, and continues with straight vertical ascents that increase in extent and complexity. The mice first reach the top of the wall, then mill about within circumscribed horizontal sections, and then progress horizontally for increasingly l...

  20. Input Files and WRTDS Model Output for the two major tributaries of Lake...

    • datasets.ai
    • s.cnmilf.com
    • +1 more
    55
    Updated May 31, 2023
    Cite
    Department of the Interior (2023). Input Files and WRTDS Model Output for the two major tributaries of Lake Koocanusa: Mass Removal [Dataset]. https://datasets.ai/datasets/input-files-and-wrtds-model-output-for-the-two-major-tributaries-of-lake-koocanusa-mass-re
    Available download formats: 55
    Dataset updated
    May 31, 2023
    Dataset authored and provided by
    Department of the Interior
    Area covered
    Lake Koocanusa
    Description

    Canadian discrete water-quality data and daily streamflow records were evaluated using the Weighted Regressions on Time, Discharge, and Season (WRTDS) model implemented with the EGRET R package (Hirsch et al. 2010, Hirsch and De Cicco 2015). Models were used to estimate loads of solutes and evaluate trends for three constituents of interest (selenium, nitrogen, and sulfate). Six models were generated: one for each of the three constituents of interest, in each of the two major tributaries to Lake Koocanusa, the Kootenay River at Fenwick (BC08NG0009) and the Elk River above Highway 93 near Elko (BC08NK0003). Data were obtained by downloading data from the British Columbia Water Tool (https://kwt.bcwatertool.ca/surface-water-quality, https://kwt.bcwatertool.ca/streamflow) and Environment and Climate Change Canada (https://open.canada.ca/data/en/dataset/c2adcb27-6d7e-4e97-b546-b8ee3d586aa4/resource/7bb8d1ff-f446-494f-8f3d-ad252162eef5?inner_span=True).

    This data release consists of two input data files and one output file from the EGRET model estimation (eList), which contains the WRTDS model, for each site and constituent. The input datasets include a daily discharge data file and a measured concentration data file. The period of record for the water-quality data varies among the constituents and sites; likewise, the output file time period aligns with the input files and varies among the 6 models. Nitrate in the Elk River at Highway 93 has the longest period of record, from 1979 to 2022. Water-quality sampling at the Fenwick station was discontinued in 2019, so all models for the Kootenay end after 2019. This data release also contains mass-removal data provided by Teck Coal Limited, which were incorporated into a sub-analysis that used the WRTDS selenium model for the Elk River. This child item contains only the mass-removal files.

    The WRTDS model was run at a daily time step. Model performance evaluations, including a visual assessment of model fit and residuals and bias correction factors, were completed. Model output for each parameter at each site (6 total) is published here as an eList (.rds file). The format of each eList is standardized per EGRET processing; see Hirsch and De Cicco (2015) for a description of these files. WRTDS_Kalman estimates can also be obtained by running additional functions on the published eLists; to prevent redundancy they were excluded from this output. For the Kalman models, nitrate specified a rho of 0.95 while the other models used the default (0.9).

    Citations: Hirsch, R.M., and De Cicco, L.A., 2015, User guide to Exploration and Graphics for RivEr Trends (EGRET) and dataRetrieval—R packages for hydrologic data (version 2.0, February 2015): U.S. Geological Survey Techniques and Methods book 4, chap. A10, 93 p., http://dx.doi.org/10.3133/tm4A10. Hirsch, R.M., Moyer, D.L., and Archfield, S.A., 2010, Weighted Regressions on Time, Discharge, and Season (WRTDS), With an Application to Chesapeake Bay River Inputs: Journal of the American Water Resources Association (JAWRA), v. 46, no. 5, 857-880 p., DOI: http://dx.doi.org/10.1111/j.1752-1688.2010.00482.x.
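
    A hedged sketch of re-running the Kalman estimates that were excluded from this release; WRTDSKalman() and plotWRTDSKalman() are EGRET functions, but the .rds file name below is an assumption, and rho = 0.95 follows the nitrate setting described above:

    library(EGRET)

    eList <- readRDS("ElkRiver_nitrate_eList.rds")   # file name assumed
    eList_k <- WRTDSKalman(eList, rho = 0.95, niter = 200)
    plotWRTDSKalman(eList_k)   # compare WRTDS and WRTDS_Kalman estimates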
