Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Agencies are increasingly called upon to implement their natural resource management programs within an adaptive management (AM) framework. This article provides the background and motivation for the R package AMModels. AMModels was developed under R version 3.2.2. The overall goal of AMModels is simple: to codify knowledge in the form of models and to store it, along with models generated from numerous analyses and datasets that may come our way, so that it can be used or recalled in the future. AMModels facilitates this process by storing all models and datasets in a single object that can be saved to an .RData file and routinely augmented to track changes in knowledge through time. Through this process, AMModels allows the capture, development, sharing, and use of knowledge that may help organizations achieve their mission. While AMModels was designed to facilitate adaptive management, its utility is far more general. Many R packages exist for creating and summarizing models, but to our knowledge, AMModels is the only package dedicated not to the mechanics of analysis but to organizing analysis inputs and outputs and preserving descriptive metadata. We anticipate that this package will assist users hoping to preserve the key elements of an analysis so they may be more confidently revisited at a later date.
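The central idea lends itself to a short illustration. The sketch below uses only base R (a named list saved to an .RData file) to show the concept of bundling models, data, and notes into one recallable object; it is not the AMModels API, whose functions should be taken from the package documentation.

```{r}
# Conceptual sketch only (base R, not the AMModels API): keep fitted models and
# the data they were fit to in a single object, save it to .RData, and recall it later.
fit <- lm(mpg ~ wt, data = mtcars)            # an example fitted model

knowledge <- list(
  models = list(mpg_by_weight = fit),
  data   = list(mtcars = mtcars),
  notes  = "Linear model of fuel economy versus weight; revisit as new data accrue."
)

save(knowledge, file = "am_knowledge.RData")  # persist the whole 'library'

# Later, or in another session: recall the stored model alongside its data.
load("am_knowledge.RData")
summary(knowledge$models$mpg_by_weight)
```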
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It is always a struggle to find suitable datasets with which to teach, especially across domains of expertise. There are many packages that have data, but finding them and knowing what is in them is a struggle due to inadequate documentation. Here we have compiled a searchable database of dataset metadata taken from R packages on CRAN.
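The core of such a compilation can be sketched with base R's own metadata utilities. The snippet below gathers dataset metadata only for locally installed packages, whereas the published database covers packages on CRAN.

```{r}
# Minimal sketch: collect dataset metadata (package, dataset name, title) from
# all locally installed packages, then search it by keyword.
meta <- data(package = .packages(all.available = TRUE))$results
meta <- as.data.frame(meta, stringsAsFactors = FALSE)[, c("Package", "Item", "Title")]

# Keyword search over dataset titles, e.g. when looking for teaching material.
meta[grepl("survival", meta$Title, ignore.case = TRUE), ]
```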
https://choosealicense.com/licenses/other/
CRAN packages dataset
R and Rmd source code for CRAN packages. The dataset has been constructed using the following steps:
Downloaded the latest version of all packages on CRAN (see last updated); the source code has been downloaded from the GitHub mirror. Identified the license of each package from its DESCRIPTION file and classified each into a license_code (see the licenses.csv file). Extracted the R and Rmd source files from all packages and joined them with the package… See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/cran-packages.
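As a hedged sketch of the license-identification step, CRAN's package metadata (which mirrors each package's DESCRIPTION fields) can be queried directly from R; the grouping into license codes shown here is illustrative only and does not reproduce the dataset's own licenses.csv mapping.

```{r}
# Read DESCRIPTION metadata for all current CRAN packages, including License.
db <- tools::CRAN_package_db()
licenses <- db[, c("Package", "Version", "License")]

# Crude, illustrative grouping into broad license codes.
licenses$license_code <- ifelse(grepl("GPL", licenses$License), "gpl",
                         ifelse(grepl("MIT", licenses$License), "mit",
                                "other"))
table(licenses$license_code)
```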
https://spdx.org/licenses/etalab-2.0.html
Datasets from the WallOmics project. Contains phenomics, metabolomics, proteomics and transcriptomics data collected from two organs of five ecotypes of the model plant Arabidopsis thaliana exposed to two temperature growth conditions. Exploratory and integrative analyses of these data are presented in Durufle et al (2020) (doi:10.1093/bib/bbaa166) and Durufle et al (2020) (doi:10.3390/cells9102249).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This example dataset is used to illustrate the usage of the R package survtd in the Supplementary Materials of the paper: Moreno-Betancur M, Carlin JB, Brilleman SL, Tanamas S, Peeters A, Wolfe R (2017). Survival analysis with time-dependent covariates subject to measurement error and missing data: Two-stage joint model using multiple imputation (submitted). The data was generated using the simjm function of the package, using the following code: dat
This software code was developed to estimate the probability that individuals found at a geographic location will belong to the same genetic cluster as individuals at the nearest empirical sampling location for which ancestry is known. POPMAPS includes 5 main functions to calculate and visualize these results (see Table 1 for functions and arguments). Population assignment coefficients and a raster surface must be estimated prior to using POPMAPS functions (see Fig. 1a and b). With these data in hand, users can run a jackknife function to choose the parameter combination that best reconstructs the empirical data (Figs. 2 and S2). Pertinent parameters include 1) how many empirical sampling localities should be used to estimate ancestry coefficients and 2) how strongly empirical sites influence ancestry coefficient estimation as distance increases (Fig. 2). After choosing these parameters, a user can estimate the entire ancestry probability surface (Fig. 1c and d, Fig. 3). This package can be used to estimate ancestry coefficients from empirical genetic data across a user-defined geospatial layer. Estimated ancestry coefficients are used to calculate ancestry probabilities, which, together with 'hard population boundaries,' compose an ancestry probability surface. Within a hard boundary, the ancestry probability tells a user how confident they can be that individuals of the focal organism found at a location will match the genetic identity of the principal population. Confidence can be modified across the ancestry probability surface by changing parameters that control the contribution of empirical data to the estimation of ancestry coefficients. This information may be valuable for decision-making for organisms with management needs. See 'Related External Resources, Type: Source Code' below for direct access to the POPMAPS R software package.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data input and network plotting functionality from the network meta-analysis (NMA) R packages gemtc, pcnetmeta, and netmeta.
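A brief sketch of what this looks like with netmeta, using its bundled Senn2013 example data; the call is recalled from the netmeta documentation and should be verified against the installed version.

```{r}
# Data input and a network plot with netmeta (example data ships with the package).
library(netmeta)

data(Senn2013)
net1 <- netmeta(TE, seTE, treat1, treat2, studlab,
                data = Senn2013, sm = "MD")   # fit the network meta-analysis

netgraph(net1)                                # plot the treatment network
```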
The simulated community datasets were built using the virtualspecies V1.5.1 R package (Leroy et al., 2016), which generates spatially-explicit presence/absence matrices from habitat suitability maps. We simulated these suitability maps using Gaussian random field neutral landscapes produced using the NLMR V1.0 R package (Sciaini et al., 2018). To allow for some level of overlap between species suitability maps, we divided the γ-diversity (i.e., the total number of simulated species) by an adjustable correlation value to create several species groups that share suitability maps. Using a full factorial design, we developed 81 presence/absence maps varying across four axes (see Supplemental Table 1 and Supplemental Figure 1): 1) landscape size, representing the number of sites in the simulated landscape; 2) γ-diversity; 3) the level of correlation among species suitability maps, with greater correlations resulting in fewer shared species groups among suitability maps; and 4) the habitat suitabil...
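A hedged sketch of the two simulation ingredients named above; function names and arguments are recalled from the NLMR and virtualspecies documentation (versions as cited) and should be checked before use.

```{r}
library(NLMR)            # neutral landscape models
library(virtualspecies)  # habitat suitability -> presence/absence

# 1) Simulate a habitat suitability map as a Gaussian random field landscape.
suitability <- nlm_gaussianfield(ncol = 50, nrow = 50, autocorr_range = 10)

# 2) Convert the suitability map into a presence/absence map for one species.
pa <- convertToPA(suitability, beta = 0.5, alpha = -0.05, plot = FALSE)
```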
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication package for the paper titled "Towards a Taxonomy of Roxygen Documentation in R Packages"
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rvisdiff is an R/Bioconductor package that generates an interactive interface for the interpretation of differential expression results. It creates a local web page that enables the exploration of statistical analysis results through the generation of auto-analytical visualizations. Users can explore the differential expression results and the source expression data interactively in the same view. As input, the package supports the results of popular differential expression packages such as DESeq2, edgeR, and limma. As output, the package generates a local HTML page that can be easily viewed in a web browser. Rvisdiff is freely available at https://bioconductor.org/packages/Rvisdiff/.
https://spdx.org/licenses/CC0-1.0.html
Ecological processes and biodiversity patterns are strongly affected by how animals move through the landscape. However, it remains challenging to predict animal movement and space use. Here we present our new R package enerscape to quantify and predict animal movement in real landscapes based on energy expenditure.
Enerscape integrates a general locomotory model for terrestrial animals with GIS tools in order to map energy costs of movement in a given environment, resulting in energy landscapes that reflect how energy expenditures may shape habitat use. Enerscape only requires topographic data (elevation) and the body mass of the studied animal. To illustrate the potential of enerscape, we analyze the energy landscape for the Marsican bear (Ursus arctos marsicanus) in a protected area in central Italy in order to identify least-cost paths and high-connectivity areas with low energy costs of travel.
Enerscape allowed us to identify travel routes for the bear that minimize energy costs of movement, as well as regions with high landscape connectivity based on movement efficiency, highlighting potential corridors. It also identified areas where high energy costs may prevent movement and dispersal, potentially exacerbating human-wildlife conflicts in the park. A major strength of enerscape is that it requires only widely available topographic and body size data. As such, enerscape provides a cost-effective first estimate of landscape use and movement corridors even when telemetry data are not readily available, as in the bear example here.
Enerscape is built in a modular way and other movement modes and ecosystem types can be implemented when appropriate locomotory models are available. In summary, enerscape is a new general tool that quantifies, using minimal and widely available data, the energy costs of moving through a landscape. This can clarify how and why animals move in real landscapes and inform practical conservation and restoration decisions.
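A hedged sketch of a minimal enerscape run follows; the DEM path and body mass are placeholders and the call signature is recalled from the package documentation, so both should be checked against the installed version.

```{r}
library(terra)       # raster handling
library(enerscape)

# Hypothetical digital elevation model (metres) covering the study area.
dem <- rast("dem_sirente_velino.tif")

# Energy landscape for a ~150 kg animal: enerscape needs only the DEM and the
# body mass of the focal species.
en <- enerscape(dem, m = 150)

# 'en' holds rasters of movement energy costs that can be mapped and used for
# least-cost path and connectivity analyses.
```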
Methods: This data repository contains only the shapefiles and JavaScript code that were not publicly available but are needed to reproduce the analysis of the linked article. All other publicly available data sources, which were not included in this data repository, were:
Digital elevation model (DEM) for Italy was obtained from TINITALY (http://tinitaly.pi.ingv.it/).
Sirente-Velino shapefile from Protected Planet (https://www.protectedplanet.net/en/search-areas?search_term=sirente-velino+regional+park&geo_type=site).
The DEM and tree cover density for Denmark were obtained from the Danish national database: https://download.kortforsyningen.dk/content/dhm-2007terr%C3%A6n-10-m-grid and https://download.kortforsyningen.dk/content/treecoverdensity-tcd.
NDVI was obtained from Sentinel-2 imagery accessed through Google Earth Engine: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2.
L'Eroica shapefile was obtained from the official website of the event: https://eroica.cc/en/gaiole/permanent-route.
GPS records of horses and cattle are under embargo for one year. For more information contact emilio.berti@idiv.de.
marianna13/R-packages dataset hosted on Hugging Face and contributed by the HF Datasets community
This archive contains code and data for reproducing the analysis for "Replication Data for Revisiting 'The Rise and Decline' in a Population of Peer Production Projects". Depending on what you hope to do with the data, you probably do not want to download all of the files, and depending on your computational resources you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with the datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

The data files are created in a four-stage process. The first stage uses the program "wikiq" to parse MediaWiki XML dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis; this file is expensive to generate and, at 1.5 GB, is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and LaTeX typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run; the exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets.

Building the manuscript using knitr: This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways; on Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar, which has everything you need to typeset the manuscript, and unpack the tar archive (on a unix system this can be done by running tar xf code.tar). Navigate to code/paper_source. Install the R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf; otherwise you should try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

Loading intermediate datasets: The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z; the files are 95 MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

Running the analysis: Fitting the models may not work on machines with less than 32 GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models. See line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system this can be done with the command tar xf code.tar && 7z x intermediate_data.7z). Install the R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots, and create the RDS files.

Generating datasets, building the intermediate files: The intermediate files are generated from all.edits.RDS; this process requires about 20 GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system this can be done using tar xf code.tar && 7z x userroles_data.7z). Install the R dependencies: in R run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Run 01_build_datasets.R.

Building all.edits.RDS: The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
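For convenience, the dataset-loading and analysis setup described above amounts to a few R commands (paths follow the archive layout described in the text; adjust to your working folder).

```{r}
# Install the R dependencies listed above.
install.packages(c("data.table", "ggplot2", "urltools", "texreg", "optimx",
                   "lme4", "bootstrap", "scales", "effects", "lubridate",
                   "devtools", "roxygen2"))

# Load one of the intermediate RDS datasets extracted from intermediate_data.7z.
newcomer.ds <- readRDS("newcomers.RDS")
str(newcomer.ds)
```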
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package for the paper "Technical Debt in the Peer-Review Documentation of R Packages: a rOpenSci Case Study" (MSR '21).
# Scripts: Data Collection and Processing
These are the scripts used to extract the data from _rOpenSci_. The following steps indicate how to use them.
1. Add all attached R files into an R project.
2. Install the following R packages. The process also requires a working GitHub account in order to obtain the corresponding token.
```{r}
library(dplyr)
library(stringr)
library(stringi)
library(jsonlite)
library(httpuv)
library(httr)
library(ggplot2)
library(tidyr)
```
3. All the individual functions in the following files should be sourced into the R environment: `getToken.R`, `comments.R`, `issues.R`, and `tagging.R`.
4. Run the script in the file `process.R`. This will run all the previous functions in the corresponding order, as sketched below.
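Steps 3 and 4 amount to the following R commands.

```{r}
# Step 3: source the individual functions into the R environment.
source("getToken.R")
source("comments.R")
source("issues.R")
source("tagging.R")

# Step 4: run the driver script, which calls the functions in the corresponding order.
source("process.R")
```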
# Datasets
The following files are included:
- Dataset_1-100_Author1.xlsx contains the randomly selected 100 comments that were classified according to TD types by Author 1.
- Dataset_1-100_Author2.xlsx contains the randomly selected 100 comments that were classified according to TD types by Author 2 and the combined classification (in blue) after discussion.
- Dataset_Phrases_Both.xlsx contains the randomly selected 358 comments (resulting in 602 phrases) that were classified according to TD types by both authors 1 and 2. Their classifications were incorporated into a single spreadsheet side by side for easy comparison. Disagreements were discussed and the final classification is in the “Agreement” field.
- UserRoles.csv contains the user roles associated with the 600 phrases. The “comment_id” is the unique identifier for the comment from which the phrase is extracted. The phrase is represented in the “statement” field. The “agreement” field shows the final technical debt label after the analysis by two of the authors. The user roles are shown in the “user_role” column.
This is the optional data that can be downloaded from within the R packages 'freesurferformats' and 'fsbrain'. See publication: https://doi.org/10.1101/2020.09.18.302935
Due to CRAN limits, this data cannot be stored in the package. The author therefore stores it on a private server, which is not optimal. This upload serves as a backup and an alternate way to access the data, e.g., for future maintainers of the software.
Note that the files in directories 'subjects_dir/fsaverage' and 'subjects_dir/fsaverage3' are part of FreeSurfer6 and distributed under the FreeSurfer license.
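A hedged sketch of fetching the optional data from within R; the fsbrain function names below are recalled from its documentation and should be checked against the installed version (freesurferformats offers an analogous helper for its own optional data).

```{r}
library(fsbrain)

# Optional example data used by the vignettes and unit tests.
download_optional_data()

# The fsaverage template data are FreeSurfer material; downloading them
# requires accepting the FreeSurfer license.
download_fsaverage(accept_freesurfer_license = TRUE)
```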
yanmingyu/r-packages-tot dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Enriched electronic health records (EHRs) contain crucial information related to disease progression, and this information can help with decision-making in the health care field. Data analytics in health care is deemed one of the essential processes that help accelerate the progress of clinical research. However, processing and analyzing EHR data are common bottlenecks in health care data analytics. The dxpr R package provides mechanisms for integration, wrangling, and visualization of clinical data, including diagnosis and procedure records. First, the dxpr package helps users transform International Classification of Diseases (ICD) codes to a uniform format. After code format transformation, the dxpr package supports four strategies for grouping clinical diagnostic data. For clinical procedure data, two grouping methods can be chosen. After EHRs are integrated, users can employ a set of flexible built-in querying functions for dividing data into case and control groups by using specified criteria and splitting the data into before and after an event based on the record date. Subsequently, the structure of integrated long data can be converted into wide, analysis-ready data that are suitable for statistical analysis and visualization. We conducted comorbidity data processing on a cohort of newborns from Medical Information Mart for Intensive Care-III (n = 7,833) by using the dxpr package. We first defined patent ductus arteriosus (PDA) cases as patients who had at least one PDA diagnosis (ICD, Ninth Revision, Clinical Modification [ICD-9-CM] 7470*). Controls were defined as patients who never had a PDA diagnosis. In total, 381 and 7,452 patients with and without PDA, respectively, were included in our study population. Then, we grouped the diagnoses into defined comorbidities. Finally, we observed a statistically significant difference in 8 of the 16 comorbidities between patients with and without PDA, including fluid and electrolyte disorders, valvular disease, and others.
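As a hedged illustration of the first two steps (code-format transformation and grouping), the calls below use function and argument names as recalled from the dxpr documentation, including its bundled sampleDxFile example table; treat them as assumptions to verify against the installed package.

```{r}
library(dxpr)

# 1) Standardize ICD diagnosis codes from short to decimal format.
decimal_dx <- icdDxShortToDecimal(dxDataFile     = sampleDxFile,
                                  icdColName     = ICD,
                                  dateColName    = Date,
                                  icd10usingDate = "2015/10/01")

# 2) Group diagnoses into comorbidity categories (Elixhauser shown here).
comorbid <- icdDxToComorbid(dxDataFile     = sampleDxFile,
                            idColName      = ID,
                            icdColName     = ICD,
                            dateColName    = Date,
                            icd10usingDate = "2015/10/01",
                            comorbidMethod = "elix")
```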
Functions and data tables for simulation and statistical analysis of chemical toxicokinetics ("TK") as in Pearce et al. (2017). Chemical-specific in vitro data have been obtained from relatively high throughput experiments. Both physiologically-based ("PBTK") and empirical (e.g., one compartment) "TK" models can be parameterized for several hundred chemicals and multiple species. These models are solved efficiently, often using compiled (C-based) code. This dataset is associated with the following publication: Pearce, R., C. Strope, W. Setzer, N. Sipes, and J. Wambaugh (2017). HTTK: R Package for High-Throughput Toxicokinetics. Journal of Statistical Software, 79(4): 1-26. American Statistical Association, Alexandria, VA, USA.
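A short, hedged example of the two modelling modes named above; the chemical and arguments are illustrative and should be checked against the httk documentation for the installed version.

```{r}
library(httk)

# Parameterize the physiologically-based ("PBTK") model for one chemical.
p <- parameterize_pbtk(chem.name = "Bisphenol A")

# Solve the PBTK model over time for the same chemical (compiled ODE solver).
out <- solve_pbtk(chem.name = "Bisphenol A", days = 10)
head(out)
```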
I introduce an open-source R package, 'dcGOR', to provide the bioinformatics community with an easy way to analyse ontologies and protein domain annotations, particularly those in the dcGO database. dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies including Gene Ontology.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The R code for generating the results in "lab: An R package for generating analysis-ready data from laboratory records". Synced from https://github.com/DHLab-TSENG/lab-paper/