CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List: nonlinear_regression.R
Description: The "nonlinear_regression.R" program provides a short example (with data) of one way to perform nonlinear regression in R (version 2.8.1). This example is not meant to provide extensive information on or training in programming in R, but rather to serve as a starting point for performing nonlinear regression in R. R is a free statistical computing and graphics program that may be run on a variety of UNIX platforms, Windows and MacOS. R may be downloaded here: http://www.r-project.org/.
There are several good resources for learning how to program and perform extensive statistical analyses in R, including:
Benjamin M. Bolker. Ecological Models and Data in R. Princeton University Press, 2008. ISBN 978-0-691-12522-0. [ http://www.zoology.ufl.edu/bolker/emdbook/ ]
Other references are provided at http://www.r-project.org/ under “Documentation” and “Books”.
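As a rough illustration of what a nonlinear regression does, the sketch below fits y = a * exp(b * x) by least squares using a damped Gauss-Newton iteration. It is written in Python rather than R and is not the code in nonlinear_regression.R; the model, the starting values, the function name `fit_exponential` and the synthetic data are all assumptions made for the example.

```python
import math


def fit_exponential(xs, ys, a=1.0, b=0.1, iterations=50):
    """Fit y = a * exp(b * x) by least squares (damped Gauss-Newton sketch)."""

    def sse(a_try, b_try):
        # Sum of squared errors; treat numeric overflow as an infinitely bad fit.
        try:
            return sum((y - a_try * math.exp(b_try * x)) ** 2
                       for x, y in zip(xs, ys))
        except OverflowError:
            return math.inf

    for _ in range(iterations):
        # Jacobian columns (d model / d a, d model / d b) and residuals.
        j_a = [math.exp(b * x) for x in xs]
        j_b = [a * x * e for x, e in zip(xs, j_a)]
        resid = [y - a * e for y, e in zip(ys, j_a)]

        # Normal equations (J^T J) d = J^T r for the two parameters.
        s_aa = sum(u * u for u in j_a)
        s_ab = sum(u * v for u, v in zip(j_a, j_b))
        s_bb = sum(v * v for v in j_b)
        g_a = sum(u * r for u, r in zip(j_a, resid))
        g_b = sum(v * r for v, r in zip(j_b, resid))
        det = s_aa * s_bb - s_ab ** 2
        if det == 0:
            break
        d_a = (s_bb * g_a - s_ab * g_b) / det
        d_b = (s_aa * g_b - s_ab * g_a) / det

        # Halve the step until the fit no longer gets worse.
        step, current = 1.0, sse(a, b)
        while step > 1e-10 and sse(a + step * d_a, b + step * d_b) > current:
            step /= 2
        a, b = a + step * d_a, b + step * d_b
    return a, b


# Synthetic, noise-free data generated from a = 2.0, b = 0.3 (made up).
xs = [0, 1, 2, 3, 4, 5]
ys = [2.0 * math.exp(0.3 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print(round(a_hat, 4), round(b_hat, 4))
```

In R itself, this kind of fit would normally be done with a built-in routine instead of a hand-written iteration; the sketch only shows the idea of iteratively adjusting parameters to reduce the squared error.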
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data are incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from full datasets and then from our increasingly incomplete datasets and compared the ranking between the original and virtually reduced datasets to assess the accuracy of FD indices when used on datasets with increasingly missing data. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of missing trait data per plot or per the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. However, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
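One way to compare plot rankings between a full and a reduced dataset is a rank correlation. The Python sketch below computes Spearman's coefficient for two lists of per-plot FD values. The choice of Spearman's coefficient, the function names and the FD values are assumptions for illustration; the description does not say which comparison statistic the study or the “traitor” package uses.

```python
def ranks(values):
    """Return 1-based ranks of the values, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical FD values for four plots: full data vs. data with traits removed.
fd_full = [0.42, 0.31, 0.55, 0.27]
fd_reduced = [0.40, 0.33, 0.50, 0.20]
print(spearman(fd_full, fd_reduced))  # identical plot order gives 1.0
```

A coefficient near 1 would indicate that the reduced dataset preserves the plot ranking; lower values would indicate loss of accuracy.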
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an archive of the data contained in the "Transformations" section in PubChem for integration into patRoon and other workflows.
For further details see the ECI GitLab site: README and main "tps" folder.
Credits:
Concepts: E Schymanski, E Bolton, J Zhang, T Cheng;
Code (in R): E Schymanski, R Helmus, P Thiessen
Transformations: E Schymanski, J Zhang, T Cheng and many contributors to various lists!
PubChem infrastructure: PubChem team
Reaction InChI (RInChI) calculations (v1.0): Gerd Blanke (previous versions of these files)
Acknowledgements: ECI team who contributed to related efforts, especially: J. Krier, A. Lai, M. Narayanan, T. Kondic, P. Chirsir, E. Palm. All contributors to the NORMAN-SLE transformations!
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List:
glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data.
boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic.
Description: glmm.R is example R code that shows how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on randomly generated data, which are generated in such a way that overdispersion is evident. This code can be applied directly to your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (the number of binomial successes and failures) and “species” (a label for the different rows in the dataset), and where we want to test for the effect of a predictor variable called “location”. In other cases, just change the labels and formula as appropriate. boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.
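The final step of a bootstrap test can be sketched as follows: once a likelihood ratio statistic has been computed on the observed data and on many datasets simulated under the null model, the P-value is the share of simulated statistics at least as large as the observed one. The Python sketch below uses the common "add one" estimator; whether boot.glmm.R uses this exact formula is an assumption, and the function name and statistic values are made up.

```python
def bootstrap_p_value(observed, null_statistics):
    """Bootstrap P-value: fraction of null statistics >= the observed one.

    The +1 in numerator and denominator counts the observed statistic as
    one draw from the null distribution, so the estimate is never zero.
    """
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (exceed + 1) / (len(null_statistics) + 1)


# Hypothetical likelihood ratio statistics from 9 bootstrap resamples.
null_lr = [0.3, 1.1, 6.0, 2.4, 0.8, 3.9, 0.2, 1.7, 4.4]
observed_lr = 5.2
print(bootstrap_p_value(observed_lr, null_lr))  # (1 + 1) / (9 + 1) = 0.2
```

In practice many more resamples than nine would be used; the small list only keeps the arithmetic easy to follow.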
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).
Resources in this dataset:
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zip. Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows: matrix of gene counts* (matrix.mtx.gx), gene names (features.tsv.gz), cell IDs (barcodes.tsv.gz). *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csv. Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
nCount_RNA = the number of transcripts detected in a cell
nFeature_RNA = the number of genes detected in a cell
Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
prcntMito = percent mitochondrial reads in a cell
Scrublet = doublet probability score assigned to a cell
seurat_clusters = cluster ID assigned to a cell
PaperIDs = sample ID for a cell
celltypes = cell type ID assigned to a cell
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csv. Resource Description: .csv file containing first 100 PCA coordinates for cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for all cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for all cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txt. Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tar. Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
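The metadata file can also be used on its own to pick out a cell subset. The Python sketch below selects the barcodes of CD4 T cells by cluster ID (clusters 0, 3, 4, 28, as listed above) from a metadata table with the documented `Loupe` and `seurat_clusters` columns. The barcode values and the in-memory mock file are made up, and the sketch assumes cluster IDs are stored as plain integers in the .csv file.

```python
import csv
import io

# Cluster IDs described as CD4 T cells in the resource list.
CD4_CLUSTERS = {"0", "3", "4", "28"}


def select_barcodes(meta_file, clusters):
    """Return the 'Loupe' barcodes of cells whose cluster ID is in `clusters`."""
    reader = csv.DictReader(meta_file)
    return [row["Loupe"] for row in reader
            if row["seurat_clusters"].strip() in clusters]


# Mock stand-in for PBMC7_AllCells_meta.csv (barcodes are hypothetical).
mock_meta = io.StringIO(
    "Loupe,seurat_clusters,celltypes\n"
    "AAAC-1,0,CD4 T cells\n"
    "AAAG-1,6,Gamma delta T cells\n"
    "AACT-1,28,CD4 T cells\n"
)
cd4_barcodes = select_barcodes(mock_meta, CD4_CLUSTERS)
print(cd4_barcodes)
```

With the real file, `open("PBMC7_AllCells_meta.csv", newline="")` would replace the mock; the selected barcodes could then be matched against the rows of the coordinate files.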
https://spdx.org/licenses/CC0-1.0.html
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data are successfully transformed and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.
Methods eLAB Development and Source Code (R statistical software):
eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).
eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.
Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R Markdown file (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.
The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
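The key-value remapping idea can be sketched in a few lines. The Python example below maps the potassium subtypes listed above to a single code and drops any lab name that is not in the table. It is an illustration only and not eLAB's R code: the code string `"potassium"`, the table name and the function name are hypothetical, and the real lookup table covers about 300 subtypes.

```python
# Hypothetical excerpt of a lookup table: EHR lab subtype -> data dictionary code.
LAB_LOOKUP = {
    "Potassium": "potassium",
    "Potassium-External": "potassium",
    "Potassium(POC)": "potassium",
    "Potassium,whole-bld": "potassium",
    "Potassium-Level-External": "potassium",
    "Potassium,venous": "potassium",
    "Potassium-whole-bld/plasma": "potassium",
}


def remap_labs(lab_names):
    """Map each raw lab name to its code; names missing from the table are dropped."""
    remapped = []
    for name in lab_names:
        code = LAB_LOOKUP.get(name.strip())
        if code is not None:
            remapped.append((name, code))
    return remapped


raw = ["Potassium(POC)", "Potassium,venous", "Unlisted lab"]
print(remap_labs(raw))
```

Keeping the mapping in a table rather than in code means new subtypes can be added without touching the remapping logic, which matches the description of the table as re-configurable by end users.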
Data Dictionary (DD)
EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or a numeric value. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
Study Cohort
This study was approved by the MGB IRB. A search of the EHR was performed to identify patients diagnosed with MCC between 1975 and 2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016 and 2019 (N=176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.
Statistical Analysis
OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
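The OS definition and censoring rule above translate directly into a (time, event) pair per patient, which is the input a Cox model expects. The Python sketch below shows that step only; it is not eLAB's R code, the function name and dates are hypothetical, and measuring time in days is an assumption.

```python
from datetime import date


def overall_survival(diagnosis, death=None, last_follow_up=None):
    """Return (time_in_days, event) for one patient.

    event = 1 if death was observed; otherwise the patient is censored
    (event = 0) at the date of the last follow-up visit.
    """
    if death is not None:
        return (death - diagnosis).days, 1
    if last_follow_up is None:
        raise ValueError("censored patients need a last follow-up date")
    return (last_follow_up - diagnosis).days, 0


# Hypothetical patients: one death event, one censored observation.
print(overall_survival(date(2017, 3, 1), death=date(2017, 3, 31)))           # (30, 1)
print(overall_survival(date(2017, 3, 1), last_follow_up=date(2017, 3, 11)))  # (10, 0)
```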
On-road chase and PEMS measurement data while following traffic. Time-averaged to assess the emission rate of the followed vehicle, including the presence of a PEMS to measure direct tailpipe exhaust. This dataset is associated with the following publication: Snow, R., J. Faircloth, R. Baldauf, B. Yand, M. Zhang, P. Deshmukh, and X. Zhang. On-road Emissions and Chemical Transformation of Nitrogen Oxides. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 22, (2017).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
To get the consumption model from Section 3.1, one needs to execute the file consumption_data.R. Load the data for the 3 phases (./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv), transform the data, and build the model (starting at line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata
To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,).
To reproduce the figures one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures.
To calculate the solar radiation that is needed in the Section Production Data, follow file calculate_total_radiation.R.
To reproduce the radiation data from ERA5, which can be found in data.zip, do the following steps:
1. ERA5: download the reanalysis datasets as GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo".
2. Convert GRIB to csv with the file era5toGRID.sh
3. Convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association rules are most often used when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between the items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support/P(mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(mat) = 0.80/0.09 = 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
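The three measures can be checked with a short calculation. The Python sketch below uses the standard definitions for a rule A => B: support is the share of transactions containing both items, confidence divides that by the share containing the antecedent A, and lift divides confidence by the share containing the consequent B. The function name is made up for the example; the counts are the ones from the 100-customer scenario.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.08 0.8 8.89
```

A lift well above 1, as here, means the two items are bought together far more often than would be expected if the purchases were independent.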
Number of Attributes: 7
First, we need to load the required libraries. I briefly describe each library.
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Next, we clean the data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
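The conversion to transaction data amounts to grouping item rows by invoice. The Python sketch below shows that grouping on a few made-up rows; it is not the R code from the original analysis, and the (invoice, item) row layout, the invoice numbers and the item names are assumptions because the dataset's column names are not listed here.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (invoice, item) rows into one set of items per invoice."""
    baskets = defaultdict(set)
    for invoice, item in rows:
        baskets[invoice].add(item)
    return dict(baskets)


# Hypothetical cleaned rows: one row per purchased item.
rows = [
    ("536365", "computer mouse"),
    ("536365", "mouse mat"),
    ("536366", "keyboard"),
    ("536365", "computer mouse"),  # a repeated item collapses within a basket
]
transactions = to_transactions(rows)
print(transactions)
```

Each resulting set is one transaction, which is the form an Apriori implementation takes as input.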
This dataset includes a subset of previously released pesticide data (Morace and others, 2020) from the U.S. Geological Survey (USGS) National Water Quality Assessment Program (NAWQA) Regional Stream Quality Assessment (RSQA) project and the corresponding hazard index results calculated using the R package toxEval, which are relevant to Mahler and others, 2020. Pesticide and transformation products were analyzed at the USGS National Water Quality Laboratory in Denver, Colorado. Files are grouped as pesticides (parent compounds), transformation products (degradate compounds), compounds with no Acute Invertebrate (AI) benchmarks, compounds with no Acute Non-Vascular Plant (ANVP) benchmarks, and compounds not evaluated through the toxEval R program. See Morace and others, 2020 for corresponding quality assurance or quality control data.
Instructions on how to use the data can be found within the repository.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The C2Metadata (“Continuous Capture of Metadata”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. This repository provides examples of scripts and metadata for use in testing C2Metadata tools.
Scenario
The director of marketing at Cyclistic, a bike-share company in Chicago, believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently and design a new marketing strategy to convert casual riders into annual members.
Objective/Purpose
Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. The manager and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Business Task: Identify trends to better understand user purchase behavior and recommend marketing strategies to convert casual riders into annual members.
Data sources used: We have used Cyclistic’s historical trip data to analyze and identify trends. We will use 12 months of Cyclistic trip data from January 2021 to December 2021. This is public data that we will use to explore how different customer types are using Cyclistic's bikes.
Documentation of any cleaning or manipulation of data: The dataset from January 2021 to December 2021 is more than 1 GB in size, so it is somewhat difficult to perform data manipulation and transformation in spreadsheets. SQL or R are comparatively better at handling larger files, and we will use R to perform these actions. I have prepared the analysis using only the first quarter, i.e. January to March 2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created during the research carried out for the PhD of Negin Afsharzadeh and the subsequent manuscript arising from this research. The main purpose of this dataset is to create a record of the raw data that was used in the analyses in the manuscript.
This dataset includes:
In this study, we aimed to optimize approaches to improve the biotechnological production of important metabolites in G. glabra. The study is made up of four experiments that correspond to particular figures/tables in the manuscript and data, as described below.
We tested approaches for the cultivation of G. glabra, specifically the breaking of seed dormancy, to ensure timely and efficient seed germination. To do this, we tested the effect of different pretreatments, sterilization treatments and growth media on the germination success of G. glabra.
This experiment corresponds to:
We aimed to optimize the induction of hairy roots in G. glabra. Four strains of R. rhizogenes were tested to identify the most effective strain for inducing hairy root formation and we tested different tissue explants (cotyledons/hypocotyls) and methods of R. rhizogenes infection (injection or soaking for different durations) in these tissues.
This experiment corresponds to:
Eight distinct hairy root lines were established and the growth rate of these lines was measured over 40 days.
This experiment corresponds to:
We aimed to test different qualities of light on hairy root cultures in order to induce higher growth and possible enhanced metabolite production. A line with a high growth rate from experiment 3, line S, was selected for growth under different light treatments: red light, blue light, and a combination of blue and red light. To assess the overall impact of these treatments, the growth of line S, as well as the increase in antioxidant capacity and total phenolic content, were tracked over this induction period.
This experiment corresponds to:
To work with the .R file and the R datasets, it is necessary to use R: A Language and Environment for Statistical Computing and a package within R, DHARMa. The versions used for the analyses are R version 4.4.1 and DHARMa version 0.4.6.
The references for these are:
R Core Team, R: A Language and Environment for Statistical Computing 2024. https://www.R-project.org/
Hartig F, DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models 2022. https://CRAN.R-project.org/package=DHARMa
Differential Coexpression Script: This script contains the use of previously normalized data to execute the DiffCoEx computational pipeline on an experiment with four treatment groups. (differentialCoexpression.r)
Normalized Transformed Expression Count Data: Normalized, transformed expression count data of Medicago truncatula and mycorrhizal fungi is given as an R data frame where the columns denote different genes and rows denote different samples. This data is used for downstream differential coexpression analyses. (Expression_Data.zip)
Normalization and Transformation of Raw Count Data Script: Raw count data is transformed and normalized with available R packages and RNA-Seq best practices. (dataPrep.r)
Raw_Count_Data_Mycorrhizal_Fungi: Raw count data from HtSeq for mycorrhizal fungi reads are later transformed and normalized for use in differential coexpression analysis. 'R+' indicates that the sample was obtained from a plant grown in the presence of both mycorrhizal fungi and rhizobia. 'R-' indicate...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains several files related to our research paper titled "Attention Allocation to Projection Level Alleviates Overconfidence in Situation Awareness". These files are intended to provide a comprehensive overview of the data analysis process and the presentation of results. Below is a list of the files included and a brief description of each:
R Scripts: These are scripts written in the R programming language for data processing and analysis. The scripts detail the steps for data cleaning, transformation, statistical analysis, and the visualization of results. To replicate the study findings or to conduct further analyses on the dataset, users should run these scripts.
R Markdown File: Offers a dynamic document that combines R code with rich text elements such as paragraphs, headings, and lists. This file is designed to explain the logic and steps of the analysis in detail, embedding R code chunks and the outcomes of code execution. It serves as a comprehensive guide to understanding the analytical process behind the study.
HTML File: Generated from the R Markdown file, this file provides an interactive report of the results that can be viewed in any standard web browser. For those interested in browsing the study's findings without delving into the specifics of the analysis, this HTML file is the most convenient option. It presents the final analysis outcomes in an intuitive and easily understandable manner. For optimal viewing, we recommend opening the HTML file with the latest version of Google Chrome or any other modern web browser. This approach ensures that all interactive functionalities are fully operational.
Together, these files form a complete framework for the research analysis, aimed at enhancing the transparency and reproducibility of the study.
Eximpedia Export import trade data lets you search trade data and active Exporters, Importers, Buyers, Suppliers, manufacturers exporters from over 209 countries
This data package is associated with the publication "Organic Matter Transformations are Disconnected Between Surface Water and the Hyporheic Zone" submitted to Biogeosciences (Stegen et al., 2022). The study aims to understand how the diversity of OM transformations varies across surface and subsurface components of river corridors using inland surface water and sediments collected along river corridors across the contiguous United States. Sediment extracts and water samples were analyzed using ultrahigh resolution Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS). This dataset is comprised of one folder (WHONDR_S19S) which contains (1) a subfolder with R scripts used to process the data and to calculate biochemical transformations, (2) processed FTICR-MS data as csv files, sample collection metadata and climate data as csv files, (3) biochemical transformations profile, classifications and database as csv files, and (4) a readme file with more information regarding WHONDRS raw FTICR-MS data and processing scripts. Outside of the main folders there is a csv containing file-level metadata and a csv data dictionary defining column headers for all csv files contained in the data package. The samples were part of a WHONDRS (https://whondrs.pnnl.gov) study. The raw, unprocessed FTICR-MS data with additional data can be found at doi:10.15485/1729719 for sediments and doi:10.15485/1603775 for water. This data package contains the processed data used in the associated manuscript.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The data used in this analysis was obtained from published literature and available through the high-throughput toxicokinetic (HTTK) R package. The dataset consists of 1486 chemicals that span a variety of use classes including pharmaceuticals, food-use chemicals, pesticides and industrial chemicals of which 1139 chemicals had experimental human in vitro fraction unbound data and 642 chemicals that had experimental human in vitro intrinsic clearance data. Structures were curated and obtained from the DSSTox database. The distribution of experimental values for fraction unbound and intrinsic clearance is shown in Supplementary Figure S1. Since the data were non-normally distributed they were appropriately transformed before any analysis was conducted. The details of the transformation and the transformed data distribution are presented in the results section and Supplementary Figures S2 and S3. A complete list of chemicals with CAS registry numbers (CASRN), DSSTox generic substance IDs (DTXSIDs), structure and experimental data for both parameters are included as supplemental data (1.ChemicalListData.csv and 1.ChemicalList-QSARready.sdf). This dataset is associated with the following publication: Pradeep, P., G. Patlewicz, R. Pearce, J. Wambaugh, B. Wetmore, and R. Judson. Using Chemical Structure Information to Develop Predictive Models for In Vitro Toxicokinetic Parameters to Inform High-throughput Risk-assessment. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 16: 100136, (2020).