Included are the survey data sets and .R script files necessary to replicate all tables and figures. Tables will display in the R console. Figures will be saved as .pdf files to your working directory.
Instructions for Replication: These materials allow for replication in R. You can download the data files in .R or .tab format. Save all files in a common folder (directory). Open the .R script file named “jop_replication_dataverse2.R” and change the working directory at the top of the script to the directory where you saved the replication materials. Execute the code in this script file to generate all tables and figures displayed in the manuscript. The script is annotated. Take care to execute the appropriate lines when loading data sets, depending on whether you downloaded the data in .R or .tab format (the script is written to accommodate both formats).
Note: the files "results.diff_rep.Rdata" and "results.diff2.Rdata" are R list objects and can only be opened in R. Should you encounter any problems or have any questions, please contact the author at jmummolo@stanford.edu.
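The choice between the two download formats can be sketched as follows. This is a minimal illustration, not the code in “jop_replication_dataverse2.R”: the folder, the data set name `example_survey`, and the helper `load_either()` are all invented here, and it assumes that R-format files can be read with `load()` and that .tab files are tab-delimited text with a header row.

```r
# Stand-in for the folder holding the replication files (assumption: in
# practice this is the directory where you saved the download).
rep_dir <- file.path(tempdir(), "replication_example")
dir.create(rep_dir, showWarnings = FALSE)

# Write one invented data set in both formats so the sketch runs on its own.
example_survey <- data.frame(id = 1:3, response = c(2, 5, 4))
save(example_survey, file = file.path(rep_dir, "example_survey.RData"))
write.table(example_survey, file.path(rep_dir, "example_survey.tab"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Hypothetical helper: read a data set from whichever format is present,
# preferring the R-format file when both exist.
load_either <- function(dir, name) {
  rdata_path <- file.path(dir, paste0(name, ".RData"))
  tab_path <- file.path(dir, paste0(name, ".tab"))
  if (file.exists(rdata_path)) {
    env <- new.env()
    loaded <- load(rdata_path, envir = env)  # names of the restored objects
    get(loaded[1], envir = env)
  } else if (file.exists(tab_path)) {
    read.delim(tab_path, stringsAsFactors = FALSE)
  } else {
    stop("No .RData or .tab file found for ", name)
  }
}

survey <- load_either(rep_dir, "example_survey")
str(survey)
```

Passing the folder explicitly through `file.path()` keeps the sketch independent of the current working directory; the replication script itself relies on setting the working directory at the top.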
https://www.gnu.org/licenses/gpl-3.0.html
Simulated expression data with knock-outs
Description
A simulated expression dataset. The data are simulated using a dynamical systems model from a network sampled from the S. cerevisiae regulatory network. The dataset is a list containing the results of the simulation and other information generated subsequently.
Format
A named list with 14 elements:
• simitr: a numeric, indicating the iteration of the simulation (a total of 1000 were performed and 812 converged)
• scores: an S4 Matrix, containing vectorised inference scores from applying the methods implemented in the package. These are precomputed predictions
• inputmodels: a named list, storing the parameters used to sample the initial values of input genes. The proportions, means and variances are stored for each gene
• staticnet: an igraph object, storing the initial regulatory network (150-node network)
• infnet: an igraph object, representing the true differential network as determined using sensitivity analysis of the model
• netlayout: a matrix (150 x 2), storing the (x, y) positions of nodes for laying out the graph
• infdens: a numeric, the network density of the true differential association network
• numinput: a numeric, the number of input genes in the regulatory network. These are genes that have no regulators and therefore need to be pre-defined
• numbimodal: a numeric, the number of input genes that are knocked down and therefore have a bimodal distribution
• numtfs: a numeric, the number of genes in the network that regulate any other gene (are TFs)
• numcotargets: a numeric, the number of genes that are co-regulated, i.e. regulated by more than one TF
• data: an S4 Matrix, the expression data with samples along the columns and genes along the rows. Condition classifications (KD vs WT) are stored as attributes of this object
• triplets: a data frame, consisting of gene triplets representing TF-Target associations conditioned on the gene knocked down. Triplets are annotated for being in the direct, influence and association networks
• sensmat: an S4 Matrix, sensitivities of genes to TFs based on perturbation analysis of the simulation model
Load
This dataset is in the form of an R RDS object. To load it, type the command below in an R console:
simdata = readRDS("sim812.rds")
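Once loaded, the object can be explored like any named list. The sketch below uses a small mock list rather than the real file, so it runs without "sim812.rds": only the element names `simitr`, `numinput` and `data` come from the description above, while the values, dimensions and gene/sample labels are invented, and a base R matrix stands in for the S4 Matrix.

```r
# Mock object (invented values); the real list has 14 elements.
mock <- list(
  simitr = 1,
  numinput = 2,
  data = matrix(
    c(0.1, 0.4, 0.9, 1.2, 0.3, 0.8, 1.1, 0.2, 0.7, 0.5, 0.6, 1.0),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
  )
)

# Round-trip through an RDS file, mirroring the readRDS() call above.
rds_path <- tempfile(fileext = ".rds")
saveRDS(mock, rds_path)
simdata <- readRDS(rds_path)

# List the elements, then look at the expression matrix:
# genes are along the rows and samples along the columns.
names(simdata)
str(simdata, max.level = 1)
dim(simdata$data)

# Extra information attached to an element is visible via attributes().
names(attributes(simdata$data))
```

With the real file, the same calls apply after `simdata = readRDS("sim812.rds")`; the names under which the KD vs WT classification is stored are not given in the description, so `attributes(simdata$data)` is the place to check them.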
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre (1), Aurore Coince (2), Sophien Kamoun (1)
(1) The Sainsbury Laboratory, Norwich, UK; (2) Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, it gives the month and year in which each replicate was performed. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
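The input layout from Step 1 can be illustrated with a small file written and read back in R. This is a sketch only: the three column names come from the protocol, whereas the replicate labels, condition labels, values and file name are invented, and a fixed path replaces the dialog box used in Step 2.

```r
# Invented example values in the three-column layout described in Step 1.
example <- data.frame(
  Replicate = rep(c("Jan 2016", "Feb 2016"), each = 6),
  Condition = rep(rep(c("WT", "Mutant A", "Mutant B"), each = 2), times = 2),
  Value = c(10.2, 11.5, 4.1, 3.8, 7.9, 8.4, 9.7, 10.9, 4.6, 3.5, 8.8, 7.2)
)

# Write the .csv file that Excel would otherwise produce via "Save as".
csv_path <- file.path(tempdir(), "example_input.csv")
write.csv(example, csv_path, row.names = FALSE)

# Read it back the way an import step would, then check the layout.
replicates <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
stopifnot(identical(names(replicates), c("Replicate", "Condition", "Value")))
stopifnot(is.numeric(replicates$Value))

# Number of values per condition and replicate.
table(replicates$Condition, replicates$Replicate)
```

Checking the column names and the type of ‘Value’ right after import catches the most common formatting problems, such as a misspelled header or numbers stored as text.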
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
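For readers without access to the PowerPoint slide, the plotting step with the log-scale variant from Note 2 might look like the sketch below. It is not the authors' script: the data values and the output file name are invented, the definition of `graph` is an assumption inferred from the command line above, and `ggsave()` is used as a scripted alternative to File -> Save as. It assumes ggplot2 is installed (Note 1).

```r
library(ggplot2)

# Invented data in the three-column layout from Step 1
# (values must be positive for a log scale).
replicates <- data.frame(
  Replicate = rep(c("Jan 2016", "Feb 2016"), each = 6),
  Condition = rep(rep(c("WT", "Mutant A", "Mutant B"), each = 2), times = 2),
  Value = c(10.2, 11.5, 4.1, 3.8, 7.9, 8.4, 9.7, 10.9, 4.6, 3.5, 8.8, 7.2)
)

# Assumed definition of `graph`: conditions on the x-axis, values on the y-axis.
graph <- ggplot(replicates, aes(x = Condition, y = Value))

# Boxplots with black outliers, jittered dots colored by replicate,
# log10 y-axis, black-and-white theme.
p <- graph +
  geom_boxplot(outlier.colour = "black", colour = "black") +
  geom_jitter(aes(colour = Replicate)) +
  scale_y_log10() +
  theme_bw()

# Save as .pdf without using the graphics window menu.
out_path <- file.path(tempdir(), "categorical_scatterplot.pdf")
ggsave(out_path, plot = p, width = 5, height = 4)
```

By default `geom_jitter()` also moves points slightly in the vertical direction; passing `height = 0` keeps the plotted values exact while still spreading the dots horizontally.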
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128