Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the accompanying PowerPoint slides. The first column, ‘Replicate’, identifies the biological replicates; in the example, it gives the month and year in which each replicate was performed. The second column, ‘Condition’, identifies the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column, ‘Value’, contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the accompanying PowerPoint slides and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the accompanying PowerPoint slides for an example.
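For readers who do not have the slide at hand, the three steps can be sketched in R as follows. This is a minimal reconstruction, not the authors' script: the plot layers mirror the command given in Note 2 (without the log scale), while the function name `categorical_scatterplot`, the example values, and the use of `ggsave()` in place of the File -> Save as menu are assumptions made for illustration.

```r
# Sketch only: assumes the ggplot2 package is installed (see Note 1).
library(ggplot2)

# Build the plot from a data frame with the three columns described in Step 1.
categorical_scatterplot <- function(data) {
  stopifnot(all(c("Replicate", "Condition", "Value") %in% names(data)))
  ggplot(data, aes(x = Condition, y = Value)) +
    geom_boxplot(outlier.colour = "black", colour = "black") +
    geom_jitter(aes(col = Replicate)) +
    theme_bw()
}

# Made-up example data in the three-column layout (values are illustrative).
example <- data.frame(
  Replicate = rep(c("Jan 2016", "Feb 2016"), each = 6),
  Condition = rep(c("WT", "Mutant A", "Mutant B"), times = 4),
  Value     = c(10.2, 4.1, 7.3, 11.0, 3.8, 6.9, 9.7, 4.5, 7.8, 10.5, 4.0, 7.1)
)

# Step 1: the .csv file that would normally be exported from Excel.
csv_path <- file.path(tempdir(), "example_input.csv")
write.csv(example, csv_path, row.names = FALSE)

# Step 2: read the file and draw the graph.
# Interactively, read.csv(file.choose()) opens a file-selection dialog instead.
data <- read.csv(csv_path)
graph <- categorical_scatterplot(data)

# Step 3: export as .pdf (width and height, in inches, are arbitrary choices).
pdf_path <- file.path(tempdir(), "categorical_scatterplot.pdf")
ggsave(pdf_path, plot = graph, width = 6, height = 4)
```

Wrapping the layers in a function keeps the column check and the plotting in one place, so the same call works for any input file that follows the three-column layout.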
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, open Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
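The menu path in Note 1 appears to be that of the R GUI on macOS; on any platform the package can also be installed from the console with `install.packages()`. The snippet below is a small sketch of that alternative, together with a check that is worth running before applying Note 2, because a log scale cannot display zero or negative values. The helper names `ensure_package` and `log_scale_ok` are hypothetical.

```r
# Install a package (with its dependencies) only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# A log10 axis can only show strictly positive values; check before using it.
log_scale_ok <- function(values) {
  all(is.finite(values) & values > 0)
}

# Usage (uncomment to install ggplot2 if needed):
# ensure_package("ggplot2")

log_scale_ok(c(0.5, 12, 300))  # TRUE
log_scale_ok(c(0, 12, 300))    # FALSE: zero has no finite logarithm
```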
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
This record details the R code used in RStudio to analyse the data of the study "Evaluation of changes in some physico-chemical properties of bottled water exposed to sunlight in Bauchi State, Nigeria". The code is provided as .R, .docx and .pdf files. The .R file can be opened directly in RStudio; for the .docx and .pdf files, copy and paste the commands into RStudio. The code includes the commands for generating the plots used in the published paper. Note that you need the dataset used in the study, which is accessible from the following DOI:
Paper 131. Place all the txt files in the working directory of the R tool, then copy the contents of rcodes.txt and paste them into the command line of the R tool. The R tool will generate the figures and save them in the working directory.
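Instead of copying and pasting, the same commands can be run with `source()`, which reads a file and evaluates its contents as R code. The sketch below demonstrates this on a throwaway script written to a temporary folder; in practice the working directory would be the folder holding the txt files and the file would be rcodes.txt. The script contents here are placeholders, not the paper's code.

```r
# Work in a temporary folder so the example does not touch real files.
work_dir <- tempfile("paper_example_")
dir.create(work_dir)
old_dir <- setwd(work_dir)

# Placeholder script standing in for rcodes.txt (hypothetical contents).
writeLines(c(
  "values <- c(3, 1, 2)",
  "pdf('figure_example.pdf')",
  "plot(sort(values), type = 'b')",
  "dev.off()"
), "rcodes_example.txt")

# source() evaluates every command in the file; the .txt extension is fine.
source("rcodes_example.txt")

# The figure is written to the working directory.
figure_created <- file.exists("figure_example.pdf")

# Restore the previous working directory.
setwd(old_dir)
```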
The raw data file is available online for public access (https://data.ontario.ca/dataset/lake-simcoe-monitoring). Download the 1980-2019 csv files and open the file named "Simcoe_Zooplankton&Bythotrephes.csv". Copy and paste the zooplankton sheet into a new Excel file called "Simcoe_Zooplankton.csv". Change the format of the ZDATE column from GENERAL to SHORT DATE so that its dates read "YYYY/MM/DD", then save the file as .csv in the appropriate R folder. The data file "simcoe_manual_subset_weeks_5" is the raw data that has been subset for the main analysis of the article using the .R file "Simcoe MS - 5 Station Subset Data". The .csv file produced by that script must then be manually edited to remove data points that do not have 5 stations per sampling period and to combine data points that should fall into a single week. The file "simcoe_manual_subset_weeks_5.csv" is then used to calculate variability, stabilization, asynchrony, and Shannon diversity for each year in the .R file "Simcoe MS - 5 Station Calculations". The final .R file, "Simcoe MS - 5 Station Analysis", contains the final statistical analyses as well as code to reproduce the original figures. Data and code for the main and supplementary analyses are also available on GitHub (https://github.com/reillyoc/ZPseasonalPEs).
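The date reformatting and the five-station filter described above can also be done in R rather than by hand. The following sketch assumes that ZDATE arrives as text in month/day/year form and that a column named `STATION` identifies the sampling station; both are assumptions, as is the toy data. The Shannon function is the standard index H = -sum(p * ln p), not code taken from the repository.

```r
# Toy stand-in for the zooplankton sheet (hypothetical columns and values).
zoo <- data.frame(
  ZDATE   = c("05/12/2010", "05/12/2010", "05/19/2010"),
  STATION = c("S1", "S2", "S1"),
  DENSITY = c(12.5, 30.0, 8.0)
)

# Convert ZDATE to the "YYYY/MM/DD" text form the workflow expects.
# The input format "%m/%d/%Y" is an assumption; adjust it to the real file.
parsed <- as.Date(zoo$ZDATE, format = "%m/%d/%Y")
stopifnot(!anyNA(parsed))  # fail loudly if any date did not parse
zoo$ZDATE <- format(parsed, "%Y/%m/%d")

# Keep only sampling dates that have the required number of stations.
keep_complete_dates <- function(data, n_stations) {
  stations_per_date <- tapply(data$STATION, data$ZDATE,
                              function(s) length(unique(s)))
  complete <- names(stations_per_date)[stations_per_date == n_stations]
  data[data$ZDATE %in% complete, , drop = FALSE]
}

# The article uses 5 stations; 2 is used here only to suit the toy data.
complete <- keep_complete_dates(zoo, n_stations = 2)

# Shannon diversity from a vector of abundances: H = -sum(p * ln(p)).
shannon <- function(abundance) {
  p <- abundance[abundance > 0] / sum(abundance)
  -sum(p * log(p))
}
```

Combining data points that fall into the same week is left out of the sketch, since the rule for assigning weeks is not stated in the description.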
CFA provides data feeds for Incidents, Total Fire Bans & Fire Danger Ratings, and latest news.
To receive CFA RSS feeds you need a feed reader and must also subscribe to the CFA feeds.
To subscribe to one of the CFA RSS feeds:
• Click on the RSS button next to the feed you want (or Ctrl-click for Mac users).
• Copy the URL of the page that is displayed.
• Paste the address in the appropriate place in your feed reader.
Important Note: Some third-party readers will not refresh as frequently as is required for live updates. If your reader cannot be set to update at least every 5 minutes, it is recommended that you check the respective web pages (Incidents and Total Fire Bans & Fire Danger Ratings) for the most up-to-date version.
These RSS feeds conform to the RSS 2.0 specification and update every minute.
Methods.3.Forms: This file contains all the raw data for this manuscript.
Methods.Means: This file has the precalculated means for each method at each time point. All of these can be calculated from the raw data file, but the precalculated file is more convenient.
Methods Executed Script: This is the analysis script for this manuscript. Please note that it is saved as a .txt file, though it is designed for use in R. Anyone wishing to repeat the analysis will need to either copy and paste it into an R window or save the file with an appropriate R script extension.
###############################################################################
### Source code and problem instances:
###############################################################################
- The source code of METAFOR will be made available on github once the paper has been accepted for publication. In the meantime, the code is provided in METAFOR.zip
- The list of problem instances (single objective continuous functions) is provided in the attached file "instances.zip"
###############################################################################
###############################################################################
### Experiment folders:
###############################################################################
The folders "default", "leave25OUT", "leave25OUTCEC14", "leaveLDO" and "leaveLDOCEC14" (compressed in zip format to save space) contain all the data collected during the experiment. Each of them contains the following folders/files:
- "candidates/candidates.txt" -- a text file with the algorithms specified as command lines,
- "candidates/OUTPUT" -- a folder containing a text file with the best solutions found by each algorithm for each problem instance. The names of the files in the folder are composed of the test suite "c_<0,1,2,3>", the number of the function in the test suite "f_<0,...,n>", and the number of dimensions "d_<50,100,500,750,...,d>".
- "DataAndPlots" -- a folder created automatically by the "plot_bxp_rtd_wlx.sh" processing script (see below).
Inside this folder are the following subfolders:
- "DataAndPlots/Bxp" -- stores the box plots created based on the data in "DataAndPlots/Data";
- "DataAndPlots/Cvg" -- (if any) stores the convergence plots generated based on the data in "OUTPUT_processed";
- "DataAndPlots/Data" -- stores the processed data and statistical information (median, median error, statistical test, etc.) of the raw data stored in "candidates/OUTPUT";
- "DataAndPlots/Time" -- stores the average time taken by the algorithms on each problem instance;
- "DataAndPlots/Table" -- (if any) stores, in plain text and pseudo-LaTeX format, the tables of results reported in the paper, i.e., median, median error, median absolute deviation, rankings, statistical tests, and number of wins.
- "OUTPUT" -- stores the convergence data of the algorithms (i.e. function evaluations vs. solution quality).
In the latest version of METAFOR, each convergence file consists of 100 points; however, in the version we used for the experiments reported in the paper, each file consists of thousands of points per algorithm, making this folder particularly heavy.
The data in the folders "candidates/OUTPUT" and "OUTPUT" were gathered by running the script "runMe.sh", indicating an experiment folder and an instances file, namely:
- in the case of "default", we solved instances "test_MIXTURE_max200.txt" and "test_MIXTURE_onlyLargeScale.txt";
- in the case of "leave25OUT", we solved instances "test_MIXTURE_max200.txt", "test_MIXTURE_onlyLargeScale.txt" and "test_MIXTURE_onlyLargeScaleDND.txt";
- in the case of "leave25OUTCEC14", we solved instances "test_CEC14.txt";
- in the case of "leaveLDO", we solved instances "test_MIXTURE_max200.txt", "test_MIXTURE_onlyLargeScale.txt" and "test_MIXTURE_onlyLargeScaleDND.txt";
- in the case of "leaveLDOCEC14", we solved instances "test_CEC14.txt".
###############################################################################
###############################################################################
### Processing scripts:
###############################################################################
The scripts folder (scripts.zip) contains the main processing script "plot_bxp_rtd_wlx.sh" and several auxiliary R and shell scripts: "boxplot.R", "cvg_log.R", "wilcoxon.R", "ranksPerClass.R", "filter_repeating.sh", "full_outer_join.sh", and "replace_na.sh". All the auxiliary scripts are automatically called by the main script, depending on the options specified by the user, and are intended to be used standalone. The R auxiliary scripts are used to generate the box plots ("boxplot.R") and convergence plots ("cvg_log.R"), to perform the statistical test ("wilcoxon.R"), and to compute the rankings ("ranksPerClass.R"). The auxiliary shell scripts are used to clean the raw data stored in "OUTPUT" and to create a file called data-*-mean.txt for each data file, which can be passed to "cvg_log.R" to generate the convergence plots.
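To make the role of the statistical scripts concrete, the following R sketch shows the kind of comparison they are described as performing: a Wilcoxon test between two algorithms, a ranking by median, and a count of wins. It is not the code of "wilcoxon.R" or "ranksPerClass.R", whose contents are not reproduced here; the paired design, the algorithm names and the numbers are made-up assumptions.

```r
# Hypothetical best-solution values of two algorithms on the same 8 instances
# (lower is better, as in minimization).
results <- data.frame(
  alg_A = c(1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4),
  alg_B = c(1.7, 1.4, 2.2, 1.9, 1.8, 2.3, 2.1, 2.6)
)

# Median per algorithm, then rank (rank 1 = lowest median).
medians <- sapply(results, median)
ranking <- rank(medians)

# Paired Wilcoxon signed-rank test: the two samples share problem instances.
test <- wilcox.test(results$alg_A, results$alg_B, paired = TRUE)

# Count on how many instances each algorithm obtained the better value.
wins <- c(alg_A = sum(results$alg_A < results$alg_B),
          alg_B = sum(results$alg_B < results$alg_A))
```

A paired test is used in the sketch because both algorithms are evaluated on the same problem instances; whether the original scripts use the paired or the unpaired variant is not stated in this description.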
###############################################################################
###############################################################################
### Folders with experiments:
###############################################################################
Since in the paper we report different sets of algorithms solving different sets of problems, we created a folder (compressed in zip format for space reasons) for each of them and put in it only the specific data that we want to analyze and plot. The data inside these folders are simply copied from the main experiment folders and are organized as follows:
- "METAFOR/exp1_dftVStuned" contains the results discussed in section 5.3.1 of the paper.
- "METAFOR/exp2_mtfVSHyb" contains the results discussed in section 5.3.2 of the paper.
- "METAFOR/exp3_CEC14" and "METAFOR/exp4_LS" contain the results discussed in section 5.3.3 of the paper.
###############################################################################