Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column excel file as shown in Powerpoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import in R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in Powerpoint slide and paste it in the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See Powerpoint slide for an example.
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Civil and geological engineers have used field variable-head permeability tests (VH tests or slug tests) for over one century to assess the local hydraulic conductivity of tested soils and rocks. The water level in the pipe or riser casing reaches, after some rest time, a static position or elevation, z2. Then, the water level position is changed rapidly, by adding or removing some water volume, or by inserting or removing a solid slug. Afterward, the water level position or elevation z1(t) is recorded vs. time t, yielding a difference in hydraulic head or water column defined as Z(t) = z1(t) - z2. The water level at rest is assumed to be the piezometric level or PL for the tested zone, before drilling a hole and installing test equipment. All equations use Z(t) or Z*(t) = Z(t) / Z(t=0). The water-level response vs. time may be a slow return to equilibrium (overdamped test), or an oscillation back to equilibrium (underdamped test). This document deals exclusively with overdamped tests. Their data may be analyzed using several methods, known to yield different results for the hydraulic conductivity. The methods fit in three groups: group 1 neglects the influence of the solid matrix strain, group 2 is for tests in aquitards with delayed strain caused by consolidation, and group 3 takes into account some elastic and instant solid matrix strain. This document briefly explains what is wrong with certain theories and why. It shows three ways to plot the data, which are the three diagnostic graphs. According to experience with thousands of tests, most test data are biased by an incorrect estimate z2 of the piezometric level at rest. The derivative or velocity plot does not depend upon this assumed piezometric level, but can verify its correctness. The document presents experimental results and explains the three-diagnostic graphs approach, which unifies the theories and, most important, yields a user-independent result. Two free spreadsheet files are provided. The spreadsheet "Lefranc-Test-English-Model" follows the Canadian standards and is used to explain how to treat correctly the test data to reach a user-independent result. The user does not modify this model spreadsheet but can make as many copies as needed, with different names. The user can treat any other data set in a copy, and can also modify any copy if needed. The second Excel spreadsheet contains several sets of data that can be used to practice with the copies of the model spreadsheet. En génie civil et géologique, on a utilisé depuis plus d'un siècle les essais in situ de perméabilité à niveau variable (essais VH ou slug tests), afin d'évaluer la conductivité hydraulique locale des sols et rocs testés. Le niveau d'eau dans le tuyau ou le tubage prend, après une période de repos, une position ou élévation statique, z2. Ensuite, on modifie rapidement la position du niveau d'eau, en ajoutant ou en enlevant rapi-dement un volume d'eau, ou en insérant ou retirant un objet solide. La position ou l'élévation du niveau d'eau, z1(t), est alors notée en fonction du temps, t, ce qui donne une différence de charge hydraulique définie par Z(t) = z1(t) - z2. Le niveau d'eau au repos est supposé être le niveau piézométrique pour la zone testée, avant de forer un trou et d'installer l'équipement pour un essai. Toutes les équations utilisent Z(t) ou Z*(t) = Z(t) / Z(t=0). La réponse du niveau d'eau avec le temps peut être soit un lent retour à l'équilibre (cas suramorti) soit une oscillation amortie retournant à l'équilibre (cas sous-amorti). Ce document ne traite que des cas suramortis. Leurs données peuvent être analysées à l'aide de plusieurs méthodes, connues pour donner des résultats différents pour la conductivité hydraulique. Les méthodes appartiennent à trois groupes : le groupe 1 néglige l'influence de la déformation de la matrice solide, le groupe 2 est pour les essais dans des aquitards avec une déformation différée causée par la consolidation, et le groupe 3 prend en compte une certaine déformation élastique et instantanée de la matrice solide. Ce document explique brièvement ce qui est incorrect dans les théories et pourquoi. Il montre trois façons de tracer les données, qui sont les trois graphiques de diagnostic. Selon l'expérience de milliers d'essais, la plupart des données sont biaisées par un estimé incorrect de z2, le niveau piézométrique supposé. Le graphe de la dérivée ou graphe des vitesses ne dépend pas de la valeur supposée pour le niveau piézomé-trique, mais peut vérifier son exactitude. Le document présente des résultats expérimentaux et explique le diagnostic à trois graphiques, qui unifie les théories et donne un résultat indépendant de l'utilisateur, ce qui est important. Deux fichiers Excel gratuits sont fournis. Le fichier"Lefranc-Test-English-Model" suit les normes canadiennes : il sert à expliquer comment traiter correctement les données d'essai pour avoir un résultat indépendant de l'utilisateur. Celui-ci ne modifie pas ce...
Facebook
Twittertaken from https://yougov.co.uk/topics/entertainment/survey-results/daily/2023/03/21/6595b/3
Survey Question: When you’ve finished reading a physical copy of a book that you own, what are you most likely to do with it?
2979 adults from Great Britain Surveyed.
Data was intended to make a simple pie chart in Excel
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column excel file as shown in Powerpoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import in R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in Powerpoint slide and paste it in the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See Powerpoint slide for an example.
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128