7 datasets found

Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
Figshare: http://figshare.com/
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre¹, Aurore Coince², Sophien Kamoun¹

¹ The Sainsbury Laboratory, Norwich, UK; ² Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column, ‘Replicate’, indicates the biological replicates; in the example, it gives the month and year in which each replicate was performed. The second column, ‘Condition’, indicates the experimental conditions (in the example, a wild type and two mutants called A and B). The third column, ‘Value’, contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file imported into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide, paste it into the R console and execute it. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot appears in a separate window. Dots represent the values for each sample, and dot colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
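The three steps above can be sketched end to end in R. This is a minimal sketch, not the script from the PowerPoint slide: it assumes that script reads the .csv file, builds a ggplot2 object named `graph`, and adds a boxplot and jittered dots, as the replacement line in Note 2 suggests. The values, the condition labels and the file names (`input.csv`, `categorical_scatterplot.pdf`) are made up for illustration. To run non-interactively it reads from a fixed path instead of a dialog box (in the console, `read.csv(file.choose())` opens a file-selection dialog), and it exports the PDF with `ggsave()` rather than the window's File menu.

```r
library(ggplot2)

# --- Step 1: three-column table (Replicate, Condition, Value) saved as .csv ---
# All values are made up for illustration only.
set.seed(1)
example_data <- data.frame(
  Replicate = rep(c("Jan 2016", "Feb 2016", "Mar 2016"), each = 9),
  Condition = rep(rep(c("WT", "Mutant A", "Mutant B"), each = 3), times = 3),
  Value     = round(rnorm(27, mean = 10, sd = 2), 2)
)
csv_path <- file.path(tempdir(), "input.csv")  # hypothetical file name
write.csv(example_data, csv_path, row.names = FALSE)

# --- Step 2: read the .csv file and draw the categorical scatterplot ---
# Interactively, read.csv(file.choose()) would open a file-selection dialog.
dataset <- read.csv(csv_path)
# Fix the order of conditions on the x-axis (otherwise alphabetical).
dataset$Condition <- factor(dataset$Condition,
                            levels = c("WT", "Mutant A", "Mutant B"))

graph <- ggplot(dataset, aes(x = Condition, y = Value))
scatterplot <- graph +
  geom_boxplot(outlier.colour = "black", colour = "black") +       # boxplots, outliers in black
  geom_jitter(aes(colour = Replicate), width = 0.2, height = 0) +  # one dot per value, coloured by replicate
  theme_bw()
print(scatterplot)  # draws the graph on the current graphics device

# --- Step 3: export the graph as a .pdf file ---
pdf_path <- file.path(tempdir(), "categorical_scatterplot.pdf")  # hypothetical file name
ggsave(pdf_path, plot = scatterplot, width = 5, height = 4)       # size in inches
```

Setting `height = 0` in `geom_jitter()` spreads the dots horizontally only; by default ggplot2 also jitters points vertically, which would slightly shift the plotted values.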

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’. To install it from the R GUI, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field and click ‘Get List’. Select ‘ggplot2’ in the Package column and click ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To plot the y-axis on a log scale, replace command line #7 of the script with the line below.

#7 Display the graph in a separate window. Dot colors indicate replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
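The two notes above can also be handled from the R console. This sketch assumes a standard R installation with internet access; `install.packages()` installs the packages ggplot2 requires by default. The helper `values_fit_log_scale()` is hypothetical and not part of the original script: it checks that every value is positive, because a log scale cannot display zero or negative values, and ggplot2 typically drops such points with a warning rather than plotting them.

```r
# Note 1: install ggplot2 (with its required dependencies) only if it is missing.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Note 2 (hypothetical helper): TRUE only if every value can be shown on a
# log10 axis, i.e. all values are finite (not NA) and strictly positive.
values_fit_log_scale <- function(values) {
  all(is.finite(values) & values > 0)
}

values_fit_log_scale(c(0.5, 12, 340))  # TRUE
values_fit_log_scale(c(0, 12, 340))    # FALSE: zero cannot be placed on a log axis
```

Running the check on the imported ‘Value’ column before switching to the log-scale line avoids silently losing data points from the graph.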

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm
plos.figshare.com
docx
Updated May 31, 2023
Cite
Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pbio.1002128
Dataset updated
May 31, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
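The claim that many different data distributions can lead to the same bar graph can be checked with a few numbers. In the R sketch below, the three groups are invented for illustration and do not come from the review or its Excel templates. Every group has mean 5 and standard deviation √8 ≈ 2.83, so a bar graph of mean ± SD would show three identical bars, while a strip chart of the individual values shows one symmetric group and two groups skewed in opposite directions.

```r
# Three made-up groups of five values each (illustration only).
groups <- list(
  symmetric    = c(1, 5, 5, 5, 9),
  right_skewed = c(3, 3, 3, 7, 9),
  left_skewed  = c(1, 3, 7, 7, 7)
)

# The summary statistics a bar graph would display are identical...
summary_table <- data.frame(
  group = names(groups),
  mean  = sapply(groups, mean),
  sd    = sapply(groups, sd)
)
print(summary_table)

# ...but a univariate scatterplot (strip chart) reveals the different distributions.
stripchart(groups, vertical = TRUE, method = "stack", pch = 19,
           ylab = "Value")
```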
Data from: Compound Classification and Consideration of Correlation with...
acs.figshare.com
zip
Updated Dec 20, 2023
Cite
Yuto Matsumoto; Hiroaki Gotoh (2023). Compound Classification and Consideration of Correlation with Chemical Descriptors from Articles on Antioxidant Capacity Using Natural Language Processing [Dataset]. http://doi.org/10.1021/acs.jcim.3c01826.s002
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jcim.3c01826.s002
Dataset updated
Dec 20, 2023
Dataset provided by
ACS Publications
Authors
Yuto Matsumoto; Hiroaki Gotoh
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
In recent times, there has been a substantial increase in the number of articles focusing on antioxidants. However, a comprehensive estimator for antioxidant capacity remains elusive because of the challenge of integrating information from these articles. Furthermore, the complexity of the antioxidant mechanism, which involves a multitude of factors, makes it difficult to establish a simple equation or correlation. Hence, there is a pressing need for a model that can effectively interpret the collective knowledge from these articles, especially from a chemistry perspective. In this research, we employed natural language processing techniques, specifically Word2Vec, to analyze articles related to antioxidant capacity. We extracted representation vectors of compound names from these documents and organized them into 10 distinct clusters. In our investigation of two of these clusters, we found that the majority of the compounds in question were flavonoids and flavonoid glycosides. To establish a link between the descriptors and clusters, we used kernel density estimation and generated scatter plots to visualize their similarity. These visualizations clearly indicated a strong relationship between the descriptors and clusters, affirming that a tangible connection exists between word vectors and compound descriptors through a document analysis conducted with natural language processing techniques. This study represents a pioneering approach that uses document analysis to shed light on the field of antioxidant capacity research, marking a significant advancement in this domain.
Data Analysis.
plos.figshare.com
xls
Updated Jul 1, 2025
Cite
Robert M. X. Wu; Huan Zhang; Jie Liang; Niusha Shafiabady; Hai Yan (Helen) Lu; Ergun Gide; D. W. M. N. C. Dasanayake; Meena Jha; Shaoyang Duan (2025). Data Analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0321077.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0321077.t001
Dataset updated
Jul 1, 2025
Dataset provided by
PLOS: http://plos.org/
Authors
Robert M. X. Wu; Huan Zhang; Jie Liang; Niusha Shafiabady; Hai Yan (Helen) Lu; Ergun Gide; D. W. M. N. C. Dasanayake; Meena Jha; Shaoyang Duan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper proposes a dynamic analytical processing (DAP) visualization tool based on the Bubble-Wall Plot. It can readily be used to develop visual warning systems for visualizing the dynamic analytical processes of hazard data. Comparative analysis and case study methods are used in this research. Based on a literature review of Q1 publications since 2017, 23 types of data visualization approaches/tools are identified, including seven anomaly data visualization tools. This study presents three significant findings by comparing existing data visualization approaches. The primary finding is that no single visualization tool can fully satisfy industry requirements; this finding motivates academics to develop new DAP visualization tools. The second finding is that there are different views of Line Charts and various perspectives on Scatter Plots. The third finding is that different researchers may perceive an existing data visualization tool differently, such as arguments between Scatter Plots and Line Charts and diverse opinions about Parallel Coordinate Plots and Scatter Plots. Users’ awareness rises when they choose data visualization tools that satisfy their requirements. A comparative analysis based on five categories (Style, Value, Change, Correlation, and Others) with 26 subcategories of metric features shows that this new tool can effectively overcome the limitations of existing visualization tools, as it appears to have three notable characteristics: it is the simplest cartographic tool, it gives the most straightforward visual result, and it is the most intuitive tool. Furthermore, this paper illustrates how the Bubble-Wall Plot can be effectively applied to develop a warning system for presenting dynamic analytical processes of hazard data in a coal mine. Lastly, this paper provides two recommendations, one implication, six research limitations, and eleven further study topics.
Appendix C. Scatter plots of daily GPP vs. GCC for all deciduous broadleaf...
wiley.figshare.com
datasetcatalog.nlm.nih.gov
html
Updated Jun 1, 2023
Cite
Michael Toomey; Mark A. Friedl; Steve Frolking; Koen Hufkens; Stephen Klosterman; Oliver Sonnentag; Dennis D. Baldocchi; Carl J. Bernacchi; Sebastien C. Biraud; Gil Bohrer; Edward Brzostek; Sean P. Burns; Carole Coursolle; David Y. Hollinger; Hank A. Margolis; Harry McCaughey; Russell K. Monson; J. William Munger; Stephen Pallardy; Richard P. Phillips; Margaret S. Torn; Sonia Wharton; Marcelo Zeri; Andrew D. Richardson (2023). Appendix C. Scatter plots of daily GPP vs. GCC for all deciduous broadleaf forest (DBF) evergreen needleleaf forest (ENF) and grassland (GRS) sites, listed by plant functional type. [Dataset]. http://doi.org/10.6084/m9.figshare.3520367.v1
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.3520367.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Wiley: https://www.wiley.com/
Authors
Michael Toomey; Mark A. Friedl; Steve Frolking; Koen Hufkens; Stephen Klosterman; Oliver Sonnentag; Dennis D. Baldocchi; Carl J. Bernacchi; Sebastien C. Biraud; Gil Bohrer; Edward Brzostek; Sean P. Burns; Carole Coursolle; David Y. Hollinger; Hank A. Margolis; Harry McCaughey; Russell K. Monson; J. William Munger; Stephen Pallardy; Richard P. Phillips; Margaret S. Torn; Sonia Wharton; Marcelo Zeri; Andrew D. Richardson
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Scatter plots of daily GPP vs. GCC for all deciduous broadleaf forest (DBF), evergreen needleleaf forest (ENF), and grassland (GRS) sites, listed by plant functional type.
Appendix B. Scatterplots of glucosinolate concentration per area (mmol/m2)...
figshare.com
wiley.figshare.com
html
Updated May 30, 2023
Cite
M. Brian Traw; Paul Feeny (2023). Appendix B. Scatterplots of glucosinolate concentration per area (mmol/m2) against tissue value and nitrogen per area. [Dataset]. http://doi.org/10.6084/m9.figshare.3529043.v1
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.3529043.v1
Dataset updated
May 30, 2023
Dataset provided by
Wiley
Authors
M. Brian Traw; Paul Feeny
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Scatterplots of glucosinolate concentration per area (mmol/m2) against tissue value and nitrogen per area.
Appendix C. Untransformed and transformed scatterplot matrices summarizing...
wiley.figshare.com
figshare.com
html
Updated May 30, 2023
Cite
Robert Van Pelt; Stephen C. Sillett (2023). Appendix C. Untransformed and transformed scatterplot matrices summarizing bivariate relationships among tree-level structural variables used in principal components analysis. [Dataset]. http://doi.org/10.6084/m9.figshare.3566364.v1
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.3566364.v1
Dataset updated
May 30, 2023
Dataset provided by
Wiley: https://www.wiley.com/
Authors
Robert Van Pelt; Stephen C. Sillett
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Untransformed and transformed scatterplot matrices summarizing bivariate relationships among tree-level structural variables used in principal components analysis.