30 datasets found

Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
figshare
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
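
As an illustration of the input layout described in Step 1, the sketch below parses a small three-column table and computes the per-condition medians that the boxplots would display. It is written in Python rather than R and is not the authors' script, which is provided only in the PowerPoint slide. The replicate labels, condition names, and values are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical example data in the three-column layout described in Step 1.
EXAMPLE_CSV = """Replicate,Condition,Value
Jan2016,WT,1.0
Jan2016,WT,2.0
Feb2016,WT,3.0
Jan2016,MutantA,4.0
Feb2016,MutantA,6.0
Feb2016,MutantA,8.0
"""


def read_long_table(text):
    """Parse CSV text into a list of (replicate, condition, value) rows."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["Replicate"], r["Condition"], float(r["Value"])) for r in reader]


def median_by_condition(rows):
    """Group values by condition and return the median of each group."""
    groups = {}
    for _replicate, condition, value in rows:
        groups.setdefault(condition, []).append(value)
    return {cond: statistics.median(vals) for cond, vals in groups.items()}


rows = read_long_table(EXAMPLE_CSV)
medians = median_by_condition(rows)
```

Keeping one observation per row, with the replicate and condition as separate columns, is what lets a plotting layer color points by replicate while grouping them by condition.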

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

#7 Display the graph in a separate window. Dot colors indicate replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
Data from: PiTMaP: A New Analytical Platform for High-Throughput Direct...
figshare.com
acs.figshare.com
zip
Updated Jun 3, 2023
Cite
Kei Zaitsu; Seiichiro Eguchi; Tomomi Ohara; Kenta Kondo; Akira Ishii; Hitoshi Tsuchihashi; Takakazu Kawamata; Akira Iguchi (2023). PiTMaP: A New Analytical Platform for High-Throughput Direct Metabolome Analysis by Probe Electrospray Ionization/Tandem Mass Spectrometry Using an R Software-Based Data Pipeline [Dataset]. http://doi.org/10.1021/acs.analchem.0c01271.s005
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.analchem.0c01271.s005
Dataset updated
Jun 3, 2023
Dataset provided by
ACS Publications
Authors
Kei Zaitsu; Seiichiro Eguchi; Tomomi Ohara; Kenta Kondo; Akira Ishii; Hitoshi Tsuchihashi; Takakazu Kawamata; Akira Iguchi
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
A new analytical platform called PiTMaP was developed for high-throughput direct metabolome analysis by probe electrospray ionization/tandem mass spectrometry (PESI/MS/MS) using an R software-based data pipeline. PESI/MS/MS was used as the data acquisition technique, applying a scheduled-selected reaction monitoring method to expand the targeted metabolites. Seventy-two metabolites mainly related to the central energy metabolism were selected; data acquisition time was optimized using mouse liver and brain samples, indicating that the 2.4 min data acquisition method had a higher repeatability than the 1.2 and 4.8 min methods. A data pipeline was constructed using the R software, and it was proven that it can (i) automatically generate box-and-whisker plots for all metabolites, (ii) perform multivariate analyses such as principal component analysis (PCA) and projection to latent structures-discriminant analysis (PLS-DA), (iii) generate score and loading plots of PCA and PLS-DA, (iv) calculate variable importance of projection (VIP) values, (v) determine a statistical family by VIP value criterion, (vi) perform tests of significance with the false discovery rate (FDR) correction method, and (vii) draw box-and-whisker plots only for significantly changed metabolites. These tasks could be completed within ca. 1 min. Finally, PiTMaP was applied to two cases: (1) an acetaminophen-induced acute liver injury model and control mice and (2) human meningioma samples with different grades (G1–G3), demonstrating the feasibility of PiTMaP. PiTMaP was found to perform data acquisition without tedious sample preparation and a posthoc data analysis within ca. 1 min. Thus, it would be a universal platform to perform rapid metabolic profiling of biological samples.
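
The description mentions significance tests with false discovery rate (FDR) correction but does not say which procedure the pipeline uses. The sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice, and is written in Python rather than R. The function names and the p-values in the usage note are hypothetical, and this is not the PiTMaP code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, alpha=0.05):
    """Flag the tests whose adjusted p-value is at or below alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]
```

For example, with four hypothetical metabolite p-values `[0.01, 0.04, 0.03, 0.5]`, only the first remains below 0.05 after adjustment, so a pipeline following this assumption would draw a box-and-whisker plot for that metabolite alone.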
Data used in Figures 1-3 and Table 2
catalog.data.gov
Updated Jun 15, 2025
Cite
U.S. EPA Office of Research and Development (ORD) (2025). Data used in Figures 1-3 and Table 2 [Dataset]. https://catalog.data.gov/dataset/data-used-in-figures-1-3-and-table-2
Dataset updated
Jun 15, 2025
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Description
Data used to generate Figures 1-3 and Table 2 in the journal article entitled "Evaluating the Laboratory Performance of Pellet-Fueled Semigasifier Cookstoves" https://doi.org/10.1021/acs.est.4c10008
Figure 1. Box and whisker plots (with jittered points by ISO test phase) of emission factors based on energy delivered (EFd) for particle pollutants (a) fine particulate matter (PM2.5), (b) ultrafine particles (UFP), (c) organic carbon (OC), and (d) elemental carbon (EC).
Figure 2. Box and whisker plots (with jittered points by the ISO test phase) of emission factors based on energy delivered (EFd) for gaseous pollutants: (a) carbon monoxide (CO), (b) total hydrocarbons (THC), (c) methane (CH4), and (d) nitrogen oxides (NOx).
Figure 3. Emission factors (based on energy delivered) of fine particulate matter (PM2.5) and carbon monoxide (CO) plotted against ISO Tiers for distinct test phases (i.e., power levels) and overall (i.e., mean of phase-averaged) results for all three stoves (a−c). Error bars represent the 90% confidence intervals in the mean. Mean values are also plotted from the literature (d) for lab (L), test kitchen (K), and field (F) studies, as summarized in Tables S16 and S17.
Table 1. Summary of Power Level Mean (i.e., Mean of All Stove/Fuel Combinations at a Given Power Level) and Standard Deviations (SD, Here Defined as Standard Deviations of All Stove/Fuel Combination Means at a Given Power Level) of Emission Factors Based on Energy Delivered (EFd) for All Pollutants Excluding Carbon Dioxide (CO2).
This dataset is associated with the following publication: Champion, W., G. Shen, C. Williams, L. Virtaranta, M. Barnes, C. Christianson, M. Hays, and J. Jetter. Evaluating the Laboratory Performance of Pellet-fueled Semi-gasifier Cookstoves. ACS ES&T Air. American Chemical Society, Washington, DC, USA, 59(4): 0, (2025).
Data from: Research funding for male reproductive health and infertility in...
data.niaid.nih.gov
zenodo.org
zip
Updated Mar 1, 2022
Cite
Eva Gumerova; Christopher De Jonge; Christopher Barratt (2022). Research funding for male reproductive health and infertility in the UK and USA [2016 – 2019] [Dataset]. http://doi.org/10.5061/dryad.v9s4mw6wc
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.v9s4mw6wc
Dataset updated
Mar 1, 2022
Dataset provided by
University of Dundee
University of Minnesota
Authors
Eva Gumerova; Christopher De Jonge; Christopher Barratt
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
United Kingdom, United States
Description
There is a paucity of data on research funding levels for male reproductive health (MRH). We investigated the research funding for MRH and infertility by examining publicly accessible web databases from the UK and USA government funding agencies. Information on the funding was collected from the UKRI-GTR, the NIHR’s Open Data Summary, and the USA’s NIH RePORT web databases. Funded projects between January 2016 and December 2019 were recorded and funding support was divided into three research categories: (i) male-based; (ii) female-based; and (iii) not-specified. Between January 2016 and December 2019, UK agencies awarded a total of £11,767,190 to 18 projects for male-based research and £29,850,945 to 40 projects for female-based research. There was no statistically significant difference in the median funding grant awarded within the male-based and female-based categories (p=0.56, W=392). The USA NIH funded 76 projects totalling $59,257,746 for male-based research and 99 projects totalling $83,272,898 for female-based research. Again, there was no statistically significant difference in the median funding grant awarded between the two research categories (p=0.83, W=3834). This is the first study examining funding granted by main government research agencies from the UK and USA for MRH. These results should stimulate further discussion of the challenges of tackling male infertility and reproductive health disorders and formulating appropriate investment strategies.
Methods
Experimental Design: Publicly accessible UK Research and Innovation (UKRI), National Institute for Health Research (NIHR), and National Institutes of Health (NIH) funding agency databases covering awards from January 2016 to December 2019 were examined (see Supplementary Table 1). Following the inclusion and exclusion criteria outlined within Supplementary Tables 2 and 3, funding data were collected on research proposals investigating infertility and reproductive health.
For simplicity, these are referred to collectively as ‘infertility research’. As the primary focus of this research is on infertility, the data were divided into three main categories: (i) male-based, (ii) female-based, and (iii) not-specified (Supplementary Table 2). The first two groups covered projects whose primary aim, based on the information presented in the research abstracts, timeline summaries and/or impact statements, was male- or female-focussed. “Not-specified” includes research projects that have either not specified a primary focus towards either male or female or have explicitly stated a focus on both. The process was conducted and reviewed by E.G. with C.L.R.B. Total funding for all three groups, funding over time, and comparison with overall funding for a particular agency were examined. Briefly, E.G. retrieved the primary data and produced the first set of data for discussion with C.L.R.B. Both went through the complete list, discussed each study/project and decided: (a) whether it should be included, and (b) which category it fell under (male-, female-, or not-specified). The abstracts, which were almost always available and provided by each research study, were all examined and scrutinised by both E.G. and C.L.R.B. together. If there was clear disagreement between E.G. and C.L.R.B., which was very rare, the project was not included.
UK Data Collection: From April 2018 the UK research councils, Innovate UK, and Research England are reported under one organization, the UKRI (2019). The councils independently fund research projects according to their respective visions and missions; however, until 2018/19, their annual funding expenditures were reported under the UKRI’s annual reports and budgets. The UKRI’s Gateway to Research (UKRI-GTR) web database allows users to analyse the information provided on taxpayer-funded research.
Relevant search terms such as “male infertility” or “female reproductive health” (see Supplementary Table 2) were applied with appropriate database filters (Supplementary Table 1). The project award relevance was determined by assessing the objectives in project abstracts, timeline summaries, and planned impacts. Supplementary Tables 1, 2 and 3 provide the search filters and the reference criteria for inclusion/exclusion utilized for analysis. The UKRI-GTR provides the total funding granted to the projects within a designated period. Data obtained from the NIHR had minor differences. The NIHR has 6 datasets. The Open Data Summary View dataset was used as it provided details on funded projects, grants, summary abstracts, and project dates. Like the UKRI data, the NIHR Excel datasheet had specific search terms and filters applied to sift out irrelevant projects (Supplementary Tables 1-3). The UKRI councils and NIHR report their annual expenditure and budgets for 1st April to 31st March. Thus, each project falls under the funding period in which its research activities begin (e.g. if a project’s research activities run from May 20th, 2017, to March 20th, 2019, the project is categorized under the funding period 2017/18). The projects collected began their investigations between January 2016 and December 2019, therefore 5 consecutive funding periods were examined (2015/16, 2016/17, 2017/18, 2018/19, and 2019/20). The UK data collection period ran from October 2020 to December 2020.
USA Data Collection: The NIH has a Research Portfolio Online Reporting Tools (RePORT) site providing access to their research activities, such as previously funded research, active research projects, and information on NIH’s annual expenditures. The RePORT-Query database has features similar to those of the UKRI-GTR and NIHR databases, such as providing information on project abstracts, research impact, start- and end-dates, funding grants, and type of research.
Like the UK data collection, appropriate search terms were inputted with the database filters applied and followed the same inclusion-exclusion criteria (Supplementary Tables 1, 2, and 3). The UK and US agencies present data on funded research under different calendar and funding periods because US federal tax policy requires federal bodies to report all funding expenses under a fiscal year (FY). The NIH’s FY follows a calendar period from October 1st to September 30th (e.g., FY2016 comprises funding activity from October 1st, 2015, to September 30th, 2016). Projects running over one calendar period are reported several times under consecutive fiscal years and the funds are divided according to the annual period of the project’s activity. During data collection, 74 projects were found to be active with incomplete funding sums, as the NIH divides the grants according to the budgeting period of every FY. The NIH are in the process of granting funds for the FY2021, so projects ending in 2020 or 2021 provide a complete funding sum. For the active projects ending after 2021, incomplete funding data is provided. It is assumed the funding will increase in value by the time the research ends in the future, but the final awarded sum is unknown. To remain consistent with the UK data, projects granted funding are totalled as one figure and recorded under the FY the project first began research, whether they are active or completed. Thus US funding is referred to as “Current Total Funding”. When going through the RePORTER database, the NIH present the same research project multiple times for every funded fiscal year with consecutive project reference IDs. Therefore, for simplicity, we only included the first project reference ID. For more information on deciphering NIH project IDs, see https://era.nih.gov/files/Deciphering_NIH_Application.pdf.
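
The two reporting calendars described above can be sketched as a pair of small helpers that map a project start date to its UK funding period or NIH fiscal year. This is an illustrative Python sketch, not the authors' code (their analysis used RStudio), and the function names are hypothetical.

```python
from datetime import date


def uk_funding_period(start):
    """Label the UK funding period (1 April to 31 March), e.g. '2017/18'."""
    # Dates from January to March belong to the period that began the previous April.
    first_year = start.year if start.month >= 4 else start.year - 1
    return f"{first_year}/{(first_year + 1) % 100:02d}"


def nih_fiscal_year(start):
    """Return the NIH fiscal year (1 October to 30 September), named for its end year."""
    return start.year + 1 if start.month >= 10 else start.year
```

With these rules, a project starting on May 20th, 2017 falls under the UK period 2017/18, and a project starting on October 1st, 2015 falls under FY2016, matching the examples given in the description.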
For the USA, the initial data collection period ran from October 2020 to December 2020 but then restarted for a brief period in January 2021 to add up the remaining funding values for some of the active research projects.
Data Analysis: The data were divided into three main groups and organized into the funding period or FY in which the project was first awarded. RStudio (Version 1.3.1093) was utilized for the data analysis. Box-and-whisker plots are presented with rounded P-values. Kruskal-Wallis and Wilcoxon Rank Sum tests were used to assess statistical significance. The data were independently collected and were not assumed to follow a normal distribution, so rank-based, non-parametric tests such as the Kruskal-Wallis and Wilcoxon Rank Sum tests were used.
Research Project Details Included in the Collection Datasets: For both the UK and USA data, we included the following details:

• The project (or study) titles
• The Project IDs (also referred to as Project Reference or Project Number)
• The project Start and End Dates
• The project's Status (identified by the end dates or if explicitly stated in the database)
• The Funding Organisation (for the UK) and Admin Institute (for the USA) that are funding the research
• The project Category (i.e. Research Grants or Fellowships)
• The Amount Granted (for the USA, the funding values were summed up to the most recent awarding date).

Rearranging/Processing Data for Analysis: After the data collection was completed, the data were processed into a simpler format in Notepad in order to perform the statistical analyses using RStudio. For that, only the essential details were included and organised so that RStudio would recognise and analyse the information effectively and efficiently. The project Type (male, female or not-specified), funding sum for the respective research project Type, and the funding period (UK) / FY (USA) were included. These details were then arranged appropriately to produce box-and-whisker plots with P-values, perform the chosen statistical analysis tests, and produce the data statistics in RStudio. As mentioned earlier, the funding period/fiscal years were added following the timeframes set out by the respective countries.
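
To make the rank-based comparison concrete, the sketch below computes a Wilcoxon rank-sum statistic W for two groups of funding amounts. It assumes W is defined as in R's `wilcox.test` (the rank sum of the first sample minus n1(n1+1)/2), uses midranks for ties, and does not compute a p-value. It is a Python illustration with hypothetical function names, not the authors' RStudio analysis.

```python
def midranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Rank sum of x in the pooled sample, minus its minimum possible value."""
    ranks = midranks(list(x) + list(y))
    n1 = len(x)
    return sum(ranks[:n1]) - n1 * (n1 + 1) / 2
```

Under this definition W ranges from 0 to n1 × n2, and values near the midpoint indicate little separation between the groups. For the UK comparison of 18 and 40 projects the midpoint would be 360, which is in line with the reported W=392 and the non-significant p-value.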
Dataset for Targeted GC-MS Analysis of Firefighters' Exhaled Breath
gimi9.com
datasets.ai
Cite
Dataset for Targeted GC-MS Analysis of Firefighters' Exhaled Breath [Dataset]. https://gimi9.com/dataset/data-gov_dataset-for-targeted-gc-ms-analysis-of-firefighters-exhaled-breath
Description
This dataset includes a table of the VOC concentrations detected in firefighter breath samples. QQ-plots for benzene, toluene, and ethylbenzene levels in breath samples as well as box-and-whisker plots of pre-, post-, and 1 h post-exposure breath levels of VOCs for firefighters participating in attack, search, and outside ventilation positions are provided. Graphs detailing the responses of individuals to pre-, post-, and 1 h post-exposure concentrations of benzene, toluene, and ethylbenzene are shown. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The original dataset contains identification information for the firefighters who participated in the controlled structure burns. The analyzed tables and graphs can be made publicly available. This dataset is associated with the following publication: Wallace, A., J. Pleil, K. Oliver, D. Whitaker, S. Mentese, K. Fent, and G. Horn. Targeted GC-MS analysis of firefighters’ exhaled breath: Exploring biomarker response at the individual level. JOURNAL OF OCCUPATIONAL AND ENVIRONMENTAL HYGIENE. Taylor & Francis, Inc., Philadelphia, PA, USA, 16(5): 355-366, (2019).
Supplementary material 3 from: Eddy B (2024) A GIS methodology for mapping...
zenodo.org
data.niaid.nih.gov
pdf
Updated Jun 28, 2024
Cite
Brian Eddy; Brian Eddy (2024). Supplementary material 3 from: Eddy B (2024) A GIS methodology for mapping regional and community vitality for Canada using the CanEcumene 3.0 Geodatabase with census data. One Ecosystem 9: e122079. https://doi.org/10.3897/oneeco.9.e122079 [Dataset]. http://doi.org/10.3897/oneeco.9.e122079.suppl3
Available download formats: pdf
Unique identifier
https://doi.org/10.3897/oneeco.9.e122079.suppl3
Dataset updated
Jun 28, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Brian Eddy; Brian Eddy
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Canada
Description
Box and Whisker Plots of CVI Values for Selected Community Dimensions for 2001-2021
Box plot
figshare.com
xlsx
Updated Dec 8, 2022
Cite
Shinichi Sato (2022). Box plot [Dataset]. http://doi.org/10.6084/m9.figshare.19290185.v5
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19290185.v5
Dataset updated
Dec 8, 2022
Dataset provided by
Figshare http://figshare.com/
Authors
Shinichi Sato
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RSV box-and-whisker diagram data for the search terms "malnutrition," "frailty," "sarcopenia," and "cachexia" from January 1, 2018 to January 1, 2022. The data is divided before and after the declaration of the COVID-19 pandemic.
Ecological snapshot of a Panopea population within their traces (Pliocene,...
data.mendeley.com
narcis.nl
Updated Jan 27, 2019
Cite
Weronika Łaska (2019). Ecological snapshot of a Panopea population within their traces (Pliocene, Agua Amarga subbasin, SE Spain) [Dataset]. http://doi.org/10.17632/663rnpkw6x.1
Unique identifier
https://doi.org/10.17632/663rnpkw6x.1
Dataset updated
Jan 27, 2019
Authors
Weronika Łaska
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spain, Agua Amarga
Description
R code is provided to compute a histogram of size frequencies, with bivalve length as the variable, combined with a box-and-whisker plot. It was used to characterize the ontogenetic stages of the Panopea individuals analyzed at the Agua Amarga outcrop (SE Spain). The necessary data, including Panopea length, are provided in the "Panopea and S. phiale measurements.xlsx" file.
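
The two summaries that such a figure combines can be sketched as follows: bin counts for the histogram and a five-number summary for the box-and-whisker plot. This is a Python illustration rather than the dataset's R code; the function names, bin settings, and the lengths used in the usage note are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption that may differ from the one used by the authors.

```python
import statistics


def histogram_counts(values, bin_width, start):
    """Count values in left-closed bins [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        left_edge = start + k * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)
```

For the hypothetical lengths `[52, 55, 61, 64, 68, 70, 73, 79, 85]` with 10-unit bins starting at 50, the counts are 2, 3, 3, and 1, and the summary is (52, 61.0, 68.0, 73.0, 85).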
RAAS markers and COVID-19
data.mendeley.com
Updated Sep 5, 2022
Cite
Nisha Parikh (2022). RAAS markers and COVID-19 [Dataset]. http://doi.org/10.17632/6dzn4yxc3s.2
Unique identifier
https://doi.org/10.17632/6dzn4yxc3s.2
Dataset updated
Sep 5, 2022
Authors
Nisha Parikh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Supplementary Figure 1A. Box and Whisker Plots of log Aldosterone to Renin Ratio, additionally adjusted for body mass index
Supplementary Figure 1B. Box and Whisker Plots of log Renin, additionally adjusted for body mass index
Supplementary Figure 1C. Box and Whisker Plots of log Aldosterone, additionally adjusted for body mass index
Supplementary Figure 2. Box and Whisker Plots of log ACE activity, additionally adjusted for body mass index
Chapter 10 of the Working Group I Contribution to the IPCC Sixth Assessment...
data-search.nerc.ac.uk
Updated Nov 16, 2021
Cite
(2021). Chapter 10 of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure 10.20 (v20220113) [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=Mediterranean
Dataset updated
Nov 16, 2021
Description
Data for Figure 10.20 from Chapter 10 of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Figure 10.20 shows aspects of Mediterranean summer warming.

How to cite this dataset
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Doblas-Reyes, F.J., A.A. Sörensson, M. Almazroui, A. Dosio, W.J. Gutowski, R. Haarsma, R. Hamdi, B. Hewitson, W.-T. Kwon, B.L. Lamptey, D. Maraun, T.S. Stephenson, I. Takayabu, L. Terray, A. Turner, and Z. Zuo, 2021: Linking Global to Regional Climate Change. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 1363–1512, doi:10.1017/9781009157896.012.

Figure subpanels
The figure has 7 subpanels. Data for subpanels d, e, f and g is provided.

List of data provided
The data is annual summer (JJA) means for:
- Observed trends over 1960-2014
- Anomalies 1960-2014 with respect to 1995-2014 average for the Mediterranean mean (lon: 10°W-40°E, lat: 25°N-50°N)
- Trends 1960-2014 for the Mediterranean mean (lon: 10°W-40°E, lat: 25°N-50°N)
- Modelled trend differences to the observed over 1960-2014

Data provided in relation to figure
Panel (d):
- Data file: Fig_10_20_panel-d_mapplot_tas_obs_trend_single_single_trend.nc; JJA Berkeley Earth surface air temperature OLS linear trends over 1960-2014 over the Mediterranean (lon: 10°W-40°E, lat: 25°N-50°N)
Panel (e):
- Data file: Fig_10_20_panel-e_timeseries.csv; Observed and modelled JJA surface air temperature anomalies 1960-2014 (baseline 1995-2014) for the Mediterranean mean (lon: 10°W-40°E, lat: 25°N-50°N): CMIP5 (blue), CMIP6 (red), HighResMIP (orange), CORDEX EUR-44 (light blue), CORDEX EUR-11 (green), Berkeley Earth (dark blue), CRU TS (brown), HadCRUT5 (cyan)
Panel (f):
- Data file: Fig_10_20_panel-f_trends.csv; JJA OLS linear trends in surface air temperature 1960-2014 for the Mediterranean mean (lon: 10°W-40°E, lat: 25°N-50°N) of observations (Berkeley Earth, CRU TS, HadCRUT5: black crosses) and models (CMIP5 (blue circles), CMIP6 (red circles), HighResMIP (orange circles), CORDEX EUR-44 (light blue circles), CORDEX EUR-11 (green circles)) and box-and-whisker plots for the SMILEs: MIROC6, CSIRO-Mk3-6-0, MPI-ESM, d4PDF (grey shading)
Panel (g):
- Data files: Fig_10_20_panel-g_mapplot_tas_cmip5_mean_trend_bias_tas_cmip5_maps_trend_MultiModelMean_trend-bias.nc, Fig_10_20_panel-g_mapplot_tas_cmip6_mean_trend_bias_tas_cmip6_maps_trend_MultiModelMean_trend-bias.nc, Fig_10_20_panel-g_mapplot_tas_cordex_11_mean_trend_bias_tas_cordex_11_maps_trend_MultiModelMean_trend-bias.nc, Fig_10_20_panel-g_mapplot_tas_cordex_44_mean_trend_bias_tas_cordex_44_maps_trend_MultiModelMean_trend-bias.nc, Fig_10_20_panel-g_mapplot_tas_hrmip_mean_trend_bias_tas_hrmip_maps_trend_MultiModelMean_trend-bias.nc; Modelled OLS linear surface air temperature trend differences to the observed trend (Berkeley Earth) over 1960-2014 of CMIP5, CMIP6, HighResMIP, CORDEX EUR-44, and CORDEX EUR-11 ensemble means

Acronyms: CMIP - Coupled Model Intercomparison Project, CORDEX - Coordinated Regional Climate Downscaling Experiment, CRU TS - Climatic Research Unit Time Series, CSIRO - Commonwealth Scientific and Industrial Research Organisation, MIROC - Model for Interdisciplinary Research on Climate, SMILEs - single model initial-condition large ensembles, d4PDF - Database for Policy Decision-Making for Future Climate Change, OLS - ordinary least squares regression.

Notes on reproducing the figure from the provided data
The code for ESMValTool is provided.

Sources of additional information
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the figure on the IPCC AR6 website
- Link to the report component containing the figure (Chapter 10)
- Link to the Supplementary Material for Chapter 10, which contains details on the input data used in Table 10.SM.11
- Link to the code for the figure, archived on Zenodo.
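
The two derived quantities named in this record, anomalies relative to a baseline period and OLS linear trends, can be sketched in a few lines. This is a Python illustration with hypothetical function names and a plain year-to-value mapping as input. It is not the ESMValTool code referenced by the record and makes no claim about how that code is organised.

```python
def anomalies(series, base_start, base_end):
    """Subtract the mean over the years base_start..base_end from every value."""
    baseline = [v for year, v in series.items() if base_start <= year <= base_end]
    baseline_mean = sum(baseline) / len(baseline)
    return {year: v - baseline_mean for year, v in series.items()}


def ols_slope(series):
    """Ordinary least squares slope of value against year (units per year)."""
    years = list(series)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(series.values()) / n
    sxy = sum((x - mean_x) * (series[x] - mean_y) for x in years)
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx
```

Because subtracting a constant baseline does not change the slope, `ols_slope` gives the same trend for a raw series and for its anomalies; multiplying the result by 10 expresses it per decade.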
Groundwater temperature profiles of Superficial, Dalradian and Carboniferous...
gsni-data.bgs.ac.uk
Updated Jan 5, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Geological Survey of Northern Ireland (2022). Groundwater temperature profiles of Superficial, Dalradian and Carboniferous Limestones in the Derg and Blackwater Catchments [Dataset]. https://gsni-data.bgs.ac.uk/geonetwork/srv/api/records/2aae1abc-0126-46e7-ac8e-0e35d828d455
Dataset updated
Jan 5, 2022
Dataset provided by
British Geological Survey (https://www.bgs.ac.uk/)
License
https://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Time period covered
Nov 10, 2020 - Aug 17, 2022
Description
These data present monthly temperature profiles for nine boreholes located within the Derg (six) and Blackwater (three) catchments and installed in three aquifers (Superficial (five), Dalradian (three) and Carboniferous Limestone (one)). The data were collected between 10 November 2020 and 17 August 2022, at hourly intervals, by a Hobo level logger.

The spreadsheet presents data on the monitoring boreholes, including Easting, Northing, Elevation, Groundwater Data Repository (GDR) number and the depth of the top and bottom of the screened section. Time series temperature data are available for each borehole at one-hour intervals.

The data has been summarized and box and whisker plots of monthly temperatures for each individual borehole have been produced. Line plots have also been produced showing the average monthly temperature at each borehole grouped by aquifer (Superficial, Dalradian and Carboniferous Limestone).
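The monthly summarising step described above can be sketched as follows. This is a minimal illustration in Python, assuming each borehole's record is a list of (timestamp, temperature) pairs; the function name `monthly_means` and the sample readings are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def monthly_means(readings):
    """Group (timestamp, temperature) pairs by calendar month and average them."""
    by_month = defaultdict(list)
    for timestamp, temperature in readings:
        by_month[(timestamp.year, timestamp.month)].append(temperature)
    # Sort by (year, month) so the summary reads chronologically.
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up hourly readings for one hypothetical borehole (not dataset values).
start = datetime(2020, 11, 30, 22)
readings = [(start + timedelta(hours=i), 10.0 + 0.5 * i) for i in range(4)]

summary = monthly_means(readings)
for (year, month), value in summary.items():
    print(f"{year}-{month:02d}: {value:.2f} degC")
```

Running the same function once per borehole, and then grouping the results by aquifer, would give the kind of per-aquifer monthly averages that the line plots show.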

These boreholes were installed as part of the CatchmentCARE project.
Integrating Satellite and Sensor Measurements to Understand Urban Air...
catalog.data.gov
Updated Jul 18, 2025
U.S. EPA Office of Research and Development (ORD) (2025). Integrating Satellite and Sensor Measurements to Understand Urban Air Quality: A Case Study of PM2.5 in Asunción, Paraguay [Dataset]. https://catalog.data.gov/dataset/integrating-satellite-and-sensor-measurements-to-understand-urban-air-quality-a-case-study
Dataset updated
Jul 18, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Paraguay, Asunción
Description
The data describe satellite-derived monthly averaged PM2.5 concentrations for South America and Asunción during February 2020; box-and-whisker plots showing the distributions of monthly mean PM2.5 values and 24-hr PM2.5 mean values for each ground-level sensor; a line graph showing hourly PM2.5 averages for each ground-level sensor; and sensor-measured daily PM2.5 means for ItaEnramada and Villamora during August 2020 (fire season) and March 2021 (rainy season). This dataset is not publicly accessible because it is owned by the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology. It can be accessed by contacting John Kerekes at kerekes@cis.rit.edu. Format: medium-sized files; no special equipment needed. This dataset is associated with the following publication: Baldauf, R., L. Prox, J. Kerekes, Y. Zhou, G. Pallarolas, and M. Lang. Integrating satellite and sensor measurements to understand urban air quality: case study of PM2.5 in Asunción, Paraguay. EM: AIR AND WASTE MANAGEMENT ASSOCIATION'S MAGAZINE FOR ENVIRONMENTAL MANAGERS. Air & Waste Management Association, Pittsburgh, PA, USA, NA, (2023).
Data from: SUPPLEMENTARY MATERIALS for Mitochondrial DNA genome variation in...
researchdata.se
Updated Jun 24, 2024
Kimberly Sturk-Andreaggi (2024). SUPPLEMENTARY MATERIALS for Mitochondrial DNA genome variation in the Swedish population [Dataset]. http://doi.org/10.57804/mfyp-ea25
Unique identifier
https://doi.org/10.57804/mfyp-ea25
Dataset updated
Jun 24, 2024
Dataset provided by
Uppsala University
Authors
Kimberly Sturk-Andreaggi
Description
The data consists of:

Table S1. The haplogroup breakdown for the 934 SweGen haplotypes included in the final mitochondrial genome dataset.

Figure S1. Graphical description of the average read depths observed in the SweGen dataset.

Figure S2. The distribution of read depth for the 16,569 positions of the mitochondrial genome based on the average observed in a subset of 100 representative SweGen haplotypes.

Figure S3. The box-and-whisker plot presents the distribution of average variant frequency for each coverage classification group.

The dataset was originally published in DiVA and moved to SND in 2024.
Data from: Geochemical data analysis system (GDA): reference manual
datadiscoverystudio.org
pdf v.unknown
Updated Jan 1, 1992
Sheraton, J.W. (1992). Geochemical data analysis system (GDA): reference manual [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/42101226e2a8408fa57251f278303b83/html
Dataset updated
Jan 1, 1992
Authors
Sheraton, J.W.
Description
GDA (Geochemical Data Analysis) is a comprehensive IBM PC-based geochemical data processing system. It is designed to use whole-rock geochemical data retrieved from the ORACLE database, but can be adapted for other databases, or data can be entered into files from the keyboard. The programs are written in FORTRAN 77 (Microsoft compiler) and use the MicroGlyph Systems SciPlot graphics package for plotting. The system includes facilities for generating plots (histograms, XY plots, triangular plots, spidergrams, box-whisker plots, etc.), calculating statistical functions (e.g., mean, standard deviation, regression lines, correlation coefficients and cluster analysis) and CIPW norms, printing tables, and carrying out petrogenetic modelling calculations. Plots can be displayed on a PC screen for inspection and editing before being output to a plotter or other device. Other programs allow samples to be assigned to groups for plotting purposes, and allow editing and merging of datafiles.
experimental data (raw data, processed data)
zenodo.org
zip
Updated Feb 10, 2025
Christian Leonardo Camacho Villalón (2025). experimental data (raw data, processed data) [Dataset]. http://doi.org/10.5281/zenodo.14810882
Unique identifier
https://doi.org/10.5281/zenodo.14810882
Dataset updated
Feb 10, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Christian Leonardo Camacho Villalón
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
###############################################################################
### Source code and problem instances:
###############################################################################
- The source code of METAFOR will be made available on github once the paper has been accepted for publication. In the meantime, the code is provided in METAFOR.zip
- The list of problem instances (single-objective continuous functions) is provided in the attached file "instances.zip"
###############################################################################

###############################################################################
### Experiment folders:
###############################################################################
The folders "default", "leave25OUT", "leave25OUTCEC14", "leaveLDO" and "leaveLDOCEC14" (compressed in zip format to save space) contain all the data collected during the experiment. Each of them contains the following folders/files:
- "candidates/candidates.txt" -- a text file with the algorithms specified as command lines,
- "candidates/OUTPUT" -- a folder containing a text file with the best solutions found by each algorithm for each problem instance. The names of the files in the folder are composed of the test suite "c_<0,1,2,3>", the number of the function in the test suite "f_<0,...,n>", and the number of dimensions "d_<50,100,500,750,...,d>".
- "DataAndPlots" -- a folder created automatically by the "plot_bxp_rtd_wlx.sh" processing script (see below).
Inside this folder are the following subfolders:
- "DataAndPlots/Bxp" -- stores the box plots created based on the data in "DataAndPlots/Data";
- "DataAndPlots/Cvg" -- (if any) stores the convergence plots generated based on the data in "OUTPUT_processed";
- "DataAndPlots/Data" -- stores the processed data and statistical information (median, median error, statistical test, etc.) of the raw data stored in "candidates/OUTPUT";
- "DataAndPlots/Time" -- stores the average time taken by the algorithms on each problem instance.
- "DataAndPlots/Table" -- (if any) stores, in plain text and pseudo-LaTeX format, the tables of results reported in the paper, i.e., median, median error, median absolute deviation, rankings, statistical tests, and number of wins.
- "OUTPUT" -- stores the convergence data of the algorithms (i.e. function evaluations vs. solution quality).
In the latest version of METAFOR, each convergence file consists of 100 points; however, in the version we used for the experiments reported in the paper, each file consists of thousands of points per algorithm, making this folder particularly heavy.
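The naming scheme described above for the files in "candidates/OUTPUT" (test suite "c_", function number "f_", dimensions "d_") can be unpacked with a small parser. The sketch below is in Python and rests on assumptions: the description does not say how the three parts are joined, so the example file name is hypothetical and each part is searched for independently.

```python
import re

# One pattern per name component; each is searched for independently because
# the separator between components is not specified in the description.
FIELD_PATTERNS = {
    "suite": re.compile(r"c_(\d+)"),
    "function": re.compile(r"f_(\d+)"),
    "dimensions": re.compile(r"d_(\d+)"),
}


def parse_output_name(name):
    """Extract the suite, function and dimension numbers from a file name."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(name)
        if match is None:
            raise ValueError(f"no {key} component found in {name!r}")
        fields[key] = int(match.group(1))
    return fields


# Hypothetical file name; the real layout may differ.
parsed = parse_output_name("c_1-f_12-d_500.txt")
print(parsed)
```

Raising on a missing component, rather than returning a partial result, makes files that do not follow the scheme easy to spot.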

The data in folder "candidates/OUTPUT" and "OUTPUT" is gathered via the script "runMe.sh" indicating an experiment folder and instances file, namely:
- in the case of "default", we solved instances "test_MIXTURE_max200.txt" and "test_MIXTURE_onlyLargeScale.txt";
- in the case of "leave25OUT", we solved instances "test_MIXTURE_max200.txt", "test_MIXTURE_onlyLargeScale.txt" and "test_MIXTURE_onlyLargeScaleDND.txt";
- in the case of "leave25OUTCEC14", we solved instances "test_CEC14.txt";
- in the case of "leaveLDO", we solved instances "test_MIXTURE_max200.txt", "test_MIXTURE_onlyLargeScale.txt" and "test_MIXTURE_onlyLargeScaleDND.txt";
- in the case of "leaveLDOCEC14", we solved instances "test_CEC14.txt".
###############################################################################

###############################################################################
### Processing scripts:
###############################################################################
The scripts folder (scripts.zip) contains the main processing script "plot_bxp_rtd_wlx.sh" and several auxiliary R and shell scripts: "boxplot.R", "cvg_log.R", "wilcoxon.R", "ranksPerClass.R", "filter_repeating.sh", "full_outer_join.sh", and "replace_na.sh". All the auxiliary scripts are automatically called by the main script, depending on the options specified by the user, and are intended to be used standalone. The auxiliary R scripts are used to generate the box plots ("boxplot.R") and convergence plots ("cvg_log.R"), to perform the statistical test ("wilcoxon.R"), and to compute the rankings ("ranksPerClass.R"). The auxiliary shell scripts are used to clean the raw data stored in "OUTPUT" and create a file called data-*-mean.txt for each data file, which can be passed to "cvg_log.R" to generate the convergence plots.
###############################################################################

###############################################################################
### Folders with experiments:
###############################################################################
Since the paper reports different sets of algorithms solving different sets of problems, we created a folder (compressed in zip format for space reasons) for each of them and put in it only the specific data that we want to analyze and plot. The data inside these folders is simply copied from the main experiment folders, and is organized as follows:
- "METAFOR/exp1_dftVStuned" contains the results discussed in section 5.3.1 of the paper.
- "METAFOR/exp2_mtfVSHyb" contains the results discussed in section 5.3.2 of the paper.
- "METAFOR/exp3_CEC14" and "METAFOR/exp4_LS" contain the results discussed in section 5.3.3 of the paper.
###############################################################################

Electric Vehicle Usage and Charging Analysis Dataset Across Seven Major...

zenodo.org

bin, csv

Updated Nov 6, 2024

Weipeng Zhan; Yuan Liao; Junjun Deng; Zhenpo Wang; Sonia Yeh (2024). Electric Vehicle Usage and Charging Analysis Dataset Across Seven Major Cities in China [Dataset]. http://doi.org/10.5281/zenodo.13852045

Unique identifier

https://doi.org/10.5281/zenodo.13852045

Dataset updated

Nov 6, 2024

Dataset provided by

Zenodo

Authors

Weipeng Zhan; Yuan Liao; Junjun Deng; Zhenpo Wang; Sonia Yeh

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

China

Description

Background

This dataset provides supporting data for the figures presented in our study on electric vehicle (EV) usage and charging behavior across major Chinese cities. The detailed analysis and raw data are thoroughly described in Zhan et al (2025). The study examines 1.69 million EVs, representing 42% of China's total EV fleet, from November 2020 to October 2021. The study provides insights into operational demands, infrastructure requirements, and energy consumption patterns by analyzing diverse vehicle types—including private cars, taxis, buses, and special purpose vehicles (SPVs).

The purpose of this dataset is to enable researchers who do not have access to the same raw data to replicate, calibrate, or extend our findings using the processed data that underpins each figure. This resource is valuable for further research on EV infrastructure planning, energy consumption, and vehicle performance. This dataset is made available to help the research community leverage our findings and facilitate advancements in electric vehicle research and infrastructure planning. Please refer to Zhan et al (2025) for full details on the methodology and analysis.

Data description

This dataset includes the processed data underlying each figure in Zhan et al (2025), covering various aspects of EV usage, battery capacity, and charging behavior across seven major Chinese cities: Beijing, Shanghai, Guangzhou, Shenzhen, Nanjing, Chengdu, and Chongqing. The dataset is organized to correspond directly with the figures in the paper, facilitating its use for further analysis and model calibration. Each dataset is aligned with specific figures, providing essential data to help researchers without access to the original raw data.

1. EV Type and Battery Energy Distribution Across Cities

Fig1a.Distribution of EV types across selected Chinese cities

File: Fig1a.Distribution of EV types across selected Chinese cities.csv

Description: Distribution of EV types across seven cities, detailing the share of different vehicle types.

Column	Description	Data type	Unit
Beijing	Distribution of EV types in Beijing	Float	%
Shenzhen	Distribution of EV types in Shenzhen	Float	%
Shanghai	Distribution of EV types in Shanghai	Float	%
Guangzhou	Distribution of EV types in Guangzhou	Float	%
Chengdu	Distribution of EV types in Chengdu	Float	%
Chongqing	Distribution of EV types in Chongqing	Float	%
Nanjing	Distribution of EV types in Nanjing	Float	%

Fig1b.Distribution of battery energy by vehicle types

File: Fig1b.Distribution of battery energy by vehicle types.csv

Description: Distribution of battery energy across different vehicle types, represented as box plot statistics.

Column	Description	Data type	Unit
type_2	vehicle types	String	-
Lower Whisker	The battery energy corresponding to the Lower Whisker of the box plot.	Float	kWh
Q1 (25%)	The 25th percentile value of battery energy.	Float	kWh
Median (50%)	The median value of battery energy.	Float	kWh
Q3 (75%)	The 75th percentile value of battery energy.	Float	kWh
Upper Whisker	The battery energy corresponding to the Upper Whisker of the box plot.	Float	kWh
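The five columns in this table are the usual box plot statistics. As a minimal sketch of how such values could be derived from raw battery energies, the Python example below assumes the common Tukey convention (whiskers at the most extreme points within 1.5 times the interquartile range of the box). The dataset description does not state which whisker definition the authors used, and the input values are made up for illustration.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return box plot statistics keyed by the column names used in the CSV."""
    data = sorted(values)
    # "inclusive" treats the data as the full population when interpolating.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr   # assumption: Tukey 1.5 x IQR fences
    high_fence = q3 + 1.5 * iqr
    return {
        "Lower Whisker": min(v for v in data if v >= low_fence),
        "Q1 (25%)": q1,
        "Median (50%)": median,
        "Q3 (75%)": q3,
        "Upper Whisker": max(v for v in data if v <= high_fence),
    }


# Made-up battery energies in kWh for one hypothetical vehicle type.
stats = box_plot_stats([20, 30, 40, 50, 60, 70, 80, 90, 300])
for name, value in stats.items():
    print(f"{name}: {value} kWh")
```

Under this convention the value 300 lies beyond the upper fence, so it would be drawn as an outlier and the upper whisker stops at the largest remaining value.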

2. Variations in Battery Energy

Fig1c.Variations of battery energy of buses

File: Fig1c.Variations of battery energy of buses across studied cities.csv

Description: Battery energy variations for buses across the studied cities.

Column	Description	Data type	Unit
city_En	English name of the city (one of the 7 Chinese cities studied)	String	-
Lower Whisker	The battery energy of buses corresponding to the Lower Whisker of the box plot.	Float	kWh
Q1 (25%)	The 25th percentile value of battery energy of buses.	Float	kWh
Median (50%)	The median value of battery energy of buses.	Float	kWh
Q3 (75%)	The 75th percentile value of battery energy of buses.	Float	kWh
Upper Whisker	The battery energy of buses corresponding to the Upper Whisker of the box plot.	Float	kWh

Fig1d.Variations of battery energy of SPVs

File: Fig1c.Variations of battery energy of SPVs across studied cities.csv

Description: Battery energy variations for special purpose vehicles (SPVs) across cities.

Column	Description	Data type	Unit
city_En	English name of the city (one of the 7 Chinese cities studied)	String	-
Lower Whisker	The battery energy of SPVs corresponding to the Lower Whisker of the box plot.	Float	kWh
Q1 (25%)	The 25th

The results of the six methods on HS dataset.
plos.figshare.com
xls
Updated May 30, 2023
Der-Chiang Li; Susan C. Hu; Liang-Sian Lin; Chun-Wu Yeh (2023). The results of the six methods on HS dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0181853.t008
Unique identifier
https://doi.org/10.1371/journal.pone.0181853.t008
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Der-Chiang Li; Susan C. Hu; Liang-Sian Lin; Chun-Wu Yeh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The results of the six methods on HS dataset.
SUPPLEMENTARY MATERIALS for variation in the mitochondrial DNA genome in the Swedish...
data.europa.eu
unknown
Updated Jan 30, 2025
Uppsala universitet (2025). SUPPLEMENTARY MATERIALS for variation in the mitochondrial DNA genome in the Swedish population [Dataset]. https://data.europa.eu/data/datasets/https-doi-org-10-57804-mfyp-ea25?locale=da
Dataset updated
Jan 30, 2025
Dataset authored and provided by
Uppsala universitet
Description
The data consist of:

Table S1. The haplogroup breakdown for the 934 SweGen haplotypes included in the final mitochondrial genome dataset.

Figure S1. Graphical description of the average read depths observed in the SweGen dataset.

Figure S2. The distribution of read depth for the 16,569 positions of the mitochondrial genome, based on the average observed in a subset of 100 representative SweGen haplotypes.

Figure S3. The box-and-whisker plot presents the distribution of average variant frequency for each coverage classification group.

The dataset was originally published in DiVA and moved to SND in 2024.
The results of the six methods on VC dataset.
plos.figshare.com
xls
Updated May 31, 2023
Der-Chiang Li; Susan C. Hu; Liang-Sian Lin; Chun-Wu Yeh (2023). The results of the six methods on VC dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0181853.t007
Unique identifier
https://doi.org/10.1371/journal.pone.0181853.t007
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Der-Chiang Li; Susan C. Hu; Liang-Sian Lin; Chun-Wu Yeh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The results of the six methods on VC dataset.
The results of four classifiers for the WDBC, PD, VC, and HS data set.
plos.figshare.com
xls
Updated Jun 1, 2023
Der-Chiang Li; Susan C. Hu; Liang-Sian Lin; Chun-Wu Yeh (2023). The results of four classifiers for the WDBC, PD, VC, and HS data set. [Dataset]. http://doi.org/10.1371/journal.pone.0181853.t003
Unique identifier
https://doi.org/10.1371/journal.pone.0181853.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Der-Chiang Li; Susan C. Hu; Liang-Sian Lin; Chun-Wu Yeh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The results of four classifiers for the WDBC, PD, VC, and HS data set.

Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1

Petre_Slide_CategoricalScatterplotFigShare.pptx

Unique identifier

https://doi.org/10.6084/m9.figshare.3840102.v1

Dataset updated

Sep 19, 2016

Dataset provided by

figshare

Authors

Benj Petre; Aurore Coince; Sophien Kamoun

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

# 7 Display the graph in a separate window. Dot colors indicate replicates

graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
