32 datasets found
  1. Petre_Slide_CategoricalScatterplotFigShare.pptx

    • figshare.com
    pptx
    Updated Sep 19, 2016
    Cite
    Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
    Available download formats: pptx
    Dataset updated
    Sep 19, 2016
    Dataset provided by
    figshare
    Authors
    Benj Petre; Aurore Coince; Sophien Kamoun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorical scatterplots with R for biologists: a step-by-step guide

    Benjamin Petre (1), Aurore Coince (2), Sophien Kamoun (1)

    (1) The Sainsbury Laboratory, Norwich, UK; (2) Earlham Institute, Norwich, UK

    Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

    Protocol

    • Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column, ‘Replicate’, indicates the biological replicates; in the example, it gives the month and year in which each replicate was performed. The second column, ‘Condition’, indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column, ‘Value’, contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

    • Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide, paste it into the R console, and execute it. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

    • Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
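The three steps above can be sketched in R roughly as follows. This is not the authors' script, which is on the PowerPoint slide and is not reproduced here. It is a minimal sketch that assumes the three column names from Step 1, uses made-up example values, and writes to a temporary file instead of opening a file dialog. The definition of `graph` and the `ggsave()` call are assumptions added for illustration; the plotting layers follow the command given in Note 2, without the log scale.

```r
# Step 1 (illustrative): build a small data set in the three-column layout.
# All values below are made up for the example.
example <- data.frame(
  Replicate = rep(c("Jan2016", "Feb2016", "Mar2016"), each = 3),
  Condition = rep(c("WT", "MutantA", "MutantB"), times = 3),
  Value     = c(10.2, 4.1, 7.9, 11.5, 3.8, 8.4, 9.7, 4.6, 8.1)
)
csv_path <- file.path(tempdir(), "example_input.csv")
write.csv(example, csv_path, row.names = FALSE)

# Step 2: import the .csv file. To pick the file interactively instead,
# read.csv(file.choose()) is one option (an assumption, not the authors' code).
data <- read.csv(csv_path, stringsAsFactors = TRUE)
stopifnot(all(c("Replicate", "Condition", "Value") %in% names(data)))

# Plot only if ggplot2 is installed (see Note 1).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Assumed base object: conditions on the x-axis, values on the y-axis.
  graph <- ggplot(data, aes(x = Condition, y = Value))
  p <- graph +
    geom_boxplot(outlier.colour = "black", colour = "black") +
    geom_jitter(aes(col = Replicate)) +
    theme_bw()

  # Step 3: save the graph as a .pdf file without using the window menu.
  ggsave(file.path(tempdir(), "categorical_scatterplot.pdf"),
         plot = p, width = 6, height = 4)
}
```

Saving with `ggsave()` rather than File -> Save as gives the same page size on every run, which can help when several figures need to match.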

    Notes

    • Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search space, and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

    • Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

    #7 Display the graph in a separate window. Dot colors indicate replicates
    graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

    References

    Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

    Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

    Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

    https://cran.r-project.org/

    http://ggplot2.org/

  2. Comparison experiments by using IF.

    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Gen Li; Jason J. Jung (2023). Comparison experiments by using IF. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gen Li; Jason J. Jung
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison experiments by using IF.

  3. Performance of DynGPE.

    • plos.figshare.com
    xls
    Updated Jun 11, 2023
    Cite
    Gen Li; Jason J. Jung (2023). Performance of DynGPE. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t002
    Available download formats: xls
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gen Li; Jason J. Jung
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of DynGPE.

  4. GOPI Resource - Stacked Column Chart - Change in Jobs in Maryland by Month...

    • data.wu.ac.at
    csv, json, xml
    Updated Apr 27, 2017
    Cite
    U.S. Bureau of Labor Statistics (2017). GOPI Resource - Stacked Column Chart - Change in Jobs in Maryland by Month (with Feb and March 2010 outliers filtered out) [Dataset]. https://data.wu.ac.at/schema/data_maryland_gov/NWk4aS1ieDU2
    Available download formats: xml, csv, json
    Dataset updated
    Apr 27, 2017
    Dataset provided by
    Bureau of Labor Statistics, http://www.bls.gov/
    Area covered
    Maryland
    Description

    This dataset represents the CHANGE in the number of jobs per industry category and sub-category from the previous month, not the raw counts of actual jobs. The data behind these monthly change values is from the Bureau of Labor Statistics (BLS) Current Employment Statistics (CES) program. CES data represents businesses and government agencies, providing detailed industry data on employment on nonfarm payrolls.

  5. Data Visualization Cheat sheets and Resources

    • kaggle.com
    zip
    Updated Feb 20, 2021
    Cite
    Kash (2021). Data Visualization Cheat sheets and Resources [Dataset]. https://www.kaggle.com/kaushiksuresh147/data-visualization-cheat-cheats-and-resources
    Available download formats: zip (133638507 bytes)
    Dataset updated
    Feb 20, 2021
    Authors
    Kash
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Data Visualization Corpus


    Data Visualization

    Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

    In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.

    The Data Visualization Corpus

    The Data Visualization corpus consists of:

    • 32 cheat sheets: These cover visualization techniques and tricks from A to Z, Python and R visualization cheat sheets, types of charts and their significance, storytelling with data, etc.

    • 32 charts: The corpus also includes information on a wide range of data visualization charts, along with Python code, d3.js code, and presentations that explain each chart clearly.

    • Some recommended books on data visualization that every data scientist should read:

      1. Beautiful Visualization by Julie Steele and Noah Iliinsky
      2. Information Dashboard Design by Stephen Few
      3. Knowledge is Beautiful by David McCandless (Short abstract)
      4. The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo
      5. The Visual Display of Quantitative Information by Edward R. Tufte
      6. Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic
      7. Research paper - Cheat Sheets for Data Visualization Techniques by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach

    Suggestions:

    If you find any books, cheat sheets, or charts missing, or would like to suggest new documents, please let me know in the discussion section!

    Resources:

    Request to Kaggle users:

    • Please create notebooks on the visualization charts that interest you, using a dataset of your own choice, as many beginners and experts could find them useful!

    • Please also create interactive EDA notebooks that combine animation with data visualization charts, to show how to tackle data and extract insights from it.

    Suggestions and queries:

    Feel free to use the discussion platform of this dataset to ask questions related to the data visualization corpus and data visualization techniques.

    Kindly upvote the dataset if you find it useful or wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!

  6. Data from: Robust Multivariate Functional Control Chart

    • tandf.figshare.com
    pdf
    Updated Oct 9, 2024
    Cite
    Christian Capezza; Fabio Centofanti; Antonio Lepore; Biagio Palumbo (2024). Robust Multivariate Functional Control Chart [Dataset]. http://doi.org/10.6084/m9.figshare.25365672.v1
    Available download formats: pdf
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Christian Capezza; Fabio Centofanti; Antonio Lepore; Biagio Palumbo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In modern Industry 4.0 applications, a huge amount of data is acquired during manufacturing processes and is often contaminated with outliers, which can seriously reduce the performance of control charting procedures, especially in complex and high-dimensional settings. In the context of profile monitoring, we propose a new framework that is referred to as robust multivariate functional control chart (RoMFCC) to monitor a multivariate functional quality characteristic while being robust to both functional casewise and componentwise outliers. In the former case, observations of the quality characteristic are contaminated in all functional variables or components, while, in the latter, the contamination affects one or more components independently. The RoMFCC relies on (I) a functional filter to identify componentwise outliers to be replaced by missing components; (II) a robust multivariate functional data imputation method; (III) a casewise robust dimensionality reduction; (IV) a monitoring strategy for the quality characteristic. Through a Monte Carlo simulation study, the RoMFCC is compared with competing schemes that have already appeared in the literature. A case study is finally presented where the proposed framework is used to monitor a resistance spot welding process in the automotive industry. RoMFCC is implemented in the R package funcharts, available online on CRAN.

  7. Superstore Sales Analysis

    • kaggle.com
    Updated Oct 21, 2023
    Cite
    Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis/code
    Dataset updated
    Oct 21, 2023
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Ali Reda Elblgihy
    Description

    Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

    1- Data Import and Transformation:

    • Gather and import relevant sales data from various sources into Excel.
    • Utilize Power Query to clean, transform, and structure the data for analysis.
    • Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

    2- Data Quality Assessment:

    • Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.
    • Standardize data formats and ensure that all data is in a consistent, usable state.

    3- Calculating COGS:

    • Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.
    • Apply appropriate formulas and calculations to determine COGS accurately.

    4- Discount Analysis:

    • Analyze the discount values offered on products to understand their impact on sales and profitability.
    • Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

    5- Sales Metrics:

    • Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.
    • Utilize Excel functions to compute these metrics and create visuals for better insights.

    6- Visualization:

    • Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.
    • Visual representations can help identify trends, outliers, and patterns in the data.

    7- Report Generation:

    • Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

    Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.

  8. Data from: Spatio-Temporal Graph Neural Network for Urban Spaces:...

    • zenodo.org
    bin
    Updated May 7, 2025
    Cite
    Silke Kirstin Kaiser (2025). Data from: Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume [Dataset]. http://doi.org/10.5281/zenodo.15332147
    Available download formats: bin
    Dataset updated
    May 7, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Silke Kirstin Kaiser
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Urban Traffic Volume Dataset – Berlin (Strava) & New York City (Taxi)

    Associated paper: Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume
    Authors: Silke K. Kaiser, Filipe Rodrigues, Carlos Lima Azevedo, Lynn H. Kaack

    Citation Request

    If you use this dataset, please cite our paper:

    Kaiser, S. K., Rodrigues, F., Azevedo, C. L., & Kaack, L. H. (2025). Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume. [published on arXiv].

    Dataset Overview

    This dataset includes street-level traffic volume data for two major urban areas:

    • Berlin (Strava Cycling Data): Daily bicycle traffic volumes from 2019–2023, aggregated from publicly shared Strava user data.

    • New York City (Taxi Data): Hourly motorized traffic volumes from Manhattan for January–February 2016, derived from GPS trajectories of yellow taxis.

    Both datasets are provided at the street-segment level and come with rich auxiliary features capturing spatial, temporal, infrastructure, and contextual information.

    Each city includes:

    • :
      Full feature table for each street segment, including traffic volume and auxiliary features.

    • :
      Geometry for each street segment.

    • :
      Binary adjacency matrix.

    • :
      Adjacency matrix weighted by node feature similarity.

    • :
      Adjacency matrix based on Euclidean (bird’s-eye) distance.

    • :
      Adjacency matrix based on real-world road network distance.

    • :
      Adjacency matrix weighted by estimated travel time over the road network.
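To make two of the matrix types listed above concrete, here is a minimal R sketch of a binary adjacency matrix and a pairwise Euclidean distance matrix for a toy network. The segment identifiers, coordinates, and edge list are hypothetical, and the dataset's own construction (including how distances are turned into adjacency weights) is not described here, so this should be read only as an illustration of the two concepts.

```r
# Hypothetical toy network: four street segments, each represented by the
# coordinates of its midpoint (in metres).
segments <- data.frame(
  id = c("s1", "s2", "s3", "s4"),
  x  = c(0, 100, 100, 200),
  y  = c(0, 0, 100, 100),
  stringsAsFactors = FALSE
)

# Hypothetical list of segment pairs that share an intersection.
edges <- data.frame(
  from = c("s1", "s2", "s3"),
  to   = c("s2", "s3", "s4"),
  stringsAsFactors = FALSE
)

# Binary adjacency matrix: 1 where two segments are connected, 0 otherwise.
n <- nrow(segments)
adj_binary <- matrix(0L, n, n, dimnames = list(segments$id, segments$id))
adj_binary[cbind(edges$from, edges$to)] <- 1L
adj_binary[cbind(edges$to, edges$from)] <- 1L  # undirected: mirror each edge

# Pairwise Euclidean (bird's-eye) distances between segment midpoints.
dist_euclid <- as.matrix(dist(segments[, c("x", "y")]))
dimnames(dist_euclid) <- list(segments$id, segments$id)

print(adj_binary)
print(round(dist_euclid, 1))
```

Network-distance and travel-time matrices would need the road graph itself (for example, shortest paths over edge lengths or travel times), which this toy example does not attempt.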

    Key Features and Methodology

    • Volume Estimation: Strava volumes are rounded aggregates of bike trips; NYC volumes are computed from reconstructed taxi trajectories.

    • Filtering: Extreme outliers (e.g., from special events) are filtered per segment to focus on typical traffic conditions.

    • Auxiliary Features:

      • Built environment (e.g., speed limits, road types, lane counts)

      • Points of Interest (e.g., shops, schools, transit stops)

      • Network connectivity metrics (degree, betweenness, etc.)

      • Temporal indicators (weekday, holidays, hour, month)

      • Weather data (sunshine, precipitation, temperature)

      • Socioeconomic indicators (Berlin only)

      • Proxy motorized traffic metrics (Berlin only)

    See the paper for a complete list of features and detailed methodology.

    -------------------

    We are grateful to the European Union’s Horizon Europe research and innovation program, which funded this project under Grant Agreement No 101057131, Climate Action To Advance HeaLthY Societies in Europe (CATALYSE).

  9. Summary of the BayeScan results for FST outliers.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). Summary of the BayeScan results for FST outliers. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Values in parentheses are 95% credible intervals. Results are listed for a range of prior weights on the null model.

  10. S1 Data -

    • plos.figshare.com
    zip
    Updated Mar 3, 2025
    Cite
    Wei Liu; Qian Ning; Guangwei Liu; Haonan Wang; Yixin Zhu; Miao Zhong (2025). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0318431.s001
    Available download formats: zip
    Dataset updated
    Mar 3, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Wei Liu; Qian Ning; Guangwei Liu; Haonan Wang; Yixin Zhu; Miao Zhong
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Traditional subspace feature selection methods typically rely on a fixed distance to compute residuals between the original and feature reconstruction spaces. However, this approach struggles to adapt to diverse datasets and often fails to handle noise and outliers effectively. In this paper, we propose an unsupervised feature selection method named unsupervised feature selection algorithm based on -norm feature reconstruction (NFRFS). Employing a flexible norm to represent both the original space and the spatial distance of feature reconstruction, enhances adaptability and broadens its applicability by adjusting p. Additionally, adaptive graph learning is integrated into the feature selection process to preserve the local geometric structure of the data. Features exhibiting sparsity and low redundancy are selected through the regularization constraint of the inner product in the feature selection matrix. To demonstrate the effectiveness of the method, numerical studies were conducted on 14 benchmark datasets. Our results indicate that the method outperforms 10 unsupervised feature selection algorithms in terms of clustering performance.

  11. Single-Locus versus Multilocus Patterns of Local Adaptation to Climate in...

    • plos.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). Single-Locus versus Multilocus Patterns of Local Adaptation to Climate in Eastern White Pine (Pinus strobus, Pinaceae) [Dataset]. http://doi.org/10.1371/journal.pone.0158691
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Natural plant populations are often adapted to their local climate and environmental conditions, and populations of forest trees offer some of the best examples of this pattern. However, little empirical work has focused on the relative contribution of single-locus versus multilocus effects to the genetic architecture of local adaptation in plants/forest trees. Here, we employ eastern white pine (Pinus strobus) to test the hypothesis that it is the inter-genic effects that primarily drive climate-induced local adaptation. The genetic structure of 29 range-wide natural populations of eastern white pine was determined in relation to local climatic factors using both a reference set of SSR markers, and SNPs located in candidate genes putatively involved in adaptive response to climate. Comparisons were made between marker sets using standard single-locus outlier analysis, single-locus and multilocus environment association analyses and a novel implementation of Population Graphs. Magnitudes of population structure were similar between the two marker sets. Outlier loci consistent with diversifying selection were rare for both SNPs and SSRs. However, genetic distances based on the multilocus among population covariances (cGD) were significantly more correlated to climate, even after correcting for spatial effects, for SNPs as compared to SSRs. Coalescent simulations confirmed that the differences in mutation rates between SSRs and SNPs did not affect the topologies of the Population Graphs, and hence values of cGD and their correlations with associated climate variables. We conclude that the multilocus covariances among populations primarily reflect adaptation to local climate and environment in eastern white pine. This result highlights the complexity of the genetic architecture of adaptive traits, as well as the need to consider multilocus effects in studies of local adaptation.

  12. The crcc T2 Revised statistics.

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Hong Choon Ong; Ekele Alih (2023). The crcc T2 Revised statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t015
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Choon Ong; Ekele Alih
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The crcc T2 Revised statistics.

  13. Addition-point OLS matrix, B.

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Hong Choon Ong; Ekele Alih (2023). Addition-point OLS matrix, B. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t009
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Choon Ong; Ekele Alih
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Addition-point OLS matrix, B.

  14. Augmented Projector.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Hong Choon Ong; Ekele Alih (2023). Augmented Projector. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t008
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Choon Ong; Ekele Alih
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Augmented Projector.

  15. RDA and pRDA results by molecular marker type reveal differential effects of...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). RDA and pRDA results by molecular marker type reveal differential effects of climate and geography across marker types. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t005
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS, http://plos.org/
    Authors
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bolded values are those with P-values < 0.05.

  16. MRM analyses using cGD as the response matrix reveal that the SNP data are...

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). MRM analyses using cGD as the response matrix reveal that the SNP data are overly correlated to climate and climate conditional on geography. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t006
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Differences in R2 values between marker types are significant using a permutation approach.

  17. The Pulp-fibre Dataset.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Hong Choon Ong; Ekele Alih (2023). The Pulp-fibre Dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t012
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Choon Ong; Ekele Alih
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Pulp-fibre Dataset.

  18. Similarity Matrix s_ij = (b_i − b_j)′ (Σ̂(B))⁻¹ (b_i − b_j).

    • figshare.com
    xls
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hong Choon Ong; Ekele Alih (2023). Similarity Matrix s_ij = (b_i − b_j)′ (Σ̂(B))⁻¹ (b_i − b_j). [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t010
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Choon Ong; Ekele Alih
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Similarity Matrix s_ij = (b_i − b_j)′ (Σ̂(B))⁻¹ (b_i − b_j).

  19. A summary of SNPs, candidate genes and their biological functions from...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). A summary of SNPs, candidate genes and their biological functions from functional analysis of homologues in model plant Arabidopsis or other plants. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t002
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The details on these candidate genes, including EST loci, GenBank and TreeGenes database ID, and references for the identification of biological functions are provided in S2 Table.

  20. The crcc, RMVE, RMCD, and classical Hotelling’s T2 statistics.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Hong Choon Ong; Ekele Alih (2023). The crcc, RMVE, RMCD, and classical Hotelling’s T2 statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t011
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong Choon Ong; Ekele Alih
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The crcc, RMVE, RMCD, and classical Hotelling’s T² statistics.

Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1

Petre_Slide_CategoricalScatterplotFigShare.pptx

Explore at:
pptx Available download formats
Dataset updated
Sep 19, 2016
Dataset provided by
figshare
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre¹, Aurore Coince², Sophien Kamoun¹

¹ The Sainsbury Laboratory, Norwich, UK; ² Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
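
The three steps above can be sketched as a short R script. This is a minimal sketch under stated assumptions, not the authors' script (which is only shown on the slide): the file names, the replicate labels, and the simulated values are hypothetical; only the column names and the ggplot2 layers are taken from the protocol and from Note 2. The plotting part runs only if ggplot2 is installed.

```r
# Hypothetical example data in the three-column layout of Step 1.
set.seed(1)  # make the simulated values reproducible
dat <- data.frame(
  Replicate = rep(c("Jan2016", "Feb2016", "Mar2016"), each = 6),
  Condition = rep(rep(c("WT", "MutantA", "MutantB"), each = 2), times = 3),
  Value     = round(runif(18, min = 1, max = 100), 1)
)

# Step 1: the input is a .csv file with columns Replicate, Condition, Value.
csv_path <- file.path(tempdir(), "example_data.csv")  # hypothetical file name
write.csv(dat, csv_path, row.names = FALSE)

# Step 2: import the .csv file. The protocol selects the file in a dialog box;
# a fixed path is used here so that the sketch runs non-interactively.
input <- read.csv(csv_path, stringsAsFactors = FALSE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Boxplots with jittered dots on top; dot colors indicate replicates.
  graph <- ggplot(input, aes(x = Condition, y = Value))
  plot_obj <- graph +
    geom_boxplot(outlier.colour = "black", colour = "black") +
    geom_jitter(aes(col = Replicate)) +
    theme_bw()

  # Step 3: export the graph as a .pdf file (scripted alternative to File -> Save as).
  pdf_path <- file.path(tempdir(), "categorical_scatterplot.pdf")  # hypothetical name
  ggsave(pdf_path, plot = plot_obj, width = 6, height = 4)
}
```

Replacing the fixed path with `file.choose()` would restore the dialog-box selection described in Step 2.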

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

#7 Display the graph in a separate window. Dot colors indicate replicates

graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
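
As a hedged illustration of the two notes: the package can also be installed from the R console rather than through the menus, and a log10 axis places values by their base-10 logarithm, so it needs strictly positive values. The numbers below are hypothetical and are not taken from the data set.

```r
# Console alternative to Note 1 (run once; needs internet access):
# install.packages("ggplot2", dependencies = TRUE)

# Note 2: positions on a log10 axis are the base-10 logarithms of the values.
values <- c(1, 10, 100, 1000)   # hypothetical measurements
stopifnot(all(values > 0))      # zero or negative values have no finite log10
log_positions <- log10(values)  # 0 1 2 3: each 10-fold step is one unit apart
print(log_positions)
```

Checking the sign of the ‘Value’ column in this way before switching to `scale_y_log10()` helps avoid points being dropped from the graph.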

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
