32 datasets found

Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
figshare
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre¹, Aurore Coince², Sophien Kamoun¹

¹ The Sainsbury Laboratory, Norwich, UK; ² Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
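
The three steps above can be sketched in R as follows. This is not the script from the PowerPoint slide, which is not reproduced in this description. It is a minimal sketch that assumes the column names ‘Replicate’, ‘Condition’ and ‘Value’ from Step 1, uses invented example values, and writes to hypothetical file names (`example_input.csv`, `categorical_scatterplot.pdf`) in a temporary directory. It reads the file by path rather than through a dialog box, and the plotting part only runs if ggplot2 is installed.

```r
# Step 1 (sketch): build a small three-column data set and save it as .csv.
# The values below are invented for illustration only.
example_data <- data.frame(
  Replicate = rep(c("Jan 2016", "Feb 2016", "Mar 2016"), each = 6),
  Condition = rep(rep(c("WT", "Mutant A", "Mutant B"), each = 2), times = 3),
  Value     = c(10.2, 11.5, 4.1, 5.3, 7.8, 8.4,
                 9.7, 12.1, 3.9, 4.8, 8.1, 7.5,
                10.9, 11.0, 4.4, 5.0, 7.2, 8.9)
)
input_file <- file.path(tempdir(), "example_input.csv")  # hypothetical file name
write.csv(example_data, input_file, row.names = FALSE)

# Step 2 (sketch): import the .csv file and draw the categorical scatterplot.
# For interactive use, read.csv(file.choose()) opens a file dialog instead.
data <- read.csv(input_file, stringsAsFactors = FALSE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  graph <- ggplot(data, aes(x = Condition, y = Value))
  plot_out <- graph +
    geom_boxplot(outlier.colour = "black", colour = "black") +
    geom_jitter(aes(col = Replicate)) +
    theme_bw()

  # Step 3 (sketch): export the graph as a .pdf file instead of using the menu.
  output_file <- file.path(tempdir(), "categorical_scatterplot.pdf")  # hypothetical
  ggsave(output_file, plot = plot_out, width = 6, height = 4)
}
```

Saving with `ggsave()` is a swapped-in alternative to the File -> Save as menu described in Step 3; it fixes the figure size in the script, which can make the export easier to repeat.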

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field, and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

#7 Display the graph in a separate window. Dot colors indicate replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
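
A log scale only suits strictly positive values: `log10()` returns `-Inf` for zero and `NaN` for negative numbers, so such points cannot be placed on a log axis. A quick check before switching to the log-scale command could look like the sketch below. The helper name `log_scale_ready` is hypothetical and not part of the original script.

```r
# Hypothetical helper: TRUE only if every value can be shown on a log10 axis
# (finite and strictly greater than zero; missing values count as not ready).
log_scale_ready <- function(values) {
  all(is.finite(values) & values > 0)
}

log_scale_ready(c(0.5, 12, 300))  # all values are positive
log_scale_ready(c(0, 12, 300))    # contains a zero
```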

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
Comparison experiments by using IF.
figshare.com
xls
Updated Jun 2, 2023
Cite
Gen Li; Jason J. Jung (2023). Comparison experiments by using IF. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0247119.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Gen Li; Jason J. Jung
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison experiments by using IF.
Performance of DynGPE.
plos.figshare.com
xls
Updated Jun 11, 2023
Cite
Gen Li; Jason J. Jung (2023). Performance of DynGPE. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0247119.t002
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Gen Li; Jason J. Jung
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of DynGPE.
GOPI Resource - Stacked Column Chart - Change in Jobs in Maryland by Month...
data.wu.ac.at
csv, json, xml
Updated Apr 27, 2017
Cite
U.S. Bureau of Labor Statistics (2017). GOPI Resource - Stacked Column Chart - Change in Jobs in Maryland by Month (with Feb and March 2010 outliers filtered out) [Dataset]. https://data.wu.ac.at/schema/data_maryland_gov/NWk4aS1ieDU2
Available download formats: xml, csv, json
Dataset updated
Apr 27, 2017
Dataset provided by
Bureau of Labor Statistics http://www.bls.gov/
Area covered
Maryland
Description
This dataset represents the CHANGE in the number of jobs per industry category and sub-category from the previous month, not the raw counts of actual jobs. The data behind these monthly change values is from the Bureau of Labor Statistics (BLS) Current Employment Statistics (CES) program. CES data represents businesses and government agencies, providing detailed industry data on employment on nonfarm payrolls.
Data Visualization Cheat sheets and Resources
kaggle.com
zip
Updated Feb 20, 2021
Cite
Kash (2021). Data Visualization Cheat sheets and Resources [Dataset]. https://www.kaggle.com/kaushiksuresh147/data-visualization-cheat-cheats-and-resources
Available download formats: zip (133638507 bytes)
Dataset updated
Feb 20, 2021
Authors
Kash
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Data Visualization Corpus

Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.

The Data Visualization Corpus

The Data Visualization corpus consists of:

32 cheat sheets: These cover techniques and tricks for visualization, Python and R visualization cheat sheets, types of charts and their significance, storytelling with data, etc.

32 charts: The corpus also includes information on data visualization charts, along with their Python code, d3.js code, and presentations that explain the respective charts clearly.

Some recommended books on data visualization that every data scientist should read:

Beautiful Visualization by Julie Steele and Noah Iliinsky

Information Dashboard Design by Stephen Few

Knowledge is beautiful by David McCandless (Short abstract)

The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo

The Visual Display of Quantitative Information by Edward R. Tufte

Storytelling with Data: A Data Visualization Guide for Business Professionals by Cole Nussbaumer Knaflic

Research paper - Cheat Sheets for Data Visualization Techniques by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach

Suggestions:

If you find any books, cheat sheets, or charts missing, or would like to suggest new documents, please let me know in the discussion section!

Resources:

Charts: I personally recommend data viz catalogue; its explanations are easy to understand!

Python code: Plotly for Python and Python graph gallery

R code for charts: Plotly for R

d3 code: Visualization code using d3

Request to Kaggle users:

A kind request to Kaggle users: please create notebooks on the visualization charts that interest you, using a dataset of your choice, as many beginners and experts could find them useful!

Also consider creating interactive EDA that combines animation with data visualization charts, to show how to tackle data and extract insights from it.

Suggestions and queries:

Feel free to use the discussion platform of this dataset to ask questions or raise queries related to the data visualization corpus and data visualization techniques.

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!
Data from: Robust Multivariate Functional Control Chart
tandf.figshare.com
pdf
Updated Oct 9, 2024
Cite
Christian Capezza; Fabio Centofanti; Antonio Lepore; Biagio Palumbo (2024). Robust Multivariate Functional Control Chart [Dataset]. http://doi.org/10.6084/m9.figshare.25365672.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.25365672.v1
Dataset updated
Oct 9, 2024
Dataset provided by
Taylor & Francis
Authors
Christian Capezza; Fabio Centofanti; Antonio Lepore; Biagio Palumbo
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In modern Industry 4.0 applications, a huge amount of data is acquired during manufacturing processes and is often contaminated with outliers, which can seriously reduce the performance of control charting procedures, especially in complex and high-dimensional settings. In the context of profile monitoring, we propose a new framework that is referred to as robust multivariate functional control chart (RoMFCC) to monitor a multivariate functional quality characteristic while being robust to both functional casewise and componentwise outliers. In the former case, observations of the quality characteristic are contaminated in all functional variables or components, while, in the latter, the contamination affects one or more components independently. The RoMFCC relies on (I) a functional filter to identify componentwise outliers to be replaced by missing components; (II) a robust multivariate functional data imputation method; (III) a casewise robust dimensionality reduction; (IV) a monitoring strategy for the quality characteristic. Through a Monte Carlo simulation study, the RoMFCC is compared with competing schemes that have already appeared in the literature. A case study is finally presented where the proposed framework is used to monitor a resistance spot welding process in the automotive industry. RoMFCC is implemented in the R package funcharts, available online on CRAN.
Superstore Sales Analysis
kaggle.com
Updated Oct 21, 2023
Cite
Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis/code
Available download formats: Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
Dataset updated
Oct 21, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Ali Reda Elblgihy
Description
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

1- Data Import and Transformation:

Gather and import relevant sales data from various sources into Excel.

Utilize Power Query to clean, transform, and structure the data for analysis.

Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

2- Data Quality Assessment:

Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.

Standardize data formats and ensure that all data is in a consistent, usable state.

3- Calculating COGS:

Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.

Apply appropriate formulas and calculations to determine COGS accurately.

4- Discount Analysis:

Analyze the discount values offered on products to understand their impact on sales and profitability.

Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

5- Sales Metrics:

Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.

Utilize Excel functions to compute these metrics and create visuals for better insights.

6- Visualization:

Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.

Visual representations can help identify trends, outliers, and patterns in the data.

7- Report Generation:

Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
Data from: Spatio-Temporal Graph Neural Network for Urban Spaces:...
zenodo.org
bin
Updated May 7, 2025
Cite
Silke Kirstin Kaiser; Silke Kirstin Kaiser (2025). Data from: Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume [Dataset]. http://doi.org/10.5281/zenodo.15332147
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15332147
Dataset updated
May 7, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Silke Kirstin Kaiser; Silke Kirstin Kaiser
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Urban Traffic Volume Dataset – Berlin (Strava) & New York City (Taxi)

Associated paper: Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume
Authors: Silke K. Kaiser, Filipe Rodrigues, Carlos Lima Azevedo, Lynn H. Kaack

Citation Request

If you use this dataset, please cite our paper:

Kaiser, S. K., Rodrigues, F., Azevedo Lima, C., & Kaack, L.H. (2025). Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume. [published on arXiv].

Dataset Overview

This dataset includes street-level traffic volume data for two major urban areas:

Berlin (Strava Cycling Data): Daily bicycle traffic volumes from 2019–2023, aggregated from publicly shared Strava user data.

New York City (Taxi Data): Hourly motorized traffic volumes from Manhattan for January–February 2016, derived from GPS trajectories of yellow taxis.

Both datasets are provided at the street-segment level and come with rich auxiliary features capturing spatial, temporal, infrastructure, and contextual information.

Each city includes:

Full feature table for each street segment, including traffic volume and auxiliary features.

Geometry for each street segment.

Binary adjacency matrix.

Adjacency matrix weighted by node feature similarity.

Adjacency matrix based on Euclidean (bird’s-eye) distance.

Adjacency matrix based on real-world road network distance.

Adjacency matrix weighted by estimated travel time over the road network.

Key Features and Methodology

Volume Estimation: Strava volumes are rounded aggregates of bike trips; NYC volumes are computed from reconstructed taxi trajectories.

Filtering: Extreme outliers (e.g., from special events) are filtered per segment to focus on typical traffic conditions.

Auxiliary Features:

Built environment (e.g., speed limits, road types, lane counts)

Points of Interest (e.g., shops, schools, transit stops)

Network connectivity metrics (degree, betweenness, etc.)

Temporal indicators (weekday, holidays, hour, month)

Weather data (sunshine, precipitation, temperature)

Socioeconomic indicators (Berlin only)

Proxy motorized traffic metrics (Berlin only)

See the paper for a complete list of features and detailed methodology.

-------------------

We are grateful to the European Union’s Horizon Europe research and innovation program, which funded this project under Grant Agreement No 101057131, Climate Action To Advance HeaLthY Societies in Europe (CATALYSE).
Summary of the BayeScan results for FST outliers.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). Summary of the BayeScan results for FST outliers. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0158691.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Values in parentheses are 95% credible intervals. Results are listed for a range of prior weights on the null model.
S1 Data -
plos.figshare.com
zip
Updated Mar 3, 2025
Cite
Wei Liu; Qian Ning; Guangwei Liu; Haonan Wang; Yixin Zhu; Miao Zhong (2025). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0318431.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0318431.s001
Dataset updated
Mar 3, 2025
Dataset provided by
PLOS ONE
Authors
Wei Liu; Qian Ning; Guangwei Liu; Haonan Wang; Yixin Zhu; Miao Zhong
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Traditional subspace feature selection methods typically rely on a fixed distance to compute residuals between the original and feature reconstruction spaces. However, this approach struggles to adapt to diverse datasets and often fails to handle noise and outliers effectively. In this paper, we propose an unsupervised feature selection method named unsupervised feature selection algorithm based on -norm feature reconstruction (NFRFS). Employing a flexible norm to represent both the original space and the spatial distance of feature reconstruction enhances the method's adaptability and broadens its applicability by adjusting p. Additionally, adaptive graph learning is integrated into the feature selection process to preserve the local geometric structure of the data. Features exhibiting sparsity and low redundancy are selected through the regularization constraint of the inner product in the feature selection matrix. To demonstrate the effectiveness of the method, numerical studies were conducted on 14 benchmark datasets. Our results indicate that the method outperforms 10 unsupervised feature selection algorithms in terms of clustering performance.
Single-Locus versus Multilocus Patterns of Local Adaptation to Climate in...
plos.figshare.com
pdf
Updated Jun 2, 2023
Cite
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). Single-Locus versus Multilocus Patterns of Local Adaptation to Climate in Eastern White Pine (Pinus strobus, Pinaceae) [Dataset]. http://doi.org/10.1371/journal.pone.0158691
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0158691
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Natural plant populations are often adapted to their local climate and environmental conditions, and populations of forest trees offer some of the best examples of this pattern. However, little empirical work has focused on the relative contribution of single-locus versus multilocus effects to the genetic architecture of local adaptation in plants/forest trees. Here, we employ eastern white pine (Pinus strobus) to test the hypothesis that it is the inter-genic effects that primarily drive climate-induced local adaptation. The genetic structure of 29 range-wide natural populations of eastern white pine was determined in relation to local climatic factors using both a reference set of SSR markers, and SNPs located in candidate genes putatively involved in adaptive response to climate. Comparisons were made between marker sets using standard single-locus outlier analysis, single-locus and multilocus environment association analyses and a novel implementation of Population Graphs. Magnitudes of population structure were similar between the two marker sets. Outlier loci consistent with diversifying selection were rare for both SNPs and SSRs. However, genetic distances based on the multilocus among population covariances (cGD) were significantly more correlated to climate, even after correcting for spatial effects, for SNPs as compared to SSRs. Coalescent simulations confirmed that the differences in mutation rates between SSRs and SNPs did not affect the topologies of the Population Graphs, and hence values of cGD and their correlations with associated climate variables. We conclude that the multilocus covariances among populations primarily reflect adaptation to local climate and environment in eastern white pine. This result highlights the complexity of the genetic architecture of adaptive traits, as well as the need to consider multilocus effects in studies of local adaptation.
The crcc T2 Revised statistics.
plos.figshare.com
figshare.com
xls
Updated May 31, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). The crcc T2 Revised statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t015
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t015
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The crcc T2 Revised statistics.
Addition-point OLS matrix, B.
plos.figshare.com
xls
Updated Jun 5, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). Addition-point OLS matrix, B. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t009
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Addition-point OLS matrix, B.
Augmented Projector.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). Augmented Projector. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t008
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Augmented Projector.
RDA and pRDA results by molecular marker type reveal differential effects of...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). RDA and pRDA results by molecular marker type reveal differential effects of climate and geography across marker types. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0158691.t005
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bolded values are those with P-values < 0.05.
MRM analyses using cGD as the response matrix reveal that the SNP data are...
figshare.com
xls
Updated Jun 3, 2023
Cite
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). MRM analyses using cGD as the response matrix reveal that the SNP data are overly correlated to climate and climate conditional on geography. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0158691.t006
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Differences in R2 values between marker types are significant using a permutation approach.
The Pulp-fibre Dataset.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). The Pulp-fibre Dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t012
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t012
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Pulp-fibre Dataset.
Similarity Matrix $s_{ij} = (b_i - b_j)'(\hat{\Sigma}(B))^{-1}(b_i - b_j)$.
figshare.com
xls
Updated May 30, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). Similarity Matrix $s_{ij} = (b_i - b_j)'(\hat{\Sigma}(B))^{-1}(b_i - b_j)$. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t010
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t010
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Similarity Matrix $s_{ij} = (b_i - b_j)'(\hat{\Sigma}(B))^{-1}(b_i - b_j)$.
A summary of SNPs, candidate genes and their biological functions from...
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). A summary of SNPs, candidate genes and their biological functions from functional analysis of homologues in model plant Arabidopsis or other plants. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0158691.t002
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The details on these candidate genes, including EST loci, GenBank and TreeGenes database ID, and references for the identification of biological functions are provided in S2 Table.
The crcc, RMVE, RMCD, and classical Hotelling’s T2 statistics.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). The crcc, RMVE, RMCD, and classical Hotelling’s T2 statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t011
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t011
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The crcc, RMVE, RMCD, and classical Hotelling’s T2 statistics.
