30 datasets found

f
Comparison experiments by using IF.
figshare.com
xls
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gen Li; Jason J. Jung (2023). Comparison experiments by using IF. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0247119.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Gen Li; Jason J. Jung
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison experiments by using IF.
f
Data from: Multivariate Functional Data Visualization and Outlier Detection
tandf.figshare.com
application/x-rar
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wenlin Dai; Marc G. Genton (2023). Multivariate Functional Data Visualization and Outlier Detection [Dataset]. http://doi.org/10.6084/m9.figshare.6308771.v1
Explore at:
application/x-rarAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.6308771.v1
Dataset updated
Jun 2, 2023
Dataset provided by
Taylor & Francis
Authors
Wenlin Dai; Marc G. Genton
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This article proposes a new graphical tool, the magnitude-shape (MS) plot, for visualizing both the magnitude and shape outlyingness of multivariate functional data. The proposed tool builds on the recent notion of functional directional outlyingness, which measures the centrality of functional data by simultaneously considering the level and the direction of their deviation from the central region. The MS-plot intuitively presents not only levels but also directions of magnitude outlyingness on the horizontal axis or plane, and demonstrates shape outlyingness on the vertical axis. A dividing curve or surface is provided to separate nonoutlying data from the outliers. Both the simulated data and the practical examples confirm that the MS-plot is superior to existing tools for visualizing centrality and detecting outliers for functional data. Supplementary material for this article is available online.
f
Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Explore at:
pptxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
figshare
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column excel file as shown in Powerpoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import in R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in Powerpoint slide and paste it in the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See Powerpoint slide for an example.

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

7 Display the graph in a separate window. Dot colors indicate

replicates

graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

https://cran.r-project.org/

http://ggplot2.org/
f
Data from: Multivariate Outliers and the O3 Plot
tandf.figshare.com
txt
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Antony Unwin (2023). Multivariate Outliers and the O3 Plot [Dataset]. http://doi.org/10.6084/m9.figshare.7792115.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7792115.v1
Dataset updated
May 31, 2023
Dataset provided by
Taylor & Francis
Authors
Antony Unwin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identifying and dealing with outliers is an important part of data analysis. A new visualization, the O3 plot, is introduced to aid in the display and understanding of patterns of multivariate outliers. It uses the results of identifying outliers for every possible combination of dataset variables to provide insight into why particular cases are outliers. The O3 plot can be used to compare the results from up to six different outlier identification methods. There is anRpackage OutliersO3 implementing the plot. The article is illustrated with outlier analyses of German demographic and economic data. Supplementary materials for this article are available online.
API security: Access behavior anomaly dataset
kaggle.com
Updated Nov 22, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ravi Guntur (2021). API security: Access behavior anomaly dataset [Dataset]. https://www.kaggle.com/datasets/tangodelta/api-access-behaviour-anomaly-dataset/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 22, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ravi Guntur
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.htmlhttp://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

Distributed micro-services based applications are typically accessed via APIs. These APIs are used either by apps or they can be accessed directly by programmatic means. Many a time API access is abused by attackers trying to exploit the business logic exposed by these APIs. The way normal users access these APIs is different from how the attackers access these APIs. Many applications have 100s of APIs that are called in specific order and depending on various factors such as browser refreshes, session refreshes, network errors, or programmatic access these behaviors are not static and can vary for the same user. API calls in long running sessions form access graphs that need to be analysed in order to discover attack patterns and anomalies. Graphs dont lend themselves to numerical computation. We address this issue and provide a dataset where user access behavior is qualified as numerical features. In addition we provide a dataset where raw API call graphs are provided. Supporting the use of these datasets two notebooks on classification, node embeddings and clustering are also provided.

About the dataset

There are 4 files provided. Two files are in CSV format and two files are in JSON format. The files in CSV format are user behavior graphs represented as behavior metrics. The JSON files are the actual API call graphs. The two datasets can be joined on a key so that those who want to combine graphs with metrics could do so in novel ways.

What is new in this dataset

This data set captures API access patterns in terms of behavior metrics. Behaviors are captured by tracking users' API call graphs which are then summarized in terms of metrics. In some sense a categorical sequence of entities has been reduced to numerical metrics.

CSV dataset

There are two files provided. One called supervised_dataset.csv has behaviors labeled as normal or outlier. The second file called remaining_behavior_ext.csv has a larger number of samples that are not labeled but has additional insights as well as a classification created by another algorithm.

What is each row

Each row is one instance of an observed behavior that has been manually classified as normal or outlier

JSON dataset

There are two files provided to correspond to the two CSV files

What is each item

Each item has an _id field that can be used to join against the CSV data sets. Then we have the API behavior graph represented as a list of edges.

Inspiration

To model the classification label with a skewed distribution of normal and abnormal cases and with very few labeled samples available. Use supervised_dataset.csv

To verify where the predicted class differs from the class determined by a second algorithm. Use remaining_behavior_ext.csv
Y
Citation Network Graph
shibatadb.com
Updated May 7, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yubetsu (2019). Citation Network Graph [Dataset]. https://www.shibatadb.com/article/yUqSNnAw
Explore at:
Dataset updated
May 7, 2019
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txthttps://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Description
Network of 42 papers and 130 citation links related to "Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent".
Data Visualization Cheat sheets and Resources
kaggle.com
zip
Updated Feb 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kash (2021). Data Visualization Cheat sheets and Resources [Dataset]. https://www.kaggle.com/kaushiksuresh147/data-visualization-cheat-cheats-and-resources
Explore at:
zip(133638507 bytes)Available download formats
Dataset updated
Feb 20, 2021
Authors
Kash
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The Data Visualization Corpus

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1430847%2F29f7950c3b7daf11175aab404725542c%2FGettyImages-1187621904-600x360.jpg?generation=1601115151722854&alt=media" alt="">

Data Visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions

The Data Visualizaion Copus

The Data Visualization corpus consists:

32 cheat sheets: This includes A-Z about the techniques and tricks that can be used for visualization, Python and R visualization cheat sheets, Types of charts, and their significance, Storytelling with data, etc..

32 Charts: The corpus also consists of a significant amount of data visualization charts information along with their python code, d3.js codes, and presentations relation to the respective charts explaining in a clear manner!

Some recommended books for data visualization every data scientist's should read:

Beautiful Visualization by Julie Steele and Noah Iliinsky

Information Dashboard Design by Stephen Few

Knowledge is beautiful by David McCandless (Short abstract)

The Functional Art: An Introduction to Information Graphics and Visualization by Alberto Cairo

The Visual Display of Quantitative Information by Edward R. Tufte

storytelling with data: a data visualization guide for business professionals by cole Nussbaumer knaflic

Research paper - Cheat Sheets for Data Visualization Techniques by Zezhong Wang, Lovisa Sundin, Dave Murray-Rust, Benjamin Bach

Suggestions:

In case, if you find any books, cheat sheets, or charts missing and if you would like to suggest some new documents please let me know in the discussion sections!

Resources:

Charts: I personally recommend data viz catalogue, it's easy to understand with their explanation!

Python codes: Plotly for python and Python graph gallery

R codes for charts:Plotly for R

d3 codes: Visualization codes using d3

Request to kaggle users:

A kind request to kaggle users to create notebooks on different visualization charts as per their interest by choosing a dataset of their own as many beginners and other experts could find it useful!

To create interactive EDA using animation with a combination of data visualization charts to give an idea about how to tackle data and extract the insights from the data

Suggestion and queries:

Feel free to use the discussion platform of this data set to ask questions or any queries related to the data visualization corpus and data visualization techniques

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!
f
Performance of DynGPE.
plos.figshare.com
xls
Updated Jun 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gen Li; Jason J. Jung (2023). Performance of DynGPE. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0247119.t002
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Gen Li; Jason J. Jung
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of DynGPE.
Data from: Spatio-Temporal Graph Neural Network for Urban Spaces:...
zenodo.org
bin
Updated May 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Silke Kirstin Kaiser; Silke Kirstin Kaiser (2025). Data from: Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume [Dataset]. http://doi.org/10.5281/zenodo.15332147
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.15332147
Dataset updated
May 7, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Silke Kirstin Kaiser; Silke Kirstin Kaiser
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Urban Traffic Volume Dataset – Berlin (Strava) & New York City (Taxi)

Associated paper: Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume
Authors: Silke K. Kaiser, Filipe Rodrigues, Carlos Lima Azevedo, Lynn H. Kaack

Citation Request

If you use this dataset, please cite our paper:

Kaiser, S. K., Rodrigues, F., Azevedo Lima, C., & Kaack, L.H. (2025). Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume. [published on arXiv].

Dataset Overview

This dataset includes street-level traffic volume data for two major urban areas:

Berlin (Strava Cycling Data): Daily bicycle traffic volumes from 2019–2023, aggregated from publicly shared Strava user data.

New York City (Taxi Data): Hourly motorized traffic volumes from Manhattan for January–February 2016, derived from GPS trajectories of yellow taxis.

Both datasets are provided at the street-segment level and come with rich auxiliary features capturing spatial, temporal, infrastructure, and contextual information.

Each city includes:

:
Full feature table for each street segment, including traffic volume and auxiliary features.

:
Geometry for each street segment.

:
Binary adjacency matrix.

:
Adjacency matrix weighted by node feature similarity.

:
Adjacency matrix based on Euclidean (bird’s-eye) distance.

:
Adjacency matrix based on real-world road network distance.

:
Adjacency matrix weighted by estimated travel time over the road network.

Key Features and Methodology

Volume Estimation: Strava volumes are rounded aggregates of bike trips; NYC volumes are computed from reconstructed taxi trajectories.

Filtering: Extreme outliers (e.g., from special events) are filtered per segment to focus on typical traffic conditions.

Auxiliary Features:

Built environment (e.g., speed limits, road types, lane counts)

Points of Interest (e.g., shops, schools, transit stops)

Network connectivity metrics (degree, betweenness, etc.)

Temporal indicators (weekday, holidays, hour, month)

Weather data (sunshine, precipitation, temperature)

Socioeconomic indicators (Berlin only)

Proxy motorized traffic metrics (Berlin only)

See the paper for a complete list of features and detailed methodology.

-------------------

We are grateful the European Union’s Horizon Europe research and innovation program funded this project under Grant Agreement No 101057131, Climate Action To Advance HeaLthY Societies in Europe (CATALYSE).
Additional file 2 of Outlier identification and monitoring of institutional...
springernature.figshare.com
txt
Updated Jun 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Menelaos Pavlou; Gareth Ambler; Rumana Z. Omar; Andrew T. Goodwin; Uday Trivedi; Peter Ludman; Mark de Belder (2023). Additional file 2 of Outlier identification and monitoring of institutional or clinician performance: an overview of statistical methods and application to national audit data [Dataset]. http://doi.org/10.6084/m9.figshare.22612465.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.22612465.v1
Dataset updated
Jun 21, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Menelaos Pavlou; Gareth Ambler; Rumana Z. Omar; Andrew T. Goodwin; Uday Trivedi; Peter Ludman; Mark de Belder
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2.
C
Number of cyclists per route segment
ckan.mobidatalab.eu
Updated Aug 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Metropolregion Rhein-Neckar (2022). Number of cyclists per route segment [Dataset]. https://ckan.mobidatalab.eu/dataset/number-of-cyclists-per-route-segment
Explore at:
http://publications.europa.eu/resource/authority/file-type/csv, http://publications.europa.eu/resource/authority/file-type/gpkg, http://publications.europa.eu/resource/authority/file-type/geojsonAvailable download formats
Dataset updated
Aug 9, 2022
Dataset provided by
Metropolregion Rhein-Neckar
License
http://dcat-ap.de/def/licenses/cc-by-nchttp://dcat-ap.de/def/licenses/cc-by-nc
Description
IMPORTANT: Both directions of travel were aggregated in this data set. The data set contains the nationwide bicycle traffic volumes that were recorded by users via app during 2020 as part of the Stadtradeln campaign of the Climate Alliance e.V. and as part of the MOVEBIS research project. The data was then processed by the professorships for computer networks and traffic ecology at the TU Dresden, cleared of outliers and other means of transport and then projected onto the OSM network graph. An adapted version of the Open Source Routing Machine (OSRM) was used for this. The speeds are stored in the direction of the network edges of the map basis OpenStreetMap.
f
DataSheet1_AEROS: AdaptivE RObust Least-Squares for Graph-Based SLAM.pdf
frontiersin.figshare.com
pdf
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Milad Ramezani; Matias Mattamala; Maurice Fallon (2023). DataSheet1_AEROS: AdaptivE RObust Least-Squares for Graph-Based SLAM.pdf [Dataset]. http://doi.org/10.3389/frobt.2022.789444.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/frobt.2022.789444.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Milad Ramezani; Matias Mattamala; Maurice Fallon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In robot localisation and mapping, outliers are unavoidable when loop-closure measurements are taken into account. A single false-positive loop-closure can have a very negative impact on SLAM problems causing an inferior trajectory to be produced or even for the optimisation to fail entirely. To address this issue, popular existing approaches define a hard switch for each loop-closure constraint. This paper presents AEROS, a novel approach to adaptively solve a robust least-squares minimisation problem by adding just a single extra latent parameter. It can be used in the back-end component of the SLAM system to enable generalised robust cost minimisation by simultaneously estimating the continuous latent parameter along with the set of sensor poses in a single joint optimisation. This leads to a very closely curve fitting on the distribution of the residuals, thereby reducing the effect of outliers. Additionally, we formulate the robust optimisation problem using standard Gaussian factors so that it can be solved by direct application of popular incremental estimation approaches such as iSAM. Experimental results on publicly available synthetic datasets and real LiDAR-SLAM datasets collected from the 2D and 3D LiDAR systems show the competitiveness of our approach with the state-of-the-art techniques and its superiority on real world scenarios.
f
DataSheet1_Use ggbreak to Effectively Utilize Plotting Space to Deal With...
frontiersin.figshare.com
pdf
Updated Jun 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shuangbin Xu; Meijun Chen; Tingze Feng; Li Zhan; Lang Zhou; Guangchuang Yu (2023). DataSheet1_Use ggbreak to Effectively Utilize Plotting Space to Deal With Large Datasets and Outliers.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.774846.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2021.774846.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Shuangbin Xu; Meijun Chen; Tingze Feng; Li Zhan; Lang Zhou; Guangchuang Yu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the rapid increase of large-scale datasets, biomedical data visualization is facing challenges. The data may be large, have different orders of magnitude, contain extreme values, and the data distribution is not clear. Here we present an R package ggbreak that allows users to create broken axes using ggplot2 syntax. It can effectively use the plotting area to deal with large datasets (especially for long sequential data), data with different magnitudes, and contain outliers. The ggbreak package increases the available visual space for a better presentation of the data and detailed annotation, thus improves our ability to interpret the data. The ggbreak package is fully compatible with ggplot2 and it is easy to superpose additional layers and applies scale and theme to adjust the plot using the ggplot2 syntax. The ggbreak package is open-source software released under the Artistic-2.0 license, and it is freely available on CRAN (https://CRAN.R-project.org/package=ggbreak) and Github (https://github.com/YuLab-SMU/ggbreak).
f
Goodness-of-fit filtering in classical metric multidimensional scaling with...
tandf.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jan Graffelman (2023). Goodness-of-fit filtering in classical metric multidimensional scaling with large datasets [Dataset]. http://doi.org/10.6084/m9.figshare.11389830.v1
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.11389830.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
Jan Graffelman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Metric multidimensional scaling (MDS) is a widely used multivariate method with applications in almost all scientific disciplines. Eigenvalues obtained in the analysis are usually reported in order to calculate the overall goodness-of-fit of the distance matrix. In this paper, we refine MDS goodness-of-fit calculations, proposing additional point and pairwise goodness-of-fit statistics that can be used to filter poorly represented observations in MDS maps. The proposed statistics are especially relevant for large data sets that contain outliers, with typically many poorly fitted observations, and are helpful for improving MDS output and emphasizing the most important features of the dataset. Several goodness-of-fit statistics are considered, and both Euclidean and non-Euclidean distance matrices are considered. Some examples with data from demographic, genetic and geographic studies are shown.
f
Summary of the BayeScan results for FST outliers.
plos.figshare.com
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck (2023). Summary of the BayeScan results for FST outliers. [Dataset]. http://doi.org/10.1371/journal.pone.0158691.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0158691.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Om P. Rajora; Andrew J. Eckert; John W. R. Zinck
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Values in parentheses are 95% credible intervals. Results are listed for a range of prior weights on the null model.
f
S1 Data -
plos.figshare.com
zip
Updated Mar 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wei Liu; Qian Ning; Guangwei Liu; Haonan Wang; Yixin Zhu; Miao Zhong (2025). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0318431.s001
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0318431.s001
Dataset updated
Mar 3, 2025
Dataset provided by
PLOS ONE
Authors
Wei Liu; Qian Ning; Guangwei Liu; Haonan Wang; Yixin Zhu; Miao Zhong
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Traditional subspace feature selection methods typically rely on a fixed distance to compute residuals between the original and feature reconstruction spaces. However, this approach struggles to adapt to diverse datasets and often fails to handle noise and outliers effectively. In this paper, we propose an unsupervised feature selection method named unsupervised feature selection algorithm based on -norm feature reconstruction (NFRFS). Employing a flexible norm to represent both the original space and the spatial distance of feature reconstruction, enhances adaptability and broadens its applicability by adjusting p. Additionally, adaptive graph learning is integrated into the feature selection process to preserve the local geometric structure of the data. Features exhibiting sparsity and low redundancy are selected through the regularization constraint of the inner product in the feature selection matrix. To demonstrate the effectiveness of the method, numerical studies were conducted on 14 benchmark datasets. Our results indicate that the method outperforms 10 unsupervised feature selection algorithms in terms of clustering performance.
f
Comparison of Reliability Visualization Methods.
plos.figshare.com
xls
Updated May 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thomas Hadler; Clemens Ammann; Hadil Saad; Leonhard Grassow; Philine Reisdorf; Steffen Lange; Sascha Däuber; Jeanette Schulz-Menger (2025). Comparison of Reliability Visualization Methods. [Dataset]. http://doi.org/10.1371/journal.pone.0323371.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0323371.t002
Dataset updated
May 29, 2025
Dataset provided by
PLOS ONE
Authors
Thomas Hadler; Clemens Ammann; Hadil Saad; Leonhard Grassow; Philine Reisdorf; Steffen Lange; Sascha Däuber; Jeanette Schulz-Menger
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BackgroundArtificial intelligence (AI) methods have established themselves in cardiovascular magnetic resonance (CMR) as automated quantification tools for ventricular volumes, function, and myocardial tissue characterization. Quality assurance approaches focus on measuring and controlling AI-expert differences but there is a need for tools that better communicate reliability and agreement. This study introduces the Verity plot, a novel statistical visualization that communicates the reliability of quantitative parameters (QP) with clear agreement criteria and descriptive statistics.MethodsTolerance ranges for the acceptability of the bias and variance of AI-expert differences were derived from intra- and interreader evaluations. AI-expert agreement was defined by bias confidence and variance tolerance intervals being within bias and variance tolerance ranges. A reliability plot was designed to communicate this statistical test for agreement. Verity plots merge reliability plots with density and a scatter plot to illustrate AI-expert differences. Their utility was compared against Correlation, Box and Bland-Altman plots.ResultsBias and variance tolerance ranges were established for volume, function, and myocardial tissue characterization QPs. Verity plots provided insights into statstistcal properties, outlier detection, and parametric test assumptions, outperforming Correlation, Box and Bland-Altman plots. Additionally, they offered a framework for determining the acceptability of AI-expert bias and variance.ConclusionVerity plots offer markers for bias, variance, trends and outliers, in addition to deciding AI quantification acceptability. The plots were successfully applied to various AI methods in CMR and decisively communicated AI-expert agreement.
f
Case study on FewRel 1.0 validation set.
figshare.com
xls
Updated Jun 8, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tao Liu; Zunwang Ke; Yanbing Li; Wushour Silamu (2023). Case study on FewRel 1.0 validation set. [Dataset]. http://doi.org/10.1371/journal.pone.0286915.t007
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0286915.t007
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Tao Liu; Zunwang Ke; Yanbing Li; Wushour Silamu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Few-shot Relation Classification identifies the relation between target entity pairs in unstructured natural language texts by training on a small number of labeled samples. Recent prototype network-based studies have focused on enhancing the prototype representation capability of models by incorporating external knowledge. However, the majority of these works constrain the representation of class prototypes implicitly through complex network structures, such as multi-attention mechanisms, graph neural networks, and contrastive learning, which constrict the model’s ability to generalize. In addition, most models with triplet loss disregard intra-class compactness during model training, thereby limiting the model’s ability to handle outlier samples with low semantic similarity. Therefore, this paper proposes a non-weighted prototype enhancement module that uses the feature-level similarity between prototypes and relation information as a gate to filter and complete features. Meanwhile, we design a class cluster loss that samples difficult positive and negative samples and explicitly constrains both intra-class compactness and inter-class separability to learn a metric space with high discriminability. Extensive experiments were done on the publicly available dataset FewRel 1.0 and 2.0, and the results show the effectiveness of the proposed model.
Graph Samping Results
figshare.com
pdf
Updated Jan 18, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laurens Rietveld; Christophe Guéret; Stefan Schlobach; Antonis Loizou; Rinke Hoekstra (2016). Graph Samping Results [Dataset]. http://doi.org/10.6084/m9.figshare.724046.v2
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.724046.v2
Dataset updated
Jan 18, 2016
Dataset provided by
Figsharehttp://figshare.com/
Authors
Laurens Rietveld; Christophe Guéret; Stefan Schlobach; Antonis Loizou; Rinke Hoekstra
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These boxplots show the recall distribution of our queries for each combination of rewrite method and network analysis algorithm for the 4 datasets we considered. The bounds of the box represent the lower and upper quartiles of the recall scores. The average recall is denoted by the triangle and the horizontal line provides the median recall score. Whiskers extend to datapoints that are up to 1.5 times larger and smaller than the interquartile range. Any points outside this range are considered outliers, and are represented as dots.

Whenever we can claim a statistically significant better recall than one of our baselines, we show this using + or * signs. Our significance calculations are public as well on github: https://github.com/Data2Semantics/GraphSampling/blob/master/bin/significance
f
Hyperparameter of the modes built in our experiments.
figshare.com
xls
Updated Jun 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tao Liu; Zunwang Ke; Yanbing Li; Wushour Silamu (2023). Hyperparameter of the modes built in our experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0286915.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0286915.t002
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Tao Liu; Zunwang Ke; Yanbing Li; Wushour Silamu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Hyperparameter of the modes built in our experiments.

Facebook

Twitter

Click to copy link

Link copied

Cite

Gen Li; Jason J. Jung (2023). Comparison experiments by using IF. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t001

Comparison experiments by using IF.

Explore at:

xlsAvailable download formats

Unique identifier

https://doi.org/10.1371/journal.pone.0247119.t001

Dataset updated

Jun 2, 2023

Dataset provided by

PLOS ONE

Authors

Gen Li; Jason J. Jung

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Comparison experiments by using IF.

Clear search

Close search

Google apps

Main menu

Comparison experiments by using IF.

Data from: Multivariate Functional Data Visualization and Outlier Detection

Petre_Slide_CategoricalScatterplotFigShare.pptx

7 Display the graph in a separate window. Dot colors indicate

Data from: Multivariate Outliers and the O3 Plot

API security: Access behavior anomaly dataset

Context

About the dataset

What is new in this dataset

CSV dataset

What is each row

JSON dataset

What is each item

Inspiration

Citation Network Graph

Data Visualization Cheat sheets and Resources

The Data Visualization Corpus

Data Visualization

The Data Visualizaion Copus

The Data Visualization corpus consists:

Suggestions:

Resources:

Request to kaggle users:

Suggestion and queries:

Kindly upvote the dataset if you find it useful or if you wish to appreciate the effort taken to gather this corpus! Thank you and have a great day!

Performance of DynGPE.

Data from: Spatio-Temporal Graph Neural Network for Urban Spaces:...

Urban Traffic Volume Dataset – Berlin (Strava) & New York City (Taxi)

Citation Request

Dataset Overview

Key Features and Methodology

Additional file 2 of Outlier identification and monitoring of institutional...

Number of cyclists per route segment

DataSheet1_AEROS: AdaptivE RObust Least-Squares for Graph-Based SLAM.pdf

DataSheet1_Use ggbreak to Effectively Utilize Plotting Space to Deal With...

Goodness-of-fit filtering in classical metric multidimensional scaling with...

Summary of the BayeScan results for FST outliers.

S1 Data -

Comparison of Reliability Visualization Methods.

Case study on FewRel 1.0 validation set.

Graph Samping Results

Hyperparameter of the modes built in our experiments.

Comparison experiments by using IF.