Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R codes for implementing the described analyses (sample processing, data pre-processing, class comparison and class prediction). Caliper matching was implemented using the nonrandom package; the t- and the AD tests were implemented using the stats package and the adk package, respectively. Note that the updated package for implementing the AD test is kSamples. For the bootstrap selection and the egg-shaped plot, we modified the doBS and the importance igraph functions, respectively, both included in the bootfs package. For the SVM model we used the e1071 package. (R 12 kb)
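A rough Python analogue of the class-comparison and class-prediction steps above (the original is R code built on the stats, adk/kSamples and e1071 packages): per-feature t- and k-sample Anderson-Darling tests via scipy, and a linear SVM via scikit-learn. Caliper matching and the bootstrap selection are not reproduced; the toy data and the feature count are assumptions for illustration.

```python
# Python analogue only -- not the authors' R code.
import numpy as np
from scipy.stats import ttest_ind, anderson_ksamp
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 100))   # toy data: 60 samples x 100 features
y = rng.integers(0, 2, size=60)      # two classes

# Class comparison: univariate t-test and k-sample Anderson-Darling test per feature
t_p = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue for j in range(X.shape[1])])
ad_p = np.array([anderson_ksamp([X[y == 0, j], X[y == 1, j]])[2]   # approximate p-value
                 for j in range(X.shape[1])])

# Class prediction: linear SVM (analogous to e1071::svm) on the top-ranked features.
# For an unbiased estimate the feature selection would belong inside the CV loop.
top = np.argsort(t_p)[:10]
scores = cross_val_score(SVC(kernel="linear"), X[:, top], y, cv=5)
print(scores.mean())
```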
Species and environmental data
This compiled (zip) file consists of 7 matrices of data: one species data matrix, with abundance observations per visited plot; and 6 environmental data matrices, consisting of a land cover classification (Class), simulated EnMAP and Landsat data (April and August), and a 6-time-step Landsat time series (January, March, May, June, July and September). All data are compiled to the 125 m radius plots, as described in the paper. File: Leitaoetal_Mapping beta diversity from space_Data.zip
1. Spatial patterns of community composition turnover (beta diversity) may be mapped through Generalised Dissimilarity Modelling (GDM). While remote sensing data are adequate to describe these patterns, the often high-dimensional nature of these data poses some analytical challenges, potentially resulting in loss of generality. This may hinder the use of such data for mapping and monitoring beta-diversity patterns.
2. This study presents Sparse Generalised Dissimilarity Modelling (SGDM), a methodological framework designed to improve the use of high-dimensional data to predict community turnover with GDM. SGDM is a two-stage approach: the environmental data are first transformed with a sparse canonical correlation analysis (SCCA), aimed at dealing with high-dimensional datasets, and the transformed data are then fitted with GDM. The SCCA penalisation parameters are chosen through a grid search procedure that optimises the predictive performance of a GDM fit on the resulting components. The proposed method was illustrated on a case study with a clear environmental gradient of shrub encroachment following cropland abandonment, and subsequent turnover in the bird communities. Bird community data, collected on 115 plots located along the described gradient, were used to fit composition dissimilarity as a function of several remote sensing datasets, including a time series of Landsat data as well as simulated EnMAP hyperspectral data.
3. The proposed approach always outperformed GDM models when fit on high-dimensional datasets. Its usage on low-dimensional data was not consistently advantageous. Models using high-dimensional data, on the other hand, always outperformed those using low-dimensional data, such as single-date multispectral imagery.
4. This approach improves the direct use of high-dimensional remote sensing data, such as time series or hyperspectral imagery, for community dissimilarity modelling, resulting in better-performing models. The good performance of models using high-dimensional datasets further highlights the relevance of dense time series and data from new and forthcoming satellite sensors for ecological applications such as mapping species beta diversity.
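A minimal sketch of the SCCA-plus-grid-search idea, under stated assumptions: the sparse canonical pair is computed with a simplified soft-thresholded power iteration, and, because GDM has no standard Python implementation, the held-out canonical correlation stands in for the GDM-based criterion the paper uses to choose the penalisation parameters. This is not the SGDM implementation from the study.

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_cca(X, Y, lam_x, lam_y, n_iter=200, seed=0):
    """First sparse canonical pair via penalised power iteration (simplified sketch)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    C = Xc.T @ Yc
    v = np.random.default_rng(seed).standard_normal(Y.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_x); u /= np.linalg.norm(u) + 1e-12
        v = soft_threshold(C.T @ u, lam_y); v /= np.linalg.norm(v) + 1e-12
    return u, v   # sparse weights for the environmental (X) and species (Y) data

rng = np.random.default_rng(1)
X = rng.standard_normal((115, 300))   # toy: 115 plots x many spectral/temporal bands
Y = rng.standard_normal((115, 40))    # toy: 115 plots x species abundances
train, test = slice(0, 80), slice(80, 115)

# Grid search over the penalisation parameters (held-out correlation as stand-in criterion)
best, best_score = None, -np.inf
for lx in (0.5, 1.0, 2.0):
    for ly in (0.5, 1.0, 2.0):
        u, v = sparse_cca(X[train], Y[train], lx, ly)
        r = abs(np.corrcoef(X[test] @ u, Y[test] @ v)[0, 1])
        if r > best_score:
            best, best_score = (lx, ly), r
print("selected penalties:", best)
```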
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identification of genomic biomarkers is an important area of research in the context of drug discovery experiments. These experiments typically consist of several high-dimensional datasets that contain information about a set of drugs (compounds) under development. This type of data structure introduces the challenge of multi-source data integration. High-Performance Computing (HPC) has become an important tool for everyday research tasks. In the context of drug discovery, high-dimensional multi-source data need to be analyzed to identify the biological pathways related to the new set of drugs under development. Processing all the information contained in these datasets requires HPC techniques. Even though R packages for parallel computing are available, they are not optimized for a specific setting and data structure. In this article, we propose a new framework for data analysis with R on a computer cluster. The proposed data analysis workflow is applied to a multi-source high-dimensional drug discovery dataset and compared with a few existing R packages for parallel computing.
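The article's framework is R-based; the sketch below merely illustrates the same embarrassingly parallel per-gene analysis pattern in Python with the standard multiprocessing module, on toy data.

```python
# Illustration only -- not the article's R framework.
import numpy as np
from multiprocessing import Pool
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
expression = rng.standard_normal((5000, 30))   # toy: 5000 genes x 30 samples
response = rng.standard_normal(30)             # toy bioactivity readout

def test_gene(row):
    """Correlate one gene's expression profile with the bioactivity readout."""
    r, p = pearsonr(row, response)
    return p

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        pvalues = pool.map(test_gene, expression)   # one task per gene
    print(sum(p < 0.05 for p in pvalues), "genes pass the nominal 0.05 threshold")
```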
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. The raw data file used in this study.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The analysis of phenotypic change is important for several evolutionary biology disciplines, including phenotypic plasticity, evolutionary developmental biology, morphological evolution, physiological evolution, evolutionary ecology and behavioral evolution. It is common for researchers in these disciplines to work with multivariate phenotypic data. When phenotypic variables exceed the number of research subjects—data called 'high-dimensional data'—researchers are confronted with analytical challenges. Parametric tests that require high observation to variable ratios present a paradox for researchers, as eliminating variables potentially reduces effect sizes for comparative analyses, yet test statistics require more observations than variables. This problem is exacerbated with data that describe 'multidimensional' phenotypes, whereby a description of phenotype requires high-dimensional data. For example, landmark-based geometric morphometric data use the Cartesian coordinates of (potentially) many anatomical landmarks to describe organismal shape. Collectively such shape variables describe organism shape, although the analysis of each variable, independently, offers little benefit for addressing biological questions. Here we present a nonparametric method of evaluating effect size that is not constrained by the number of phenotypic variables, and motivate its use with example analyses of phenotypic change using geometric morphometric data. Our examples contrast different characterizations of body shape for a desert fish species, associated with measuring and comparing sexual dimorphism between two populations. We demonstrate that using more phenotypic variables can increase effect sizes, and allow for stronger inferences.
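A minimal sketch (not the authors' implementation) of the idea: the distance between group mean phenotype vectors is standardised against its permutation distribution, so the effect size is not constrained by the number of variables. The data, group sizes and injected dimorphism signal below are toy assumptions.

```python
import numpy as np

def permutation_effect_size(X, groups, n_perm=999, seed=0):
    """Standardised multivariate effect size for a two-group comparison."""
    rng = np.random.default_rng(seed)
    g = np.asarray(groups)
    obs = np.linalg.norm(X[g == 0].mean(0) - X[g == 1].mean(0))
    perm = np.empty(n_perm)
    for i in range(n_perm):
        p = rng.permutation(g)
        perm[i] = np.linalg.norm(X[p == 0].mean(0) - X[p == 1].mean(0))
    z = (obs - perm.mean()) / perm.std(ddof=1)        # permutation-standardised effect size
    pval = (np.sum(perm >= obs) + 1) / (n_perm + 1)   # one-sided permutation p-value
    return z, pval

rng = np.random.default_rng(1)
shapes = rng.standard_normal((40, 60))   # toy: 40 specimens x 60 landmark coordinates
sex = np.repeat([0, 1], 20)
shapes[sex == 1] += 0.3                  # inject a small dimorphism signal
print(permutation_effect_size(shapes, sex))
```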
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the set of data shown in the paper "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise", published on arXiv (DOI: 10.48550/arXiv.2412.09412).
The scripts contained herein are: SOAP-Component-Analysis.py, PCA-Analysis.py, OnionClustering-1d.py, OnionClustering-2d.py, Hierarchical-Clustering.py, OnionClustering-plot.py, and UMAP.py.
To reproduce the data of this work, start from SOAP-Component-Analysis.py to calculate the SOAP descriptor and select the components of interest; then calculate the PCA with PCA-Analysis.py and apply the clustering that suits your needs (OnionClustering-1d.py, OnionClustering-2d.py, or Hierarchical-Clustering.py). Further modifications of the Onion plot can be made with OnionClustering-plot.py. UMAP embeddings can be computed with UMAP.py.
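A minimal Python analogue of the downstream steps (not the repository's scripts): PCA on a descriptor matrix followed by hierarchical clustering of the reduced components; the UMAP.py step would correspond to the optional umap-learn package.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
descriptor = rng.standard_normal((500, 120))   # toy stand-in for a SOAP descriptor matrix

components = PCA(n_components=3).fit_transform(descriptor)              # cf. PCA-Analysis.py
labels = AgglomerativeClustering(n_clusters=4).fit_predict(components)  # cf. Hierarchical-Clustering.py
print(np.bincount(labels))
```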
Additional data contained herein are:
The data related to the Quincke rollers can be found here: https://zenodo.org/records/10638736
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many economic and scientific problems involve the analysis of high-dimensional functional time series, where the number of functional variables p diverges as the number of serially dependent observations n increases. In this paper, we present a novel functional factor model for high-dimensional functional time series that maintains and makes use of the functional and dynamic structure to achieve great dimension reduction and find the latent factor structure. To estimate the number of functional factors and the factor loadings, we propose a fully functional estimation procedure based on an eigenanalysis for a nonnegative definite and symmetric matrix. Our proposal involves a weight matrix to improve the estimation efficiency and tackle the issue of heterogeneity, the rationale of which is illustrated by formulating the estimation from a novel regression perspective. Asymptotic properties of the proposed method are studied when p diverges at some polynomial rate as n increases. To provide a parsimonious model and enhance interpretability for near-zero factor loadings, we impose sparsity assumptions on the factor loading space and then develop a regularized estimation procedure with theoretical guarantees when p grows exponentially fast relative to n. Finally, we demonstrate the superiority of our proposed estimators over the alternatives/competitors through simulations and applications to a U.K. temperature data set and a Japanese mortality data set.
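A simplified illustration of the eigen-analysis step only: the paper's fully functional estimator eigen-decomposes a nonnegative definite matrix built from lagged autocovariances of functional data, whereas here a plain sample covariance and an eigenvalue-ratio rule stand in for that construction on toy multivariate data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 200, 50, 3
factors = rng.standard_normal((n, r))
loadings = rng.standard_normal((r, p))
Y = factors @ loadings + 0.5 * rng.standard_normal((n, p))   # toy factor model

S = np.cov(Y, rowvar=False)                       # symmetric, nonnegative definite
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
ratios = eigvals[:-1] / eigvals[1:]
k_hat = int(np.argmax(ratios[: p // 2])) + 1      # eigenvalue-ratio estimate of the factor number
print("estimated number of factors:", k_hat)
```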
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This technical note proposes a novel profile likelihood method for estimating the covariance parameters in exploratory factor analysis of high-dimensional Gaussian datasets with fewer observations than number of variables. An implicitly restarted Lanczos algorithm and a limited-memory quasi-Newton method are implemented to develop a matrix-free framework for likelihood maximization. Simulation results show that our method is substantially faster than the expectation-maximization solution without sacrificing accuracy. Our method is applied to fit factor models on data from suicide attempters, suicide ideators, and a control group. Supplementary materials for this article are available online.
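Not the article's estimator, but a sketch of its two matrix-free ingredients as exposed by scipy: Lanczos-type extreme eigenvalues computed through a LinearOperator (ARPACK's implicitly restarted Lanczos), and limited-memory quasi-Newton (L-BFGS-B) optimisation of a smooth stand-in objective.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 50, 500                       # fewer observations than variables
X = rng.standard_normal((n, p))

# Leading eigenvalues of the p x p sample covariance without ever forming it:
# each matvec costs O(np) because S v = X^T (X v) / (n - 1).
S = LinearOperator((p, p), matvec=lambda v: X.T @ (X @ v) / (n - 1), dtype=float)
top_eigvals = eigsh(S, k=5, return_eigenvectors=False)
print("top eigenvalues:", np.sort(top_eigvals)[::-1])

# Limited-memory quasi-Newton step on a toy smooth objective (a stand-in for the
# profile likelihood of the factor model).
f = lambda theta: np.sum((theta - 1.0) ** 2) + 0.1 * np.sum(theta ** 4)
res = minimize(f, x0=np.zeros(10), method="L-BFGS-B")
print("minimiser:", res.x.round(2)[:3])
```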
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
In biomedicine, as well as in many other areas, experimental data consist of topographically ordered multidimensional data arrays or images.
In our collaboration, multi-parameter fluorescence microscopy data of immunofluorescently labeled lymphocytes have to be analysed. One experimental data set consists of n intensity images of the sample. As a result of a specific immunolabeling technique, different subsets of the lymphocytes appear with high intensity values in each image, expressing the existence of a specific cell surface protein. Because the positions of the cells are not affected by the labeling process, the n fluorescence signals of a cell can be traced through the image stack at constant coordinates.
The analysis of such stacks of images by an expert user is limited to two strategies in most laboratories: the images are analyzed one after the other, or up to three images are written into the RGB channels of a color map. Obviously, these techniques are not suitable for the analysis of higher-dimensional data.
Here, sonification of the stack of images allows the complete pattern of all markers to be perceived. The biomedical expert may probe specific cells on an auditory map and listen to their fluorescence patterns. The sonification was designed to satisfy specific requirements:
Such sonifications can be derived using several strategies. One is to play a tone for each marker whose fluorescence intensity exceeds a threshold; a rhythmic pattern thus emerges for each cell. Another strategy is to use frequency to distinguish markers: each cell is then a superposition of tones with different pitches, and a chord or tone cluster is the result. This leads to a harmonic presentation of each cell. Using both time and pitch, however, the result is a rhythmic sequence of tones and thus a specific melody for each cell. As our ability to memorize and recognize melodies or musical structures is better than our ability to recognize visually presented histograms, this yields a promising approach for the inspection of such data by an expert. An example sonification is presented here using only five-dimensional image data. However, the results remain good at much higher dimensionality; we tested the method with a stack of 12 images. The following demonstration uses only 5 markers. A map is rendered to show all cells for browsing.
[Audio examples: Cell 1 and Cell 2 show identical patterns (markers cd-02, cd-08); Cell 3 shows a similar pattern (marker cd-03; superposition); Cell 4 shows a very different pattern (markers cd-04, hla-dr).]
A specific advantage of this method is that it allows the high-dimensional data vectors to be examined without the need to change the viewing direction. However, there are many other ways to present such data acoustically, e.g. by using different timbre classes for the markers, such as percussive instruments, fluid sounds, musical instruments or the human voice. These alternatives and their applicability are currently being investigated.
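A minimal sketch of the time-plus-pitch strategy described in this entry, with hypothetical pitches, slot length and threshold: each marker gets a fixed time slot and pitch, and a tone is added whenever that marker's intensity exceeds the threshold, so every cell yields a short characteristic melody.

```python
import numpy as np
from scipy.io import wavfile

RATE = 22050
PITCHES = [262, 330, 392, 523, 659]   # Hz, one per marker (5 markers; assumed values)
SLOT = 0.25                           # seconds per marker time slot (assumed)

def sonify_cell(intensities, threshold=0.5):
    """Render one cell's marker pattern as a melody: one time slot per marker."""
    t = np.arange(int(RATE * SLOT)) / RATE
    melody = []
    for value, freq in zip(intensities, PITCHES):
        tone = np.sin(2 * np.pi * freq * t) if value > threshold else np.zeros_like(t)
        melody.append(tone * np.hanning(t.size))   # fade in/out to avoid clicks
    return np.concatenate(melody)

cell = [0.9, 0.1, 0.8, 0.7, 0.2]      # toy fluorescence pattern of one cell
signal = sonify_cell(cell)
wavfile.write("cell_melody.wav", RATE, (signal * 32767).astype(np.int16))
```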
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Network of 44 papers and 69 citation links related to "Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersection of highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV: https://mdv.molbiol.ox.ac.uk), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published study “Systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system” (https://doi.org/10.1083/jcb.202205129). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "High-dimensional MRI data analysis using a large-scale manifold learning approach".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mapper and Ball Mapper are Topological Data Analysis tools used for exploring high dimensional point clouds and visualizing scalar-valued functions on those point clouds. Inspired by open questions in knot theory, new features are added to Ball Mapper that enable encoding of the structure, internal relations and symmetries of the point cloud. Moreover, the strengths of Mapper and Ball Mapper constructions are combined to create a tool for comparing high dimensional data descriptors of a single dataset. This new hybrid algorithm, Mapper on Ball Mapper, is applicable to high dimensional lens functions. As a proof of concept we include applications to knot and game theory.
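A simplified Ball Mapper sketch (not the authors' implementation): a greedy epsilon-net picks landmark points, every data point is assigned to all balls of radius eps that contain it, and two balls are joined when they share a point, giving a coarse combinatorial image of the point cloud.

```python
import numpy as np

def ball_mapper(X, eps, seed=0):
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    landmarks = []
    for i in order:                                   # greedy epsilon-net of landmarks
        if all(np.linalg.norm(X[i] - X[j]) > eps for j in landmarks):
            landmarks.append(i)
    # membership: which points fall into which ball
    balls = [set(np.flatnonzero(np.linalg.norm(X - X[l], axis=1) <= eps))
             for l in landmarks]
    # connect two balls whenever they share at least one point
    edges = {(a, b) for a in range(len(balls)) for b in range(a + 1, len(balls))
             if balls[a] & balls[b]}
    return landmarks, balls, edges

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 300)
circle = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((300, 2))
landmarks, balls, edges = ball_mapper(circle, eps=0.4)
print(len(landmarks), "balls,", len(edges), "edges")   # expect a cycle-like graph
```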
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents an improved robust Fisher discriminant method designed to handle high-dimensional data, particularly in the presence of outliers. Traditional Fisher discriminant methods are sensitive to outliers, which can significantly degrade their performance. To address this issue, we integrate the Minimum Regularized Covariance Determinant (MRCD) algorithm into the Fisher discriminant framework, resulting in the MRCD-Fisher discriminant model. The MRCD algorithm enhances robustness by regularizing the covariance matrix, making it suitable for high-dimensional data where the number of variables exceeds the number of observations. We conduct comparative experiments with other robust discriminant methods; the results demonstrate that the MRCD-Fisher discriminant outperforms these methods in terms of robustness and accuracy, especially when dealing with data contaminated by outliers. The MRCD-Fisher discriminant maintains high data cleanliness and computational stability, making it a reliable choice for high-dimensional data analysis. This study provides a valuable contribution to the field of robust statistical analysis, offering a practical solution for handling complex, outlier-prone datasets.
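A hedged sketch only: scikit-learn does not ship MRCD, so the (unregularised) MCD estimator plus an explicit ridge term stands in here for the regularised robust covariance, and the Fisher direction is then computed as w = Sigma^-1 (mu1 - mu0) with robust centres.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
n, p = 120, 10
X0 = rng.standard_normal((n, p))
X1 = rng.standard_normal((n, p)) + 1.0
X1[:5] += 15.0                                  # a few gross outliers in class 1

def robust_fisher_direction(X0, X1, ridge=0.1):
    """MCD-plus-ridge stand-in for an MRCD covariance, then the Fisher direction."""
    centred = np.vstack([X0 - np.median(X0, axis=0), X1 - np.median(X1, axis=0)])
    cov = MinCovDet(random_state=0).fit(centred).covariance_ + ridge * np.eye(X0.shape[1])
    return np.linalg.solve(cov, np.median(X1, axis=0) - np.median(X0, axis=0))

w = robust_fisher_direction(X0, X1)
scores0, scores1 = X0 @ w, X1 @ w
threshold = (np.median(scores0) + np.median(scores1)) / 2
accuracy = (np.mean(scores0 < threshold) + np.mean(scores1 > threshold)) / 2
print(f"balanced accuracy: {accuracy:.2f}")
```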
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Yearly citation counts for the publication titled "Feature Extraction and Uncorrelated Discriminant Analysis for High-Dimensional Data".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The purpose of data mining analysis is always to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset. Before doing any work on the data, the data have to be pre-processed, and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance did not improve much. The reason may be that the features we selected to perform clustering on are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics.

From the dimensionality reduction perspective: clustering is different from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique for reducing the data dimension will lose a lot of information, since clustering techniques are based on a metric of 'distance'. At high dimensions, Euclidean distance loses pretty much all meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always good, since you may lose almost all the information.

From the creating new features perspective: clustering analysis creates labels based on the patterns of the data, which brings uncertainty into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, and in turn the performance of classification. If the subset of features we apply clustering techniques to is well suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better.

We did not lock in the clustering outputs using a random_state, in an effort to see whether they were stable. Our assumption was that if the results vary highly from run to run, which they definitely did, maybe the data just do not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering in the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model's real-world effectiveness and also to continue to revise the models from time to time as things change.
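A small sketch of the pipeline discussed above, on toy data with hypothetical parameters: k-means is fit on a subset of the features, the distances to the centroids are appended as new features, and a classifier is cross-validated with and without them; on data that do not cluster well the gain is typically negligible, as observed here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(400) > 0).astype(int)

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X[:, :5])  # cluster a feature subset
X_aug = np.hstack([X, km.transform(X[:, :5])])    # append distances to the 5 centroids

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("baseline :", cross_val_score(clf, X, y, cv=5).mean().round(3))
print("augmented:", cross_val_score(clf, X_aug, y, cv=5).mean().round(3))
```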
Using complementary forms of high dimensional profiling we define differences in CD45+ cells from syngeneic mouse tumors that either grow progressively or eventually reject following immune checkpoint therapy (ICT). Unbiased assessment of gene expression of tumor infiltrating cells by single cell RNA sequencing (scRNAseq) and longitudinal assessment of cellular protein expression by mass cytometry (CyTOF) revealed significant remodeling of both the lymphoid and myeloid intratumoral compartments. Surprisingly, we observed multiple subpopulations of monocytes/macrophages distinguishable by the combinatorial presence or absence of CD206, CX3CR1, CD1d and iNOS, markers of different macrophage activation states that change over time during ICT in a manner partially dependent on IFNγ. Both the CyTOF data and additional analysis of scRNAseq data support the hypothesis that macrophage polarization/activation results from effects on circulatory monocytes/early macrophages entering tumors rather than on pre-polarized mature intratumoral macrophages. Thus, ICT induces transcriptional and functional remodeling of both myeloid and lymphoid compartments. Droplet-based 3′ end massively parallel single-cell RNA sequencing was performed by encapsulating sorted live CD45+ tumor infiltrating cells into droplets and libraries were prepared using Chromium Single Cell 3′ Reagent Kits v1 according to manufacturer’s protocol (10x Genomics). The generated scRNAseq libraries were sequenced using an Illumina HiSeq2500.
MATLAB code + demo to reproduce results for "Sparse Principal Component Analysis with Preserved Sparsity". This code calculates the principal loading vectors for any given high-dimensional data matrix. The advantage of this method over existing sparse-PCA methods is that it can produce principal loading vectors with the same sparsity pattern for any number of principal components. Please see Readme.md for more information.
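For comparison only, and not the repository's MATLAB method: scikit-learn's SparsePCA produces sparse loading vectors, but unlike the "preserved sparsity" approach described above it does not enforce a common sparsity pattern across components.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))

spca = SparsePCA(n_components=3, alpha=2.0, random_state=0).fit(X)
loadings = spca.components_                      # shape (3, 50), with many exact zeros
print("nonzeros per component:", (loadings != 0).sum(axis=1))
```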
Clustering algorithms are at the basis of several technological applications, and are fueling the development of rapidly evolving fields such as machine learning. In the recent past, however, it has become apparent that they face challenges stemming from datasets that span more spatial dimensions. In fact, the best-performing clustering algorithms scale linearly in the number of points, but quadratically with respect to the local density of points. In this work, we introduce qCLUE, a quantum clustering algorithm that scales linearly in both the number of points and their density. qCLUE is inspired by CLUE, an algorithm developed to address the challenging time and memory budgets of Event Reconstruction (ER) in future High-Energy Physics experiments. As such, qCLUE marries decades of development with the quadratic speedup provided by quantum computers. We numerically test qCLUE in several scenarios, demonstrating its effectiveness and proving it to be a promising route to handle complex data analysis tasks – especially in high-dimensional datasets with high densities of points.