100+ datasets found
  1. Additional file 1: of Proposal of supervised data analysis strategy of...

    • springernature.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Elena Landoni; Rosalba Miceli; Maurizio Callari; Paola Tiberio; Valentina Appierto; Valentina Angeloni; Luigi Mariani; Maria Daidone (2023). Additional file 1: of Proposal of supervised data analysis strategy of plasma miRNAs from hybridisation array data with an application to assess hemolysis-related deregulation [Dataset]. http://doi.org/10.6084/m9.figshare.c.3595874_D5.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Elena Landoni; Rosalba Miceli; Maurizio Callari; Paola Tiberio; Valentina Appierto; Valentina Angeloni; Luigi Mariani; Maria Daidone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code for implementing the described analyses (sample processing, data pre-processing, class comparison and class prediction). Caliper matching was implemented using the nonrandom package; the t-test and the Anderson-Darling (AD) test were implemented using the stats package and the adk package, respectively (the current package for the AD test is kSamples). For the bootstrap selection and the egg-shaped plot we modified the doBS and importance igraph functions, respectively, both included in the bootfs package. For the SVM model we used the e1071 package. (R, 12 kb)
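The class-comparison step can be illustrated outside R; below is a hedged Python sketch of the two tests named above (two-sample t and k-sample Anderson-Darling) using scipy, with invented values standing in for the real miRNA intensities:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Invented stand-ins for miRNA intensities in two sample classes
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.8, scale=1.0, size=30)

# Two-sample t-test (the R analysis used the stats package)
t_stat, t_pval = stats.ttest_ind(group_a, group_b)

# k-sample Anderson-Darling test (the R analysis used adk, now kSamples)
ad_res = stats.anderson_ksamp([group_a, group_b])
print(t_pval, ad_res.significance_level)
```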

  2. Data from: Mapping beta diversity from space: Sparse Generalized...

    • eprints.soton.ac.uk
    • search.dataone.org
    • +3more
    Updated May 6, 2023
    Cite
    Leitão, Pedro J.; Suess, Stefan; Schwieder, Marcel; Catry, Inês; Milton, Edward; Moreira, Francisco; Osborne, Patrick E.; Pinto, Manuel J.; Van Der Linden, Sebastian; Hostert, Patrick (2023). Data from: Mapping beta diversity from space: Sparse Generalized Dissimilarity Modelling (SGDM) for analysing high-dimensional data [Dataset]. http://doi.org/10.5061/dryad.ns7pv
    Explore at:
    Dataset updated
    May 6, 2023
    Dataset provided by
    DRYAD
    Authors
    Leitão, Pedro J.; Suess, Stefan; Schwieder, Marcel; Catry, Inês; Milton, Edward; Moreira, Francisco; Osborne, Patrick E.; Pinto, Manuel J.; Van Der Linden, Sebastian; Hostert, Patrick
    Description

    Species and environmental data

    This compiled (zip) file consists of 7 matrices of data: one species data matrix, with abundance observations per visited plot, and 6 environmental data matrices, consisting of a land cover classification (Class), simulated EnMAP and Landsat data (April and August), and a 6 time-step Landsat time series (January, March, May, June, July and September). All data are compiled to the 125 m radius plots, as described in the paper. (Leitaoetal_Mapping beta diversity from space_Data.zip)

    1. Spatial patterns of community composition turnover (beta diversity) may be mapped through Generalised Dissimilarity Modelling (GDM). While remote sensing data are adequate to describe these patterns, the often high-dimensional nature of these data poses some analytical challenges, potentially resulting in loss of generality. This may hinder the use of such data for mapping and monitoring beta-diversity patterns.

    2. This study presents Sparse Generalised Dissimilarity Modelling (SGDM), a methodological framework designed to improve the use of high-dimensional data to predict community turnover with GDM. SGDM is a two-stage approach: the environmental data are first transformed with a sparse canonical correlation analysis (SCCA), designed for high-dimensional datasets, and the transformed data are then fitted with GDM. The SCCA penalisation parameters are chosen by a grid search that optimises the predictive performance of a GDM fit on the resulting components. The proposed method was illustrated on a case study with a clear environmental gradient of shrub encroachment following cropland abandonment, and subsequent turnover in the bird communities. Bird community data, collected on 115 plots located along the described gradient, were used to fit composition dissimilarity as a function of several remote sensing datasets, including a time series of Landsat data as well as simulated EnMAP hyperspectral data.

    3. The proposed approach always outperformed GDM models when fit on high-dimensional datasets. Its usage on low-dimensional data was not consistently advantageous. Models using high-dimensional data, on the other hand, always outperformed those using low-dimensional data, such as single-date multispectral imagery.

    4. This approach improves the direct use of high-dimensional remote sensing data, such as time series or hyperspectral imagery, for community dissimilarity modelling, resulting in better-performing models. The good performance of models using high-dimensional datasets further highlights the relevance of dense time series and data from new and forthcoming satellite sensors for ecological applications such as mapping species beta diversity.
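The response that GDM (and hence SGDM) models is pairwise community dissimilarity between plots; a hedged Python sketch of that first ingredient, with an invented abundance matrix (the study itself used 115 plots and fitted GDM on SCCA-transformed predictors):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
# Invented bird abundance matrix: 6 plots x 10 species
abundance = rng.integers(0, 20, size=(6, 10)).astype(float)

# Bray-Curtis dissimilarity between all plot pairs: the compositional
# turnover that GDM regresses on environmental predictors
d = squareform(pdist(abundance, metric="braycurtis"))
print(d.shape)  # (6, 6)
```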

  3. s

    Citation Trends for "Ensemble feature selection for high-dimensional data: a...

    • shibatadb.com
    Updated Feb 25, 2019
    Cite
    Yubetsu (2019). Citation Trends for "Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains" [Dataset]. https://www.shibatadb.com/article/sTWmXKQu
    Explore at:
    Dataset updated
    Feb 25, 2019
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2019 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains".

  4. Data from: High dimensional surrogacy: computational aspects of an upscaled...

    • tandf.figshare.com
    text/x-tex
    Updated May 31, 2023
    Cite
    Rudradev Sengupta; Nolen Joy Perualila; Ziv Shkedy; Przemyslaw Biecek; Geert Molenberghs; Luc Bijnens (2023). High dimensional surrogacy: computational aspects of an upscaled analysis [Dataset]. http://doi.org/10.6084/m9.figshare.9746051.v1
    Explore at:
    Available download formats: text/x-tex
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Rudradev Sengupta; Nolen Joy Perualila; Ziv Shkedy; Przemyslaw Biecek; Geert Molenberghs; Luc Bijnens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Identification of genomic biomarkers is an important area of research in the context of drug discovery experiments. These experiments typically consist of several high-dimensional datasets that contain information about a set of drugs (compounds) under development; this data structure introduces the challenge of multi-source data integration. High-Performance Computing (HPC) has become an important tool for everyday research tasks. In the context of drug discovery, high-dimensional multi-source data need to be analyzed to identify the biological pathways related to the new set of drugs under development, and HPC techniques are required to process all the information contained in the datasets. Even though R packages for parallel computing are available, they are not optimized for a specific setting and data structure. In this article, we propose a new framework for data analysis using R on a computer cluster. The proposed data analysis workflow is applied to a multi-source high-dimensional drug discovery dataset and compared with a few existing R packages for parallel computing.
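The framework itself is R- and cluster-specific; as a generic sketch of the underlying pattern (independent per-dataset analyses fanned out over a worker pool), here in Python with an invented stand-in task:

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def analyse(seed):
    # Invented stand-in for one per-compound analysis task
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(0, 1) for _ in range(1000))

# Fan the independent analyses out over a pool; for CPU-bound work a
# ProcessPoolExecutor (or a cluster scheduler, as in the article) replaces this
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyse, range(8)))
print(len(results))  # 8
```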

  5. MOESM1 of A non-parametric maximum for number of selected features:...

    • springernature.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Amir Ghaseminejad Tafreshi (2023). MOESM1 of A non-parametric maximum for number of selected features: objective optima for FDR and significance threshold with application to ordinal survey analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6401663.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Amir Ghaseminejad Tafreshi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. The raw data file used in this study.

  6. Data from: A method for analysis of phenotypic change for phenotypes...

    • zenodo.org
    • data.niaid.nih.gov
    • +2more
    csv
    Updated May 29, 2022
    Cite
    Michael L. Collyer; David J. Sekora; Dean C. Adams (2022). Data from: A method for analysis of phenotypic change for phenotypes described by high-dimensional data [Dataset]. http://doi.org/10.5061/dryad.1p80f
    Explore at:
    Available download formats: csv
    Dataset updated
    May 29, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael L. Collyer; David J. Sekora; Dean C. Adams
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The analysis of phenotypic change is important for several evolutionary biology disciplines, including phenotypic plasticity, evolutionary developmental biology, morphological evolution, physiological evolution, evolutionary ecology and behavioral evolution. It is common for researchers in these disciplines to work with multivariate phenotypic data. When phenotypic variables exceed the number of research subjects—data called 'high-dimensional data'—researchers are confronted with analytical challenges. Parametric tests that require high observation to variable ratios present a paradox for researchers, as eliminating variables potentially reduces effect sizes for comparative analyses, yet test statistics require more observations than variables. This problem is exacerbated with data that describe 'multidimensional' phenotypes, whereby a description of phenotype requires high-dimensional data. For example, landmark-based geometric morphometric data use the Cartesian coordinates of (potentially) many anatomical landmarks to describe organismal shape. Collectively such shape variables describe organism shape, although the analysis of each variable, independently, offers little benefit for addressing biological questions. Here we present a nonparametric method of evaluating effect size that is not constrained by the number of phenotypic variables, and motivate its use with example analyses of phenotypic change using geometric morphometric data. Our examples contrast different characterizations of body shape for a desert fish species, associated with measuring and comparing sexual dimorphism between two populations. We demonstrate that using more phenotypic variables can increase effect sizes, and allow for stronger inferences.
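The core idea (an effect size evaluated by permutation, which stays well defined when variables outnumber specimens) can be sketched in Python; the data and the Euclidean distance between group mean vectors are invented illustrations, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
# Invented "shape" data: 20 specimens x 50 variables, two groups of 10
X = rng.normal(size=(20, 50))
X[:10] += 0.5                          # true group difference
groups = np.array([0] * 10 + [1] * 10)

def effect(X, g):
    # Distance between multivariate group means; defined even when p >> n
    return np.linalg.norm(X[g == 0].mean(0) - X[g == 1].mean(0))

obs = effect(X, groups)
# Reference distribution from permuted group labels
perm = np.array([effect(X, rng.permutation(groups)) for _ in range(999)])
p_value = (1 + (perm >= obs).sum()) / 1000
print(p_value)
```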

  7. Research data supporting: "Relevant, hidden, and frustrated information in...

    • zenodo.org
    zip
    Updated May 20, 2025
    Cite
    Chiara Lionello; Matteo Becchi; Simone Martino; Giovanni M. Pavan (2025). Research data supporting: "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise" [Dataset]. http://doi.org/10.5281/zenodo.14529457
    Explore at:
    Available download formats: zip
    Dataset updated
    May 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chiara Lionello; Matteo Becchi; Simone Martino; Giovanni M. Pavan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the set of data shown in the paper "Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise", published on arXiv (DOI: 10.48550/arXiv.2412.09412).

    The scripts contained herein are:

    1. PCA-Analysis.py: Python script to calculate the SOAP descriptor, denoise it, and compute the principal component analysis
    2. SOAP-Component-Analysis.py: Python script to calculate the variance of the individual SOAP components
    3. Hierarchical-Clustering.py: Python script to compute the hierarchical clustering and plot the dataset
    4. OnionClustering-1d.py: Python script to compute Onion clustering on a single SOAP component or principal component
    5. OnionClustering-2d.py: Python script to compute two-dimensional Onion clustering
    6. OnionClustering-plot.py: Python script to plot the Onion plot, removing clusters with population <1%
    7. UMAP.py: Python script to compute the UMAP dimensionality reduction

    To reproduce the data of this work, start from SOAP-Component-Analysis.py to calculate the SOAP descriptor and select the components of interest; then calculate the PCA with PCA-Analysis.py and apply the clustering suited to your needs (OnionClustering-1d.py, OnionClustering-2d.py, Hierarchical-Clustering.py). Further modifications of the Onion plot can be made with OnionClustering-plot.py. UMAP can be calculated with UMAP.py.
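As a sketch of the PCA step (what PCA-Analysis.py presumably computes once the SOAP descriptor is built), using plain numpy SVD on an invented descriptor matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
soap = rng.normal(size=(500, 60))   # invented stand-in for SOAP descriptors

# PCA via SVD of the centred data matrix
Xc = soap - soap.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc = Xc @ Vt[:2].T                  # scores on the two leading components
var_ratio = s[:2] ** 2 / (s ** 2).sum()
print(pc.shape)  # (500, 2)
```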

    Additional data contained herein are:

    1. starting-configuration.gro: gromacs file with the initial configuration of the ice-water system
    2. traj-ice-water-50ns-sampl4ps.xtc: trajectory of the ice-water system sampled every 4 ps
    3. traj-ice-water-50ns-sampl40ps.xtc: trajectory of the ice-water system sampled every 40 ps
    4. some files containing the SOAP descriptor of the ice-water system: ice-water-50ns-sampl40ps.hdf5, ice-water-50ns-sampl40ps_soap.hdf5, ice-water-50ns-sampl40ps_soap.npy, ice-water-50ns-sampl40ps_soap-spavg.npy
    5. PCA-results: folder that contains some example results of the PCA
    6. UMAP-results: folder that contains some example results of UMAP

    The data related to the Quincke rollers can be found here: https://zenodo.org/records/10638736

  8. Data from: Factor Modelling for High-dimensional Functional Time Series

    • tandf.figshare.com
    bin
    Updated May 19, 2025
    Cite
    Shaojun Guo; Xinghao Qiao; Qingsong Wang; Zihan Wang (2025). Factor Modelling for High-dimensional Functional Time Series [Dataset]. http://doi.org/10.6084/m9.figshare.29098926.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 19, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Shaojun Guo; Xinghao Qiao; Qingsong Wang; Zihan Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Many economic and scientific problems involve the analysis of high-dimensional functional time series, where the number of functional variables p diverges as the number of serially dependent observations n increases. In this paper, we present a novel functional factor model for high-dimensional functional time series that maintains and makes use of the functional and dynamic structure to achieve great dimension reduction and find the latent factor structure. To estimate the number of functional factors and the factor loadings, we propose a fully functional estimation procedure based on an eigenanalysis for a nonnegative definite and symmetric matrix. Our proposal involves a weight matrix to improve the estimation efficiency and tackle the issue of heterogeneity, the rationale of which is illustrated by formulating the estimation from a novel regression perspective. Asymptotic properties of the proposed method are studied when p diverges at some polynomial rate as n increases. To provide a parsimonious model and enhance interpretability for near-zero factor loadings, we impose sparsity assumptions on the factor loading space and then develop a regularized estimation procedure with theoretical guarantees when p grows exponentially fast relative to n. Finally, we demonstrate the superiority of our proposed estimators over the alternatives/competitors through simulations and applications to a U.K. temperature data set and a Japanese mortality data set.
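A toy, non-functional analogue of the eigenanalysis step, hedged: this sketch builds a symmetric nonnegative definite matrix from the lag-1 autocovariance of a simulated vector (not functional) time series and picks the number of factors by an eigenvalue-ratio rule; the paper's estimator is considerably more general:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r = 400, 20, 3
B = rng.normal(size=(p, r))          # factor loadings (ground truth)
f = np.zeros((n, r))                 # AR(1) factors give serial dependence
for t in range(1, n):
    f[t] = 0.7 * f[t - 1] + rng.normal(size=r)
X = f @ B.T + 0.1 * rng.normal(size=(n, p))

# Symmetric, nonnegative definite M from the lag-1 autocovariance C1
Xc = X - X.mean(0)
C1 = Xc[1:].T @ Xc[:-1] / (n - 1)
M = C1 @ C1.T
evals = np.linalg.eigvalsh(M)[::-1]

# Eigenvalue-ratio rule for the number of factors, searched over the top few
ratios = evals[:7] / evals[1:8]
k_hat = int(np.argmax(ratios)) + 1
print(k_hat)
```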

  9. Data from: A Matrix-Free Likelihood Method for Exploratory Factor Analysis...

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 31, 2023
    Cite
    Fan Dai; Somak Dutta; Ranjan Maitra (2023). A Matrix-Free Likelihood Method for Exploratory Factor Analysis of High-Dimensional Gaussian Data [Dataset]. http://doi.org/10.6084/m9.figshare.11402247.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Fan Dai; Somak Dutta; Ranjan Maitra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This technical note proposes a novel profile likelihood method for estimating the covariance parameters in exploratory factor analysis of high-dimensional Gaussian datasets with fewer observations than number of variables. An implicitly restarted Lanczos algorithm and a limited-memory quasi-Newton method are implemented to develop a matrix-free framework for likelihood maximization. Simulation results show that our method is substantially faster than the expectation-maximization solution without sacrificing accuracy. Our method is applied to fit factor models on data from suicide attempters, suicide ideators, and a control group. Supplementary materials for this article are available online.
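The matrix-free ingredient can be sketched with scipy's Lanczos-based solver: the p x p covariance is never formed, only products with it are evaluated (sizes here are invented):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(5)
n, p = 50, 2000                       # fewer observations than variables
X = rng.normal(size=(n, p))
Xc = X - X.mean(0)

# Leading eigenpairs of S = Xc'Xc/(n-1) via implicitly restarted Lanczos;
# only matrix-vector products are computed, never the p x p matrix S itself
S_op = LinearOperator((p, p), matvec=lambda v: Xc.T @ (Xc @ v) / (n - 1),
                      dtype=np.float64)
evals, evecs = eigsh(S_op, k=3, which="LA")
print(evecs.shape)  # (2000, 3)
```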

  10. Data from: Multi-Channel Image Data Analysis using Sonification

    • pub.uni-bielefeld.de
    Updated Dec 19, 2018
    Cite
    Thomas Hermann; Tim Wilhelm Nattkemper (2018). Multi-Channel Image Data Analysis using Sonification [Dataset]. https://pub.uni-bielefeld.de/record/2763993
    Explore at:
    Dataset updated
    Dec 19, 2018
    Authors
    Thomas Hermann; Tim Wilhelm Nattkemper
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    In biomedicine, as well as in many other areas, experimental data consist of topographically ordered multidimensional data arrays or images.

    In our collaboration, multi-parameter fluorescence microscopy data of immunofluorescently labeled lymphocytes has to be analysed. One experimental data set consists of n intensity images of the sample. As a result of a specific immunolabeling technique, different subsets of the lymphocytes appear with high intensity values in each image, expressing the existence of a specific cell surface protein. Because the positions of the cells are not affected by the labeling process, the n fluorescence signals of a cell can be traced through the image stack at constant coordinates.

    The analysis of such stacks of images by an expert user is limited to two strategies in most laboratories: the images are analyzed one after the other or up to three images are written into the RGB channels of a color map. Obviously, these techniques are not suitable for the analysis of higher dimensional data.

    Here, sonification of the stack of images allows the user to perceive the complete pattern of all markers. The biomedical expert may probe specific cells on an auditory map and listen to their fluorescence patterns. The sonification was designed to satisfy specific requirements:

    • Identification - Cells with identical patterns should easily be perceived as identical sounds
    • Similarity - Similar cell fluorescence patterns should lead to sonifications that sound similar
    • Extensibility - The sonification should be extensible, so that the future addition of markers does not change the sound characteristic driven by the other markers
    • Short duration - The whole sonification should last only about 1 s, to allow fast browsing of the image

    Such sonifications can be derived using several strategies. One is to play a tone for each marker whose fluorescence intensity exceeds a threshold, so that a rhythmic pattern emerges for each cell. Another strategy is to use frequency to distinguish markers: each cell is then a superposition of tones with different pitch, and a chord or tone cluster results, giving a harmonic presentation of each cell. Using both time and pitch, the result is a rhythmical sequence of tones and thus a specific melody for each cell. As our ability to memorize and recognize melodies or musical structures is better than our ability to recognize visually presented histograms, this is a promising approach for the inspection of such data by an expert. An example sonification is presented here using only five-dimensional data images; however, the results remain good at much higher dimensionality - we tested the method with a stack of 12 images. The following demonstration uses only 5 markers. A map is rendered to show all cells for browsing.
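The time-plus-pitch strategy described above can be sketched in a few lines of numpy; the sample rate, base pitch and slot duration are invented choices, not values from the original work:

```python
import numpy as np

SR = 8000  # sample rate in Hz (invented choice)

def cell_sonification(pattern, base=220.0, slot=0.2):
    """Render one cell's marker pattern as a tone sequence.

    pattern: booleans, True where a marker's fluorescence exceeds threshold.
    Each marker gets its own pitch and time slot, so a cell becomes a melody.
    """
    t = np.linspace(0, slot, int(SR * slot), endpoint=False)
    tones = [np.sin(2 * np.pi * base * 2 ** (i / 12) * t) if on
             else np.zeros_like(t)
             for i, on in enumerate(pattern)]
    return np.concatenate(tones)      # five markers -> about 1 s of audio

sig = cell_sonification([True, False, True, True, False])
print(sig.shape)  # (8000,)
```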

    [Figure: auditory-map examples - identical patterns (Cell 1, Cell 2; maps cd-02, cd-08, cd-03); similar pattern (Cell 3; maps superposition, cd-04); very different pattern (Cell 4; map hla-dr).]

    A specific advantage of this method is that it allows the expert to examine the high-dimensional data vectors without the need to change the viewing direction. However, there are many other methods to present such data acoustically, e.g. by using different timbre classes for the markers, such as percussive instruments, fluid sounds, musical instruments or the human voice. These alternatives and their applicability are currently being investigated.

  11. Citation Network Graph

    • shibatadb.com
    Updated Feb 25, 2019
    Cite
    Yubetsu (2019). Citation Network Graph [Dataset]. https://www.shibatadb.com/article/sTWmXKQu
    Explore at:
    Dataset updated
    Feb 25, 2019
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Description

    Network of 44 papers and 69 citation links related to "Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains".

  12. Multi-Dimensional Data Viewer (MDV) user manual for data exploration:...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    + more versions
    Cite
    Ilan Davis (2024). Multi-Dimensional Data Viewer (MDV) user manual for data exploration: "Systematic analysis of YFP gene traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6374011
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Jeffrey Y. Lee
    Ilan Davis
    Maria Kiourlappou
    Martin Sergeant
    Darragh Ennis
    Joshua S. Titlow
    Stephen Taylor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersection of highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV: https://mdv.molbiol.ox.ac.uk), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system (https://doi.org/10.1083/jcb.202205129). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.

  13. Citation Trends for "High-dimensional MRI data analysis using a large-scale...

    • shibatadb.com
    Updated Apr 19, 2013
    Cite
    Yubetsu (2013). Citation Trends for "High-dimensional MRI data analysis using a large-scale manifold learning approach" [Dataset]. https://www.shibatadb.com/article/7frzPeiQ
    Explore at:
    Dataset updated
    Apr 19, 2013
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2014 - 2023
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "High-dimensional MRI data analysis using a large-scale manifold learning approach".

  14. Data from: Mapper–Type Algorithms for Complex Data and Relations

    • tandf.figshare.com
    zip
    Updated Jun 7, 2024
    Cite
    Paweł Dłotko; Davide Gurnari; Radmila Sazdanovic (2024). Mapper–Type Algorithms for Complex Data and Relations [Dataset]. http://doi.org/10.6084/m9.figshare.25668931.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Paweł Dłotko; Davide Gurnari; Radmila Sazdanovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapper and Ball Mapper are Topological Data Analysis tools used for exploring high dimensional point clouds and visualizing scalar–valued functions on those point clouds. Inspired by open questions in knot theory, new features are added to Ball Mapper that enable encoding of the structure, internal relations and symmetries of the point cloud. Moreover, the strengths of Mapper and Ball Mapper constructions are combined to create a tool for comparing high dimensional data descriptors of a single dataset. This new hybrid algorithm, Mapper on Ball Mapper, is applicable to high dimensional lens functions. As a proof of concept we include applications to knot and game theory.
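A minimal Ball Mapper sketch (a greedy epsilon-net cover with edges between overlapping balls); this illustrates the construction on invented data and is not the authors' code:

```python
import numpy as np

def ball_mapper(points, eps):
    # Greedy epsilon-net: every point ends up within eps of some landmark
    landmarks = []
    for i, pt in enumerate(points):
        if all(np.linalg.norm(pt - points[j]) > eps for j in landmarks):
            landmarks.append(i)
    # balls[a] = indices of points covered by landmark a
    balls = [set(np.flatnonzero(
                 np.linalg.norm(points - points[j], axis=1) <= eps))
             for j in landmarks]
    # Edge between two vertices whenever their balls share a point
    edges = {(a, b) for a in range(len(balls))
             for b in range(a + 1, len(balls)) if balls[a] & balls[b]}
    return landmarks, edges

rng = np.random.default_rng(6)
pts = rng.normal(size=(200, 5))       # invented high-dimensional point cloud
lms, edges = ball_mapper(pts, eps=2.0)
print(len(lms) >= 1)  # True
```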

  15. Comparison of robust discriminant methods.

    • plos.figshare.com
    xls
    Updated Jun 12, 2025
    Cite
    Shaojuan Ma; Yubing Duan (2025). Comparison of robust discriminant methods. [Dataset]. http://doi.org/10.1371/journal.pone.0322741.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Shaojuan Ma; Yubing Duan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper presents an improved robust Fisher discriminant method designed to handle high-dimensional data, particularly in the presence of outliers. Traditional Fisher discriminant methods are sensitive to outliers, which can significantly degrade their performance. To address this issue, we integrate the Minimum Regularized Covariance Determinant (MRCD) algorithm into the Fisher discriminant framework, resulting in the MRCD-Fisher discriminant model. The MRCD algorithm enhances robustness by regularizing the covariance matrix, making it suitable for high-dimensional data where the number of variables exceeds the number of observations. We conducted comparative experiments with other robust discriminant methods; the results demonstrate that the MRCD-Fisher discriminant outperforms these methods in robustness and accuracy, especially when dealing with data contaminated by outliers. The MRCD-Fisher discriminant maintains high data cleanliness and computational stability, making it a reliable choice for high-dimensional data analysis. This study contributes to the field of robust statistical analysis by offering a practical solution for handling complex, outlier-prone datasets.
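The regularised-covariance idea can be sketched as follows; note the shrinkage used here is a plain ridge blend, a simplified stand-in for MRCD (which additionally selects a robust h-subset of observations to downweight outliers):

```python
import numpy as np

rng = np.random.default_rng(7)
p = 100                                 # more variables than per-class samples
X0 = rng.normal(0.0, 1.0, size=(40, p))
X1 = rng.normal(0.5, 1.0, size=(40, p))

# Ridge-regularised pooled covariance: invertible even though p > n per class
S = (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)) / 2
S_reg = 0.5 * S + 0.5 * np.eye(p)

# Fisher discriminant direction w = S^-1 (mu1 - mu0)
w = np.linalg.solve(S_reg, X1.mean(0) - X0.mean(0))
scores0, scores1 = X0 @ w, X1 @ w
print(scores1.mean() > scores0.mean())  # True: classes separate along w
```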

  16. Citation Trends for "Feature Extraction and Uncorrelated Discriminant...

    • shibatadb.com
    Updated May 15, 2008
    Cite
    Yubetsu (2008). Citation Trends for "Feature Extraction and Uncorrelated Discriminant Analysis for High-Dimensional Data" [Dataset]. https://www.shibatadb.com/article/GSy4gtV5
    Explore at:
    Dataset updated
    May 15, 2008
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2008 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Feature Extraction and Uncorrelated Discriminant Analysis for High-Dimensional Data".

  17. m

    Educational Attainment in North Carolina Public Schools: Use of statistical...

    • data.mendeley.com
    Updated Nov 14, 2018
    Cite
    Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
    Explore at:
    Dataset updated
    Nov 14, 2018
    Authors
    Scott Herford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The purpose of data mining analysis is always to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset: before any modeling, the data has to be pre-processed, which normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, using clustering prior to classification did not improve performance much; a likely reason is that the features we selected for clustering are not well suited to it. Because of the nature of the data, classification tasks provide more information to work with in terms of improving knowledge and overall performance metrics.

    From the dimensionality-reduction perspective: clustering is different from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters to reduce the data dimension can lose a great deal of information, because clustering techniques are based on a metric of 'distance', and at high dimensions Euclidean distance loses pretty much all meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always a good idea, since almost all of the information may be lost.

    From the new-features perspective: clustering analysis creates labels based on the patterns of the data, which brings uncertainty into the data. When clustering precedes classification, the decision on the number of clusters strongly affects the quality of the clustering, and in turn the performance of the classification. If the features we apply clustering to are well suited for it, it might increase overall classification performance; for example, if the features used for k-means are numerical and low-dimensional, the overall classification performance may be better.

    We deliberately did not lock in the clustering outputs with a random_state, in order to see whether they were stable. Our assumption was that if the results varied highly from run to run, which they definitely did, then perhaps the data simply does not cluster well with the selected methods. The ramification we saw was that, with clustering in the preprocessing step, our results were not much better than random. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data, in the same format, from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to revise the models from time to time as conditions change.
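The trade-off described above can be sketched with standard scikit-learn tools. The synthetic dataset and model choices below are illustrative stand-ins, not the North Carolina data or the study's actual pipeline: both routes reduce 20 features to 5, once via PCA and once via one-hot-encoded k-means assignments.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 20 features, of which 5 carry class information.
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, random_state=0)

# (a) Reduce to 5 dimensions with PCA (best linear reconstruction).
X_pca = PCA(n_components=5, random_state=0).fit_transform(X)

# (b) "Reduce" by clustering: one-hot encode the k-means assignments.
assign = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
X_clu = np.eye(5)[assign]

clf = LogisticRegression(max_iter=1000)
acc_pca = cross_val_score(clf, X_pca, y, cv=5).mean()
acc_clu = cross_val_score(clf, X_clu, y, cv=5).mean()
print(f"PCA features: {acc_pca:.3f}  cluster features: {acc_clu:.3f}")
```

Note that route (b) collapses each sample to a single cluster number, so any within-cluster variation relevant to the class label is discarded, which is exactly the information-loss concern raised above.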

  18. N

    Data from: High Dimensional Analysis Delineates Myeloid and Lymphoid...

    • data.niaid.nih.gov
    Updated Nov 22, 2019
    Cite
    Gubin MM; Esaulova E; Ward JP; Malkova ON; Runci D; Wong P; Noguchi T; Arthur CD; Meng W; Alspach E; Medrano RV; Fronick C; Fehlings M; Newell EW; Fulton RS; Sheehan KF; Oh ST; Schreiber RD; Artyomov MN (2019). High Dimensional Analysis Delineates Myeloid and Lymphoid Compartment Remodeling during Successful Immune Checkpoint Cancer Therapy [Dataset]. https://data.niaid.nih.gov/resources?id=gse119352
    Explore at:
    Dataset updated
    Nov 22, 2019
    Dataset provided by
    Washington University in St.Louis
    Authors
    Gubin MM; Esaulova E; Ward JP; Malkova ON; Runci D; Wong P; Noguchi T; Arthur CD; Meng W; Alspach E; Medrano RV; Fronick C; Fehlings M; Newell EW; Fulton RS; Sheehan KF; Oh ST; Schreiber RD; Artyomov MN
    Description

    Using complementary forms of high dimensional profiling we define differences in CD45+ cells from syngeneic mouse tumors that either grow progressively or eventually reject following immune checkpoint therapy (ICT). Unbiased assessment of gene expression of tumor infiltrating cells by single cell RNA sequencing (scRNAseq) and longitudinal assessment of cellular protein expression by mass cytometry (CyTOF) revealed significant remodeling of both the lymphoid and myeloid intratumoral compartments. Surprisingly, we observed multiple subpopulations of monocytes/macrophages distinguishable by the combinatorial presence or absence of CD206, CX3CR1, CD1d and iNOS, markers of different macrophage activation states that change over time during ICT in a manner partially dependent on IFNγ. Both the CyTOF data and additional analysis of scRNAseq data support the hypothesis that macrophage polarization/activation results from effects on circulatory monocytes/early macrophages entering tumors rather than on pre-polarized mature intratumoral macrophages. Thus, ICT induces transcriptional and functional remodeling of both myeloid and lymphoid compartments. Droplet-based 3′ end massively parallel single-cell RNA sequencing was performed by encapsulating sorted live CD45+ tumor infiltrating cells into droplets and libraries were prepared using Chromium Single Cell 3′ Reagent Kits v1 according to manufacturer’s protocol (10x Genomics). The generated scRNAseq libraries were sequenced using an Illumina HiSeq2500.

  19. r

    Data from: Sparse Principal Component Analysis with Preserved Sparsity...

    • researchdata.edu.au
    Updated 2019
    Cite
    Inge Koch; Navid Shokouhi; Abd-Krim Seghouane; Mathematics and Statistics (2019). Sparse Principal Component Analysis with Preserved Sparsity Pattern [Dataset]. http://doi.org/10.24433/CO.4593141.V1
    Explore at:
    Dataset updated
    2019
    Dataset provided by
    Code Ocean
    The University of Western Australia
    Authors
    Inge Koch; Navid Shokouhi; Abd-Krim Seghouane; Mathematics and Statistics
    Description

    MATLAB code + demo to reproduce results for "Sparse Principal Component Analysis with Preserved Sparsity". This code calculates the principal loading vectors for any given high-dimensional data matrix. The advantage of this method over existing sparse-PCA methods is that it can produce principal loading vectors with the same sparsity pattern for any number of principal components. Please see Readme.md for more information.
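As a rough illustration of what "loading vectors with the same sparsity pattern" means (this is not the capsule's algorithm, which should be taken from the MATLAB code itself): one can fix a single support from the leading eigenvector and then compute ordinary principal components restricted to that support, so every loading vector is sparse with an identical pattern.

```python
import numpy as np

def shared_support_sparse_pca(X, n_components=2, k=5):
    """Illustrative sketch: pick the k variables with the largest
    leading-eigenvector weights as a shared support, then do PCA on
    the covariance submatrix restricted to that support."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(Xc) - 1)                      # sample covariance
    _, evecs = np.linalg.eigh(C)                       # ascending order
    support = np.argsort(np.abs(evecs[:, -1]))[-k:]    # top-k entries
    sub = np.linalg.eigh(C[np.ix_(support, support)])[1]
    V = np.zeros((X.shape[1], n_components))
    V[support, :] = sub[:, ::-1][:, :n_components]     # descending order
    return V

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
X[:, :3] += 2.0 * rng.normal(size=(100, 1))            # correlated block
V = shared_support_sparse_pca(X)
```

Because all components are eigenvectors of one submatrix, they are mutually orthogonal and vanish outside the shared support, which is the property the dataset's title refers to.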

  20. f

    DataSheet1_qCLUE: a quantum clustering algorithm for multi-dimensional...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Oct 11, 2024
    Cite
    Pantaleo, Felice; Mosca, Michele; Dellantonio, Luca; Gopalakrishnan, Dhruv; Redjeb, Wahid; Di Pilato, Antonio (2024). DataSheet1_qCLUE: a quantum clustering algorithm for multi-dimensional datasets.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001305432
    Explore at:
    Dataset updated
    Oct 11, 2024
    Authors
    Pantaleo, Felice; Mosca, Michele; Dellantonio, Luca; Gopalakrishnan, Dhruv; Redjeb, Wahid; Di Pilato, Antonio
    Description

    Clustering algorithms are at the basis of several technological applications and are fueling the development of rapidly evolving fields such as machine learning. In the recent past, however, it has become apparent that they face challenges stemming from datasets that span many spatial dimensions. In fact, the best-performing clustering algorithms scale linearly in the number of points, but quadratically with respect to the local density of points. In this work, we introduce qCLUE, a quantum clustering algorithm that scales linearly in both the number of points and their density. qCLUE is inspired by CLUE, an algorithm developed to address the challenging time and memory budgets of Event Reconstruction (ER) in future High-Energy Physics experiments. As such, qCLUE marries decades of development with the quadratic speedup provided by quantum computers. We numerically test qCLUE in several scenarios, demonstrating its effectiveness and proving it to be a promising route for complex data analysis tasks, especially in high-dimensional datasets with high densities of points.
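The density-based scheme that CLUE builds on can be sketched in a naive O(n²) form. Parameter names (dc, rhoc, deltac) loosely follow descriptions of CLUE, but this sketch reproduces neither CLUE's tiled data layout nor qCLUE's quantum speedup; it only shows the seed/follower logic.

```python
import numpy as np

def clue_like_clustering(points, dc=1.0, rhoc=5.0, deltac=2.0):
    """Naive density clustering: count neighbours within dc as the local
    density, link each point to its nearest denser point, promote dense
    isolated points to seeds, and let the rest follow their links."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (d < dc).sum(axis=1).astype(float)   # local density
    idx = np.arange(n)
    nh = np.full(n, -1)                        # nearest higher-density point
    delta = np.full(n, np.inf)                 # distance to that point
    for i in range(n):
        # Density ties are broken by index so the ordering is strict.
        higher = np.where((rho > rho[i]) | ((rho == rho[i]) & (idx < i)))[0]
        if len(higher):
            j = higher[np.argmin(d[i, higher])]
            nh[i], delta[i] = j, d[i, j]
    labels = np.full(n, -1)
    # Seeds: dense points with no denser point nearby.
    seeds = np.where((rho >= rhoc) & (delta > deltac))[0]
    labels[seeds] = np.arange(len(seeds))
    # Followers inherit the label of their nearest denser neighbour.
    for i in np.argsort(-rho, kind="stable"):
        if labels[i] < 0 and nh[i] >= 0:
            labels[i] = labels[nh[i]]
    return labels

rng = np.random.default_rng(2)
blob1 = rng.normal([0.0, 0.0], 0.3, size=(40, 2))
blob2 = rng.normal([5.0, 5.0], 0.3, size=(40, 2))
labels = clue_like_clustering(np.vstack([blob1, blob2]))
```

The pairwise-distance matrix makes the quadratic cost explicit; CLUE avoids it by only searching a local tile around each point, which is where the density-dependent scaling discussed above comes from.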
