Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
row sparse (Sparse Model B)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ methods that were developed for traditional bulk RNA-sequencing data and thus do not account for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining, where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
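To make the transformation concrete, here is a minimal sketch of a TF-IDF-style (GF-ICF) normalization of a raw count matrix in Python with scipy; the exact weighting and smoothing used by the gf-icf package may differ, so treat this as an illustration rather than the reference implementation.

```python
import numpy as np
from scipy import sparse

def gf_icf(counts):
    """Sketch of a TF-IDF-style (GF-ICF) transform.

    counts: genes x cells sparse matrix of raw counts.
    Returns an L2-normalized genes x cells matrix where each entry is the
    gene frequency within the cell times a log inverse cell frequency.
    """
    counts = sparse.csc_matrix(counts, dtype=float)
    n_cells = counts.shape[1]
    # gene frequency: counts normalized per cell (each column sums to 1)
    col_sums = np.asarray(counts.sum(axis=0)).ravel()
    gf = counts @ sparse.diags(1.0 / np.maximum(col_sums, 1.0))
    # inverse cell frequency: log of (cells / cells expressing the gene),
    # with +1 smoothing (smoothing choice is an assumption)
    cells_expressing = np.asarray((counts > 0).sum(axis=1)).ravel()
    icf = np.log((1.0 + n_cells) / (1.0 + cells_expressing)) + 1.0
    out = sparse.diags(icf) @ gf
    # L2-normalize each cell so downstream distances are comparable
    norms = np.sqrt(np.asarray(out.multiply(out).sum(axis=0)).ravel())
    return out @ sparse.diags(1.0 / np.maximum(norms, 1e-12))
```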
Sparse Basic Linear Algebra Subprograms (Sparse BLAS) comprise computational kernels for operations on sparse vectors and matrices.
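As a concrete illustration of the kind of kernel such a library provides, here is a minimal, unoptimized Python sketch of a CSR sparse matrix-vector product, checked against scipy; a real Sparse BLAS implementation would be written in a compiled language and heavily optimized.

```python
import numpy as np
from scipy import sparse

def csr_matvec(data, indices, indptr, x):
    """Reference CSR sparse matrix-vector product y = A @ x,
    illustrating what a sparse BLAS kernel computes."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

A = sparse.random(5, 5, density=0.3, format="csr", random_state=0)
x = np.arange(5, dtype=float)
assert np.allclose(csr_matvec(A.data, A.indices, A.indptr, x), A @ x)
```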
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, is collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples where the sparsity in the image varies.
This dataset is made available as part of the publication
"SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.
Direct link: https://doi.org/10.1088/1361-6501/aa8c29.
This manuscript is published as part of a Special Feature on Advanced X-ray Tomography (open access). We refer users to this publication for extensive details on the experimental planning and data acquisition.
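For readers unfamiliar with the term sparsity-regularised reconstruction mentioned above, the following toy Python sketch shows the basic idea: recovering a sparse unknown from underdetermined measurements by solving an L1-penalised least-squares problem with ISTA. The forward operator here is a random matrix standing in for a real CT projection model, and all parameter values are illustrative assumptions.

```python
import numpy as np

# Toy sparsity-regularised reconstruction:
# minimize 0.5*||A x - b||^2 + lam*||x||_1 via ISTA.
rng = np.random.default_rng(0)
n, m = 400, 120
x_true = np.zeros(n)
x_true[rng.choice(n, 20, replace=False)] = rng.normal(size=20)
A = rng.normal(size=(m, n)) / np.sqrt(m)      # stand-in for a projection operator
b = A @ x_true

lam, step = 0.02, 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(500):
    z = x - step * A.T @ (A @ x - b)                          # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```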
Each zipped data folder includes:
Metadata for the data acquisition and geometry parameters of the scan (.xtekct and .ctprofile.xml).
A sinogram of the central slice (CentreSlice > Sinograms > .tif), along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml).
A list of projection angles (.ang).
A 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol) with volume visualisation parameters (.vgi), added as a reference.
We also include an extra script for those who wish to use the SophiaBeads Dataset Project Codes; it essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that the sparseBeads.m script must be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code
For more information, please contact
jakj [at] dtu.dk
jakob.jorgensen [at] manchester.ac.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"The first capture of the area surrounding the Floreat Surf Life Saving Club, these sand dunes were captured by UAV imagery on 17th Aug 2021 for the Cambridge Coastcare beach dune modelling and monitoring project. It was created as part of an initiative to innovatively monitor coastal dune erosion and visualize these changes over time for future management and mitigation. This data includes Orthomosaic, DSM, DTM, Elevation Contours, 3D Mesh, 3D Point Cloud and LiDAR constructed from over 500 images captured from UAV (drone) and processed in Pix4D. All datasets can be freely accessed through DataWA. Link to Animated video fly-through of this 3D data model Link to the Sketchfab visualisation of the 3D textured mesh The dataset is a Sparse 3D Point Cloud (i.e. a 3D set of points): the X,Y,Z position and colour information is stored for each point of the point cloud. This dataset is of the South Floreat SLSC (2021 Flight-1 project area).
In this paper we propose an innovative learning algorithm, a variation of the one-class ν Support Vector Machine (SVM) learning algorithm, that produces sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machine algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
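For reference, here is a minimal scikit-learn sketch of the standard one-class ν-SVM baseline that the paper compares against (the authors' sparser variant is not reproduced here); the data and parameter values are made up for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                       # "normal" samples only
X_test = np.vstack([rng.normal(size=(50, 2)),             # inliers
                    rng.uniform(-6, 6, size=(50, 2))])    # likely outliers

# nu is the parameter the "ν" in the algorithm's name refers to: an upper bound
# on the fraction of training errors and a lower bound on the fraction of SVs.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
pred = clf.predict(X_test)                                 # +1 inlier, -1 outlier
print("number of support vectors:", clf.support_vectors_.shape[0])
```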
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Change-point (CP) VAR models face a dimensionality curse due to the proliferation of parameters that arises when new breaks are detected. We introduce the Sparse CP-VAR model, which determines which parameters truly vary when a break is detected. By doing so, the number of new parameters to be estimated at each regime is drastically reduced and the break dynamics become easier to interpret. The Sparse CP-VAR model disentangles the dynamics of the mean parameters and the covariance matrix. The former uses CP dynamics with shrinkage prior distributions, while the latter is driven by an infinite hidden Markov framework. An extensive simulation study is carried out to compare our approach with existing ones. We provide applications to financial and macroeconomic systems. It turns out that many off-diagonal VAR parameters are zero for the entire sample period and that most break activity is in the covariance matrix. We show that this has important consequences for portfolio optimization, in particular when future instabilities are included in the predictive densities. Forecasting-wise, the Sparse CP-VAR model compares favorably to several time-varying parameter models in terms of density and point forecast metrics.
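To fix ideas, a change-point VAR can be written with regime-indexed coefficients roughly as below (this notation is an illustrative assumption, not the authors' exact specification); the Sparse CP-VAR idea is that when the regime s_t changes, a shrinkage prior lets only a subset of the entries of the intercept and coefficient matrices actually move, while the covariance Sigma_t follows its own infinite hidden Markov dynamics.

```latex
y_t = c_{s_t} + \sum_{i=1}^{p} A_{i,s_t}\, y_{t-i} + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \Sigma_t).
```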
Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional data with a high degree of interpretability, at low computational cost. This paper posits that these methods can be extremely useful for understanding large collections of text documents, without requiring user expertise in machine learning. Our approach relies on three main ingredients: (a) multi-document text summarization and (b) comparative summarization of two corpora, both using sparse regression or classification; (c) sparse principal components and sparse graphical models for unsupervised analysis and visualization of large text corpora. We validate our approach using a corpus of Aviation Safety Reporting System (ASRS) reports and demonstrate that the methods can reveal causal and contributing factors in runway incursions. Furthermore, we show that the methods automatically discover four main tasks that pilots perform during flight, which can aid in further understanding the causal and contributing factors to runway incursions and other drivers for aviation safety incidents. Citation: L. El Ghaoui, G. C. Li, V. Duong, V. Pham, A. N. Srivastava, and K. Bhaduri, “Sparse Machine Learning Methods for Understanding Large Text Corpora,” Proceedings of the Conference on Intelligent Data Understanding, 2011.
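As a toy illustration of the comparative-summarization ingredient, the sketch below fits an L1-penalised (sparse) classifier on TF-IDF features separating two tiny invented corpora; the handful of terms with non-zero weights act as a comparative summary. This is only a schematic stand-in for the methods in the paper, and the documents and regularization strength are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Two tiny made-up corpora to be "comparatively summarized"
corpus_a = ["runway incursion during taxi", "pilot crossed hold short line"]
corpus_b = ["smooth landing reported", "routine climb to cruise altitude"]
docs, labels = corpus_a + corpus_b, [1, 1, 0, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
# L1 penalty drives most term weights to exactly zero
clf = LogisticRegression(penalty="l1", solver="liblinear", C=10.0).fit(X, labels)

terms = np.array(vec.get_feature_names_out())
print("distinguishing terms:", terms[clf.coef_.ravel() != 0])
```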
MATLAB code + demo to reproduce results for "Sparse Principal Component Analysis with Preserved Sparsity". This code calculates the principal loading vectors for any given high-dimensional data matrix. The advantage of this method over existing sparse-PCA methods is that it can produce principal loading vectors with the same sparsity pattern for any number of principal components. Please see Readme.md for more information.
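For orientation, here is a generic sparse-PCA baseline in Python using scikit-learn on synthetic low-rank data; note that, unlike the method provided in this repository, standard sparse PCA does not guarantee a common sparsity pattern across the loading vectors. The data and penalty value below are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

# Synthetic data with three block-sparse loading vectors plus noise
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 3))
loadings = np.zeros((3, 50))
loadings[0, :10], loadings[1, 10:20], loadings[2, 20:30] = 1.0, 1.0, 1.0
X = scores @ loadings + 0.1 * rng.normal(size=(200, 50))

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(X)
print("non-zeros per loading vector:",
      (np.abs(spca.components_) > 1e-8).sum(axis=1))
```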
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the proposed model with baseline algorithms in terms of RMSE on the basis of SR.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper considers factor estimation from heterogeneous data, where some of the variables (the relevant ones) are informative for estimating the factors, and others (the irrelevant ones) are not. We estimate the factor model within a Bayesian framework, specifying a sparse prior distribution for the factor loadings. Based on identified posterior factor loading estimates, we provide alternative methods to identify relevant and irrelevant variables. Simulations show that both types of variables are identified quite accurately. Empirical estimates for a large multi-country GDP dataset and a disaggregated inflation dataset for the USA show that a considerable share of variables is irrelevant for factor estimation.
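In standard notation (an illustrative assumption rather than the authors' exact specification), the setup is a factor model whose loadings carry a sparsity-inducing prior, so that loading vectors of irrelevant variables are shrunk towards zero:

```latex
x_{it} = \lambda_i^{\prime} f_t + \varepsilon_{it}, \qquad i = 1, \dots, N,
```

with a spike-and-slab or shrinkage prior on each loading vector; variables whose posterior loadings are (close to) zero are flagged as irrelevant for factor estimation.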
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Factor models have been applied extensively for forecasting when high-dimensional datasets are available. In this case, the number of variables can be very large. For instance, usual dynamic factor models in central banks handle over 100 variables. However, there is a growing body of literature indicating that more variables do not necessarily lead to estimated factors with lower uncertainty or better forecasting results. This paper investigates the usefulness of partial least squares techniques that take into account the variable to be forecast when reducing the dimension of the problem from a large number of variables to a smaller number of factors. We propose different dynamic sparse partial least squares approaches as a means of improving forecast efficiency by simultaneously taking into account the variable to be forecast while forming an informative subset of predictors, instead of using all the available ones to extract the factors. We use the well-known Stock and Watson database to check the forecasting performance of our approach. The proposed dynamic sparse models show good performance in improving efficiency compared to widely used factor methods in macroeconomic forecasting.
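As a point of reference, the sketch below shows a plain (static, non-sparse) PLS forecast in Python with scikit-learn: many predictors are compressed into a few target-aware factors and used for a one-step-ahead forecast. The dynamic sparse PLS variants proposed in the paper extend this baseline and are not reproduced here; the synthetic data are an assumption for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
T, N = 200, 120                                      # time periods, predictors
X = rng.normal(size=(T, N))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=T)   # only 5 relevant predictors

# One-step-ahead forecast: regress y_{t+1} on PLS factors built from X_t
pls = PLSRegression(n_components=3).fit(X[:-1], y[1:])
y_hat = pls.predict(X[-1:]).ravel()
print("one-step-ahead forecast:", y_hat)
```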
Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply Gaussian Process regression to large data sets, since prediction based on the learned model requires inversion of an order-n kernel matrix. Approximate sparse Gaussian Process solutions have been proposed for such problems. However, in almost all cases, these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the models lack interpretability. Such interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that allows us to compute a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction gives valuable domain information and then adapt the inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the optimization problem using the alternating direction method of multipliers, which is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability and interpretability on a climate data set.
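The sketch below illustrates the sparse inverse covariance (graphical lasso) ingredient in isolation, using scikit-learn on synthetic data: the zero pattern of the estimated precision matrix encodes conditional independence between variables. The paper's adaptation of this idea to the GP kernel matrix, and its ADMM solver, are not reproduced here; the chain-structured example is an assumption for illustration.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Data from a chain-structured Gaussian: the true precision matrix is
# tridiagonal, so a sparse estimate should recover mostly zero entries.
rng = np.random.default_rng(0)
p = 6
prec = np.eye(p) + np.diag(np.full(p - 1, 0.4), 1) + np.diag(np.full(p - 1, 0.4), -1)
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(p), cov, size=2000)

model = GraphicalLassoCV().fit(X)
print("fraction of (near-)zero entries in estimated precision:",
      np.mean(np.abs(model.precision_) < 1e-3))
```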
Calculated beam pattern in Fourier space of a unitary input given two sparsely sampled synthetic aperture arrays: 1. a regularly spaced array sampled at 2*lambda, where lambda is the wavelength of the 40 GHz signal, and 2. the regularly spaced array with random perturbations (on the order of lambda or smaller) to the (x,y) spatial location of each sample point. This dataset is published in "An Overview of Advances in Signal Processing Techniques for Classical and Quantum Wideband Synthetic Apertures" by Vouras et al. in the IEEE Journal of Selected Topics in Signal Processing special issue on Recent Advances in Wideband Signal Processing for Classical and Quantum Synthetic Apertures.
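A minimal numpy sketch of how such a beam pattern can be computed (an unweighted array factor evaluated over direction cosines); the array size, grid resolution, and perturbation scale below are illustrative assumptions, not the parameters of the published dataset.

```python
import numpy as np

c, f = 3e8, 40e9
lam = c / f                                   # wavelength of the 40 GHz signal
n = 8                                         # elements per side (assumed)
xg, yg = np.meshgrid(np.arange(n) * 2 * lam, np.arange(n) * 2 * lam)
rng = np.random.default_rng(0)
# perturbed array: random (x, y) offsets smaller than lambda
x = xg.ravel() + rng.uniform(-0.5, 0.5, xg.size) * lam
y = yg.ravel() + rng.uniform(-0.5, 0.5, yg.size) * lam

u = np.linspace(-1, 1, 101)                   # direction cosines
U, V = np.meshgrid(u, u)
k = 2 * np.pi / lam
# unit-weight ("unitary") input: coherent sum of element phase terms
af = np.exp(1j * k * (np.outer(U.ravel(), x) + np.outer(V.ravel(), y))).sum(axis=1)
beam_db = 20 * np.log10(np.abs(af).reshape(U.shape) / np.abs(af).max())
print("beam pattern grid:", beam_db.shape, "peak level (dB):", beam_db.max())
```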
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset in this repository complements the publication: Adrian Grille Guerra, Andrea Sciacchitano, Fulvio Scarano; Iterative modal reconstruction for sparse particle tracking data. Physics of Fluids, 1 July 2024; 36 (7): 075107. https://doi.org/10.1063/5.0209527. The dataset contains the electronic supplementary material also available in the online version of the journal (three videos), a digital version of the figures of the publication in MATLAB figure format, the full dataset discussed in the publication, and sample code for the proposed methodology.
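For readers unfamiliar with the idea, here is a toy Python sketch of (non-iterative) modal reconstruction from sparse samples: mode shapes evaluated at a few scattered points are fitted by least squares and then re-evaluated on the full grid. It only illustrates the general principle; the iterative procedure, mode bases, and data of the publication are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
xg = np.linspace(0, 2 * np.pi, 200)                  # full grid
modes = np.stack([np.sin(k * xg) for k in range(1, 6)], axis=1)   # mode shapes
field = modes @ np.array([1.0, 0.5, 0.0, 0.2, 0.0])  # "true" field

idx = rng.choice(len(xg), size=15, replace=False)    # sparse sample locations
coeffs, *_ = np.linalg.lstsq(modes[idx], field[idx], rcond=None)  # modal fit
reconstruction = modes @ coeffs                      # field on the full grid
print("max reconstruction error:", np.abs(reconstruction - field).max())
```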
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2022. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, at least 2 metres high and a minimum area of 0.2 hectares. Note that this product is not filtered to the 0.2 ha criterion for forest, to allow for flexibility in different use cases. Filtering to remove areas less than 0.2 ha is undertaken in downstream processing for the purposes of Australia's National Inventory Reports. Sparse woody is defined as woody vegetation with a canopy cover between 5 and 19 per cent.
The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same time-series processing approach (conditional probability networks) as the two-class method to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.
Unlike previous versions of the National Forest and Sparse Woody Vegetation data releases, where 35 tiles were released concurrently as part of the product, only the 25 southern tiles were supplied in the initial v7.0 release in June 2023. The 10 northern tiles were released in July 2024 as v7.1, a supplement to the initial product release that completes the standard 35 tiles. Please see the National Forest and Sparse Woody Vegetation data metadata PDF (Version 7.1 - 2022 release) for more information.
CC0 1.0 https://spdx.org/licenses/CC0-1.0.html
Biological systems exhibit two structural features on many levels of organization: sparseness, in which only a small fraction of possible interactions between components actually occur; and modularity, the near-decomposability of the system into modules with distinct functionality. Recent work suggests that modularity can evolve in a variety of circumstances, including goals that vary in time such that they share the same subgoals (modularly varying goals), or when connections are costly. Here, we studied the origin of modularity and sparseness, focusing on the nature of the mutation process rather than on connection cost or variations in the goal. We use simulations of evolution with different mutation rules. We found that commonly used sum-rule mutations, in which interactions are mutated by adding random numbers, do not lead to modularity or sparseness except in special situations. In contrast, product-rule mutations, in which interactions are mutated by multiplying by random numbers (a better model for the effects of biological mutations), led to sparseness naturally. When the goals of evolution are modular, in the sense that specific groups of inputs affect specific groups of outputs, product-rule mutations also lead to modular structure; sum-rule mutations do not. Product-rule mutations generate sparseness and modularity because they tend to reduce interactions and to keep small interaction terms small.
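The intuition behind the last point can be seen in a tiny neutral-drift simulation (written here in Python purely for illustration; the parameter values are assumptions and no selection is applied): under repeated multiplicative mutations the typical interaction strength shrinks towards zero, whereas additive mutations spread it out.

```python
import numpy as np

rng = np.random.default_rng(0)
steps, n = 2000, 1000
w_sum = np.ones(n)       # interactions mutated by adding random numbers
w_prod = np.ones(n)      # interactions mutated by multiplying by random numbers
for _ in range(steps):
    w_sum += rng.normal(0.0, 0.05, n)    # sum-rule mutation
    w_prod *= rng.normal(1.0, 0.05, n)   # product-rule mutation

# Product-rule drift pushes typical magnitudes towards zero (sparseness);
# sum-rule drift does not.
print("median |interaction|, sum rule:    ", np.median(np.abs(w_sum)))
print("median |interaction|, product rule:", np.median(np.abs(w_prod)))
```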
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training and testing sets for the e-commerce product images dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison for the user cold-start problem on the BE-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for the paper SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data
It contains over 1 GB of high-quality motion capture data recorded with an Xsens Awinda system while using a variety of VR applications on Meta Quest devices.
Visit the paper website!
If you find our data useful, please cite our paper:
@article{10.1145/3625264,
  author    = {Ponton, Jose Luis and Yun, Haoran and Aristidou, Andreas and Andujar, Carlos and Pelechano, Nuria},
  title     = {SparsePoser: Real-Time Full-Body Motion Reconstruction from Sparse Data},
  year      = {2023},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  issn      = {0730-0301},
  url       = {https://doi.org/10.1145/3625264},
  doi       = {10.1145/3625264},
  journal   = {ACM Trans. Graph.},
  month     = {oct}
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
row sparse (Sparse Model B)