Sparse Basic Linear Algebra Subprograms (Sparse BLAS) comprise computational kernels for operations on sparse vectors and matrices.
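To make concrete what such a kernel computes, here is a minimal sketch in Python (not part of any Sparse BLAS reference implementation) of the central Level 2 operation, a sparse matrix-vector product over CSR storage:

```python
import numpy as np

def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for a CSR matrix A: the core sparse mat-vec kernel."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(indptr) - 1):               # one row of A at a time
        for k in range(indptr[i], indptr[i + 1]):  # nonzeros stored for row i
            y[i] += data[k] * x[indices[k]]
    return y

# 3x3 example: A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data    = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
print(csr_matvec(indptr, indices, data, np.array([1.0, 2.0, 3.0])))  # [ 7.  4. 18.]
```

Only the stored nonzeros are touched, which is the point of Sparse BLAS: work scales with the number of nonzeros rather than with the dense dimensions.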
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset is about book subjects and has 1 row. It is filtered where the book is Sparse matrix technology. It features 10 columns including book subject, number of authors, number of books, earliest publication date, and latest publication date. The preview is ordered by number of books (descending).
Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional data with a high degree of interpretability, at low computational cost. This paper posits that these methods can be extremely useful for understanding large collections of text documents, without requiring user expertise in machine learning. Our approach relies on three main ingredients: (a) multi-document text summarization and (b) comparative summarization of two corpora, both using sparse regression or classification; (c) sparse principal components and sparse graphical models for unsupervised analysis and visualization of large text corpora. We validate our approach using a corpus of Aviation Safety Reporting System (ASRS) reports and demonstrate that the methods can reveal causal and contributing factors in runway incursions. Furthermore, we show that the methods automatically discover four main tasks that pilots perform during flight, which can aid in further understanding the causal and contributing factors to runway incursions and other drivers for aviation safety incidents. Citation: L. El Ghaoui, G. C. Li, V. Duong, V. Pham, A. N. Srivastava, and K. Bhaduri, “Sparse Machine Learning Methods for Understanding Large Text Corpora,” Proceedings of the Conference on Intelligent Data Understanding, 2011.
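For illustration only (this is not the authors' code; the corpus and labels are invented), a minimal sketch of the sparse, L1-regularized classification ingredient using scikit-learn: the L1 penalty zeroes out most term weights, so the few surviving terms form a short, interpretable summary of what distinguishes the two document classes.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["runway incursion during taxi", "pilot misread taxiway signage",
        "smooth landing no issues", "normal cruise and descent"]
labels = [1, 1, 0, 0]  # 1 = incursion-related report (toy labels)

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# The L1 penalty drives most coefficients to exactly zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, labels)

terms = np.array(vec.get_feature_names_out())
kept = clf.coef_[0] != 0
print(sorted(zip(clf.coef_[0][kept], terms[kept]), reverse=True))
```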
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, is collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples where the sparsity in the image varies.
This dataset is made available as part of the publication
"SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.
Direct link: https://doi.org/10.1088/1361-6501/aa8c29.
This manuscript is published as part of the Special Feature on Advanced X-ray Tomography (open access). We refer users to this publication for extensive detail on the experimental planning and data acquisition.
Each zipped data folder includes:
The metadata for data acquisition and geometry parameters of the scan (.xtekct and .ctprofile.xml);
A sinogram of the central slice (CentreSlice > Sinograms > .tif), along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml);
A list of projection angles (.ang);
A 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol), with volume visualisation parameters (.vgi), added as a reference.
We also include an extra script for those who wish to use the SophiaBeads Dataset Project codes; it essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that the sparseBeads.m script must be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code
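For readers working in Python rather than Matlab, loading a central-slice sinogram and its projection angles should look roughly like the sketch below; the file names are hypothetical and the .ang layout (one angle per line) is an assumption to verify against the metadata files.

```python
import numpy as np
import imageio.v3 as iio

# Hypothetical file names; the real ones follow the conventions listed above.
sino = iio.imread("CentreSlice/Sinograms/central_slice.tif").astype(np.float64)
angles = np.loadtxt("projections.ang")  # assumed: one projection angle per line

# Expect one sinogram row per projection angle.
print(sino.shape, angles.shape)
```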
For more information, please contact
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
The sparse-generative-ai/requests dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2018. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, potentially reaching 2 metres high, and a minimum area of 0.2 hectares. Sparse woody is defined as woody vegetation with a canopy cover between 5 and 19 per cent.
The three-class classification (forest, sparse woody and non-woody) supersedes the two class classification (forest and non-forest) from 2016. The new classification is produced using the same approach in terms of time series processing (conditional probability networks) as the two-class method, to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.
In this paper we propose an innovative learning algorithm - a variation of the one-class ν-Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
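For context, the snippet below shows only the benchmark method, a standard one-class ν-SVM fit with scikit-learn on synthetic data, not the proposed sparse variant: the ν parameter lower-bounds the fraction of training points that become support vectors, so it directly controls how sparse (and how fast at test time) the resulting model is.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # toy "normal" training data

for nu in (0.5, 0.1, 0.05):
    model = OneClassSVM(kernel="rbf", nu=nu).fit(X)
    # Fewer support vectors means cheaper evaluation of new points.
    print(f"nu={nu}: {len(model.support_)} support vectors")
```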
This dataset was created by Yash Gupta
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The artifact consists of the data necessary to reproduce the results reported in the SAT-20 paper titled "On the Sparsity of XORs in Approximate Model Counting".
In particular, the artifact consists of the binaries, the log files generated by our computing cluster, and scripts to generate tables and the plots used in the paper.
MIT License: https://opensource.org/licenses/MIT
The diksha-shrivastava13/sparse dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
The MNIST and CIFAR datasets are used to test the robustness of neural networks against sparse attacks.
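As a crude illustration of the threat model (an L0-bounded perturbation that touches only k pixels; this is random corruption for demonstration, not an actual attack algorithm such as JSMA or SparseFool):

```python
import numpy as np

def sparse_perturb(image, k=10, rng=None):
    """Flip k randomly chosen pixels to their opposite extreme (assumes values in [0, 1])."""
    rng = rng or np.random.default_rng()
    flat = image.copy().reshape(-1)
    idx = rng.choice(flat.size, size=k, replace=False)
    flat[idx] = np.where(flat[idx] > 0.5, 0.0, 1.0)
    return flat.reshape(image.shape)

# A 28x28 MNIST-sized image with only 10 of 784 pixels modified.
perturbed = sparse_perturb(np.zeros((28, 28)), k=10)
print(int((perturbed != 0).sum()), "pixels changed")
```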
NEST: NEw Sparse maTrix dataset
NEST is a new sparse matrix dataset. Its purpose is to define a modern set of sparse matrices arising in relevant, current scientific applications in order to further improve sparse numerical methods. NEST can be seen as a continuation of the Sparse Matrix Market datasets and contains some curated sparse matrices from it as legacy references. The matrices are stored as COO sparse matrices in scipy.sparse .npz archive format. Conversion utils to/from the… See the full description on the dataset page: https://huggingface.co/datasets/vincent-maillou/NEST.
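Given the stated storage format, loading a NEST matrix should look roughly like this (the file name is hypothetical):

```python
import scipy.sparse as sp

A = sp.load_npz("nest_matrix.npz")  # hypothetical file name; stored in COO format
A_csr = A.tocsr()                   # convert to CSR for fast matrix-vector products
print(A.shape, A.nnz, A.format)
```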
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Data used for the paper SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data
It contains over 1 GB of high-quality motion capture data recorded with an Xsens Awinda system while using a variety of VR applications on Meta Quest devices.
Visit the paper website!
If you find our data useful, please cite our paper:
@article{10.1145/3625264,
  author    = {Ponton, Jose Luis and Yun, Haoran and Aristidou, Andreas and Andujar, Carlos and Pelechano, Nuria},
  title     = {SparsePoser: Real-Time Full-Body Motion Reconstruction from Sparse Data},
  year      = {2023},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  issn      = {0730-0301},
  url       = {https://doi.org/10.1145/3625264},
  doi       = {10.1145/3625264},
  journal   = {ACM Trans. Graph.},
  month     = {oct}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset in this repository complements the publication: Adrian Grille Guerra, Andrea Sciacchitano, Fulvio Scarano; "Iterative modal reconstruction for sparse particle tracking data." Physics of Fluids, 1 July 2024; 36 (7): 075107. https://doi.org/10.1063/5.0209527. The dataset contains the electronic supplementary material also available in the online version of the journal (three videos), a digital version of the figures of the publication in Matlab figure format, the full dataset discussed in the publication, and sample code for the proposed methodology.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry-driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes at the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools - the standard t test, the moderated t test (also known as limma), and rank products - for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
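The available script is in R; as a rough Python analogue (a sketch under stated assumptions, not the authors' improved implementation), a NaN-aware rank product over replicates can be computed like this:

```python
import numpy as np

def rank_products(log_fc, min_obs=2):
    """NaN-aware rank product. log_fc: (features x replicates) log fold-changes,
    NaN = missing. Ranks are computed per replicate over observed features only
    (1 = most down-regulated), normalized by that replicate's coverage, and
    combined as a geometric mean over the replicates where the feature appears."""
    n_feat, n_rep = log_fc.shape
    ranks = np.full((n_feat, n_rep), np.nan)
    for j in range(n_rep):
        obs = ~np.isnan(log_fc[:, j])
        order = np.argsort(log_fc[obs, j])
        r = np.empty(obs.sum())
        r[order] = np.arange(1, obs.sum() + 1)
        ranks[obs, j] = r / obs.sum()          # normalize for unequal coverage
    n_obs = (~np.isnan(ranks)).sum(axis=1)
    rp = np.full(n_feat, np.nan)
    ok = n_obs >= min_obs                      # require enough observations
    rp[ok] = np.exp(np.nanmean(np.log(ranks[ok]), axis=1))
    return rp                                  # small values = consistent down-regulation

# Toy usage: 1000 features, triplicates, ~50% missing values, as in the study.
rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 3))
data[rng.random(data.shape) < 0.5] = np.nan
scores = rank_products(data)
print(np.sum(~np.isnan(scores)), "features scored")
```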
U.S. Government Works: https://www.usa.gov/government-works
In this paper we propose an innovative learning algorithm - a variation of the one-class Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class SVMs while reducing both training time and test time by several factors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The feature dimensions of the sparse feature subsets and of the full feature set.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Abstract: These are the experimental data for the paper Bach, Jakob, "Using Constraints to Discover Sparse and Alternative Subgroup Descriptions", published on arXiv in 2024. You can find the paper here and the code here. See the README for details. The datasets used in our study (which we also provide here) originate from PMLB. The corresponding GitHub repository is MIT-licensed ((c) 2016 Epistasis Lab at UPenn). Please see the file LICENSE in the folder datasets/ for the license text.
Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2020. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, at least 2 metres high, and a minimum area of 0.2 hectares. Sparse woody is defined as woody vegetation with a canopy cover between 5 and 19 per cent.
The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same approach in terms of time series processing (conditional probability networks) as the two-class method, to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.
Earlier versions of this dataset were published by the Department of the Environment and Energy.