100+ datasets found
  1. Data from: Sparse Machine Learning Methods for Understanding Large Text...

    • data.nasa.gov
    • gimi9.com
    • +4 more
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Sparse Machine Learning Methods for Understanding Large Text Corpora [Dataset]. https://data.nasa.gov/w/3xx3-746c/default?cur=hFRJxcUSf5W
    Explore at:
    Available download formats: xml, csv, application/rdfxml, tsv, json, application/rssxml
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional data with a high degree of interpretability, at low computational cost. This paper posits that these methods can be extremely useful for understanding large collections of text documents, without requiring user expertise in machine learning. Our approach relies on three main ingredients: (a) multi-document text summarization and (b) comparative summarization of two corpora, both using sparse regression or classification; (c) sparse principal components and sparse graphical models for unsupervised analysis and visualization of large text corpora. We validate our approach using a corpus of Aviation Safety Reporting System (ASRS) reports and demonstrate that the methods can reveal causal and contributing factors in runway incursions. Furthermore, we show that the methods automatically discover four main tasks that pilots perform during flight, which can aid in further understanding the causal and contributing factors to runway incursions and other drivers for aviation safety incidents.

    Citation: L. El Ghaoui, G. C. Li, V. Duong, V. Pham, A. N. Srivastava, and K. Bhaduri, “Sparse Machine Learning Methods for Understanding Large Text Corpora,” Proceedings of the Conference on Intelligent Data Understanding, 2011.
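    The sparse regression ingredient described above can be illustrated with a minimal L1-regularized (lasso) solver using iterative soft-thresholding. This is a generic sketch of the technique, not the authors' code; the toy "term-document" data is invented for illustration.

    ```python
    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of the L1 norm: shrinks values toward zero,
        producing exact zeros -- the source of sparsity."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_ista(X, y, lam=0.05, iters=500):
        """L1-regularized least squares via ISTA; most coefficients end up
        exactly 0, which is what makes the fitted model interpretable
        (only a few active terms remain)."""
        n, d = X.shape
        lr = n / np.linalg.norm(X, 2) ** 2  # step from the gradient's Lipschitz constant
        w = np.zeros(d)
        for _ in range(iters):
            grad = X.T @ (X @ w - y) / n
            w = soft_threshold(w - lr * grad, lr * lam)
        return w

    # Toy demo: 10 "terms", but only term 0 actually predicts the response.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0].copy()
    w = lasso_ista(X, y)
    ```

    The irrelevant coefficients are driven to exactly zero, so the surviving terms can be read off directly as a summary of the predictive vocabulary.
    
    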

  2. SparseBeads Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    J. S. Jørgensen; S. B. Coban; W. R. B. Lionheart; S. A. McDonald; P. J. Withers (2020). SparseBeads Dataset [Dataset]. http://doi.org/10.5281/zenodo.290117
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    J. S. Jørgensen; S. B. Coban; W. R. B. Lionheart; S. A. McDonald; P. J. Withers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The presented data set, inspired by the SophiaBeads Dataset Project for X-ray Computed Tomography, is collected for studies involving sparsity-regularised reconstruction. The aim is to provide tomographic data for various samples where the sparsity in the image varies.

    This dataset is made available as part of the publication

    "SparseBeads Data: Benchmarking Sparsity-Regularized Computed Tomography", Jakob S Jørgensen et al, 2017. Meas. Sci. Technol. 28 124005.

    Direct link: https://doi.org/10.1088/1361-6501/aa8c29.

    This manuscript is published as part of the Special Feature on Advanced X-ray Tomography (open access). We refer users to this publication for extensive detail on the experimental planning and data acquisition.

    Each zipped data folder includes

    • Metadata for the data acquisition and geometry parameters of the scan (.xtekct and .ctprofile.xml).

    • A sinogram of the central slice (CentreSlice > Sinograms > .tif) along with metadata for the 2D slice (.xtek2dct and .ct2dprofile.xml).

    • A list of projection angles (.ang).

    • A 2D FDK reconstruction using the CTPro reconstruction suite (RECON2D > .vol) with volume visualisation parameters (.vgi), added as a reference.

    We also include an extra script for those who wish to use the SophiaBeads Dataset Project Codes; it essentially replaces the main script provided, sophiaBeads.m (visit https://zenodo.org/record/16539). Please note that the sparseBeads.m script must be placed in the same folder as the project codes. The latest version of this script can be found here: https://github.com/jakobsj/SparseBeads_code

    For more information, please contact

    • jakj [at] dtu.dk
    • jakob.jorgensen [at] manchester.ac.uk
  3. Data from: SparsePoser: Real-time Full-body Motion Reconstruction from...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 12, 2023
    Cite
    Aristidou, Andreas (2023). SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8427980
    Explore at:
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Yun, Haoran
    Pelechano, Nuria
    Ponton, Jose Luis
    Andujar, Carlos
    Aristidou, Andreas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used for the paper SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data

    It contains over 1GB of high-quality motion capture data recorded with an Xsens Awinda system while using a variety of VR applications on Meta Quest devices.

    Visit the paper website!

    If you find our data useful, please cite our paper:

    @article{10.1145/3625264,
      author    = {Ponton, Jose Luis and Yun, Haoran and Aristidou, Andreas and Andujar, Carlos and Pelechano, Nuria},
      title     = {SparsePoser: Real-Time Full-Body Motion Reconstruction from Sparse Data},
      year      = {2023},
      publisher = {Association for Computing Machinery},
      address   = {New York, NY, USA},
      issn      = {0730-0301},
      url       = {https://doi.org/10.1145/3625264},
      doi       = {10.1145/3625264},
      journal   = {ACM Trans. Graph.},
      month     = {oct}
    }

  4. Floreat-f2 - Sparse Point Cloud LAS - Aug 2021 - Datasets - data.wa.gov.au

    • catalogue.data.wa.gov.au
    Updated Aug 17, 2021
    + more versions
    Cite
    (2021). Floreat-f2 - Sparse Point Cloud LAS - Aug 2021 - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/floreat-f2-sparse-point-cloud-las-aug-2021
    Explore at:
    Dataset updated
    Aug 17, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Western Australia, Floreat
    Description

    The first capture of the area north of the Floreat Surf Life Saving Club: these sand dunes were captured by UAV imagery on 17th Aug 2021 for the Cambridge Coastcare beach dune modelling and monitoring project. It was created as part of an initiative to innovatively monitor coastal dune erosion and visualize these changes over time for future management and mitigation. This data includes Orthomosaic, DSM, DTM, Elevation Contours, 3D Mesh, 3D Point Cloud and LiDAR constructed from over 500 images captured from UAV (drone) and processed in Pix4D. All datasets can be freely accessed through DataWA. Links are available to an animated video fly-through of this 3D data model and to the Sketchfab visualisation of the 3D textured mesh. The dataset is a Sparse 3D Point Cloud (i.e. a 3D set of points): the X, Y, Z position and colour information is stored for each point of the point cloud. This dataset is of the area north of Floreat SLSC (2021 Flight-2 project area).

  5. Assessment and Improvement of Statistical Tools for Comparative Proteomics...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Veit Schwämmle; Ileana Rodríguez León; Ole Nørregaard Jensen (2023). Assessment and Improvement of Statistical Tools for Comparative Proteomics Analysis of Sparse Data Sets with Few Experimental Replicates [Dataset]. http://doi.org/10.1021/pr400045u.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Veit Schwämmle; Ileana Rodríguez León; Ole Nørregaard Jensen
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates, due to sample scarcity, duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches using simulated and experimental data sets with varying numbers of missing values. We applied three tools, the standard t test, the moderated t test (also known as limma), and rank products, for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using the limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.
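    The core idea of adapting rank products to missing values can be sketched as follows: rank each feature within every replicate, skip missing entries, and combine ranks by a geometric mean over only the replicates where the feature was observed. This is an illustrative reimplementation, not the authors' R script, and the example values are invented.

    ```python
    import numpy as np

    def rank_products(changes):
        """Rank product score per feature from a (features x replicates) array
        of log fold-changes, with NaN marking missing values. Rank 1 = most
        down-regulated within a replicate; ranks are combined by a geometric
        mean over the replicates where the feature was actually observed."""
        n_feat, n_rep = changes.shape
        ranks = np.full_like(changes, np.nan, dtype=float)
        for j in range(n_rep):
            col = changes[:, j]
            obs = np.flatnonzero(~np.isnan(col))          # observed features only
            order = obs[np.argsort(col[obs])]             # ascending fold-change
            ranks[order, j] = np.arange(1, len(obs) + 1)  # rank within replicate
        # geometric mean of ranks, ignoring the replicates where data is missing
        return np.exp(np.nanmean(np.log(ranks), axis=1))
    ```

    A feature that is consistently the most down-regulated in every replicate where it appears scores exactly 1, the smallest possible rank product.
    
    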

  6. Code: Quality Diversity Algorithms for Calibrating a Supply Chain Simulation...

    • data.4tu.nl
    zip
    Updated Jul 22, 2024
    Cite
    Isabelle van Schilt (2024). Code: Quality Diversity Algorithms for Calibrating a Supply Chain Simulation Model with Sparse Data [Dataset]. http://doi.org/10.4121/766f4e89-fa03-47c6-a9f2-fa41f241984b.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Isabelle van Schilt
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This repository is part of the Ph.D. thesis of Isabelle M. van Schilt, Delft University of Technology.

    This repository is used to calibrate the underlying structure and parameters of a stylized supply chain simulation model of counterfeit Personal Protective Equipment (PPE) using a quality diversity algorithm. For this, we use the pyribs library for the quality diversity algorithm, and pydsol-core and pydsol-model for the discrete event simulation model. The calibration is done with sparse data, which is generated by degrading the ground truth data with noise, bias, and missing values. Each candidate structure of the supply chain simulation model is encoded as an integer key of a dictionary (sorted on graph density) whose values are the possible supply chain models. This integer is thus a decision variable of the calibration, alongside the other parameters of the simulation model.

    To use this repository, we need a simulation model developed in pydsol-core and pydsol-model. Additionally, we need a dictionary with various simulation structures as input, as well as the ground truth data. For this project, we use the repository complex_stylized_supply_chain_model_generator as the simulation model.

  7. Data from: Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach

    • s.cnmilf.com
    • catalog.data.gov
    Updated Dec 6, 2023
    + more versions
    Cite
    Dashlink (2023). Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/sparse-solutions-for-single-class-svms-a-bi-criterion-approach
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Dashlink
    Description

    In this paper we propose an innovative learning algorithm - a variation of the one-class ν-Support Vector Machine (SVM) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class ν-SVM algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.

  8. Data from: Lagrangian analysis of submesoscale flows from sparse data using...

    • zenodo.org
    zip
    Updated Mar 8, 2024
    Cite
    H. M. Aravind; Tamay Ozgokmen; Michael Allshouse (2024). Lagrangian analysis of submesoscale flows from sparse data using Gaussian Process Regression for field reconstruction [Dataset]. http://doi.org/10.5281/zenodo.10795574
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    H. M. Aravind; Tamay Ozgokmen; Michael Allshouse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used for the preparation of the manuscript "Lagrangian analysis of submesoscale flows from sparse data using Gaussian Process Regression for field reconstruction".

  9. Data from: spectre: An R package to estimate spatially-explicit community...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Oct 6, 2022
    Cite
    Craig Eric Simpkins; Sebastian Hanß; Matthias Spangenberg; Jan Salecker; Maximilian Hesselbarth; Kerstin Wiegand (2022). spectre: An R package to estimate spatially-explicit community composition using sparse data [Dataset]. http://doi.org/10.5061/dryad.fbg79cnz7
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 6, 2022
    Dataset provided by
    Dryad
    Authors
    Craig Eric Simpkins; Sebastian Hanß; Matthias Spangenberg; Jan Salecker; Maximilian Hesselbarth; Kerstin Wiegand
    Time period covered
    2022
    Description

    The simulated community datasets were built using the virtualspecies V1.5.1 R package (Leroy et al., 2016), which generates spatially-explicit presence/absence matrices from habitat suitability maps. We simulated these suitability maps using Gaussian fields neutral landscapes produced using the NLMR V1.0 R package (Sciaini et al., 2018). To allow for some level of overlap between species suitability maps, we divided the γ-diversity (i.e., the total number of simulated species) by an adjustable correlation value to create several species groups that share suitability maps. Using a full factorial design, we developed 81 presence/absence maps varying across four axes (see Supplemental Table 1 and Supplemental Figure 1): 1) landscape size, representing the number of sites in the simulated landscape; 2) γ-diversity; 3) the level of correlation among species suitability maps, with greater correlations resulting in fewer shared species groups among suitability maps; and 4) the habitat suitabil...

  10. Bayesian estimation of information-theoretic metrics for sparsely sampled...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 30, 2024
    Cite
    Piga, Angelo (2024). Bayesian estimation of information-theoretic metrics for sparsely sampled distributions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10592746
    Explore at:
    Dataset updated
    Jan 30, 2024
    Dataset provided by
    Font-Pomarol, Lluc
    Guimerà, Roger
    Piga, Angelo
    Sales-Pardo, Marta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code, synthetic and empirical data for "Bayesian estimation of information-theoretic metrics for sparsely sampled distributions"

    Abstract:

    Estimating the Shannon entropy of a discrete distribution from which we have only observed a small sample is challenging. Estimating other information-theoretic metrics, such as the Kullback-Leibler divergence between two sparsely sampled discrete distributions, is even harder. Here, we propose a fast, semi-analytical estimator for sparsely sampled distributions. Its derivation is grounded in probabilistic considerations and uses a hierarchical Bayesian approach to extract as much information as possible from the few observations available. Our approach provides estimates of the Shannon entropy with precision at least comparable to the benchmarks we consider, and most often higher; it does so across diverse distributions with very different properties. Our method can also be used to obtain accurate estimates of other information-theoretic metrics, including the notoriously challenging Kullback-Leibler divergence. Here, again, our approach has less bias, overall, than the benchmark estimators we consider.
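    For contrast with the Bayesian approach described above, the naive plug-in (maximum-likelihood) entropy estimator is easy to write and exhibits exactly the bias the paper targets: with few observations it systematically underestimates the entropy. This sketch is illustrative only and is not the authors' estimator; the sample size and alphabet size are arbitrary choices.

    ```python
    import numpy as np

    def plugin_entropy(counts):
        """Maximum-likelihood ("plug-in") Shannon entropy in nats from a vector
        of observed counts. Biased low when the sample is sparse relative to
        the number of possible outcomes."""
        counts = np.asarray(counts, dtype=float)
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log(p))

    # A uniform distribution over 100 symbols has entropy ln(100) ~ 4.605 nats,
    # but a 30-observation sample can never exhibit more than ln(30) of it.
    rng = np.random.default_rng(1)
    sample = rng.integers(0, 100, size=30)
    est = plugin_entropy(np.bincount(sample, minlength=100))
    true_h = np.log(100)
    ```

    The gap between `est` and `true_h` is the sparse-sampling bias that hierarchical Bayesian estimators are designed to correct.
    
    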

  11. Subjects of Sparse matrix technology

    • workwithdata.com
    Updated Jul 13, 2024
    Cite
    Work With Data (2024). Subjects of Sparse matrix technology [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=book&fop0=%3D&fval0=Sparse+matrix+technology
    Explore at:
    Dataset updated
    Jul 13, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects and is filtered where the book is Sparse matrix technology, featuring 10 columns including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).

  12. Grating Lobes and Spatial Aliasing in Sparse Array Beampatterns

    • catalog.data.gov
    • gimi9.com
    • +2 more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). Grating Lobes and Spatial Aliasing in Sparse Array Beampatterns [Dataset]. https://catalog.data.gov/dataset/grating-lobes-and-spatial-aliasing-in-sparse-array-beampatterns
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Calculated beam pattern in Fourier space of a unitary input given two sparsely sampled synthetic aperture arrays: 1. a regularly spaced array sampled at 2*lambda, where lambda is the wavelength of the 40 GHz signal, and 2. the regularly spaced array with random perturbations (of order ~<lambda) to the (x,y) spatial location of each sample point. This dataset is published in "An Overview of Advances in Signal Processing Techniques for Classical and Quantum Wideband Synthetic Apertures" by Vouras et al. in the IEEE Journal of Selected Topics in Signal Processing special issue on Recent Advances in Wideband Signal Processing for Classical and Quantum Synthetic Apertures.
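    The effect described, grating lobes from regular 2λ sampling and their suppression under sub-wavelength position jitter, can be sketched in one dimension. This is an illustrative reconstruction, not the NIST data or code; the element count and jitter scale are assumptions.

    ```python
    import numpy as np

    C = 299_792_458.0   # speed of light, m/s
    LAM = C / 40e9      # wavelength of the 40 GHz signal

    def beampattern(x, u):
        """|array factor| of element positions x (metres) over direction
        cosines u, assuming a unitary (all-ones) input as in the dataset."""
        return np.abs(np.exp(2j * np.pi * np.outer(u, x) / LAM).sum(axis=1))

    n = 32                                                 # assumed element count
    regular = 2 * LAM * np.arange(n)                       # 2*lambda spacing
    rng = np.random.default_rng(0)
    perturbed = regular + rng.uniform(-0.5, 0.5, n) * LAM  # jitter of order < lambda

    # Regular 2*lambda sampling aliases: the grating lobe at u = 0.5 reaches
    # the same height as the main lobe at u = 0; jitter smears it out.
    lobes_reg = beampattern(regular, np.array([0.0, 0.5]))
    lobes_pert = beampattern(perturbed, np.array([0.0, 0.5]))
    ```

    With regular spacing both values equal the element count n; the perturbed array keeps its main lobe but the former grating lobe drops, which is the spatial-aliasing trade-off the dataset quantifies.
    
    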

  13. Data from: Direction matching for sparse movement data sets: determining...

    • search.dataone.org
    Updated Dec 11, 2024
    + more versions
    Cite
    Bonnell, Tyler R.; Henzi, S. Peter; Barrett, Louise (2024). Data from: Direction matching for sparse movement data sets: determining interaction rules in social groups [Dataset]. https://search.dataone.org/view/sha256%3A55f5360b0607b9a7b17dc62d5e77d5c3dddacf213d2c7f13b5abdeeba295ab41
    Explore at:
    Dataset updated
    Dec 11, 2024
    Dataset provided by
    Borealis
    Authors
    Bonnell, Tyler R.; Henzi, S. Peter; Barrett, Louise
    Description

    Abstract: It is generally assumed that high-resolution movement data are needed to extract meaningful decision-making patterns of animals on the move. Here we propose a modified version of force matching (referred to here as direction matching), whereby sparse movement data (i.e., collected over minutes instead of seconds) can be used to test hypothesized forces acting on a focal animal based on their ability to explain observed movement. We first test the direction matching approach using simulated data from an agent-based model, and then go on to apply it to a sparse movement data set collected on a troop of baboons in the DeHoop Nature Reserve, South Africa. We use the baboon data set to test the hypothesis that an individual’s motion is influenced by the group as a whole or, alternatively, whether it is influenced by the location of specific individuals within the group. Our data provide support for both hypotheses, with stronger support for the latter. The focal animal showed consistent patterns of movement toward particular individuals when distance from these individuals increased beyond 5.6 m. Although the focal animal was also sensitive to the group movement on those occasions when the group as a whole was highly clustered, these conditions of isolation occurred infrequently. We suggest that specific social interactions may thus drive overall group cohesion. The results of the direction matching approach suggest that relatively sparse data, with low technical and economic costs, can be used to test between hypotheses on the factors driving movement decisions.

    Usage notes: XY coordinates of individual baboons. This dataset consists of 74 days of full-day follows of a baboon troop (Papio hamadryas ursinus) at the DeHoop Nature Reserve in South Africa. Individual GPS points of all adult group members (N=14) were collected continuously throughout the day by an observer walking repeatedly from one end of the group to the other. A GPS point was taken on all adults present in the group by holding the GPS receiver above the animal (or as close as possible) to record its position and individual identity. File: baboon_group_dehoop.csv

  14. National Forest and Sparse Woody Vegetation Data (Version 3, 2018 Release)

    • demo.dev.magda.io
    geotiff, pdf, wms +1
    Updated Jul 4, 2022
    + more versions
    Cite
    Australian Government Department of Climate Change, Energy, the Environment and Water (2022). National Forest and Sparse Woody Vegetation Data (Version 3, 2018 Release) [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-d734c65e-0e7b-4190-9aa5-ddbb5844e86d
    Explore at:
    Available download formats: zip, wms, geotiff, pdf
    Dataset updated
    Jul 4, 2022
    Dataset provided by
    Australian Government (http://www.australia.gov.au/)
    Description

    Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2018. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, potentially reaching 2 metres high and a minimum area of 0.2 hectares. Sparse woody is defined as woody vegetation with a canopy cover between 5-19 per cent. The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same approach in terms of time series processing (conditional probability networks) as the two-class method, to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.
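    The three-class definition can be written as a tiny rule on canopy cover alone. This is a hedged sketch of the stated thresholds, not the dataset's actual conditional-probability-network classifier, and it ignores the height and minimum-area criteria that also apply to the forest class.

    ```python
    def woody_class(canopy_cover_pct: float) -> str:
        """Classify land cover from canopy cover per the stated thresholds:
        forest >= 20 %, sparse woody 5-19 %, otherwise non-woody. (The real
        product additionally requires forests to potentially reach 2 m height
        over a minimum area of 0.2 ha.)"""
        if canopy_cover_pct >= 20:
            return "forest"
        if canopy_cover_pct >= 5:
            return "sparse woody"
        return "non-woody"
    ```
    
    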

  15. Replication Data for: Sparse Estimation and Uncertainty with Application to...

    • dataverse.harvard.edu
    Updated Oct 31, 2016
    Cite
    Marc Ratkovic; Dustin Tingley (2016). Replication Data for: Sparse Estimation and Uncertainty with Application to Subgroup Analysis [Dataset]. http://doi.org/10.7910/DVN/RNMB1Q
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Marc Ratkovic; Dustin Tingley
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Replication materials for Ratkovic and Tingley (2016), “Sparse Estimation and Uncertainty with Application to Subgroup Analysis.” All files, data, and scripts needed to generate the figures and results in the paper are in this archive. The zip file contains two sets of files: files for the Bechtel and Scheve (2013) replication and files for replicating the simulation study.

  16. SPM script for sparse fMRI data processing single subject analysis

    • osf.io
    Updated Jan 31, 2023
    Cite
    Pradeep D (2023). SPM script for sparse fMRI data processing single subject analysis [Dataset]. http://doi.org/10.17605/OSF.IO/P825K
    Explore at:
    Dataset updated
    Jan 31, 2023
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Pradeep D
    License

    GNU General Public License v3.0: http://www.gnu.org/licenses/gpl-3.0.txt

    Description

    MATLAB script using SPM12 batch for performing single subject analysis on sparse fMRI data

  17. Performance measurements for "Bringing Order to Sparsity: A Sparse Matrix...

    • zenodo.org
    zip
    Updated Apr 17, 2023
    Cite
    James D. Trotter (2023). Performance measurements for "Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs" [Dataset]. http://doi.org/10.5281/zenodo.7821491
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    James D. Trotter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The paper "Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs" compares various strategies for reordering sparse matrices. The purpose of reordering is to improve performance of sparse matrix operations, for example, by reducing fill-in resulting from sparse Cholesky factorisation or improving data locality in sparse matrix-vector multiplication (SpMV). Many reordering strategies have been proposed in the literature and the current paper provides a thorough comparison of several of the most popular methods.

    This comparison is based on performance measurements collected on the eX3 cluster, a Norwegian experimental research infrastructure for the exploration of exascale computing. The performance measurements gathered in the data set provided here relate in particular to the performance of two SpMV kernels with respect to 490 sparse matrices, 6 matrix orderings and 8 multicore CPUs.

    Experimental results are provided in a human-readable, tabular format using plain-text ASCII. This format may be readily consumed by gnuplot to create plots or imported into commonly used spreadsheet tools for further analysis.

    Performance measurements are provided based on an SpMV kernel using the compressed sparse row (CSR) storage format with 7 matrix orderings. One file is provided for each of 8 multicore CPU systems considered in the paper:

    1. Skylake: csr_all_xeongold16q_032_threads_ss490.txt
    2. Ice Lake: csr_all_habanaq_072_threads_ss490.txt
    3. Naples: csr_all_defq_064_threads_ss490.txt
    4. Rome: csr_all_rome16q_016_threads_ss490.txt
    5. Milan A: csr_all_fpgaq_048_threads_ss490.txt
    6. Milan B: csr_all_milanq_128_threads_ss490.txt
    7. TX2: csr_all_armq_064_threads_ss490.txt
    8. Hi1620: csr_all_huaq_128_threads_ss490.txt

    A corresponding set of files and performance measurements are provided for a second SpMV kernel that is also studied in the paper.

    Each file consists of 490 rows and 54 columns. Each row corresponds to a different matrix from the SuiteSparse Matrix Collection (https://sparse.tamu.edu/). The first 5 columns specify some general information about the matrix, such as its group and name, as well as the number of rows, columns and nonzeros. Column 6 specifies the number of threads used for the experiment (which depends on the CPU). The remaining columns are grouped according to the 7 different matrix orderings that were studied, in the following order: original, Reverse Cuthill-McKee (RCM), Nested Dissection (ND), Approximate Minimum Degree (AMD), Graph Partitioning (GP), Hypergraph Partitioning (HP), and Gray ordering. For each ordering, the following 7 columns are given:


    1. Minimum number of nonzeros processed by any thread by the SpMV kernel
    2. Maximum number of nonzeros processed by any thread by the SpMV kernel
    3. Mean number of nonzeros processed per thread by the SpMV kernel
    4. Imbalance factor, which is the ratio of the maximum to the mean number of nonzeros processed per thread by the SpMV kernel
    5. Time (in seconds) to perform a single SpMV iteration; this was measured by taking the minimum out of 100 SpMV iterations performed
    6. Maximum performance (in Gflop/s) for a single SpMV iteration; this was measured by taking twice the number of matrix nonzeros and dividing by the minimum time out of 100 SpMV iterations performed.
    7. Mean performance (in Gflop/s) for a single SpMV iteration; this was measured by taking twice the number of matrix nonzeros and dividing by the mean time of the 97 last SpMV iterations performed (i.e., the first 3 SpMV iterations are ignored).
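
As a sketch of how these columns might be consumed programmatically (assuming whitespace-delimited fields laid out as described above; the helper names are illustrative and not part of the data set):

```python
def parse_row(line):
    """Split one data row into the 5 general fields, the thread count,
    and 7 groups of 7 per-ordering statistics (0-based indexing)."""
    fields = line.split()
    general = fields[:5]                      # group, name, rows, columns, nonzeros
    threads = int(fields[5])                  # column 6: thread count
    groups = [fields[6 + 7*k : 6 + 7*(k + 1)] for k in range(7)]
    return general, threads, groups

def imbalance(max_nnz, mean_nnz):
    # Item 4: ratio of the maximum to the mean nonzeros per thread.
    return max_nnz / mean_nnz

def gflops(nnz, seconds):
    # Items 6-7: 2 * nonzeros flops per SpMV iteration, reported in Gflop/s.
    return 2.0 * nnz / seconds / 1e9
```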

    The results in Fig. 1 of the paper show speedup (or slowdown) resulting from reordering with respect to 3 reorderings and 3 selected matrices. These results can be reproduced by inspecting the performance results that were collected on the Milan B and Ice Lake systems for the three matrices Freescale/Freescale2, SNAP/com-Amazon and GenBank/kmer_V1r. Specifically, the numbers displayed in the figure are obtained by dividing the maximum performance measured for the respective orderings (i.e., RCM, ND and GP) by the maximum performance measured for the original ordering.
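
The speedup computation described above can be sketched as follows (a hedged illustration assuming the per-ordering column layout described earlier, with the maximum Gflop/s as the sixth value of each seven-column group; the function names are hypothetical):

```python
# Per-ordering groups start at column 6 (0-based) and appear in this order.
ORDERINGS = ["original", "RCM", "ND", "AMD", "GP", "HP", "Gray"]

def max_perf(fields, ordering):
    """Maximum Gflop/s for one ordering, taken from a whitespace-split row."""
    k = ORDERINGS.index(ordering)
    return float(fields[6 + 7*k + 5])

def speedup(fields, ordering):
    # Fig. 1: a reordering's best performance divided by the original's.
    return max_perf(fields, ordering) / max_perf(fields, "original")
```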

    The results presented in Figs. 2 and 3 of the paper show the speedup of SpMV as a result of reordering for the two SpMV kernels considered in the paper. In this case, gnuplot scripts are provided to reproduce the figures from the data files described above.

  18. Fast sparse matrix multiplication

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1992
    Cite
    S.C. Park (1992). Fast sparse matrix multiplication [Dataset]. http://doi.org/10.17632/ydtrxpr4vw.1
    Explore at:
    Dataset updated
    Jan 1, 1992
    Authors
    S.C. Park
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Abstract: A new space-efficient representation for sparse matrices is introduced and a fast sparse matrix multiplication algorithm based on the new representation is presented. The scheme is very efficient when the nonzero elements of a sparse matrix are partially or fully adjacent to one another as in band or triangular matrices. The space complexity of the new representation is better than that of existing algorithms when the number of sets of adjacent nonzero elements, called segments, is less than ...
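
The segment notion above (a maximal run of adjacent nonzero elements) can be illustrated with a minimal sketch; this is not the program's actual data structure, only a way of counting the segments in one matrix row:

```python
def count_segments(row):
    """Count maximal runs of adjacent nonzero entries in a row."""
    segments = 0
    in_segment = False
    for value in row:
        if value != 0 and not in_segment:
            segments += 1       # a new run of adjacent nonzeros begins
            in_segment = True
        elif value == 0:
            in_segment = False  # the current run (if any) has ended
    return segments
```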

    Title of program: SMM Catalogue Id: ACHR_v1_0

    Nature of problem: Sparse matrix multiplication often arises in scientific computations. Since a sparse matrix includes many zero elements, the multiplication should not be handled in the same way as for dense matrices. The standard matrix multiplication algorithm for n x n factor matrices, represented in the usual two-dimensional array form, takes O(n^3) time. This means that when the factor matrices are very large, e.g., 1000's x 1000's, not only will the computation time be excessively long but the demands on ...

    Versions of this program held in the CPC repository in Mendeley Data ACHR_v1_0; SMM; 10.1016/0010-4655(92)90116-G

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  19. National Forest and Sparse Woody Vegetation Data (Version 8.0 - 2023...

    • researchdata.edu.au
    Updated Aug 9, 2024
    + more versions
    Cite
    Australian Government Department of Climate Change, Energy, the Environment and Water (2024). National Forest and Sparse Woody Vegetation Data (Version 8.0 - 2023 Release) [Dataset]. https://researchdata.edu.au/national-forest-sparse-2023-release/3381312
    Explore at:
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    data.gov.au
    Authors
    Australian Government Department of Climate Change, Energy, the Environment and Water
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Landsat satellite imagery is used to derive woody vegetation extent products that discriminate between forest, sparse woody and non-woody land cover across a time series from 1988 to 2023. A forest is defined as woody vegetation with a minimum 20 per cent canopy cover, at least 2 metres high and a minimum area of 0.2 hectares. Note that this product is not filtered by the 0.2 ha criterion for forest, to allow for flexibility in different use cases. Filtering to remove areas of less than 0.2 ha is undertaken in downstream processing for the purposes of Australia's National Inventory Reports. Sparse woody is defined as woody vegetation with a canopy cover between 5 and 19 per cent.

    The three-class classification (forest, sparse woody and non-woody) supersedes the two-class classification (forest and non-forest) from 2016. The new classification is produced using the same time-series processing approach (conditional probability networks) as the two-class method to detect woody vegetation cover. The three-class algorithm better encompasses the different types of woody vegetation across the Australian landscape.

    Unlike previous versions of the National Forest and Sparse Woody Vegetation data releases, in which 35 tiles were released as part of the product, only the 25 southern tiles are supplied in this release. The 10 northern tiles will be released as a separate product later in the financial year, as they are subject to a methodological change associated with the adoption of the Sentinel sensor and will be supplied at a different resolution. Please see the National Forest and Sparse Woody Vegetation data metadata PDF (Version 8.0 - 2023 release) for more information.
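
The class thresholds stated above can be summarised as a simple decision rule. This is only an illustrative sketch of the published definitions (edge cases and the 0.2 ha area filter, which is applied downstream, are out of scope here), not the conditional-probability-network algorithm actually used:

```python
def classify_woody(canopy_cover_pct, height_m):
    """Apply the stated per-pixel thresholds: forest needs >= 20 per cent
    canopy cover and >= 2 m height; sparse woody is 5-19 per cent cover;
    everything else is non-woody."""
    if canopy_cover_pct >= 20 and height_m >= 2:
        return "forest"
    if 5 <= canopy_cover_pct < 20:
        return "sparse woody"
    return "non-woody"
```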

  20. Data from: Generating fast sparse matrix vector multiplication from a high...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jun 2, 2022
    Cite
    Federico Pizzuti; Michel Steuwer; Christophe Dubach (2022). Generating fast sparse matrix vector multiplication from a high level generic functional IR [Dataset]. http://doi.org/10.5061/dryad.wstqjq2gs
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Federico Pizzuti; Michel Steuwer; Christophe Dubach
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Usage of high-level intermediate representations promises the generation of fast code from a high-level description, improving the productivity of developers while achieving the performance traditionally only reached with low-level programming approaches.

    High-level IRs come in two flavors:
    1) domain-specific IRs designed to express computations only for a specific application area; or
    2) generic high-level IRs that can be used to generate high-performance code across many domains.
    Developing generic IRs is more challenging but offers the advantage of reusing a common compiler infrastructure across various applications.

    In this paper, we extend a generic high-level IR to enable efficient computation with sparse data structures.
    Crucially, we encode the sparse representation using reusable dense building blocks already present in the high-level IR.
    We use a form of dependent types to model sparse matrices in CSR format, explicitly expressing the relationship between multiple dense arrays that separately store the lengths of rows, the column indices, and the non-zero values of the matrix.
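
The CSR layout described above, with separate dense arrays for row extents, column indices and nonzero values, can be illustrated by a plain SpMV sketch (a generic textbook formulation, not the paper's IR; row extents are given here as a row-pointer array):

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a CSR matrix: the entries of row i live at
    positions row_ptr[i] .. row_ptr[i+1]-1 of col_idx and values."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[j] * x[col_idx[j]]
    return y
```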

    We demonstrate that we achieve high performance compared to sparse low-level library code using our extended generic high-level code generator.
    On an Nvidia GPU, we outperform the highly tuned Nvidia cuSPARSE implementation of SpMV across 28 sparse matrices of varying sparsity, by 1.7× on average.
