100+ datasets found
  1. Performance comparison for user cold-start problem on BE-dataset.

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Syed Irteza Hussain Jafri; Rozaida Ghazali; Irfan Javid; Zahid Mahmood; Abdullahi Abdi Abubakar Hassan (2023). Performance comparison for user cold-start problem on BE-dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0273486.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Syed Irteza Hussain Jafri; Rozaida Ghazali; Irfan Javid; Zahid Mahmood; Abdullahi Abdi Abubakar Hassan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison for user cold-start problem on BE-dataset.

  2. Statistics of sample data from the Last.fm and Douban datasets.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Waleed Reafee; Naomie Salim; Atif Khan (2023). Statistics of sample data from the Last.fm and Douban datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0154848.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Waleed Reafee; Naomie Salim; Atif Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics of sample data from the Last.fm and Douban datasets.

  3. Data from: Sparse Inverse Gaussian Process Regression with Application to...

    • catalog.data.gov
    • data.nasa.gov
    • +2more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Sparse Inverse Gaussian Process Regression with Application to Climate Network Discovery [Dataset]. https://catalog.data.gov/dataset/sparse-inverse-gaussian-process-regression-with-application-to-climate-network-discovery
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finances. Gaussian Process regression is a popular technique for modeling the input-output relations of a set of variables under the assumption that the weight vector has a Gaussian prior. However, it is challenging to apply Gaussian Process regression to large data sets since prediction based on the learned model requires inversion of an order n kernel matrix. Approximate solutions for sparse Gaussian Processes have been proposed for sparse problems. However, in almost all cases, these solution techniques are agnostic to the input domain and do not preserve the similarity structure in the data. As a result, although these solutions sometimes provide excellent accuracy, the models do not have interpretability. Such interpretable sparsity patterns are very important for many applications. We propose a new technique for sparse Gaussian Process regression that allows us to compute a parsimonious model while preserving the interpretability of the sparsity structure in the data. We discuss how the inverse kernel matrix used in Gaussian Process prediction gives valuable domain information and then adapt the inverse covariance estimation from Gaussian graphical models to estimate the Gaussian kernel. We solve the optimization problem using the alternating direction method of multipliers that is amenable to parallel computation. We demonstrate the performance of our method in terms of accuracy, scalability and interpretability on a climate data set.
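
    The description above adapts sparse inverse-covariance (precision) estimation from Gaussian graphical models to the GP kernel. As a hedged, minimal illustration of that ingredient only (not the authors' implementation), scikit-learn's graphical lasso estimates a sparse precision matrix whose non-zero entries carry the interpretable structure the abstract mentions; the data and variable names here are hypothetical.

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))            # 200 samples of 10 hypothetical climate variables

    # l1-penalised inverse-covariance estimate; non-zero precision entries indicate
    # conditional dependencies between variables.
    glasso = GraphicalLasso(alpha=0.2).fit(X)
    precision = glasso.precision_
    print(int((np.abs(precision) > 1e-6).sum()), "non-zero entries out of", precision.size)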

  4. Data from: Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach

    • catalog.data.gov
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Sparse Solutions for Single Class SVMs: A Bi-Criterion Approach [Dataset]. https://catalog.data.gov/dataset/sparse-solutions-for-single-class-svms-a-bi-criterion-approach
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In this paper we propose an innovative learning algorithm - a variation of the one-class ν Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
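
    For reference, a hedged sketch (synthetic data, not the paper's code) of the benchmark one-class ν-SVM named above, fit with scikit-learn; the number of support vectors it retains is exactly the kind of sparsity the proposed bi-criterion approach aims to reduce.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 5))                  # hypothetical "normal" training samples

    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X)
    print("support vectors retained:", clf.support_vectors_.shape[0])
    print("training samples flagged as inliers:", int((clf.predict(X) == 1).sum()))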

  5. Data from: MetaFlux: Meta-learning global carbon fluxes from sparse...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 14, 2024
    Cite
    Liu, Jiangong (2024). MetaFlux: Meta-learning global carbon fluxes from sparse spatiotemporal observations [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7761880
    Explore at:
    Dataset updated
    Apr 14, 2024
    Dataset provided by
    Nathaniel, Juan
    Gentine, Pierre
    Liu, Jiangong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MetaFlux is a global, long-term carbon flux dataset of gross primary production and ecosystem respiration that is generated using meta-learning. The principle of meta-learning stems from the need to solve the problem of learning in the face of sparse data availability. Data sparsity is a prevalent challenge in climate and ecology science. For instance, in-situ observations tend to be spatially and temporally sparse. This issue can arise from sensor malfunctions, limited sensor locations, or non-ideal climate conditions such as persistent cloud cover. The lack of high-quality continuous data can make it difficult to understand many climate processes that are otherwise critical. The machine-learning community has attempted to tackle this problem by developing several learning approaches, including meta-learning, which learns how to learn broad features across tasks to better infer other poorly sampled ones. In this work, we applied meta-learning to solve the problem of upscaling continuous carbon fluxes from sparse observations. Data scarcity in carbon flux applications is particularly problematic in the tropics and semi-arid regions, where only around 8–11% of long-term eddy covariance stations are currently operational. Unfortunately, these regions are important in modulating the global carbon cycle and its interannual variability. In general, we find that meta-trained machine learning models, including multi-layer perceptrons (MLP), long short-term memory (LSTM), and bi-directional LSTM (BiLSTM), have lower validation errors on flux estimates by 9–16% when compared to their non-meta-trained counterparts. In addition, meta-trained models are more robust to extreme conditions, with 4–24% lower overall errors. Finally, we use an ensemble of meta-trained deep networks to upscale in-situ observations of ecosystem-scale photosynthesis and respiration fluxes into daily and monthly global products at a 0.25-degree spatial resolution from 2001 to 2023, called "MetaFlux". We also checked the seasonality, interannual variability, and correlation to solar-induced fluorescence of the upscaled product, and found that MetaFlux outperformed state-of-the-art machine learning upscaling models, especially in critical semi-arid and tropical regions.
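
    Purely as a hedged illustration of one of the model families listed above (PyTorch, with hypothetical input sizes rather than the authors' training setup), a bidirectional LSTM regressor maps a sequence of meteorological drivers to a per-time-step flux estimate:

    import torch
    import torch.nn as nn

    class BiLSTMRegressor(nn.Module):
        def __init__(self, n_features: int, hidden: int = 64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, 1)   # one flux value per time step

        def forward(self, x):                      # x: (batch, time, n_features)
            out, _ = self.lstm(x)                  # out: (batch, time, 2 * hidden)
            return self.head(out)

    model = BiLSTMRegressor(n_features=8)          # 8 hypothetical driver variables
    print(model(torch.randn(4, 30, 8)).shape)      # torch.Size([4, 30, 1])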

  6. The performance comparison based on Last.fm dataset.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Waleed Reafee; Naomie Salim; Atif Khan (2023). The performance comparison based on Last.fm dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0154848.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Waleed Reafee; Naomie Salim; Atif Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The performance comparison based on Last.fm dataset.

  7. Parameters used for the AD methods.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Vyron Christodoulou; Yaxin Bi; George Wilkie (2023). Parameters used for the AD methods. [Dataset]. http://doi.org/10.1371/journal.pone.0212098.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Vyron Christodoulou; Yaxin Bi; George Wilkie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters used for the AD methods.

  8. Replication Data for: Sparse multi-trait genomic prediction under incomplete...

    • data.moa.gov.et
    html
    Updated Jan 20, 2025
    Cite
    CIMMYT Ethiopia (2025). Replication Data for: Sparse multi-trait genomic prediction under incomplete block designs [Dataset]. https://data.moa.gov.et/dataset/hdl-11529-10548787
    Explore at:
    Available download formats: html
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    CIMMYT Ethiopia
    Description

    The efficiency of genomic selection methodologies can be increased by sparse testing where a subset of materials are evaluated in different environments. Seven different multi-environment plant breeding datasets were used to evaluate four different methods for allocating lines to environments in a multi-trait genomic prediction problem. The results of the analysis are presented in the accompanying article.
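
    As a hedged illustration of the sparse-testing idea only (not one of the four allocation methods evaluated in this dataset), the simplest design assigns each line to a random subset of environments, producing an incomplete line-by-environment incidence matrix; all sizes below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(42)
    n_lines, n_envs, envs_per_line = 100, 7, 3      # hypothetical breeding-trial sizes

    incidence = np.zeros((n_lines, n_envs), dtype=int)
    for i in range(n_lines):
        incidence[i, rng.choice(n_envs, size=envs_per_line, replace=False)] = 1

    print("plots grown:", int(incidence.sum()), "of", n_lines * n_envs, "possible line-environment cells")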

  9. Training and testing set for E-Commerce product images dataset.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Syed Irteza Hussain Jafri; Rozaida Ghazali; Irfan Javid; Zahid Mahmood; Abdullahi Abdi Abubakar Hassan (2023). Training and testing set for E-Commerce product images dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0273486.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Syed Irteza Hussain Jafri; Rozaida Ghazali; Irfan Javid; Zahid Mahmood; Abdullahi Abdi Abubakar Hassan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Training and testing set for E-Commerce product images dataset.

  10. Sparse Partial Least Squares in Time Series for Macroeconomic Forecasting...

    • jda-test.zbw.eu
    txt
    Updated Nov 8, 2022
    Cite
    Julieta Fuentes; Pilar Poncela; Julio Rodríguez (2022). Sparse Partial Least Squares in Time Series for Macroeconomic Forecasting (replication data) [Dataset]. https://jda-test.zbw.eu/dataset/sparse-partial-least-squares-in-time-series-for-macroeconomic-forecasting
    Explore at:
    Available download formats: txt(417806), txt(413218), txt(1525)
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Julieta Fuentes; Pilar Poncela; Julio Rodríguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Factor models have been applied extensively for forecasting when high-dimensional datasets are available. In this case, the number of variables can be very large. For instance, usual dynamic factor models in central banks handle over 100 variables. However, there is a growing body of literature indicating that more variables do not necessarily lead to estimated factors with lower uncertainty or better forecasting results. This paper investigates the usefulness of partial least squares techniques that take into account the variable to be forecast when reducing the dimension of the problem from a large number of variables to a smaller number of factors. We propose different approaches of dynamic sparse partial least squares as a means of improving forecast efficiency by simultaneously taking into account the variable forecast while forming an informative subset of predictors, instead of using all the available ones to extract the factors. We use the well-known Stock and Watson database to check the forecasting performance of our approach. The proposed dynamic sparse models show good performance in improving efficiency compared to widely used factor methods in macroeconomic forecasting.
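
    The building block the paper extends is ordinary partial least squares, which extracts a small number of factors with the forecast target taken into account. A minimal, hedged sketch with synthetic data (not the Stock and Watson panel) using scikit-learn:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    X = rng.standard_normal((240, 120))             # 240 monthly observations, 120 candidate predictors
    y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(240)

    pls = PLSRegression(n_components=3).fit(X, y)   # factors extracted with the target in mind
    print("in-sample R^2:", round(pls.score(X, y), 3))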

  11. Min-Cut/Max-Flow Problem Instances for Benchmarking

    • zenodo.org
    • data.dtu.dk
    txt, zip
    Updated Dec 5, 2022
    Cite
    Patrick M. Jensen; Niels Jeppesen; Anders B. Dahl; Vedrana A. Dahl (2022). Min-Cut/Max-Flow Problem Instances for Benchmarking [Dataset]. http://doi.org/10.5281/zenodo.4905882
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick M. Jensen; Niels Jeppesen; Anders B. Dahl; Vedrana A. Dahl
    Description

    NOTE: This dataset is now outdated. Please see https://doi.org/10.11583/DTU.17091101 for the updated version with many more problems.

    This is a collection of min-cut/max-flow problem instances that can be used for benchmarking min-cut/max-flow algorithms. The collection is released in companionship with the paper:

    • Jensen et al., "Review of Serial and Parallel Min-Cut/Max-Flow Algorithms for Computer Vision", T-PAMI, 2022.

    The problem instances are collected from a wide selection of sources to be as representative as possible. Specifically, this collection contains:

    • Most of the problem instances (some are unavailable due to dead links) published by the University of Waterloo: https://vision.cs.uwaterloo.ca/data/maxflow
    • Super-resolution, texture restoration, deconvolution, decision tree field (DTF) and automatic labeling environment from Verma & Batra, "MaxFlow Revisited: An Empirical Comparison of Maxflow Algorithms for Dense Vision Problems", 2012, BMVC
    • Sparse Layered Graph instances from Jeppesen et al., "Sparse Layered Graphs for Multi-Object Segmentation", 2020, CVPR
    • Multi-object surface fitting from Jensen et al., "Multi-Object Graph-Based Segmentation With Non-Overlapping Surfaces", 2020, CVPRW

    The reason for releasing this collection is to provide a single place to download all datasets used in our paper (and various previous papers) instead of having to scavenge from multiple sources. Furthermore, several of the problem instances typically used for benchmarking min-cut/max-flow algorithms are no longer available at their original locations and may be difficult to find. By storing the data on Zenodo with a dedicated DOI we hope to avoid this. For license information, see below.

    Files and formats

    We provide all problem instances in two file formats: DIMACS and a custom binary format. Both are described below. Each file has been zipped, and similar files have then been grouped into their own zip file (i.e. it is a zip of zips). DIMACS files have been prefixed with `dimacs_` and binary files have been prefixed with `bin_`.

    DIMACS

    All problem instances are available in DIMACS format (explained here: http://lpsolve.sourceforge.net/5.5/DIMACS_maxf.htm).

    For the larger problem instances, we have also published a partition of the graph nodes into blocks for block-based parallel min-cut/max-flow. The partition matches the one used in the companion review paper (Jensen et al., 2021). Each partition is given as a plain-text file listing the number of blocks on the first line, followed by the block index assigned to each node, for example:

    4
    0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3 0 0 1 1 2 2 3 3
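
    A hedged Python sketch for reading such a partition file, assuming (as the example above suggests) that the first line holds the number of blocks and the second line one block index per node:

    def read_text_partition(path):
        with open(path) as f:
            num_blocks = int(f.readline())
            node_blocks = [int(tok) for tok in f.readline().split()]
        return num_blocks, node_blocks

    # For the example above this returns (4, [0, 0, 1, 1, 2, 2, 3, 3, ...]).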

    Binary files

    While DIMACS has the advantage of being human-readable, storing everything as text requires a lot of space. This makes the files unnecessarily large and slow to parse. To overcome this, we also release all problem instances in a simple binary storage format. We have two formats: one for graphs and one for quadratic pseudo-boolean optimization (QPBO) problems. Code to convert to/from DIMACS is also available at: https://www.doi.org/10.5281/zenodo.4903946 or https://github.com/patmjen/maxflow_algorithms.

    Binary BK (`.bbk`) files are for storing normal graphs for min-cut/max-flow. They closely follow the internal storage format used in the original implementation of the Boykov-Kolmogorov algorithm, meaning that terminal arcs are stored in a separate list from normal neighbor arcs. The format is:

    Uncompressed:

    Header: (3 x uint8) 'BBQ'
    Types codes: (2 x uint8) captype, tcaptype
    Sizes: (3 x uint64) num_nodes, num_terminal_arcs, num_neighbor_arcs
    Terminal arcs: (num_terminal_arcs x BkTermArc)
    Neighbor arcs: (num_neighbor_arcs x BkNborArc)

    Compressed (using Google's snappy: https://github.com/google/snappy):

    Header: (3 x uint8) 'bbq'
    Types codes: (2 x uint8) captype, tcaptype
    Sizes: (3 x uint64) num_nodes, num_terminal_arcs, num_neighbor_arcs
    Terminal arcs: (1 x uint64) compressed_bytes_1
            (compressed_bytes_1 x uint8) compressed num_terminal_arcs x BkTermArc
    Neighbor arcs: (1 x uint64) compressed_bytes_2
            (compressed_bytes_2 x uint8) compressed num_neighbor_arcs x BkNborArc
    

    Where:

    /** Enum for switching over POD types. */
    enum TypeCode : uint8_t {
      TYPE_UINT8,
      TYPE_INT8,
      TYPE_UINT16,
      TYPE_INT16,
      TYPE_UINT32,
      TYPE_INT32,
      TYPE_UINT64,
      TYPE_INT64,
      TYPE_FLOAT,
      TYPE_DOUBLE,
      TYPE_INVALID = 0xFF
    };
    
    /** Terminal arc with source and sink capacity for given node. */
    template 
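
    A hedged Python sketch that parses only the fields spelled out above for a binary BK file (magic bytes, two type codes, three uint64 sizes). Little-endian byte order is an assumption, and the arc records themselves are not decoded here since their exact layout depends on the template types:

    import struct

    def read_bbk_header(path):
        with open(path, "rb") as f:
            magic = f.read(3)
            if magic not in (b"BBQ", b"bbq"):
                raise ValueError("not a binary BK file")
            captype, tcaptype = struct.unpack("<2B", f.read(2))       # type codes
            num_nodes, num_term, num_nbor = struct.unpack("<3Q", f.read(24))
        return {"compressed": magic == b"bbq", "captype": captype, "tcaptype": tcaptype,
                "num_nodes": num_nodes, "num_terminal_arcs": num_term, "num_neighbor_arcs": num_nbor}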

    Binary QPBO (`.bq`) files are for storing QPBO problems. Unary and binary terms are stored in separate lists. The format is:

    Uncompressed:

    Header: (5 x uint8) 'BQPBO'
    Types codes: (1 x uint8) captype
    Sizes: (3 x uint64) num_nodes, num_unary_terms, num_binary_terms
    Unary arcs: (num_unary_terms x BkUnaryTerm)
    Binary arcs: (num_binary_terms x BkBinaryTerm)

    Compressed (using Google's snappy: https://github.com/google/snappy):

    Header: (5 x uint8) 'bqpbo'
    Types codes: (1 x uint8) captype
    Sizes: (3 x uint64) num_nodes, num_unary_terms, num_binary_terms
    Unary terms: (1 x uint64) compressed_bytes_1
           (compressed_bytes_1 x uint8) compressed num_unary_terms x BkUnaryTerm
    Binary terms: (1 x uint64) compressed_bytes_2
           (compressed_bytes_2 x uint8) compressed num_binary_terms x BkBinaryTerm
    

    Where:

    /** Enum for switching over POD types. */
    enum TypeCode : uint8_t {
      TYPE_UINT8,
      TYPE_INT8,
      TYPE_UINT16,
      TYPE_INT16,
      TYPE_UINT32,
      TYPE_INT32,
      TYPE_UINT64,
      TYPE_INT64,
      TYPE_FLOAT,
      TYPE_DOUBLE,
      TYPE_INVALID = 0xFF
    };
    
    /** Unary term */
    template 

    Block (`.blk`) files are for storing a partition of the graph nodes into disjoint blocks. The format is:

    Nodes: uint64_t num_nodes
    Blocks: uint16_t num_blocks
    Data: (num_nodes x uint16_t) node_blocks
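
    A hedged Python sketch of a reader for this `.blk` layout (byte order assumed little-endian):

    import struct
    import numpy as np

    def read_blk(path):
        with open(path, "rb") as f:
            num_nodes = struct.unpack("<Q", f.read(8))[0]     # uint64 node count
            num_blocks = struct.unpack("<H", f.read(2))[0]    # uint16 block count
            node_blocks = np.fromfile(f, dtype="<u2", count=num_nodes)
        return num_blocks, node_blocks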

    We do not claim ownership over the problem instances from the University of Waterloo and those from (Verma and Batra, 2012). Please contact the original sources for additional information. We publish the datasets from (Jeppesen et al., 2020) and (Jensen et al., 2020) under the Creative Commons Attribution 4.0 International (CC BY 4.0), see https://creativecommons.org/licenses/by/4.0/. The specific files are (not counting their `dimacs_`/`bin_` prefix):

    University of Waterloo (consult original sources)

    • adhead.zip
    • babyface.zip
    • bone.zip
    • bone_sub.zip
    • liver.zip
    • BL06.zip
    • BVZ.zip
    • LB07.zip
    • KZ2.zip

    Verma & Batra, 2012 (consult original sources):

    • ale.zip
    • dtf.zip
    • deconv.zip
    • super_res.zip
    • texture.zip

    Jeppesen et al., 2020 (CC BY 4.0):

    • NT32_tomo3.zip
    • NT32_tomo3_216.zip

    Jensen et al., 2020 (CC BY 4.0):

    • cells.zip
    • foam.zip
    • simcells.zip
  12. Sparsity-constrained wavefront optimization by leveraging complex media

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated May 31, 2024
    Cite
    Li-Yu Yu; Sixian You (2024). Sparsity-constrained wavefront optimization by leveraging complex media [Dataset]. http://doi.org/10.5061/dryad.wdbrv15wk
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2024
    Dataset provided by
    Massachusetts Institute of Technology
    Authors
    Li-Yu Yu; Sixian You
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Wavefront shaping gains increasing importance in complex photonics, which can manipulate light spatially and temporally to counter the scattering effect. Important applications include deep-tissue imaging, microendoscopy, optical communications, nanofabrication, and remote sensing. However, high-speed and high-fidelity wavefront shaping is fundamentally hindered by the dimensionality limitation of hardware devices, evinced by the competition between the frame rate, pixel count, and modulation depth. To overcome the speed-fidelity tradeoff, we leverage complex media (e.g., diffusers or multimode fibers) as analogue random multiplexers for pattern compression to address the demand for high-dimensional spatiotemporal control. Sparsity-constrained wavefront optimization is designed to solve the problem by seeking a low-dimensional, robust representation of wavefronts with a carefully designed sparsity constraint. This optimization framework can achieve high-fidelity wavefront shaping through complex media using high-speed, yet relatively low-precision spatial light modulation devices (e.g., digital micromirror devices) without compromising the frame rate.

    Methods: The dataset contains an experimentally calibrated complex-field transmission matrix of a graded-index multimode fiber (GIF50C, Thorlabs) and 1000 preprocessed test images extracted from the Fashion-MNIST dataset. The script takes the preprocessed test images as the ground truth. It carries out the sparsity-constrained wavefront optimization to solve for the wavefront to generate those test images through the multimode fiber given its transmission matrix.
    In brief, the transmission matrix was measured by raster scanning the proximal end of the multimode fiber using a DMD and recording the corresponding speckles at the distal end using off-axis holography. The test images were first downsampled and interpolated to match the coordinate of the distal end of the multimode fiber. Then, they were vectorized to a one-dimensional vector to comply with the format of the transmission matrix. The initial guesses were obtained by performing the Gerchberg-Saxton algorithm with 10 iterations. All the implementation details can be found in Section 4.3 of our paper https://arxiv.org/abs/2302.10254.
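
    A hedged numpy sketch of a Gerchberg-Saxton-style initialization of the kind mentioned above: given a transmission matrix T and a target output amplitude, alternate between imposing the target amplitude at the output and a phase-only constraint at the input. Shapes and variables are illustrative, not those of the dataset:

    import numpy as np

    def gerchberg_saxton(T, target_amplitude, n_iter=10, seed=0):
        rng = np.random.default_rng(seed)
        x = np.exp(1j * rng.uniform(0, 2 * np.pi, T.shape[1]))    # random phase-only input field
        for _ in range(n_iter):
            y = T @ x                                              # propagate through the fiber
            y = target_amplitude * np.exp(1j * np.angle(y))        # impose target image amplitude
            x = T.conj().T @ y                                     # back-propagate
            x = np.exp(1j * np.angle(x))                           # keep a phase-only (DMD-style) input
        return x

    T = (np.random.randn(256, 128) + 1j * np.random.randn(256, 128)) / np.sqrt(128)
    target = np.abs(np.random.randn(256))
    wavefront = gerchberg_saxton(T, target, n_iter=10)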

  13. Data Mining for IVHM using Sparse Binary Ensembles, Phase I

    • data.nasa.gov
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Data Mining for IVHM using Sparse Binary Ensembles, Phase I [Dataset]. https://data.nasa.gov/dataset/Data-Mining-for-IVHM-using-Sparse-Binary-Ensembles/qfus-evzq
    Explore at:
    Available download formats: xml, tsv, csv, application/rssxml, application/rdfxml, json
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    In response to NASA SBIR topic A1.05, "Data Mining for Integrated Vehicle Health Management", Michigan Aerospace Corporation (MAC) asserts that our unique SPADE (Sparse Processing Applied to Data Exploitation) technology meets a significant fraction of the stated criteria and has functionality that enables it to handle many applications within the aircraft lifecycle. SPADE distills input data into highly quantized features and uses MAC's novel techniques for constructing Ensembles of Decision Trees to develop extremely accurate diagnostic/prognostic models for classification, regression, clustering, anomaly detection and semi-supervised learning tasks. These techniques are currently being employed to do Threat Assessment for satellites in conjunction with researchers at the Air Force Research Lab. Significant advantages to this approach include: 1) completely data driven; 2) training and evaluation are faster than conventional methods; 3) operates effectively on huge datasets (> billion samples X > million features), 4) proven to be as accurate as state-of-the-art techniques in many significant real-world applications. The specific goals for Phase 1 will be to work with domain experts at NASA and with our partners Boeing, SpaceX and GMV Space Systems to delineate a subset of problems that are particularly well-suited to this approach and to determine requirements for deploying algorithms on platforms of opportunity.

  14. The basic statistics of using a PFNF algorithm on Douban datasets.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Waleed Reafee; Naomie Salim; Atif Khan (2023). The basic statistics of using a PFNF algorithm on Douban datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0154848.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Waleed Reafee; Naomie Salim; Atif Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The basic statistics of using a PFNF algorithm on Douban datasets.

  15. PubMed MultiLabel Text Classification Dataset MeSH - Dataset - CKAN

    • data.poltekkes-smg.ac.id
    Updated Oct 9, 2024
    Cite
    (2024). PubMed MultiLabel Text Classification Dataset MeSH - Dataset - CKAN [Dataset]. https://data.poltekkes-smg.ac.id/dataset/pubmed-multilabel-text-classification-dataset-mesh
    Explore at:
    Dataset updated
    Oct 9, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of approximately 50k research articles from the PubMed repository. The documents were manually annotated by biomedical experts with MeSH labels, and each article is described by 10-15 MeSH labels. A huge number of labels appear as MeSH major topics, which raises the issues of an extremely large output space and severe label sparsity. To address this, the labels in the dataset have been processed and mapped to their MeSH roots as described below.
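
    A hedged, minimal baseline sketch for this kind of multi-label task (scikit-learn, with toy documents and labels rather than the actual PubMed records): TF-IDF features feeding a one-vs-rest logistic model over binarised labels.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.preprocessing import MultiLabelBinarizer

    docs = ["myocardial infarction outcomes", "gene expression in tumours", "cardiac gene therapy trial"]
    labels = [["Cardiovascular Diseases"], ["Neoplasms", "Genetic Phenomena"],
              ["Cardiovascular Diseases", "Genetic Phenomena"]]

    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(labels)                   # one indicator column per root label
    X = TfidfVectorizer().fit_transform(docs)

    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
    print(mlb.inverse_transform(clf.predict(X)))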

  16. Data from: Local kernel regression and neural network approaches to the...

    • archive.materialscloud.org
    application/gzip, bin +1
    Updated Aug 10, 2021
    Cite
    Raimon Fabregat; Alberto Fabrizio; Edgar Engel; Benjamin Meyer; Veronika Juraskova; Michele Ceriotti; Clemence Corminboeuf (2021). Local kernel regression and neural network approaches to the conformational landscapes of oligopeptides [Dataset]. http://doi.org/10.24435/materialscloud:kp-82
    Explore at:
    Available download formats: text/markdown, bin, application/gzip
    Dataset updated
    Aug 10, 2021
    Dataset provided by
    Materials Cloud
    Authors
    Raimon Fabregat; Alberto Fabrizio; Edgar Engel; Benjamin Meyer; Veronika Juraskova; Michele Ceriotti; Clemence Corminboeuf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The application of machine learning to theoretical chemistry has made it possible to combine the accuracy of quantum chemical energetics with the thorough sampling of finite-temperature fluctuations. To reach this goal, a diverse set of methods has been proposed, ranging from simple linear models to kernel regression and highly nonlinear neural networks. Here we apply two widely different approaches to the same, challenging problem - the sampling of the conformational landscape of polypeptides at finite temperature. We develop a Local Kernel Regression (LKR) coupled with a supervised sparsity method and compare it with a more established approach based on Behler-Parrinello type Neural Networks. In the context of the LKR, we discuss how the supervised selection of the reference pool of environments is crucial to achieve accurate potential energy surfaces at a competitive computational cost and leverage the locality of the model to infer which chemical environments are poorly described by the DFTB baseline. We then discuss the relative merits of the two frameworks and perform Hamiltonian-reservoir replica-exchange Monte Carlo sampling and metadynamics simulations, respectively, to demonstrate that both frameworks can achieve converged and transferable sampling of the conformational landscape of complex and flexible biomolecules with comparable accuracy and computational cost.
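
    The LKR itself is not reproduced here, but the general idea of a kernel model built on a small reference set can be hedged with scikit-learn (a Nystroem approximation over 100 landmark points feeding a ridge regressor); the descriptors and targets are synthetic placeholders, not the oligopeptide data.

    import numpy as np
    from sklearn.kernel_approximation import Nystroem
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 30))      # hypothetical structural descriptors
    y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(1000)

    model = make_pipeline(
        Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0),  # 100 reference points
        Ridge(alpha=1e-3),
    )
    model.fit(X, y)
    print("train R^2:", round(model.score(X, y), 3))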

  17. Data from: Video-rate raman-based metabolic imaging by airy light-sheet...

    • data.nkn.uidaho.edu
    • verso.uidaho.edu
    • +1more
    Updated Feb 12, 2023
    Cite
    Andreas E. Vasdekis (2023). Data from: Video-rate raman-based metabolic imaging by airy light-sheet illumination and photon-sparse detection [Dataset]. http://doi.org/10.11578/1908656
    Explore at:
    Dataset updated
    Feb 12, 2023
    Dataset provided by
    University of Idaho
    Authors
    Andreas E. Vasdekis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data supporting manuscript submitted to PNAS: Video-Rate Raman-based Metabolic Imaging by Airy Light-Sheet Illumination and Photon-Sparse Detection. The data set includes: [1] raw data and [2] related images used in the analyses described within the manuscript. Despite its massive potential, Raman imaging represents just a modest fraction of all research and clinical microscopy to date. This is due to the ultralow Raman scattering cross-sections of most biomolecules that impose low-light or photon-sparse conditions. Bioimaging under such conditions is suboptimal, as it either results in ultralow frame rates or requires increased levels of irradiance. Here, we overcome this tradeoff by introducing Raman imaging that operates at both video rates and 1,000-fold lower irradiance than state-of-the-art methods. To accomplish this, we deployed a judiciously designed Airy light-sheet microscope to efficiently image large specimen regions. Further, we implemented subphoton per pixel image acquisition and reconstruction to confront issues arising from photon sparsity at just millisecond integrations. We demonstrate the versatility of our approach by imaging a variety of samples, including the three-dimensional (3D) metabolic activity of single microbial cells and the underlying cell-to-cell variability. To image such small-scale targets, we again harnessed photon sparsity to increase magnification without a field-of-view penalty, thus overcoming another key limitation in modern light-sheet microscopy.

  18. Comparison of the proposed model with baseline algorithms for RMSE on the...

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Syed Irteza Hussain Jafri; Rozaida Ghazali; Irfan Javid; Zahid Mahmood; Abdullahi Abdi Abubakar Hassan (2023). Comparison of the proposed model with baseline algorithms for RMSE on the basis of SR. [Dataset]. http://doi.org/10.1371/journal.pone.0273486.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Syed Irteza Hussain Jafri; Rozaida Ghazali; Irfan Javid; Zahid Mahmood; Abdullahi Abdi Abubakar Hassan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the proposed model with baseline algorithms for RMSE on the basis of SR.

  19. Data from: A Davidson program for finding a few selected extreme eigenpairs...

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1994
    Cite
    Andreas Stathopoulos (1994). A Davidson program for finding a few selected extreme eigenpairs of a large, sparse, real, symmetric matrix [Dataset]. http://doi.org/10.17632/tss3rwyynt.1
    Explore at:
    Dataset updated
    Jan 1, 1994
    Authors
    Andreas Stathopoulos
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Abstract: A program is presented for determining a few selected eigenvalues and their eigenvectors on either end of the spectrum of a large, real, symmetric matrix. Based on the Davidson method, which is extensively used in quantum chemistry/physics, the current implementation improves the power of the original algorithm by adopting several extensions. The matrix-vector multiplication routine that it requires is to be provided by the user. Different matrix formats and optimizations are thus feasible. E...

    Title of program: DVDSON. Catalogue Id: ACPZ_v1_0

    Nature of problem: Finding a few extreme eigenpairs of a real, symmetric matrix is of great importance in scientific computations. Examples abound in structural engineering, quantum chemistry and electronic structure physics [1,2]. The matrices involved are usually too large to be efficiently solved using standard methods. Moreover, their large size often prohibits full storage, forcing various sparse representations. Even sparse representations cannot always be stored in main memory [3]. Thus, an iterative method ...

    Versions of this program held in the CPC repository in Mendeley Data: ACPZ_v1_0; DVDSON; 10.1016/0010-4655(94)90073-6

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
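
    For comparison, a hedged SciPy sketch of the same task (a few extreme eigenpairs of a large sparse symmetric matrix, with the matrix supplied only through a user-defined matrix-vector product); note that SciPy's eigsh uses ARPACK's Lanczos iteration rather than the Davidson method itself.

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import LinearOperator, eigsh

    n = 10_000
    A = diags([np.arange(1, n + 1, dtype=float), -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1])

    op = LinearOperator((n, n), matvec=lambda v: A @ v)   # user-supplied multiplication routine
    vals, vecs = eigsh(op, k=4, which="SA")               # four smallest (algebraic) eigenvalues
    print(vals)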

  20. Data from: Pseudo-Label Generation for Multi-Label Text Classification

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://catalog.data.gov/dataset/pseudo-label-generation-for-multi-label-text-classification
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, in text data it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us in performing better text classification, and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although here we are proposing and evaluating a text classification technique, our main focus is on the handling of the multi-labelity of text data while utilizing the correlation among multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how the multi-labelity is handled in our classification process and show the effectiveness of our approach.
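
    A hedged toy sketch of the pseudo-label idea named above: each observed combination of class labels is treated as one pseudo label (a label-powerset view). The label matrix is hypothetical, not from the paper's data sets.

    import numpy as np
    from collections import Counter

    Y = np.array([[1, 0, 1],       # rows: documents, columns: original class labels
                  [1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 0]])

    combos = [tuple(row) for row in Y]                      # each distinct combination is a pseudo label
    pseudo_id = {c: i for i, c in enumerate(dict.fromkeys(combos))}
    print(Counter(combos))                                  # frequency of each label combination
    print([pseudo_id[c] for c in combos])                   # pseudo-label id assigned to each document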
