100+ datasets found
  1. Data from: Highly Scalable Matching Pursuit Signal Decomposition Algorithm

    • catalog.data.gov
    • datasets.ai
    • +3 more
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). Highly Scalable Matching Pursuit Signal Decomposition Algorithm [Dataset]. https://catalog.data.gov/dataset/highly-scalable-matching-pursuit-signal-decomposition-algorithm
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In this research, we propose a variant of the classical Matching Pursuit Decomposition (MPD) algorithm with significantly improved scalability and computational performance. MPD is a powerful iterative algorithm that decomposes a signal into linear combinations of its dictionary elements, or “atoms”. A best-fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal, and this procedure is repeated on the residual in subsequent iterations until a stopping criterion is met. A sufficiently large dictionary is required for an accurate reconstruction; this in turn increases the computational burden of the algorithm, limiting its applicability and level of adoption. Our main contribution lies in improving the computational efficiency of the algorithm to allow faster decomposition while maintaining a similar level of accuracy. The Correlation Thresholding and Multiple Atom Extraction techniques were proposed to decrease the computational burden of the algorithm: correlation thresholds prune insignificant atoms from the dictionary, and the ability to extract multiple atoms within a single iteration improves the effectiveness and efficiency of each iteration. The proposed algorithm, entitled MPD++, was demonstrated on a real-world data set.
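    The iteration described above (cross-correlate, pick the best-fit atom, subtract, repeat) together with the two speed-ups can be sketched as follows. This is a minimal illustration, not the MPD++ implementation: the dictionary, threshold value, and per-iteration atom count are assumptions.

```python
import numpy as np

def mpd_plus(signal, dictionary, corr_threshold=0.5, atoms_per_iter=2,
             max_iter=100, tol=1e-8):
    """Greedy matching pursuit with correlation thresholding and
    multiple atom extraction per iteration (illustrative sketch)."""
    # Columns of `dictionary` are atoms; normalise to unit norm so
    # inner products act as correlations.
    atoms = dictionary / np.linalg.norm(dictionary, axis=0, keepdims=True)
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(atoms.shape[1])
    for _ in range(max_iter):
        corr = atoms.T @ residual                 # cross-correlation step
        strongest = np.abs(corr).max()
        if strongest < tol:                       # stopping criterion
            break
        # Correlation thresholding: prune atoms whose correlation is
        # far below the current best match.
        keep = np.flatnonzero(np.abs(corr) >= corr_threshold * strongest)
        # Multiple atom extraction: subtract the top-k surviving atoms in
        # one iteration (exact for orthogonal atoms, approximate otherwise).
        top = keep[np.argsort(-np.abs(corr[keep]))][:atoms_per_iter]
        for i in top:
            coeffs[i] += corr[i]
            residual -= corr[i] * atoms[:, i]
    return coeffs, residual

# Tiny demo: an orthonormal dictionary recovers the signal exactly.
D = np.eye(4)
x = np.array([3.0, 0.0, 1.0, 0.0])
c, r = mpd_plus(x, D)
```

    With a realistic, overcomplete dictionary the per-iteration subtraction of several atoms is only approximate, which is the accuracy-versus-speed trade-off the abstract refers to.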

  2. Matching results between landmark in different sources and landmark in a referenced dataset (BDTOPO)

    • zenodo.org
    csv
    Updated Oct 1, 2022
    + more versions
    Cite
    Marie-Dominique Van Damme; Ana-Maria Olteanu-Raimond; Marie-Dominique Van Damme; Ana-Maria Olteanu-Raimond (2022). Matching results between landmark in different sources and landmark in a referenced dataset (BDTOPO) [Dataset]. http://doi.org/10.5281/zenodo.6483785
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marie-Dominique Van Damme; Ana-Maria Olteanu-Raimond
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The four datasets represent the results of two sequential processes. The first process consists of automatically matching landmarks from different sources against landmarks in a reference dataset (the French national topographic dataset, BDTOPO). In the second process, the 1:1 links are manually validated by experts.

    The four different datasets and the BDTOPO dataset are archived here.

    The data matching algorithm is described in this paper.

    Each file contains the matching results for the features belonging to one data source:

    - the name of the file depends on the data source

    - id_source is the identifier of the landmark in the data source

    - types_of_matching_results describes the type of matching result:

    • « 1:0 » means that a landmark from a data source (e.g. Camptocamp) has no homologous landmark in BDTOPO
    • « 1:1 validated » means that a homologous feature exists in BDTOPO and the link was validated
    • « 1:1 non validated » means that the matching link was not validated
    • « without candidates » represents non-matched landmarks, either because there are no candidates in BDTOPO or because the landmark in the data source is far from its homologue in BDTOPO
    • « uncertain »: uncertain cases are complex cases where no decision is taken by the data matching algorithm

    - id_bdtopo is the identifier of the landmark in BDTOPO if and only if there is a validated matching link
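    Given that schema, the result types in one per-source file can be tallied with a short script. The file name is hypothetical, and a comma-delimited CSV with a header row is assumed:

```python
import csv
from collections import Counter

def summarise_matches(path):
    """Count how many landmarks fall into each matching-result type
    (1:0, 1:1 validated, 1:1 non validated, without candidates, uncertain)."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            counts[row["types_of_matching_results"]] += 1
    return counts

# Example with a hypothetical per-source file:
# summarise_matches("camptocamp.csv")
```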


  3. Data from: Stochastic Matching DataSet

    • kaggle.com
    Updated Jul 2, 2023
    Cite
    knightwayne (2023). Stochastic Matching DataSet [Dataset]. https://www.kaggle.com/datasets/knightwayne/stochastic-matching-dataset/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    knightwayne
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by knightwayne

    Released under CC BY-SA 3.0


  4. Valentine Datasets

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Jul 9, 2021
    Cite
    Christos Koutras; Georgios Siachamis; Andra Ionescu; Kyriakos Psarakis; Jerry Brons; Marios Fragkoulis; Christoph Lofi; Angela Bonifati; Asterios Katsifodimos (2021). Valentine Datasets [Dataset]. http://doi.org/10.5281/zenodo.5084604
    Explore at:
    Dataset updated
    Jul 9, 2021
    Authors
    Christos Koutras; Georgios Siachamis; Andra Ionescu; Kyriakos Psarakis; Jerry Brons; Marios Fragkoulis; Christoph Lofi; Angela Bonifati; Asterios Katsifodimos
    Description

    Datasets used for evaluating state-of-the-art schema matching methods in the paper "Valentine: Evaluating Matching Techniques for Dataset Discovery", which was accepted for presentation at IEEE ICDE 2021. They come in the form of fabricated pairs respecting a relatedness scenario, as discussed in the paper.

  5. ONC Patient Matching Algorithm Challenge Data

    • linkagelibrary.icpsr.umich.edu
    Updated Sep 20, 2019
    Cite
    Office of the National Coordinator for Health (2019). ONC Patient Matching Algorithm Challenge Data [Dataset]. http://doi.org/10.3886/E111962V1
    Explore at:
    Dataset updated
    Sep 20, 2019
    Authors
    Office of the National Coordinator for Health
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The goal of the Patient Matching Algorithm Challenge is to bring about greater transparency and data on the performance of existing patient matching algorithms, spur the adoption of performance metrics for patient data matching algorithm vendors, and positively impact other aspects of patient matching such as deduplication and linking to clinical data. Participants will be provided a data set and will have their answers evaluated and scored against a master key. Up to 6 cash prizes will be awarded with a total purse of up to $75,000.00 (https://www.patientmatchingchallenge.com/).

    The test dataset used in the ONC Patient Matching Algorithm Challenge is available for download by students, researchers, or anyone else interested in additional analysis and patient matching algorithm development. More information about the Patient Matching Algorithm Challenge can be found at https://www.patientmatchingchallenge.com/. The dataset containing 1 million patients was split into eight files of alphabetical groupings by the patient's last name, plus an additional file containing test patients with no last name recorded (null). All files should be downloaded and merged for analysis: https://github.com/onc-healthit/patient-matching

  6. Map Matching Dataset

    • ieee-dataport.org
    Updated Oct 10, 2023
    Cite
    Youliang chen (2023). Map Matching Dataset [Dataset]. https://ieee-dataport.org/documents/map-matching-dataset
    Explore at:
    Dataset updated
    Oct 10, 2023
    Authors
    Youliang chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    China

  7. Replication Data for: Why Propensity Scores Should Not Be Used for Matching

    • dataverse.harvard.edu
    Updated Jan 28, 2019
    Cite
    Richard Nielsen; Gary King (2019). Replication Data for: Why Propensity Scores Should Not Be Used for Matching [Dataset]. http://doi.org/10.7910/DVN/A9LZNV
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Richard Nielsen; Gary King
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract: We show that propensity score matching (PSM), an enormously popular method of preprocessing data for causal inference, often accomplishes the opposite of its intended goal — thus increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment. PSM is thus uniquely blind to the often large portion of imbalance that can be eliminated by approximating full blocking with other matching methods. Moreover, in data balanced enough to approximate complete randomization, either to begin with or after pruning some observations, PSM approximates random matching which, we show, increases imbalance even relative to the original data. Although these results suggest researchers replace PSM with one of the other available matching methods, propensity scores have other productive uses.

  8. Data from: Automated Linking of Historical Data

    • linkagelibrary.icpsr.umich.edu
    Updated Aug 20, 2020
    Cite
    Ran Abramitzky; Leah Boustan; Katherine Eriksson; James Feigenbaum; Santiago Perez (2020). Automated Linking of Historical Data [Dataset]. http://doi.org/10.3886/E120703V1
    Explore at:
    Dataset updated
    Aug 20, 2020
    Authors
    Ran Abramitzky; Leah Boustan; Katherine Eriksson; James Feigenbaum; Santiago Perez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1850 - 1940
    Area covered
    United States
    Description

    Currently, the repository provides code for two such methods:

    The ABE fully automated approach: This approach is a fully automated method for linking historical datasets (e.g. complete-count Censuses) by first name, last name, and age. The approach was first developed by Ferrie (1996) and adapted and scaled for the computer by Abramitzky, Boustan and Eriksson (2012, 2014, 2017). Because names are often misspelled or mistranscribed, our approach suggests testing robustness to alternative name matching (using raw names, NYSIIS standardization, and Jaro-Winkler distance). To reduce the chances of false positives, our approach suggests testing robustness by requiring names to be unique within a five-year window and/or requiring the match on age to be exact.

    A fully automated probabilistic approach (EM): This approach (Abramitzky, Mill, and Perez 2019) suggests a fully automated probabilistic method for linking historical datasets. We combine distances in reported names and ages between two potential records into a single score, roughly corresponding to the probability that both records belong to the same individual. We estimate these probabilities using the Expectation-Maximization (EM) algorithm, a standard technique in the statistical literature. We suggest a number of decision rules that use these estimated probabilities to determine which records to use in the analysis.
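    The deterministic (ABE-style) link can be sketched under stated assumptions: records are dicts with `id`, `first`, `last`, and `age` fields (illustrative, not the repository's format), names are compared after simple lowercasing rather than NYSIIS standardization or Jaro-Winkler distance, and a candidate is accepted only when it is unique within a five-year age window:

```python
def abe_link(records_a, records_b, unique_window=5):
    """Link records by exact cleaned name and age, rejecting any match
    that is not unique within +/- `unique_window` years (sketch)."""
    def name_key(r):
        # Stand-in for NYSIIS standardisation: trim and lowercase.
        return (r["first"].strip().lower(), r["last"].strip().lower())

    links = []
    for a in records_a:
        same_name = [b for b in records_b if name_key(b) == name_key(a)]
        exact = [b for b in same_name if b["age"] == a["age"]]
        # Uniqueness requirement: no other same-name record within the window.
        near = [b for b in same_name if abs(b["age"] - a["age"]) <= unique_window]
        if len(exact) == 1 and len(near) == 1:
            links.append((a["id"], exact[0]["id"]))
    return links
```

    In the second branch of the sketch, a record whose namesake appears at both age 40 and age 42 in the other dataset is dropped, which is the false-positive guard the description mentions.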

  9. Random Recursive Partitioning: a matching method for the estimation of the average treatment effect

    • journaldata.zbw.eu
    .rda, csv, txt, zip
    Updated Dec 8, 2022
    Cite
    Giuseppe Porro; Stefano Maria Iacus; Giuseppe Porro; Stefano Maria Iacus (2022). Random Recursive Partitioning: a matching method for the estimation of the average treatment effect (replication data) [Dataset]. http://doi.org/10.15456/jae.2022319.1304251755
    Explore at:
    Available download formats: csv(13692), .rda(118659), zip(18375), txt(3478), csv(40569), csv(166644), csv(169710), csv(21498), csv(177445)
    Dataset updated
    Dec 8, 2022
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Giuseppe Porro; Stefano Maria Iacus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper we introduce the Random Recursive Partitioning (RRP) matching method. RRP generates a proximity matrix which can be useful in econometric applications such as average treatment effect estimation. RRP is a Monte Carlo method that randomly generates non-empty recursive partitions of the data and evaluates the proximity between two observations as the empirical frequency with which they fall in the same cell of these random partitions over all Monte Carlo replications. From the proximity matrix it is possible to derive both graphical and analytical tools to evaluate the extent of the common support between data sets. The RRP method is honest in that it does not match observations at any cost: if the data sets are separated, the method clearly states it. The match obtained with RRP is invariant under monotonic transformations of the data. Average treatment effect estimators derived from the proximity matrix appear competitive with more commonly used estimators. The RRP method does not require a particular structure of the data, and for this reason it can be applied when distances such as Mahalanobis or Euclidean are not suitable, in the presence of missing data, or when the estimated propensity score is too sensitive to model specification.
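    The co-membership idea can be sketched as follows; the splitting rule (cut a random variable at a random observed value) and the minimum cell size are illustrative assumptions, not the authors' implementation:

```python
import random

def rrp_proximity(data, n_replications=200, min_cell=2, seed=0):
    """Random Recursive Partitioning proximity (sketch): repeatedly draw a
    random recursive partition of the rows and record the empirical
    frequency with which each pair of observations shares a cell."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    prox = [[0] * n for _ in range(n)]

    def split(indices):
        # Stop splitting small cells; otherwise cut a random variable
        # at a randomly chosen observed value.
        if len(indices) <= min_cell:
            return [indices]
        j = rng.randrange(d)
        pivot = data[rng.choice(indices)][j]
        left = [i for i in indices if data[i][j] <= pivot]
        right = [i for i in indices if data[i][j] > pivot]
        if not left or not right:          # degenerate cut: keep the cell
            return [indices]
        return split(left) + split(right)

    for _ in range(n_replications):
        for cell in split(list(range(n))):
            for a in cell:
                for b in cell:
                    prox[a][b] += 1
    # Normalise counts to co-membership frequencies.
    return [[p / n_replications for p in row] for row in prox]
```

    On two well-separated 1-D clusters, nearby points share cells far more often than distant ones, which is exactly the "honesty" property: separated data sets yield low cross-proximity.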

  10. Web Data Commons Phones Dataset, Augmented Version, Fixed Splits

    • linkagelibrary.icpsr.umich.edu
    delimited
    Updated Nov 23, 2020
    Cite
    Anna Primpeli; Christian Bizer (2020). Web Data Commons Phones Dataset, Augmented Version, Fixed Splits [Dataset]. http://doi.org/10.3886/E127243V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Nov 23, 2020
    Dataset provided by
    University of Mannheim (Germany)
    Authors
    Anna Primpeli; Christian Bizer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Motivation: Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits, as well as of correspondence sets including both matching and non-matching record pairs, hinders the reproducibility and comparability of benchmark experiments. In an effort to enhance the reproducibility and comparability of the experiments, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits.

    Dataset description: An augmented version of the WDC phones dataset for benchmarking entity matching/record linkage methods, found at http://webdatacommons.org/productcorpus/index.html#toc4. The augmented version adds fixed splits for training, validation and testing, as well as their corresponding feature vectors. The feature vectors are built using data-type-specific similarity metrics. The dataset contains 447 records describing products deriving from 17 e-shops, which are matched against a product catalog of 50 products. The gold standards have manual annotations for 258 matching and 22,092 non-matching pairs. The total number of attributes used to describe the product records is 26, while the attribute density is 0.25. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository, which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
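    "Data-type-specific similarity metrics" can be illustrated with a toy feature vector for one record pair. The attribute names (`title`, `brand`, `price`) and the chosen metrics are assumptions for illustration, not the WDC feature set:

```python
from difflib import SequenceMatcher

def pair_features(rec_a, rec_b):
    """One similarity feature per attribute: a string ratio for textual
    attributes and a relative difference for the numeric one."""
    def str_sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def num_sim(a, b):
        return 1.0 - abs(a - b) / max(abs(a), abs(b), 1e-9)

    return [
        str_sim(rec_a["title"], rec_b["title"]),
        str_sim(rec_a["brand"], rec_b["brand"]),
        num_sim(rec_a["price"], rec_b["price"]),
    ]
```

    A downstream classifier trained on such vectors (with matching and non-matching pairs) is the usual way these benchmark tasks are consumed.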

  11. Ground-roll separation using intelligence based-matching method

    • data.mendeley.com
    • narcis.nl
    Updated Feb 27, 2020
    Cite
    Jinghe Li (2020). Ground-roll separation using intelligence based-matching method [Dataset]. http://doi.org/10.17632/xg237bzyxb.1
    Explore at:
    Dataset updated
    Feb 27, 2020
    Authors
    Jinghe Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Separation is achieved by intelligence based-matching of the curvelet coefficients.

  12. WDC LSPM Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated May 31, 2022
    Cite
    (2022). WDC LSPM Dataset [Dataset]. https://paperswithcode.com/dataset/wdc-products
    Explore at:
    Dataset updated
    May 31, 2022
    Description

    Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (with corresponding label "match" or "no match") for four product categories: computers, cameras, watches, and shoes.

    In order to support the evaluation of machine learning-based matching methods, the data is split into training, validation and test sets. For each product category, we provide training sets in four different sizes (2,000 to 70,000 pairs). Furthermore, there are sets of IDs for each training set for a possible validation split (stratified random draw). The test set for each product category consists of 1,100 product pairs. The labels of the test sets were manually checked, while those of the training sets were derived using shared product identifiers from the Web via weak supervision.

    The data stems from the WDC Product Data Corpus for Large-Scale Product Matching - Version 2.0 which consists of 26 million product offers originating from 79 thousand websites.

  13. Serie A Matches Dataset (2020-2025)

    • kaggle.com
    Updated Jul 6, 2025
    Cite
    Marcel Biezunski (2025). Serie A Matches Dataset (2020-2025) [Dataset]. https://www.kaggle.com/datasets/marcelbiezunski/serie-a-matches-dataset-2020-2025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Marcel Biezunski
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Don't forget to upvote if you enjoy my work :)

    Serie A Match Results Dataset (2020–2025) was created in response to community requests following the release of my LaLiga Match Results Dataset.

    This dataset contains match-level results and performance stats from the Italian Serie A football league, covering seasons 2020 to 2025.

    Source: Data was collected using a custom Python web scraper from FBref.com (https://fbref.com/en/comps/11/Serie-A-Stats).

    Uses:
    - Match prediction models
    - Sports analytics
    - Feature engineering experiments
    - Educational ML datasets

    Licensing: Intended for educational and research use only. All rights remain with the original data providers.

  14. Data from: Comparative study on matching methods for the distinction of building modifications and replacements based on multi-temporal building footprint data

    • figshare.com
    zip
    Updated Jan 26, 2022
    Cite
    Martin Schorcht; Robert Hecht; Gotthard Meinel (2022). Comparative study on matching methods for the distinction of building modifications and replacements based on mul-ti-temporal building footprint data [Dataset]. http://doi.org/10.6084/m9.figshare.18027683.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 26, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Martin Schorcht; Robert Hecht; Gotthard Meinel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the input data and results used in the paper "Comparative study on matching methods for the distinction of building modifications and replacements based on multi-temporal building footprint data".

    License information: The LoD1 data used as input in this study are openly available at Transparenzportal Hamburg (https://transparenz.hamburg.de/), from Freie und Hansestadt Hamburg, Landesbetrieb Geoinformation und Vermessung (LGV), in compliance with the licence dl-de/by-2-0 (https://www.govdata.de/dl-de/by-2-0).

    Content:
    1. Input footprints of non-identical pairs: input_reference_objects.zip
    2. Results without additional position deviation: results_without_deviation.zip
    3. Results with generated position deviation, including geometries: results_with_deviation.zip

  15. Replication Data for: The Balance-Sample Size Frontier in Matching Methods for Causal Inference

    • dataverse.harvard.edu
    pdf, tsv, txt +1
    Updated Jul 1, 2017
    Cite
    Harvard Dataverse (2017). Replication Data for: The Balance-Sample Size Frontier in Matching Methods for Causal Inference [Dataset]. http://doi.org/10.7910/DVN/SURSEO
    Explore at:
    Available download formats: tsv(184878), tsv(925446), type/x-r-syntax(42824), pdf(66052), txt(1742)
    Dataset updated
    Jul 1, 2017
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We propose a simplified approach to matching for causal inference that simultaneously optimizes both balance (similarity between the treated and control groups) and matched sample size. Existing approaches either fix the matched sample size and maximize balance or fix balance and maximize sample size, leaving analysts to settle for suboptimal solutions or attempt manual optimization by iteratively tweaking their matching method and rechecking balance. To jointly maximize balance and sample size, we introduce the matching frontier, the set of matching solutions with maximum balance for each possible sample size. Rather than iterating, researchers can choose matching solutions from the frontier for analysis in one step. We derive fast algorithms that calculate the matching frontier for several commonly used balance metrics. We demonstrate with analyses of the effect of sex on judging and job training programs that show how the methods we introduce can extract new knowledge from existing data sets.

  16. Data from: From classification to matching: A CNN-based approach for retrieving painted pottery images

    • narcis.nl
    • data.mendeley.com
    Updated Jan 9, 2023
    Cite
    Zhao, X (via Mendeley Data) (2023). From classification to matching: A CNN-based approach for retrieving painted pottery images [Dataset]. http://doi.org/10.17632/xnk7s6xgxz.1
    Explore at:
    Dataset updated
    Jan 9, 2023
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Zhao, X (via Mendeley Data)
    Description

    Recently, artificial intelligence has begun to assist archaeologists in processing images of archaeological artifacts. We report a convolutional neural network approach to obtain feature vectors of painted pottery images through preliminary classification machine learning of the cultural types. The model, trained on a photographic image dataset of Chinese Neolithic color-painted pottery, achieved 92.58% precision in assigning vessel images to corresponding archaeological types. The feature vectors contain information on vessel shape, color, and ornamentation design, based on which similarity coefficients for the images in the dataset were calculated. The quantitative measurement of similarity allows searching for the closest match to artefacts in the dataset, as well as building a network of vessels in terms of similarity. This work highlights the potential of CNN approaches in the curation of archaeological artifacts, providing a new tool for studying chronology, typology, and decoration design.

  17. The dataset after matching segmentation of Sohu events

    • scidb.cn
    Updated Feb 5, 2025
    Cite
    zhang xiu (2025). The dataset after matching segmentation of Sohu events [Dataset]. http://doi.org/10.57760/sciencedb.j00133.00302
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Science Data Bank
    Authors
    zhang xiu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The dataset is sourced from the event matching part of the 2021 Sohu Campus Text Matching Algorithm Competition (https://www.biendata.xyz/competition/sohu_2021/). The event matching datasets released in the preliminary and final rounds were merged, and the event matching pairs of short text with short text, short text with long text, and long text with long text were selected. 20% were used as the test set, another 20% as the validation set, and the rest as the training set.

  18. Data of Variable-Gain Servo Matching Lyu

    • ieee-dataport.org
    Updated Jul 19, 2021
    Cite
    Dun Lyu (2021). Data of Variable-Gain Servo Matching Lyu [Dataset]. https://ieee-dataport.org/documents/data-variable-gain-servo-matching-lyu
    Explore at:
    Dataset updated
    Jul 19, 2021
    Authors
    Dun Lyu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    5

  19. AgLiMatch dataset

    • dataverse.csuc.cat
    • portalrecerca.udl.cat
    • +1 more
    pcap, png, tsv, txt
    Updated Jun 6, 2025
    Cite
    Javier Guevara; Javier Guevara; Jordi Gené Mola; Jordi Gené Mola; Eduard Gregorio López; Eduard Gregorio López; Miguel Torres-Torriti; Miguel Torres-Torriti; Giulio Reina; Giulio Reina; Fernando Auat Cheein; Fernando Auat Cheein (2025). AgLiMatch dataset [Dataset]. http://doi.org/10.34810/data2320
    Explore at:
    Available download formats: pcap(588461494), txt(89361), pcap(355137568), pcap(731088388), pcap(699472654), tsv(1135), txt(258148), txt(2774), pcap(552702982), txt(164443), txt(104279), txt(69031), pcap(514651510), txt(175790), png(1269595), pcap(368572624), pcap(64437102), txt(98805), txt(69652), pcap(265225278), txt(265982), txt(182), pcap(347767372), txt(211224), pcap(161080724), txt(205850)
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Javier Guevara; Jordi Gené Mola; Eduard Gregorio López; Miguel Torres-Torriti; Giulio Reina; Fernando Auat Cheein
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Dataset funded by
    Agencia Estatal de Investigación
    Generalitat de Catalunya
    Description

    The agricultural LiDAR dataset for evaluating scan matching techniques (AgLiMatch dataset) comprises a set of Velodyne VLP-16 LiDAR captures and the corresponding GNSS-RTK tracks acquired in a Fuji apple orchard using an autonomous platform. This dataset was used in [1] to evaluate scan matching techniques by comparing the platform path calculated using LiDAR scan matching techniques against the actual platform path ground truth measured with a GNSS-RTK system. The correspondence between each LiDAR file (inside the /velodyne_data folder) and GNSS track file (inside the /GNSS_data folder) is detailed in the "Velodyne-GNSS_correspondence-data.xlsx" file. The relative position between the LiDAR sensor and the GNSS rover is shown in "experimental_setup.png". Distance units are in mm.

  20. Pedestrian network attributes-- Datasets

    • figshare.com
    bin
    Updated Mar 5, 2021
    Cite
    Xue Yang; Kathleen Stewart; Mengyuan Fang; Luliang Tang (2021). Pedestrian network attributes-- Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.12660467.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 5, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xue Yang; Kathleen Stewart; Mengyuan Fang; Luliang Tang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Title: Attributing pedestrian networks with semantic information based on multi-source spatial data

    Abstract: The lack of associating pedestrian networks, i.e., the paths and roads used for non-vehicular travel, with information about semantic attribution is a major weakness for many applications, especially those supporting accurate pedestrian routing. Researchers have developed various algorithms to generate pedestrian walkways based on datasets including high-resolution images, existing map databases, and GPS data; however, the semantic attribution of pedestrian walkways is often ignored. The objective of our study is to automatically extract semantic information, including incline values and the different categories of pedestrian paths, from multi-source spatial data such as crowdsourced GPS tracking data, land use data, and motor vehicle road (MVR) networks. Incline values for each pedestrian path were derived from tracking data through elevation filtering using wavelet theory and a similarity-based map-matching method. To automatically categorize pedestrian paths into five classes including sidewalk, crosswalk, entrance walkway, indoor path, and greenway, we developed a hierarchical strategy of spatial analysis using land use data and MVR networks. The effectiveness of our proposed method is demonstrated using real datasets including GPS tracking data collected by volunteers, land use data acquired from OpenStreetMap, and MVR network data downloaded from Gaode Map.
