The DMIS dataset is a flat-file record produced by matching several data collections. It consists primarily of VTRs, dealer records, and Observer data, combined with vessel permit information, to support North East Regional quota monitoring projects.
License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Photographic capture–recapture is a valuable tool for obtaining demographic information on wildlife populations because it is noninvasive and cost-effective. Recently, several computer-aided photo-matching algorithms have been developed to match images of unique individuals more efficiently in databases containing thousands of images. However, identification errors made by these algorithms can severely bias estimates of vital rates and population size. It is therefore important to understand the performance and limitations of state-of-the-art photo-matching algorithms before using them in capture–recapture studies that may involve thousands of images. Here, we compared the performance of four photo-matching algorithms (Wild-ID, I3S Pattern+, APHIS, and AmphIdent) using multiple amphibian databases of varying image quality. We measured the performance of each algorithm and evaluated it in relation to database size and the number of matching images in the database. Performance differed greatly by algorithm and image database, with recognition rates ranging from 22.6% to 100% when the review was limited to the 10 highest-ranking images. Recognition rate degraded marginally with increasing database size and improved considerably when more matching images were present in the database. In our study, the pixel-based algorithm of AmphIdent achieved higher recognition rates than the other approaches. We recommend carefully evaluating an algorithm's performance before using it to match a complete database. By choosing a suitable matching algorithm, databases too large to match “by eye” can be easily translated into the accurate individual capture histories needed for robust demographic estimates.
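The recognition rate quoted above counts a query image as recognized when its true match appears among the top-ranked candidates an algorithm returns. A minimal Python sketch of that top-k metric follows; the input layout (one ranked list of candidate IDs per query) and the example IDs are hypothetical, and the study's exact scoring rules may differ, for instance when a database holds several images of the same individual.

```python
from typing import Sequence


def recognition_rate(rankings: Sequence[Sequence[str]],
                     true_ids: Sequence[str],
                     k: int = 10) -> float:
    """Share of queries whose true individual appears in the top-k candidates.

    rankings[i] is the candidate list (best match first) returned for query i;
    true_ids[i] is the individual actually shown in query i (hypothetical layout).
    """
    if len(rankings) != len(true_ids):
        raise ValueError("expected one ranking per query")
    if not true_ids:
        return 0.0
    hits = sum(1 for ranking, truth in zip(rankings, true_ids)
               if truth in ranking[:k])
    return hits / len(true_ids)


# Hypothetical rankings for three query images.
rankings = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"]]
truth = ["B", "F", "X"]
print(recognition_rate(rankings, truth, k=2))  # only the first query is a hit
print(recognition_rate(rankings, truth, k=3))  # first and second queries are hits
```

Raising `k` can only keep or increase the rate, which is why results are reported for a fixed review depth such as the top 10.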
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Data and code accompanying the paper "Correlation Neglect in Student-to-School Matching."
Abstract: We present results from three experiments containing incentivized school-choice scenarios. In these scenarios, we vary whether schools' assessments of students are based on a common priority (inducing correlation in admissions decisions) or on independent assessments (eliminating correlation in admissions decisions). The quality of students' application strategies declines in the presence of correlated admissions: application strategies become substantially more aggressive and fail to include attractive "safety" options. We provide a battery of tests suggesting that this phenomenon is at least partially driven by correlation neglect, and we discuss implications for the design and deployment of student-to-school matching mechanisms.
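Why correlated admissions make safety options matter can be illustrated with a stylized calculation; this is not the paper's experimental design. Assume, hypothetically, that a student's assessment score is uniform on [0, 1] and each school admits the student when the score exceeds a school-specific cutoff. Under a common priority, one score decides every application, so the student is rejected everywhere exactly when the score falls below the lowest cutoff; under independent assessments, the rejection probabilities multiply.

```python
def p_rejected_everywhere(cutoffs: list[float], correlated: bool) -> float:
    """Probability that a student is rejected by every school applied to.

    Stylized, hypothetical model: each assessment score is uniform on [0, 1],
    and a school admits the student when the score exceeds its cutoff.
    correlated=True  -> one common score is used by all schools (common priority).
    correlated=False -> each school draws its own independent score.
    """
    if not cutoffs:
        return 1.0  # no applications means no admission
    if correlated:
        # Rejected everywhere iff the single score is below the lowest cutoff.
        return min(cutoffs)
    prob = 1.0
    for cutoff in cutoffs:
        prob *= cutoff  # independent rejections multiply
    return prob


# Two selective schools, each rejecting half of applicants.
print(p_rejected_everywhere([0.5, 0.5], correlated=True))   # 0.5
print(p_rejected_everywhere([0.5, 0.5], correlated=False))  # 0.25
```

In the correlated case, adding a second school with the same cutoff does not lower the risk at all; only a less selective option (a lower cutoff) does, which is the role a "safety" school plays.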
This dataset contains detailed match and player data from League of Legends, one of the most popular multiplayer online battle arena (MOBA) games in the world. It includes 35,000 matches and 78,000 summoner statistics, capturing a wide range of in-game data such as champion selection, player performance metrics, and match outcomes.
The dataset is structured to support a variety of analyses, including:
Whether you are interested in competitive gaming, data science, or predictive modeling, this dataset provides a rich source of structured data to explore the dynamics of League of Legends at scale.
Data were collected from the Riot Games API using a Python script (link) from Patch 25.19.
The dataset consists of 7 CSV files:
- MySQL database using Linux
- Database schema script can be found here (works with the GitHub project so you can collect your own data).
The Riot API reports only the "BOTTOM" lane for bot-lane players, without distinguishing ADC from Support. During data collection, roles were therefore inferred by combining champions that are often played as support with CS (creep score) metrics to distinguish ADC from Support, especially for ambiguous picks such as Senna or off-meta choices.
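A minimal sketch of such a heuristic in Python, assuming a hypothetical support-champion list and hypothetical field names (`participant_id`, `champion`, `cs`); the list and rules actually used by the collection scripts may differ.

```python
# Hypothetical set of champions that are usually played as support; the list
# actually used during data collection is not given in this description.
TYPICAL_SUPPORTS = {"Janna", "Lulu", "Nautilus", "Soraka", "Thresh"}


def assign_bot_roles(p1: dict, p2: dict) -> dict:
    """Label two same-team BOTTOM-lane participants as ADC or SUPPORT.

    Each participant dict is assumed to carry 'participant_id', 'champion',
    and 'cs' (creep score) keys; these field names are hypothetical.
    """
    p1_support = p1["champion"] in TYPICAL_SUPPORTS
    p2_support = p2["champion"] in TYPICAL_SUPPORTS
    if p1_support != p2_support:
        # Exactly one typical support champion: trust the champion pick.
        support, adc = (p1, p2) if p1_support else (p2, p1)
    else:
        # Ambiguous (both or neither are typical supports, e.g. Senna or an
        # off-meta pick): the player with the lower creep score is the support.
        support, adc = (p1, p2) if p1["cs"] < p2["cs"] else (p2, p1)
    return {adc["participant_id"]: "ADC", support["participant_id"]: "SUPPORT"}


print(assign_bot_roles({"participant_id": 1, "champion": "Jinx", "cs": 210},
                       {"participant_id": 2, "champion": "Thresh", "cs": 31}))
print(assign_bot_roles({"participant_id": 6, "champion": "Senna", "cs": 45},
                       {"participant_id": 7, "champion": "Ezreal", "cs": 198}))
```

Falling back to creep score only when the champion pick is inconclusive keeps the rule simple while still resolving picks that can be played in either role.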
Data is collected using the official Riot Games API. We thank Riot Games for providing the data and tools that make this project possible. This dataset is not endorsed or certified by Riot Games. No personal or identifiable player data (e.g., Summoner Names, Summoner IDs, or PUUIDs) are included. The SummonerTbl has been intentionally excluded from this public release.
The Python scripts used for data collection, as well as various scripts I developed for API calls, database management, and initial data analytics, can be found on GitHub.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The goal of the Patient Matching Algorithm Challenge is to bring greater transparency and data to the performance of existing patient matching algorithms, spur the adoption of performance metrics among vendors of patient data matching algorithms, and positively affect other aspects of patient matching such as deduplication and linking to clinical data. Participants will be provided a data set and will have their answers evaluated and scored against a master key. Up to 6 cash prizes will be awarded, with a total purse of up to $75,000.00 (https://www.patientmatchingchallenge.com/).
The test dataset used in the ONC Patient Matching Algorithm Challenge is available for download by students, researchers, or anyone else interested in additional analysis and patient matching algorithm development. More information about the Patient Matching Algorithm Challenge can be found at https://www.patientmatchingchallenge.com/.
The dataset containing 1 million patients was split into eight files grouped alphabetically by the patient's last name, plus an additional file containing test patients with no last name recorded (Null). All files should be downloaded and merged for analysis.
https://github.com/onc-healthit/patient-matching
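Merging the split files can be sketched in Python, assuming (hypothetically) that each downloaded file is a CSV with the same header row; the folder and file pattern are placeholders, and the actual file format should be checked after download.

```python
import csv
from pathlib import Path


def merge_patient_files(folder: str, pattern: str = "*.csv") -> list[dict]:
    """Concatenate every split file in `folder` into one list of row dicts.

    Assumes (hypothetically) that the downloaded files are CSVs sharing one
    header row; adjust `pattern` to match the actual file names.
    """
    rows: list[dict] = []
    header = None
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                # Refuse to merge files whose columns do not line up.
                raise ValueError(f"{path.name} has a different header")
            rows.extend(reader)
    return rows
```

Checking headers before concatenating catches a file from a different release or export format early, including the separate file of patients with no recorded last name.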
“Number of triplets” is the number of triplets of a specific relation involved in the top 100 paths. “Predictions” is the number of relations found neither in the data sets nor in the chosen databases. “Proven predictions” is the number of relations not in the data sets but matched in the chosen databases.
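Read as set operations, these three counts can be sketched in Python. This assumes each relation is a (head, relation, tail) triplet and that predictions are drawn from the triplets found in the top 100 paths; the triplet values are hypothetical.

```python
Triplet = tuple[str, str, str]  # (head entity, relation, tail entity)


def summarize_predictions(path_triplets: set[Triplet],
                          dataset_triplets: set[Triplet],
                          database_triplets: set[Triplet]) -> dict[str, int]:
    """Count triplets, unconfirmed predictions, and proven predictions.

    path_triplets: triplets of one relation found in the top 100 paths.
    dataset_triplets: triplets already present in the input data sets.
    database_triplets: triplets recorded in the chosen reference databases.
    """
    novel = path_triplets - dataset_triplets      # not in the data sets
    proven = novel & database_triplets            # confirmed by a database
    unconfirmed = novel - database_triplets       # in neither source
    return {
        "number_of_triplets": len(path_triplets),
        "predictions": len(unconfirmed),
        "proven_predictions": len(proven),
    }


paths = {("A", "rel", "B"), ("C", "rel", "B"), ("D", "rel", "E"), ("F", "rel", "G")}
data = {("A", "rel", "B")}
databases = {("C", "rel", "B"), ("X", "rel", "Y")}
print(summarize_predictions(paths, data, databases))
```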
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset covers competitive gaming matches from November 2023 to 12/01/2024. Player stats are collected over the 90 days prior to each match, using only data from the map on which the match was played. It includes team rankings and team map winrates.
Snapshot of competitive gaming from November 2023 to 12/01/2024, with player stats reflecting the 90 days before each match.
Player stats linked to the map of each match.
Team Map Winrate: Insight into team proficiency on specific maps.
Track how player stats evolved over the 90 days before each match.
Evaluate team dominance, dynamics, and trends.
Provide valuable insights for betting enthusiasts by leveraging historical performance data for informed decisions.
Disclaimer: Efforts have been made to ensure accuracy, but the variability of competitive gaming may affect the findings. Cross-reference with additional data for a thorough analysis.
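The 90-day, same-map look-back can be sketched in Python. The exact rules are not stated here, so this assumes a window covering the 90 days before the match day (the match day itself excluded) and applies the same window to the map winrate; all column names and values are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def in_window(record_date: date, match_date: date, days: int = 90) -> bool:
    """True if record_date falls within the `days` days before match_date."""
    return match_date - timedelta(days=days) <= record_date < match_date


def prior_map_records(rows: list[dict], key: str, value: str, map_name: str,
                      match_date: date, days: int = 90) -> list[dict]:
    """Rows for one player or team on one map inside the look-back window.

    Rows are assumed to carry 'date', 'map', 'won', and an identifier column
    such as 'player' or 'team' (all hypothetical column names).
    """
    return [r for r in rows
            if r[key] == value and r["map"] == map_name
            and in_window(r["date"], match_date, days)]


def winrate(rows: list[dict]) -> Optional[float]:
    """Share of rows marked as wins, or None when there is no history."""
    if not rows:
        return None
    return sum(1 for r in rows if r["won"]) / len(rows)


history = [
    {"team": "T1", "map": "MapA", "date": date(2024, 11, 20), "won": True},
    {"team": "T1", "map": "MapA", "date": date(2024, 10, 1), "won": False},
    {"team": "T1", "map": "MapA", "date": date(2024, 7, 1), "won": True},    # too old
    {"team": "T1", "map": "MapB", "date": date(2024, 11, 25), "won": True},  # other map
]
recent = prior_map_records(history, "team", "T1", "MapA", date(2024, 12, 1))
print(len(recent), winrate(recent))
```

Excluding the match day keeps the features strictly pre-match, which matters if they are used for prediction.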
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The ARG Database is a large collection of labeled and unlabeled graphs created by the MIVIA Group. It aims to provide the graph research community with a standard test ground for benchmarking graph matching algorithms.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (labeled “match” or “no match”) for four product categories: computers, cameras, watches, and shoes. To support the evaluation of machine learning-based matching methods, the data is split into training, validation, and test sets. For each product category, we provide training sets in four different sizes (2,000 to 70,000 pairs). In addition, for each training set, a set of IDs defining a possible validation split (stratified random draw) is available. The test set for each product category consists of 1,100 product pairs. The labels of the test sets were manually checked, while those of the training sets were derived via weak supervision using shared product identifiers from the Web. The data stems from the WDC Product Data Corpus for Large-Scale Product Matching, Version 2.0, which consists of 26 million product offers originating from 79 thousand websites. For more information and download links for the corpus itself, please follow the links below.
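A stratified random draw keeps the share of “match” and “no match” pairs in the validation set close to that of the full training set. The Python sketch below only illustrates the idea with hypothetical pair IDs and a hypothetical 20% validation fraction; for comparable results, the validation IDs distributed with the data should be used instead.

```python
import random
from collections import defaultdict


def stratified_split(pairs, val_fraction=0.2, seed=42):
    """Split (pair_id, label) tuples into train/validation ID sets per label.

    Each label ("match" / "no match") keeps roughly the same share in the
    validation set as in the full training set.
    """
    by_label = defaultdict(list)
    for pair_id, label in pairs:
        by_label[label].append(pair_id)
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    val_ids = set()
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        val_ids.update(ids[:round(len(ids) * val_fraction)])
    train_ids = {pair_id for pair_id, _ in pairs} - val_ids
    return train_ids, val_ids


pairs = [(i, "match") for i in range(10)] + [(i, "no match") for i in range(10, 50)]
train, val = stratified_split(pairs)
print(len(train), len(val))
```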
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
To construct the dataset, the Social Diagnosis (SD) data was used first: a dataset based on the results of a representative, longitudinal study conducted by an interdisciplinary board of academics in 2000–2015 [i]. The SD dataset was then combined with two datasets on sports infrastructure: the Statistics Poland (SP) survey, which is conducted every four years (the latest edition, from 2014, was used for our analysis) [ii], and the “Orliki” database, which contains information on facilities constructed in 2008–2012 within a publicly funded programme [iii]. The data were combined by calculating availability indicators, namely the number of sports facilities of particular types per capita, at the NUTS (Classification of Territorial Units for Statistics) 3 level.
[i] Czapiński J, Panek T. Social diagnosis. 2015. Available from: http://www.diagnoza.com/.
[ii] Statistics Poland. Surveys on sports facilities: results of KFT-OB/a (municipalities), KFT-OB/b (external administrators), KFT-1 (sport clubs) surveys. Warsaw: Statistics Poland. 2015.
[iii] Biernat E, Piątkowska M, Zembura P, Gołdys A. Problem zarządzania orlikami z perspektywy animatorów z gmin wiejskich i miejskich [Challenges of management of Orlik pitches in the perspective of the animators from municipal and rural communities]. Przeds Zarz. 2017;18(8):429–44. Polish.
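The availability indicators described above (facilities of a given type per capita in each NUTS 3 region) can be sketched in Python. Region codes, facility types, and the optional `per` scaling are hypothetical; the study's exact scaling is not stated beyond "per capita".

```python
from collections import Counter


def facilities_per_capita(facilities, population, per=1):
    """Availability indicator: facilities of each type per `per` residents.

    facilities: iterable of (region_code, facility_type) tuples, one per facility.
    population: dict mapping region_code to resident count.
    Returns {(region_code, facility_type): indicator}.
    """
    counts = Counter(facilities)
    # Multiply before dividing so integer inputs stay exact where possible.
    return {(region, ftype): n * per / population[region]
            for (region, ftype), n in counts.items()}


facilities = [("R1", "pitch"), ("R1", "pitch"), ("R1", "pool"), ("R2", "pitch")]
population = {"R1": 1000, "R2": 500}
print(facilities_per_capita(facilities, population, per=1000))
```

Regions with no facility of a given type get no entry here, so when joining the indicators to the survey records those gaps would need to be filled with zero.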
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Researchers are often interested in linking individuals between two datasets that lack a common unique identifier. Matching procedures often struggle to match records with common names, birthplaces, or other field values. Computational feasibility is also a challenge, particularly when linking large datasets. We develop a Bayesian method for automated probabilistic record linkage and show that, holding accuracy constant, it recovers more than 50% more true matches than comparable methods when matching military recruitment data to the 1900 U.S. Census, for which expert-labeled matches are available. Our approach, which builds on a recent state-of-the-art Bayesian method, refines the modeling of comparison data, allowing disagreement probability parameters conditional on nonmatch status to be record-specific in the smaller of the two datasets. This flexibility significantly improves matching when many records share common field values. We show that our method is computationally feasible in practice, despite the added complexity, with an R/C++ implementation that is significantly faster than comparable recent methods. We also suggest a lightweight method for treating very common names and show how to estimate the true positive rate and positive predictive value when true match status is unavailable.
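Why common field values are hard can be seen in the classic Fellegi–Sunter weight, which underlies much probabilistic record linkage: a record pair is scored by summing, over fields, the log ratio of the probability of the observed agreement pattern among true matches (m) to that among nonmatches (u). The Python sketch below is that textbook weight with fixed, hypothetical m and u values; it is not the authors' Bayesian model, which estimates such parameters and lets nonmatch probabilities vary by record.

```python
import math


def match_weight(agreements: dict, m: dict, u: dict) -> float:
    """Fellegi-Sunter log-likelihood weight for one candidate record pair.

    agreements[f] is True when field f agrees between the two records.
    m[f] = P(agree | true match); u[f] = P(agree | nonmatch).
    """
    weight = 0.0
    for field, agrees in agreements.items():
        if agrees:
            weight += math.log(m[field] / u[field])
        else:
            weight += math.log((1 - m[field]) / (1 - u[field]))
    return weight


agree = {"surname": True, "birthplace": True}
m = {"surname": 0.95, "birthplace": 0.9}
# Hypothetical u-probabilities: agreeing on a common surname is more likely by
# chance, so it should count as weaker evidence than agreeing on a rare one.
u_common = {"surname": 0.05, "birthplace": 0.1}
u_rare = {"surname": 0.001, "birthplace": 0.1}
print(match_weight(agree, m, u_common) < match_weight(agree, m, u_rare))
```

With a single global u per field, a shared common surname and a shared rare one would receive the same weight; letting u depend on the record is one way to separate them.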
We are analysing family circumstances and education by matching parent and pupil data. The data compares household income and educational outcomes of pupils in England.
Read more information about how we share student and workforce data.
To ensure this privacy notice is up to date, we will review this information annually.
A multistage tandem mass spectral database built from a variety of structurally defined glycans. It provides tools for glycomics research that enable users to identify glycans by spectral matching. The database stores MS2, MS3, and MS4 spectra of N- and O-linked glycans and glycolipid glycans, as well as the partial structures of these glycans.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This repository contains the materials needed to replicate the results presented in Mozer et al. (2019), "Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality", forthcoming in Political Analysis.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
A primary challenge for researchers who use observational data is selection bias (i.e., the units of analysis exhibit systematic differences and heterogeneity due to non-random selection into treatment). This article encourages researchers to acknowledge this problem and discusses how, and more importantly under which assumptions, they may resort to statistical matching techniques to reduce the imbalance in the empirical distribution of pre-treatment observable variables between the treatment and control groups. To provide practical guidance, the article evaluates the effectiveness of peacekeeping missions in the Bosnian civil war, a research topic in which selection bias is a structural feature of the observational data researchers must use, and shows how to apply Coarsened Exact Matching (CEM), the most widely used matching algorithm in the fields of Political Science and International Relations.
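A minimal sketch of the CEM steps in Python: coarsen each covariate into bins, match exactly on the coarsened values, drop strata that lack either treated or control units, and weight the retained units using the standard CEM weights (treated units weigh 1; controls in stratum s weigh (m_C / m_T) · (m_T^s / m_C^s)). The covariate, cutpoints, and data are hypothetical, and automatic binning as done by dedicated CEM software is not reproduced.

```python
from bisect import bisect_right
from collections import defaultdict


def cem_weights(units: list[dict], cutpoints: dict[str, list[float]]) -> dict[int, float]:
    """Coarsened Exact Matching: return weights for matched units by index.

    units: dicts with a boolean 'treated' key plus one key per covariate.
    cutpoints: covariate name -> sorted bin boundaries used for coarsening.
    Units in strata lacking either treated or control units get no weight
    (they are pruned).
    """
    strata = defaultdict(list)
    for i, unit in enumerate(units):
        # Coarsen each covariate to a bin index, then match exactly on the bins.
        signature = tuple(bisect_right(cuts, unit[var])
                          for var, cuts in sorted(cutpoints.items()))
        strata[signature].append(i)

    matched = [idx for idx in strata.values()
               if any(units[i]["treated"] for i in idx)
               and not all(units[i]["treated"] for i in idx)]
    m_t = sum(units[i]["treated"] for idx in matched for i in idx)
    m_c = sum(len(idx) for idx in matched) - m_t
    if m_t == 0:
        return {}

    weights = {}
    for idx in matched:
        t_s = sum(units[i]["treated"] for i in idx)
        c_s = len(idx) - t_s
        for i in idx:
            # Treated units weigh 1; controls are reweighted to the stratum's
            # treated/control ratio, scaled by the overall control/treated ratio.
            weights[i] = 1.0 if units[i]["treated"] else (m_c / m_t) * (t_s / c_s)
    return weights


units = [
    {"treated": True, "x": 5}, {"treated": False, "x": 3}, {"treated": False, "x": 7},
    {"treated": True, "x": 15}, {"treated": True, "x": 25}, {"treated": False, "x": 20},
    {"treated": True, "x": 50},  # no control in its bin, so it is pruned
]
print(cem_weights(units, {"x": [10, 30]}))
```

Outcome comparisons would then use these weights, for example a weighted difference in mean outcomes between the retained treated and control units.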
Recreational and aesthetic enjoyment of public lands is increasing across a wide range of activities, highlighting the need to assess and adapt management to accommodate these uses. Despite a growing number of studies on mapping cultural ecosystem services, most are local-scale assessments that rely on costly and time-consuming primary data collection. As a result, the availability of spatial information on non-market values associated with cultural ecosystem services (social values) remains limited. Spatial function transfer, if it could be justified for social-value models, would expedite the development of social-value information and promote its more regular inclusion in ecosystem service assessments. We used survey data from six national forests in Colorado and Wyoming to explore the potential for transferring cultural ecosystem service models between forests and, specifically, to test the hypothesis that transfer performance increases with social-context similarity between transferring and receiving areas. Results confirm this relationship but fall just short of being able to predict with certainty when transferred models will meet the minimum performance criterion needed for defensible use by managers. Social values are highly variable and can be difficult to predict, but our results suggest that, with the right combination of indicators, spatial function transfer can become a defensible means of generating social-value information when primary data collection is not feasible.
License: other (https://choosealicense.com/licenses/other/)
mzhappyface/celeb-face-matching-data dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Noor-ai/task-matching-data dataset hosted on Hugging Face and contributed by the HF Datasets community
A simple, replicable methodology can help researchers link corporate loan datasets.
This study proposes a framework for developing a novel deep learning-based map-matching model in environments with limited ground-truth data.