The DMIS dataset is a flat-file record of the matching of several data set collections. It consists primarily of VTRs, dealer records, and Observer data, in conjunction with vessel permit information, for the purpose of supporting North East Regional quota monitoring projects.
Data standardization is an important part of effective management. However, data from different sources often doesn't match. This dataset lists the different ways that county names might be written by different people. It can be used as a lookup table when you need County to be your unique identifier. For example, it allows you to match St. Mary's, St Marys, and Saint Mary's so that you can join on county across disparate data sets.
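A lookup table of this kind is usually applied by normalizing the raw county string and mapping it to one canonical name. The following is a minimal sketch in Python, assuming a hypothetical in-memory table; the variant spellings come from the example above, while the canonical form, the dictionary name, and the helper function are illustrative assumptions rather than the dataset's actual schema.

```python
# Hypothetical lookup table: variant spelling -> canonical county name.
# The real dataset's columns and canonical forms may differ.
COUNTY_LOOKUP = {
    "st. mary's": "St. Mary's",
    "st marys": "St. Mary's",
    "saint mary's": "St. Mary's",
}


def canonical_county(raw):
    """Return the canonical county name, or None if the variant is unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return COUNTY_LOOKUP.get(key)


records = ["St. Mary's", "St Marys", "  SAINT MARY'S "]
canonical = [canonical_county(r) for r in records]
```

Returning None for unknown variants, instead of guessing, makes it easy to list the spellings that still need to be added to the table.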
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset covers competitive gaming matches from Nov '23 to 12/01/2024. Player statuses are collected 90 days prior to each match, specifically from the map where the match occurred. It includes team rankings and team map winrates.
Snapshot of gaming from Nov '23 to 12/01/2024, with player statuses reflecting the 90 days before each match.
Player stats linked to the map of each match.
Team Map Winrate: Insight into team proficiency on specific maps.
Track how player statuses changed over the 90 days before each match.
Evaluate team dominance, dynamics, and trends.
Provide valuable insights for betting enthusiasts by leveraging historical performance data for informed decisions.
Disclaimer: Efforts have been made for accuracy, but gaming variability may impact findings. Cross-reference with additional data for thorough analysis.
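As an illustration of the "Team Map Winrate" feature, the sketch below computes a per-team, per-map win rate from raw match rows. The record layout, team names, and map names are hypothetical; the dataset's actual columns are not described here.

```python
from collections import defaultdict

# Hypothetical match records: (team, map_name, won).
matches = [
    ("Alpha", "map_a", True),
    ("Alpha", "map_a", False),
    ("Alpha", "map_b", True),
    ("Bravo", "map_a", True),
]


def team_map_winrates(rows):
    """Return {(team, map_name): wins / games played}."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for team, map_name, won in rows:
        games[(team, map_name)] += 1
        wins[(team, map_name)] += int(won)
    return {key: wins[key] / games[key] for key in games}


winrates = team_map_winrates(matches)
```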
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and Code to accompany the paper "Correlation Neglect in Student-to-School Matching."
Abstract: We present results from three experiments containing incentivized school-choice scenarios. In these scenarios, we vary whether schools' assessments of students are based on a common priority (inducing correlation in admissions decisions) or are based on independent assessments (eliminating correlation in admissions decisions). The quality of students' application strategies declines in the presence of correlated admissions: application strategies become substantially more aggressive and fail to include attractive "safety" options. We provide a battery of tests suggesting that this phenomenon is at least partially driven by correlation neglect, and we discuss implications for the design and deployment of student-to-school matching mechanisms.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Advanced Resume Parser & Job Matcher Resumes
This dataset contains a merged collection of real and synthetic resume data in JSON format. The resumes have been normalized to a common schema to facilitate the development of NLP models for candidate-job matching in the technical recruitment domain.
Dataset Details
Dataset Description
This dataset is a combined collection of real resumes and synthetically generated CVs.
Curated by: datasetmaster… See the full description on the dataset page: https://huggingface.co/datasets/datasetmaster/resumes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Researchers are often interested in linking individuals between two datasets that lack a common unique identifier. Matching procedures often struggle to match records with common names, birthplaces, or other field values. Computational feasibility is also a challenge, particularly when linking large datasets. We develop a Bayesian method for automated probabilistic record linkage and show it recovers more than 50% more true matches, holding accuracy constant, than comparable methods in a matching of military recruitment data to the 1900 U.S. Census for which expert-labeled matches are available. Our approach, which builds on a recent state-of-the-art Bayesian method, refines the modeling of comparison data, allowing disagreement probability parameters conditional on nonmatch status to be record-specific in the smaller of the two datasets. This flexibility significantly improves matching when many records share common field values. We show that our method is computationally feasible in practice, despite the added complexity, with an R/C++ implementation that achieves a significant improvement in speed over comparable recent methods. We also suggest a lightweight method for treatment of very common names and show how to estimate true positive rate and positive predictive value when true match status is unavailable.
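For readers new to probabilistic record linkage, the sketch below shows the classical Fellegi-Sunter match weight, not the Bayesian method described in the abstract. Each field contributes a log-likelihood ratio depending on whether it agrees. The field names and the m/u probabilities are hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "first_name": (0.95, 0.02),
    "last_name": (0.97, 0.01),
    "birth_year": (0.90, 0.05),
}


def match_weight(agreements):
    """Sum log-likelihood-ratio weights over fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        total += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return total


all_agree = match_weight({"first_name": True, "last_name": True, "birth_year": True})
name_only = match_weight({"first_name": True, "last_name": True, "birth_year": False})
none_agree = match_weight({"first_name": False, "last_name": False, "birth_year": False})
```

In this classical setup, a common name has a higher u (chance agreement among nonmatches), so agreeing on it carries less evidence. That is the situation the abstract targets by letting the nonmatch parameters vary by record.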
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of the Patient Matching Algorithm Challenge is to bring about greater transparency and data on the performance of existing patient matching algorithms, spur the adoption of performance metrics for patient data matching algorithm vendors, and positively impact other aspects of patient matching such as deduplication and linking to clinical data. Participants will be provided a data set and will have their answers evaluated and scored against a master key. Up to 6 cash prizes will be awarded with a total purse of up to $75,000.00. https://www.patientmatchingchallenge.com/
The test dataset used in the ONC Patient Matching Algorithm Challenge is available for download by students, researchers, or anyone else interested in additional analysis and patient matching algorithm development. More information about the Patient Matching Algorithm Challenge can be found at https://www.patientmatchingchallenge.com/.
The dataset containing 1 million patients was split into eight files of alphabetical groupings by the patient's last name, plus an additional file containing test patients with no last name recorded (Null). All files should be downloaded and merged for analysis. https://github.com/onc-healthit/patient-matching
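Scoring answers against a master key can be sketched as a set comparison of matched record pairs. The example below computes precision, recall, and F1; these metrics are a common choice for this task and are assumed here, since the challenge's exact scoring rules are not given in this description. The record IDs are hypothetical.

```python
def score_pairs(predicted, truth):
    """Return (precision, recall, f1) for collections of matched ID pairs."""
    # Treat (a, b) and (b, a) as the same pair.
    pred = {frozenset(p) for p in predicted}
    gold = {frozenset(p) for p in truth}
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical record IDs, for illustration only.
master_key = [(1, 2), (3, 4), (5, 6)]
submission = [(2, 1), (3, 4), (7, 8)]
precision, recall, f1 = score_pairs(submission, master_key)
```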
http://opendatacommons.org/licenses/dbcl/1.0/
Context
In today’s competitive job market, companies receive numerous applications for each job posting, making it challenging to efficiently screen and shortlist candidates. This dataset is designed to facilitate research and development in resume screening, job matching, and recruitment analytics. It can be used to build machine learning models for applicant-job matching, automate resume parsing, and analyze hiring trends.
Dataset Overview
This dataset contains applicant details, resumes, job descriptions, and matching labels to assess how well a candidate fits a specific job role. It can be used to explore factors affecting job selection, identify biases in hiring, and improve applicant tracking systems.
Data Sources & Collection
The dataset was compiled from synthetic and publicly available job application data. It is structured to resemble real-world hiring scenarios, making it useful for data science and HR analytics projects. The resumes and job descriptions are either anonymized, synthesized, or derived from publicly accessible recruitment data.
Columns Description
Job Applicant Name – Full name of the applicant.
Age – Applicant’s age.
Gender – Applicant’s gender identity.
Race – Racial background of the applicant.
Ethnicity – Ethnic identity of the applicant.
Resume – Text content of the applicant’s resume, including skills, experience, and education.
Job Roles – The job positions for which the applicant applied.
Job Description – A detailed description of the job role, including required skills, responsibilities, and qualifications.
Best Match – A label or score indicating how well the applicant matches the job role based on qualifications and experience.
Inspiration & Use Cases
This dataset is useful for:
✅ Building AI-powered resume-screening models to automate candidate selection.
✅ Developing job recommendation systems that suggest the best roles for applicants.
✅ Analyzing hiring trends & biases in recruitment based on age, gender, or ethnicity.
✅ Training NLP models for resume parsing and job description understanding.
Potential Applications
AI-based Applicant Tracking Systems (ATS)
HR Analytics & Hiring Bias Studies
Resume-Job Matching Algorithms
Data-Driven Career Counseling
🚀 We encourage data scientists, recruiters, and HR tech enthusiasts to explore this dataset and build innovative solutions!
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Eslam_T_F
Released under MIT
As required by federal law, state SNAP agencies verify financial and non-financial information by matching SNAP applicant and participant information to various national and state data sources to ensure they meet the program’s eligibility criteria. Data matching is an important tool for ensuring program integrity and benefit accuracy. However, information on states’ data matching practices and protocols is limited. This study was undertaken to address this knowledge gap.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many e-shops have started to mark up product data within their HTML pages using the schema.org vocabulary. The Web Data Commons project regularly extracts such data from the Common Crawl, a large public web crawl. The Web Data Commons Training and Test Sets for Large-Scale Product Matching contain product offers from different e-shops in the form of binary product pairs (with corresponding label “match” or “no match”) for four product categories: computers, cameras, watches and shoes. In order to support the evaluation of machine learning-based matching methods, the data is split into training, validation and test sets. For each product category, we provide training sets in four different sizes (2,000 to 70,000 pairs). Furthermore, sets of ids are available for each training set for a possible validation split (stratified random draw). The test set for each product category consists of 1,100 product pairs. The labels of the test sets were manually checked, while those of the training sets were derived using shared product identifiers from the Web as weak supervision. The data stems from the WDC Product Data Corpus for Large-Scale Product Matching - Version 2.0, which consists of 26 million product offers originating from 79 thousand websites. For more information and download links for the corpus itself, please follow the links below.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the materials needed to replicate the results presented in Mozer et al. (2019), "Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality", forthcoming in Political Analysis.
https://www.apache.org/licenses/LICENSE-2.0.html
This is a test dataset for geospatial data matching, including simulated urban and rural roads.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Fuzzy string matching remains a key issue when political scientists combine data from different sources. Existing matching methods invariably rely on string distances, such as Levenshtein distance and cosine similarity. As such, they are inherently incapable of matching strings that refer to the same entity with different names such as "JP Morgan" and "Chase Bank", "DPRK" and "North Korea", "Chuck Fleischmann (R)" and "Charles Fleischmann (R)". In this letter, we propose to use large language models to entirely sidestep this problem in an easy and intuitive manner. Extensive experiments show that our proposed methods can improve the state of the art by as much as 39% in terms of average precision while being substantially easier and more intuitive to use by political scientists. Moreover, our results are robust against various temperatures. We further note that enhanced prompting can lead to additional performance improvements.
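To see why string distances struggle with aliases, consider a plain Levenshtein implementation. The sketch below is a textbook dynamic-programming version, not code from the paper; the normalization into a 0-to-1 similarity score is an assumed convention, and the misspelling "Noth Korea" is a made-up example of a typo.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize edit distance to a similarity score between 0 and 1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


typo = similarity("North Korea", "Noth Korea")  # same name, one typo
alias = similarity("DPRK", "North Korea")       # same entity, different name
```

The typo pair scores far higher than the alias pair, even though both pairs refer to the same entity. A threshold on this score can catch misspellings but not renamings.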
This dataset contains the predicted prices of the asset RESUME-MATCHER github.com/srbhr/RESUME-MATCHER over the next 16 years. The data is initially calculated using a default 5 percent annual growth rate. After page load, a sliding scale component lets the user adjust the growth rate to their own positive or negative projections. The maximum adjustable growth rate is 100 percent, and the minimum is -100 percent.
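A projection like this can be reproduced with a compound-growth formula. The sketch below assumes annual compounding and a hypothetical starting price of 100.0; the page does not state how growth is compounded, so treat both as assumptions. The rate bounds mirror the -100 to 100 percent slider range described above.

```python
def project_prices(current_price, annual_rate=0.05, years=16):
    """Compound current_price by annual_rate once per year for `years` years."""
    if not -1.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between -1.0 and 1.0")
    return [current_price * (1 + annual_rate) ** year
            for year in range(1, years + 1)]


# Hypothetical starting price; the default rate is the 5 percent from the text.
projection = project_prices(100.0)
```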
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The FRC Match Dataset is based on FIRST Robotics Competition (FRC) records from 2018 to 2025. It contains robot competition match data, including EPA (Expected Score Contribution), match win rate, team composition, and match results for each match.
2) Data Utilization
(1) Characteristics of the FRC Match Data:
• Each row contains numerical and categorical variables such as year, event, playoff status, match stage, winning team, EPA-based probability of victory, team name and composition, and match results, which together provide team/match performance and forecasting indicators.
(2) Uses of the FRC Match Data:
• Prediction and Assessment of Match Results: Using EPA and past match data, machine learning models can predict match wins and losses, and the reliability of prediction models can be evaluated with indicators such as the Brier score.
• Team Strategy and Performance Analysis: Analyzing EPA, win rate, and matchup data for each team helps reveal strategic contribution, cooperation effects, seasonal trends, and the characteristics of strong and weak teams.
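The Brier score mentioned above is the mean squared difference between a forecast probability and the 0/1 outcome; lower is better, and a constant 0.5 forecast scores 0.25. A minimal sketch follows, using hypothetical win probabilities and results rather than values from the dataset.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predicted win probabilities and actual results (1 = win).
win_probs = [0.8, 0.6, 0.3]
results = [1, 0, 0]
score = brier_score(win_probs, results)
```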
Grad-match: Gradient matching based data subset selection for efficient deep model training.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of 61867 matches from OverTrack.gg was loaded into memory. Most matches represent a team of 5 vs. 5, with no possibility for ties. Since tracking matches is voluntary, a not insignificant percentage of matches (7.82%) are played against a team of unrated/unknown players. The average number of matches per unique player is about 2.338; however, a decent number of players were tracked for over a hundred matches.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Match reported 2.5K employees for its fiscal year ending in December 2024. Data for Match | MTCH - Employees Total Number, including historical data, tables and charts, were last updated by Trading Economics in December 2025.
These data were gathered in order to evaluate the implications of rational choice theory for offender rehabilitation. The hypothesis of the research was that income-enhancing prison rehabilitation programs are most effective for the economically motivated offender. The offender was characterized by demographic and socio-economic characteristics, criminal history and behavior, and work activities during incarceration. Information was also collected on type of release and post-release recidivistic and labor market measures. Recidivism was measured by arrests, convictions, and reincarcerations, length of time until first arrest after release, and seriousness of offense leading to reincarceration.