CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Photographic capture–recapture is a valuable tool for obtaining demographic information on wildlife populations due to its noninvasive nature and cost-effectiveness. Recently, several computer-aided photo-matching algorithms have been developed to more efficiently match images of unique individuals in databases with thousands of images. However, the identification accuracy of these algorithms can severely bias estimates of vital rates and population size. Therefore, it is important to understand the performance and limitations of state-of-the-art photo-matching algorithms prior to implementation in capture–recapture studies involving possibly thousands of images. Here, we compared the performance of four photo-matching algorithms, Wild-ID, I3S Pattern+, APHIS, and AmphIdent, using multiple amphibian databases of varying image quality. We measured the performance of each algorithm and evaluated it in relation to database size and the number of matching images in the database. We found that performance differed greatly by algorithm and image database, with recognition rates ranging from 22.6% to 100% when limiting the review to the 10 highest-ranking images. We found that the recognition rate degraded marginally with increased database size and could be improved considerably with a higher number of matching images in the database. In our study, the pixel-based algorithm of AmphIdent exhibited superior recognition rates compared to the other approaches. We recommend carefully evaluating algorithm performance prior to using an algorithm to match a complete database. By choosing a suitable matching algorithm, databases of sizes that are unfeasible to match “by eye” can be easily translated into the accurate individual capture histories necessary for robust demographic estimates.
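The “10 highest-ranking images” criterion corresponds to a top-k recognition rate; a minimal sketch of that metric, assuming a hypothetical list in which each entry records the rank at which a query's true match appeared (or None if it was never returned):

```python
# Top-k recognition rate: fraction of queries whose true match appears within the
# first k candidates returned by the photo-matching algorithm.
# ranks_of_true_match[i] is the 1-based rank of the correct individual for query i,
# or None if the matching image was never proposed.
def recognition_rate(ranks_of_true_match, k=10):
    hits = sum(1 for r in ranks_of_true_match if r is not None and r <= k)
    return hits / len(ranks_of_true_match)

# Example: 2 of 4 queries had their true match within the top 10 candidates.
print(recognition_rate([1, 4, None, 27], k=10))  # 0.5
```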
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tackles comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy with F-scores of 0.71 for passes and 0.65 for shots if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is currently overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers. We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.
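The claim that a few human-understandable rules can separate shots from passes can be illustrated with a toy rule-based classifier; the features and thresholds below are illustrative assumptions, not the rules validated in the paper:

```python
# Toy rule-based play classifier on basic spatial features (all thresholds are assumptions).
# distance_to_goal: metres from the ball release point to the centre of the opponent goal
# angle_to_goal:    absolute angle in degrees between the ball direction and the goal centre
# ball_speed:       ball speed in m/s shortly after release
def classify_play(distance_to_goal, angle_to_goal, ball_speed):
    """Return 'shot' or 'pass' using a small set of human-readable rules."""
    if distance_to_goal < 30 and angle_to_goal < 25 and ball_speed > 15:
        return "shot"
    return "pass"

print(classify_play(distance_to_goal=12.0, angle_to_goal=8.0, ball_speed=22.0))   # shot
print(classify_play(distance_to_goal=45.0, angle_to_goal=60.0, ball_speed=10.0))  # pass
```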
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.
IPGOD is large, with millions of data points across up to 40 tables, making the tables too large to open in Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software to merge them. We recommend that advanced users interact with the IPGOD data using appropriate tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
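For example, a hypothetical join of two IPGOD tables in Python with pandas (file and column names are assumptions; consult the IPGOD data dictionary for the actual names):

```python
import pandas as pd

# Hypothetical example of joining two IPGOD tables on an application identifier.
# File names and column names here are assumptions; check the IPGOD data dictionary
# for the actual table and field names in the 2019 release.
applications = pd.read_csv("ipgod_applications.csv")  # one row per IP right application
applicants = pd.read_csv("ipgod_applicants.csv")      # derived applicant details

merged = applications.merge(applicants, on="application_id", how="left")
print(merged.groupby("applicant_state").size())       # e.g. filings by region
```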
IP Australia also provides free trials of the IP Data Platform, a cloud-based analytics platform that enables working with large intellectual property datasets such as IPGOD through the web browser, without installing any software.
The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.
Due to changes in our systems, some tables have been affected.
Data quality has been improved across all tables.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lightning Talk at the International Digital Curation Conference 2025. The presentation examines OpenAIRE's solution to the “entity disambiguation” problem, presenting a hybrid data curation method that combines deduplication algorithms with the expertise of human curators to ensure high-quality, interoperable scholarly information.

Entity disambiguation is invaluable to building a robust and interconnected open scholarly communication system. It involves accurately identifying and differentiating entities such as authors, organisations, data sources and research results across various entity providers. This task is particularly complex in contexts like the OpenAIRE Graph, where metadata is collected from over 100,000 data sources. Different metadata describing the same entity can be collected multiple times, potentially providing different information, such as different Persistent Identifiers (PIDs) or names, for the same entity. This heterogeneity poses several challenges to the disambiguation process. For example, the same organisation may be referenced using different names in different languages, or abbreviations. In some cases, even the use of PIDs might not be effective, as different identifiers may be assigned by different data providers. Therefore, accurate entity disambiguation is essential for ensuring data quality, improving search and discovery, facilitating knowledge graph construction, and supporting reliable research impact assessment.

To address this challenge, OpenAIRE employs a deduplication algorithm to identify and merge duplicate entities, configured to handle different entity types. While the algorithm proves effective for research results, when applied to organisations and data sources it needs to be complemented with human curation and validation, since additional information may be needed. OpenAIRE's data source disambiguation relies primarily on the OpenAIRE technical team overseeing the deduplication process and ensuring accurate matches across the DRIS, FAIRSharing, re3data, and OpenDOAR registries. While the algorithm automates much of the process, human experts verify matches, address discrepancies and actively search for matches not proposed by the algorithm. External stakeholders, such as data source managers, can also contribute by submitting suggestions through a dedicated ticketing system. So far, OpenAIRE has curated almost 3,935 groups for a total of 8,140 data sources.

To address organisational disambiguation, OpenAIRE developed OpenOrgs, a hybrid system combining automated processes and human expertise. The tool works on organisational data aggregated from multiple sources (the ROR registry, funder databases, CRIS systems, and others) by the OpenAIRE infrastructure, automatically compares metadata, and suggests potential merged entities to human curators. These curators, authorised experts in their respective research landscapes, validate merged entities, identify additional duplicates, and enrich organisational records with missing information such as PIDs, alternative names, and hierarchical relationships. With over 100 curators from 40 countries, OpenOrgs has curated more than 100,000 organisations to date. A dataset containing all the OpenOrgs organisations can be found on Zenodo (https://doi.org/10.5281/zenodo.13271358).

This presentation demonstrates how OpenAIRE's entity disambiguation techniques and OpenOrgs aim to be game-changers for the research community by building and maintaining an integrated open scholarly communication system in the years to come.
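As a toy illustration of the kind of candidate matching that precedes human curation (not the OpenAIRE deduplication algorithm itself), the sketch below proposes organisation merge candidates by normalised-name similarity:

```python
from difflib import SequenceMatcher

# Toy illustration of name-based duplicate candidate detection for organisation records.
# This is NOT the OpenAIRE deduplication algorithm; it only sketches the idea of proposing
# merge candidates above a similarity threshold for human curators to review.
def normalise(name):
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())

def similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

records = ["University of Oxford", "Oxford Univ.", "Universite d'Oxford", "CNRS"]
candidates = []
for i, a in enumerate(records):
    for b in records[i + 1:]:
        score = similarity(a, b)
        if score > 0.6:
            candidates.append((a, b, round(score, 2)))

print(candidates)  # candidate pairs to be validated (or rejected) by a curator
```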
This is a layer of water service boundaries for 44,919 community water systems that deliver tap water to 306.88 million people in the US. This amounts to 97.22% of the population reportedly served by active community water systems and 90.85% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called a Tiered, Explicit, Match, and Model approach, or TEMM for short. The name of the approach reflects exactly how the nationwide data layer was developed. The TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. Tier 2b reflects overlapping boundaries for multiple systems. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a or Tier 2b), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius at provided water system centroids, and model a spherical water system boundary (Tier 3).
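The tier logic described above can be sketched as a simple decision function; the field names are assumptions rather than the actual schema used in the repository:

```python
# Sketch of the tiered assignment logic described above. Field names are assumptions,
# not the actual schema used in the SimpleLab TEMM repository.
def assign_tier(has_explicit_boundary, matched_place_id, n_systems_sharing_place):
    if has_explicit_boundary:
        return "Tier 1"   # explicit, state-provided service boundary polygon
    if matched_place_id is not None:
        # one-to-one match with a Census Place TIGER polygon vs. a shared polygon
        return "Tier 2a" if n_systems_sharing_place == 1 else "Tier 2b"
    return "Tier 3"       # modelled boundary of estimated radius around the system centroid

print(assign_tier(False, "4805000", 3))  # Tier 2b: several systems share one place polygon
print(assign_tier(False, None, 0))       # Tier 3: no boundary and no place match
```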
Several limitations to this data exist, and the layer should be used with these in mind. First, the case of assigning a Census Place TIGER polygon to multiple systems results in an inaccurate assignment of the same exact area to multiple systems; we hope to resolve Tier 2b systems into Tier 2a or Tier 3 in a future iteration. Second, matching algorithms to assign Census Place boundaries require additional validation and iteration. Third, Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility, but the underlying lat/long centroids for water system facilities are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when the geometry quality is a county or state centroid, but we did not exclude these data from the layer. Fourth, missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.
All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
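The observation that roughly half of the identified proteoforms were specific to a single workflow is essentially a set-overlap statistic; a minimal sketch with made-up identifiers:

```python
# Toy illustration of the overlap statistic mentioned above: the fraction of proteoforms
# identified by exactly one of several search workflows. Identifiers are made up.
from collections import Counter

identifications = {
    "ProSightPD":    {"P1", "P2", "P3", "P5"},
    "TopPIC":        {"P1", "P2", "P4"},
    "MSPathFinderT": {"P2", "P3", "P6"},
    "pTop":          {"P2", "P7"},
}

counts = Counter(p for ids in identifications.values() for p in ids)
unique_to_one = [p for p, n in counts.items() if n == 1]
print(len(unique_to_one) / len(counts))  # fraction seen by only one workflow
```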
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a software tool, called cMatch, to reconstruct and identify synthetic genetic constructs from their sequences, or a set of sub-sequences, based on two practical pieces of information: their modular structure, and libraries of components. Although developed for combinatorial pathway engineering problems and addressing their quality control (QC) bottleneck, cMatch is not restricted to these applications. QC takes place post assembly, transformation and growth. It has a simple goal: to verify that the genetic material contained in a cell matches what was intended to be built and, when that is not the case, to locate the discrepancies and estimate their severity. In terms of reproducibility and reliability, the QC step is crucial. Failure at this step requires repetition of the construction and/or sequencing steps. When performed manually or semi-manually, QC is an extremely time-consuming, error-prone process, which scales very poorly with the number of constructs and their complexity. To make QC frictionless and more reliable, cMatch performs an operation we have called “construct-matching” and automates it. Construct-matching is more thorough than simple sequence-matching, as it matches at the functional level, and it quantifies the matching at the individual component level and across the whole construct. Two algorithms (called CM_1 and CM_2) are presented. They differ according to the nature of their inputs. CM_1 is the core algorithm for construct-matching and is to be used when input sequences are long enough to cover constructs in their entirety (e.g., obtained with methods such as next-generation sequencing). CM_2 is an extension designed to deal with shorter sequences (e.g., obtained with Sanger sequencing) that need recombining. Both algorithms are shown to yield accurate construct-matching in a few minutes (even on hardware with limited processing power), together with a set of metrics that can be used to improve the robustness of the decision-making process. To ensure reliability and reproducibility, cMatch builds on the highly validated pairwise-matching Smith-Waterman algorithm. All the tests presented have been conducted on synthetic data for challenging yet realistic constructs, and on real data gathered during studies on a metabolic engineering example (lycopene production).
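A minimal sketch of construct-matching by local (Smith-Waterman) alignment, in the spirit of CM_1 but not the cMatch implementation itself; it assumes Biopython is installed, and the component library and construct sequence are toy examples:

```python
# Minimal sketch: score each library component against a construct sequence with local
# (Smith-Waterman) alignment and report a per-component match fraction. This is an
# illustration only, not the cMatch CM_1 algorithm; sequences and scores are toy values.
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "local"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -2
aligner.extend_gap_score = -1

library = {
    "promoter_P1": "TTGACAGCTAGCTCAGTCCT",
    "rbs_B0034": "AAAGAGGAGAAA",
    "cds_crtE": "ATGGTGAGCAAGGGCGAG",
}
construct = "TTGACAGCTAGCTCAGTCCTAAAGAGGAGAAAATGGTGAGCAAGGGCGAG"

for name, part in library.items():
    score = aligner.score(construct, part)
    coverage = score / (aligner.match_score * len(part))  # 1.00 = the part is present in full
    print(f"{name}: normalised match {coverage:.2f}")
```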
https://dataintelo.com/privacy-and-policy
As per our latest research, the global Geospatial Address Matching AI market size stands at USD 1.56 billion in 2024, with a robust compound annual growth rate (CAGR) of 17.2% expected from 2025 to 2033. By 2033, the market is forecasted to reach USD 6.12 billion, driven by the increasing adoption of AI-powered geospatial solutions across various industries. The primary growth factor fueling this surge is the rising demand for precise location intelligence and address validation to optimize logistics, enhance urban planning, and support the digital transformation initiatives of both public and private sectors globally.
One of the most significant growth drivers for the Geospatial Address Matching AI market is the exponential increase in the volume of location-based data generated from mobile devices, IoT sensors, and smart infrastructure. Organizations across logistics, e-commerce, and utilities are increasingly leveraging AI-driven address matching solutions to ensure data accuracy, reduce delivery failures, and optimize route planning. The proliferation of smart cities and the integration of geospatial analytics into urban planning and emergency response systems have further amplified the demand for advanced address matching technologies. These solutions enable real-time decision-making, efficient resource allocation, and improved citizen services, making them indispensable in modern urban ecosystems.
Another critical factor propelling market growth is the shift towards digital transformation across industries, particularly in sectors like transportation, BFSI, and government. As organizations strive to enhance operational efficiency and customer experience, accurate geospatial data becomes a cornerstone for strategic decision-making. AI-powered address matching not only automates data cleansing and validation but also supports compliance with regulatory requirements related to data privacy and location accuracy. Furthermore, the integration of AI with cloud-based geospatial platforms has democratized access to sophisticated address matching tools, enabling small and medium enterprises (SMEs) to harness the benefits previously reserved for large enterprises. This democratization is expected to unlock new growth opportunities and drive widespread adoption across diverse industry verticals.
The regional outlook for the Geospatial Address Matching AI market remains highly promising, with North America and Europe leading the charge due to their advanced technology infrastructure and early adoption of AI solutions. Asia Pacific is emerging as a key growth region, fueled by rapid urbanization, government-led smart city initiatives, and the expansion of e-commerce and logistics networks. Latin America and the Middle East & Africa are also witnessing steady growth, supported by investments in digital infrastructure and increasing awareness of the benefits of geospatial intelligence. The global landscape is characterized by a dynamic interplay of technological advancements, regulatory developments, and evolving end-user needs, which collectively shape the trajectory of the market.
The Component segment of the Geospatial Address Matching AI market is broadly classified into Software, Hardware, and Services. Software solutions dominate the market, accounting for the largest share due to their role in enabling advanced address parsing, standardization, validation, and geocoding. These software platforms leverage machine learning algorithms and natural language processing to accurately interpret, match, and enrich address data from disparate sources. The growing preference for cloud-based software-as-a-service (SaaS) models is further accelerating market growth, as organizations seek scalable, flexible, and cost-effective solutions to manage their geospatial data needs. Continuous updates and integration capabilities with other enterprise systems make software offerings indispensable for businesses aiming to enhance data quality and operational efficiency.
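As a toy illustration of the parsing, standardisation, and matching functions such software platforms provide, the sketch below normalises two address strings and scores their similarity; the abbreviation table and library choice are assumptions, not any vendor's implementation:

```python
import re
from difflib import SequenceMatcher

# Toy address standardisation and matching. The abbreviation table is an assumption;
# production systems use far richer parsing, reference data, and geocoding.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "n": "north"}

def standardise(address):
    tokens = re.sub(r"[.,]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def match_score(a, b):
    return SequenceMatcher(None, standardise(a), standardise(b)).ratio()

print(match_score("12 N. Main St.", "12 North Main Street"))  # 1.0: same standardised address
```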
Hardware forms an essential backbone for the deployment of geospatial AI solutions, particularly in environments requiring high-performance computing and real-time data processing. Specialized hardware, such as servers, storage devices, and edge computing units, facilitate the rapid execution of AI algorithms and support large-scale address matching operations. While hardware constitutes a smaller share of the overall market
The global Skill-Based Job Matching AI market size stood at USD 2.1 billion in 2024, according to our latest research, and is forecasted to reach USD 12.4 billion by 2033, growing at a robust CAGR of 21.8% during the forecast period. This remarkable growth is primarily driven by the increasing demand for data-driven recruitment solutions, the rapid digital transformation across industries, and the growing adoption of artificial intelligence in human resource management. As organizations worldwide strive for efficient talent acquisition and workforce optimization, skill-based job matching AI solutions are becoming integral to modern HR ecosystems.
One of the primary growth factors for the Skill-Based Job Matching AI market is the accelerating shift towards digital transformation in human resources. Organizations are increasingly leveraging AI-powered platforms to automate and enhance their recruitment processes, reducing the time-to-hire and improving the quality of candidate-job matches. The proliferation of digital job applications and the exponential increase in candidate data have made manual screening inefficient and error-prone. AI-driven skill-matching algorithms analyze vast datasets, including resumes, job descriptions, and candidate profiles, to identify the best-fit candidates based on skills, experience, and organizational culture fit. This not only streamlines the recruitment process but also significantly reduces hiring biases, leading to more diverse and competent workforces.
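The skill-matching idea can be illustrated with a toy scorer that ranks candidates by the overlap between their skills and a job's required skills; real platforms use far richer signals, so the example below is only a sketch with made-up data:

```python
# Toy skill-based matching: rank candidates by Jaccard overlap between their skills
# and a job's required skills. Names and skill sets are made up for illustration.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

job_skills = {"python", "sql", "machine learning"}
candidates = {
    "cand_1": {"python", "sql", "excel"},
    "cand_2": {"python", "machine learning", "sql", "spark"},
}
ranking = sorted(candidates, key=lambda c: jaccard(candidates[c], job_skills), reverse=True)
print(ranking)  # ['cand_2', 'cand_1']
```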
Another significant driver is the growing emphasis on talent management and career development within organizations. As the competition for skilled professionals intensifies, companies are adopting skill-based job matching AI to facilitate internal mobility, upskilling, and reskilling initiatives. AI-powered platforms help HR teams identify skill gaps, recommend personalized learning paths, and match employees to new roles or projects that align with their competencies and career aspirations. This proactive approach to talent management enhances employee engagement, retention, and productivity, which are critical success factors in today’s dynamic business environment. Furthermore, the integration of AI in workforce planning enables organizations to anticipate future skill requirements and strategically align their talent pipelines.
The expansion of the Skill-Based Job Matching AI market is further fueled by the adoption of cloud-based solutions and the increasing penetration of AI technologies across small and medium enterprises (SMEs). Cloud deployment offers scalability, flexibility, and cost-effectiveness, making AI-powered job matching accessible to organizations of all sizes. Additionally, the growing availability of AI-based career guidance tools is empowering job seekers, students, and professionals to make informed career decisions based on their unique skill sets and market demands. Collectively, these trends are driving widespread adoption of skill-based job matching AI solutions across diverse industry verticals, including BFSI, IT and telecommunications, healthcare, retail, manufacturing, and education.
From a regional perspective, North America remains the largest market for skill-based job matching AI, owing to its advanced technological infrastructure, high adoption of AI in HR practices, and presence of leading industry players. However, the Asia Pacific region is witnessing the fastest growth, driven by a burgeoning workforce, rapid digitalization, and increased investments in AI-powered HR technologies across emerging economies such as China, India, and Southeast Asia. Europe is also experiencing steady growth, supported by robust regulatory frameworks and a strong focus on workforce diversity and inclusion. Overall, the global landscape for skill-based job matching AI is characterized by rapid innovation, expanding application areas, and increasing cross-industry adoption.
In addition to skill-based job matching, Product Matching AI is gaining traction as a transformative technology across various industries. This AI-driven approach leverages advanced algorithms to analyze product attributes, consumer preferences, and market trends, enabling businesses to optimize their product offerings and enhance customer satisfaction. By automating the process o
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This comprehensive synthetic dataset contains 1,369 rows and 10 columns specifically designed for predictive modeling in sports betting analytics. The dataset provides a rich foundation for machine learning applications in the sports betting domain, featuring realistic match data across multiple sports with comprehensive betting odds, team information, and outcome predictions.
| Attribute | Details |
|---|---|
| Dataset Name | Sports Betting Predictive Analysis Dataset |
| File Format | CSV (Comma Separated Values) |
| Total Records | 1,369 matches |
| Total Columns | 10 |
| Date Range | July 2023 - July 2025 (2-year span) |
| Sports Covered | Football, Basketball, Tennis, Baseball, Hockey |
| Primary Use Case | Machine Learning for sports betting predictions |
| Data Type | Synthetic (generated using Faker library) |
| Missing Values | Strategic null values (~5% in odds columns) |
| Target Variables | Predicted_Winner, Actual_Winner |
| Key Features | Betting odds, team names, match outcomes |
| Data Quality | Realistic betting odds ranges (1.2 - 5.0) |
| Temporal Distribution | Evenly distributed across 2-year timeframe |
| Geographic Scope | City-based team naming convention |
| Validation Ready | Includes both predictions and actual outcomes |
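A minimal sketch of how the dataset might be loaded and evaluated, assuming a hypothetical file name and a hypothetical decimal-odds column (Home_Odds); only Predicted_Winner and Actual_Winner are documented in the table above:

```python
import pandas as pd

# Load the CSV (file name is an assumption) and score the bundled predictions.
df = pd.read_csv("sports_betting_dataset.csv")

# Predicted_Winner and Actual_Winner are listed in the schema above.
accuracy = (df["Predicted_Winner"] == df["Actual_Winner"]).mean()
print(f"Prediction accuracy: {accuracy:.2%}")

# Odds columns contain roughly 5% nulls by design; handle them before modelling.
# 'Home_Odds' is a hypothetical column name for a decimal-odds field.
df["implied_home_prob"] = 1.0 / df["Home_Odds"]
print(df["implied_home_prob"].describe())
```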
The CoastColour Round Robin (CCRR) project (http://www.coastcolour.org), funded by the European Space Agency (ESA), was designed to bring together a variety of reference datasets and to use these to test algorithms and assess their accuracy for retrieving water quality parameters. This information was then developed to help end-users of remote sensing products select the most accurate algorithms for their coastal region. To facilitate this, an inter-comparison of the performance of algorithms for the retrieval of in-water properties over coastal waters was carried out. The comparison used three types of datasets on which ocean colour algorithms were tested. The description and comparison of the three datasets are the focus of this paper; they include the Medium Resolution Imaging Spectrometer (MERIS) Level 2 match-ups, in situ reflectance measurements and data generated by a radiative transfer model (HydroLight). The datasets mainly consisted of 6,484 marine reflectances associated with various geometrical (sensor viewing and solar angles) and sky conditions and water constituents: Total Suspended Matter (TSM) and Chlorophyll-a (CHL) concentrations, and the absorption of Coloured Dissolved Organic Matter (CDOM). Inherent optical properties were also provided in the simulated datasets (5,000 simulations) and from 3,054 match-up locations. The distributions of reflectance at selected MERIS bands and band ratios, and of CHL and TSM as a function of reflectance, from the three datasets are compared. Match-up and in situ sites where deviations occur are identified. The distributions of the three reflectance datasets are also compared to the simulated and in situ reflectances used previously by the International Ocean Colour Coordinating Group (IOCCG, 2006) for algorithm testing, showing a clear extension of the CCRR data, which covers more turbid waters.
When generating the data set, the frequency of the initial tags was decreasing linearly as a function of the level depth in the exact hierarchy. We show the same quality measures as in Table 1: the ratio of exactly matching links, the ratio of acceptable links, the ratio of inverted links, the ratio of unrelated links, the ratio of missing links, the normalized mutual information between the exact and the reconstructed hierarchies, and the linearized mutual information. The different rows correspond to results obtained from algorithm A (first row), algorithm B (second row), the method by P. Heymann & H. Garcia-Molina (third row), and the algorithm by P. Schmitz (fourth row).
According to our latest research, the global patient identity matching software market size in 2024 stands at USD 1.48 billion, reflecting a robust and expanding sector. The market is projected to register a strong CAGR of 14.2% over the forecast period, reaching a value of approximately USD 4.13 billion by 2033. This accelerated growth is being driven by the increasing demand for accurate patient identification, the digitization of healthcare records, and the rising emphasis on interoperability across healthcare systems. As per the latest research, the adoption of patient identity matching software is becoming a critical factor in reducing medical errors, streamlining administrative processes, and enhancing overall healthcare quality worldwide.
One of the primary growth factors propelling the patient identity matching software market is the global surge in electronic health record (EHR) adoption. As healthcare providers transition from paper-based to digital records, the need for precise patient identification has become more pronounced. Errors in patient identification can lead to significant clinical complications, including misdiagnosis, redundant testing, and even adverse events. The integration of advanced patient identity matching software mitigates these risks by ensuring that every piece of patient data is correctly attributed, thereby improving patient safety and operational efficiency. Furthermore, regulatory mandates such as the Health Information Technology for Economic and Clinical Health (HITECH) Act and similar initiatives in Europe and Asia Pacific are compelling healthcare organizations to adopt robust identity management solutions, thereby fueling market growth.
Another significant driver is the rise in healthcare data breaches and fraud, which has underscored the importance of secure and reliable patient identification systems. With healthcare data becoming a prime target for cybercriminals, organizations are increasingly investing in advanced patient identity matching software that incorporates biometric verification, AI-driven algorithms, and blockchain technology. These innovations not only enhance the accuracy of patient matching but also strengthen data security and privacy. Additionally, the growing trend of healthcare consumerism, where patients demand seamless access to their health information across multiple platforms, is further boosting the adoption of interoperable identity matching solutions. This shift is prompting vendors to develop more user-friendly and scalable software that can be easily integrated with existing healthcare IT infrastructures.
The proliferation of value-based care models and the expansion of telehealth services are also contributing to the growth of the patient identity matching software market. As healthcare delivery becomes more decentralized, with patients seeking care from multiple providers and through various digital channels, the risk of data fragmentation and duplication increases. Patient identity matching software plays a pivotal role in aggregating and reconciling patient data from disparate sources, ensuring a unified and accurate health record. This capability is particularly vital for population health management, care coordination, and analytics initiatives, all of which rely on high-quality, longitudinal patient data. Consequently, healthcare organizations are prioritizing investments in identity management technologies to support their digital transformation strategies and improve care outcomes.
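The record-reconciliation step can be illustrated with a toy weighted field-agreement scorer; the fields, weights, and threshold are illustrative assumptions rather than any vendor's matching algorithm:

```python
# Toy sketch of weight-based patient record matching. Fields, weights, and the decision
# threshold are illustrative assumptions, not any product's algorithm.
WEIGHTS = {"last_name": 0.4, "birth_date": 0.4, "postal_code": 0.2}

def match_score(rec_a, rec_b):
    """Sum the weights of identity fields on which the two records agree."""
    return sum(w for field, w in WEIGHTS.items() if rec_a.get(field) == rec_b.get(field))

a = {"last_name": "garcia", "birth_date": "1980-02-14", "postal_code": "73301"}
b = {"last_name": "garcia", "birth_date": "1980-02-14", "postal_code": "73344"}

score = match_score(a, b)
print(score, score >= 0.75)  # 0.8 True: treated as the same patient above this threshold
```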
Regionally, North America continues to dominate the patient identity matching software market, accounting for the largest share in 2024, followed by Europe and the Asia Pacific. The strong presence of leading healthcare IT vendors, favorable government policies, and high healthcare expenditure in the United States and Canada are key factors driving market growth in this region. Meanwhile, Europe is witnessing steady adoption due to the increasing focus on cross-border health information exchange and regulatory compliance. The Asia Pacific region, on the other hand, is emerging as a lucrative market, supported by rapid healthcare digitization, expanding hospital infrastructure, and growing awareness about the benefits of accurate patient identification. Latin America and the Middle East & Africa are also experiencing gradual market penetration, primarily driven by modernization of healthcare systems and rising investments in health IT.
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
We are publishing a walking activity dataset including inertial and positioning information from 19 volunteers, including reference distance measured using a trundle wheel. The dataset includes a total of 96.7 Km walked by the volunteers, split into 203 separate tracks. The trundle wheel is of two types: it is either an analogue trundle wheel, which provides the total amount of meters walked in a single track, or it is a sensorized trundle wheel, which measures every revolution of the wheel, therefore recording a continuous incremental distance.
Each track has data from the accelerometer and gyroscope embedded in the phones, location information from the Global Navigation Satellite System (GNSS), and the step count obtained by the device. The dataset can be used to implement walking distance estimation algorithms and to explore data quality in the context of walking activity and physical capacity tests, fitness, and pedestrian navigation.
Methods
The proposed dataset is a collection of walks where participants used their own smartphones to capture inertial and positioning information. The participants involved in the data collection come from two sites. The first site is the Oxford University Hospitals NHS Foundation Trust, United Kingdom, where 10 participants (7 affected by cardiovascular diseases and 3 healthy individuals) performed unsupervised 6MWTs in an outdoor environment of their choice (ethical approval obtained by the UK National Health Service Health Research Authority, protocol reference number 17/WM/0355). All participants involved provided informed consent. The second site is Malmö University, in Sweden, where a group of 9 healthy researchers collected data. This dataset can be used by researchers to develop distance estimation algorithms and to study how data quality impacts the estimation.
All walks were performed by holding a smartphone in one hand, with an app collecting inertial data, the GNSS signal, and the step count. In the other hand, participants held a trundle wheel to obtain the ground truth distance. Two different trundle wheels were used: an analogue trundle wheel that allowed the registration of a single total value of walked distance, and a sensorized trundle wheel that collected timestamps and distance at every 1-meter revolution, resulting in continuous incremental distance information. The latter configuration is innovative and allows the use of temporal windows of the IMU data as input to machine learning algorithms to estimate walked distance. In the case of data collected by researchers, if the walks were done simultaneously and at a close distance from each other, only one person used the trundle wheel, and the reference distance was associated with all walks collected at the same time. The walked paths are of variable length, duration, and shape. Participants were instructed to walk paths of increasing curvature, from straight to rounded. Irregular paths are particularly useful in determining limitations in the accuracy of walked distance algorithms. Two smartphone applications were developed for collecting the information of interest from the participants' devices, both available for Android and iOS operating systems. The first is a web application that retrieves inertial data (acceleration, rotation rate, orientation) while connecting to the sensorized trundle wheel to record incremental reference distance [1]. The second is the Timed Walk app [2], which guides the user in performing a walking test by signalling when to start and when to stop the walk while collecting both inertial and positioning data. All participants in the UK used the Timed Walk app.
The data collected during the walk comes from the Inertial Measurement Unit (IMU) of the phone and, when available, the Global Navigation Satellite System (GNSS). In addition, the step count is retrieved from the sensors embedded in each participant’s smartphone. With the dataset, we provide a descriptive table with the characteristics of each recording, including the brand and model of the smartphone, duration, reference total distance, the types of signals included, and scores for several parameters related to the quality of the various signals.

The path curvature is one of the most relevant parameters: previous literature from our team confirmed the negative impact of curved paths on multiple distance estimation algorithms [3]. We visually inspected the walked paths and clustered them into three groups: a) straight path, i.e. no turns wider than 90 degrees; b) gently curved path, i.e. between one and five turns wider than 90 degrees; and c) curved path, i.e. more than five turns wider than 90 degrees. Other features relevant to the quality of the collected signals are the total amount of time above a threshold (0.05 s and 6 s, respectively) during which inertial and GNSS data were missing, either due to technical issues or due to the app going into the background and losing access to the sensors; the sampling frequency of the different data streams; the average walking speed; and the smartphone position.

The start of each walk is set to 0 ms, so no absolute time information is reported. Walk locations collected in the UK are anonymized using the following approach: the first position is fixed to a central location of the city of Oxford (latitude: 51.7520, longitude: -1.2577) and all other positions are reassigned by applying a translation along the longitudinal and latitudinal axes which maintains the original distance and angle between samples. This way, the exact geographical location is lost, but the path shape and distances between samples are maintained. The difference between consecutive points “as the crow flies” and the path curvature were numerically and visually inspected to verify that they give the same results as the original walks. Computations were made possible by using the Haversine Python library.
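A minimal sketch of one way such a translation-based anonymisation could be implemented (the dataset's actual procedure may differ); the check at the end uses the haversine package mentioned above:

```python
# Sketch of translation-based anonymisation: shift every GNSS fix by the offset that moves
# the first fix onto a fixed anchor in central Oxford. For tracks at nearby latitudes this
# preserves inter-sample distances and bearings to a good approximation. Illustrative only.
from haversine import haversine  # pip install haversine

ANCHOR = (51.7520, -1.2577)  # fixed anchor: central Oxford (latitude, longitude)

def anonymise(track):
    lat0, lon0 = track[0]
    dlat, dlon = ANCHOR[0] - lat0, ANCHOR[1] - lon0
    return [(lat + dlat, lon + dlon) for lat, lon in track]

track = [(51.7600, -1.3000), (51.7610, -1.3005), (51.7622, -1.3007)]  # toy UK walk
anon = anonymise(track)
# Inter-sample distances (km) before and after anonymisation are nearly identical.
print(haversine(track[0], track[1]), haversine(anon[0], anon[1]))
```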
Multiple datasets are available regarding walking activity recognition among other daily living tasks. However, few published datasets focus on walked distance in both indoor and outdoor environments and provide relevant ground truth for it. Yan et al. [4] introduced an inertial walking dataset within indoor scenarios using a smartphone placed in 4 positions (on the leg, in a bag, in the hand, and on the body) by six healthy participants. The reference measurement used in this study is a Visual Odometry System embedded in a smartphone that has to be worn at chest level, using a strap to hold it. While interesting and detailed, this dataset lacks GNSS data, which is likely to be used in outdoor scenarios, and the reference used for localization also suffers from accuracy issues, especially outdoors. Vezovcnik et al. [5] analysed estimation models for step length and provided an open-source dataset with a total of 22 km of inertial-only walking data from 15 healthy adults. While relevant, their dataset focuses on steps rather than total distance and was acquired on a treadmill, which limits its validity in real-world scenarios. Kang et al. [6] proposed a way to estimate travelled distance by using an Android app that matches outdoor walking patterns to indoor contexts for each participant. They collect data outdoors, including both inertial and positioning information, and use average speed values obtained from the GPS data as reference labels. Afterwards, they use deep learning models to estimate walked distance, obtaining high performance. They report that 3% to 11% of the data for each participant was discarded due to low quality. Unfortunately, the name of the app is not reported and the paper does not mention whether the dataset can be made available.
This dataset is heterogeneous in multiple respects. It includes a majority of healthy participants; therefore, it is not possible to generalize the outcomes from this dataset to all walking styles or physical conditions. The dataset is also heterogeneous from a technical perspective, given the differences in devices, acquired data, and smartphone apps used (e.g., some tests lack IMU or GNSS data, and the sampling frequency on iPhones was particularly low). We suggest selecting the appropriate tracks based on the desired characteristics to obtain reliable and consistent outcomes.
This dataset allows researchers to develop algorithms to compute walked distance and to explore data quality and reliability in the context of walking activity. The dataset was initiated to investigate the digitalization of the 6MWT; however, the collected information can also be useful for other physical capacity tests that involve walking (distance- or duration-based), or for other purposes such as fitness and pedestrian navigation.
The article related to this dataset will be published in the proceedings of the IEEE MetroXRAINE 2024 conference, held in St. Albans, UK, 21-23 October.
This research is partially funded by the Swedish Knowledge Foundation and the Internet of Things and People research center through the Synergy project Intelligent and Trustworthy IoT Systems.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global patient identity matching software market size reached USD 1.54 billion in 2024, reflecting a robust growth trajectory fueled by widespread digital transformation across healthcare systems. The market is projected to achieve a CAGR of 13.2% from 2025 to 2033, culminating in a forecasted value of USD 4.27 billion by 2033. This growth is primarily driven by increasing concerns over patient safety, the rising incidence of medical identity errors, and the expanding adoption of electronic health records (EHRs) worldwide. As per our latest research, the market’s upward momentum is set to continue, underpinned by technological advancements and the urgent need for interoperability within healthcare ecosystems.
The primary growth factor for the patient identity matching software market is the global shift towards digitization in healthcare, especially the adoption of EHRs and other digital health platforms. As healthcare providers strive for seamless interoperability between disparate systems, the necessity for accurate patient identification becomes paramount. Incorrect or duplicate patient records can lead to serious medical errors, inefficiencies, and increased costs, making robust identity matching solutions indispensable. The integration of advanced technologies such as artificial intelligence and machine learning into patient identity matching software further enhances accuracy and efficiency, reducing manual intervention and mitigating risks associated with human error.
Another significant driver is the escalating regulatory pressure to ensure patient safety and data integrity. Regulatory bodies in major healthcare markets, including the United States and Europe, have established stringent guidelines and compliance frameworks that mandate accurate patient identification. Initiatives such as the Health Information Technology for Economic and Clinical Health (HITECH) Act and GDPR in Europe have set clear expectations for healthcare organizations to implement solutions that minimize identity-related errors. This regulatory landscape is compelling healthcare providers to invest in advanced patient identity matching software to avoid penalties, improve care quality, and foster trust with patients.
In addition, the rapid expansion of telemedicine and remote healthcare services is contributing to the market’s growth. The COVID-19 pandemic accelerated the adoption of virtual care models, which, in turn, highlighted the critical need for reliable patient identification across digital platforms. As healthcare delivery becomes increasingly decentralized, with patients accessing services from multiple locations and providers, the risk of misidentification grows. Patient identity matching software serves as a foundational technology to support secure, accurate, and efficient data exchange, thereby enabling continuity of care and enhancing patient outcomes.
Regionally, North America currently dominates the patient identity matching software market, accounting for over 42% of the global revenue in 2024. This dominance is attributed to the region’s advanced healthcare IT infrastructure, high EHR adoption rates, and supportive regulatory environment. Europe follows closely, driven by significant investments in healthcare digitization and interoperability initiatives. The Asia Pacific region is anticipated to witness the fastest growth during the forecast period, propelled by increasing healthcare spending, government-led digital health initiatives, and a burgeoning population demanding improved healthcare services. Latin America and the Middle East & Africa are also emerging as promising markets, albeit at a relatively nascent stage, as healthcare modernization efforts gain momentum.
The component segment of the patient identity matching software market is bifurcated into software and services, each playing a pivotal role in the overall ecosystem. The software component encompasses core patient identity matching platforms that leverage sophisticated algorithms, data analytics, and machine learning to facilitate accurate patient identification. These solutions are designed to integrate seamlessly with EHRs, laboratory information systems, and other healthcare IT platforms, ensuring data integrity and interoperability. As healthcare organizatio
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Matchmaking Optimization AI market size reached USD 1.9 billion in 2024, reflecting a robust demand for advanced AI-driven solutions across diverse industries. The market is projected to grow at a CAGR of 21.8% from 2025 to 2033, with the market value expected to reach USD 13.1 billion by 2033. This exceptional growth is primarily attributed to the increasing adoption of artificial intelligence in matchmaking processes across online dating, gaming, recruitment, and e-commerce sectors, as businesses strive to enhance user experiences and operational efficiencies.
One of the central growth factors propelling the Matchmaking Optimization AI market is the surging demand for personalized user experiences. As digital platforms proliferate, users expect tailored recommendations—be it in dating, gaming, or shopping. AI-enabled matchmaking leverages machine learning algorithms and big data analytics to analyze user behavior, preferences, and historical data, thereby delivering hyper-personalized matches. This not only enhances user satisfaction and engagement but also drives higher conversion rates for businesses. The proliferation of mobile applications and the growing reliance on digital interaction have further accelerated the need for intelligent matchmaking solutions, making AI a cornerstone in the evolution of customer-centric platforms.
Another significant driver is the integration of Matchmaking Optimization AI in recruitment and talent management processes. Enterprises are increasingly adopting AI-powered platforms to streamline candidate sourcing, screening, and matching, which reduces hiring time and improves the quality of hires. These systems utilize natural language processing and predictive analytics to match candidates with suitable roles based on skills, experience, and cultural fit. The ongoing digital transformation across HR departments, coupled with the need for efficient workforce management, is fueling the adoption of AI in recruitment, thereby expanding the application scope of matchmaking optimization technologies.
Additionally, the expansion of AI-driven matchmaking in sectors such as e-commerce and social networking is contributing to market growth. E-commerce platforms leverage AI to match users with products based on browsing history, purchase patterns, and real-time preferences, resulting in increased sales and customer loyalty. In social networking, AI algorithms facilitate meaningful connections by analyzing user interests, activities, and social graphs. These advancements are underpinned by continuous improvements in AI models, data processing capabilities, and cloud infrastructure, all of which are enabling scalable and efficient matchmaking solutions. The convergence of these factors is expected to sustain the upward trajectory of the Matchmaking Optimization AI market in the coming years.
From a regional perspective, North America currently dominates the Matchmaking Optimization AI market, driven by the presence of leading technology companies, high digital adoption rates, and significant investments in AI research and development. Europe follows closely, with strong demand from the online dating and recruitment sectors. Meanwhile, the Asia Pacific region is emerging as a high-growth market, propelled by expanding internet penetration, a burgeoning youth population, and increasing investments in digital infrastructure. The market landscape across Latin America and the Middle East & Africa is also evolving, with growing awareness of AI benefits and gradual adoption across various industries. This regional diversity underscores the global relevance and potential of matchmaking optimization AI technologies.
The Component segment of the Matchmaking Optimization AI market is primarily bifurcated into Software and Services. Software solutions form the backbone of AI-driven matchmaking, encompassing a range of platforms, algorithms, and analytical tools designed to deliver accurate and efficient matches. These software products are characterized by continuous innovation, with vendors focusing on enhancing algorithmic sophistication, scalability, and integration capabilities. The rapid evolution of machine learning, deep learning, and natural language processing technologies has enabled software providers to offer more nuanced and context-aware matchmaking functionalities, cat
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantitative evaluation indexes of filtered results of airborne data.
We analyze the performance of evolutionary algorithms on various matroid optimization problems that encompass a vast number of efficiently solvable as well as NP-hard combinatorial optimization problems (including many well-known examples such as minimum spanning tree and maximum bipartite matching). We obtain very promising bounds on the expected running time and quality of the computed solution. Our results establish a better theoretical understanding of why randomized search heuristics yield empirically good results for many real-world optimization problems.
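As a toy illustration of the kind of randomized search heuristic analysed (not the paper's algorithms or bounds), the sketch below runs a (1+1) evolutionary algorithm on a simple uniform-matroid problem: select at most k elements to maximise total weight:

```python
import random

# Toy (1+1) evolutionary algorithm on a uniform matroid: choose at most k elements to
# maximise total weight. Illustrative only; the paper analyses far more general matroid
# problems such as minimum spanning tree and bipartite matching.
random.seed(0)
weights = [4, 9, 1, 7, 3, 8, 2, 6]
k = 3
n = len(weights)

def fitness(x):
    chosen = sum(x)
    value = sum(w for w, bit in zip(weights, x) if bit)
    return value if chosen <= k else -chosen  # infeasible solutions are penalised

x = [0] * n  # start from the empty (feasible) solution
for _ in range(2000):
    y = [bit ^ (random.random() < 1 / n) for bit in x]  # flip each bit with probability 1/n
    if fitness(y) >= fitness(x):
        x = y

print(x, fitness(x))  # expect the three heaviest elements: 9 + 8 + 7 = 24
```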