Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set.

Methods
In this article, we propose a generalisable solution (A Framework for Linking free-text Addresses to Ordnance Survey UPRN database, FLAP) based on a machine learning–based matching classifier coupled with a fuzzy aligning algorithm for feature generation, with better performance than existing tools. The framework is implemented in Python as an open-source tool (available at Link). We tested the framework in a real-world scenario of linking individuals' (n=771,588) addresses recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife to the Unique Property Reference Number database (UPRN DB).

Results
We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (n=3,876) from NHS Tayside and NHS Fife CHI addresses. FLAP showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It also has improved usability compared to existing solutions, allowing the use of a customised threshold of matching confidence and selection of the top n candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement.

Discussion
In conclusion, we have developed a framework, FLAP, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.
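The idea of scoring candidate records, applying a confidence threshold, and keeping the top n matches can be illustrated with a minimal sketch. This is not FLAP's implementation: it substitutes plain string similarity from Python's standard `difflib` for the machine learning classifier, and the function names, UPRNs, and addresses below are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(address: str) -> str:
    """Lower-case, drop commas, and collapse runs of whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a trained matching classifier."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def top_candidates(query, reference, n=3, threshold=0.6):
    """Return up to n (uprn, score) pairs scoring at least `threshold`, best first.

    `reference` maps a UPRN string to its structured address text.
    """
    scored = [(uprn, similarity(query, addr)) for uprn, addr in reference.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n]


# Hypothetical reference records; these UPRNs and addresses are made up.
REFERENCE = {
    "100000000001": "12 High Street, Dundee, DD1 1AA",
    "100000000002": "14 High Street, Dundee, DD1 1AA",
    "100000000003": "3 Mill Lane, Kirkcaldy, KY1 2BB",
}

if __name__ == "__main__":
    # A free-text query with a typo and no punctuation.
    for uprn, score in top_candidates("12 high stret dundee DD1 1AA", REFERENCE):
        print(uprn, round(score, 3))
```

Raising `threshold` trades recall for precision, and returning several candidates rather than one leaves room for manual review of borderline matches.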
https://www.ons.gov.uk/methodology/geography/licences
This file contains the National Statistics UPRN Lookup (NSUL) for Great Britain as at February 2023. The NSUL relates the Unique Property Reference Number (UPRN) for each GB address from AddressBase® Epoch 99 to a range of current statutory administrative, electoral, health and other statistical geographies via 'best-fit' allocation from 2021 Census output areas (National Parks and Workplace Zones are exempt from 'best-fit' and use 'exact-fit' allocations). The NSUL is produced by ONS Geography, who provide geographic support to the Office for National Statistics (ONS) and geographic services used by other organisations. The NSUL is issued every 6 weeks and is designed to complement the Ordnance Survey AddressBase® product. For further technical information about this file, please refer to the User Guide document contained within the downloadable zip file. Please note that this product contains Royal Mail, Gridlink, Ordnance Survey and ONS Intellectual Property Rights. (File Size – 463 MB)
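A lookup of this kind is typically used by joining records on the UPRN. The sketch below shows that join with the standard `csv` module only; the column names (`uprn`, `lad_code`, `oa_code`) and the sample values are assumptions for illustration, and the real field names should be taken from the User Guide in the zip file.

```python
import csv
import io


def load_lookup(csv_text, key="uprn"):
    """Index lookup rows by UPRN. The column names are hypothetical."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def attach_geography(records, lookup, fields):
    """Copy the requested geography fields onto each record, matched by UPRN.

    Records whose UPRN is missing from the lookup get None for each field.
    """
    enriched = []
    for record in records:
        geography = lookup.get(record["uprn"], {})
        merged = dict(record)
        for field in fields:
            merged[field] = geography.get(field)
        enriched.append(merged)
    return enriched


# Made-up sample rows standing in for the lookup file.
LOOKUP_CSV = """uprn,lad_code,oa_code
100000000001,X00000001,X00000101
100000000002,X00000002,X00000102
"""

if __name__ == "__main__":
    lookup = load_lookup(LOOKUP_CSV)
    records = [{"uprn": "100000000001"}, {"uprn": "999999999999"}]
    for row in attach_geography(records, lookup, ["lad_code", "oa_code"]):
        print(row)
```

For a file of several hundred megabytes, a dataframe library or a database join would likely be more practical than an in-memory dictionary.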
Our Price Paid Data includes information on all property sales in England and Wales that are sold for value and are lodged with us for registration.
Get up to date with the permitted use of our Price Paid Data:
check what to consider when using or publishing our Price Paid Data
If you use or publish our Price Paid Data, you must add the following attribution statement:
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Price Paid Data is released under the Open Government Licence (OGL) (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/). You need to make sure you understand the terms of the OGL before using the data.
Under the OGL, HM Land Registry permits you to use the Price Paid Data for commercial or non-commercial purposes. However, OGL does not cover the use of third party rights, which we are not authorised to license.
Price Paid Data contains address data processed against Ordnance Survey’s AddressBase Premium product, which incorporates Royal Mail’s PAF® database (Address Data). Royal Mail and Ordnance Survey permit your use of Address Data in the Price Paid Data:
If you want to use the Address Data in any other way, you must contact Royal Mail. Email address.management@royalmail.com.
The following fields comprise the address data included in Price Paid Data:
The June 2025 release includes:
As we will be adding to the June data in future releases, we would not recommend using it in isolation as an indication of market or HM Land Registry activity. When the full dataset is viewed alongside the data we’ve previously published, it adds to the overall picture of market activity.
Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.
Google Chrome (Chrome 88 onwards) is blocking downloads of our Price Paid Data. Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.
We update the data on the 20th working day of each month. You can download the:
These include standard and additional price paid data transactions received at HM Land Registry from 1 January 1995 to the most current monthly data.
The data is updated monthly and the average size of this file is 3.7 GB. You can download:
Doorda's UK Geospatial Real Estate Data provides a comprehensive database of over 34 million addresses aggregated from 10 data sources, offering unparalleled geospatial insights for customer insights and risk analysis purposes.
Volume and stats: - 34M Addressable locations - 15M Exact Building Location - 9M derived Building Locations
Our Geospatial Real Estate Data offers a multitude of use cases: - Location Planning - Risk Analysis - Customer Insights - Data Augmentation - Market Insights
The key benefits of leveraging our Geospatial Real Estate Data include: - Data Accuracy - Informed Decision-Making - Competitive Advantage - Efficiency - Single Source
Covering a wide range of industries and sectors, our data empowers organisations to make informed decisions, uncover market trends, and gain a competitive edge in the UK market.
A comprehensive self-hosted geospatial database of street names, coordinates, and address data ranges for Enterprise use. The address data are georeferenced with industry-standard WGS84 coordinates (geocoding).
All geospatial data are provided in the official local languages. Names and other data in non-Roman languages are also made available in English through translations and transliterations.
Use cases for the Global Address Database (Geospatial data)
Address capture and validation
Parcel delivery
Master Data Management
Logistics and Shipping
Sales and Marketing
Additional features
Fully and accurately geocoded
Multi-language support
Address ranges for streets covered by several zip codes
Comprehensive city definitions across countries
Administrative areas with a level range of 0-4
International Address Formats
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes (geocoded)
Time zones and Daylight Saving Time (DST)
Population data: Past and future trends
Data export methodology
Our location data packages are offered in CSV format. All geospatial data are optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more.
Why companies choose our location databases
Enterprise-grade service
Reduce integration time and cost by 30%
Frequent, consistent updates for the highest quality
Note: Custom geospatial data packages are available. Please submit a request via the above contact button for more details.
https://www.ons.gov.uk/methodology/geography/licences
This is the ONS Postcode Directory (ONSPD) for the United Kingdom as at February 2023 in Comma Separated Variable (CSV) and ASCII text (TXT) formats. This file contains the multi CSVs so that postcode areas can be opened in MS Excel. To download the zip file click the Download button. The ONSPD relates both current and terminated postcodes in the United Kingdom to a range of current statutory administrative, electoral, health and other area geographies. It also links postcodes to pre-2002 health areas, 1991 Census enumeration districts for England and Wales, 2001 Census Output Areas (OA) and Super Output Areas (SOA) for England and Wales, 2001 Census OAs and SOAs for Northern Ireland and 2001 Census OAs and Data Zones (DZ) for Scotland. It now contains 2021 Census OAs and SOAs for England and Wales. It helps support the production of area based statistics from postcoded data. The ONSPD is produced by ONS Geography, who provide geographic support to the Office for National Statistics (ONS) and geographic services used by other organisations. The ONSPD is issued quarterly. (File size - 234 MB)
NOTE: The 2022 ONSPDs included an incorrect update of the ITL field with two LA changes in Northamptonshire. This error has been corrected from the February 2023 ONSPD.
NOTE: There was an issue with the originally published file where some change orders yet to be included in OS Boundary-Line™ (including The Cumbria (Structural Changes) Order 2022, The North Yorkshire (Structural Changes) Order 2022 and The Somerset (Structural Changes) Order 2022) were mistakenly implemented for terminated postcodes. Version 2 corrects this, so that ward codes E05014171–E05014393 are not yet included.
Please note that this product contains Royal Mail, Gridlink, LPS (Northern Ireland), Ordnance Survey and ONS Intellectual Property Rights.
Our UK Postcode Database offers comprehensive postal code data for spatial analysis, including postal and administrative areas. This dataset contains accurate and up-to-date information on all administrative divisions, cities, and zip codes, making it an invaluable resource for various applications such as address capture and validation, map and visualization, reporting and business intelligence (BI), master data management, logistics and supply chain management, and sales and marketing. Our location data packages are available in various formats, including CSV, optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more. Product features include fully and accurately geocoded data, multi-language support with address names in local and foreign languages, comprehensive city definitions, and the option to combine map data with UNLOCODE and IATA codes, time zones, and daylight saving times. Companies choose our location databases for their enterprise-grade service, reduction in integration time and cost by 30%, and weekly updates to ensure the highest quality.
Global Email Address & Contact Data Solutions: 293M+ Verified Emails and Phone Numbers for B2B & B2C Outreach Boost your marketing and sales strategies with Forager.ai's Global Contact Data and Email address Data. Our comprehensive database offers access to over 293 million verified email addresses, along with phone number data and detailed B2B Email data and contact information. Whether you're focused on expanding your B2B Email outreach or improving lead generation, our solutions provide the tools you need to engage decision-makers and drive success.
Designed to support your Email data-driven marketing efforts, Forager.ai delivers valuable insights with email data, phone number data, and contact details for both B2B and B2C audiences. Build meaningful connections and leverage high-quality, verified Email data to execute precise and effective outreach strategies.
Core Features of Forager.ai B2B Email Data Solutions: Targeted B2B Email Data: Gain access to a diverse collection of email addresses that help you execute personalized email campaigns targeting key decision-makers across industries.
Comprehensive Phone Number Data: Enhance your sales and telemarketing strategies with our extensive phone number database, perfect for direct outreach and boosting customer engagement.
B2B and B2C Contact Data: Tailor your messaging with B2B contact data and B2C contact Email address data that allow you to effectively connect with C-suite executives, decision-makers, and key consumer groups.
CEO Contact Information: Unlock direct access to CEO contact details, ideal for high-level networking, partnership building, and executive outreach.
Strategic Applications of Forager.ai Data: Online Marketing & Campaigns: Utilize our email address data and phone number information to run targeted online marketing campaigns, increasing conversion rates and boosting outreach effectiveness.
Database Enrichment: Improve your sales databases and CRM systems by enriching them with accurate and up-to-date contact data, supporting more informed decision-making.
B2B Lead Generation: Tap into our rich B2B Email data to expand your business networks, refine your outreach efforts, and generate high-quality leads.
Sales Data Amplification: Supercharge your sales strategies by integrating enriched contact data for better targeting and higher sales conversion rates.
Competitive Market Intelligence: Gain valuable insights into your competitors by leveraging our comprehensive contact data to analyze trends and shifts in the market.
Why Forager.ai Stands Out: Precision & Accuracy: With a 95%+ accuracy rate, Forager.ai ensures that your email data and contact information is always fresh, reliable, and ready to be used for maximum impact.
Global Reach, Local Relevance: Our Email address data solutions cover global markets while allowing you to focus on specific regions, industries, and audience segments tailored to your business needs.
Cost-Effective Solutions: We offer scalable, affordable B2B email data and B2B contact data packages, ensuring you get high-value results without breaking your budget.
Ethical, Compliant Data: We strictly adhere to GDPR guidelines, ensuring that all contact data is ethically sourced and legally compliant, protecting both your business and your customers.
Unlock the Power of Verified Email (Personal Email data & Business Email data) Contact Data with Forager.ai Explore the potential of our 293M+ verified email addresses and phone numbers to elevate your B2B email marketing, sales outreach, and data-driven initiatives. Our contact data solutions are tailored to support your lead generation, sales pipeline, and competitive intelligence efforts, giving you the tools to execute more effective and impactful campaigns.
Top Use Cases for Forager.ai Data Solutions: Lead Generation & B2B Prospecting
Cold B2B Email Outreach
CRM Enrichment & Marketing Automation
Account-Based Marketing (ABM)
Recruiting & Executive Search
Market Research & Competitive Intelligence
Flexible Data Licensing & Access Options: One-Time Data Files available upon request
24/7 API Access for seamless integration
Monthly & Annual Plans tailored to your needs
API Credits Roll Over with no expiration
Reach out to us today to discover how Forager.ai's high-quality Email data and contact data can transform your outreach strategies and drive greater business success.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Open Postcode Geo is a postcode dataset and API optimised for geocoding applications. You can use Open Postcode Geo to geocode a dataset, geocode user input, and therefore build a proximity search.
Data is derived from the ONS (Office for National Statistics) postcode database and is free to use, subject to including attributions to ONS, OS (Ordnance Survey) and Royal Mail.
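Once postcodes have been geocoded to latitude and longitude, a proximity search reduces to a distance calculation. The following is a minimal sketch using the haversine great-circle formula; it does not call the Open Postcode Geo API, and the postcodes and coordinates in `POSTCODE_COORDS` are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(origin, points, radius_km):
    """Return (name, distance_km) for points within radius_km of origin, nearest first."""
    lat0, lon0 = origin
    hits = []
    for name, (lat, lon) in points.items():
        distance = haversine_km(lat0, lon0, lat, lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical geocoded postcodes; the codes and coordinates are invented.
POSTCODE_COORDS = {
    "ZZ1 1AA": (56.46, -2.97),
    "ZZ1 2BB": (56.47, -2.95),
    "ZZ9 9ZZ": (55.95, -3.19),
}

if __name__ == "__main__":
    for name, km in within_radius((56.46, -2.97), POSTCODE_COORDS, radius_km=5):
        print(name, round(km, 2))
```

A linear scan like this is adequate for small sets; a spatial index would be a better fit for a full national postcode file.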
Information is also provided on a range of topics, including education, health, crime, business, etc.
Postcodes can be entered at area, district, sector, and unit level - see Postcode map for the geographical relationship between these.
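The four levels nest inside one another and can be derived from a full postcode by simple string handling. The sketch below assumes the standard outward/inward layout, in which the inward code is always the last three characters (a digit followed by two letters); `postcode_levels` is a hypothetical helper and is not part of any API described here.

```python
import re

# Outward code: 1-2 letters, a digit, then an optional letter or digit.
# Inward code: a digit followed by two letters.
_POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}")


def postcode_levels(postcode: str) -> dict:
    """Split a full postcode into its area, district, sector, and unit."""
    compact = re.sub(r"\s+", "", postcode.upper())
    if not _POSTCODE_RE.fullmatch(compact):
        raise ValueError(f"not a full unit postcode: {postcode!r}")
    outward, inward = compact[:-3], compact[-3:]
    area = re.match(r"[A-Z]+", outward).group()
    return {
        "area": area,                         # leading letters, e.g. "DD"
        "district": outward,                  # outward code, e.g. "DD1"
        "sector": f"{outward} {inward[0]}",   # plus first inward digit, e.g. "DD1 4"
        "unit": f"{outward} {inward}",        # full postcode, e.g. "DD1 4HN"
    }


if __name__ == "__main__":
    print(postcode_levels("dd1 4hn"))
```

The pattern is deliberately loose: it checks the shape of the postcode, not whether the postcode exists.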
https://www.ons.gov.uk/methodology/geography/licences
This file contains the National Statistics Postcode Lookup (NSPL) for the United Kingdom as at August 2022 in Comma Separated Variable (CSV) and ASCII text (TXT) formats. To download the zip file click the Download button. The NSPL relates both current and terminated postcodes to a range of current statutory geographies via ‘best-fit’ allocation from the 2021 Census Output Areas (national parks and Workplace Zones are exempt from ‘best-fit’ and use ‘exact-fit’ allocations) for England and Wales. Scotland and Northern Ireland use the 2011 Census Output Areas. It supports the production of area based statistics from postcoded data. The NSPL is produced by ONS Geography, who provide geographic support to the Office for National Statistics (ONS) and geographic services used by other organisations. The NSPL is issued quarterly. (File size - 184 MB).
A global self-hosted Market Research dataset containing all administrative divisions, cities, addresses, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Global Zip Code Database (Market Research data)
Address capture and validation
Map and visualization
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Data export methodology
Our map data packages are offered in variable formats, including .csv. All geographic data are optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more.
Product Features
Fully and accurately geocoded
Administrative areas with a level range of 0-4
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes
Time zones and Daylight Saving Times
Why do companies choose our Market Research databases
Enterprise-grade service
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Note: Custom geographic data packages are available. Please submit a request via the above contact button for more details.
https://www.ons.gov.uk/methodology/geography/licences
This file contains the ONS UPRN Directory (ONSUD) for Great Britain as at January 2022. The ONSUD relates the Unique Property Reference Number (UPRN) for each GB address from AddressBase® Epoch 89 to a range of current statutory administrative, electoral, health and other statistical geographies. The ONSUD is produced by ONS Geography, who provide geographic support to the Office for National Statistics (ONS) and geographic services used by other organisations. The ONSUD is issued every 6 weeks and is designed to complement the Ordnance Survey AddressBase® product. For further technical information about this file, please refer to the User Guide document contained within the downloadable zip file. Please note that this product contains Royal Mail, Gridlink, Ordnance Survey and ONS Intellectual Property Rights. (File Size - 511 MB)
The Free Company Data Product is a downloadable data snapshot containing basic company data of live companies on the register. This snapshot is provided as ZIP files containing data in CSV format and is split into multiple files for ease of downloading.
This snapshot is provided free of charge and will not be supported.
The latest snapshot will be updated within 5 working days of the previous month end.
The contents of the snapshot have been compiled up to the end of the previous month.
A list of the data fields contained in the snapshot can be found here PDF.
Up-to-date company information can be obtained by following the URI links in the data. More details on URIs
If files are viewed with Microsoft Excel, it is recommended that you use version 2007 or later.
Free courses for jobs allows eligible learners to access a high-value level 3 qualification, for free, to gain higher wages and access new job opportunities.
More than 400 qualifications are available on the offer, chosen specifically because they provide good wage outcomes and address skills needs in the economy.
The Department for Education is expanding the free courses for jobs offer to support providers to deliver more construction training. We are adding new, reformed construction qualifications at Level 2 and will be reviewing further construction qualifications for the national list.
Official statistics are produced impartially and free from political influence.
Graph Database Market Size 2025-2029
The graph database market size is forecast to increase by USD 11.24 billion at a CAGR of 29% between 2024 and 2029.
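The two figures in the forecast together imply a starting market size. As a rough worked example, assuming the 29% rate compounds annually over the five years from 2024 to 2029 and that USD 11.24 billion is the total increase over that period (neither assumption is confirmed by the report summary), the base follows from increase = base × ((1 + CAGR)^years − 1):

```python
def implied_base(increase: float, cagr: float, years: int) -> float:
    """Starting value implied by a total increase at a compound annual growth rate."""
    growth_factor = (1 + cagr) ** years
    return increase / (growth_factor - 1)


INCREASE_USD_BN = 11.24  # forecast increase, from the text
CAGR = 0.29              # compound annual growth rate, from the text
YEARS = 5                # 2024 to 2029, assumed to be five compounding periods

base_2024 = implied_base(INCREASE_USD_BN, CAGR, YEARS)
size_2029 = base_2024 * (1 + CAGR) ** YEARS

if __name__ == "__main__":
    print(f"implied 2024 size: USD {base_2024:.2f} billion")
    print(f"implied 2029 size: USD {size_2029:.2f} billion")
```

These are derived values under the stated assumptions, not figures published in the report.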
The market is experiencing significant growth, driven by the increasing popularity of open knowledge networks and the rising demand for low-latency query processing. These trends reflect the growing importance of real-time data analytics and the need for more complex data relationships to be managed effectively. However, the market also faces challenges, including the lack of standardization and programming flexibility. These obstacles require innovative solutions from market participants to ensure interoperability and ease of use for businesses looking to adopt graph databases.
Companies seeking to capitalize on market opportunities must focus on addressing these challenges while also offering advanced features and strong performance to differentiate themselves. Effective navigation of these dynamics will be crucial for success in the evolving graph database landscape. Compliance requirements and data privacy regulations drive the need for security access control and data anonymization methods. Graph databases are deployed in both on-premises data centers and cloud regions, providing flexibility for businesses with varying IT infrastructures.
What will be the Size of the Graph Database Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
In the dynamic market, security and data management are increasingly prioritized. Authorization mechanisms and encryption techniques ensure data access control and confidentiality. Query optimization strategies and indexing enhance query performance, while data anonymization methods protect sensitive information. Fault tolerance mechanisms and data governance frameworks maintain data availability and compliance with regulations. Data quality assessment and consistency checks address data integrity issues, and authentication protocols secure concurrent graph updates. This model is particularly well-suited for applications in social networks, recommendation engines, and business processes that require real-time analytics and visualization.
Graph database tuning and monitoring optimize hardware resource usage and detect performance bottlenecks. Data recovery procedures and replication methods ensure data availability during disasters and maintain data consistency. Data version control and concurrent graph updates address versioning and conflict resolution challenges. Data anomaly detection and consistency checks maintain data accuracy and reliability. Distributed transactions and data recovery procedures ensure data consistency across nodes in a distributed graph database system.
How is this Graph Database Industry segmented?
The graph database industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
End-user
  - Large enterprises
  - SMEs
Type
  - RDF
  - LPG
Solution
  - Native graph database
  - Knowledge graph engines
  - Graph processing engines
  - Graph extension
Geography
  - North America: US, Canada
  - Europe: France, Germany, Italy, Spain, UK
  - APAC: China, India, Japan
  - Rest of World (ROW)
By End-user Insights
The Large enterprises segment is estimated to witness significant growth during the forecast period. In today's business landscape, large enterprises are turning to graph databases to manage intricate data relationships and improve decision-making processes. Graph databases offer unique advantages over traditional relational databases, enabling superior agility in modeling and querying interconnected data. These systems are particularly valuable for applications such as fraud detection, supply chain optimization, customer 360 views, and network analysis. Graph databases provide the scalability and performance required to handle large, dynamic datasets and uncover hidden patterns and insights in real time. Their support for advanced analytics and AI-driven applications further bolsters their role in enterprise digital transformation strategies. Additionally, their flexibility and integration capabilities make them well-suited for deployment in hybrid and multi-cloud environments.
Graph databases offer various features that cater to diverse business needs. Data lineage tracking ensures accountability and transparency, while graph analytics engines provide advanced insights. Graph database benchmarking helps organizations evaluate performance, and relationship property indexing streamlines data access. Node relationship management facilitates complex data modeling, an
Doorda's UK Residential Real Estate Data provides a comprehensive database of over 34 million addresses sourced from 20 data sources, offering unparalleled insights for business intelligence and analytics purposes.
Volume and stats: - 34M Addressable locations - 6M Addresses linked to Commercial Owner - 24M Energy Performance Inspections
Our Residential Real Estate Data offers a multitude of use cases: - Market Analysis - Competitor Analysis - Lead Generation - Risk Management - Location Planning
The key benefits of leveraging our Residential Real Estate Data include: - Data Accuracy - Informed Decision-Making - Competitive Advantage - Efficiency - Single Source
Covering a wide range of industries and sectors, our data empowers organisations to make informed decisions, uncover market trends, and gain a competitive edge in the UK market.
A global self-hosted location dataset containing all administrative divisions, cities, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Global Zip Code Database (Geospatial data)
Address capture and validation
Map and visualization
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Data export methodology
Our location data packages are offered in variable formats, including .csv. All geospatial data are optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more.
Product Features
Fully and accurately geocoded
Administrative areas with a level range of 0-4
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes
Time zones and Daylight Saving Times
Why do companies choose our location databases
Enterprise-grade service
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Note: Custom geospatial data packages are available. Please submit a request via the above contact button for more details.
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The UK English Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 606 adult UK English speakers (325 males, 281 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place), and consisting of about 195 hours of audio data.
2) The second set comprises the recordings of 51 child UK English speakers (14 boys, 37 girls), recorded over 4 microphone channels in 1 recording environment (children room), and consisting of about 9 hours of audio data.
This database is partitioned into 31 DVDs (first set) and 4 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications.
Each of the four speech channels is recorded at 16 kHz, 16 bit, uncompressed unsigned integers in Intel format (lo-hi byte order). To each signal file corresponds an ASCII SAM label file which contains the relevant descriptive information.
Each speaker uttered the following items (over 290 items for adults and over 210 items for children):
Calibration data:
- 6 noise recordings
- The “silence word” recording
Free spontaneous items (adults only):
- 5 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
17 Elicited spontaneous items (adults only):
- 3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
Read speech:
- 30 phonetically rich sentences uttered by adults and 60 uttered by children
- 5 phonetically rich words (adults only)
- 4 isolated digits
- 1 isolated digit sequence
- 4 connected digit sequences
- 1 telephone number
- 3 natural numbers
- 1 money amount
- 2 time phrases (T1 : analogue, T2 : digital)
- 3 dates (D1 : analogue, D2 : relative and general date, D3 : digital)
- 3 letter sequences
- 1 proper name
- 2 city or street names
- 2 questions
- 2 special keyboard characters
- 1 Web address
- 1 email address
208 application specific words and phrases per session (adults)
74 toy commands, 14 phone commands and 34 general commands (children)
The following age distribution has been obtained:
- Adults: 321 speakers are between 16 and 30, 182 speakers are between 31 and 45, 103 speakers are over 46.
- Children: All 51 speakers are between 11 and 14.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
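The signal format stated above (16 kHz, 16 bit, unsigned, lo-hi byte order) can be decoded with Python's standard `struct` module. This is a sketch only: it assumes each signal file is headerless raw sample data, which the description does not confirm, and it follows the description in treating samples as unsigned. The function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # 16 kHz, from the description
BYTES_PER_SAMPLE = 2      # 16 bit


def decode_samples(raw: bytes) -> list:
    """Decode little-endian (lo-hi) unsigned 16-bit samples from raw bytes."""
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("raw data does not contain a whole number of 16-bit samples")
    count = len(raw) // BYTES_PER_SAMPLE
    # "<" selects little-endian byte order; "H" is an unsigned 16-bit integer.
    return list(struct.unpack(f"<{count}H", raw))


def duration_seconds(raw: bytes) -> float:
    """Length of a single-channel recording in seconds."""
    return len(raw) / BYTES_PER_SAMPLE / SAMPLE_RATE_HZ


if __name__ == "__main__":
    one_second_of_zeros = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    print(len(decode_samples(one_second_of_zeros)), duration_seconds(one_second_of_zeros))
```

If the samples turn out to be signed, the format character `h` would replace `H`; the SAM label file accompanying each signal file is the place to check.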
The UK government manages the .gov.uk domain name.
Public sector bodies may register .gov.uk domain names for a variety of reasons. The rules governing which organisations can register .gov.uk domain names, and how to choose and manage appropriate names, are set out in "Apply for a .gov.uk domain name: step by step".
The list of .gov.uk domain names is available in CSV format with 3 columns.
Domain name: the domain name registered for use, which should work with or without a preceding ‘www’.
Owner: the name of the organisation that owns the domain name, for example a central government department or local authority.
Representing: the organisation the domain name is registered for, often the same as the owner but could be an agency or other organisation the owner is registering the domain on behalf of.
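The three-column layout lends itself to simple processing with the standard `csv` module. The sketch below assumes the header row uses exactly the names "Domain name", "Owner", and "Representing"; the sample rows are invented and do not come from the published list.

```python
import csv
import io
from collections import defaultdict


def domains_by_owner(csv_text: str) -> dict:
    """Group domain names under the organisation that owns them."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Owner"]].append(row["Domain name"])
    return dict(grouped)


def registered_on_behalf(csv_text: str) -> list:
    """Domains whose owner registered them for a different organisation."""
    return [
        row["Domain name"]
        for row in csv.DictReader(io.StringIO(csv_text))
        if row["Owner"] != row["Representing"]
    ]


# Invented sample rows in the assumed layout.
SAMPLE = """Domain name,Owner,Representing
example-council.gov.uk,Example Council,Example Council
example-agency.gov.uk,Example Department,Example Agency
"""

if __name__ == "__main__":
    print(domains_by_owner(SAMPLE))
    print(registered_on_behalf(SAMPLE))
```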