This zip file contains the Code History Database for the United Kingdom as at April 2018. To download the zip file click the Download button. The Code History Database (CHD) contains the GSS nine-character codes, where allocated, for current and new statistical geographies from 1 January 2009. The codes consist of a simple alphanumeric structure: the first three characters (ANN) represent the area entity (i.e. the type or category of geography) and the following six characters (NNNNNN) represent the specific area instance. The CHD provides a range of functionality, including details of codes, relationships, hierarchies and archived data. The CHD can be used in conjunction with the Register of Geographic Codes (RGC), which summarises the range of area instances within each geographic entity. The GSS Coding and Naming policy for some statistical geographies was implemented on 1 January 2011. From this date, where new codes have been allocated they should be used in all exchanges of statistics and published outputs that normally include codes. For further information on this product, please read the user guide and version notes contained within the product zip file.
Updated Geographies
Updates to Parishes (E04) (name change), Wards (E05) (name change), NMD (E07) (name change), Clinical Commissioning Groups in England (E38), NHS (Region, Local Office) (E39) and NHS England Regions (E40).
Updates to Council Areas (S13) and Wards (S13).
Updates to the Change History, SI Details, Name Changes, Equivalents table and Information table.
Database Changes
Updates to form design to account for December 2017 version have been made.
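The nine-character structure described above (ANN for the entity, NNNNNN for the instance) can be checked and split with a short helper. This is only a sketch: it assumes "A" stands for an upper-case letter and "N" for a digit, the function name `parse_gss_code` is hypothetical, and the sample value is made up rather than taken from the CHD.

```python
import re

# Assumed reading of the pattern: "A" = one letter, "N" = one digit.
# Group 1 is the three-character entity, group 2 the six-digit instance.
GSS_CODE_RE = re.compile(r"([A-Z][0-9]{2})([0-9]{6})")


def parse_gss_code(code):
    """Split a nine-character code into (entity, instance).

    Raises ValueError if the text does not fit the ANN + NNNNNN shape.
    """
    match = GSS_CODE_RE.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ANN+NNNNNN code: {code!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # "E04000001" is an illustrative value only.
    entity, instance = parse_gss_code("E04000001")
    print(entity, instance)
```

Splitting on the entity prefix makes it easy to group rows by geography type (for example, all E04 rows) before looking up the instance in the CHD tables.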
The Adults’ People and Nature Survey for England gathers information on people’s experiences and views about the natural environment, and its contributions to our health and wellbeing.
Data is published quarterly as Accredited Official Statistics. Since June 2023 we no longer publish the full dataset on gov.uk. The full dataset will instead be published via the UK Data Service (https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=2000123).
Our statistical practice is regulated by the Office for Statistics Regulation (OSR). OSR sets the standards of trustworthiness, quality and value in the Code of Practice for Statistics (https://code.statisticsauthority.gov.uk/the-code/) that all producers of official statistics should adhere to. You can read about how Official Statistics in Defra comply with these standards on the Defra Statistics website.
You are welcome to contact us directly at people_and_nature@naturalengland.org.uk with any comments about how we meet these standards. Alternatively, you can contact OSR by emailing regulation@statistics.gov.uk or via the OSR website.
To receive updates on the survey, including data releases and publications, sign up via the People and Nature User Hub (https://people-and-nature-survey-defra.hub.arcgis.com/).
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
The following files show the number of times each listed SNOMED code was added to a GP patient record within the period 1 Aug 2024 to 31 July 2025, aggregated at England level. The data is available in .txt and .xlsx formats. Data does not show how many patients had each code added to their record. A patient could have one code added to their record multiple times throughout the year, therefore it is not possible to infer the number of patients with a particular code from this data or use the data to calculate disease prevalence. Only valid SNOMED codes are included and not all practices are included - please read the metadata file linked at the foot of this page, prior to using this data.
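To illustrate why the number of times a code was added cannot be read as a number of patients, here is a minimal sketch using made-up events. The patient identifiers and the placeholder codes `code_x` and `code_y` are hypothetical and are not real SNOMED codes; the published files contain only the aggregated counts, not patient-level rows like these.

```python
from collections import Counter

# Made-up (patient_id, code) events for illustration only.
EVENTS = [
    ("patient_a", "code_x"),
    ("patient_a", "code_x"),
    ("patient_a", "code_x"),
    ("patient_b", "code_x"),
    ("patient_b", "code_y"),
]


def additions_per_code(events):
    """Count how many times each code was added (what the files report)."""
    return Counter(code for _, code in events)


def patients_per_code(events):
    """Count distinct patients per code (NOT recoverable from the files)."""
    patients = {}
    for patient_id, code in events:
        patients.setdefault(code, set()).add(patient_id)
    return {code: len(ids) for code, ids in patients.items()}


if __name__ == "__main__":
    print(additions_per_code(EVENTS))
    print(patients_per_code(EVENTS))
```

In this toy example `code_x` is added four times but belongs to only two patients, which is why the aggregated counts cannot be used to calculate prevalence.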
This zip file contains the Code History Database for the United Kingdom as at 1st June 2025. (File size: 52.5 MB) To download the zip file click the Download button.
Updates in England to: Civil Parishes (E04), Electoral Wards/Divisions (E05), Non-metropolitan Districts (E07), Metropolitan Districts (E08), Non-Civil Parished Areas (E43), Combined Authorities (E47), County Electoral Divisions (E58), Local Planning Authorities (E60)
Updates in Wales to: Communities (W04)
This file contains the names and codes for the countries of the United Kingdom as at 31st December 2024. (File size - 16 KB)
Field Names - CTRY24CD, CTRY24NM, CTRY24NMW
Field Types - Text, Text, Text
Field Lengths - 9, 17, 16
https://www.ons.gov.uk/methodology/geography/licences
This file contains names and codes for Counties in England as at 31st December 2016. (File Size - 16 KB)
Code-Point® precisely locates 1.7 million postcode units in Great Britain and Northern Ireland. Each unit contains an average of 15 adjoining addresses. For direct marketers, Code-Point® helps you maximise your response rates by targeting customers in the best postcodes for your offering. Code-Point is a powerful insights tool. It shows up geographical hot-spots so you can target local action on, for example, petty crime or disease outbreaks. We give you the split between residential and commercial addresses in each postcode. This lets you work out which areas to target for services like business broadband. Code-Point provides a precise geographical location for each postcode unit in the United Kingdom. It also contains additional information, for example, NHS region and area codes, local government county, district and ward codes, PO boxes and the total number of addresses, both domestic and non-domestic, in each postcode unit. Code-Point includes Gridlink® data. Gridlink is a consortium initiative involving a number of government agencies that have cooperated to improve the consistency and quality of spatially referenced, postcode-based data.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This file contains the NHS England (Region, Local Office) (NHSRLO) names and codes, as at 1 April 2017. This file replaces the NHS Region (Geography) (April 2017) Names and Codes in England file.
This dataset was created by Muhammad
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides a comprehensive view of UK companies, including their registration details, financial information, ownership, management, and recent filings up to 31 December 2023. The data has been processed using dbt (Data Build Tool) scripts to ensure accuracy and relevance.
Play with this dataset at the BI app. (Free registration is required)
Dataset overview: https://www.youtube.com/watch?v=iybNM8UtQRA
The dataset comprises the following tables:
Below is a detailed description of each table.
Description: Contains detailed information about companies registered in the UK up to January 1, 2024.
Columns:
- company_number: Unique identifier for each company.
- company_type: Type of company (e.g., private limited, public limited).
- office_address: Registered office address.
- incorporation_date: Date of company incorporation.
- jurisdiction: Legal jurisdiction of the company.
- company_status: Current status (e.g., active, dissolved).
- account_type: Type of accounts filed.
- company_name: Official name of the company.
- sic_codes: Standard Industrial Classification codes.
- date_of_cessation: Date when the company ceased operations (if applicable).
- next_accounts_overdue: Indicator if the next accounts are overdue.
- confirmation_statement_overdue: Indicator if the confirmation statement is overdue.
- owners: Number of registered owners (persons with significant control).
- officers: Number of officers (directors, secretaries) associated with the company.
- average_number_employees_during_period: Average number of employees during the last accounting period.
- current_assets: Current assets as per the last accounts.
- last_accounts_period_end: End date of the last accounting period.
- company_url: Where you can check the up-to-date company information. Free registration is required.
Data Generation Process:
- ch_psc and ch_officers tables.
- ch_accounts table to include financial information.

Description: Provides detailed SIC (Standard Industrial Classification) codes for each company.
Columns:
- company_number: Company identifier.
- sic_code: SIC code assigned to the company.
- sic_description: Description of the SIC code.
- sic_section: Section of the SIC code.
- sic_division: Division of the SIC code.
- company_url: Where you can check the up-to-date company information. Free registration is required.
Data Generation Process:
- ch_companies_sic_codes with the sic_codes table to enrich SIC code information.

Description: Lists up to the five most recent filings for each company as of January 1, 2024.
Columns:
- transaction_id.
- company_number: Company identifier.
- date: Date of the filing.
Data Generation Process:
- ch_filings table.

Description: Details up to five most recent officers and owners for each company, including their roles and personal information.
Columns:
- company_number: Company identifier.
- name: Full name of the officer or owner.
- kind: Type of person (individual or corporate entity).
- officer_role: Role within the company.
- occupation: Occupation of the individual.
- date: Date of appointment or notification.
- is_owner: Boolean indicating if the person is an owner.
- country_of_residence: Country where the individual resides.
- nationality: Nationality of the individual.
- company_country: Country of the company (for corporate persons).
- person_id: Unique identifier for the person.
- person_url: Where you can check the up-to-date company information. Free registration is required.
Data Generation Process:
- ch_officers and ch_psc tables.

Segments Included:
Data Generation Process:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For use with UK Biobank data. v2: Change to scoring for AUDIT questionnaire. v3: Change to coding for exercise and cannabis use to accompany revised paper
British English Phonetic Dataset
Introduction
This dataset is an extension of Common Voice, from which 6 subsets were selected (Common Voice Corpus 1, Common Voice Corpus 2, Common Voice Corpus 3, Common Voice Corpus 4, Common Voice Corpus 18.0, Common Voice Corpus 19.0). All data containing the England accent from these 6 subsets were extracted and phonetically annotated accordingly.
Description
Key fields explanation:
sentence: The English sentence… See the full description on the dataset page: https://huggingface.co/datasets/zdm-code/england-phoneme-dataset.
This dataset is part of the ALLANAI.CrimeVision open-data series by Mahira (ALLANAI Labs).
It includes monthly crime reports from October 2022 to December 2022, retrieved through the UK Police API.
| Column | Description |
|---|---|
| crime_id | Unique crime record ID |
| crime_type | Category of offence |
| month | Month of report |
| latitude, longitude | Location coordinates |
| street_name | Street of incident |
| outcome | Investigation result |
| force_id | Police force code |
| neighbourhood_id | Local neighbourhood code |
📧 Author: Mahira
🌐 ALLANAI Labs
💡 Building AI for a safer and sustainable world.
The Clinical Practice Research Datalink (CPRD, www.cprd.com) is a large UK database of primary care health records, to which UoB has an institutional site license. CPRD is used for a range of observational primary care research such as epidemiology, pharmacovigilance and health services research. Research studies employing CPRD require the development of diagnostic and treatment code lists (based on the Read and BNF coding systems respectively). This dataset comprises code lists used in a CPRD analysis of the impact of the UK Quality and Outcomes Framework (QOF) payment-for-performance system on recording and treatment of cardiovascular risk factors in UK general practice for patients with severe mental illness. Codes are provided for defining severe mental illness within CPRD, as well as a range of cardiovascular risk factors.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Numbers of enterprises and local units produced from a snapshot of the Inter-Departmental Business Register (IDBR) taken on 14 March 2025.
The Register of Geographic Codes (RGC) is a key product that contains the definitive list of UK statistical geographies. ONS maintains the definitive set of statistical geographies, coordinates the issue of new codes, and maintains the relationship between active and archived code ranges on behalf of the Government Statistical Service. The RGC should be used in conjunction with the Code History Database, available to download separately.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
If you want to give feedback on this dataset, or wish to request it in another form (e.g. CSV), please fill out this survey here. We are a not-for-profit research organisation keen to see how others use our open models and tools, so all feedback is appreciated! It's a short form that takes 5 minutes to complete.
Important Note: Before downloading this dataset, please read the License and Software Attribution section at the bottom.
This dataset aligns with the work published in Centre for Net Zero's report "Hitting the Target". In this work, we simulate a range of interventions to model the situations in which we believe the UK will meet its target of 600,000 heat pump installations per year by 2028. For full modelling assumptions and findings, read our report on our website.
The code for running our simulation is open source here.
This dataset contains over 9 million households that have been address matched between Energy Performance Certificates (EPC) data and Price Paid Data (PPD). The code for our address matching is here. Since these datasets are released under the Open Government Licence (OGL), this dataset is too. We model specific columns from various datasets, as set out in the methodology section of our report, to simplify and clean up this dataset for academic use. License information is also available in the appendix of our report above.
The EPC data loaders can be found here (the data is here) and the rest of the schemas and data download locations can be found here.
Note that this dataset is not regularly maintained or updated. It is correct as of January 2022. The data was curated and tested using dbt via this GitHub repository and would be simple to rerun on the latest data.
The schema / data dictionary for this data can be found here.
Our recommended way of loading this data is in Python. After downloading all "parts" of the dataset to a folder, you can run:
import pandas as pd
data = pd.read_parquet("path/to/data/folder/")
Licenses and software attribution:
For EPC, PPD and UK House Price Index data:
For the EPC data, we are permitted to republish this provided we state that all researchers who download this dataset must follow these copyright restrictions. We do not explicitly release any Royal Mail address data; instead, we use these fields to generate a pseudonymised "address_cluster_id" which reflects a unique combination of the address lines and postcodes, as well as other metadata. Under ICO and GDPR guidelines, this still counts as personal data, but we have taken measures to pseudonymise as much as possible to fulfil our obligations as a data processor. You must read this carefully before downloading the data, and ensure that you are using it for the research purposes as determined by this copyright notice.
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Contains OS data © Crown copyright and database right 2022.
Contains Office for National Statistics data licensed under the Open Government Licence v.3.0.
The OGL v3.0 license states that we are free to:
copy, publish, distribute and transmit the Information;
adapt the Information;
exploit the Information commercially and non-commercially for example, by combining it with other Information, or by including it in your own product or application.
However we must (where we do any of the above):
acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence;
You can see more information here.
For XOServe Off Gas Postcodes:
This dataset has been released openly for all uses here.
For the address matching:
GNU Parallel: O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014
This data set covers the provisional quarterly estimates of local authority collected waste generation and management for England and the regions.
If you require the data in another format or wish to comment please contact: enviro.statistics@defra.gov.uk
ODS, 25.2 KB
This file is in an OpenDocument format (https://www.gov.uk/guidance/using-open-document-formats-odf-in-your-organisation)
MS Excel Spreadsheet, 55 KB
Last updated on 22 Feb 2025
This dataset provides comprehensive information on property sales in England and Wales, sourced from the UK government's HM Land Registry. Although the government site claims to update on the same day each month, actual update dates can vary. To bridge this gap, our fully automated ETL pipeline retrieves the official government data on a daily basis, so that the dataset reflects the most current transaction data available.
Our ETL (Extract, Transform, Load) process is designed to automate the data update and publishing workflow:
1. Extract:
The pipeline uses web scraping to retrieve the latest data from the official government website. This step is necessary as the site does not offer an API.
2. Transform:
Before loading the data, the ETL pipeline processes the dataset to ensure consistency and usability. As part of the transformation stage, the first column (Transaction_unique_identifier) is removed. This column is dropped during staging to focus on the most relevant transactional information. Removing it reduces the data file size from almost 6GB to 3.1GB, which makes analysis more efficient and reduces the chance of kernel errors or restarts.
3. Load:
Finally, the transformed data is loaded into the dataset.
The transformed data is loaded into the dataset in two parts:
- Complete Data (pp-complete.csv): This file encompasses all records from January 1995 to the present. The complete data file is replaced during each update to reflect any corrections or additional historical data. The first column is price.
- Monthly Data: A separate monthly file is amended each month. This monthly archive ensures a complete record of updates over time, allowing users to track changes and trends more granularly.
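The Transform step described above (dropping the first column before loading) can be sketched as follows. This is not the pipeline's actual code, which is not shown here: `drop_first_column` is a hypothetical helper, it uses only the standard library `csv` module, and the sample row is made up. It streams row by row, so a file of several gigabytes never has to fit in memory.

```python
import csv
import io


def drop_first_column(src, dst):
    """Copy CSV rows from src to dst without their first field.

    src and dst are text file objects. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    rows = 0
    for row in reader:
        # row[1:] skips the Transaction_unique_identifier field.
        writer.writerow(row[1:])
        rows += 1
    return rows


if __name__ == "__main__":
    # Made-up sample row: identifier, price, date, postcode.
    sample = io.StringIO('"{example-id}","250000","2024-01-15 00:00","ZZ1 1ZZ"\n')
    out = io.StringIO()
    drop_first_column(sample, out)
    print(out.getvalue(), end="")
```

For real files, open the source and destination with `open(path, newline="")`, as the `csv` module documentation recommends.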
The dataset (pp-complete.csv) contains records of property sales dating back to January 1995, up to the most recent monthly data. It covers various types of transactions—from residential to commercial properties—providing a holistic view of the real estate market in England and Wales.
The original data includes the following columns:
- Transaction_unique_identifier
- price
- Date_of_Transfer
- postcode
- Property_Type
- Old/New
- Duration
- PAON
- SAON
- Street
- Locality
- Town/City
- District
- County
- PPDCategory_Type
- Record_Status - monthly_file_only
Note: As part of the transformation process, the Transaction_unique_identifier column is removed from the final published pp-complete.csv data file. Therefore the first column of the pp-complete.csv file is price.
Address data Explanation
- Postcode: The postal code where the property is located.
- PAON (Primary Addressable Object Name): Typically the house number or name.
- SAON (Secondary Addressable Object Name): Additional information if the building is divided into flats or sub-buildings.
- Street: The street name where the property is located.
- Locality: Additional locality information.
- Town/City: The town or city where the property is located.
- District: The district in which the property resides.
- County: The county where the property is located.
- Price Paid: The price for which the property was sold.
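As an illustration of how the address fields above fit together, the sketch below joins them into a single display line. The ordering (SAON first, then PAON with the street) is an assumption based on how UK addresses are usually written, not something the dataset specifies; `format_address` is a hypothetical helper and the sample record is made up.

```python
def format_address(record):
    """Join the address fields of one row into a single line.

    record is a mapping keyed by the column names listed above.
    Missing or empty fields are skipped.
    """
    def clean(key):
        return (record.get(key) or "").strip()

    # PAON (house number or name) normally sits directly before the street.
    building = " ".join(p for p in (clean("PAON"), clean("Street")) if p)
    parts = [
        clean("SAON"),
        building,
        clean("Locality"),
        clean("Town/City"),
        clean("District"),
        clean("County"),
        clean("postcode"),
    ]
    return ", ".join(p for p in parts if p)


if __name__ == "__main__":
    # Made-up record for illustration only.
    row = {
        "SAON": "FLAT 2",
        "PAON": "10",
        "Street": "HIGH STREET",
        "Locality": "",
        "Town/City": "EXAMPLETOWN",
        "postcode": "ZZ1 1ZZ",
    }
    print(format_address(row))
```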
Ownership and Attribution
This dataset is the property of HM Land Registry and is released under the Open Government Licence (OGL). If you use or publish this dataset, you are required to include the following attribution statement:
>"Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0."
The data can be used for both commercial and non-commercial purposes.
The OGL does not cover third-party rights, which HM Land Registry is not authorized to license. For any other use of the Address Data, you must contact Royal Mail.
- Market Trend Analysis: Understand the ups and downs of the property market over time.
- Investment Research: Identify potential areas for property investment.
- Academic Studies: Use the data for economic research and studies related to the housing market.
- Policy Making: Assist government agencies in making informed decisions regarding housing policies.
- Real Estate Apps: Integrate the data into apps that provide property price information services.
By using this dataset, you agree to abide by the terms and conditions as specified by HM Land Registry. Failure to do so may result in legal consequences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective
Long-term follow-up of population-based prospective studies is often achieved through linkages to coded regional or national health care data. Our knowledge of the accuracy of such data is incomplete. To inform methods for identifying stroke cases in UK Biobank (a prospective study of 503,000 UK adults recruited in middle-age), we systematically evaluated the accuracy of these data for stroke and its main pathological types (ischaemic stroke, intracerebral haemorrhage, subarachnoid haemorrhage), determining the optimum codes for case identification.
Methods
We sought studies published from 1990-November 2013, which compared coded data from death certificates, hospital admissions or primary care with a reference standard for stroke or its pathological types. We extracted information on a range of study characteristics and assessed study quality with the Quality Assessment of Diagnostic Studies tool (QUADAS-2). To assess accuracy, we extracted data on positive predictive values (PPV) and, where available, on sensitivity, specificity, and negative predictive values (NPV).
Results
37 of 39 eligible studies assessed accuracy of International Classification of Diseases (ICD)-coded hospital or death certificate data. They varied widely in their settings, methods, reporting, quality, and in the choice and accuracy of codes. Although PPVs for stroke and its pathological types ranged from 6–97%, appropriately selected, stroke-specific codes (rather than broad cerebrovascular codes) consistently produced PPVs >70%, and in several studies >90%. The few studies with data on sensitivity, specificity and NPV showed higher sensitivity of hospital versus death certificate data for stroke, with specificity and NPV consistently >96%. Few studies assessed either primary care data or combinations of data sources.
Conclusions
Particular stroke-specific codes can yield high PPVs (>90%) for stroke/stroke types. Inclusion of primary care data and combining data sources should improve accuracy in large epidemiological studies, but there is limited published information about these strategies.
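For readers unfamiliar with the four accuracy measures named in the Methods, the sketch below computes them from the standard two-by-two comparison of coded data against a reference standard. The definitions are the standard ones; the function name `accuracy_measures` and the counts in the example are made up for illustration and are not taken from any of the reviewed studies.

```python
def accuracy_measures(true_pos, false_pos, false_neg, true_neg):
    """Return PPV, sensitivity, specificity and NPV as fractions.

    A measure is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Of the records coded as stroke, how many were truly stroke?
        "ppv": ratio(true_pos, true_pos + false_pos),
        # Of the true strokes, how many were coded as stroke?
        "sensitivity": ratio(true_pos, true_pos + false_neg),
        # Of the non-strokes, how many were not coded as stroke?
        "specificity": ratio(true_neg, true_neg + false_pos),
        # Of the records not coded as stroke, how many were truly not stroke?
        "npv": ratio(true_neg, true_neg + false_neg),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(accuracy_measures(true_pos=90, false_pos=10, false_neg=30, true_neg=870))
```

PPV needs only the records that carry the code, which may be why it is the measure most often available; sensitivity, specificity and NPV also require a reference standard for records without the code.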