Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Findings from the Coronavirus (COVID-19) Infection Survey for England.
On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.
This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collects information on the workforce, fire prevention work, health and safety, and firefighter pensions. All data tables on fire statistics are below.
MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) are for Great Britain split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for the devolved administrations is available at Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).
If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@homeoffice.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.
Fire statistics guidance
Fire statistics incident level datasets
FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 126 KB): https://assets.publishing.service.gov.uk/media/67fe79e3393a986ec5cf8dbe/FIRE0101.xlsx (Previous FIRE0101 tables)
FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 1.56 MB): https://assets.publishing.service.gov.uk/media/67fe79fbed87b81608546745/FIRE0102.xlsx (Previous FIRE0102 tables)
FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 156 KB): https://assets.publishing.service.gov.uk/media/67fe7a20694d57c6b1cf8db0/FIRE0103.xlsx (Previous FIRE0103 tables)
FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 331 KB): https://assets.publishing.service.gov.uk/media/67fe7a40ed87b81608546746/FIRE0104.xlsx (Previous FIRE0104 tables)
FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet): https://assets.publishing.service.gov.uk/media/67fe7a5f393a986ec5cf8dc0/FIRE0201.xlsx
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Headline estimates for England, Wales, Northern Ireland and Scotland.
Data tables containing aggregated information about vehicles in the UK are also available.
A number of changes were introduced to these data files in the 2022 release to help meet the needs of our users and to provide more detail.
Fuel type has been added to:
Historic UK data has been added to:
A new datafile, df_VEH0520, has been added.
We welcome any feedback on the structure of our data files, their usability, or any suggestions for improvements; please contact vehicles statistics.
CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or digitally using software packages and languages (for example, R or Python).
When using as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 58.1 MB): https://assets.publishing.service.gov.uk/media/68494aca74fe8fe0cbb4676c/df_VEH0120_GB.csv
Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 34.1 MB): https://assets.publishing.service.gov.uk/media/68494acb782e42a839d3a3ac/df_VEH0120_UK.csv
Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 24.8 MB): https://assets.publishing.service.gov.uk/media/68494ad774fe8fe0cbb4676d/df_VEH0160_GB.csv
Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 8.26 MB): https://assets.publishing.service.gov.uk/media/68494ad7aae47e0d6c06e078/df_VEH0160_UK.csv
Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
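For users working with these files in Python rather than a spreadsheet, the sketch below shows one possible way to turn the wide layout (one column per quarter) into one record per vehicle group and quarter. It is only an illustration: the identifier columns come from the schema listed above, but the quarter header format ("2024Q4"), the sample row and the function name wide_to_long are assumptions made for the example and may not match the published files.

```python
import csv
import io

# Identifier columns taken from the schema listed above for df_VEH0120.
ID_COLUMNS = ["BodyType", "Make", "GenModel", "Model", "Fuel", "LicenceStatus"]


def wide_to_long(csv_file, id_columns=ID_COLUMNS):
    """Yield one record per (vehicle group, quarter) from a wide CSV.

    Every column not named in id_columns is treated as a quarter column.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        ids = {name: row[name] for name in id_columns}
        for column, value in row.items():
            # Skip identifier columns, surplus fields and empty cells.
            if column is None or column in ids or not value:
                continue
            yield {**ids, "Quarter": column, "Vehicles": int(value)}


# Hypothetical two-quarter sample; the real quarter headers may differ.
SAMPLE = io.StringIO(
    "BodyType,Make,GenModel,Model,Fuel,LicenceStatus,2024Q4,2024Q3\n"
    "Cars,EXAMPLE,EXAMPLE GEN,EXAMPLE MODEL,Petrol,Licensed,120,115\n"
)

long_rows = list(wide_to_long(SAMPLE))
for record in long_rows:
    print(record["Make"], record["Quarter"], record["Vehicles"])
```

For a downloaded file, the sample can be replaced with open(path, newline="", encoding="utf-8"). The first-registration files (df_VEH0160) have no LicenceStatus column, so a shorter id_columns list would need to be passed for them. Because the function yields records one at a time, the whole file does not have to be held in memory.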
In order to keep the datafile df_VEH0124 to a reasonable size, it has been split into 2 halves; 1 covering makes starting with A to M, and the other covering makes starting with N to Z.
df_VEH0124_AM: <a class="govuk-link" href="https://assets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
http://dx.doi.org/10.1093/pubmed/fdab017
On March 11, 2020, the WHO declared that the novel SARS-CoV-2 virus was at pandemic levels. In the UK, a number of public health measures such as social distancing and lockdown were introduced to minimise viral transmission. We sought to assess whether or not the initial outbreak of COVID-19 was associated with a change in the prescription rates of ICS, prednisolone, and antibiotics in primary care in England.
The datasets and R scripts used for each medication analysis are provided separately so that our results can be confirmed.
Accessible Tables and Improved Quality
As part of the Analysis Function Reproducible Analytical Pipeline Strategy, processes to create all National Travel Survey (NTS) statistics tables have been improved to follow the principles of Reproducible Analytical Pipelines (RAP). This has improved the efficiency and quality of NTS tables, and as a result some historical estimates have seen very minor changes, at the fifth decimal place or beyond.
All NTS tables have also been redesigned in an accessible format so that they can be used by as many people as possible, including people with impaired vision, motor difficulties, cognitive impairments or learning disabilities, and deafness or impaired hearing.
If you wish to provide feedback on these changes then please email national.travelsurvey@dft.gov.uk.
Revision to table NTS9919
On 16 April 2025, the figures in table NTS9919 were revised and recalculated to include only day 1 of the travel diary, where short walks of less than a mile are recorded (from 2017 onwards), whereas previous versions included all days. This is to capture more accurately the proportion of trips which include short walks before a surface rail stage. This revision has resulted in fewer available breakdowns than previously published, due to the smaller sample sizes.
NTS0303: Average number of trips, stages, miles and time spent travelling by mode: England, 2002 onwards (ODS, 53.9 KB): https://assets.publishing.service.gov.uk/media/66ce0f118e33f28aae7e1f75/nts0303.ods
NTS0308: Average number of trips and distance travelled by trip length and main mode: England, 2002 onwards (ODS, 191 KB): https://assets.publishing.service.gov.uk/media/66ce0f128e33f28aae7e1f76/nts0308.ods
NTS0312: Walks of 20 minutes or more by age and frequency: England, 2002 onwards (ODS, 35.1 KB): https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f71/nts0312.ods
NTS0313: Frequency of use of different transport modes: England, 2003 onwards (ODS, 27.1 KB): https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f72/nts0313.ods
NTS0412: Commuter trips and distance by employment status and main mode: England, 2002 onwards (ODS, 53.8 KB): https://assets.publishing.service.gov.uk/media/66ce0f1325c035a11941f653/nts0412.ods
NTS0504: Average number of trips by day of the week or month and purpose or main mode: England, 2002 onwards (ODS, 141 KB): https://assets.publishing.service.gov.uk/media/66ce0f141aaf41b21139cf7d/nts0504.ods
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).
The dataset was originally created to allow the construction of age-specific mortality series and cohort mortality series for particular diseases, from the mid-nineteenth century to the present (in conjunction with the comparable mortality database created by the Office for National Statistics, which covers 1901 – present). The dataset is fairly comprehensive and therefore allows both fine analysis of trends in single causes and the construction of consistent aggregated categories of causes over time. Additionally, comparison of trends in individual causes can be used to infer transfers of deaths between categories over time that may cause artifactual changes in the mortality rates of particular causes. The data are presented by sex, allowing calculation of sex ratios. The age-specific and annual nature of the dataset allows the analysis of cause-specific mortality by birth cohort (assuming low migration at the national level). The database can be used in conjunction with the ONS database “Historic Mortality and Population Data, 1901-1992”, already in the UK Data Archive collection as SN 2902, to create continuous cause-of-death series for the period 1848-1992 (or later, if using more recent versions of the ONS database).
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This is a publication on maternity activity in English NHS hospitals. This report examines data relating to delivery and birth episodes in 2023-24, and the booking appointments for these deliveries. This annual publication covers the financial year ending March 2024. Data is included from both the Hospital Episode Statistics (HES) data warehouse and the Maternity Services Data Set (MSDS). HES contains records of all admissions, appointments and attendances for patients admitted to NHS hospitals in England. The HES data used in this publication are called 'delivery episodes'. The MSDS collects records of each stage of the maternity service care pathway in NHS-funded maternity services, and includes information not recorded in HES. The MSDS is a maturing, national-level dataset. In April 2019, the MSDS transitioned to a new version of the dataset. This version, MSDS v2.0, is an update that introduced a new structure and content, including clinical terminology, in order to meet current clinical practice and incorporate new requirements. It is designed to meet requirements that resulted from the National Maternity Review, which led to the publication of the Better Births report in February 2016. This is the fifth publication of data from MSDS v2.0, and data from 2019-20 onwards are not directly comparable to data from previous years. This publication shows the number of HES delivery episodes during the period, with a number of breakdowns including by method of onset of labour, delivery method and place of delivery. It also shows the number of MSDS deliveries recorded during the period, with a breakdown for the mother's smoking status at the booking appointment by age group. It also provides counts of live born term babies with breakdowns for the general condition of newborns (via Apgar scores), skin-to-skin contact and baby's first feed type, all immediately after birth. There is also data available in a separate file on breastfeeding at 6 to 8 weeks.
For the first time, information on 'Smoking at Time of Delivery' has been presented using annual data from the MSDS. This includes national data broken down by maternal age, ethnicity and deprivation. From 2025/2026, MSDS will become the official source of 'Smoking at Time of Delivery' information and will replace the historic 'Smoking at Time of Delivery' data, which is to be retired. We are currently running dual collection and reporting on a quarterly basis for 2024/25 to help users compare information from the two sources. We are working with data submitters to help reconcile any discrepancies at a local level before any close-down activities begin. A link to the dual reporting in the SATOD publication series can be found in the links below. Information on how all measures are constructed can be found in the HES Metadata and MSDS Metadata files provided below. In this publication we have also included an interactive Power BI dashboard to enable users to explore key NHS Maternity Statistics measures. The purpose of this publication is to inform and support strategic and policy-led processes for the benefit of patient care. This report will also be of interest to researchers, journalists and members of the public interested in NHS hospital activity in England. Any feedback on this publication or dashboard can be provided to enquiries@nhsdigital.nhs.uk, under the subject “NHS Maternity Statistics”.
The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement. The main objectives of ELSA are to:
Further information may be found on the ELSA project website (https://www.elsa-project.ac.uk/) or the NatCen Social Research: ELSA web pages.
Wave 11 data has been deposited - May 2025
For the 45th edition (May 2025) ELSA Wave 11 core and pension grid data and documentation were deposited. Users should note this dataset version does not contain the survey weights. A version with the survey weights along with IFS and financial derived datasets will be deposited in due course. In the meantime, more information about the data collection or the data collected during this wave of ELSA can be found in the Wave 11 Technical Report or the User Guide.
Health conditions research with ELSA - June 2021
The ELSA Data team have found some issues with historical data measuring health conditions. If you are intending to do any analysis looking at the following health conditions, then please read the ELSA User Guide or if you still have questions contact elsadata@natcen.ac.uk for advice on how you should approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; Congestive Heart Failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's Disease; emotional, nervous or psychiatric problems; Alzheimer's Disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).
For information on obtaining data from ELSA that are not held at the UKDS, see the ELSA Genetic data access and Accessing ELSA data webpages.
Wave 10 Health data
Users should note that in Wave 10, the health section of the ELSA questionnaire was revised and all respondents were asked anew about their health conditions, rather than following the prior approach of asking those who had taken part in past waves to confirm previously recorded conditions. For this reason, the health conditions feed-forward data were not archived for Wave 10, unlike in previous waves.
Harmonized dataset:
Users of the Harmonized dataset who prefer to use the Stata version will need access to Stata MP software, as the version G3 file contains 11,779 variables (the limit for the standard Stata 'Intercooled' version is 2,047).
ELSA COVID-19 study:
A separate ad-hoc study conducted with ELSA respondents, measuring the socio-economic effects and psychological impact of the lockdown on the population of England aged 50 and over, is also available under SN 8688, English Longitudinal Study of Ageing COVID-19 Study.
The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.
The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.
Survey and Biomeasures Data (GN 33004):
To date there have been nine attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 Birth Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137) and the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669).
Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.
From 2002-2004, a Biomedical Survey was completed and is available under End User Licence (EUL) (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.
Linked Geographical Data (GN 33497):
A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.
Linked Administrative Data (GN 33396):
A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.
Additional Sub-Studies (GN 33562):
In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.
How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.
Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.
The National Child Development Study: Biomedical Survey 2002-2004 was funded under the Medical Research Council 'Health of the Public' initiative, and was carried out in 2002-2004 in collaboration with the Institute of Child Health, St George's Hospital Medical School, and NatCen. The survey was designed to obtain objective measures of ill-health and biomedical risk factors in order to address a wide range of specific hypotheses relating to anthropometry; cardiovascular, respiratory and allergic diseases; visual and hearing impairment; and mental ill-health.
The majority of the biomedical data (1,064 variables) are now available under EUL (SN 8731), with some data considered sensitive still available under Special Licence (SN 5594). This decision was the result of the CLS's disclosure assessment of each variable and the broad aim to make as much data available with the lowest possible barriers. Information about the medication taken by the cohort members of the study is also available under EUL for the first time. These data were collected in 2002-2004, but had never previously been released via the UKDS.
The Special Licence dataset contains 122 variables, including new data on child adversity not previously released, as well as a number of original variables that were previously available under Special Licence due to their sensitive nature, such as Clinical Interview Schedule-Revised (CIS-R) specific questions on mental health and questions which contain categories with small frequencies related to personal details such as skin colour, pregnancy, a surgical operation, specific height and an unusually high number of children.
For the second edition (December 2020), the data and documentation have been revised. Previously unreleased variables on child adversity have been added and some variables removed as they are now available under EUL. Users are advised to download the EUL version (SN 8731) before deciding to apply for the Special Licence version.
The Reference Observatory of Basins for INternational hydrological climate change detection (ROBIN) dataset is a global hydrological dataset containing publicly available daily flow data for 2,386 gauging stations across the globe which have natural or near-natural catchments. Metadata is also provided alongside these stations for the Full ROBIN Dataset, consisting of 3,060 gauging stations. Data were quality controlled by the central ROBIN team before being added to the dataset, and two levels of data quality are applied to guide users towards appropriate data usage. Most records have at least 40 years of data with minimal missing data, with records starting in the late 19th century for some sites and running through to 2022. ROBIN represents a significant advance in global-scale, accessible streamflow data. The project was funded by the UK Natural Environment Research Council Global Partnership Seedcorn Fund (NE/W004038/1) and the NC-International programme (NE/X006247/1) delivering National Capability.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.
https://eidc.ceh.ac.uk/licences/ogl-ukbms/plain
This dataset provides the details of all sites on which butterflies have been monitored as part of the UK Butterfly Monitoring Scheme (UKBMS). This includes all standard UKBMS transect sites, Wider Countryside Butterfly Survey (WCBS) sites and targeted species survey sites (timed counts, single-species transects, larval web and egg counts). Data include the location within the UK, the length and number of sections for the butterfly transect on each site and the number of years each transect has been monitored. The locations of some sites are sensitive and are excluded from this dataset. Users requiring access to the complete dataset, including sensitive site location data, can submit a request via the UKBMS website. The UKBMS started in 1976 with fewer than 50 sites. Since then the number of sites monitored each year has increased to several thousand. There are new sites each year and a small number where the transect is no longer surveyed. Details of this are provided in the site dataset in the form of the first and last year in which each site was surveyed. The majority of site data is provided by recorders at the time a transect is created. Site data are crucial in order to determine where extra recording effort is required and to investigate where butterfly populations are changing most and thus where conservation should be targeted, including across different habitat types. The UK Butterfly Monitoring Scheme is organised and funded by Butterfly Conservation (BC), the UK Centre for Ecology & Hydrology (UKCEH), the British Trust for Ornithology (BTO), and the Joint Nature Conservation Committee (JNCC). The UKBMS is indebted to all volunteers who contribute data to the scheme. This work was supported by the Natural Environment Research Council award number NE/R016429/1 as part of the UK-SCAPE programme delivering National Capability.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset was compiled for the Regional Seabed Monitoring Plan (RSMP) baseline assessment reported in Cooper & Barry (2017).
The dataset comprises 33,198 macrofaunal samples (83% with associated data on sediment particle size composition) covering large parts of the UK continental shelf. Whilst most samples come from existing datasets, also included are 2,500 new samples collected specifically for the purpose of this study. These new samples were collected during 2014-2016 from the main English aggregate dredging regions (Humber, Anglian, Thames, Eastern English Channel and South Coast) and at four individual, isolated extraction sites where the RSMP methodology is also being adopted (e.g. Area 457, North-West dredging region; Area 392, North-West dredging region; Area 376, Bristol Channel dredging region; Goodwin Sands, English Channel). This work was funded by the aggregates industry, and carried out by contractors on their behalf. Samples were collected in accordance with a detailed protocols document, which included control measures to ensure the quality of faunal and sediment sample processing. Additional samples were acquired to fill in gaps in spatial coverage and to provide a contemporary baseline for sediment composition.
Sources of existing data include both government and industry, with contributions from the marine aggregate dredging, offshore wind, oil and gas, nuclear and port and harbour sectors. Samples have been collected over a period of 48 years from 1969 to 2016, although the vast majority (96%) were acquired since 2000. Samples have been collected during every month of the year, although there is a clear peak during summer months when weather conditions are generally more favourable for fieldwork.
The DOI includes multiple files for use with the R script that accompanies the paper: Cooper, K. M. & Barry, J. A big data approach to macrofaunal baseline assessment, monitoring and sustainable exploitation of the seabed. Scientific Reports 7, doi: 10.1038/s41598-017-11377-9 (2017). Files include:
*At the request of data owners, macrofaunal abundance and sediment particle size data have been redacted from 13 of the 777 surveys (1.7%) in the dataset. Note that metadata and derived variables are still included. Surveys with redacted data include:
SurveyName
Cefas will only make redacted data available where the data requester can provide written permission from the relevant data owner(s) - see below. Note that it is the responsibility of the data requester to seek permission from the relevant data owners.
Data owners for the redacted surveys listed above are:
Description of the C5922DATASET13022017.csv/ C5922DATASET13022017REDACTED.csv (Raw data)
A variety of gear types have been used for sample collection, including grabs (0.1 m² Hamon, 0.2 m² Hamon, 0.1 m² Day, 0.1 m² Van Veen and 0.1 m² Smith-McIntyre) and cores. Of these various devices, 93% of samples were acquired using either a 0.1 m² Hamon grab or a 0.1 m² Day grab. Sieve sizes used in sample processing include 1 mm and 0.5 mm, reflecting the conventional preference for 1 mm offshore and 0.5 mm inshore (see Figure 2). Of the samples collected using either a 0.1 m² Hamon grab or a 0.1 m² Day grab, 88% were processed using a 1 mm sieve.
Taxon names were standardised according to the WoRMS (World Register of Marine Species) list using the Taxon Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial 13,449 taxon names, only 4,248 remained after correction. The output from this tool also provides taxonomic aggregation information, allowing data to be analysed at different taxonomic levels, from species to phyla. The final dataset comprises a single-sheet comma-separated values (.csv) file. Colonials accounted for less than 20% of the total number of taxa and, where present, were given a value of 1 in the dataset. This component of the fauna was missing from 325 out of the 777 surveys, reflecting either a true absence, or simply that colonial taxa were ignored by the analyst. Sediment particle size data were provided as percentage weight by sieve mesh size, with the dataset including 99 different sieve sizes. Sediment samples have been processed using sieving alone, or a combination of sieve and laser diffraction techniques. Key metadata fields include: Sample coordinates (Latitude & Longitude), Survey Name, Gear, Date, Grab Sample Volume (litres) and Water Depth (m). A number of additional explanatory variables are also provided (salinity, temperature, chlorophyll a, Suspended particulate matter, Water depth, Wave Orbital Velocity, Average Current, Bed Stress). In total, the dataset dimensions are 33,198 rows (samples) x 13,588 columns (variables/factors), yielding a matrix of 451,094,424 individual data values.
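As a quick sanity check on the figures quoted above, and to give a rough idea of what the matrix size means in practice, the short sketch below recomputes them in Python. The counts are taken from this description; the assumption that every value is held as an 8-byte number is ours, is not stated in the source, and only serves to give an order-of-magnitude memory estimate.

```python
# Figures quoted in the description above.
samples = 33_198
variables = 13_588
surveys_total = 777
surveys_redacted = 13
taxon_names_initial = 13_449
taxon_names_final = 4_248

# Size of the full samples-by-variables matrix.
cells = samples * variables

# Share of surveys with redacted data, and share of taxon names
# removed by standardisation against WoRMS.
redacted_pct = round(100 * surveys_redacted / surveys_total, 1)
taxon_reduction_pct = round(
    100 * (taxon_names_initial - taxon_names_final) / taxon_names_initial, 1
)

# Assumption (not from the source): every value stored as an 8-byte number.
BYTES_PER_VALUE = 8
dense_gigabytes = cells * BYTES_PER_VALUE / 1e9

print(f"{cells:,} values")
print(f"{redacted_pct}% of surveys redacted")
print(f"{taxon_reduction_pct}% fewer taxon names after standardisation")
print(f"about {dense_gigabytes:.1f} GB if loaded as a dense 8-byte matrix")
```

Under that storage assumption the full matrix would need several gigabytes of memory, so users on modest hardware may prefer to read the file in chunks or to select only the columns they need.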
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of datasets and Python scripts for the extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word lists, specifically Google Ngram and the British National Corpus (BNC).
Below follows a brief description, first, of the included datasets and, second, of the included scripts.
1. Datasets
The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.
1.1 CSV format
The CSV files for each dataset come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name.
The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure, see section below):
Label Data type Description
isogramy int The order of isogramy, e.g. "2" is a second order isogram
length int The length of the word in letters
word text The actual word/isogram in ASCII
source_pos text The Part of Speech tag from the original corpus
count int Token count (total number of occurrences)
vol_count int Volume count (number of different sources which contain the word)
count_per_million int Token count per million words
vol_count_as_percent int Volume count as percentage of the total number of volumes
is_palindrome bool Whether the word is a palindrome (1) or not (0)
is_tautonym bool Whether the word is a tautonym (1) or not (0)
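The three word-level properties in the table (isogramy, is_palindrome, is_tautonym) can be illustrated with a minimal Python sketch. It assumes the usual definitions: an n-th order isogram is a word in which every letter occurs exactly n times, and a tautonym is a word made of two identical halves. The function names are mine, and the included isograms.py may treat case, very short words or non-letter characters differently.

```python
from collections import Counter


def isogram_order(word: str) -> int:
    """Return n if every letter in `word` occurs exactly n times, else 0."""
    counts = set(Counter(word.lower()).values())
    # Only words whose letters all share one frequency qualify as isograms.
    return counts.pop() if len(counts) == 1 else 0


def is_palindrome(word: str) -> bool:
    """True if the word reads the same forwards and backwards."""
    w = word.lower()
    return len(w) > 1 and w == w[::-1]


def is_tautonym(word: str) -> bool:
    """True if the word consists of two identical halves, e.g. 'beriberi'."""
    w = word.lower()
    half, odd = divmod(len(w), 2)
    return half > 0 and not odd and w[:half] == w[half:]
```

For example, isogram_order("deed") gives 2 (a second order isogram that is also a palindrome), while isogram_order("letter") gives 0 because its letters occur with different frequencies.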
The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:
Label Data type Description
!total_1grams int The total number of words in the corpus
!total_volumes int The total number of volumes (individual sources) in the corpus
!total_isograms int The total number of isograms found in the corpus (before compacting)
!total_palindromes int How many of the isograms found are palindromes
!total_tautonyms int How many of the isograms found are tautonyms
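As a rough sketch of how the two plain text formats could be read for further processing, the following Python uses only the standard library. Several things here are assumptions rather than facts about the files: that the ".totals" file is tab-separated like the ".csv" file, that a leading "!" on the labels can be dropped, and that numeric columns which fail to parse as integers should be read as floats. The function names are hypothetical.

```python
import csv
import io

# Column labels in file order, as listed in the table above.
COLUMNS = ["isogramy", "length", "word", "source_pos", "count", "vol_count",
           "count_per_million", "vol_count_as_percent",
           "is_palindrome", "is_tautonym"]
TEXT_COLUMNS = {"word", "source_pos"}


def _number(value: str):
    """Parse an int, falling back to float (assumption: some values may be decimal)."""
    try:
        return int(value)
    except ValueError:
        return float(value)


def read_rows(handle):
    """Yield one dict per tab-separated data line, keyed by column label."""
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:  # skip blank lines
            continue
        yield {label: value if label in TEXT_COLUMNS else _number(value)
               for label, value in zip(COLUMNS, fields)}


def read_totals(handle):
    """Return the label/value pairs of a ".totals" file as a dict."""
    totals = {}
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) >= 2:
            totals[fields[0].lstrip("!")] = int(fields[1])
    return totals


# Made-up sample line, for illustration only (not taken from the dataset).
sample = io.StringIO("2\t4\tdeed\tNOUN\t120\t80\t3\t1\t1\t0\n")
first = next(read_rows(sample))
print(first["word"], first["isogramy"], first["is_palindrome"])
```

With real data, the io.StringIO objects would be replaced by files opened with open(path, encoding="ascii", newline="").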
The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.
1.2 SQLite database format
The SQLite database combines the data from all four of the plain text files and adds various useful combinations of the two datasets, namely:
• Compacted versions of each dataset, where identical headwords are combined into a single entry.
• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
The intersected dataset is by far the least noisy, but is missing some real isograms, too.
The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above.
To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.
2. Scripts
There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data.
The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).
2.1 Source data
The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz).
For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.
2.2 Data preparation
Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format.
Tidying and reformatting can be done by running one of the following commands:
python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE
Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.
2.3 Isogram extraction
After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:
python isograms.py --batch --infile=INFILE --outfile=OUTFILE
Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files, one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.
2.4 Creating a SQLite3 database
The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
1. Make sure the files with the Ngrams and BNC data are named “ngrams-isograms.csv” and “bnc-isograms.csv” respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the “create-database.sql” script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db
5. This will create a database called “isograms.db”.
See section 1 for a basic description of the output data and how to work with the database.
2.5 Statistical processing
The repository includes an R script (R version 3) named “statistics.r” that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
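For readers who prefer Python to R, the kind of query the statistics script runs can be sketched with the standard sqlite3 module. This is only an illustration under stated assumptions: the table name "demo_isograms" is hypothetical (the real table names are set by create-database.sql and are not listed here), and the rows are made up so that the example runs without the actual database. Only the column layout follows the description in section 1.

```python
import sqlite3

TABLE = "demo_isograms"  # hypothetical table name, for illustration only


def build_demo_db():
    """Create an in-memory table with the documented column layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f'CREATE TABLE "{TABLE}" ('
        'isogramy INTEGER, length INTEGER, word TEXT, source_pos TEXT, '
        '"count" INTEGER, vol_count INTEGER, count_per_million INTEGER, '
        'vol_count_as_percent INTEGER, is_palindrome INTEGER, '
        'is_tautonym INTEGER)'
    )
    # Made-up frequency values; only the word properties are real.
    rows = [
        (2, 4, "deed", "n", 10, 5, 1, 1, 1, 0),
        (2, 4, "noon", "n", 8, 4, 1, 1, 1, 0),
        (1, 7, "isogram", "n", 3, 2, 0, 1, 0, 0),
        (2, 8, "beriberi", "n", 2, 1, 0, 1, 0, 1),
    ]
    conn.executemany(f'INSERT INTO "{TABLE}" VALUES (?,?,?,?,?,?,?,?,?,?)', rows)
    return conn


def isograms_by_length(conn, table, order):
    """Count isograms of the given order, grouped by word length."""
    query = (f'SELECT length, COUNT(*) FROM "{table}" '
             "WHERE isogramy = ? GROUP BY length ORDER BY length")
    return conn.execute(query, (order,)).fetchall()


conn = build_demo_db()
print(isograms_by_length(conn, TABLE, 2))
```

To run the same query against the real data, replace build_demo_db() with sqlite3.connect("isograms.db") and pass one of the actual table names, which can be listed with the .tables command in the sqlite3 shell.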
What the general public thinks about crime and punishment is a vexed question. In an effort to bring systematic data to bear on this question, I have assembled the largest compilation of aggregated survey data on attitudes to crime and punishment in England and Wales to date. The dataset contains 1,190 question-year pairs, which track popular attitudes across four areas: (i) Crime concern 1965-2023, (ii) Punitiveness 1981-2023, (iii) Support for the death penalty 1962-2023, and (iv) Prioritisation of crime/law-and-order as a social issue 1973-2023.
For example, in 2014, 58% of respondents to the British Election Studies Internet Panel thought that the level of crime was increasing. By 2019, this number had increased to 83%, and by 2023 it had fallen back to 77%. For 16-24 year olds, the numbers are 38%, 69% and 65%.
Harmonised latent trends for each area, and for different demographic groups, can be derived from the aggregated survey data with Stimson’s (2018) Dyad Ratio Algorithm, using the R script below.
Frog data from the UK Environmental Change Network (ECN) terrestrial sites. Variables measured include phenology (i.e. the dates when frogs start congregating, spawning, when hatching occurs and when the frogs leave), number of spawn masses, total surface area covered by spawn, percentage of dead spawn, depth, pH, conductivity, alkalinity, aluminium, calcium, chloride, ammonium, nitrate nitrogen, phosphate phosphorus, potassium, sulphate sulphur, sodium, total nitrogen and total dissolved phosphorus. These data are collected at ECN's terrestrial sites using a standard protocol. They represent continuous records from 1994 to 2015. ECN is the UK's long-term environmental monitoring programme. It is a multi-agency programme sponsored by a consortium of fourteen government departments and agencies. These organisations contribute to the programme by funding site monitoring and/or network co-ordination activities. These organisations are: Agri-Food and Biosciences Institute, Biotechnology and Biological Sciences Research Council, Cyfoeth Naturiol Cymru - Natural Resources Wales, Defence Science & Technology Laboratory, Department for Environment, Food and Rural Affairs, Environment Agency, Forestry Commission, Llywodraeth Cymru - Welsh Government, Natural England, Natural Environment Research Council, Northern Ireland Environment Agency, Scottish Environment Protection Agency, Scottish Government and Scottish Natural Heritage.
Revision
Finalised data on government support for buses was not available when these statistics were originally published (27 November 2024). The Ministry of Housing, Communities and Local Government (MHCLG) have since published that data so the following have been revised to include it:
Revision
The following figures relating to local bus passenger journeys per head have been revised:
Table BUS01f provides figures on passenger journeys per head of population at Local Transport Authority (LTA) level. Population data for 21 counties were duplicated in error, resulting in the halving of figures in this table. This issue does not affect any other figures in the published tables, including the regional and national breakdowns.
The affected LTAs were: Cambridgeshire, Derbyshire, Devon, East Sussex, Essex, Gloucestershire, Hampshire, Hertfordshire, Kent, Lancashire, Leicestershire, Lincolnshire, Norfolk, Nottinghamshire, Oxfordshire, Staffordshire, Suffolk, Surrey, Warwickshire, West Sussex, and Worcestershire.
A minor typo in the units was also corrected in the BUS02_mi spreadsheet.
A full list of tables can be found in the table index.
BUS0415: Local bus fares index by metropolitan area status and country, quarterly: Great Britain (ODS, 35.4 KB), https://assets.publishing.service.gov.uk/media/6852b8d399b009dcdcb73612/bus0415.ods
This spreadsheet includes breakdowns by country, region, metropolitan area status, urban-rural classification and Local Authority. It also includes data per head of population, and concessionary journeys.
BUS01: Local bus passenger journeys (ODS, 145 KB), https://assets.publishing.service.gov.uk/media/67603526239b9237f0915411/bus01.ods
Limited historic data is available
These spreadsheets include breakdowns by country, region, metropolitan area status, urban-rural classification and Local Authority, as well as by service type. Vehicle distance travelled is a measure of levels of service provision.
BUS02_mi: Vehicle distance travelled (miles) (ODS, 117 KB), https://assets.publishing.service.gov.uk/media/6760353198302e574b91540c/bus02_mi.ods
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Findings from the Coronavirus (COVID-19) Infection Survey for England.