Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Findings from the Coronavirus (COVID-19) Infection Survey for England.
On 1 April 2025 responsibility for fire and rescue transferred from the Home Office to the Ministry of Housing, Communities and Local Government.
This information covers fires, false alarms and other incidents attended by fire crews, and the statistics include the numbers of incidents, fires, fatalities and casualties as well as information on response times to fires. The Ministry of Housing, Communities and Local Government (MHCLG) also collects information on the workforce, fire prevention work, health and safety and firefighter pensions. All data tables on fire statistics are below.
MHCLG has responsibility for fire services in England. The vast majority of data tables produced by the Ministry of Housing, Communities and Local Government are for England, but some tables (0101, 0103, 0201, 0501, 1401) cover Great Britain, split by nation. In the past the Department for Communities and Local Government (which previously had responsibility for fire services in England) produced data tables for Great Britain and at times the UK. Similar information for the devolved administrations is available from Scotland: Fire and Rescue Statistics (https://www.firescotland.gov.uk/about/statistics/), Wales: Community safety (https://statswales.gov.wales/Catalogue/Community-Safety-and-Social-Inclusion/Community-Safety) and Northern Ireland: Fire and Rescue Statistics (https://www.nifrs.org/home/about-us/publications/).
If you use assistive technology (for example, a screen reader) and need a version of any of these documents in a more accessible format, please email alternativeformats@communities.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.
Fire statistics guidance
Fire statistics incident level datasets
FIRE0101: Incidents attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 153 KB; https://assets.publishing.service.gov.uk/media/686d2aa22557debd867cbe14/FIRE0101.xlsx). Previous FIRE0101 tables are also available.
FIRE0102: Incidents attended by fire and rescue services in England, by incident type and fire and rescue authority (MS Excel Spreadsheet, 2.19 MB; https://assets.publishing.service.gov.uk/media/686d2ab52557debd867cbe15/FIRE0102.xlsx). Previous FIRE0102 tables are also available.
FIRE0103: Fires attended by fire and rescue services by nation and population (MS Excel Spreadsheet, 201 KB; https://assets.publishing.service.gov.uk/media/686d2aca10d550c668de3c69/FIRE0103.xlsx). Previous FIRE0103 tables are also available.
FIRE0104: Fire false alarms by reason for false alarm, England (MS Excel Spreadsheet, 492 KB; https://assets.publishing.service.gov.uk/media/686d2ad92557debd867cbe16/FIRE0104.xlsx). Previous FIRE0104 tables are also available.
FIRE0201: Dwelling fires attended by fire and rescue services by motive, population and nation (MS Excel Spreadsheet; https://assets.publishing.service.gov.uk/media/686d2af42cfe301b5fb6789f/FIRE0201.xlsx)
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Headline estimates for England, Wales, Northern Ireland and Scotland.
Data tables containing aggregated information about vehicles in the UK are also available.
A number of changes were introduced to these data files in the 2022 release to help meet the needs of our users and to provide more detail.
Fuel type has been added to:
Historic UK data has been added to:
A new data file, df_VEH0520, has been added.
We welcome any feedback on the structure of our data files, their usability, or any suggestions for improvements; please contact the vehicles statistics team.
CSV files can be used either as a spreadsheet (using Microsoft Excel or similar spreadsheet packages) or programmatically using software packages and languages (for example, R or Python).
When used as a spreadsheet, there will be no formatting, but the file can still be explored like our publication tables. Due to their size, older software might not be able to open the entire file.
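As an illustration of the programmatic route, the sketch below reads one of the larger data files listed below into R with data.table rather than a spreadsheet. It is only a sketch: it assumes the CSV has already been downloaded locally, the column names follow the schema documented with each file, and the filter values ("FORD", "Licensed") are illustrative assumptions rather than values checked against the file.

# Minimal sketch: reading a large vehicles CSV into R instead of a spreadsheet.
# Assumes df_VEH0120_GB.csv (listed below) has been downloaded locally.
library(data.table)

veh <- fread("df_VEH0120_GB.csv")

# Quick structural checks: dimensions and the first few column names.
dim(veh)
head(names(veh), 10)

# Example filter using columns from the documented schema; the values
# "FORD" and "Licensed" are illustrative assumptions, not checked codes.
ford_licensed <- veh[Make == "FORD" & LicenceStatus == "Licensed"]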
df_VEH0120_GB: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: Great Britain (CSV, 58.1 MB; https://assets.publishing.service.gov.uk/media/68494aca74fe8fe0cbb4676c/df_VEH0120_GB.csv)
Scope: All registered vehicles in Great Britain; from 1994 Quarter 4 (end December)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0120_UK: Vehicles at the end of the quarter by licence status, body type, make, generic model and model: United Kingdom (CSV, 34.1 MB; https://assets.publishing.service.gov.uk/media/68494acb782e42a839d3a3ac/df_VEH0120_UK.csv)
Scope: All registered vehicles in the United Kingdom; from 2014 Quarter 3 (end September)
Schema: BodyType, Make, GenModel, Model, Fuel, LicenceStatus, [number of vehicles; 1 column per quarter]
df_VEH0160_GB: Vehicles registered for the first time by body type, make, generic model and model: Great Britain (CSV, 24.8 MB; https://assets.publishing.service.gov.uk/media/68494ad774fe8fe0cbb4676d/df_VEH0160_GB.csv)
Scope: All vehicles registered for the first time in Great Britain; from 2001 Quarter 1 (January to March)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
df_VEH0160_UK: Vehicles registered for the first time by body type, make, generic model and model: United Kingdom (CSV, 8.26 MB; https://assets.publishing.service.gov.uk/media/68494ad7aae47e0d6c06e078/df_VEH0160_UK.csv)
Scope: All vehicles registered for the first time in the United Kingdom; from 2014 Quarter 3 (July to September)
Schema: BodyType, Make, GenModel, Model, Fuel, [number of vehicles; 1 column per quarter]
In order to keep the data file df_VEH0124 to a reasonable size, it has been split into two halves: one covering makes starting with A to M, and the other covering makes starting with N to Z.
df_VEH0124_AM: https://assets.
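Given the schemas above (descriptive columns followed by one column of counts per quarter), a common first step is to reshape a file into long format. The sketch below does this for df_VEH0160_GB; it is a hedged example only, since the exact names of the quarterly columns are not documented here and are simply taken as everything that is not a descriptive column.

# Hedged sketch: reshaping the wide quarterly layout described above into a
# long format (one row per vehicle group per quarter).
library(data.table)

newreg <- fread("df_VEH0160_GB.csv")

# Descriptive columns from the documented schema; all remaining columns are
# assumed to be the per-quarter counts.
id_cols      <- c("BodyType", "Make", "GenModel", "Model", "Fuel")
quarter_cols <- setdiff(names(newreg), id_cols)

newreg_long <- melt(
  newreg,
  id.vars       = id_cols,
  measure.vars  = quarter_cols,
  variable.name = "Quarter",
  value.name    = "Vehicles"
)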
Data files containing detailed information about vehicles in the UK are also available, including make and model data.
Some tables have been withdrawn and replaced. The table index for this statistical series has been updated to provide a full map between the old and new numbering systems used on this page.
Tables VEH0101 and VEH1104 have not yet been revised to include the recent changes to Large Goods Vehicles (LGV) and Heavy Goods Vehicles (HGV) definitions for data earlier than 2023 quarter 4. This will be amended as soon as possible.
Overview
VEH0101: Vehicles at the end of the quarter by licence status and body type: Great Britain and United Kingdom (ODS, 151 KB; https://assets.publishing.service.gov.uk/media/6846e8dc57f3515d9611f119/veh0101.ods)
Detailed breakdowns
VEH0103: Licensed vehicles at the end of the year by tax class: Great Britain and United Kingdom (ODS, 33 KB; https://assets.publishing.service.gov.uk/media/6846e8dcd25e6f6afd4c01d5/veh0103.ods)
VEH0105: Licensed vehicles at the end of the quarter by body type, fuel type, keepership (private and company) and upper and lower tier local authority: Great Britain and United Kingdom (ODS, 16.3 MB; https://assets.publishing.service.gov.uk/media/6846e8dd57f3515d9611f11a/veh0105.ods)
VEH0206: Licensed cars at the end of the year by VED band and carbon dioxide (CO2) emissions: Great Britain and United Kingdom (ODS, 42.3 KB; https://assets.publishing.service.gov.uk/media/6846e8dee5a089417c806179/veh0206.ods)
VEH0601: Licensed buses and coaches at the end of the year by body type detail: Great Britain and United Kingdom (ODS, 24.6 KB; https://assets.publishing.service.gov.uk/media/6846e8df5e92539572806176/veh0601.ods)
VEH1102: Licensed vehicles at the end of the year by body type and keepership (private and company): Great Britain and United Kingdom (ODS, 146 KB; https://assets.publishing.service.gov.uk/media/6846e8e0e5a089417c80617b/veh1102.ods)
VEH1103: Licensed vehicles at the end of the quarter by body type and fuel type: Great Britain and United Kingdom (ODS, 992 KB; https://assets.publishing.service.gov.uk/media/6846e8e0e5a089417c80617c/veh1103.ods)
VEH1104: Licensed vehicles at the end of the ... (ODS; https://assets.publishing.service.gov.uk/media/6846e8e15e92539572806177/veh1104.ods)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository stores synthetic datasets derived from the database of the UK Biobank (UKB) cohort.
The datasets were generated for illustrative purposes, in particular for reproducing specific analyses on the health risks associated with long-term exposure to air pollution using the UKB cohort. The code used to create the synthetic datasets is available and documented in a related GitHub repo, with details provided in the section below. These datasets can be freely used for code testing and for illustrating other examples of analyses on the UKB cohort.
Note: while the synthetic versions of the datasets resemble the real ones in several respects, users should be aware that these data are fake and must not be used to test or make inferences about specific research hypotheses. Even more importantly, these data cannot be considered a reliable description of the original UKB data, and they must not be presented as such.
The original datasets are described in the article by Vanoli et al. in Epidemiology (2024) (DOI: 10.1097/EDE.0000000000001796, freely available), which also provides information about the data sources.
The work was supported by the Medical Research Council-UK (Grant ID: MR/Y003330/1).
Content
The synthetic datasets (each stored in two versions, CSV and RDS formats) are the following:
synthbdcohortinfo: basic cohort information regarding the follow-up period and birth/death dates for 502,360 participants.
synthbdbasevar: baseline variables, mostly collected at recruitment.
synthpmdata: annual average exposure to PM2.5 for each participant reconstructed using their residential history.
synthoutdeath: death records that occurred during the follow-up with date and ICD-10 code.
In addition, this repository provides the following files:
codebook: a pdf file with a codebook for the variables of the various datasets, including references to the fields of the original UKB database.
asscentre: a csv file with information on the assessment centres used for recruitment of the UKB participants, including code, names, and location (as northing/easting coordinates of the British National Grid).
Countries_December_2022_GB_BUC: a zip file including the shapefile defining the boundaries of the countries in Great Britain (England, Wales, and Scotland), used for mapping purposes [source].
Generation of the synthetic data
The datasets resemble the real data used in the analysis, and they were generated using the R package synthpop (www.synthpop.org.uk). The generation process involves two steps, namely the synthesis of the main data (cohort info, baseline variables, annual PM2.5 exposure) and then the sampling of death events. The R scripts for performing the data synthesis are provided in the GitHub repo (subfolder Rcode/synthcode).
The first part merges all the data, including the annual PM2.5 levels, into a single wide-format dataset (with a row for each subject), generates a synthetic version, adds fake IDs, and then extracts (and reshapes) the individual datasets. In the second part, a Cox proportional hazards model is fitted on the original data to estimate the risks associated with various predictors (including the main exposure, PM2.5), and these relationships are then used to simulate death events in each year. Details on the modelling aspects are provided in the article.
This process guarantees that the synthetic data do not hold specific information about the original records, thus preserving confidentiality. At the same time, the multivariate distribution and correlation across variables, as well as the mortality risks, resemble those of the original data, so the results of descriptive and inferential analyses are similar to those in the original assessments. However, as noted above, the data are intended only for illustrative purposes, and they must not be used to test other research hypotheses.
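For readers unfamiliar with this kind of pipeline, the sketch below illustrates the two-step idea (synthesis of the main data with synthpop, followed by event simulation from a Cox model) on a small toy data frame. It is not the authors' code: the variable names, the toy data and the single-period event draw are all simplifying assumptions; the actual scripts in the GitHub repo simulate deaths year by year as described above.

# Illustrative sketch only (toy data, not real UKB fields; see Rcode/synthcode
# in the GitHub repo for the actual pipeline).
library(synthpop)
library(survival)

set.seed(42)

# Toy "original" wide-format cohort with placeholder variables.
orig <- data.frame(
  age   = round(rnorm(500, 60, 8)),
  sex   = factor(sample(c("F", "M"), 500, replace = TRUE)),
  pm25  = rnorm(500, 10, 2),      # annual-average PM2.5 exposure
  time  = rexp(500, 0.05),        # follow-up time in years
  event = rbinom(500, 1, 0.2)     # death indicator
)

# Step 1: synthesise the main data and attach fake participant IDs.
synth <- syn(orig[, c("age", "sex", "pm25")], seed = 42)$syn
synth$id <- sprintf("FAKE%06d", seq_len(nrow(synth)))

# Step 2: fit a Cox proportional hazards model on the original data, then use
# the estimated relationships to simulate death events in the synthetic cohort
# (here as a crude single-period draw; the real pipeline simulates each year).
cox <- coxph(Surv(time, event) ~ pm25 + age + sex, data = orig)
lp  <- predict(cox, newdata = synth, type = "lp")
synth$event <- rbinom(nrow(synth), 1, plogis(-2 + lp))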
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Daily Coronavirus (COVID-19) positive tests in Leicester City Council and surrounding districts. Data for the most recent 4-5 days is likely to be incomplete. Please note automatic updates to this dataset were discontinued on 12 December 2023.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).
Accessible Tables and Improved Quality
As part of the Analysis Function Reproducible Analytical Pipeline Strategy, the processes used to create all National Travel Survey (NTS) statistics tables have been improved to follow the principles of Reproducible Analytical Pipelines (RAP). This has improved the efficiency and quality of NTS tables, and as a result some historical estimates have seen very minor changes, at the fifth decimal place or beyond.
All NTS tables have also been redesigned in an accessible format so that they can be used by as many people as possible, including people with impaired vision, motor difficulties, cognitive impairments or learning disabilities, and deafness or impaired hearing.
If you wish to provide feedback on these changes then please email national.travelsurvey@dft.gov.uk.
Revision to table NTS9919
On 16 April 2025, the figures in table NTS9919 were revised and recalculated to include only day 1 of the travel diary, where short walks of less than a mile are recorded (from 2017 onwards), whereas previous versions included all days. This is to more accurately capture the proportion of trips which include short walks before a surface rail stage. This revision has resulted in fewer available breakdowns than previously published due to the smaller sample sizes.
NTS0303: Average number of trips, stages, miles and time spent travelling by mode: England, 2002 onwards (ODS, 53.9 KB; https://assets.publishing.service.gov.uk/media/66ce0f118e33f28aae7e1f75/nts0303.ods)
NTS0308: Average number of trips and distance travelled by trip length and main mode; England, 2002 onwards (ODS, 191 KB; https://assets.publishing.service.gov.uk/media/66ce0f128e33f28aae7e1f76/nts0308.ods)
NTS0312: Walks of 20 minutes or more by age and frequency: England, 2002 onwards (ODS, 35.1 KB; https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f71/nts0312.ods)
NTS0313: Frequency of use of different transport modes: England, 2003 onwards (ODS, 27.1 KB; https://assets.publishing.service.gov.uk/media/66ce0f12bc00d93a0c7e1f72/nts0313.ods)
NTS0412: Commuter trips and distance by employment status and main mode: England, 2002 onwards (ODS, 53.8 KB; https://assets.publishing.service.gov.uk/media/66ce0f1325c035a11941f653/nts0412.ods)
NTS0504: Average number of trips by day of the week or month and purpose or main mode: England, 2002 onwards (ODS, 141 KB; https://assets.publishing.service.gov.uk/media/66ce0f141aaf41b21139cf7d/nts0504.ods)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset containing the images and labels for the MultNIST data used in the CVPR NAS workshop Unseen-data challenge under the codename "Mateo". The MultNIST dataset is constructed from MNIST images. The intention of this dataset is to require machine learning models to do more than just image classification: they must also perform a calculation, in this case multiplication followed by a mod operation. For each image, three MNIST images were randomly chosen and combined through the colour channels, resulting in a three-colour-channel image in which each MNIST image occupies one colour channel. The data is in a channels-first format with a shape of (n, 3, 28, 28), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing). There are ten classes in the dataset, with 7,000 examples of each, distributed evenly between the three subsets. The label of each image is generated using the formula "(r * g * b) % 10", where r, g, and b are the digits shown in the red, green, and blue colour channels respectively. For example, an image with an RGB configuration of 3, 7, and 4 would have the label 4, since (3 * 7 * 4) % 10 = 4.
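A one-line helper makes the labelling rule concrete; it simply reproduces the modular arithmetic stated above for the worked example.

# The MultNIST label is the product of the three channel digits, mod 10.
multnist_label <- function(r, g, b) (r * g * b) %% 10

multnist_label(3, 7, 4)   # 84 %% 10 = 4, matching the example above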
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of datasets and Python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

1. Datasets
The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

1.1 CSV format
The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below):
Label Data type Description
isogramy int The order of isogramy, e.g. "2" is a second order isogram
length int The length of the word in letters
word text The actual word/isogram in ASCII
source_pos text The Part of Speech tag from the original corpus
count int Token count (total number of occurrences)
vol_count int Volume count (number of different sources which contain the word)
count_per_million int Token count per million words
vol_count_as_percent int Volume count as percentage of the total number of volumes
is_palindrome bool Whether the word is a palindrome (1) or not (0)
is_tautonym bool Whether the word is a tautonym (1) or not (0)
The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:
Label Data type Description
!total_1grams int The total number of words in the corpus
!total_volumes int The total number of volumes (individual sources) in the corpus
!total_isograms int The total number of isograms found in the corpus (before compacting)
!total_palindromes int How many of the isograms found are palindromes
!total_tautonyms int How many of the isograms found are tautonyms
The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

1.2 SQLite database format
On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:
• Compacted versions of each dataset, where identical headwords are combined into a single entry.
• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

2. Scripts
There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).

2.1 Source data
The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files; for BNC, the direct path to the *.gz file.

2.2 Data preparation
Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:
python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE
Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

2.3 Isogram extraction
After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:
python isograms.py --batch --infile=INFILE --outfile=OUTFILE
Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

2.4 Creating a SQLite3 database
The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the "create-database.sql" script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db
5. This will create a database called "isograms.db".
See section 1 for a basic description of the output data and how to work with the database.

2.5 Statistical processing
The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
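As a hedged illustration (not part of the distributed scripts), the snippet below shows how one of the tab-separated ".csv" outputs could be loaded in R using the column layout documented in section 1.1; the file name follows step 1 of section 2.4.

# Illustration only: reading a tidied isogram list with the documented columns.
cols <- c("isogramy", "length", "word", "source_pos", "count",
          "vol_count", "count_per_million", "vol_count_as_percent",
          "is_palindrome", "is_tautonym")

isograms <- read.delim("ngrams-isograms.csv", header = FALSE,
                       sep = "\t", col.names = cols)

# Example queries: higher-order isograms and palindromic isograms.
higher_order <- subset(isograms, isogramy >= 2)
palindromes  <- subset(isograms, is_palindrome == 1)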
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in the United Kingdom was last recorded at 4.25 percent. This dataset provides actual values, historical data, forecasts, charts, statistics, economic calendar and news for the United Kingdom interest rate.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This dataset was compiled for the Regional Seabed Monitoring Plan (RSMP) baseline assessment reported in Cooper & Barry (2017).
The dataset comprises 33,198 macrofaunal samples (83% with associated data on sediment particle size composition) covering large parts of the UK continental shelf. Whilst most samples come from existing datasets, also included are 2,500 new samples collected specifically for the purpose of this study. These new samples were collected during 2014-2016 from the main English aggregate dredging regions (Humber, Anglian, Thames, Eastern English Channel and South Coast) and at four individual, isolated extraction sites where the RSMP methodology is also being adopted (e.g. Area 457, North-West dredging region; Area 392, North-West dredging region; Area 376, Bristol Channel dredging region; Goodwin Sands, English Channel). This work was funded by the aggregates industry, and carried out by contractors on their behalf. Samples were collected in accordance with a detailed protocols document which included control measures to ensure the quality of faunal and sediment sample processing. Additional samples were acquired to fill in gaps in spatial coverage and to provide a contemporary baseline for sediment composition.
Sources of existing data include both government and industry, with contributions from the marine aggregate dredging, offshore wind, oil and gas, nuclear and port and harbour sectors. Samples have been collected over a period of 48 years from 1969 to 2016, although the vast majority (96%) were acquired since 2000. Samples have been collected during every month of the year, although there is a clear peak during summer months when weather conditions are generally more favourable for fieldwork.
The DOI includes multiple files for use with the R script that accompanies the paper: Cooper, K. M. & Barry, J. A big data approach to macrofaunal baseline assessment, monitoring and sustainable exploitation of the seabed. Scientific Reports 7, doi: 10.1038/s41598-017-11377-9 (2017). Files include:
*At the request of data owners, macrofaunal abundance and sediment particle size data have been redacted from 13 of the 777 surveys (1.7%) in the dataset. Note that metadata and derived variables are still included. Surveys with redacted data include:
SurveyName
Cefas will only make redacted data available where the data requester can provide written permission from the relevant data owner(s) - see below. Note that it is the responsibility of the data requester to seek permission from the relevant data owners.
Data owners for the redacted surveys listed above are:
Description of the C5922DATASET13022017.csv / C5922DATASET13022017REDACTED.csv (raw data)
A variety of gear types have been used for sample collection, including grabs (0.1m2 Hamon, 0.2m2 Hamon, 0.1m2 Day, 0.1m2 Van Veen and 0.1m2 Smith-McIntyre) and cores. Of these various devices, 93% of samples were acquired using either a 0.1m2 Hamon grab or a 0.1m2 Day grab. Sieve sizes used in sample processing include 1mm and 0.5mm, reflecting the conventional preference for 1mm offshore and 0.5mm inshore (see Figure 2). Of the samples collected using either a 0.1m2 Hamon grab or a 0.1m2 Day grab, 88% were processed using a 1mm sieve.
Taxon names were standardised according to the WoRMS (World Register of Marine Species) list using the Taxon Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial 13,449 taxon names, only 4,248 remained after correction. The output from this tool also provides taxonomic aggregation information, allowing data to be analysed at different taxonomic levels - from species to phyla. The final dataset comprises a single-sheet comma-separated values (.csv) file. Colonials accounted for less than 20% of the total number of taxa and, where present, were given a value of 1 in the dataset. This component of the fauna was missing from 325 out of the 777 surveys, reflecting either a true absence, or simply that colonial taxa were ignored by the analyst. Sediment particle size data were provided as percentage weight by sieve mesh size, with the dataset including 99 different sieve sizes. Sediment samples have been processed using sieve techniques, and a combination of sieve and laser diffraction techniques. Key metadata fields include: Sample coordinates (Latitude & Longitude), Survey Name, Gear, Date, Grab Sample Volume (litres) and Water Depth (m). A number of additional explanatory variables are also provided (salinity, temperature, chlorophyll a, Suspended particulate matter, Water depth, Wave Orbital Velocity, Average Current, Bed Stress). In total, the dataset dimensions are 33,198 rows (samples) x 13,588 columns (variables/factors), yielding a matrix of 451,094,424 individual data values.
These historical datasets were created by the ESRC funded project, ‘Victims' access to justice through English criminal courts, 1675 to the present’, carried out between 2018 and 2022. This interdisciplinary project examined public access to justice in England over more than three centuries - from the 1670s to the present. Bringing together leading criminologists and crime historians, it assembled and analysed data on over 235,000 victims involved in trials at the Old Bailey in London in order to understand the rights of, and resources and services available to, victims in the past, present and future. It constructed a new evidence base to establish who these victims were, how they came to be complainants, prosecutors or witnesses, and how they made use of available legal and financial resources. The project director was Pamela Cox and the other co-investigators were Barry Godfrey, Ruth Lamont, Robert Shoemaker, Heather Shore, and Sandra Walklate. Lucy Williams and Elisa Impara were research officers. The results of the data analysis were reported in a monograph: Pamela Cox, Robert Shoemaker and Heather Shore, Victims and Criminal Justice: A History (Oxford University Press, 2023). An additional contemporary dataset was created for the project using aggregated data on crime victims from the Crime Survey for England and Wales 1982-2017. It may be accessed via the UK Data Archive.
There are six datasets in this deposit. Two contain information pertaining to all victims of crimes prosecuted at the Old Bailey between 1674 and 1913, derived from the Old Bailey Proceedings Online (all fields and clean fields); two contain information on victims manually extracted from the Times newspaper between 1910-25 and 1960-75 (all fields and clean fields); one contains data on a sample of Trial Participants at the Old Bailey between 1805 and 1900, collected manually; and one contains judicial statistics extracted from the Parliamentary Papers from 1805-1879.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LScDC (Leicester Scientific Dictionary-Core Dictionary)
April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

[Version 3] The third version of LScDC (Leicester Scientific Dictionary-Core) is formed using the updated LScD (Leicester Scientific Dictionary) - Version 3*. All steps applied to build the new version of the core dictionary are the same as in Version 2** and can be found in the description of Version 2 below. We did not repeat the explanation. The files provided with this description are also the same as described for LScDC Version 2. The numbers of words in the 3rd versions of LScD and LScDC are summarized below.
LScD (v3): 972,060 words
LScDC (v3): 103,998 words
* Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v3
** Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v2

[Version 2] Getting Started
This file describes a sorted and cleaned list of words from LScD (Leicester Scientific Dictionary), explains the steps for sub-setting the LScD, and gives basic statistics of words in the LSC (Leicester Scientific Corpus), to be found in [1, 2]. The LScDC (Leicester Scientific Dictionary-Core) is a list of words ordered by the number of documents containing the words, and is available in the published CSV file. There are 104,223 unique words (lemmas) in the LScDC. This dictionary is created to be used in future work on the quantification of the sense of research texts. The objective of sub-setting the LScD is to discard words which appear too rarely in the corpus. In text mining algorithms, the use of an enormous number of text data brings challenges to the performance and accuracy of data mining applications. The performance and accuracy of models depend heavily on the type of words (such as stop words and content words) and the number of words in the corpus. Rare occurrence of words in a collection is not useful in discriminating texts in large corpora, as rare words are likely to be non-informative signals (or noise) and redundant in the collection of texts. The selection of relevant words also holds out the possibility of more effective and faster operation of text mining algorithms. To build the LScDC, we decided the following process on LScD: removing words that appear in no more than 10 documents (
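The sub-setting rule described above (discarding words that appear in no more than 10 documents) can be illustrated with a minimal R sketch; the data frame and column names here are placeholders, not the actual LScD file format.

# Toy illustration of the LScDC criterion: keep words appearing in more than
# 10 documents. Column names and values are placeholders.
lscd <- data.frame(
  word      = c("model", "corpus", "rarewordx"),
  doc_count = c(15234, 8921, 4)
)

lscdc <- subset(lscd, doc_count > 10)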
This is a full description of the quality control procedure undertaken and the derived files produced by the MRC-IEU associated with the full UK Biobank (version 3, March 2018) genetic data. This dataset supersedes the earlier version available at DOI: 10.5523/bris.3074krb6t2frj29yh2b03x3wxj. Complete download (zip, 176.9 KiB).
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Files for use with the R script accompanying the paper Cooper et al. (2018). Note that this script also uses files from https://doi.org/10.14466/CefasDataHub.34 (details provided in the script). Cooper, K.M., Bolam, S.G., Downie, A., Callaway, A., Barry, J. (2018). Biological-based habitat classification approaches promote cost-efficient monitoring: an example using seabed assemblages. Journal of Applied Ecology. Files include: R SCRIPT FINAL.R (R script)
C5922DATASETFAM13022017REDACTED.csv (see below for description)
UKSeaMap2016_SedimentsDissClip.shp (UK SeaMap data clipped to study area. These data are available from http://jncc.defra.gov.uk/ukseamap under an Open Government Licence)
StudyArea.shp (polygon for study area)
FaunalCluster.tif (faunal cluster habitat map in raster format)
PhysicalCluster.tif (physical cluster habitat map in raster format)
FaunalClusterClip.tif (faunal cluster habitat map, clipped to study area, in raster format)
PhysicalClusterClip.tif (physical cluster habitat map, clipped to study area, in raster format)
Description of C5922DATASETFAM13022017REDACTED.csv
This file is based on the RSMP dataset (see https://www.cefas.co.uk/cefas-data-hub/dois/rsmp-baseline-dataset/), but with macrofaunal data output at the level of family or above. A variety of gear types have been used for sample collection, including grabs (0.1m2 Hamon, 0.2m2 Hamon, 0.1m2 Day, 0.1m2 Van Veen and 0.1m2 Smith-McIntyre) and cores. Of these various devices, 93% of samples were acquired using either a 0.1m2 Hamon grab or a 0.1m2 Day grab. Sieve sizes used in sample processing include 1mm and 0.5mm, reflecting the conventional preference for 1mm offshore and 0.5mm inshore (see Figure 2). Of the samples collected using either a 0.1m2 Hamon grab or a 0.1m2 Day grab, 88% were processed using a 1mm sieve. Taxon names were standardised according to the WoRMS (World Register of Marine Species) list using the Taxon Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial 13,449 taxon names, only 774 remained after correction and aggregation to family level. The final dataset comprises a single-sheet comma-separated values (.csv) file. Colonials accounted for less than 20% of the total number of taxa and, where present, were given a value of 1 in the dataset. This component of the fauna was missing from 325 out of the 777 surveys, reflecting either a true absence, or simply that colonial taxa were ignored by the analyst. Sediment particle size data were provided as percentage weight by sieve mesh size, with the dataset including 99 different sieve sizes. Sediment samples have been processed using sieve techniques, and a combination of sieve and laser diffraction techniques. Key metadata fields include: Sample coordinates (Latitude & Longitude), Survey Name, Gear, Date, Grab Sample Volume (litres) and Water Depth (m). A number of additional explanatory variables are also provided (salinity, temperature, chlorophyll a, Suspended particulate matter, Water depth, Wave Orbital Velocity, Average Current, Bed Stress). In total, the dataset dimensions are 33,198 rows (samples) x 900 columns (variables/factors), yielding a matrix of 29,878,200 individual data values.
The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement. The main objectives of ELSA are to:
Further information may be found on the ELSA project website (https://www.elsa-project.ac.uk/) or the NatCen Social Research ELSA web pages.
Wave 11 data has been deposited - May 2025
For the 45th edition (May 2025), ELSA Wave 11 core and pension grid data and documentation were deposited. Users should note that this dataset version does not contain the survey weights. A version with the survey weights, along with IFS and financial derived datasets, will be deposited in due course. In the meantime, more information about the data collection, or the data collected during this wave of ELSA, can be found in the Wave 11 Technical Report or the User Guide.
Health conditions research with ELSA - June 2021
The ELSA Data team have found some issues with historical data measuring health conditions. If you are intending to do any analysis looking at the following health conditions, please read the ELSA User Guide or, if you still have questions, contact elsadata@natcen.ac.uk for advice on how you should approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; Congestive Heart Failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's Disease; emotional, nervous or psychiatric problems; Alzheimer's Disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).
For information on obtaining data from ELSA that are not held at the UKDS, see the ELSA Genetic data access and Accessing ELSA data webpages.
Wave 10 Health data
Users should note that in Wave 10 the health section of the ELSA questionnaire was revised and all respondents were asked anew about their health conditions, rather than following the prior approach of asking those who had taken part in past waves to confirm previously recorded conditions. For this reason, the health conditions feed-forward data were not archived for Wave 10, unlike in previous waves.
Harmonized dataset:
Users of the Harmonized dataset who prefer to use the Stata version will need access to Stata MP software, as the version G3 file contains 11,779 variables (the limit for the standard Stata 'Intercooled' version is 2,047).
ELSA COVID-19 study:
A separate ad-hoc study conducted with ELSA respondents, measuring the socio-economic effects and psychological impact of the lockdown on the aged 50+ population of England, is also available under SN 8688, English Longitudinal Study of Ageing COVID-19 Study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically