Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names.
Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; to query disease diagnoses for one or more individuals and estimate disease frequency relative to a reference variable; and to retrieve genetic metadata.
Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file will rapidly enable UKB users to undertake their research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This repository stores synthetic datasets derived from the database of the UK Biobank (UKB) cohort.
The datasets were generated for illustrative purposes, in particular for reproducing specific analyses on the health risks associated with long-term exposure to air pollution using the UKB cohort. The code used to create the synthetic datasets is available and documented in a related GitHub repo, with details provided in the section below. These datasets can be freely used for code testing and for illustrating other examples of analyses on the UKB cohort.
Note: while the synthetic versions of the datasets resemble the real ones in several aspects, users should be aware that these data are fake and must not be used for testing and making inferences on specific research hypotheses. Even more importantly, these data cannot be considered a reliable description of the original UKB data, and they must not be presented as such.
The original datasets are described in the article by Vanoli et al. in Epidemiology (2024) (DOI: 10.1097/EDE.0000000000001796, freely available), which also provides information about the data sources.
The work was supported by the Medical Research Council-UK (Grant ID: MR/Y003330/1).
The series of synthetic datasets (stored in two versions with csv and RDS formats) are the following:
In addition, this repository provides these additional files:
The datasets resemble the real data used in the analysis, and they were generated using the R package synthpop (www.synthpop.org.uk). The generation process involves two steps, namely the synthesis of the main data (cohort info, baseline variables, annual PM2.5 exposure) and then the sampling of death events. The R scripts for performing the data synthesis are provided in the GitHub repo (subfolder Rcode/synthcode).
The first part merges all the data including the annual PM2.5 levels in a single wide-format dataset (with a row for each subject), generates a synthetic version, adds fake IDs, and then extracts (and reshapes) the single datasets. In the second part, a Cox proportional hazard model is fitted on the original data to estimate risks associated with various predictors (including the main exposure represented by PM2.5), and then these relationships are used to simulate death events in each year. Details on the modelling aspects are provided in the article.
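The second step can be illustrated with a minimal Python sketch. This is not the code from the GitHub repo (which is written in R), and it assumes a simplified discrete-time version of the idea: each subject's probability of dying in a given year is derived from a baseline hazard scaled by exp(beta * PM2.5). The function name, the coefficient `beta` and the baseline hazard values are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_death_year(exposures, beta, baseline_hazard, rng):
    """Simulate the follow-up year in which one subject dies.

    exposures:       annual PM2.5 values for the subject, one per year
    beta:            hypothetical log-hazard ratio per unit of PM2.5
    baseline_hazard: hypothetical annual baseline hazards, one per year
    rng:             a random.Random instance (seeded for reproducibility)

    Returns the 0-based year of death, or None if the subject survives
    the whole follow-up.
    """
    for year, (pm25, h0) in enumerate(zip(exposures, baseline_hazard)):
        # Proportional hazards: the exposure scales the baseline hazard.
        hazard = h0 * math.exp(beta * pm25)
        # Convert the annual hazard into a probability of death that year.
        p_death = 1.0 - math.exp(-hazard)
        if rng.random() < p_death:
            return year
    return None


rng = random.Random(42)
# Five years of follow-up with made-up exposure and hazard values.
outcome = simulate_death_year([10.2, 9.8, 9.5, 9.1, 8.9], beta=0.01,
                              baseline_hazard=[0.005] * 5, rng=rng)
```

In the actual workflow the coefficients would come from the Cox model fitted on the original data, as described in the article.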
This process guarantees that the synthetic data do not hold specific information about the original records, thus preserving confidentiality. At the same time, the multivariate distribution and correlation across variables as well as the mortality risks resemble those of the original data, so the results of descriptive and inferential analyses are similar to those in the original assessments. However, as noted above, the data are used only for illustrative purposes, and they must not be used to test other research hypotheses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
For use with UK Biobank data. v2: Change to scoring for AUDIT questionnaire. v3: Change to coding for exercise and cannabis use to accompany revised paper.
This dataset contains time series observations of surface-atmosphere exchanges of net ecosystem carbon dioxide exchange (NEE), sensible heat (H) and latent heat (LE), and momentum (τ) measured at a lowland valley fen located on Anglesey, North Wales, UK. Turbulent flux densities were monitored using the micrometeorological eddy covariance (EC) technique between 1st January 2015 and 10th October 2018. The dataset includes ancillary weather and soil physics observations, as well as variables describing atmospheric turbulence and the quality of the turbulent flux observations. This work was supported by the Natural Environment Research Council award number NE/R016429/1 as part of the UK-SCAPE programme delivering National Capability.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Investigating the habitat preference of water voles in the UK. Measuring habitat variables to determine their preferred habitat features.
This statistic shows consumers' choice of retail stores to buy toys in the United Kingdom (UK) as of October 2018, broken down by former Toys R Us customers and others. Argos was revealed to be the leading retail store for both former Toys R Us customers and the rest of the nation with ** and ** percent of respondents respectively stating that they shopped at the British catalogue retailer. With ** percent of former Toys R Us customers and ** percent of the rest of the nation shopping there, Wilko was respondents' second choice of retail goods store.
The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement. The main objectives of ELSA are to:
Further information may be found on the ELSA project website (https://www.elsa-project.ac.uk/) or the Natcen Social Research: ELSA web pages.
Wave 11 data has been deposited - May 2025
For the 45th edition (May 2025) ELSA Wave 11 core and pension grid data and documentation were deposited. Users should note this dataset version does not contain the survey weights. A version with the survey weights along with IFS and financial derived datasets will be deposited in due course. In the meantime, more information about the data collection or the data collected during this wave of ELSA can be found in the Wave 11 Technical Report or the User Guide.
Health conditions research with ELSA - June 2021
The ELSA Data team have found some issues with historical data measuring health conditions. If you are intending to do any analysis looking at the following health conditions, then please read the ELSA User Guide or if you still have questions contact elsadata@natcen.ac.uk for advice on how you should approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; Congestive Heart Failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's Disease; emotional, nervous or psychiatric problems; Alzheimer's Disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).
For information on obtaining data from ELSA that are not held at the UKDS, see the ELSA Genetic data access and Accessing ELSA data webpages.
Wave 10 Health data
Users should note that in Wave 10 the health section of the ELSA questionnaire was revised and all respondents were asked anew about their health conditions, rather than following the prior approach of asking those who had taken part in past waves to confirm previously recorded conditions. For this reason, the health conditions feed-forward data were not archived for Wave 10, as was done in previous waves.
Harmonized dataset:
Users of the Harmonized dataset who prefer to use the Stata version will need access to Stata MP software, as the version G3 file contains 11,779 variables (the limit for the standard Stata 'Intercooled' version is 2,047).
ELSA COVID-19 study:
A separate ad-hoc study conducted with ELSA respondents, measuring the socio-economic effects/psychological impact of the lockdown on the aged 50+ population of England, is also available under SN 8688, English Longitudinal Study of Ageing COVID-19 Study.
http://www.nationalarchives.gov.uk/doc/non-commercial-government-licence/version/2/
This is version v3.4.0.2023f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data.
This update (v3.4.0.2023f) to HadISD corrects a long-standing bug which was discovered in autumn 2023 whereby the neighbour checks (and associated [un]flagging for some other tests) were not being implemented. For more details see the posts on the HadISD blog: https://hadisd.blogspot.com/2023/10/bug-in-buddy-checks.html & https://hadisd.blogspot.com/2024/01/hadisd-v3402023f-future-look.html
The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files provide a station listing with IDs, names and location information.
The data are provided as one NetCDF file per station. Station data files in the station_data folder are named in the format "station_code"_HadISD_HadOBS_19310101-20240101_v3.4.1.2023f.nc. The station codes can be found under the docs tab. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.
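The file naming pattern above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the function name is hypothetical, the date range and version strings are copied from the pattern quoted in this description, and the station code in the usage line is a placeholder that should be replaced with a code from the station codes file.

```python
def hadisd_station_filename(station_code,
                            date_range="19310101-20240101",
                            version="v3.4.1.2023f"):
    """Build the expected NetCDF file name for one station.

    The defaults reproduce the pattern quoted in the dataset description;
    they are assumptions for other releases of the dataset.
    """
    return f"{station_code}_HadISD_HadOBS_{date_range}_{version}.nc"


# Placeholder station code, used only to show the resulting shape.
example_name = hadisd_station_filename("010010-99999")
```

Keeping the date range and version as parameters means the same helper can be reused if a later release changes either part of the name.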
To keep informed about updates, news and announcements follow the HadOBS team on twitter @metofficeHadOBS.
For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/
References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):
Dunn, R. J. H., (2019), HadISD version 3: monthly updates, Hadley Centre Technical Note.
Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1
For a homogeneity assessment of HadISD, please see the following reference:
Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.
https://www.icpsr.umich.edu/web/ICPSR/studies/34807/terms
The Ithaka S+R, Jisc, RLUK UK Survey of Academics 2012 examined the attitudes and behaviors of academics at higher education institutions across the United Kingdom. Respondents were asked about resource discovery and current awareness, library collections and content access, the print to electronic format transition, academic research methods and practices, undergraduate instruction, publishing and research dissemination, the role and value of the academic library, and the role of learned society. Demographic variables include age, gender, academic field, number of years of employment at the respondent's current college or university, and number of years working in the respondent's current field.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data set and R code used in the dissertation paper "Variation in wild wound incidence across the UK Drosophila genus". Contents: complete Drosophila collection data set; complete R code for statistical analysis; CSV files made from the main data set, used for statistical analysis, in the order listed in the R code.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Employment rate of parents living with dependent children as a couple or lone parent by age of the youngest child in the UK.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We provide a large set of binned collision strengths and effective collision strengths for ions of the
Li-, Be-, B-, C-, N-, O-, Ne-, Na-, and Mg-like sequences.
They were calculated over a long period of time by the
UK Atomic Process for Astrophysical Plasma (APAP) network,
coordinated by the late Nigel Badnell, with funding from PPARC/STFC.
AUTOSTRUCTURE and a suite of R-matrix codes, included in the package, were used.
For several sequences, we have found problems in the published effective collision strengths, so the present values replace the
published ones. The present data are fundamental for the modelling of laboratory and astrophysical plasma.
The binned collision strengths are provided to model plasma where electrons are non-Maxwellian.
Some details are provided in Del Zanna et al., 2025 Atoms, in a series of README files, and in the publications listed in the README files.
Giulio Del Zanna 28-Feb-2025
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Summary results from log-log linear regression models.
https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to healthy-r-us.co.uk (Domain). Get insights into ownership history and changes over time.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
R code and research dataset of medieval coins recorded by the Portable Antiquities Scheme in England and Wales (https://finds.org.uk/) used in the article:
Oksanen, Eljas and Brookes, Stuart (2025). 'The afterlife of Roman roads in England: insights from the fifteenth-century Gough Map of Great Britain', Journal of Archaeological Science.
https://doi.org/10.1016/j.jas.2025.106227
The coin finds data dump was obtained from the PAS website (https://finds.org.uk/) on 28.03.2025 under a CC-BY licence and was filtered to contain only medieval coin findspots that have coordinate values. The R code for analysis is included and was developed by Eljas Oksanen.
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
MIT License https://opensource.org/licenses/MIT
Occurrence data for fossil fishes in British record and associated R code. From: Lloyd, G. T., & Friedman, M. (2013). A survey of palaeontological sampling biases in fishes based on the Phanerozoic record of Great Britain. Palaeogeography, Palaeoclimatology, Palaeoecology, 372, 5-17.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Pearson correlations between UK Biobank tests and age, general tests, and reference tests (n = 154–160).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Pearson correlations and age-adjusted Pearson correlations between general cognitive ability created using 11 reference tests and the UK Biobank tests (n = 151–160).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Evidence about the relationship between lighting and crime is mixed. Although a review of evidence found that improved road / street lighting was associated with reductions in crime, these reductions occurred in daylight as well as after dark, suggesting any effect was not due only to changes in visual conditions. One limitation of previous studies is that crime data are reported in aggregate and thus previous analyses were required to make simplifications concerning types of crimes or locations. We will overcome that by working with a UK police force to access records of individual crimes. We will use these data to determine whether the risk of crime at a specific time of day is greater after dark than during daylight. If no difference is found, this would suggest improvements to visual conditions after dark through lighting would have no effect. If however the risk of crime occurring after dark was greater than during daylight, quantifying this effect would provide a measure to assess the potential effectiveness of lighting in reducing crime risk after dark. We will use a case and control approach to analyse ten years of crime data. We will compare counts of crimes in ‘case’ hours, that are in daylight and darkness at different times of the year, and ‘control’ hours, that are in daylight throughout the year. From these counts we will calculate odds ratios as a measure of the effect of darkness on risk of crime, using these to answer three questions: 1) Is the risk of overall crime occurring greater after dark than during daylight? 2) Does the risk of crime occurring after dark vary depending on the category of crime? 3) Does the risk of crime occurring after dark vary depending on the geographical area?
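The odds ratio calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's analysis code: the function name and the counts are hypothetical, the control-hour counts are assumed to be taken over the same weeks of the year in which the case hour is dark or in daylight, and the confidence interval uses the standard Wald approximation on the log odds ratio.

```python
import math


def darkness_odds_ratio(case_dark, case_day, control_dark, control_day,
                        z=1.96):
    """Odds ratio for crime after dark relative to daylight.

    case_dark / case_day:       crime counts in the case hour during the
                                weeks when that hour is dark / in daylight
    control_dark / control_day: crime counts in the control hour (always in
                                daylight) during those same two sets of weeks
    z:                          normal quantile for the interval (1.96 ~ 95%)

    Returns (odds_ratio, lower_bound, upper_bound).
    """
    odds_ratio = (case_dark / case_day) / (control_dark / control_day)
    # Wald standard error of the log odds ratio for a 2x2 table of counts.
    se_log_or = math.sqrt(1 / case_dark + 1 / case_day
                          + 1 / control_dark + 1 / control_day)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, used only to show how the function is called.
ratio, lower, upper = darkness_odds_ratio(120, 100, 200, 200)
```

An odds ratio above 1 with an interval that excludes 1 would suggest a greater risk of crime after dark; repeating the calculation per crime category or per geographical area addresses the second and third questions.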