92 datasets found
  1. ukbtools: An R package to manage and query UK Biobank data

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Ken B. Hanscombe; Jonathan R. I. Coleman; Matthew Traylor; Cathryn M. Lewis (2023). ukbtools: An R package to manage and query UK Biobank data [Dataset]. http://doi.org/10.1371/journal.pone.0214311
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ken B. Hanscombe; Jonathan R. I. Coleman; Matthew Traylor; Cathryn M. Lewis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names.

    Results

    ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; query of disease diagnoses for one or more individuals, and estimating disease frequency relative to a reference variable; and to retrieve genetic metadata.

    Conclusion

    Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research.

  2. European LD Reference from UK Biobank

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Chen, Tony (2023). European LD Reference from UK Biobank [Dataset]. http://doi.org/10.7910/DVN/FDAROV
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chen, Tony
    Description

    LD blocks based on 20,000 European individuals from the UK Biobank (split by chromosome), with about 1.5 million SNPs based on HapMap3 and MEGA chips

  3. Results for 2,230 UK Biobank binary and continuous traits

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1 more
    Updated Jan 28, 2021
    Cite
    Shiyang Ma; Chen Wang; Iuliana Ionita-Laza (2021). Results for 2,230 UK Biobank binary and continuous traits [Dataset]. http://doi.org/10.5281/zenodo.4397932
    Dataset updated
    Jan 28, 2021
    Authors
    Shiyang Ma; Chen Wang; Iuliana Ionita-Laza
    Description

    Results for 2,230 UK Biobank binary and continuous traits. We applied the gene-based tests (Gene1D, Gene3D, GeneScan1D and GeneScan3D) to 1,403 UK Biobank binary phecodes and 827 continuous phenotypes (797 continuous traits + 30 biomarkers) using GWAS summary statistics on 28 million imputed variants. The results are in three zipped folders: 'GeneScan3D_UKBB_1403binary_results.zip', 'GeneScan3D_UKBB_797continuous_results.zip' and 'GeneScan3D_UKBB_30biomarkers_results.zip'. A list of all 2,230 binary and continuous phenotypes is available in the Excel file 'UKBB_phenotype_description.xlsx'.

    Reference: Ma, S., Dalgleish, J. L., Lee, J., Wang, C., Liu, L., Gill, R., Buxbaum, J. D., Chung, W., Aschard, H., Silverman, E. K., Cho, M. H., He, Z. and Ionita-Laza, I. "Improved gene-based testing by integrating long-range chromatin interactions and knockoff statistics", 2021

  4. UK Biobank Genetic Data: MRC-IEU Quality Control, Version 1 [Updated version...

    • search.datacite.org
    Updated 2017
    + more versions
    Cite
    R Mitchell; G Hemani; T Dudding; L Paternoster (2017). UK Biobank Genetic Data: MRC-IEU Quality Control, Version 1 [Updated version at DOI: 10.5523/bris.1ovaau5sxunp2cv8rcy88688v] [Dataset]. http://doi.org/10.5523/bris.3074krb6t2frj29yh2b03x3wxj
    Dataset updated
    2017
    Dataset provided by
    DataCite (https://www.datacite.org/)
    University of Bristol
    Authors
    R Mitchell; G Hemani; T Dudding; L Paternoster
    Description

    This dataset has been superseded. Updated version available at DOI: 10.5523/bris.1ovaau5sxunp2cv8rcy88688v

    This is a full description of the quality control procedure undertaken and the derived files produced by the MRC-IEU associated with the full UK Biobank (July 2017) genetic data.

  5. Data from: Fine-mapping gene-based associations via knockoff analysis of...

    • explore.openaire.eu
    Updated May 25, 2022
    Cite
    Shiyang Ma; Chen Wang; Iuliana Ionita-Laza (2022). Fine-mapping gene-based associations via knockoff analysis of biobank-scale data with applications to UK Biobank [Dataset]. http://doi.org/10.5281/zenodo.6582345
    Dataset updated
    May 25, 2022
    Authors
    Shiyang Ma; Chen Wang; Iuliana Ionita-Laza
    Description

    Results of the BIGKnock analyses for the manuscript "Fine-mapping gene-based associations via knockoff analysis of biobank-scale data with applications to UK Biobank".

  6. Pleiotropy of UK Biobank metabolites

    • search.dataone.org
    • zenodo.org
    • +1 more
    Updated May 21, 2025
    Cite
    Courtney Smith; Nasa Sinnott-Armstrong; Anna Cichonska; Heli Julkunen; Eric Fauman; Peter Wurtz; Jonathan Pritchard (2025). Pleiotropy of UK Biobank metabolites [Dataset]. http://doi.org/10.5061/dryad.79cnp5hxs
    Dataset updated
    May 21, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Courtney Smith; Nasa Sinnott-Armstrong; Anna Cichonska; Heli Julkunen; Eric Fauman; Peter Wurtz; Jonathan Pritchard
    Time period covered
    Jan 1, 2022
    Description

    Pleiotropy and genetic correlation are widespread features in GWAS, but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understa...

  7. Data from: Patterns of recent natural selection on genetic loci associated...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2 more
    Updated Apr 21, 2025
    Cite
    Audrey M. Arner; Kathleen E. Grogan; Mark Grabowski; Hugo Reyes-Centeno; George H. Perry (2025). Patterns of recent natural selection on genetic loci associated with sexually differentiated human body size and shape phenotypes [Dataset]. http://doi.org/10.5061/dryad.nzs7h44rc
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Audrey M. Arner; Kathleen E. Grogan; Mark Grabowski; Hugo Reyes-Centeno; George H. Perry
    Time period covered
    Jan 1, 2021
    Description

    Levels of sex differences for human body size and shape phenotypes are hypothesized to have adaptively reduced following the agricultural transition as part of an evolutionary response to relatively more equal divisions of labor and new technology adoption. In this study, we tested this hypothesis by studying genetic variants associated with five sexually differentiated human phenotypes: height, body mass, hip circumference, body fat percentage, and waist circumference. We first analyzed genome-wide association (GWAS) results for UK Biobank individuals (~197,000 females and ~167,000 males) to identify a total of 119,023 single nucleotide polymorphisms (SNPs) significantly associated with at least one of the studied phenotypes in females, males, or both sexes (P < 5 × 10−8). From these loci we then identified 3,016 SNPs (2.5%) with significant differences in the strength of association between the female- and male-specific GWAS results at a low false-discovery rate (FDR < 0.001). Genes w...

  8. Phenome-wide association studies across large population cohorts support...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 8, 2021
    Cite
    Vangjeli, Ciara (2021). Phenome-wide association studies across large population cohorts support drug target validation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2671776
    Dataset updated
    Mar 8, 2021
    Dataset provided by
    Vangjeli, Ciara
    Franklin, Chris S.
    Weale, Michael E.
    Spencer, Chris C. A.
    Donnelly, Peter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary-level data generated by Genomics plc as presented in: Diogo, D. et al. Phenome-wide association studies across large population cohorts support drug target validation. Nat. Commun. 9, 4285 (2018). https://doi.org/10.1038/s41467-018-06540-3

    If you have any questions or comments regarding these files, please contact Genomics plc at research@genomicsplc.com

    NOTES

    These analyses were carried out using the interim UK Biobank imputation data release. Analyses were restricted to a subset of "white-British" unrelated samples with a maximum sample size of 112,337 individuals.

    Case control phenotypes were defined based on categorical datafields as listed in the accompanying file. Quantitative phenotypes were either rank-normalised before analysis, or beta/se values were standardised after analysis using the variance of the phenotype. The normalisation value is indicated in the accompanying file.

    All analyses included Age at assessment, sex, genotyping chip, and 10 principal components as covariates.

    We used plink1.9 linear/logistic regression as appropriate. For chromosome X variants males were treated as having 0 or 2 alternative alleles.

    The results are not adjusted for genomic control.

    DATA FILE CONTENT DESCRIPTION

    CHR - Chromosome
    SNP - Variant rsID
    ALT - Alternative allele (effect allele)
    REF - Reference allele (non-effect allele)
    BP - Position in base pairs (b37, 1-based)
    NMISS - Number of samples with non-missing genotypes
    BETA - Effect size (log odds ratio or standardised effect size)
    SE - Standard error
    P - P-value
    F_MISS - Genotype missing rate
    P_hwe - Hardy-Weinberg p-value
    MAF - ALT allele frequency
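
    Because the notes above state that the results are not adjusted for genomic control, a reader may want to estimate the genomic inflation factor before using the statistics. The following is a minimal sketch of one common way to do that, not the depositors' method. It assumes the file is tab-delimited with a header row using the column names listed above (the delimiter is an assumption), and the toy rows are made-up values for illustration only.

```python
import csv
import io
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(rows):
    """Return lambda_GC = median((BETA / SE)^2) / median of chi-square(1)."""
    chi2 = [(float(r["BETA"]) / float(r["SE"])) ** 2 for r in rows]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical toy data: three made-up variants, not values from the dataset.
toy = (
    "CHR\tSNP\tALT\tREF\tBP\tNMISS\tBETA\tSE\tP\tF_MISS\tP_hwe\tMAF\n"
    "1\trs_a\tA\tG\t1000\t112000\t0.02\t0.02\t0.32\t0.001\t0.5\t0.20\n"
    "1\trs_b\tC\tT\t2000\t112000\t0.01\t0.02\t0.62\t0.002\t0.4\t0.10\n"
    "2\trs_c\tG\tA\t3000\t112000\t-0.03\t0.02\t0.13\t0.001\t0.7\t0.35\n"
)

rows = list(csv.DictReader(io.StringIO(toy), delimiter="\t"))
lam = genomic_inflation(rows)
print(f"lambda_GC = {lam:.3f}")
```

    With a real file, the same function could be fed rows from csv.DictReader opened on the downloaded file. Three variants are far too few for a meaningful estimate; the toy input only shows the mechanics.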

  9. Sex-stratified linear mixed models: Clinical binary traits (Item 1/3)

    • find.data.gov.scot
    • dtechtive.com
    tsv, txt, zip
    Updated May 25, 2021
    + more versions
    Cite
    University of Edinburgh. The Roslin Institute (2021). Sex-stratified linear mixed models: Clinical binary traits (Item 1/3) [Dataset]. http://doi.org/10.7488/ds/3046
    Available download formats: zip (5264.384 MB), txt (0.0006 MB), zip (3287.04 MB), zip (6538.24 MB), zip (6235.136 MB), zip (3280.896 MB), tsv (0.0388 MB), zip (1967.104 MB), zip (21012.48 MB), zip (11755.52 MB), txt (0.0166 MB), zip (6224.896 MB), zip (8525.824 MB), zip (12134.4 MB), zip (2945.024 MB), zip (14776.32 MB)
    Dataset updated
    May 25, 2021
    Dataset provided by
    University of Edinburgh. The Roslin Institute
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sex-stratified GWAS can help shed light on sexual differences in genetic architecture. In Bernabeu et al (2021) we fit sex-stratified linear mixed models (using DISSECT) across a total of 530 phenotypes to assess the effects of sex on genetic effect estimates, and compared estimates between males and females in a search for genetic variants that presented significant differences in association to the traits considered. Here, the summary statistics of said efforts, pertaining to clinical binary traits, are included (note: does not include UK Biobank cancer traits - these are found in DataShare item pertaining to non-clinical binary traits). Each file contains the results for a single clinical binary trait, as stated in the file name, using its corresponding UK Biobank trait code. Trait descriptions, including their respective UK Biobank codes, are stated in the 'trait_description.tsv' file. For each trait (each .gz file), GWAS summary statistics obtained for over 4 million genetic variants across the genome (both autosomal, and X chromosome, MAF 10% filtered) and circa 450K individuals, as well as the results of the t-test comparing genetic effect estimates between the sexes, are included.

  10. SUPERSEDED - Summary statistics for three depression phenotypes in UK...

    • find.data.gov.scot
    • dtechtive.com
    txt
    Updated Mar 5, 2018
    Cite
    University of Edinburgh (2018). SUPERSEDED - Summary statistics for three depression phenotypes in UK Biobank [Dataset]. http://doi.org/10.7488/ds/2314
    Available download formats: txt (0.0166 MB), txt (595.8 MB), txt (0.0007 MB), txt (603.2 MB), txt (597.3 MB)
    Dataset updated
    Mar 5, 2018
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    This item has been replaced by the one which can be found at https://doi.org/10.7488/ds/2350

    Depression is a polygenic trait that causes extensive periods of disability. Previous genetic studies have identified common risk variants which have progressively increased in number with increasing sample sizes of the respective studies. Here, we conduct a genome-wide association study in 322,580 UK Biobank participants for three depression-related phenotypes: broad depression, probable major depressive disorder (MDD), and International Classification of Diseases (ICD, version 9 or 10)-coded MDD. We identify 17 independent loci that are significantly associated (P < 5 × 10−8) across the three phenotypes. The direction of effect of these loci is consistently replicated in an independent sample, with 14 loci likely representing novel findings. Gene sets are enriched in excitatory neurotransmission, mechanosensory behavior, postsynapse, neuron spine, and dendrite functions. Our findings suggest that broad depression is the most tractable UK Biobank phenotype for discovering genes and gene-sets that further our understanding of the biological pathways underlying depression.

    Note: The effect sizes and standard errors reported in this data are on the 0-1 scale. We are working on adding these values transformed on to the logistic scale and a new version of the data will be made available in due course.

  11. Data from: Associations between alcohol use and accelerated biological...

    • find.data.gov.scot
    • explore.openaire.eu
    gz, txt
    Updated Nov 24, 2020
    Cite
    University of Edinburgh. Centre for Clinical Brain Sciences (2020). Associations between alcohol use and accelerated biological ageing [Dataset]. http://doi.org/10.7488/ds/2956
    Available download formats: txt (0.0166 MB), gz (634.1 MB)
    Dataset updated
    Nov 24, 2020
    Dataset provided by
    University of Edinburgh. Centre for Clinical Brain Sciences
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data supporting the manuscript 'Associations between alcohol use and accelerated biological ageing'. Specifically: Genome Wide Association Study of brain age.

  12. Data from: Factors associated with sharing email information and mental...

    • find.data.gov.scot
    • dtechtive.com
    gz, txt
    Updated Jun 3, 2019
    Cite
    University of Edinburgh. Division of Psychiatry (2019). Factors associated with sharing email information and mental health survey participation in large population cohorts [Dataset]. http://doi.org/10.7488/ds/2554
    Available download formats: txt (0.0008 MB), gz (417.4 MB), gz (417.7 MB), txt (0.0166 MB)
    Dataset updated
    Jun 3, 2019
    Dataset provided by
    University of Edinburgh. Division of Psychiatry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    Genome-wide association study summary statistics of email contact and Mental Health Questionnaire participation in UK Biobank. Data in support of the manuscript: 'Factors associated with sharing email information and mental health survey participation in large population cohorts'.

    ABSTRACT

    BACKGROUND

    People who opt to participate in scientific studies tend to be healthier, wealthier, and more educated than the broader population. While selection bias does not always pose a problem for analysing the relationships between exposures and diseases or other outcomes, it can lead to biased effect size estimates. Biased estimates may weaken the utility of genetic findings because the goal is often to make inferences in a new sample (such as in polygenic risk score analysis).

    METHODS

    We used data from UK Biobank, Generation Scotland, and Partners Biobank and conducted phenotypic and genome-wide association analyses on two phenotypes that reflected mental health data availability: (1) whether participants were contactable by email for follow-up and (2) whether participants responded to follow-up surveys of mental health.

    RESULTS

    In UK Biobank, we identified nine genetic loci associated (P < 5 × 10−8) with email contact and 25 loci associated with mental health survey completion. Both phenotypes were positively genetically correlated with higher educational attainment and better health and negatively genetically correlated with psychological distress and schizophrenia. One SNP association replicated along with the overall direction of effect of all association results.

    CONCLUSIONS

    Recontact availability and follow-up participation can act as further genetic filters for data on mental health phenotypes.

  13. Summary statistics for "Biobank-driven genomic discovery yields new insight...

    • search.dataone.org
    • dataverse.azure.uit.no
    Updated Jul 29, 2024
    Cite
    Nielsen, Jonas B.; Willer, Cristen J. (2024). Summary statistics for "Biobank-driven genomic discovery yields new insight into atrial fibrillation biology" [Dataset]. http://doi.org/10.18710/VC2PSH
    Dataset updated
    Jul 29, 2024
    Dataset provided by
    DataverseNO
    Authors
    Nielsen, Jonas B.; Willer, Cristen J.
    Description

    To identify genetic variation underlying atrial fibrillation, the most common cardiac arrhythmia, we performed a genome-wide association study of >1,000,000 people, including 60,620 atrial fibrillation cases and 970,216 controls. We identified 142 independent risk variants at 111 loci and prioritized 151 functional candidate genes likely to be involved in atrial fibrillation. Many of the identified risk variants fall near genes where more deleterious mutations have been reported to cause serious heart defects in humans (GATA4, MYH6, NKX2-5, PITX2, TBX5), or near genes important for striated muscle function and integrity (for example, CFL2, MYH7, PKP2, RBM20, SGCG, SSPN). Pathway and functional enrichment analyses also suggested that many of the putative atrial fibrillation genes act via cardiac structural remodeling, potentially in the form of an ‘atrial cardiomyopathy’, either during fetal heart development or as a response to stress in the adult heart.

    The data format is .tbl

  14. UK BiLEVE Consortium Dataset

    • find.data.gov.scot
    • dtechtive.com
    Updated May 5, 2023
    Cite
    BREATHE (2023). UK BiLEVE Consortium Dataset [Dataset]. https://find.data.gov.scot/datasets/26430
    Dataset updated
    May 5, 2023
    Dataset provided by
    BREATHE
    Area covered
    United Kingdom
    Description

    This project aims to leverage the power of UK Biobank to detect rare genetic variants associated with lung function.

  15. Data from: Genome-wide association study of nociceptive musculoskeletal pain...

    • lifesciences.datastations.nl
    • explore.openaire.eu
    application/gzip, pdf +1
    Updated Jan 11, 2022
    Cite
    S. Li; G.J.V. Poelmans; R.L.M. van Boekel; M.J.H. Coenen (2022). Data from: Genome-wide association study of nociceptive musculoskeletal pain treatment response in UK Biobank [Dataset]. http://doi.org/10.17026/DANS-XNS-UN6C
    Available download formats: zip (25047), pdf (85206), application/gzip (242906210)
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    DANS Data Station Life Sciences
    Authors
    S. Li; G.J.V. Poelmans; R.L.M. van Boekel; M.J.H. Coenen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Drug treatment for nociceptive musculoskeletal pain (NMP) follows a three-step analgesic ladder, starting from non-steroidal anti-inflammatory drugs (NSAIDs), followed by weak or strong opioids until the pain is under control. Here, we conducted a genome-wide association study (GWAS) of a binary phenotype comparing NSAID users and opioid users as a proxy of treatment response to NSAIDs using data from the UK Biobank. We aim to find the common genetic variants associated with pain treatment response in the general population.

    Type of data uploaded in this repository

    UK Biobank is a large-scale biomedical database and research resource containing in-depth genetic and health information from half a million UK participants (https://www.ukbiobank.ac.uk/). The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. As the raw data is quite large and only available upon application to UKB, we only provide the results from our analysis, which is also described on medRxiv and is currently in revision at a scientific journal. In the dataset, you will find the association of 9,435,994 SNPs (genetic variants) with the pain treatment response (PTR) phenotype. This dataset is not suitable for opening in Excel and is best opened on a cluster computer or with specialised software.

    Subjects

    The UK Biobank is a general population cohort with over 0.5 million participants aged 40–69 recruited across the United Kingdom (UK). We derived a phenotype as a proxy for the pain treatment response to NSAIDs by using recently released primary care (general practitioners', GPs') data, which contains longitudinal structured diagnosis and prescription data. To define the PTR phenotype, we first extracted all nociceptive musculoskeletal pain (NMP) treatments and diagnoses from the GP data. NMP diagnosis was primarily selected from the chapters on musculoskeletal and connective tissue diseases and relevant symptoms or signs from other chapters in the Read codes (versions 2 and 3). See Supplementary data 1 on medRxiv for the diagnosis codes included in this study. Secondly, pain prescriptions (NSAID and opioid) were extracted from the GP data using the British national formulary (BNF), dictionary of medicines and devices (dmd), and Read code (version 2) for data extraction. An overview of the extracted medication codes is provided in Supplementary data 2 on medRxiv. Only participants with an NMP diagnosis record and a pain prescription record occurring on the same date were included for analysis to ensure that we would only include pain treatment for NMP.

    Phenotype

    Based on the information of NMP and pain prescriptions from the UK Biobank, a dichotomous score was used for the binary (case/control) PTR phenotype: NSAID users were defined as controls and opioid users as cases. Two additional quality control (QC) steps were applied. First, participants with only one treatment event were removed to safeguard the inclusion of only participants with relatively long-term treatment. Second, a chronological check was applied for the first prescription of each ladder to ensure that the treatment ladder was correctly followed, i.e., initial NSAID use was followed by weak or strong opioids. Participants that were not treated according to this order were removed.

    SNP genotyping and quality control

    Genotyping procedures have been described in detail elsewhere [PMID: 30305743]. The third-release genotyping data were used for analysis (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Participants passing quality control were included for analysis. QC steps for the samples included removal of participants with (1) inconsistent self-reported and genetically determined sex, (2) missing individual genetic data with a frequency of more than 0.1, (3) putative sex-chromosome aneuploidy. Participants were also excluded from the analysis if they were considered outliers due to missing heterozygosity, not white British ancestry based on the genotype, and had missing covariate data. Note that when we fit the linear mixed model in GCTA, it reminded us that the number of closely related participants was low. Therefore, we didn't further remove the related individuals in the sample.

    Routine QC steps for genetic markers on autosomes included removal of single nucleotide polymorphisms (SNPs) with (1) an imputation quality score less than 0.8, (2) a minor allele frequency (MAF) less than 0.005, (3) a Hardy-Weinberg equilibrium (HWE) test P-value less than 1 × 10−6, and (4) a genotyping call rate less than 0.95.

    Genome-wide association analysis

    A GWAS for binary PTR phenotype was conducted using a linear function in GCTA [38] for markers on the autosomal chromosomes, adjusting for age, sex, BMI, depression history, smoking status, drinking frequency, assessment center, genotyping array, and the first ten principal components (PCs). The following variables from the UK Biobank data set...
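
    The four marker-level QC thresholds quoted in the description above can be expressed as a simple filter. The sketch below is illustrative only and is not the authors' pipeline: the Variant class, its field names, and the toy values are hypothetical, and only the four thresholds (0.8, 0.005, 1 × 10−6, 0.95) come from the description.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-SNP QC metrics; field names are assumptions."""
    snp: str
    info: float       # imputation quality score
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test P-value
    call_rate: float  # genotyping call rate


def passes_marker_qc(v, min_info=0.8, min_maf=0.005,
                     min_hwe_p=1e-6, min_call_rate=0.95):
    """Keep a SNP only if none of the four removal criteria applies."""
    return (
        v.info >= min_info
        and v.maf >= min_maf
        and v.hwe_p >= min_hwe_p
        and v.call_rate >= min_call_rate
    )


# Made-up variants: one passes, the others each fail a single criterion.
variants = [
    Variant("rs_a", info=0.95, maf=0.20, hwe_p=0.30, call_rate=0.99),
    Variant("rs_b", info=0.70, maf=0.20, hwe_p=0.30, call_rate=0.99),
    Variant("rs_c", info=0.95, maf=0.001, hwe_p=0.30, call_rate=0.99),
    Variant("rs_d", info=0.95, maf=0.20, hwe_p=1e-9, call_rate=0.99),
    Variant("rs_e", info=0.95, maf=0.20, hwe_p=0.30, call_rate=0.90),
]

kept = [v.snp for v in variants if passes_marker_qc(v)]
print(kept)
```

    Variants exactly at a threshold are kept here because the description removes only values "less than" each cut-off.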

  16. An integrated polygenic tool substantially enhances coronary artery disease...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1 more
    Updated Jan 6, 2021
    Cite
    Fernando Riveros-Mckay; Michael E. Weale; Rachel Moore; Saskia Selzam; Eva Krapohl; Michael Sivley; William A. Tarran; Peter Sørensen; Alexander S. Lachapelle; Jonathan A. Griffiths; Ayden Saffari; Chris C. A. Spencer; Vincent Plagnol; Peter Donnelly (2021). An integrated polygenic tool substantially enhances coronary artery disease prediction [Dataset]. http://doi.org/10.5281/zenodo.4421038
    Dataset updated
    Jan 6, 2021
    Authors
    Fernando Riveros-Mckay; Michael E. Weale; Rachel Moore; Saskia Selzam; Eva Krapohl; Michael Sivley; William A. Tarran; Peter Sørensen; Alexander S. Lachapelle; Jonathan A. Griffiths; Ayden Saffari; Chris C. A. Spencer; Vincent Plagnol; Peter Donnelly
    Description

    Summary-level CAD GWAS data generated by Genomics plc as presented in: Riveros-Mckay F. et al. An integrated polygenic tool substantially enhances coronary artery disease prediction. Circulation: Genomics and Precision Medicine (in press).

    If you have any questions or comments regarding these files, please contact Genomics plc at research@genomicsplc.com

    NOTES

    These analyses were carried out using the full UK Biobank imputation data release (v3b). Analyses were restricted to a subset of UK Biobank, described as “Group I” in the published paper. Group I, “no PCE/QRISK3 available”, included 114,196 European-ancestry individuals with missing data that prevented PCE or QRISK3 calculation.

    CAD case phenotypes were defined as described in the “Phenotype definitions” section of the paper’s Supplementary Materials, using both prevalent (pre-baseline) and incident (post-baseline) events.

    All analyses included Age at assessment, sex, genotyping chip, and 10 principal components as covariates.

    We used plink2.0 logistic regression. For chromosome X variants males were treated as having 0 or 2 alternative alleles.

    The results are not adjusted for genomic control.

    DATA FILE CONTENT DESCRIPTION

    cpra - Variant ID in ‘CPRA’ format. Position reflects position in b37.
    chrom - Chromosome
    pos - Position in base pairs (b37, 1-based)
    alt - Alternative allele (effect allele)
    beta - Effect size (log odds ratio)
    standard_error - Standard error of beta
    minus_log10_p - Minus log(base 10) of P-value
    ref - Reference allele (non-effect allele)
    ncase - Number of cases
    ncontrol - Number of controls
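
    Since this file stores the effect size as a log odds ratio and the significance as minus log10 of the P-value, a reader will often want to convert both back to more familiar quantities. The following is a minimal sketch of those conversions under stated assumptions: the function name and the input values are hypothetical, and the 95% interval uses the usual normal approximation, which the description itself does not mention.

```python
import math


def summarize(beta, standard_error, minus_log10_p, z=1.96):
    """Convert one row's statistics to a P-value, odds ratio, and approximate 95% CI."""
    p_value = 10.0 ** (-minus_log10_p)            # invert minus_log10_p
    odds_ratio = math.exp(beta)                   # beta is a log odds ratio
    ci_low = math.exp(beta - z * standard_error)  # normal-approximation bounds
    ci_high = math.exp(beta + z * standard_error)
    return p_value, odds_ratio, ci_low, ci_high


# Made-up example values, not taken from the dataset.
p_value, odds_ratio, ci_low, ci_high = summarize(
    beta=0.10, standard_error=0.02, minus_log10_p=8.0
)
print(f"P = {p_value:.1e}, OR = {odds_ratio:.3f} ({ci_low:.3f} to {ci_high:.3f})")
```

    Storing minus_log10_p rather than P avoids floating-point underflow for very strong associations, so for very large values it can be preferable to compare on the log scale rather than converting back.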

  17. Table 9_Machine learning-based identification of proteomic markers in...

    • frontiersin.figshare.com
    xlsx
    Updated Jan 7, 2025
    + more versions
    Cite
    Swarnima Kollampallath Radhakrishnan; Dipanwita Nath; Dominic Russ; Laura Bravo Merodio; Priyani Lad; Folakemi Kola Daisi; Animesh Acharjee (2025). Table 9_Machine learning-based identification of proteomic markers in colorectal cancer using UK Biobank data.xlsx [Dataset]. http://doi.org/10.3389/fonc.2024.1505675.s002
    Available download formats: xlsx
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Frontiers
    Authors
    Swarnima Kollampallath Radhakrishnan; Dipanwita Nath; Dominic Russ; Laura Bravo Merodio; Priyani Lad; Folakemi Kola Daisi; Animesh Acharjee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Colorectal cancer is one of the leading causes of cancer-related mortality in the world. Incidence and mortality are predicted to rise globally during the next several decades. When detected early, colorectal cancer is treatable with surgery and medications. This creates a need for prognostic and diagnostic biomarkers. Our study integrates machine learning models and protein network analysis to identify protein biomarkers for colorectal cancer. Our methodology leverages an extensive collection of proteome profiles from both healthy individuals and individuals with colorectal cancer. To identify a potential biomarker with high predictive ability, we used three machine learning models. To enhance the interpretability of our models, we quantify each protein’s contribution to the model’s predictions using SHapley Additive exPlanations values. Three classifiers (LASSO, XGBoost, and LightGBM) were evaluated for predictive performance, with hyperparameter tuning of each model by grid search. LASSO achieved the highest AUC, 75%, in the UK Biobank dataset; the AUCs for LightGBM and XGBoost were 69.61% and 71.42%, respectively. Using SHapley Additive exPlanations values, TFF3, LCN2, and CEACAM5 were found to be key biomarkers associated with cell adhesion and inflammation. Protein quantitative trait loci analysis provided further evidence for the involvement of TFF1, CEACAM5, and SELE in colorectal cancer, with possible connections to the PI3K/Akt and MAPK signaling pathways. By offering insights into colorectal cancer diagnostics and targeted therapeutics, our findings set the stage for further biomarker validation.
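    The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes that quantity directly in plain Python. It is an illustration only: the function name and the toy labels and scores are made up, and it does not reproduce the authors' pipeline or results.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of (case, control) pairs
    in which the case scores higher; ties count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: 1 = case, 0 = control; scores are hypothetical model outputs.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ordered correctly
```

    The pairwise loop is quadratic in the number of samples, which is fine for a demonstration; a rank-based formula would be the usual choice for large cohorts.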

  18. Data from: Brain Ages Derived from Different MRI Modalities are Associated...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Apr 24, 2025
    Cite
    Andrei-Claudiu Roibu; Stanislaw Adaszewski; Torsten Schindler; Stephen M. Smith; Ana I.L. Namburete; Frederik J. Lange (2025). Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes [Dataset]. http://doi.org/10.5281/zenodo.8110876
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Andrei-Claudiu Roibu; Stanislaw Adaszewski; Torsten Schindler; Stephen M. Smith; Ana I.L. Namburete; Frederik J. Lange
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Brain ageing is a highly variable, spatially and temporally heterogeneous process, marked by numerous structural and functional changes. These can cause discrepancies between individuals’ chronological age and the apparent age of their brain, as inferred from neuroimaging data. Machine learning models, and particularly Convolutional Neural Networks (CNNs), have proven adept in capturing patterns relating to ageing-induced changes in the brain. The differences between the predicted and chronological ages, referred to as brain age deltas, have emerged as useful biomarkers for exploring those factors which promote accelerated ageing or resilience, such as pathologies or lifestyle factors. However, previous studies rely only on structural neuroimaging for predictions, overlooking potentially informative functional and microstructural changes. Here we show that multiple contrasts derived from different MRI modalities can predict brain age, each encoding bespoke brain ageing information. By using 3D CNNs and UK Biobank data, we found that 57 contrasts derived from structural, susceptibility-weighted, diffusion, and functional MRI can successfully predict brain age. For each contrast, different patterns of association with non-imaging phenotypes were found, resulting in a total of 191 unique, statistically significant associations. Furthermore, we found that ensembling data from multiple contrasts results in both higher prediction accuracies and stronger correlations to non-imaging measurements. Our results demonstrate that other 3D contrasts and modalities, which have not been considered so far for the task of brain age prediction, encode different information about the ageing brain. We envision our work as being the starting point for future investigations into the causal links underpinning the observed brain age deltas and non-imaging measurement associations. For instance, drug effects can be monitored, given that certain medications correlate with accelerated brain ageing. Furthermore, continued development of brain age models could facilitate their deployment in clinical trials for recruitment and monitoring, and in hospitals for diagnostic and screening tasks.

    Data Description

    This dataset contains the full correlation results with all nIDPs in the UK Biobank. These are presented in datasets split by sex into female and male subjects. For easier data manipulation, two smaller datasets have also been made available, containing just those correlations that pass the False Discovery Rate (FDR) threshold.

    As experiments were also conducted for ensembles using multiple contrasts, similar datasets are provided for those.

    Finally, global datasets are also provided. These are the concatenation of the associations contained in the Male and Female datasets.
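    The description above does not say which FDR procedure was used to select the reduced datasets. As an illustration of how such a threshold works, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of P-values. The choice of Benjamini-Hochberg, the function name, the alpha of 0.05, and the example P-values are all assumptions made for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per P-value: True if it passes the
    Benjamini-Hochberg FDR threshold at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_passing_rank = rank

    # Every P-value ranked at or below that k passes (step-up rule).
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            passed[idx] = True
    return passed


# Hypothetical P-values for four associations.
example = benjamini_hochberg([0.001, 0.03, 0.035, 0.5])
```

    In this example the second P-value fails its own rank threshold but is still retained because the third one passes, which is the step-up behaviour that distinguishes the procedure from a fixed per-test cutoff.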

    Paper & Code

    The original paper for this article can be accessed here:

    To access the codes relevant for this project, please access the project GitHub Repos:

    If using this work, please cite it based on the above paper, or using the following BibTeX:

    @inproceedings{roibu2023brain,
     title={Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes},
     author={Roibu, Andrei-Claudiu and Adaszewski, Stanislaw and Schindler, Torsten and Smith, Stephen M and Namburete, Ana IL and Lange, Frederik J},
     booktitle={2023 10th IEEE Swiss Conference on Data Science (SDS)},
     pages={17--25},
     year={2023},
     organization={IEEE},
     doi={10.1109/SDS57534.2023.00010}
    }

    Data Access

    The data for this project is freely available upon application at the UK Biobank. For more information regarding the individual nIDPs, please access the UK Biobank Showcase website at: https://biobank.ctsu.ox.ac.uk/showcase/search.cgi

    Funding

    ACR is supported by EPSRC Grant EP/S024093/1, F. Hoffmann-La Roche AG and a 2021 Industrial Fellowship offered by the Royal Commission for the Exhibition of 1851. SMS is supported by a Wellcome Trust Collaborative Award 215573/Z/19/Z. AILN is grateful for support from the Academy of Medical Sciences under the Springboard Awards scheme (SBF005/1136), and the Bill and Melinda Gates Foundation. FJL is supported by a Wellcome Trust Collaborative Award (215573/Z/19/Z). The WIN is supported by core funding from the Wellcome Trust (203139/Z/16/Z). The computational aspects were supported by the Wellcome Trust (203141/Z/16/Z) and the NIHR Oxford BRC. Corresponding authors: ACR (andreiroibu@icloud.com), SA (stanislaw.adaszewski@roche.com) and AILN (ana.namburete@cs.ox.ac.uk).

  19. GWAS on self-reported hearing difficulty in the UK Biobank

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1more
    Updated Oct 15, 2019
    Cite
    H.R.R. Wells; M.B. Freidin; F.N. Zainul Abidin; A. Payton; P Dawes; K.J. Munro; C.C. Morton; D.R. Moore; S.J. Dawson; F.M.K. Williams (2019). GWAS on self-reported hearing difficulty in the UK Biobank [Dataset]. http://doi.org/10.5281/zenodo.3490749
    Explore at:
    Dataset updated
    Oct 15, 2019
    Authors
    H.R.R. Wells; M.B. Freidin; F.N. Zainul Abidin; A. Payton; P Dawes; K.J. Munro; C.C. Morton; D.R. Moore; S.J. Dawson; F.M.K. Williams
    Description

    The dataset contains results of two genome-wide association studies for age-related hearing impairment (ARHI)-related traits as described in the following publication Wells HRR, Freidin MB, Zainul Abidin FN, Payton A, Dawes P, Munro KJ, Morton CC, Moore DR, Dawson SJ, Williams FMK. GWAS Identifies 44 Independent Associated Genomic Loci for Self-Reported Adult Hearing Difficulty in UK Biobank. Am J Hum Genet. 2019 Oct 3;105(4):788-802. doi: 10.1016/j.ajhg.2019.09.008. Epub 2019 Sep 26. Please cite the article if using this dataset. Two files provide summary statistics for discovery analysis of Hearing difficulty (HD) and Hearing aid use (HAID) phenotypes for individuals of European descent from UK Biobank. Acknowledgements The research was carried out using the UK Biobank Resource under application number 11516. H.R.R.W. is funded by a PhD Studentship Grant, S44, from Action on Hearing Loss. The study was also supported by funding from NIHR UCLH BRC Deafness and Hearing Problems Theme, a grant from MED_EL, and the NIHR Manchester Biomedical Research Centre. The English Longitudinal Study of Aging is jointly run by University College London, Institute for Fiscal Studies, University of Manchester, and National Centre for Social Research. Genetic analyses have been carried out by UCL Genomics and funded by the Economic and Social Research Council and the National Institute on Aging. Data governance was provided by the METADAC data access committee, funded by ESRC, Wellcome, and MRC (2015-2018: Grant Number MR/N01104X/1 2018-2020: Grant Number ES/S008349/1). TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility, and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. We would like to thank all the participants of UK Biobank, English Longitudinal Study of Aging, and TwinsUK. 
    Column headers:
    SNP, SNP rsID
    CHR, chromosome
    BP, genomic position (GRCh37 build)
    ALLELE1, effect allele (coded as "1")
    ALLELE0, reference allele (coded as "0")
    A1FREQ, effect allele frequency
    INFO, imputation quality
    BETA, effect size of effect allele
    SE, standard error of effect size
    P, P-value of association (without GC correction)
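    Because the P column is reported without genomic control (GC) correction, a user may want to estimate the genomic inflation factor before interpreting the results. The sketch below shows the standard calculation on a list of P-values using only the Python standard library. The function name and the example values are hypothetical, and the sketch assumes each P-value corresponds to a 1-degree-of-freedom chi-square statistic and lies strictly between 0 and 1 (a value of exactly 1 is also accepted).

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda GC): median observed 1-df
    chi-square statistic divided by its expected median under the null."""
    normal = NormalDist()
    # A two-sided P-value p corresponds to z with P(|Z| > z) = p,
    # so z = -inv_cdf(p / 2) and the 1-df chi-square statistic is z**2.
    chi_square = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = normal.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi_square) / expected_median


# Hypothetical P-values; real use would pass the whole P column.
lambda_null = genomic_inflation([0.2, 0.5, 0.8])
lambda_inflated = genomic_inflation([0.01, 0.02, 0.03])
```

    A value near 1 suggests little inflation; dividing each chi-square statistic by lambda is the usual GC adjustment when the value is noticeably above 1.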

  20. Summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM.

    • search.dataone.org
    Updated Dec 16, 2023
    Cite
    Strober, Benjamin (2023). Summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM. [Dataset]. http://doi.org/10.7910/DVN/GTEGPE
    Explore at:
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Strober, Benjamin
    Description

    BOLT-LMM summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM. See README for more details.

