84 datasets found

European LD Reference from UK Biobank
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Chen, Tony (2023). European LD Reference from UK Biobank [Dataset]. http://doi.org/10.7910/DVN/FDAROV
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/FDAROV
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Chen, Tony
Description
LD blocks based on 20,000 European individuals from the UK Biobank (split by chromosome), with about 1.5 million SNPs based on HapMap3 and MEGA chips
UK Biobank
dtechtive.com
find.data.gov.scot
Updated May 30, 2023
Cite
UK Biobank (2023). UK Biobank [Dataset]. https://dtechtive.com/datasets/26022
Explore at:
Dataset updated
May 30, 2023
Dataset provided by
UK Biobank
Area covered
United Kingdom
Description
UK Biobank is a large-scale biomedical database and research resource that provides researchers access to detailed longitudinal phenotype, medical and genetic data from 500,000 volunteer participants.
Data from: Brain Ages Derived from Different MRI Modalities are Associated...
data.niaid.nih.gov
zenodo.org
Updated Aug 9, 2023
Cite
Roibu, Andrei-Claudiu; Adaszewski, Stanislaw; Schindler, Torsten; Smith, Stephen M.; Namburete, Ana I.L.; Lange, Frederik J. (2023). Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8110875
Explore at:
Dataset updated
Aug 9, 2023
Dataset provided by
Roche Holding AG (http://roche.com/)
Oxford Machine Learning in NeuroImaging Lab (OMNI), University of Oxford, Oxford, U.K.
Wellcome Centre for Integrative Neuroimaging (WIN), University of Oxford, Oxford, U.K.
Authors
Roibu, Andrei-Claudiu; Adaszewski, Stanislaw; Schindler, Torsten; Smith, Stephen M.; Namburete, Ana I.L.; Lange, Frederik J.
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

Brain ageing is a highly variable, spatially and temporally heterogeneous process, marked by numerous structural and functional changes. These can cause discrepancies between individuals’ chronological age and the apparent age of their brain, as inferred from neuroimaging data. Machine learning models, and particularly Convolutional Neural Networks (CNNs), have proven adept in capturing patterns relating to ageing induced changes in the brain. The differences between the predicted and chronological ages, referred to as brain age deltas, have emerged as useful biomarkers for exploring those factors which promote accelerated ageing or resilience, such as pathologies or lifestyle factors. However, previous studies rely only on structural neuroimaging for predictions, overlooking potentially informative functional and microstructural changes. Here we show that multiple contrasts derived from different MRI modalities can predict brain age, each encoding bespoke brain ageing information. By using 3D CNNs and UK Biobank data, we found that 57 contrasts derived from structural, susceptibility-weighted, diffusion, and functional MRI can successfully predict brain age. For each contrast, different patterns of association with non-imaging phenotypes were found, resulting in a total of 191 unique, statistically significant associations. Furthermore, we found that ensembling data from multiple contrasts results in both higher prediction accuracies and stronger correlations to non-imaging measurements. Our results demonstrate that other 3D contrasts and modalities, which have not been considered so far for the task of brain age prediction, encode different information about the ageing brain. We envision our work as being the starting point for future investigations into the causal links underpinning the observed brain age deltas and non-imaging measurement associations. 
For instance, drug effects can be monitored, given that certain medications correlated with accelerated brain ageing. Furthermore, continued development of brain age models could facilitate their deployment in clinical trials for recruitment and monitoring, and hospitals for diagnostic and screening tasks.

Data Description

This dataset contains the full correlation results with all nIDPs in the UK Biobank. These are presented in datasets split by sex into female and male subjects. For easier data manipulation, two smaller datasets have also been made available, containing only those correlations that pass the False Discovery Rate (FDR) threshold.

As experiments were also conducted for ensembles using multiple contrasts, similar datasets are provided for those.

Finally, global datasets are also provided. These are the concatenation of the associations contained in the Male and Female datasets.
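
The FDR filtering mentioned above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure and a 0.05 level; the description names neither the procedure nor the threshold, so both are assumptions, and the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it passes BH at level alpha.

    Assumption: BH is used here for illustration; the dataset description
    does not say which FDR procedure or level was applied.
    """
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Everything ranked at or below max_rank is declared significant.
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            passed[idx] = True
    return passed


# Hypothetical correlation p-values, not taken from the dataset.
example_p = [0.001, 0.045, 0.02, 0.2, 0.008]
print(benjamini_hochberg(example_p))
```

Applying such a mask to the full per-sex tables would yield smaller tables of the kind described, under the stated assumptions.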

Paper & Code

The paper describing this work can be accessed here:

https://ieeexplore.ieee.org/abstract/document/10196736

The code for this project is available in the project GitHub repository:

https://github.com/AndreiRoibu/AgeMapper

If using this work, please cite it based on the above paper, or using the following BibTeX:

@inproceedings{roibu2023brain,
  title={Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes},
  author={Roibu, Andrei-Claudiu and Adaszewski, Stanislaw and Schindler, Torsten and Smith, Stephen M and Namburete, Ana IL and Lange, Frederik J},
  booktitle={2023 10th IEEE Swiss Conference on Data Science (SDS)},
  pages={17--25},
  year={2023},
  organization={IEEE},
  doi={10.1109/SDS57534.2023.00010}
}

Data Access

The data for this project is freely available upon application at the UK Biobank. For more information regarding the individual nIDPs, please access the UK Biobank Showcase website at: https://biobank.ctsu.ox.ac.uk/showcase/search.cgi

Funding

ACR is supported by EPSRC Grant EP/S024093/1, F. Hoffmann-La Roche AG and a 2021 Industrial Fellowship offered by the Royal Commission for the Exhibition of 1851. SMS is supported by a Wellcome Trust Collaborative Award 215573/Z/19/Z. AILN is grateful for support from the Academy of Medical Sciences under the Springboard Awards scheme (SBF005/1136), and the Bill and Melinda Gates Foundation. FJL is supported by a Wellcome Trust Collaborative Award (215573/Z/19/Z). The WIN is supported by core funding from the Wellcome Trust (203139/Z/16/Z). The computational aspects were supported by the Wellcome Trust (203141/Z/16/Z) and the NIHR Oxford BRC. Corresponding authors: ACR (andreiroibu@icloud.com), SA (stanislaw.adaszewski@roche.com) and AILN (ana.namburete@cs.ox.ac.uk).
UK BiLEVE Consortium Dataset
find.data.gov.scot
Updated May 5, 2023
Cite
BREATHE (2023). UK BiLEVE Consortium Dataset [Dataset]. https://find.data.gov.scot/datasets/26430
Explore at:
Dataset updated
May 5, 2023
Dataset provided by
BREATHE
Area covered
United Kingdom
Description
This project aims to leverage the power of UK Biobank to detect rare genetic variants associated with lung function.
Phenome-wide association studies across large population cohorts support...
data.niaid.nih.gov
explore.openaire.eu
Updated Mar 8, 2021
Cite
Franklin, Chris S.; Spencer, Chris C. A.; Weale, Michael E.; Donnelly, Peter; Vangjeli, Ciara (2021). Phenome-wide association studies across large population cohorts support drug target validation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2671776
Explore at:
Dataset updated
Mar 8, 2021
Dataset provided by
Genomics Ltd
Authors
Franklin, Chris S.; Spencer, Chris C. A.; Weale, Michael E.; Donnelly, Peter; Vangjeli, Ciara
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary-level data generated by Genomics plc as presented in: Diogo, D. et al. Phenome-wide association studies across large population cohorts support drug target validation. Nat. Commun. 9, 4285 (2018). https://doi.org/10.1038/s41467-018-06540-3

If you have any questions or comments regarding these files, please contact Genomics plc at research@genomicsplc.com

NOTES

These analyses were carried out using the interim UK Biobank imputation data release. Analyses were restricted to a subset of "white-British" unrelated samples with a maximum sample size of 112,337 individuals.

Case-control phenotypes were defined based on categorical datafields as listed in the accompanying file. Quantitative phenotypes were either rank-normalised before analysis, or beta/se values were standardised after analysis using the variance of the phenotype. The normalisation value is indicated in the accompanying file.

All analyses included age at assessment, sex, genotyping chip, and 10 principal components as covariates.

We used plink1.9 linear/logistic regression as appropriate. For chromosome X variants, males were treated as having 0 or 2 alternative alleles.

The results are not adjusted for genomic control.

DATA FILE CONTENT DESCRIPTION

CHR - Chromosome
SNP - Variant rsID
ALT - Alternative allele (effect allele)
REF - Reference allele (non-effect allele)
BP - Position in base pairs (b37, 1-based)
NMISS - Number of samples with non-missing genotypes
BETA - Effect size (log odds ratio or standardised effect size)
SE - Standard error
P - P-value
F_MISS - Genotype missing rate
P_hwe - Hardy-Weinberg p-value
MAF - ALT allele frequency
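
As a rough illustration of how a file with these columns might be read, the sketch below parses a small in-memory table and keeps rows below a p-value cutoff. The tab delimiter, the 5e-8 cutoff, the helper name `significant_rows`, and the two synthetic rows are all assumptions for illustration; the real files may be laid out differently.

```python
import csv
import io
import math

# Column names as listed in the data file content description.
COLUMNS = ["CHR", "SNP", "ALT", "REF", "BP", "NMISS",
           "BETA", "SE", "P", "F_MISS", "P_hwe", "MAF"]

# Synthetic rows for illustration only; the identifiers and values are made up.
ROWS = [
    ["1", "rs_example_1", "A", "G", "1000", "112000",
     "0.10", "0.015", "3e-11", "0.001", "0.5", "0.20"],
    ["2", "rs_example_2", "T", "C", "2000", "111500",
     "0.01", "0.020", "0.6", "0.002", "0.7", "0.35"],
]
SAMPLE = "\n".join("\t".join(r) for r in [COLUMNS] + ROWS)


def significant_rows(handle, p_threshold=5e-8):
    """Yield parsed rows whose P column is below p_threshold.

    Assumptions: tab-delimited input with a header row, and a conventional
    genome-wide cutoff that this dataset entry does not itself state.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if float(row["P"]) < p_threshold:
            beta = float(row["BETA"])
            yield {
                "SNP": row["SNP"],
                "BETA": beta,
                # Only meaningful when BETA is a log odds ratio
                # (case-control phenotypes), not a standardised effect.
                "odds_ratio": math.exp(beta),
            }


hits = list(significant_rows(io.StringIO(SAMPLE)))
print(hits)
```

For quantitative phenotypes BETA is a standardised effect size, so the `odds_ratio` field should be ignored for those traits.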
Summary statistics for three depression phenotypes in UK Biobank
dtechtive.com
find.data.gov.scot
txt, zip
Updated May 16, 2018
Cite
University of Edinburgh (2018). Summary statistics for three depression phenotypes in UK Biobank [Dataset]. http://doi.org/10.7488/ds/2350
Explore at:
Available download formats: zip(0.008 MB), txt(884.9 MB), txt(883.1 MB), txt(0.0166 MB), txt(0.0021 MB)
Unique identifier
https://doi.org/10.7488/ds/2350
Dataset updated
May 16, 2018
Dataset provided by
University of Edinburgh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
Depression is a polygenic trait that causes extensive periods of disability. Previous genetic studies have identified common risk variants which have progressively increased in number with increasing sample sizes of the respective studies. Here, we conduct a genome-wide association study in 322,580 UK Biobank participants for three depression-related phenotypes: broad depression, probable major depressive disorder (MDD), and International Classification of Diseases (ICD, version 9 or 10)-coded MDD. We identify 17 independent loci that are significantly associated (P < 5 × 10⁻⁸) across the three phenotypes. The direction of effect of these loci is consistently replicated in an independent sample, with 14 loci likely representing novel findings. Gene sets are enriched in excitatory neurotransmission, mechanosensory behavior, postsynapse, neuron spine, and dendrite functions. Our findings suggest that broad depression is the most tractable UK Biobank phenotype for discovering genes and gene-sets that further our understanding of the biological pathways underlying depression.
Data from: Patterns of recent natural selection on genetic loci associated...
search.dataone.org
data.niaid.nih.gov
+1 more
Updated Apr 21, 2025
Cite
Audrey M. Arner; Kathleen E. Grogan; Mark Grabowski; Hugo Reyes-Centeno; George H. Perry (2025). Patterns of recent natural selection on genetic loci associated with sexually differentiated human body size and shape phenotypes [Dataset]. http://doi.org/10.5061/dryad.nzs7h44rc
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.nzs7h44rc
Dataset updated
Apr 21, 2025
Dataset provided by
Dryad Digital Repository
Authors
Audrey M. Arner; Kathleen E. Grogan; Mark Grabowski; Hugo Reyes-Centeno; George H. Perry
Time period covered
Jan 1, 2021
Description
Levels of sex differences for human body size and shape phenotypes are hypothesized to have adaptively reduced following the agricultural transition as part of an evolutionary response to relatively more equal divisions of labor and new technology adoption. In this study, we tested this hypothesis by studying genetic variants associated with five sexually differentiated human phenotypes: height, body mass, hip circumference, body fat percentage, and waist circumference. We first analyzed genome-wide association (GWAS) results for UK Biobank individuals (~197,000 females and ~167,000 males) to identify a total of 119,023 single nucleotide polymorphisms (SNPs) significantly associated with at least one of the studied phenotypes in females, males, or both sexes (P < 5 × 10⁻⁸). From these loci we then identified 3,016 SNPs (2.5%) with significant differences in the strength of association between the female- and male-specific GWAS results at a low false-discovery rate (FDR < 0.001). Genes w...
UKBiobank Sarcopenia
dataverse.harvard.edu
dataone.org
Updated Aug 16, 2025
Cite
Zihao Gui (2025). UKBiobank Sarcopenia [Dataset]. http://doi.org/10.7910/DVN/IMPHVM
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/IMPHVM
Dataset updated
Aug 16, 2025
Dataset provided by
Harvard Dataverse
Authors
Zihao Gui
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Research code for the study of muscle mass reduction and the risk of severe MASLD in the UK Biobank population data.
Data from: Factors associated with sharing email information and mental...
dtechtive.com
find.data.gov.scot
gz, txt
Updated Jun 3, 2019
Cite
University of Edinburgh. Division of Psychiatry (2019). Factors associated with sharing email information and mental health survey participation in large population cohorts [Dataset]. http://doi.org/10.7488/ds/2554
Explore at:
Available download formats: txt(0.0008 MB), gz(417.4 MB), gz(417.7 MB), txt(0.0166 MB)
Unique identifier
https://doi.org/10.7488/ds/2554
Dataset updated
Jun 3, 2019
Dataset provided by
University of Edinburgh. Division of Psychiatry
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
Genome-wide association study summary statistics of email contact and Mental Health Questionnaire participation in UK Biobank. Data in support of the manuscript: 'Factors associated with sharing email information and mental health survey participation in large population cohorts'.

ABSTRACT

BACKGROUND: People who opt to participate in scientific studies tend to be healthier, wealthier, and more educated than the broader population. While selection bias does not always pose a problem for analysing the relationships between exposures and diseases or other outcomes, it can lead to biased effect size estimates. Biased estimates may weaken the utility of genetic findings because the goal is often to make inferences in a new sample (such as in polygenic risk score analysis).

METHODS: We used data from UK Biobank, Generation Scotland, and Partners Biobank and conducted phenotypic and genome-wide association analyses on two phenotypes that reflected mental health data availability: (1) whether participants were contactable by email for follow-up and (2) whether participants responded to follow-up surveys of mental health.

RESULTS: In UK Biobank, we identified nine genetic loci associated (P < 5 × 10⁻⁸) with email contact and 25 loci associated with mental health survey completion. Both phenotypes were positively genetically correlated with higher educational attainment and better health and negatively genetically correlated with psychological distress and schizophrenia. One SNP association replicated along with the overall direction of effect of all association results.

CONCLUSIONS: Recontact availability and follow-up participation can act as further genetic filters for data on mental health phenotypes.
Pleiotropy of UK Biobank metabolites
data.niaid.nih.gov
nde-dev.biothings.io
+3 more
zip
Updated Oct 13, 2022
Cite
Courtney Smith; Nasa Sinnott-Armstrong; Anna Cichonska; Heli Julkunen; Eric Fauman; Peter Wurtz; Jonathan Pritchard (2022). Pleiotropy of UK Biobank metabolites [Dataset]. http://doi.org/10.5061/dryad.79cnp5hxs
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.79cnp5hxs
Dataset updated
Oct 13, 2022
Dataset provided by
Nightingale Health
Pfizer (United States)
Stanford University
Fred Hutch Cancer Center
Authors
Courtney Smith; Nasa Sinnott-Armstrong; Anna Cichonska; Heli Julkunen; Eric Fauman; Peter Wurtz; Jonathan Pritchard
License
https://spdx.org/licenses/CC0-1.0.html
Description
Pleiotropy and genetic correlation are widespread features in GWAS, but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.

Methods

The details of the dataset processing are provided in our manuscript: https://elifesciences.org/articles/79348

Briefly, we performed GWAS of technically-corrected metabolite levels from the Nightingale NMR Metabolomics dataset on 94,464 European-ancestry individuals and 98,189 individuals in our ancestry-inclusive analysis using BOLT-REML and integrated these results with a curated biochemical map connecting the 16 core metabolites spanning glycolysis, ketones, and amino acids. Files with names "*_step3.txt" and "*_step2.txt" are the local genetic correlation and local heritability estimates for each approximately independent LD block (Berisa et al. 2016) using rho-HESS (Shi et al. 2017) and HESS (Shi et al. 2016), respectively. These were derived from European-ancestry summary statistics.

Files with names that start with a SNP identifier, "both," or "neither" are the conditional fine-mapping summary statistics from our example loci, generated with the PLINK2 "--condition" option. Please see the manuscript for additional details.
Summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM.
dataverse.harvard.edu
search.dataone.org
Updated Oct 31, 2023
Cite
Benjamin Strober (2023). Summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM. [Dataset]. http://doi.org/10.7910/DVN/GTEGPE
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/GTEGPE
Dataset updated
Oct 31, 2023
Dataset provided by
Harvard Dataverse
Authors
Benjamin Strober
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
BOLT-LMM summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM. See README for more details.
GCTB sparse shrunk LD matrices from 2.8M common variants from the UK Biobank...
search.datacite.org
zenodo.org
Updated Aug 23, 2019
+ more versions
Cite
Luke Lloyd-Jones (2019). GCTB sparse shrunk LD matrices from 2.8M common variants from the UK Biobank - Part AA - START HERE [Dataset]. http://doi.org/10.5281/zenodo.3375372
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3375372
Dataset updated
Aug 23, 2019
Dataset provided by
Zenodo (http://zenodo.org/)
DataCite
Authors
Luke Lloyd-Jones
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
GCTB sparse shrunk LD matrices from 2.8M common variants from the UK Biobank. Part AA of AA, AB, AC, AD and AE.

TO JOIN AND UNZIP THESE MATRICES

Download all parts to one folder from:
Part AA - 10.5281/zenodo.3375373
Part AB - 10.5281/zenodo.3376357
Part AC - 10.5281/zenodo.3376456
Parts AD and AE - 10.5281/zenodo.3376628

Use cat to join:
cat ukb_50k_bigset_2.8M.zip.part* > ukb_50k_bigset_2.8M.zip

Then unzip. See README for further details.
unzip ukb_50k_bigset_2.8M.zip
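
For environments without `cat` and `unzip`, the same join-then-extract steps might be sketched in Python with the standard library. The helper names `join_parts` and `unzip` are hypothetical, and the sketch assumes the part files sort correctly by name, as the shell glob in the original command implies.

```python
import glob
import shutil
import zipfile


def join_parts(pattern, joined_path):
    """Concatenate split files in sorted name order into joined_path.

    Mirrors `cat <pattern> > <joined_path>`; assumes lexicographic order of
    the part names is the intended order.
    """
    parts = sorted(glob.glob(pattern))
    with open(joined_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large files are not held in memory.
                shutil.copyfileobj(src, out)
    return parts


def unzip(joined_path, dest_dir):
    """Extract the joined archive into dest_dir and return the member names."""
    with zipfile.ZipFile(joined_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Example call, using the file names from the instructions above:
# join_parts("ukb_50k_bigset_2.8M.zip.part*", "ukb_50k_bigset_2.8M.zip")
# unzip("ukb_50k_bigset_2.8M.zip", ".")
```

The README referenced in the description remains the authoritative source for the exact procedure.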
95% credible sets for glucose from SparsePro+.
figshare.com
xlsx
Updated Jan 10, 2024
+ more versions
Cite
Wenmin Zhang; Hamed Najafabadi; Yue Li (2024). 95% credible sets for glucose from SparsePro+. [Dataset]. http://doi.org/10.1371/journal.pgen.1011104.s019
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pgen.1011104.s019
Dataset updated
Jan 10, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Wenmin Zhang; Hamed Najafabadi; Yue Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identifying causal variants from genome-wide association studies (GWAS) is challenging due to widespread linkage disequilibrium (LD) and the possible existence of multiple causal variants in the same genomic locus. Functional annotations of the genome may help to prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. Classical fine-mapping methods conducting an exhaustive search of variant-level causal configurations have a high computational cost, especially when the underlying genetic architecture and LD patterns are complex. SuSiE provided an iterative Bayesian stepwise selection algorithm for efficient fine-mapping. In this work, we build connections between SuSiE and a paired mean field variational inference algorithm through the implementation of a sparse projection, and propose effective strategies for estimating hyperparameters and summarizing posterior probabilities. Moreover, we incorporate functional annotations into fine-mapping by jointly estimating enrichment weights to derive functionally-informed priors. We evaluate the performance of SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved improved power for fine-mapping with reduced computation time. We demonstrate the utility of SparsePro through fine-mapping of five functional biomarkers of clinically relevant phenotypes. In summary, we have developed an efficient fine-mapping method for integrating summary statistics and functional annotations. Our method can have wide utility in understanding the genetics of complex traits and increasing the yield of functional follow-up studies of GWAS. SparsePro software is available on GitHub at https://github.com/zhwm/SparsePro.
Sex-stratified linear mixed models: Non-binary traits (Item 3/3)
dtechtive.com
find.data.gov.scot
gz, tsv, txt
Updated May 25, 2021
+ more versions
Cite
University of Edinburgh. The Roslin Institute (2021). Sex-stratified linear mixed models: Non-binary traits (Item 3/3) [Dataset]. http://doi.org/10.7488/ds/3048
Explore at:
Available download formats: gz(708.2 MB), gz(707.3 MB), gz(716.3 MB), txt(0.0006 MB), gz(713 MB), gz(706.1 MB), gz(712.9 MB), gz(713.7 MB), gz(716.8 MB), gz(716.7 MB), gz(717.1 MB), gz(715.4 MB), gz(711 MB), gz(715.6 MB), gz(711.2 MB), gz(709.1 MB), gz(715.1 MB), gz(714.3 MB), gz(715.7 MB), txt(0.0166 MB), gz(712.1 MB), gz(716.2 MB), gz(718.1 MB), gz(718.6 MB), gz(707.4 MB), gz(711.6 MB), gz(716.1 MB), gz(719.5 MB), gz(716.4 MB), gz(715.8 MB), gz(714.1 MB), gz(704.8 MB), gz(713.9 MB), gz(729.4 MB), gz(716 MB), gz(704.4 MB), gz(718.3 MB), gz(710.2 MB), gz(711.4 MB), gz(710 MB), gz(709 MB), gz(713.8 MB), gz(715.9 MB), gz(706 MB), gz(708.5 MB), gz(719.3 MB), gz(717 MB), gz(713.1 MB), gz(716.9 MB), gz(713.6 MB), gz(714.2 MB), gz(704.9 MB), gz(703.1 MB), gz(705.9 MB), gz(717.5 MB), gz(709.5 MB), gz(714.8 MB), gz(718 MB), gz(714.9 MB), gz(708 MB), gz(714 MB), gz(718.5 MB), gz(719.2 MB), gz(712.8 MB), tsv(0.0388 MB), gz(712.2 MB), gz(711.3 MB)
Unique identifier
https://doi.org/10.7488/ds/3048
Dataset updated
May 25, 2021
Dataset provided by
University of Edinburgh. The Roslin Institute
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sex-stratified GWAS can help shed light on sexual differences in genetic architecture. In Bernabeu et al (2021) we fit sex-stratified linear mixed models (using DISSECT) across a total of 530 phenotypes to assess the effects of sex on genetic effect estimates, and compared estimates between males and females in a search for genetic variants that presented significant differences in association to the traits considered. Here, the summary statistics of said efforts, pertaining to non-binary traits, are included. Each file contains the results for a single non-binary trait, as stated in the file name, using its corresponding UK Biobank trait code. Trait descriptions, including their respective UK Biobank codes, are stated in the 'trait_description.tsv' file. For each trait (each .gz file), GWAS summary statistics obtained for over 9 million genetic variants across the genome (both autosomal, and X chromosome) and circa 450K individuals, as well as the results of the t-test comparing genetic effect estimates between the sexes, are included.
Data from: Accuracy of identifying incident stroke cases from linked...
zenodo.org
data.niaid.nih.gov
+2 more
pdf
Updated Jul 19, 2024
Cite
Kristiina Rannikmae (2024). Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank [Dataset]. http://doi.org/10.5061/dryad.w9ghx3fk0
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.5061/dryad.w9ghx3fk0
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kristiina Rannikmae
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objective: In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these for identifying incident strokes.

Methods: In a regional UKB sub-population (n=17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care or death record data. Stroke physicians reviewed their full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of cases that were true positives (i.e. positive predictive value, PPV) for all codes combined and by code source and type.

Results: Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were: 30% hospital admission only; 39% primary care only; 28% hospital and primary care; 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathological type to be assigned in >99%. PPVs (95% confidence intervals) were: 79% (73%-84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%-90%) for ischemic stroke. PPVs for small numbers of death record and hemorrhagic stroke codes were low but imprecise.

Conclusions: Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types.
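
To make the PPV figures in the Results concrete, here is a minimal sketch of a positive predictive value with a 95% confidence interval. It assumes a Wilson score interval, which the abstract does not name, and uses approximate counts back-calculated from the reported percentages (about 225 reviewed cases, about 178 true positives); the abstract does not report these counts.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Assumed, approximate counts inferred from "97% of 232" and "PPV 79%";
# they are illustrative and not reported in the source.
reviewed_cases = 225
true_positives = 178

ppv = true_positives / reviewed_cases
low, high = wilson_interval(true_positives, reviewed_cases)
print(f"PPV = {ppv:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Under these assumed counts the interval is close to the reported 73%-84%, but the authors may have used a different interval method.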
Sociability GWAS in a population-based sample : summary statistics of a...
narcis.nl
lifesciences.datastations.nl
pdf
Updated Mar 12, 2021
Cite
Bralten, J.B. (Radboud University); Roth Mota, N. (Radboud University); Klemann, C.J.H.M. (Radboud University); Witte, W. de (2021). Sociability GWAS in a population-based sample : summary statistics of a genome-wide association study of an aggregated sociability score in the UK Biobank [Dataset]. http://doi.org/10.17026/dans-ztj-zga6
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.17026/dans-ztj-zga6
Dataset updated
Mar 12, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Bralten, J.B. (Radboud University); Roth Mota, N. (Radboud University); Klemann, C.J.H.M. (Radboud University); Witte, W. de
Area covered
United Kingdom (northlimit=59.62358300012501; eastlimit=2.374666058072581; southlimit=49.568413008749225; westlimit=-8.205608345022652)
Description
Levels of sociability are continuously distributed in the general population, and decreased sociability represents an early manifestation of several brain disorders. Here, we investigated the genetic underpinnings of sociability in the population.

Main questions of our research:
1. Are there common genetic variants that are associated with sociability in the general population?
2. Are genetic variants that are associated with sociability also associated with neuropsychiatric disorders?

Type of data uploaded in this repository: The UK Biobank project (see https://www.ukbiobank.ac.uk/) is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. The raw data that this project is based on come from the publicly available UK Biobank set, which is very large and is therefore not provided here. Here we provide only the results of our analysis, which is also described at https://www.biorxiv.org/content/10.1101/781195v2 and is currently in revision at a scientific journal. The dataset contains the association of 9,327,396 genetic variants with the sociability phenotype. It is not suitable for opening in Excel and is best opened on a cluster computer or with specific software.

Subjects

The UK Biobank (UKBB) is a major population-based cohort from the United Kingdom that includes individuals aged between 37 and 73 years. We constructed a sociability measure based on the aggregation of scores per participant on four questions from the UKBB database that link to sociability, including (1) a question about the frequency of friend/family visits, (2) a question on the number and type of social venues that are visited, (3) a question about worrying after social embarrassment and (4) a question about feeling lonely, leading to a sociability score ranging from 0 to 4. Participants were excluded if they had somatic problems that could be related to social withdrawal (BMI < 15 or BMI > 40, narcolepsy (all the time), stroke, severe tinnitus, deafness or brain-related cancers) or if they answered “No friends/family outside household”, “Do not know” or “Prefer not to answer” to any of the questions.

SNP genotyping and quality control: Details about the available genome-wide genotyping data for UKBB participants have been reported previously (PMID: 30305743). We used third-release genotyping data (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Briefly, 49,950 participants were genotyped using the UK BiLEVE Axiom Array and 438,427 participants were genotyped using the UK Biobank Axiom Array. Genotypes were imputed using the Haplotype Reference Consortium (HRC) and the UK10K haplotype resource. To account for ethnicity, we included only individuals who identified themselves as "white" by self-report, plotted the principal components (PCs) provided by the UKBB, and excluded individuals considered to be outliers on PCs 1 and 2. Genetic relatedness, calculated with KING kinship and provided by the UKBB (https://kenhanscombe.github.io/ukbtools/articles/explore-ukb-data.html ; http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/UKBiobank_genotyping_QC_documentation-web.pdf), was used to identify first- and second-degree relatives. Subsequently, ‘families’ (i.e. clusters of related individuals above an IBD > 0.125 threshold) were created, and only one individual from each ‘family’ was included in the analysis. Individuals whose self-reported sex and SNP-based sex differed were excluded from further analysis. Single nucleotide polymorphisms (SNPs) with minor allele frequency < 0.005, Hardy-Weinberg equilibrium test P value < 1e-6, missing genotype rate > 0.05, or imputation quality (INFO) < 0.8 were excluded. In the current study, all analyses are based on 342,461 participants of European ancestry for whom both genotype data and sociability scores were available.
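The four SNP-level exclusion criteria listed above can be sketched as a single filter. This is only an illustration under stated assumptions: the record type `SnpQc`, its field names, and the example SNP identifiers are hypothetical and are not taken from the dataset or from any UK Biobank tool; only the four thresholds come from the description.

```python
from dataclasses import dataclass

# Thresholds quoted in the description above.
MAF_MIN = 0.005      # minor allele frequency
HWE_P_MIN = 1e-6     # Hardy-Weinberg equilibrium test P value
MISSING_MAX = 0.05   # missing genotype rate
INFO_MIN = 0.8       # imputation quality (INFO)


@dataclass
class SnpQc:
    """Per-SNP QC metrics; the field names are hypothetical."""
    snp_id: str
    maf: float
    hwe_p: float
    missing_rate: float
    info: float


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP meets none of the four exclusion criteria."""
    return (
        snp.maf >= MAF_MIN
        and snp.hwe_p >= HWE_P_MIN
        and snp.missing_rate <= MISSING_MAX
        and snp.info >= INFO_MIN
    )


# Made-up example records, not real variants.
snps = [
    SnpQc("rs_example_1", maf=0.20, hwe_p=0.5, missing_rate=0.01, info=0.99),
    SnpQc("rs_example_2", maf=0.001, hwe_p=0.5, missing_rate=0.01, info=0.99),  # too rare
    SnpQc("rs_example_3", maf=0.20, hwe_p=0.5, missing_rate=0.01, info=0.60),   # poorly imputed
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

A SNP sitting exactly on a threshold is kept here, because the description excludes only values strictly beyond each cut-off.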

Genome-wide association analysis: Genome-wide association analysis with the imputed marker dosages was performed in PLINK1.9, using a linear regression model with the sociability measure as the dependent variable and including sex, age, the first 10 PCs, assessment center, and genotype batch as covariates. SNPs were considered significantly associated if they had a p-value < 5e-8. Associated loci were considered independent of each other at an r2 threshold of 0.6, and lead SNPs were defined as the SNPs with the smallest association p-value at an r2 threshold of 0.1, using a 250 kb window. The summary statistics come from the plink2 linear regression analysis.
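Because the summary statistics contain millions of rows, one practical way to pull out the genome-wide significant SNPs is to stream the file row by row rather than load it whole. The sketch below assumes a tab-delimited file with a header; the column names `SNP`, `BETA` and `P` and the example rows are hypothetical and may not match the actual file, so adjust them after inspecting the header.

```python
import csv
import io

GENOME_WIDE_P = 5e-8  # significance threshold stated in the description

# Tiny made-up table standing in for the real summary statistics file.
EXAMPLE = """SNP\tBETA\tP
rs_example_1\t0.012\t3.1e-9
rs_example_2\t-0.004\t0.02
rs_example_3\t0.009\t4.9e-8
rs_example_4\t0.001\tNA
"""


def significant_snps(handle, p_column="P", threshold=GENOME_WIDE_P):
    """Yield rows whose p-value is strictly below the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        try:
            p_value = float(row[p_column])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric p-value
        if p_value < threshold:
            yield row


hits = [row["SNP"] for row in significant_snps(io.StringIO(EXAMPLE))]
print(hits)
```

For the real file, replace `io.StringIO(EXAMPLE)` with an open file handle; the generator keeps memory use flat regardless of file size.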
Summary-Level Data From Meta-Analysis Of Fat Distribution Phenotypes In Uk...
explore.openaire.eu
zenodo.org
Updated May 23, 2018
Cite
Sara L Pulit (2018). Summary-Level Data From Meta-Analysis Of Fat Distribution Phenotypes In Uk Biobank And Giant [Dataset]. http://doi.org/10.5281/zenodo.1251813
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.1251813
Dataset updated
May 23, 2018
Authors
Sara L Pulit
Description
Summary-level data as presented in: "Meta-analysis of genome-wide association studies for body fat distribution in 694,649 individuals of European ancestry." Pulit, SL et al. bioRxiv, 2018. https://www.biorxiv.org/content/early/2018/04/18/304030

If you use these data, please cite the above preprint. If you have any questions or comments regarding these files, please contact me: Sara L Pulit spulit@well.ox.ac.uk or s.l.pulit@umcutrecht.nl

(1) Data files

i. whradjbmi.giant-ukbb.meta-analysis.combined.23May2018.txt
Meta-analysis of waist-to-hip ratio adjusted for body mass index (whradjbmi) in UK Biobank and GIANT data. Combined set of samples, max N = 694,649.

ii. whradjbmi.giant-ukbb.meta-analysis.females.23May2018.txt
Meta-analysis of whradjbmi in UK Biobank and GIANT data. Female samples only, max N = 379,501.

iii. whradjbmi.giant-ukbb.meta-analysis.males.23May2018.txt
Meta-analysis of whradjbmi in UK Biobank and GIANT data. Male samples only, max N = 315,284.

iv. whr.giant-ukbb.meta-analysis.combined.23May2018.txt
Meta-analysis of waist-to-hip ratio (whr) in UK Biobank and GIANT data. Combined set of samples, max N = 697,734.

v. whr.giant-ukbb.meta-analysis.females.23May2018.txt
Meta-analysis of whr in UK Biobank and GIANT data. Female samples only, max N = 381,152.

vi. whr.giant-ukbb.meta-analysis.males.23May2018.txt
Meta-analysis of whr in UK Biobank and GIANT data. Male samples only, max N = 316,772.

vii. bmi.giant-ukbb.meta-analysis.combined.23May2018.txt
Meta-analysis of body mass index (bmi) in UK Biobank and GIANT data. Combined set of samples, max N = 806,834.

viii. bmi.giant-ukbb.meta-analysis.females.23May2018.txt
Meta-analysis of bmi in UK Biobank and GIANT data. Female samples only, max N = 434,794.

ix. bmi.giant-ukbb.meta-analysis.males.23May2018.txt
Meta-analysis of bmi in UK Biobank and GIANT data. Male samples only, max N = 374,756.

(2) Data file format

CHR: Chromosome
POS: Chromosomal position of the SNP, build hg19
SNP: the dbSNP151 identifier of the SNP, followed by the first allele and second allele of the SNP, delimited with a colon. A small number of SNPs (<9,000) from the GIANT data had no dbSNP151 identifier, and are left as just an rsID. Note that these SNPs are also missing chromosome and position information (not provided in the GIANT data).
Tested_Allele: the allele for which all association statistics are reported
Other_Allele: the other allele at the SNP
Freq_Tested_Allele: frequency of the tested allele
BETA: the effect size of the tested allele
SE: the standard error of the beta
P: the p-value of the SNP, as reported from the inverse variance-weighted fixed effects meta-analysis
N: the total sample size for this SNP
INFO: the imputation quality (info score) of the SNP, as reported by UK Biobank. A number between 0 and 1 indicating quality of imputation (0, poor quality; 1, high quality or genotyped). Note that the summary-level GIANT data does not report info score, so SNPs appearing only in the GIANT analysis do not have info scores.
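The SNP column described above takes two shapes: an identifier followed by two colon-delimited alleles, or a bare rsID for the small set of GIANT-only SNPs. A minimal parsing sketch that handles both is given below. The function name `parse_snp_field`, the `SnpId` type and the example identifiers are hypothetical; the real files may contain other edge cases not covered here.

```python
from typing import NamedTuple, Optional


class SnpId(NamedTuple):
    """Parsed SNP column; alleles are None when only an rsID is present."""
    rsid: str
    allele1: Optional[str]
    allele2: Optional[str]


def parse_snp_field(value: str) -> SnpId:
    """Split 'rsID:allele1:allele2' or a bare 'rsID' into its parts."""
    parts = value.strip().split(":")
    if len(parts) == 3:
        return SnpId(parts[0], parts[1], parts[2])
    if len(parts) == 1 and parts[0]:
        return SnpId(parts[0], None, None)
    raise ValueError(f"unexpected SNP field: {value!r}")


# Made-up identifiers used only to show the two accepted shapes.
print(parse_snp_field("rs_example:A:G"))
print(parse_snp_field("rs_example_only"))
```

Rows that parse to a bare rsID are the ones expected to lack CHR and POS values, so they can be set aside before any position-based join.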
Data from: In search of the genetic variants of human sex ratio at birth:...
data.niaid.nih.gov
search.dataone.org
zip
Updated Sep 6, 2024
Cite
Siliang Song; Jianzhi Zhang (2024). In search of the genetic variants of human sex ratio at birth: Was Fisher wrong about sex ratio evolution? [Dataset]. http://doi.org/10.5061/dryad.vdncjsz43
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.vdncjsz43
Dataset updated
Sep 6, 2024
Dataset provided by
University of Michigan
Authors
Siliang Song; Jianzhi Zhang
License
https://spdx.org/licenses/CC0-1.0.html
Description
The human sex ratio (fraction of males) at birth is close to 0.5 at the population level, an observation commonly explained by Fisher's principle. However, past human studies yielded conflicting results regarding the existence of sex ratio-influencing mutations, a prerequisite to Fisher’s principle, raising the question of whether the nearly even population sex ratio is instead dictated by the random X/Y chromosome segregation in male meiosis. Here we show that, because a person’s offspring sex ratio (OSR) has an enormous measurement error, a gigantic sample is required to detect OSR-influencing genetic variants. Conducting a UK Biobank-based genome-wide association study that is more powerful than previous studies, we detect an OSR-associated genetic variant, which awaits verification in independent samples. Given the abysmal precision in measuring OSR, it is unsurprising that the estimated heritability of OSR is effectively zero. We further show that OSR’s estimated heritability would remain virtually zero even if OSR were as genetically variable as the highly heritable human standing height. These analyses, along with simulations of human sex ratio evolution under selection, demonstrate the compatibility of the observed genetic architecture of human OSR with Fisher’s principle and suggest the plausibility of the presence of multiple human OSR-influencing genetic variants.

Methods

GWAS: When conducting the GWAS in the UKB, we did not simply use the sibling sex ratio as the trait, because of the difficulty in accounting for different estimation errors of the sibling sex ratio for different families as a result of the variation in family size. For example, individual A has one brother and no sisters, while individual B has four brothers and one sister. Although A has a higher sibling sex ratio than B, B’s siblings obviously provide stronger evidence for a male-biased sibling sex ratio than A’s siblings.
To properly weight the data by family size, we considered the birth of each sibling as an independent event. In the above example, we would associate A’s genotype with one male birth and associate B’s genotype with four male births and one female birth. In GWAS, a male birth is coded as 1 and a female birth is coded as 0. The UKB participants have a total of 873,715 full siblings, leading to an unprecedented statistical power. In our GWAS in the UKB, we included genetic sex, year of birth, and the first ten genetic principal components as covariates.

Gene-based test: We performed two gene-based association analyses. First, we analyzed the UKB-based GWAS summary statistics through the R package sumFREGAT for autosomal protein-coding genes (N = 17,389). All SNPs within the transcribed region of a gene derived from the European samples in the 1000 Genomes Project were used in the test. We implemented the optimal unified test (SKAT-O), principal component analysis-based test (PCA), and aggregated Cauchy association test (ACAT-V) in sumFREGAT. For all three methods, weights were uniformly assigned for all alleles [beta.par = c(1, 1) in sumFREGAT] with other settings left at default values. Variant correlation matrix files (one file per gene) were needed for the gene-based analysis, and we used the pre-calculated matrices from 1KG European samples provided by the R-package development team (http://mga.bionet.nsc.ru/sumFREGAT). The input data were pre-processed using the R package function prep.score.files() with the reference file provided by the R-package development team (http://mga.bionet.nsc.ru/sumFREGAT). The P values in the three tests were then combined by the omnibus aggregated Cauchy association test (ACAT-O) in sumFREGAT. Second, we performed a gene-based burden test using rare missense variants (MAF < 1%) in the UKB whole exome sequencing data.
The burden test assumes that rare variants are functionally disruptive and therefore have the same direction of effect. To properly weight the OSR of UKB participants by their heterogeneous measurement errors, we generated a plink bed file that contained burden scores of all genes for all UKB individuals using the “--write-mask” option in REGENIE. The annotation file that specifies the functional class of each SNP and the corresponding gene required in this step was provided in the UKB Research Analysis Platform (see https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/using-regenie-to-generate-variant-masks), which included protein-coding genes on autosomes and the X and Y chromosomes (N = 18,845). We chose to include all loss-of-function and missense SNPs to calculate the burden score. In the default setting, the burden score is calculated as the maximum number of alternative alleles across sites of a gene, being 0, 1, or 2 (see REGENIE online documentation for details, https://rgcgithub.github.io/regenie/options/). We then used this gene-level bed file to perform association analysis on the sibling sex following the same procedure described in the “GWAS” section.

Simulating the genetic architecture of sex ratio following that of standing height: To simulate the genetic architecture of sex ratio following that of human standing height, we obtained the hypothetical sex ratio of a participant of European ancestry in the UKB through the following four steps. First, we computed the hypothetical sex ratio of a participant by dividing the participant’s standing height by twice the mean standing height of all UKB participants of European ancestry. Second, we performed a multiple regression on hypothetical sex ratio; the independent variables included genetic sex, age, age squared, and the first ten genetic principal components but not SNPs.
Third, we obtained the regression residual of each participant, which is the difference between the hypothetical sex ratio computed in the first step and that predicted by the multiple regression model in the second step. Fourth, the covariate-corrected hypothetical sex ratio was set to be the regression residual in the preceding step plus 0.5. GWAS was subsequently performed on the covariate-corrected hypothetical sex ratio. SNP-based heritability of the covariate-corrected hypothetical sex ratio was computed. Based on the covariate-corrected hypothetical sex ratio, we generated the sexes of each participant’s offspring with 20 replicates. To ensure comparability with the original GWAS data, we assumed that each participant had the same number of offspring as the number of siblings in the UKB. We then conducted a GWAS using the simulated sexes of all offspring and estimated the SNP-based heritability of the estimated hypothetical sex ratio.

Simulations of human sex ratio evolution: We used SLiM 3 to simulate sex ratio evolution in humans. A non-Wright-Fisher model with separate sexes and non-overlapping generations was enabled in the simulation, along with the human demographic history described by the default example code in SLiM 3 (see SLiM manual, https://messerlab.org/slim/, pp. 136-142). The diploid genome has a pair of 1000-nt chromosomes, and the recombination rate is 1×10^-3 per site per generation such that one recombination per chromosome per generation is expected. In every generation, males and females mate randomly, and each mating results in one offspring. The random mating continues until the number of offspring matches the expected population size in the next generation. To achieve the mutation-drift-selection equilibrium, the population was pre-evolved for 73,105 generations (10 times the effective population size) in every simulation. The mutation rate varied from 1×10^-6 to 1×10^-2 per genome per generation.
The mean mutation size varied from 0.00125 to 0.16. Given the mean mutation size, the actual size of a mutation is sampled from an exponential distribution with that mean. The genetic effect of the mutation is set to be paternal. Thirty simulation replications were performed for each combination of mutation rate and mean size. Under the directional selection scenario, we assumed that the optimal OSR changed from the default value of 0.5 to around 0.52 at 800,000 years before present. To set the optimal OSR at around 0.52, we introduced unbalanced parental investments by reducing the future mating probability of individuals who have had daughters: future mating probability = 1 - 0.1 × number of daughters. The optimal OSR is 0.524, which was estimated by averaging sex ratios at the last 10 time points of 10-generation intervals in all simulations where the mutation rate is 0.01 and the mean mutation size is 0.00125, 0.0025, or 0.005. The heritability of sex ratio (with measurement error) was calculated by dividing the variance of genetically expected sex by the variance of observed sex. To obtain the number of detectable variants, we used the UKB statistical power map generated earlier (Fig. 1c). A SNP was considered detectable if its detectability exceeded 0.9. Key statistics such as the heritability of sex ratio (with measurement error), number of detectable variants, and number of variants in each simulation replicate were calculated by averaging sex ratios at the final 10 time points where consecutive time points were separated by 10 generations. These statistics from the 30 replicates were used to plot the mean, maximum, and minimum in Fig. 4.
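The per-birth coding described in the GWAS part of the Methods (each sibling birth treated as an independent event, male = 1 and female = 0) can be sketched as follows. This is an illustration only, not the authors' pipeline: the function `expand_births`, the dictionary layout and the genotype dosages are hypothetical, while the sibling counts for individuals A and B follow the example in the text.

```python
def expand_births(genotype_dosage, n_brothers, n_sisters):
    """Return one (dosage, sex) record per sibling birth; male = 1, female = 0."""
    return [(genotype_dosage, 1)] * n_brothers + [(genotype_dosage, 0)] * n_sisters


# Sibling counts follow the A/B example in the text; dosages are made up.
individuals = {
    "A": {"dosage": 0, "brothers": 1, "sisters": 0},
    "B": {"dosage": 1, "brothers": 4, "sisters": 1},
}

records = []
for person in individuals.values():
    records.extend(
        expand_births(person["dosage"], person["brothers"], person["sisters"])
    )

# A contributes one record and B contributes five, so larger families
# carry proportionally more weight in the association test.
print(records)
```

With this layout, family size is reflected in the number of rows a participant contributes, which is how the description avoids comparing raw sibling sex ratios that have very different estimation errors.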
Data from: Genome-wide meta-analysis of depression identifies 102...
find.data.gov.scot
txt
Updated Oct 24, 2018
Cite
University of Edinburgh (2018). Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions [Dataset]. http://doi.org/10.7488/ds/2458
Explore at:
Available download formats: txt (347.4 MB), txt (0.0023 MB), txt (0.0166 MB), txt (0.4338 MB)
Unique identifier
https://doi.org/10.7488/ds/2458
Dataset updated
Oct 24, 2018
Dataset provided by
University of Edinburgh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches. The data contained in this item is described in a published manuscript located at https://doi.org/10.1038/s41593-018-0326-7.
Data from: Insights into the genetic basis of retinal detachment
dtechtive.com
find.data.gov.scot
gz, txt
Updated Nov 20, 2019
Cite
MRC Human Genetics Unit IGMM. University of Edinburgh (2019). Insights into the genetic basis of retinal detachment [Dataset]. http://doi.org/10.7488/ds/2712
Explore at:
Available download formats: gz (176.7 MB), txt (0.0166 MB)
Unique identifier
https://doi.org/10.7488/ds/2712
Dataset updated
Nov 20, 2019
Dataset provided by
MRC Human Genetics Unit
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
Dataset of genome-wide association meta-analysis summary statistics associated with the publication 'Insights into the genetic basis of retinal detachment' available at HMG: DOI: 10.1093/hmg/ddz294. If you use this dataset, please cite the manuscript.