LD blocks based on 20,000 European individuals from the UK Biobank (split by chromosome), with about 1.5 million SNPs based on HapMap3 and MEGA chips
UK Biobank is a large-scale biomedical database and research resource that provides researchers access to detailed longitudinal phenotype, medical and genetic data from 500,000 volunteer participants.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Brain ageing is a highly variable, spatially and temporally heterogeneous process, marked by numerous structural and functional changes. These can cause discrepancies between individuals’ chronological age and the apparent age of their brain, as inferred from neuroimaging data. Machine learning models, and particularly Convolutional Neural Networks (CNNs), have proven adept in capturing patterns relating to ageing induced changes in the brain. The differences between the predicted and chronological ages, referred to as brain age deltas, have emerged as useful biomarkers for exploring those factors which promote accelerated ageing or resilience, such as pathologies or lifestyle factors. However, previous studies rely only on structural neuroimaging for predictions, overlooking potentially informative functional and microstructural changes. Here we show that multiple contrasts derived from different MRI modalities can predict brain age, each encoding bespoke brain ageing information. By using 3D CNNs and UK Biobank data, we found that 57 contrasts derived from structural, susceptibility-weighted, diffusion, and functional MRI can successfully predict brain age. For each contrast, different patterns of association with non-imaging phenotypes were found, resulting in a total of 191 unique, statistically significant associations. Furthermore, we found that ensembling data from multiple contrasts results in both higher prediction accuracies and stronger correlations to non-imaging measurements. Our results demonstrate that other 3D contrasts and modalities, which have not been considered so far for the task of brain age prediction, encode different information about the ageing brain. We envision our work as being the starting point for future investigations into the causal links underpinning the observed brain age deltas and non-imaging measurement associations. 
For instance, drug effects can be monitored, given that certain medications correlate with accelerated brain ageing. Furthermore, continued development of brain age models could facilitate their deployment in clinical trials for recruitment and monitoring, and in hospitals for diagnostic and screening tasks.
Data Description
This dataset contains the full correlation results with all nIDPs in the UK Biobank. The results are split by sex into separate datasets for female and male subjects. For easier data manipulation, two smaller datasets are also provided, containing only those correlations that pass the False Discovery Rate (FDR) threshold.
As experiments were also conducted for ensembles using multiple contrasts, similar datasets are provided for those.
Finally, global datasets are also provided. These are the concatenation of the associations contained in the Male and Female datasets.
Paper & Code
The original paper for this article can be accessed here:
https://ieeexplore.ieee.org/abstract/document/10196736
The code for this project is available in the project's GitHub repository:
https://github.com/AndreiRoibu/AgeMapper
If using this work, please cite the above paper, or use the following BibTeX entry:
@inproceedings{roibu2023brain,
  title={Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes},
  author={Roibu, Andrei-Claudiu and Adaszewski, Stanislaw and Schindler, Torsten and Smith, Stephen M and Namburete, Ana IL and Lange, Frederik J},
  booktitle={2023 10th IEEE Swiss Conference on Data Science (SDS)},
  pages={17--25},
  year={2023},
  organization={IEEE},
  doi={10.1109/SDS57534.2023.00010}
}
Data Access
The data for this project is freely available upon application at the UK Biobank. For more information regarding the individual nIDPs, please access the UK Biobank Showcase website at: https://biobank.ctsu.ox.ac.uk/showcase/search.cgi
Funding
ACR is supported by EPSRC Grant EP/S024093/1, F. Hoffmann-La Roche AG and a 2021 Industrial Fellowship offered by the Royal Commission for the Exhibition of 1851. SMS is supported by a Wellcome Trust Collaborative Award 215573/Z/19/Z. AILN is grateful for support from the Academy of Medical Sciences under the Springboard Awards scheme (SBF005/1136), and the Bill and Melinda Gates Foundation. FJL is supported by a Wellcome Trust Collaborative Award (215573/Z/19/Z). The WIN is supported by core funding from the Wellcome Trust (203139/Z/16/Z). The computational aspects were supported by the Wellcome Trust (203141/Z/16/Z) and the NIHR Oxford BRC. Corresponding authors: ACR (andreiroibu@icloud.com), SA (stanislaw.adaszewski@roche.com) and AILN (ana.namburete@cs.ox.ac.uk).
This project aims to leverage the power of UK Biobank to detect rare genetic variants associated with lung function.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary-level data generated by Genomics plc as presented in: Diogo, D. et al. Phenome-wide association studies across large population cohorts support drug target validation. Nat. Commun. 9, 4285 (2018). https://doi.org/10.1038/s41467-018-06540-3
If you have any questions or comments regarding these files, please contact Genomics plc at research@genomicsplc.com
These analyses were carried out using the interim UK Biobank imputation data release. Analyses were restricted to a subset of "white-British" unrelated samples with a maximum sample size of 112,337 individuals.
Case-control phenotypes were defined based on categorical data fields as listed in the accompanying file. Quantitative phenotypes were either rank-normalised before analysis, or beta/se values were standardised after analysis using the variance of the phenotype. The normalisation value is indicated in the accompanying file.
All analyses included age at assessment, sex, genotyping chip, and 10 principal components as covariates.
We used plink1.9 linear or logistic regression as appropriate. For chromosome X variants, males were treated as having 0 or 2 alternative alleles.
The results are not adjusted for genomic control.
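Because the results are released without genomic control, a reader who wants to check for inflation could estimate the genomic inflation factor from the BETA and SE columns. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the test statistic is taken to be (BETA/SE)^2 with an approximate 1-df chi-square distribution, and the adjustment follows the usual convention of inflating standard errors only when lambda exceeds 1.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(betas, ses):
    """Estimate lambda as median((beta/se)^2) / median of chi-square(1)."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_adjust_se(se, lam):
    """Inflate a standard error by sqrt(lambda); leave it unchanged if lambda <= 1."""
    return se * math.sqrt(max(lam, 1.0))
```

In practice the estimate would be computed over all variants for one phenotype, not over a handful of values.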
CHR - Chromosome
SNP - Variant rsID
ALT - Alternative allele (effect allele)
REF - Reference allele (non-effect allele)
BP - Position in base pairs (b37, 1-based)
NMISS - Number of samples with non-missing genotypes
BETA - Effect size (log odds ratio or standardised effect size)
SE - Standard error
P - P-value
F_MISS - Genotype missing rate
P_hwe - Hardy-Weinberg p-value
MAF - ALT allele frequency
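As a rough illustration of how these columns relate to one another, the following sketch parses one row and derives a z-score, a normal-approximation p-value, and (for case-control phenotypes only) an odds ratio. The column order, the whitespace delimiter, and the example row are assumptions made for this sketch, not taken from the files; the P column in the files comes from plink and may differ slightly from the normal approximation.

```python
import math

# Assumed column order (hypothetical): the order in which the fields are described.
COLUMNS = ["CHR", "SNP", "ALT", "REF", "BP", "NMISS",
           "BETA", "SE", "P", "F_MISS", "P_hwe", "MAF"]


def parse_row(line):
    """Split one whitespace-delimited row into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def derived_stats(row):
    """Compute z = BETA/SE, a two-sided normal p-value, and exp(BETA)."""
    beta = float(row["BETA"])
    se = float(row["SE"])
    z = beta / se
    p_normal = math.erfc(abs(z) / math.sqrt(2))
    # exp(BETA) is an odds ratio only when BETA is a log odds ratio.
    return {"z": z, "p_normal": p_normal, "odds_ratio": math.exp(beta)}


# Made-up row for illustration; not a real variant or result.
example = "1 rs0000001 A G 123456 112000 0.0392 0.02 0.05 0.001 0.5 0.25"
stats = derived_stats(parse_row(example))
```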
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Depression is a polygenic trait that causes extensive periods of disability. Previous genetic studies have identified common risk variants which have progressively increased in number with increasing sample sizes of the respective studies. Here, we conduct a genome-wide association study in 322,580 UK Biobank participants for three depression-related phenotypes: broad depression, probable major depressive disorder (MDD), and International Classification of Diseases (ICD, version 9 or 10)-coded MDD. We identify 17 independent loci that are significantly associated (P < 5 × 10^-8) across the three phenotypes. The direction of effect of these loci is consistently replicated in an independent sample, with 14 loci likely representing novel findings. Gene sets are enriched in excitatory neurotransmission, mechanosensory behavior, postsynapse, neuron spine, and dendrite functions. Our findings suggest that broad depression is the most tractable UK Biobank phenotype for discovering genes and gene-sets that further our understanding of the biological pathways underlying depression.
Levels of sex differences for human body size and shape phenotypes are hypothesized to have adaptively reduced following the agricultural transition as part of an evolutionary response to relatively more equal divisions of labor and new technology adoption. In this study, we tested this hypothesis by studying genetic variants associated with five sexually differentiated human phenotypes: height, body mass, hip circumference, body fat percentage, and waist circumference. We first analyzed genome-wide association (GWAS) results for UK Biobank individuals (~197,000 females and ~167,000 males) to identify a total of 119,023 single nucleotide polymorphisms (SNPs) significantly associated with at least one of the studied phenotypes in females, males, or both sexes (P < 5 × 10^-8). From these loci we then identified 3,016 SNPs (2.5%) with significant differences in the strength of association between the female- and male-specific GWAS results at a low false-discovery rate (FDR < 0.001). Genes w...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Research code for the study of muscle mass reduction and the risk of severe MASLD in the UK Biobank population data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome-wide association study summary statistics of email contact and Mental Health Questionnaire participation in UK Biobank. Data in support of the manuscript: 'Factors associated with sharing email information and mental health survey participation in large population cohorts'.

ABSTRACT

BACKGROUND: People who opt to participate in scientific studies tend to be healthier, wealthier, and more educated than the broader population. While selection bias does not always pose a problem for analysing the relationships between exposures and diseases or other outcomes, it can lead to biased effect size estimates. Biased estimates may weaken the utility of genetic findings because the goal is often to make inferences in a new sample (such as in polygenic risk score analysis).

METHODS: We used data from UK Biobank, Generation Scotland, and Partners Biobank and conducted phenotypic and genome-wide association analyses on two phenotypes that reflected mental health data availability: (1) whether participants were contactable by email for follow-up and (2) whether participants responded to follow-up surveys of mental health.

RESULTS: In UK Biobank, we identified nine genetic loci associated (P < 5 × 10^-8) with email contact and 25 loci associated with mental health survey completion. Both phenotypes were positively genetically correlated with higher educational attainment and better health and negatively genetically correlated with psychological distress and schizophrenia. One SNP association replicated along with the overall direction of effect of all association results.

CONCLUSIONS: Recontact availability and follow-up participation can act as further genetic filters for data on mental health phenotypes.
https://spdx.org/licenses/CC0-1.0.html
Pleiotropy and genetic correlation are widespread features in GWAS, but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.

Methods

The details of the dataset processing are provided in our manuscript: https://elifesciences.org/articles/79348

Briefly, we performed GWAS of technically-corrected metabolite levels from the Nightingale NMR Metabolomics dataset on 94,464 European-ancestry individuals and 98,189 individuals in our ancestry-inclusive analysis using BOLT-REML and integrated these results with a curated biochemical map connecting the 16 core metabolites spanning glycolysis, ketones, and amino acids.

Files with names "*_step3.txt" and "*_step2.txt" are the local genetic correlation and local heritability estimates for each approximately independent LD block (Berisa et al. 2016) using rho-HESS (Shi et al. 2017) and HESS (Shi et al. 2016), respectively. These were derived from European-ancestry summary statistics.

Files with names that start with a SNP identifier, "both," or "neither" are the conditional fine-mapping summary statistics from our example loci, generated with the PLINK2 "--condition" option. Please see the manuscript for additional details.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
BOLT-LMM summary statistics for 45 UK Biobank diseases/traits analyzed by TGFM. See README for more details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GCTB sparse shrunk LD matrices from 2.8M common variants from the UK Biobank. Part AA of AA, AB, AC, AD and AE.

To join and unzip these matrices:

Download all parts to one folder from:
Part AA - 10.5281/zenodo.3375373
Part AB - 10.5281/zenodo.3376357
Part AC - 10.5281/zenodo.3376456
Parts AD and AE - 10.5281/zenodo.3376628

Use cat to join:
    cat ukb_50k_bigset_2.8M.zip.part* > ukb_50k_bigset_2.8M.zip

Then unzip:
    unzip ukb_50k_bigset_2.8M.zip

See README for further details.
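On systems without cat and unzip, the same join-and-extract steps can be approximated with the Python standard library. This is a minimal sketch, not part of the dataset's own tooling: the function names are hypothetical, and it assumes the downloaded parts follow the `ukb_50k_bigset_2.8M.zip.part*` naming implied by the cat command, so that sorting the file names reproduces the shell's glob order.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(folder, archive_name="ukb_50k_bigset_2.8M.zip"):
    """Concatenate <archive_name>.part* files in sorted order into one archive."""
    folder = Path(folder)
    parts = sorted(folder.glob(archive_name + ".part*"))
    if not parts:
        raise FileNotFoundError(f"no parts matching {archive_name}.part* in {folder}")
    target = folder / archive_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large files are never held fully in memory.
                shutil.copyfileobj(src, out)
    return target


def extract(archive, dest):
    """Unpack the joined zip archive into dest."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```

A call such as `extract(join_parts("downloads"), "downloads/ld")` would mirror the two shell commands, where both folder names are placeholders.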
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Identifying causal variants from genome-wide association studies (GWAS) is challenging due to widespread linkage disequilibrium (LD) and the possible existence of multiple causal variants in the same genomic locus. Functional annotations of the genome may help to prioritize variants that are biologically relevant and thus improve fine-mapping of GWAS results. Classical fine-mapping methods conducting an exhaustive search of variant-level causal configurations have a high computational cost, especially when the underlying genetic architecture and LD patterns are complex. SuSiE provided an iterative Bayesian stepwise selection algorithm for efficient fine-mapping. In this work, we build connections between SuSiE and a paired mean field variational inference algorithm through the implementation of a sparse projection, and propose effective strategies for estimating hyperparameters and summarizing posterior probabilities. Moreover, we incorporate functional annotations into fine-mapping by jointly estimating enrichment weights to derive functionally-informed priors. We evaluate the performance of SparsePro through extensive simulations using resources from the UK Biobank. Compared to state-of-the-art methods, SparsePro achieved improved power for fine-mapping with reduced computation time. We demonstrate the utility of SparsePro through fine-mapping of five functional biomarkers of clinically relevant phenotypes. In summary, we have developed an efficient fine-mapping method for integrating summary statistics and functional annotations. Our method can have wide utility in understanding the genetics of complex traits and increasing the yield of functional follow-up studies of GWAS. SparsePro software is available on GitHub at https://github.com/zhwm/SparsePro.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sex-stratified GWAS can help shed light on sexual differences in genetic architecture. In Bernabeu et al (2021) we fit sex-stratified linear mixed models (using DISSECT) across a total of 530 phenotypes to assess the effects of sex on genetic effect estimates, and compared estimates between males and females in a search for genetic variants that presented significant differences in association to the traits considered. Here, the summary statistics of said efforts, pertaining to non-binary traits, are included. Each file contains the results for a single non-binary trait, as stated in the file name, using its corresponding UK Biobank trait code. Trait descriptions, including their respective UK Biobank codes, are stated in the 'trait_description.tsv' file. For each trait (each .gz file), GWAS summary statistics obtained for over 9 million genetic variants across the genome (both autosomal, and X chromosome) and circa 450K individuals, as well as the results of the t-test comparing genetic effect estimates between the sexes, are included.
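To make the idea of comparing effect estimates between the sexes concrete, the sketch below computes a simple difference statistic from female and male betas and standard errors. It is an assumption-laden illustration: the exact form of the t-test used for these files is not given here, so the code uses the common large-sample z approximation for two independent estimates, and the function name is hypothetical.

```python
import math


def sex_difference_z(beta_f, se_f, beta_m, se_m):
    """Return (z, two-sided p) for the difference between two independent effect estimates."""
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    # Normal approximation; with circa 450K individuals a t-test would be very close.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

Equal estimates in both sexes give z = 0 and p = 1, while a large difference relative to the combined standard error gives a small p.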
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Objective: In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these for identifying incident strokes.
Methods: In a regional UKB sub-population (n=17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care or death record data. Stroke physicians reviewed their full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of cases that were true positives (i.e. positive predictive value, PPV) for all codes combined and by code source and type.
Results: Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were: 30% hospital admission only; 39% primary care only; 28% hospital and primary care; 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathological type to be assigned in >99%. PPVs (95% confidence intervals) were: 79% (73%-84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%-90%) for ischemic stroke. PPVs for small numbers of death record and hemorrhagic stroke codes were low but imprecise.
Conclusions: Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types.
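The positive predictive value reported above is the share of coded cases confirmed by the reference standard. The sketch below shows that calculation together with one common way to attach a 95% confidence interval. The Wilson score interval is an assumption here, since the interval method used in the study is not stated, and the counts in the usage note are hypothetical.

```python
import math


def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed cases / all coded cases."""
    return true_positives / (true_positives + false_positives)


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For example, with a hypothetical 80 confirmed cases out of 100 coded cases, `ppv(80, 20)` is 0.8 and `wilson_interval(80, 100)` is roughly (0.71, 0.87).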
Levels of sociability are continuously distributed in the general population, and decreased sociability represents an early manifestation of several brain disorders. Here, we investigated the genetic underpinnings of sociability in the population.
Main questions of our research:
1. Are there common genetic variants that are associated with sociability in the general population?
2. Are genetic variants that are associated with sociability also associated with neuropsychiatric disorders?
Type of data uploaded in this repository: The UK Biobank project (see https://www.ukbiobank.ac.uk/) is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. The raw data that this project is based on come from the publicly available UK Biobank set, which is very large and is therefore not provided here. Here we provide only the results of our analysis, which is also described here: https://www.biorxiv.org/content/10.1101/781195v2 and is currently in revision at a scientific journal. In the dataset you will find the association of 9,327,396 genetic variants with the phenotype sociability. This dataset is not suitable for opening in Excel and is best opened on a cluster computer or with specific software.
Subjects
The UK Biobank (UKBB) is a major population-based cohort from the United Kingdom that includes individuals aged between 37 and 73 years. We constructed a sociability measure based on the aggregation of scores per participant on four questions from the UKBB database that link to sociability, including (1) a question about the frequency of friend/family visits, (2) a question on the number and type of social venues that are visited, (3) a question about worrying after social embarrassment and (4) a question about feeling lonely, leading to a sociability score ranging from 0 to 4. Participants were excluded if they had somatic problems that could be related to social withdrawal (BMI < 15 or BMI > 40, narcolepsy (all the time), stroke, severe tinnitus, deafness or brain-related cancers) or if they answered "No friends/family outside household", "Do not know" or "Prefer not to answer" to any of the questions.
SNP genotyping and quality control
Details about the available genome-wide genotyping data for UKBB participants have been reported previously (PMID: 30305743). We used third-release genotyping data (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Briefly, 49,950 participants were genotyped using the UK BiLEVE Axiom Array and 438,427 participants were genotyped using the UK Biobank Axiom Array. Genotypes were imputed into the dataset using the Haplotype Reference Consortium (HRC) and the UK10K haplotype resource. To account for ethnicity, we included only those individuals who identified themselves as "white" by self-report, plotted the Principal Components (PCs) provided by the UKBB, and excluded individuals considered to be outliers according to PCs 1 and 2. Genetic relatedness calculated with KING kinship and provided by the UKBB (https://kenhanscombe.github.io/ukbtools/articles/explore-ukb-data.html ; http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/UKBiobank_genotyping_QC_documentation-web.pdf) was used to identify first- and second-degree relatives. Subsequently, "families" (i.e. clusters of related individuals above an IBD > 0.125 threshold) were created and only one individual from each of these "families" was included in the analysis. If self-reported sex and SNP-based sex differed, individuals were excluded from further analysis. Single nucleotide polymorphisms (SNPs) with minor allele frequency < 0.005, Hardy-Weinberg equilibrium test P value < 1e-6, missing genotype rate > 0.05, or imputation quality of INFO < 0.8 were excluded. In the current study, all analyses are based on 342,461 participants of European ancestry for whom both genotype data and sociability scores were available.
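The variant-level exclusion criteria above can be expressed as a single predicate. The sketch below is illustrative only: the `VariantQC` record and its field names are hypothetical, while the four thresholds are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass
class VariantQC:
    """Per-variant quality metrics (hypothetical field names)."""
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg equilibrium test p-value
    missing_rate: float  # missing genotype rate
    info: float          # imputation INFO score


def passes_qc(v):
    """Keep a variant only if none of the stated exclusion criteria apply."""
    return (
        v.maf >= 0.005
        and v.hwe_p >= 1e-6
        and v.missing_rate <= 0.05
        and v.info >= 0.8
    )
```

A variant sitting exactly on a threshold is kept, because the text excludes only values strictly below (or, for missingness, strictly above) each cut-off.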
Genome-wide association analysis
Genome-wide association analysis with the imputed marker dosages was performed in PLINK1.9, using a linear regression model with the sociability measure as the dependent variable and including sex, age, the first 10 PCs, assessment center, and genotype batch as covariates. SNPs were considered significantly associated if they had a p-value < 5e-8. Associated loci were considered independent of each other at an r2 threshold of 0.6, and lead SNPs were defined as the SNPs with the smallest association p-value at an r2 threshold of 0.1, using a 250 kb window. The summary statistics come from the plink2 linear regression analysis.
Summary-level data as presented in: "Meta-analysis of genome-wide association studies for body fat distribution in 694,649 individuals of European ancestry." Pulit, SL et al. bioRxiv, 2018. https://www.biorxiv.org/content/early/2018/04/18/304030

If you use these data, please cite the above preprint. If you have any questions or comments regarding these files, please contact me: Sara L Pulit, spulit@well.ox.ac.uk or s.l.pulit@umcutrecht.nl

(1) Data files

i. whradjbmi.giant-ukbb.meta-analysis.combined.23May2018.txt
   Meta-analysis of waist-to-hip ratio adjusted for body mass index (whradjbmi) in UK Biobank and GIANT data. Combined set of samples, max N = 694,649.
ii. whradjbmi.giant-ukbb.meta-analysis.females.23May2018.txt
   Meta-analysis of whradjbmi in UK Biobank and GIANT data. Female samples only, max N = 379,501.
iii. whradjbmi.giant-ukbb.meta-analysis.males.23May2018.txt
   Meta-analysis of whradjbmi in UK Biobank and GIANT data. Male samples only, max N = 315,284.
iv. whr.giant-ukbb.meta-analysis.combined.23May2018.txt
   Meta-analysis of waist-to-hip ratio (whr) in UK Biobank and GIANT data. Combined set of samples, max N = 697,734.
v. whr.giant-ukbb.meta-analysis.females.23May2018.txt
   Meta-analysis of whr in UK Biobank and GIANT data. Female samples only, max N = 381,152.
vi. whr.giant-ukbb.meta-analysis.males.23May2018.txt
   Meta-analysis of whr in UK Biobank and GIANT data. Male samples only, max N = 316,772.
vii. bmi.giant-ukbb.meta-analysis.combined.23May2018.txt
   Meta-analysis of body mass index (bmi) in UK Biobank and GIANT data. Combined set of samples, max N = 806,834.
viii. bmi.giant-ukbb.meta-analysis.females.23May2018.txt
   Meta-analysis of bmi in UK Biobank and GIANT data. Female samples only, max N = 434,794.
ix. bmi.giant-ukbb.meta-analysis.males.23May2018.txt
   Meta-analysis of bmi in UK Biobank and GIANT data. Male samples only, max N = 374,756.

(2) Data file format

CHR: Chromosome
POS: Chromosomal position of the SNP, build hg19
SNP: the dbSNP151 identifier of the SNP, followed by the first allele and second allele of the SNP, delimited with a colon. A small number of SNPs (<9,000) from the GIANT data had no dbSNP151 identifier, and are left as just an rsID. Note that these SNPs are also missing chromosome and position information (not provided in the GIANT data).
Tested_Allele: the allele for which all association statistics are reported
Other_Allele: the other allele at the SNP
Freq_Tested_Allele: frequency of the tested allele
BETA: the effect size of the tested allele
SE: the standard error of the beta
P: the p-value of the SNP, as reported from the inverse variance-weighted fixed effects meta-analysis
N: the total sample size for this SNP
INFO: the imputation quality (info score) of the SNP, as reported by UK Biobank. A number between 0 and 1 indicating quality of imputation (0, poor quality; 1, high quality or genotyped). Note that the summary-level GIANT data does not report info score, so SNPs appearing only in the GIANT analysis do not have info scores.
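The P column is described as coming from an inverse variance-weighted fixed effects meta-analysis. As a minimal sketch of that general technique (not the authors' actual pipeline, whose software is not named here), the function below combines per-study betas and standard errors; the function name and the normal-approximation p-value are assumptions of this example.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p
```

With two equally precise studies the pooled beta is the plain average of the two estimates, and the pooled standard error shrinks by a factor of sqrt(2).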
https://spdx.org/licenses/CC0-1.0.html
The human sex ratio (fraction of males) at birth is close to 0.5 at the population level, an observation commonly explained by Fisher's principle. However, past human studies yielded conflicting results regarding the existence of sex ratio-influencing mutations-a prerequisite to Fisher’s principle, raising the question of whether the nearly even population sex ratio is instead dictated by the random X/Y chromosome segregation in male meiosis. Here we show that, because a person’s offspring sex ratio (OSR) has an enormous measurement error, a gigantic sample is required to detect OSR-influencing genetic variants. Conducting a UK Biobank-based genome-wide association study that is more powerful than previous studies, we detect an OSR-associated genetic variant, which awaits verification in independent samples. Given the abysmal precision in measuring OSR, it is unsurprising that the estimated heritability of OSR is effectively zero. We further show that OSR’s estimated heritability would remain virtually zero even if OSR is as genetically variable as the highly heritable human standing height. These analyses, along with simulations of human sex ratio evolution under selection, demonstrate the compatibility of the observed genetic architecture of human OSR with Fisher’s principle and suggest the plausibility of presence of multiple human OSR-influencing genetic variants. Methods GWAS: When conducting the GWAS in the UKB, we did not simply use the sibling sex ratio as the trait, because of the difficulty in accounting for different estimation errors of the sibling sex ratio for different families as a result of the variation in family size. For example, individual A has one brother and zero sister, while individual B has four brothers and one sister. Although A has a higher sibling sex ratio than B, B’s siblings obviously provide stronger evidence for a male-biased sibling sex ratio than A’s siblings. 
To properly weigh the data by the family size, we considered the birth of each sibling as an independent event. In the above example, we would associate A’s genotype with one male birth and associate B’s genotype with four male births and one female birth. In GWAS, a male birth is coded as 1 and a female birth is coded as 0. The UKB participants have a total of 873,715 full siblings, leading to an unprecedented statistical power. In our GWAS in the UKB, we included genetic sex, year of birth, and the first ten genetic principle components as covariates. Gene-based test: We performed two gene-based association analyses. First, we analyzed the UKB-based GWAS summary statistics through the R package sumFREGAT for autosomal protein-coding genes (N = 17,389). All SNPs within the transcribed region of a gene derived from the European samples in the 1000 Genome Project were used in the test. We implemented the optimal unified test (SKAT-O), principal component analysis-based test (PCA), and aggregated Cauchy association test (ACAT-V) in sumFREGAT. For all three methods, weights were uniformly assigned for all alleles [beta.par = c(1, 1) in sumFREGAT] with other settings left at default values. Variant correlation matrix files (one file per gene) were needed for the gene-based analysis, and we used the pre-calculated matrices from 1KG European samples provided by the R-package development team (http://mga.bionet.nsc.ru/sumFREGAT). The input data were pre-processed using the R package function prep.score.files() with the reference file provided by the R-package development team (http://mga.bionet.nsc.ru/sumFREGAT). The P values in the three tests were then combined by the omnibus aggregated Cauchy association test (ACAT-O) in sumFREGAT. Second, we performed a gene-based burden test using rare missense variants (MAF < 1%) in the UKB whole exome sequencing data. 
The burden test assumes that rare variants are functionally disruptive and therefore have the same direction of effect. To properly weigh OSR of UKB participants by their heterogenous measurement errors, we generated a plink bed file that contained burden scores of all genes for all UKB individuals using the “--write-mask” option in REGENIE. The annotation file that specifies the functional class of each SNP and the corresponding gene required in this step was provided in the UKB Research Analysis Platform (see https://dnanexus.gitbook.io/uk-biobank-rap/science-corner/using-regenie-to-generate-variant-masks), which included protein coding genes in autosomes, X, and Y chromosomes (N = 18,845). We chose to include all loss-of-function and missense SNPs to calculate the burden score. In the default setting, the burden score is calculated as the maximum number of alternative alleles across sites of a gene, being 0, 1, or 2 (see REGENIE online documentation for details, https://rgcgithub.github.io/regenie/options/). We then used this gene-level bed file to perform association analysis on the sibling sex following the same procedure describe in the “GWAS” section. Simulating the genetic architecture of sex ratio following that of standing height To simulate the genetic architecture of sex ratio following that of human standing height, we obtained the hypothetical sex ratio of a participant of European ancestry in the UKB through the following four steps. First, we computed the hypothetical sex ratio of a participant by dividing the participant’s standing height by twice the mean standing height of all UKB participants of European ancestry. Second, we performed a multiple regression on hypothetical sex ratio; the independent variables included genetic sex, age, age squared, and the first ten genetic principal components but not SNPs. 
Third, we obtained the regression residual of each participant, which is the difference between the hypothetical sex ratio computed in the first step and that predicted by the multiple regression model in the second step. Fourth, the covariate-corrected hypothetical sex ratio was set to the regression residual from the preceding step plus 0.5. A GWAS was subsequently performed on the covariate-corrected hypothetical sex ratio, and its SNP-based heritability was computed. Based on the covariate-corrected hypothetical sex ratio, we generated the sexes of each participant’s offspring with 20 replicates. To ensure comparability with the original GWAS data, we assumed that each participant had the same number of offspring as the number of siblings in the UKB. We then conducted a GWAS using the simulated sexes of all offspring and estimated the SNP-based heritability of the estimated hypothetical sex ratio.

Simulations of human sex ratio evolution

We used SLiM 3 to simulate sex ratio evolution in humans. A non-Wright-Fisher model with separate sexes and non-overlapping generations was enabled in the simulation, along with the human demographic history described by the default example code in SLiM 3 (see SLiM manual, https://messerlab.org/slim/, p. 136-142). The diploid genome has a pair of 1000-nt chromosomes, and the recombination rate is 1×10⁻³ per site per generation, such that one recombination per chromosome per generation is expected. In every generation, males and females mate randomly, and each mating results in one offspring. Random mating continues until the number of offspring matches the expected population size in the next generation. To achieve mutation-drift-selection equilibrium, the population was pre-evolved for 73,105 generations (10 times the effective population size) in every simulation. The mutation rate varied from 1×10⁻⁶ to 1×10⁻² per genome per generation.
The mean mutation size varied from 0.00125 to 0.16. Given the mean size, the actual size of each mutation is sampled from an exponential distribution with that mean. The genetic effect of the mutation is set to be paternal. Thirty simulation replicates were performed for each combination of mutation rate and mean mutation size. Under the directional selection scenario, we assumed that the optimal OSR changed from the default value of 0.5 to around 0.52 at 800,000 years before present. To set the optimal OSR at around 0.52, we introduced unbalanced parental investment by reducing the future mating probability of individuals who have had daughters: future mating probability = 1 – 0.1 × number of daughters. The resulting optimal OSR is 0.524, which was estimated by averaging sex ratios at the last 10 time points, taken at 10-generation intervals, in all simulations where the mutation rate is 0.01 and the mean mutation size is 0.00125, 0.0025, or 0.005. The heritability of sex ratio (with measurement error) was calculated by dividing the variance of genetically expected sex by the variance of observed sex. To obtain the number of detectable variants, we used the UKB statistical power map generated earlier (Fig. 1c). A SNP was considered detectable if its detectability exceeded 0.9. Key statistics in each simulation replicate, such as the heritability of sex ratio (with measurement error), the number of detectable variants, and the number of variants, were calculated by averaging over the final 10 time points, with consecutive time points separated by 10 generations. These statistics from the 30 replicates were used to plot the mean, maximum, and minimum in Fig. 4.
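The three quantitative rules in the simulation description above (exponentially distributed mutation sizes, the daughter-dependent mating probability, and heritability as a ratio of variances) can be illustrated with a minimal Python sketch. This is not the SLiM model used in the study. The function names, the toy population with one birth per father, the random sign of each mutation, and the clamping of probabilities are all assumptions made here for illustration.

```python
import random
import statistics


def sample_mutation_size(mean_size, rng):
    """Draw one mutation size from an exponential distribution with the given mean."""
    # random.expovariate takes the rate (1 / mean), not the mean itself.
    return rng.expovariate(1.0 / mean_size)


def future_mating_probability(n_daughters):
    """Rule quoted in the text: 1 - 0.1 * number of daughters.

    Clamping at zero is an assumption of this sketch, not stated in the text.
    """
    return max(0.0, 1.0 - 0.1 * n_daughters)


def heritability_with_measurement_error(expected_sex, observed_sex):
    """Variance of genetically expected sex divided by variance of observed sex."""
    return statistics.pvariance(expected_sex) / statistics.pvariance(observed_sex)


def toy_replicate(n_births=10000, mean_size=0.02, seed=1):
    """Hypothetical toy population: one birth per father, no selection or drift."""
    rng = random.Random(seed)
    expected, observed = [], []
    for _ in range(n_births):
        # Assumed genetic model: a single paternal mutation shifts the
        # probability of a son away from 0.5 in a random direction.
        shift = sample_mutation_size(mean_size, rng) * rng.choice((-1, 1))
        p_son = min(1.0, max(0.0, 0.5 + shift))
        expected.append(p_son)
        observed.append(1 if rng.random() < p_son else 0)  # 1 = male, 0 = female
    return heritability_with_measurement_error(expected, observed)


if __name__ == "__main__":
    print(toy_replicate())
```

Because each observed birth is a single binary outcome, the variance of observed sex stays close to 0.25 whatever the genetic variance, which is why this heritability "with measurement error" is small even when the genetically expected sex ratio varies among fathers.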
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches. The data contained in this item is described in a published manuscript located at https://doi.org/10.1038/s41593-018-0326-7.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of genome-wide association meta-analysis summary statistics associated with the publication 'Insights into the genetic basis of retinal detachment' available at HMG: DOI: 10.1093/hmg/ddz294. If you use this dataset, please cite the manuscript.
LD blocks based on 20,000 European individuals from the UK Biobank (split by chromosome), with about 1.5 million SNPs based on HapMap3 and MEGA chips