Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; to query disease diagnoses for one or more individuals and estimate disease frequency relative to a reference variable; and to retrieve genetic metadata. Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file, will rapidly enable UKB users to undertake their research.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Summary-level data as presented in: "Meta-analysis of genome-wide association studies for body fat distribution in 694,649 individuals of European ancestry." Pulit, SL et al. bioRxiv, 2018. https://www.biorxiv.org/content/early/2018/04/18/304030

**If you use these data, please cite the above preprint. If you have any questions or comments regarding these files, please contact me: Sara L Pulit spulit@well.ox.ac.uk or s.l.pulit@umcutrecht.nl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(1) Data files

i. whradjbmi.giant-ukbb.meta-analysis.combined.23May2018.txt
   Meta-analysis of waist-to-hip ratio adjusted for body mass index (whradjbmi) in UK Biobank and GIANT data. Combined set of samples, max N = 694,649.
ii. whradjbmi.giant-ukbb.meta-analysis.females.23May2018.txt
   Meta-analysis of whradjbmi in UK Biobank and GIANT data. Female samples only, max N = 379,501.
iii. whradjbmi.giant-ukbb.meta-analysis.males.23May2018.txt
   Meta-analysis of whradjbmi in UK Biobank and GIANT data. Male samples only, max N = 315,284.
iv. whr.giant-ukbb.meta-analysis.combined.23May2018.txt
   Meta-analysis of waist-to-hip ratio (whr) in UK Biobank and GIANT data. Combined set of samples, max N = 697,734.
v. whr.giant-ukbb.meta-analysis.females.23May2018.txt
   Meta-analysis of whr in UK Biobank and GIANT data. Female samples only, max N = 381,152.
vi. whr.giant-ukbb.meta-analysis.males.23May2018.txt
   Meta-analysis of whr in UK Biobank and GIANT data. Male samples only, max N = 316,772.
vii. bmi.giant-ukbb.meta-analysis.combined.23May2018.txt
   Meta-analysis of body mass index (bmi) in UK Biobank and GIANT data. Combined set of samples, max N = 806,834.
viii. bmi.giant-ukbb.meta-analysis.females.23May2018.txt
   Meta-analysis of bmi in UK Biobank and GIANT data. Female samples only, max N = 434,794.
ix. bmi.giant-ukbb.meta-analysis.males.23May2018.txt
   Meta-analysis of bmi in UK Biobank and GIANT data. Male samples only, max N = 374,756.

(2) Data file format

CHR: Chromosome
POS: Chromosomal position of the SNP, build hg19
SNP: the dbSNP151 identifier of the SNP, followed by the first allele and second allele of the SNP, delimited with a colon. A small number of SNPs (<9,000) from the GIANT data had no dbSNP151 identifier, and are left as just an rsID. Note that these SNPs are also missing chromosome and position information (not provided in the GIANT data).
Tested_Allele: the allele for which all association statistics are reported
Other_Allele: the other allele at the SNP
Freq_Tested_Allele: frequency of the tested allele
BETA: the effect size of the tested allele
SE: the standard error of the beta
P: the p-value of the SNP, as reported from the inverse variance-weighted fixed effects meta-analysis
N: the total sample size for this SNP
INFO: the imputation quality (info score) of the SNP, as reported by UK Biobank. A number between 0 and 1 indicating quality of imputation (0, poor quality; 1, high quality or genotyped). Note that the summary-level GIANT data does not report info score, so SNPs appearing only in the GIANT analysis do not have info scores.
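The SNP column described above packs up to three values into one field, so a small parser can help when loading these files. The following is a minimal Python sketch, not part of the original README. It assumes the files are whitespace-delimited with a header row and that missing values are written as a placeholder rather than left blank (neither is stated above); the function names and the demo row are made up for illustration.

```python
import io


def parse_snp_field(snp):
    """Split 'rsID:allele1:allele2' into its parts.

    Entries that are just an rsID (the GIANT-only SNPs without a
    dbSNP151 identifier) are returned with None for both alleles.
    """
    parts = snp.split(":")
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    return snp, None, None


def read_summary_stats(handle):
    """Read a whitespace-delimited table with a header row into dicts.

    Assumption: every row has one token per header column.
    """
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        row["rsid"], row["allele1"], row["allele2"] = parse_snp_field(row["SNP"])
        rows.append(row)
    return rows


# Made-up demo row; the values are illustrative, not taken from the data files.
demo = io.StringIO(
    "CHR POS SNP Tested_Allele Other_Allele Freq_Tested_Allele BETA SE P N INFO\n"
    "1 1000 rs0000001:A:G A G 0.25 0.01 0.002 5.7e-07 694649 0.98\n"
)
rows = read_summary_stats(demo)
```

To read a real file, pass an open file handle, for example `read_summary_stats(open(path))`, after checking the delimiter actually used in the download.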
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches. The data contained in this item is described in a published manuscript located at https://doi.org/10.1038/s41593-018-0326-7.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aggregated UK Biobank clinical assessments and neuroimaging biomarkers.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"meta.tar" archive contains 249 files representing UK Biobank + Estonian Biobank sex-combined GWAS meta-analysis summary statistics for the Nightingale panel of 249 circulating plasma metabolic markers presented in "Pleiotropic and sex-specific genetic architecture of circulating metabolic markers" [https://doi.org/10.1101/2024.07.30.24311254].
Each file contains nine columns:
SNP: ID of the genetic marker;
CHR: chromosome code (GRCh37 genomic build);
BP: base-pair coordinate (GRCh37 genomic build);
PVAL: regression p-value;
A1: effect allele;
A2: other allele;
N: sample size;
BETA: regression coefficient for effect allele (A1);
SE: standard error of regression coefficient (BETA).
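As a sanity check on a downloaded file, the PVAL column can be compared with a p-value recomputed from BETA and SE. The sketch below assumes the reported p-values come from a two-sided test of BETA/SE against a standard normal distribution; the description above does not state which test was used, so treat any mismatch as a prompt to consult the paper rather than as an error. The function names are illustrative.

```python
import math


def z_score(beta, se):
    """Test statistic: regression coefficient divided by its standard error."""
    return beta / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal null.

    Uses the identity P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For a row with columns BETA and SE, `two_sided_p(z_score(beta, se))` can then be compared with PVAL, allowing for rounding in the published values.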
https://ega-archive.org/dacs/EGAC50000000050
This is a meta-analysis of myeloma datasets, both with and without the UK Biobank cohort included.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Library of the 55 different classification and regression machine-learning algorithms used by the ensemble predictor SuperLearner (SL.library) in the CBDA 2.0 implementation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The total number of subsamples M = 5,000.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the results of a genome-wide association study of back pain. Two files contain association summary statistics: a discovery GWAS based on 350,000 white British individuals from the UK Biobank, and a meta-analysis GWAS combining the same 350,000 individuals with an additional 103,862 individuals of European ancestry from the UK Biobank (total N = 453,862). The back pain phenotype was defined by the answer UK Biobank participants gave to the following question: "Pain type(s) experienced in last month". Those who reported "Back pain" were considered cases; all others were considered controls. Individuals who did not reply, or who replied "Prefer not to answer" or "Pain all over the body", were excluded. This dataset is also available for graphical exploration in the genomic context at http://gwasarchive.org.
The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilisation of data available prior to publication, you agree to respect the requested responsibilities of resource users under 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant. This research has been conducted using the UK Biobank Resource and the use of the data is guided by the principles formulated by the UK Biobank.
When using downloaded data, please cite corresponding paper and this repository:
Funding:
This study was supported by the European Community’s Seventh Framework Programme funded project PainOmics (Grant agreement # 602736).
The research has been conducted using the UK Biobank Resource (project # 18219).
The development of software implementing SMR/HEIDI test and database for GWAS results was supported by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Dr. Suri’s time for this work was supported by VA Career Development Award # 1IK2RX001515 from the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development Service. The contents of this work do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Dr. Tsepilov’s time for this work was supported in part by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Column headers - discovery (350K)
Column headers - meta-analysis (450K)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAS summary statistics from anxiety disorders meta-analysis (combining genome-wide associations from UK Biobank, iPSYCH, ANGST, MVP, and FINNGEN)
Summary statistics are stored as compressed .zip files. Unzipped files can be opened with any text editor or Excel, or read into the R programming environment (or equivalent data analysis software). Note that the files are space-separated and have a header row. Each row in the files contains information for a single SNP. All authors have approved sharing the summary statistics of the analysis. The original individual datasets are available only on request from the original authors.
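As one example of "equivalent data analysis software", the files can be read directly from the .zip archive with the Python standard library. This is a minimal sketch: the function name, the archive member name and the column names in the demo are hypothetical, since the actual header is not listed above. It relies only on the two stated properties, a space delimiter and a header row.

```python
import csv
import io
import zipfile


def read_space_separated(source, member=None):
    """Read one space-separated text file from a zip archive into dicts.

    `source` may be a path or a binary file-like object. If `member`
    is not given, the first file in the archive is used.
    """
    with zipfile.ZipFile(source) as archive:
        name = member if member is not None else archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # The header row supplies the dictionary keys.
            return list(csv.DictReader(text, delimiter=" "))


# Self-contained demo using an in-memory archive with made-up contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.txt", "SNP A1 A2 BETA\nrs0000001 A G 0.01\n")
buffer.seek(0)
rows = read_space_separated(buffer)
```

All values are returned as strings, so numeric columns need an explicit `float()` or `int()` conversion. Runs of several consecutive spaces would produce empty fields with this delimiter setting, which is worth checking on the real files.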
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A total of 12 novel testosterone-associated loci with leading bi-allelic SNPs uniquely identified by the recommended T2,metaQ but missed by any other methods in the UK Biobank data application.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Background: Lean body mass is a crucial physiological component of body composition. Although lean body mass has a high heritability, studies evaluating the genetic determinants of lean mass (LM) have to date been limited largely to genome-wide association studies (GWAS) and common variants. Using whole genome sequencing (WGS)-based studies, we aimed to discover novel genetic variants associated with LM in population-based cohorts with multiple ancestries. Results: We describe the largest WGS-based meta-analysis of lean body mass to date, encompassing 10,729 WGS samples from six TOPMed cohorts and the Louisiana Osteoporosis Study (LOS) cohort, measured with dual-energy X-ray absorptiometry. We identify seven genome-wide loci significantly associated with LM not reported by previous GWAS. We partially replicate these associations in UK Biobank samples. In rare variant analysis, we discover one novel protein-coding gene, DMAC1, associated with both whole-body LM and appendicular LM in females, and a long non-coding RNA gene linked to appendicular LM in males. Both genes exhibit notably high expression levels in skeletal muscle tissue. We investigate the functional roles of two novel lean-mass-related genes, EMP2 and SSUH2, in animal models. EMP2 deficiency in Drosophila leads to significantly reduced mobility without altering muscle tissue or body fat morphology, whereas an SSUH2 gene mutation in zebrafish stimulates muscle fiber growth. Conclusions: Our comprehensive analysis, encompassing a large-scale WGS meta-analysis and functional investigations, reveals novel genomic loci and genes associated with lean mass traits, offering new insights into pathways influencing muscle metabolism and muscle mass regulation.
This dataset includes summary statistics from Raynaud's syndrome meta-analysis conducted from GWAS summary statistics of four independent population cohorts: The UK Biobank, FinnGen data freeze 10, The Estonian Biobank and The Mass-General Brigham Biobank. Details of the phenotype used and the GWAS performed in the individual cohorts can be found from the corresponding publication. The meta-analysis was conducted using METAL (https://genome.sph.umich.edu/wiki/METAL_Documentation) with the standard settings and the final summary statistics are displayed in GRCh38 (for further details, see the corresponding publication).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Recent studies have demonstrated the relevance of circulating factors in the occurrence and development of colorectal cancer (CRC); however, the causal relationship remains unclear. Methods: Summary-level data for CRC were obtained from the UK Biobank (5,657 cases and 372,016 controls), FinnGen cohort (3,022 cases and 215,770 controls), and BioBank Japan Project (BBJ, 7,062 cases and 195,745 controls). Thirty-two peripheral markers with consistent definitions were collected from the three biobanks. Mendelian randomization (MR) was used to evaluate the causal effect of circulating factors on CRC. The effects from the three consortiums were combined using trans-ancestry meta-analysis methods. Results: Our analysis provided compelling evidence for the causal association of higher genetically predicted eosinophil cell count (EOS, odds ratio [OR], 0.8639; 95% confidence interval [CI] 0.7922–0.9421) and red cell distribution width (RDW, OR, 0.9981; 95% CI, 0.9972–0.9989) levels with a decreased risk of CRC. Additionally, we found suggestive evidence indicating that higher levels of total cholesterol (TC, OR, 1.0022; 95% CI, 1.0002–1.0042) may increase the risk of CRC. Conversely, higher levels of platelet count (PLT, OR, 0.9984; 95% CI, 0.9972–0.9996), total protein (TP, OR, 0.9445; 95% CI, 0.9037–0.9872), and C-reactive protein (CRP, OR, 0.9991; 95% CI, 0.9983–0.9999) may confer a protective effect against CRC. Moreover, we identified six ancestry-specific causal factors, indicating the necessity of considering patients’ ancestry backgrounds before formulating prevention strategies. Conclusions: MR findings support the independent causal roles of circulating factors in CRC, which might provide a deeper insight into early detection of CRC and supply potential preventative strategies.
Globally, tuberculosis (TB) presents with a clear male bias that cannot be completely accounted for by environment, behaviour, socioeconomic factors, or the impact of sex hormones on the immune system. This suggests that genetic and biological differences, which may be mediated by the X chromosome, further influence the observed male sex bias. The X chromosome is heavily implicated in immune function and yet has largely been ignored in previous association studies. Here we report the first multi-ancestry X chromosome specific meta-analysis on TB susceptibility. We identified X-linked TB susceptibility variants using seven genotyping data sets and 20,255 individuals from diverse genetic ancestries. Sex-specific effects were also identified in polygenic heritability between males and females along with enhanced concordance in direction of genetic effects for males but not females. These sex-specific genetic effects were supported by a sex-stratified and combined meta-analysis conducted us...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A total of 19 bi-allelic SNPs exhibiting genome-wide significant effects in opposite directions between sexes, all associated with urate levels in the UK Biobank data application.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises summary statistics from the meta-GWAS of 17 studies on age-related hearing impairment. The dataset accompanies the following paper:
Genome-wide association meta-analysis identifies 48 risk variants and highlights the role of the stria vascularis in age-related hearing impairment
Please cite the paper if using this dataset.
The ARHI phenotype was established using ICD diagnoses and self-reported hearing loss. The study comprised 148,152 cases and 575,472 controls of European ancestry. Adult male and female participants were included from the following 17 population-based cohort studies: Age, Genes/Environment Susceptibility - Reykjavik (AGES), the Danish Twin Registry (DTR), the Estonian Genome Center at the University of Tartu (EGCUT), FinnGen, Framingham Heart Study (FHS), Health Aging and Body Composition (HABC), Italian Network of Genetic Isolates - Friuli Venezia Giulia (INGI-FVG), the Rotterdam Study (RS, cohorts 1 - 3), the Salus in Apulia study (SA; formerly known as Great Age study), Screening Across the Lifespan Twin (SALT and SALTY - young), Screening Twin Adults: Genes and Environment (STAGE), TwinsUK, UK Biobank (UKBB), and the Women’s Genome Health Study (WGHS).
UK Biobank data have been used under project #11516.
Individual GWASs were quality-controlled and harmonized using EasyQC, followed by fixed-effects IVW meta-analysis using METAL. The dataset includes the meta-analysis results for n = 8,244,938 SNVs with MAF > 0.001 that were present in at least 9 cohorts.
Dataset columns:
SNP, rsID
CHR, chromosome
BP, genomic position (hg19)
Allele1, effect allele
Allele2, other allele
Freq1, mean frequency of Allele1
FreqSE, standard error of Freq1
MinFreq, minimal frequency of Allele1 in the study cohorts
MaxFreq, maximal frequency of Allele1 in the study cohorts
Effect, effect size from the meta-analysis for Allele1
StdErr, standard error of Effect
P.value, corresponding p-value for meta-analysis
Direction, direction of effects in individual studies
HetISq, I² statistic for heterogeneity between studies
HetChiSq, χ² statistic for heterogeneity between studies
HetDf, degrees of freedom for the χ² statistic
HetPval, p-value for heterogeneity between studies
N, summary sample size
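The relationship between the Effect, StdErr, HetChiSq, HetDf and HetISq columns can be illustrated with a small fixed-effects inverse-variance-weighted (IVW) calculation. This is a textbook sketch in Python, not METAL's code: it uses the standard IVW estimator, Cochran's Q and the usual definition I² = max(0, (Q − df) / Q) × 100. Whether METAL's output matches these formulas in every detail is an assumption to verify against the METAL documentation.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine per-study effects with inverse-variance weights.

    Returns (effect, std_err, q, df, i_squared), where q is Cochran's Q
    heterogeneity statistic and i_squared is expressed in percent.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    effect = sum(w * b for w, b in zip(weights, betas)) / total_weight
    std_err = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - effect) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return effect, std_err, q, df, i_squared
```

For two hypothetical studies with effects 0.1 and 0.3 and equal standard errors of 0.1, the pooled effect is 0.2 with Q = 2 on 1 degree of freedom, giving an I² of 50%.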
Dataset columns description:
Seventeen studies included:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Age-related hearing impairment (ARHI), one of the most common sensory disorders, can be mitigated, but not cured or eliminated. To identify genetic influences underlying ARHI, we conducted a genome-wide association study of ARHI in 6,527 cases and 45,882 controls among the non-Hispanic whites from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. We identified two novel genome-wide significant SNPs: rs4932196 (odds ratio = 1.185, p = 4.0 × 10⁻¹¹), 52 kb 3’ of ISG20, which replicated in a meta-analysis of the other GERA race/ethnicity groups (1,025 cases, 12,388 controls, p = 0.00094) and in a UK Biobank case-control analysis (30,802 self-reported cases, 78,586 controls, p = 0.015); and rs58389158 (odds ratio = 1.132, p = 1.8 × 10⁻⁹), which replicated in the UK Biobank (p = 0.00021). The latter SNP lies just outside exon 8 and is highly correlated (r² = 0.96) with the missense SNP rs5756795 in exon 7 of TRIOBP, a gene previously associated with prelingual nonsyndromic hearing loss. We further tested these SNPs in phenotypes from audiologist notes available on a subset of GERA (4,903 individuals), stratified by case/control status, to construct an independent replication test, and found a significant effect of rs58389158 on speech reception threshold (SRT; overall GERA meta-analysis p = 1.9 × 10⁻⁶). We also tested variants within exons of 132 other previously-identified hearing loss genes, and identified two common additional significant SNPs: rs2877561 (synonymous change in ILDR1, p = 6.2 × 10⁻⁵), which replicated in the UK Biobank (p = 0.00057), and had a significant GERA SRT (p = 0.00019) and speech discrimination score (SDS; p = 0.0019); and rs9493627 (missense change in EYA4, p = 0.00011) which replicated in the UK Biobank (p = 0.0095), other GERA groups (p = 0.0080), and had a consistent significant result for SRT (p = 0.041) and suggestive result for SDS (p = 0.081).
Large cohorts with GWAS data and electronic health records may be a useful method to characterize the genetic architecture of ARHI.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of association methods considered in this analysis.