Facebook
TwitterGWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary level findings from genetic association studies, both large and small. It gathers datasets from public domain projects, and accepts direct data submission. It is based upon Marker information encompassing SNP and variant information from public databases, to which allele and genotype frequency data, and genetic association findings are additionally added. A Study (most generic level) contains one or more Experiments, one or more Sample Panels of test subjects, and one or more Phenotypes. This collection references a GWAS Central Marker.
Facebook
TwitterCatalog of published genome-wide association studies. Genome-wide set of genetic variants in different individuals to see if any variant is associated with trait and disease. Database of genome-wide association study (GWAS) publications including only those attempting to assay single nucleotide polymorphisms (SNPs). Publications are organized from most to least recent date of publication. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator). Works with HANCESTRO ancestry representation.
Facebook
TwitterThe GWAS Catalog provides a consistent, searchable, visualisable and freely available database of published SNP-trait associations, which can be easily integrated with other resources, and is accessed by scientists, clinicians and other users worldwide.
Facebook
TwitterInformation about the 4 GWAS data sets used in this study.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome-wide association study results for the metabolism of phosphatidylcholine (PC) (16:0/16:0) and dulcitol.
Facebook
TwitterGWAS data sets with individual level data.
Facebook
TwitterPublicly available database of summary level findings from genetic association studies in humans, including genome wide association studies (GWAS). Previously named HGBASE, HGVbase and HGVbaseG2P.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The data includes genotypes of 482906 markers for 1,000 individuals. They come from a simulation based on the Illumina 650K human array, typically used for SNP genotyping.
In theory, it's easy to create such data, it's just columns with values of 0, 1 and 2, but what's important is the correlation structure that has been preserved here and corresponds to the real one.
The data can be used to test methods for finding significant SNPs. You can generate a trait based on the significant variables of your choice, and then try to find them using the chosen technique (which is not easy, due to the huge number of variables).
The y.txt file contains the trait I simulated based on the following list of 24 SNPs: - ch01_19810 - ch01_27796 - ch01_32763 - ch02_22034 - ch02_39189 - ch03_2703 - ch03_10846 - ch04_05127 - ch05_7371 - ch06_25838 - ch08_15190 - ch10_444 - ch10_8265 - ch11_12611 - ch11_20057 - ch12_3421 - ch14_6999 - ch15_3859 - ch16_4525 - ch17_4306 - ch18_1031 - ch19_1377 - ch19_6378 - ch22_33
See which ones you can find!
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary information on the data from the genome-wide association studies used in the MR analysis.
Facebook
TwitterCharacteristics of GWAS data for exposure and outcomes.
Facebook
TwitterNon-alcoholic fatty liver disease (NAFLD) is a common liver disease; the histological spectrum of which ranges from steatosis to steatohepatitis. Nonalcoholic steatohepatitis (NASH) often leads to cirrhosis and development of hepatocellular carcinoma. To better understand pathogenesis of NAFLD, we performed the pathway of distinction analysis (PoDA) on a genome-wide association study dataset of 250 non-Hispanic white female adult patients with NAFLD, who were enrolled in the NASH Clinical Research Network (CRN) Database Study, to investigate whether biologic process variation measured through genomic variation of genes within these pathways was related to the development of steatohepatitis or cirrhosis. Pathways such as Recycling of eIF2:GDP, biosynthesis of steroids, Terpenoid biosynthesis and Cholesterol biosynthesis were found to be significantly associated with NASH. SNP variants in Terpenoid synthesis, Cholesterol biosynthesis and biosynthesis of steroids were associated with lobular inflammation and cytologic ballooning while those in Terpenoid synthesis were also associated with fibrosis and cirrhosis. These were also related to the NAFLD activity score (NAS) which is derived from the histological severity of steatosis, inflammation and ballooning degeneration. Eukaryotic protein translation and recycling of eIF2:GDP related SNP variants were associated with ballooning, steatohepatitis and cirrhosis. Il2 signaling events mediated by PI3K, Mitotic metaphase/anaphase transition, and Prostanoid ligand receptors were also significantly associated with cirrhosis. Taken together, the results provide evidence for additional ways, beyond the effects of single SNPs, by which genetic factors might contribute to the susceptibility to develop a particular phenotype of NAFLD and then progress to cirrhosis. Further studies are warranted to explain potential important genetic roles of these biological processes in NAFLD.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The real GWAS data for the seven common complex diseases.
Facebook
TwitterCombines collections of genetic variants (GVs) from GWAS and their comprehensive functional annotations, as well as disease classifications. Used to maximize utilility of GWAS data to gain biological insights through integrative, multi-dimensional functional annotation portal. In addition to all GVs annotated in NHGRI GWAS Catalog, we manually curate GVs that are marginally significant (P value < 10-3) by looking into supplementary materials of each original publication and provide extensive functional annotations for these GVs. GVs are manually classified by diseases according to Disease Ontology Lite and HPO (Human Phenotype Ontology) for easy access. Database can also conduct gene based pathway enrichment and PPI network association analysis for those diseases with sufficient variants. SOAP services are available. You may Download GWASdb SNP. (This file contains all of the significant SNP in GWASdb. In the pvalue column, 0 means this P-value is not reported in the study but it is significant SNP. In the source column, GWAS:A represents the original data in GWAS catalog, while GWAS:B is our curation data which P-value < 10-3)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAS summary statistics for 72 long-term conditions. Up to three sources of genetics data are used, depending on the condition: UK Biobank, FinnGen, and consortium-published meta-analyses (where available).
If you use these resources please cite the below and include the resource release version:
Murrin et al. (2024) A systematic analysis of the contribution of genetics to multimorbidity and comparisons with primary care data. eBioMedicine. https://doi.org/10.1016/j.ebiom.2025.105584
See our GitHub repos for more information: https://github.com/GEMINI-multimorbidity
Summary information
See the `conditions.txt` file for a list of conditions included, plus file suffix, studies included, and effective sample size.
GWAS files are provided in GWAS catalog format, with positions mapped to build 37.
For each GWAS file there is a README, detailing the source of the summary statistics: 1) UK Biobank [UKB], a large population-based prospective study with 450,197 individuals of European genetic ancestry. 2) FinnGen, a large-scale genomics initiative including over 500,000 participants with linked health diagnosis data. 3) Disease-specific GWAS meta-analyses summary statistics when available for each LTC. See the GEMINI GitHub for details on LTC diagnostic codes: https://github.com/GEMINI-multimorbidity
An extract of the methods from Murrin 2025 (https://doi.org/10.1016/j.ebiom.2025.105584) are included below:
UK Biobank data
To perform the genetic analyses we ascertained diagnosis of LTCs using both primary-care linked data (available for 45% of participants, censoring date: 28/02/2016 – Read v2 and CTV3 codes, truncated to 5 bytes) and hospital inpatient diagnoses (available for all participants, censoring date: 31/10/2022 - ICD-10 codes). Participants were genotyped using two near identical (>95% shared variants, n=805,426 total) microarray platforms: the Affymetrix Axiom UK Biobank array (in 438,427 participants) and the Affymetrix UKBiLEVE array (in 49,950 participants). UK Biobank centrally performed genotype imputation in 487,442 participants using data from the Haplotype Reference Consortium and UK10K reference panels, increasing the number of genetic variants to ~96 million.8 We exclude genetic variants with <0.1% minor allele frequency or with imputed INFO score <0.3, leaving ~16 million for GWAS analysis. GWAS were performed in up to 451,197 participants genetically similar to the 1000 Genomes EUR population (described previously.9 In brief, individuals from the UK Biobank were projected into the 1000 Genomes principal component (PC) space using the SNP loadings derived from the initial PC analysis to minimise confounding of PC values due to varying degrees of relatedness within UK Biobank.10 Using the means derived from the 1000 Genomes reference dataset, we subsequently performed K-means clustering analyses to determine which individuals from UK Biobank could be classified as EUR-like. GWAS were performed in UKB participants genetically similar to the 1000 Genomes EUR reference population for 84 LTCs, using the same clinical code lists as above in CPRD, using the REGENIE software (v3.1.3) to account for population structure and relatedness, adjusted for age at baseline assessment, sex, genotyping chip, and assessment centre. 11 For quality control, we restricted variants to those with a minor allele frequency (MAF) of >0.1%, and an imputation INFO score ≥0.3.
FinnGen data
FinnGen is a large-scale genomics initiative, that contains data from over 500,000 participants and is linked to health diagnosis data. GWAS summary statistics from the FinnGen cohort (release 9) with 377,277 participants, provided for predetermined disease (“endpoints”), defined using ICD-10-FM (Finnish Modification). 12
Disease-specific GWAS
Disease-specific GWAS meta-analyses summary statistics when available for each LTC. We used the GWAS Catalog (https://www.ebi.ac.uk/gwas), 13 disease-specific public repositories and contacted authors of the latest GWAS to identify relevant studies with aligned disease definitions and participants of European ancestry to enable comparison with UKB and FinnGen. The below LTCs had available published and available GWAS summary statistics and were used in the genetics analysis (see Supplementary Table 1 for further information).
• Anxiety disorders.14
• Asthma.15
• Atrial fibrillation.16
• Chronic kidney disease.17
• Chronic obstructive pulmonary disease.18
• Coronary heart disease.19
• Depression.20
• Erectile dysfunction.21
• Gastro-oesophageal reflux disease.22
• Glaucoma.23
• Gout.24
• Hearing loss.25
• Heart failure.26
• Hyperthyroidism, hypothyroidism.27
• Irritable bowel syndrome.28
• Migraine.29
• Osteoarthritis.30
• Primary breast malignancy.31
• Rheumatoid arthritis.32
• Schizophrenia, schizotypal and delusional disorders.33
• Type 2 diabetes.34
• Ulcerative colitis.35
GWAS meta-analysis
For the 72 conditions meeting the heritability criteria above, we meta-analysed genome-wide summary data from up to 3 data sources – UKB, FinnGen and disease-specific GWAS (referred to as Consortium data). See Supplementary Figure 2 for analysis flowchart, and Supplementary Table 1 for effective sample size and other information. A cross-trait LD-score regression framework, that estimates the within-condition, between-dataset genetic correlation, measured the similarity between conditions. 40 The FinnGen and Consortium data were added to the meta-analysis when within-condition genetic correlation (R_g) with UK Biobank was >0.8. Where consortium data included UK Biobank or FinnGen data, the consortium data was used to avoid overlapping datasets (i.e., if UKB was in the consortium GWAS, then we only meta-analysed consortium+FinnGen). Studies were meta-analysed using GWAMA. 41
References
1 Amell A, Roso-Llorach A, Palomero L, et al. Disease networks identify specific conditions and pleiotropy influencing multimorbidity in the general population. Sci Rep 2018; 8: 15970.
2 Fadason T, Schierding W, Lumley T, O’Sullivan JM. Chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities. Nat Commun 2018; 9: 5198.
3 Dong G, Feng J, Sun F, Chen J, Zhao X-M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med 2021; 13: 110.
4 Kim S-S, Hudgins AD, Gonzalez B, et al. A Compendium of Age-Related PheWAS and GWAS Traits for Human Genetic Association Studies, Their Networks and Genetic Correlations. Front Genet 2021; 12. DOI:10.3389/fgene.2021.680560.
5 West CE, Karim M, Falaguera MJ, et al. Integrative GWAS and co-localisation analysis suggests novel genes associated with age-related multimorbidity. Sci Data 2023; 10: 655.
6 Recalde M, Rodríguez C, Burn E, et al. Data Resource Profile: The Information System for Research in Primary Care (SIDIAP). Int J Epidemiol 2022; 51: e324–36.
7 Sudlow C, Gallacher J, Allen N, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med 2015; 12: e1001779.
8 Bycroft C, Freeman C, Petkova D, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562: 203–9.
9 Casanova F, Tian Q, Atkins JL, et al. Iron and risk of dementia: Mendelian randomisation analysis in UK Biobank. J Med Genet 2024; : jmg-2023-109295.
10 Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res 2020; 48: D941–7.
11 Mbatchou J, Barnard L, Backman J, et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53: 1097–103.
12 Kurki MI, Karjalainen J, Palta P, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023; 613: 508–18.
13 Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 2023; 51: D977–85.
14 Otowa T, Hek K, Lee M, et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol Psychiatry 2016; 21: 1391–9.
15 Olafsdottir TA, Theodors F, Bjarnadottir K, et al. Eighty-eight variants highlight the role of T cell regulation and airway remodeling in asthma pathogenesis. Nat Commun 2020; 11. DOI:10.1038/S41467-019-14144-8.
16 Roselli C, Chaffin MD, Weng LC, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet 2018; 50: 1225–33.
17 Wuttke M, Li Y, Li M, et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 2019; 51: 957–72.
18 Sakornsakolpat P, Prokopenko D, Lamontagne M, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet 2019; 51: 494–505.
19 Aragam KG, Jiang T, Goel A, et al. Discovery and systematic characterization of risk variants and genes for coronary artery disease in over a million
Facebook
TwitterThis deposit provides full details of the genome wide association study (GWAS) pipeline developed by the MRC-IEU for the full UK Biobank (version 3, March 2018) genetic data. For any issues with use of this documentation please contact: mrc-ieu@bristol.ac.uk. This dataset supersedes the earlier version at https://doi.org/10.5523/bris.2fahpksont1zi26xosyamqo8rr
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
bedrock-bio/gwas-catalog dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
Twitterhttps://spdx.org/licenses/etalab-2.0.htmlhttps://spdx.org/licenses/etalab-2.0.html
This dataset corresponds to the data associated with the study: "Disentangling group specific QTL allele effects from genetic background epistasis in GWAS: an application to maize flowering" by Rio et al. (2019). The data includes genotypic and phenotypic information for a panel of 970 maize inbred lines, including 300 dent, 304 flint and 366 admixed lines that were generated by mating dent and flint lines. The R code used to run GWAS analyses and GWAS results were also included to this dataset.
Facebook
TwitterThe Epilepsy Genetic Association Database (epiGAD) is an online repository of data relating to genetic association studies in the field of epilepsy. It summarizes the results of both published and unpublished studies, and is intended as a tool for researchers in the field to keep abreast of recent studies, providing a bird''s eye view of this research area. The goal of epiGAD is to collate all association studies in epilepsy in order to help researchers in this area identify all the available gene-disease associations. Finally, by including unpublished studies, it hopes to reduce the problem of publication bias and provide more accurate data for future meta-analyses. It is also hoped that epiGAD will foster collaboration between the different epilepsy genetics groups around the world, and faciliate formation of a network of investigators in epilepsy genetics. There are 4 databases within epiGAD: - the susceptibility genes database - the epilepsy pharmacogenetics database - the meta-analysis database - the genome-wide association studies (GWAS) database The susceptibility genes database compiles all studies related to putative epilepsy susceptibility genes (eg. interleukin-1-beta in TLE), while the pharmacogenetics studies in epilepsy (eg. ABCB1 studies) are stored in ''phamacogenetics''. The meta-analysis database compiles all existing published epilepsy genetic meta-analyses, whether for susceptibility genes, or pharmacogenetics. The GWAS database is currently empty, but will be filled once GWAS are published. Sponsors: The epiGAD website is supported by the ILAE Genetics Commission.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, in apparent inconsistency with their apparent strong effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10−14), even when adjusting for regression-to-the-mean of effect size between discovery- and replication-cohorts termed the Winner’s Curse (p < 10−16). We show this is due in part to misreporting replication cohort-size as a maximum number, rather than per-locus one. In 39 studies accurately reporting per-locus cohort-size for attempted replication of 707 loci in samples with similar ancestry, replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly-powered decile of loci to replicate worse than expected, due to difference in linkage disequilibrium.
Facebook
TwitterGWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary level findings from genetic association studies, both large and small. It gathers datasets from public domain projects, and accepts direct data submission. It is based upon Marker information encompassing SNP and variant information from public databases, to which allele and genotype frequency data, and genetic association findings are additionally added. A Study (most generic level) contains one or more Experiments, one or more Sample Panels of test subjects, and one or more Phenotypes. This collection references a GWAS Central Marker.