Catalog of published genome-wide association studies (GWAS), which examine a genome-wide set of genetic variants in different individuals to see whether any variant is associated with a trait or disease. The database includes only GWAS publications that attempt to assay single nucleotide polymorphisms (SNPs). Publications are ordered from most to least recent date of publication. Studies are identified through weekly PubMed literature searches, daily NIH-distributed compilations of news and media reports, and occasional comparisons with an existing database of GWAS literature (HuGE Navigator). Works with HANCESTRO ancestry representation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAS SVatalog is a novel visualization tool and database for structural variants (SVs) found in a predominantly European population of 101 individuals with Cystic Fibrosis (CF). Aside from the CF-causing variants on chromosome 7 and the LD block in which they lie, the remainder of the genome is comparable to the 1000 Genomes healthy European population. The data are a collection of SV calls and their linkage disequilibrium (LD) statistics with GWAS-significant SNPs reported in the GWAS Catalog.
The goal of this project is to provide a resource to aid fine mapping of GWAS loci using SVs. GWAS loci are generally identified by SNPs, which account for an incomplete proportion of genetic variation and phenotypic heritability. These SNPs may have limited direct relevance to the phenotype and may instead tag other polymorphisms, such as SVs, that could be the cause of the association signal. To leverage this data to its full potential, visit the GWAS SVatalog web tool. There, interactive visualizations can illustrate SVs identified in high LD with GWAS-significant SNPs, suggesting putative causal variation that could guide additional functional investigation.
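As a rough illustration of what "high LD" between an SV and a GWAS SNP means, the sketch below computes the textbook r² statistic from phased haplotypes coded 0/1. How GWAS SVatalog itself computes its LD statistics is not described here, so the function name `ld_r_squared` and the toy haplotypes are assumptions made purely for illustration.

```python
from typing import Sequence


def ld_r_squared(snp: Sequence[int], sv: Sequence[int]) -> float:
    """Textbook r^2 between two biallelic sites on phased haplotypes (0/1).

    Hypothetical helper for illustration; not the SVatalog implementation.
    """
    if len(snp) != len(sv) or len(snp) == 0:
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    n = len(snp)
    p_a = sum(snp) / n                      # frequency of allele 1 at the SNP
    p_b = sum(sv) / n                       # frequency of allele 1 at the SV
    p_ab = sum(1 for a, b in zip(snp, sv) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    d = p_ab - p_a * p_b                    # disequilibrium coefficient D
    return d * d / denom


# Toy haplotypes (made up): the SV is carried on exactly the SNP-1 haplotypes.
tagged = ld_r_squared([1, 1, 0, 0], [1, 1, 0, 0])
unlinked = ld_r_squared([1, 1, 0, 0], [1, 0, 1, 0])
```

A value near 1 indicates that the SNP tags the SV almost perfectly, which is the situation in which the SV becomes a candidate for the causal variant behind the SNP's association signal.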
For more information on how to use GWAS SVatalog, visit the documentation.
This project was accomplished in collaboration with the Strug Lab at The Hospital for Sick Children (SickKids), The Centre for Applied Genomics (TCAG), and the University of Toronto.
GWAS Central (previously the Human Genome Variation database of Genotype-to-Phenotype information) is a database of summary level findings from genetic association studies, both large and small. It gathers datasets from public domain projects, and accepts direct data submission. It is based upon Marker information encompassing SNP and variant information from public databases, to which allele and genotype frequency data, and genetic association findings are additionally added. A Study (most generic level) contains one or more Experiments, one or more Sample Panels of test subjects, and one or more Phenotypes. This collection references a GWAS Central Marker.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Rows are as follows: (1) “Totals”: number of samples of a given ancestry in analyzed papers, with redundancy between studies published multiple times; (2) “Rate in GWAS”: percentage of total samples considered that were of this ancestry; (3) “Rate in Population”: percentage of world’s population that is of this ancestry; (4) “Enrichment in GWAS”: relative over (or under) representation of ancestry in GWAS relative to its rate in the world. Ancestry labels are approximations with the standard correspondences to HapMap2 reference samples (European = CEU, East Asian = JPT+CHB, African = YRI); here, “African American” denotes samples reported with that nomenclature, which typically corresponds to 80:20 admixture between ancestral sub-Saharan African and Western European genetics [11]. All of these equivalences are oversimplifications but correspond to assumptions widely used in the field. Counts are computed from totals across all papers analyzed in this study, not adjusting for duplicate uses of the same datasets across multiple studies. Total sample sizes are maximum counts of samples assuming no per-genotype missingness is present. The totals are rounded to the nearest integer as several imputed studies reported nonintegral sample sizes. Row 3 percentages in world population are approximations based on demographic data from 2014–2015 [12, 13].
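The four rows described above can be sketched as a small calculation. The caption does not give the exact formula for "Enrichment in GWAS"; the sketch assumes it is the ratio of the rate in GWAS to the rate in the world population. The function name, dictionary keys, and all numbers in the usage lines are hypothetical placeholders, not values from the table.

```python
from typing import Dict


def enrichment_table(sample_totals: Dict[str, float],
                     population_rate_pct: Dict[str, float]) -> Dict[str, dict]:
    """Build the four rows (totals, rate in GWAS, rate in population,
    enrichment) per ancestry label.

    Assumption: enrichment = (rate in GWAS) / (rate in population).
    """
    grand_total = sum(sample_totals.values())
    table = {}
    for ancestry, n in sample_totals.items():
        rate_gwas = 100.0 * n / grand_total
        rate_pop = population_rate_pct[ancestry]
        table[ancestry] = {
            "total": round(n),              # imputed studies may report non-integers
            "rate_in_gwas_pct": rate_gwas,
            "rate_in_population_pct": rate_pop,
            "enrichment": rate_gwas / rate_pop,
        }
    return table


# Placeholder inputs, chosen only to make the arithmetic easy to follow.
demo = enrichment_table({"A": 750.0, "B": 250.0}, {"A": 25.0, "B": 75.0})
```

Under this reading, an enrichment above 1 means the ancestry is over-represented in GWAS samples relative to its share of the world population, and a value below 1 means it is under-represented.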
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Summary information on the data from the genome-wide association studies used in the MR analysis.
The GWAS Catalog provides a consistent, searchable, visualisable and freely available database of published SNP-trait associations, which can be easily integrated with other resources, and is accessed by scientists, clinicians and other users worldwide.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, which seems inconsistent with their strong apparent effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10^-14), even when adjusting for regression-to-the-mean of effect size between discovery and replication cohorts, termed the Winner’s Curse (p < 10^-16). We show this is due in part to misreporting replication cohort size as a maximum number, rather than a per-locus one. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, the replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly-powered decile of loci to replicate worse than expected, due to differences in linkage disequilibrium.
Publicly available database of summary level findings from genetic association studies in humans, including genome wide association studies (GWAS). Previously named HGBASE, HGVbase and HGVbaseG2P.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The dataset contains results of a genome-wide association study of back pain. Two files contain association summary statistics: one for the discovery GWAS, based on the analysis of 350,000 white British individuals from the UK Biobank, and one for the meta-analysis GWAS, based on the meta-analysis of the same 350,000 individuals and an additional 103,862 individuals of European ancestry from the UK Biobank (total N = 453,862). The phenotype of back pain was defined by the answer provided by the UK Biobank participants to the following question: "Pain type(s) experienced in last month". Those who reported “Back pain” were considered cases; all the rest were considered controls. Individuals who did not reply or replied "Prefer not to answer" or "Pain all over the body" were excluded. This dataset is also available for graphical exploration in the genomic context at http://gwasarchive.org.
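The case/control definition above can be sketched as a small classification function. The answer strings "Back pain", "Prefer not to answer" and "Pain all over the body" come from the description; the function name, the 1/0/None coding, the example answer "Headache", and the rule that an exclusion answer takes precedence over "Back pain" are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Answers that lead to exclusion, as listed in the dataset description.
EXCLUSION_ANSWERS = {"Prefer not to answer", "Pain all over the body"}


def back_pain_status(answers: Optional[Iterable[str]]) -> Optional[int]:
    """Return 1 for a case, 0 for a control, None for an excluded participant.

    `answers` is the set of options selected for "Pain type(s) experienced
    in last month"; None or an empty selection means no reply.
    """
    if answers is None:
        return None
    selected = set(answers)
    if not selected or selected & EXCLUSION_ANSWERS:
        return None                      # no reply or an excluding answer
    return 1 if "Back pain" in selected else 0


# Illustrative participants ("Headache" is just an example of another answer).
statuses = [
    back_pain_status(["Back pain", "Headache"]),
    back_pain_status(["Headache"]),
    back_pain_status(["Prefer not to answer"]),
    back_pain_status(None),
]
```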
The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilisation of data available prior to publication, you agree to respect the requested responsibilities of resource users under 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant. This research has been conducted using the UK Biobank Resource and the use of the data is guided by the principles formulated by the UK Biobank.
When using downloaded data, please cite corresponding paper and this repository:
Funding:
This study was supported by the European Community’s Seventh Framework Programme funded project PainOmics (Grant agreement # 602736).
The research has been conducted using the UK Biobank Resource (project # 18219).
The development of software implementing the SMR/HEIDI test and the database for GWAS results was supported by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Dr. Suri’s time for this work was supported by VA Career Development Award # 1IK2RX001515 from the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development Service. The contents of this work do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Dr. Tsepilov’s time for this work was supported in part by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Column headers - discovery (350K)
Column headers - meta-analysis (450K)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This study aimed to identify susceptibility genes and pathways associated with ankylosing spondylitis (AS) by integrating whole transcriptome-wide association study (TWAS) analysis and mRNA expression profiling data. AS genome-wide association study (GWAS) summary data from the large GWAS database were used. This included data of 1265 AS patients and 452264 controls. A TWAS of AS was conducted using these data. The analysis software used was FUSION, and Epstein-Barr virus–transformed lymphocytes, transformed fibroblasts, peripheral blood, and whole blood were used as gene expression references. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed for the important genes identified via TWAS. Protein-protein interaction (PPI) network analysis based on the STRING database was also performed to detect genes shared by TWAS and mRNA expression profiles in AS. TWAS identified 920 genes (P
This deposit provides full details of the genome wide association study (GWAS) pipeline developed by the MRC-IEU for the full UK Biobank (version 3, March 2018) genetic data. For any issues with use of this documentation please contact: mrc-ieu@bristol.ac.uk. This dataset supersedes the earlier version at https://doi.org/10.5523/bris.2fahpksont1zi26xosyamqo8rr
This repo contains data produced for the manuscript entitled "Discovering non-additive heritability using additive GWAS summary statistics". Here, we provide the additive and cis-interaction LD scores used for the real data analyses of 25 well-studied quantitative phenotypes from 349,468 individuals of self-identified European ancestry in the UK Biobank and up to 159,095 individuals in BioBank Japan. Note that for the UK Biobank analysis, LD scores were computed using a reference panel of 489 individuals from the European superpopulation (EUR) of the 1000 Genomes Project. To analyze data from BioBank Japan, we downloaded publicly available GWAS summary statistics for the 25 traits from http://jenger.riken.jp/en/result. These summary statistics used age, sex, and the first ten principal components as confounders in the initial GWAS. We then used individuals from the East Asian (EAS) superpopulation from the 1000 Genomes Project Phase 3 as a reference panel to calculate paired LD scores.
EAGLE eczema consortium GWAS summary results, from Paternoster et al. 2015 Nature Genetics (doi:10.1038/ng.3424). Results are published for the European-only and multi-ancestry atopic dermatitis GWAS analyses. Results exclude 23andMe data. Please read the AD_GWAS_README.txt file for more information.
It redirects to the GWAS Central resource, https://www.gwascentral.org/. Centralized compilation of summary level findings from genetic association studies, both large and small. They actively gather datasets from public domain projects, and encourage direct data submission from the community. HGVbaseG2P is built upon a basal layer of Markers that comprises all known SNPs and other variants from public databases such as dbSNP and the DBGV. Allele and genotype frequency data, plus genetic association significance findings, are added on top of the Marker data, and organized the same way that investigations are reported in typical journal manuscripts. Critically, no individual level genotypes or phenotypes are presented in HGVbaseG2P - only group level aggregated (summary level) data. The largest unit in a data submission is a Study, which can be thought of as being equivalent to one journal article. This may contain one or more Experiments, one or more Sample Panels of test subjects, and one or more Phenotypes. Sample Panels may be characterized in terms of various Phenotypes, and they also may be combined and/or split into Assayed Panels. The Assayed Panels are used as the basis for reporting allele/genotype frequencies (in Genotype Experiments) and/or genetic association findings (in Analysis Experiments). Environmental factors are handled as part of the Sample Panel and Assayed Panel data structures.
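The Study hierarchy described above can be pictured with a minimal data-model sketch. The class and field names below are hypothetical and are not the actual HGVbaseG2P or GWAS Central schema; the only rule taken from the description is that a Study holds one or more Experiments, Sample Panels, and Phenotypes, and that Experiments report either genotype frequencies or association findings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phenotype:
    name: str


@dataclass
class SamplePanel:
    name: str
    size: int                      # number of test subjects in the panel


@dataclass
class Experiment:
    name: str
    kind: str                      # "genotype" (frequencies) or "analysis" (associations)


@dataclass
class Study:
    """Largest submission unit, roughly equivalent to one journal article."""
    title: str
    experiments: List[Experiment]
    sample_panels: List[SamplePanel]
    phenotypes: List[Phenotype]

    def __post_init__(self) -> None:
        # Enforce the "one or more" rule stated in the description.
        for label, items in (("experiments", self.experiments),
                             ("sample_panels", self.sample_panels),
                             ("phenotypes", self.phenotypes)):
            if not items:
                raise ValueError(f"a Study needs at least one entry in {label}")


# Hypothetical example study.
study = Study(
    title="Example association study",
    experiments=[Experiment("Discovery scan", "analysis")],
    sample_panels=[SamplePanel("Cases", 500), SamplePanel("Controls", 500)],
    phenotypes=[Phenotype("Example trait")],
)
```

Only group-level (summary) data would hang off these objects; no individual-level genotypes or phenotypes appear anywhere in the model, in line with the description.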
Combines collections of genetic variants (GVs) from GWAS and their comprehensive functional annotations, as well as disease classifications. It aims to maximize the utility of GWAS data for gaining biological insights through an integrative, multi-dimensional functional annotation portal. In addition to all GVs annotated in the NHGRI GWAS Catalog, we manually curate GVs that are marginally significant (P value < 10^-3) by looking into the supplementary materials of each original publication, and we provide extensive functional annotations for these GVs. GVs are manually classified by disease according to Disease Ontology Lite and HPO (Human Phenotype Ontology) for easy access. The database can also conduct gene-based pathway enrichment and PPI network association analysis for those diseases with sufficient variants. SOAP services are available. The GWASdb SNP file is available for download. This file contains all of the significant SNPs in GWASdb. In the pvalue column, 0 means that the P-value was not reported in the study although the SNP is significant. In the source column, GWAS:A represents the original data in the GWAS Catalog, while GWAS:B is our curated data with P-value < 10^-3.
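The two column conventions described for the GWASdb SNP file (a `pvalue` of 0 meaning "significant but not reported", and `source` values `GWAS:A` / `GWAS:B`) are easy to misread when loading the file, so a small parsing sketch may help. The column names `pvalue` and `source` come from the description; the tab-separated layout with a header row, the `snp` column name, and the sample rows are assumptions.

```python
import csv
import io
from typing import Iterator, TextIO


def read_gwasdb_rows(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per SNP row, decoding the pvalue and source conventions.

    Assumes a tab-separated file with a header containing at least the
    columns snp, pvalue and source (the snp column name is hypothetical).
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        p = float(row["pvalue"])
        yield {
            "snp": row["snp"],
            # 0 is a placeholder for "significant, but P-value not reported".
            "pvalue": None if p == 0 else p,
            # GWAS:A = original GWAS Catalog data, GWAS:B = GWASdb curation.
            "curated_by_gwasdb": row["source"] == "GWAS:B",
        }


# Made-up rows for illustration only.
sample = "snp\tpvalue\tsource\nrs1\t0\tGWAS:A\nrs2\t0.0005\tGWAS:B\n"
rows = list(read_gwasdb_rows(io.StringIO(sample)))
```

Mapping the placeholder 0 to `None` keeps unreported P-values from being mistaken for extremely strong associations when the rows are later sorted or thresholded.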
This dataset includes genetic variations found in 882 poplar trees, and provides useful information to scientists studying plants as well as researchers more generally in the fields of biofuels, materials science, and secondary plant compounds. For nearly 10 years, researchers with DOE’s BioEnergy Science Center (BESC), a multi-institutional organization headquartered at ORNL, have studied the genome of Populus, a fast-growing perennial tree recognized for its economic potential in biofuels production. This Genome-Wide Association Study (GWAS) dataset includes more than 28 million single nucleotide polymorphisms (SNPs) derived from 17 trillion bases of sequence data generated from 882 undomesticated Populus genotypes. Each SNP represents a variation in a single DNA nucleotide, or building block, that can act as a biological marker and/or causal allele within a protein sequence, helping scientists locate genes associated with certain characteristics, conditions or diseases. The results of this analysis have been used, among other things, to 1) seek genetic control of cell-wall recalcitrance, a natural characteristic of plant cell walls that prevents the release of sugars under microbial conversion and restricts biofuels production, and 2) identify the molecular mechanisms controlling deposition of lignin in plant structures. Lignin is a polyphenolic polymer that strengthens plant cell walls and acts as a barrier to microbial access to cellulose during saccharification, the process of breaking cellulose down into simple sugars for fermentation. Although the dataset’s most immediate applications are in fundamental plant sciences, ORNL researchers plan to use the GWAS data to inform applied work in areas such as cleaner, sustainable transportation biofuels, carbon fiber for lightweight vehicles and alternatives to conventional plastics and building insulation materials.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Levels of sociability are continuously distributed in the general population, and decreased sociability represents an early manifestation of several brain disorders. Here, we investigated the genetic underpinnings of sociability in the population.

Main questions of our research:
1. Are there common genetic variants that are associated with sociability in the general population?
2. Are genetic variants that are associated with sociability also associated with neuropsychiatric disorders?

Type of data uploaded in this repository:
The UK Biobank project (see https://www.ukbiobank.ac.uk/) is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. The raw data that this project is based on come from the publicly available UK Biobank set, which is very large and is therefore not provided here. Here we only provide the results from our analysis, which is also described at https://www.biorxiv.org/content/10.1101/781195v2 and is currently under revision at a scientific journal. In the dataset you will find the association of 9327396 genetic variants with the phenotype sociability. This dataset is not suitable for opening with Excel and is best opened on a cluster computer or with specific software.

Subjects
The UK Biobank (UKBB) is a major population-based cohort from the United Kingdom that includes individuals aged between 37 and 73 years. We constructed a sociability measure based on the aggregation of scores per participant on four questions from the UKBB database that link to sociability, including (1) a question about the frequency of friend/family visits, (2) a question on the number and type of social venues that are visited, (3) a question about worrying after social embarrassment and (4) a question about feeling lonely, leading to a sociability score ranging from 0-4. Participants were excluded if they had somatic problems that could be related to social withdrawal (BMI < 15 or BMI > 40, narcolepsy (all the time), stroke, severe tinnitus, deafness or brain-related cancers) or if they answered that they had “No friends/family outside household” or “Do not know” or “Prefer not to answer” to any of the questions.

SNP genotyping and quality control
Details about the available genome-wide genotyping data for UKBB participants have been reported previously (PMID: 30305743). We used third-release genotyping data (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Briefly, 49,950 participants were genotyped using the UK BiLEVE Axiom Array and 438,427 participants were genotyped using the UK Biobank Axiom Array. Genotypes were imputed into the dataset using the Haplotype Reference Consortium (HRC) and the UK10K haplotype resource. To account for ethnicity, we included only those individuals who identified themselves as "white" by self-report, and we plotted the Principal Components (PCs) provided by the UKBB, excluding individuals considered to be outliers according to PCs 1 and 2. Genetic relatedness calculated with KING kinship and provided by the UKBB (https://kenhanscombe.github.io/ukbtools/articles/explore-ukb-data.html ; http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/UKBiobank_genotyping_QC_documentation-web.pdf) was used to identify first- and second-degree relatives. Subsequently, 'families' (i.e. clusters of related individuals above an IBD>0.125 threshold) were created and only one individual from each of these 'families' was included in the analysis. If self-reported sex and SNP-based sex differed, individuals were excluded from further analysis. Single nucleotide polymorphisms (SNPs) with minor allele frequency <0.005, Hardy-Weinberg equilibrium test P value <1e-6, missing genotype rate >0.05, or imputation quality of INFO <0.8 were excluded. In the current study, all analyses are based on 342,461 participants of European ancestry for whom both genotype data and sociability scores were available.

Genome-wide association analysis
Genome-wide association analysis with the imputed marker dosages was performed in PLINK1.9, using a linear regression model with the sociability measure as the dependent variable and including sex, age, the first 10 PCs, assessment center, and genotype batch as covariates. SNPs were considered significantly associated if they had p-value < 5e-8. Associated loci were considered independent of each other at r2 0.6 and lead SNPs were classified as the SNP with the smallest association p-value and at r2 0.1, using a 250kb window. The summary statistics come from the plink2 linear regression analysis.
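The SNP-level quality-control thresholds and the genome-wide significance cut-off quoted above can be sketched as two small predicates. This is not the authors' pipeline (they report using PLINK); the `Variant` class, its field names, and the example values are hypothetical, and only the numeric thresholds are taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    maf: float            # minor allele frequency
    hwe_p: float          # Hardy-Weinberg equilibrium test P value
    missing_rate: float   # missing genotype rate
    info: float           # imputation quality (INFO)
    assoc_p: float        # association P value from the linear regression


def passes_qc(v: Variant) -> bool:
    """Keep a SNP unless it meets any of the stated exclusion criteria."""
    return (v.maf >= 0.005
            and v.hwe_p >= 1e-6
            and v.missing_rate <= 0.05
            and v.info >= 0.8)


def is_genome_wide_significant(v: Variant) -> bool:
    """Significance threshold quoted in the description (p < 5e-8)."""
    return v.assoc_p < 5e-8


# Made-up variants for illustration.
good = Variant("rsA", maf=0.20, hwe_p=0.5, missing_rate=0.01, info=0.95, assoc_p=1e-9)
rare = Variant("rsB", maf=0.001, hwe_p=0.5, missing_rate=0.01, info=0.95, assoc_p=1e-9)
hits = [v.rsid for v in (good, rare) if passes_qc(v) and is_genome_wide_significant(v)]
```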
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
GWAS results in cardiovascular research
https://dataverse.nl/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.34894/TYHGEF
These are the single-cell RNAseq data from the Athero-Express Biobank Study as used after quality control in the paper referenced below; abstract below. Background: Genome-wide association studies (GWAS) have discovered hundreds of common genetic variants for atherosclerotic disease and cardiovascular risk factors. The translation of susceptibility loci into biological mechanisms and targets for drug discovery remains challenging. Intersecting genetic and gene expression data has led to the identification of candidate genes. However, the assayed tissues are often non-diseased and heterogeneous in cell composition, confounding the candidate prioritization. We collected single-cell transcriptomics (scRNA-seq) from atherosclerotic plaques and aimed to identify cell-type-specific expression of disease-associated genes. Methods and Results: To identify disease-associated candidate genes, we applied gene-based analyses using GWAS summary statistics from 46 atherosclerotic, cardiometabolic, and other traits. Next, we intersected these candidates with scRNA-seq data to identify those genes that are specifically expressed in individual cell (sub)populations of atherosclerotic plaques. We derive an enrichment score and show that loci associated with coronary artery disease demonstrated a prominent substrate in plaque smooth muscle cells (SKI, KANK2, SORT1), endothelial cells (SLC44A1, ATP2B1), and macrophages (APOE, HNRNPUL1). Further subclustering of SMC subtypes revealed genes in risk loci for coronary calcification specifically enriched in a synthetic cluster of SMCs. To verify the robustness of our approach, we used liver-derived scRNA-seq data and showed enrichment of circulating lipids-associated loci in hepatocytes. Conclusion: We confirm known gene-cell pairs relevant for atherosclerotic disease, and discover novel pairs pointing to new biological mechanisms amenable to therapy.
We present an intuitive single-cell transcriptomics driven workflow rooted in human large-scale genetic studies to identify putative candidate genes and affected cells associated with cardiovascular traits. Athero-Express Biobank Study: The AE started in 2002 and now includes over 3,500 patients who underwent surgery to remove atherosclerotic plaques (endarterectomy) from one (or more) of their major arteries (majority carotids and femorals); this is further described here. The study design and staining protocols are described by Verhoeven et al. GitHub: A link to the public GitHub repository: https://github.com/CirculatoryHealth/gwas2single. This contains all scripts used for the data, which is pseudonymized and shared here. Additional data: Additional clinical data is available upon discussion and signing a Data Sharing Agreement (see Terms of Access). PlaqView: In collaboration with the http://millerlab.org from the University of Virginia (USA) we created PlaqView.com. You can query any gene of interest in many carotid-plaque datasets, including ours. From our experience, this usually suffices for most research questions and avoids the lengthy process of obtaining these data through a DSA.
Seed size is an important trait for yield and commercial value in dry-grain cowpea. Seed size varies widely among different cowpea accessions, and the genetic basis of such variation is not yet well understood. To better decipher the genetic basis of seed size, a genome-wide association study (GWAS) and meta-analysis were conducted on a panel of 368 diverse cowpea accessions from 51 countries. Four traits, including seed weight, length, width and density, were evaluated across three locations. Using 51,128 single nucleotide polymorphisms covering the cowpea genome, 17 loci were identified for these traits. One locus was common to weight, width and length, suggesting pleiotropy. By integrating synteny-based analysis with common bean, six candidate genes (Vigun05g036000, Vigun05g039600, Vigun05g204200, Vigun08g217000, Vigun11g187000, and Vigun11g191300), which are implicated in multiple functional categories related to seed size such as endosperm development, embryo development, and cell elongation, were identified. These results suggest that a combination of GWAS meta-analysis with synteny comparison in a related plant is an efficient approach to identify candidate gene(s) for complex traits in cowpea. The identified loci and candidate genes provide useful information for improving cowpea varieties and for molecular investigation of seed size.