THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 9, 2023. An aggregated data platform for genome sequencing data created by a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 61,486 unrelated individuals sequenced as part of various disease-specific and population genetic studies. They have removed individuals affected by severe pediatric disease, so this data set should serve as a useful reference set of allele frequencies for severe disease studies. All of the raw data from these projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects. They ask that you not publish global (genome-wide) analyses of these data until after the ExAC flagship paper has been published, estimated to be in early 2015. If you're uncertain which category your analyses fall into, please email them. The aggregation and release of summary data from the exomes collected by the Exome Aggregation Consortium has been approved by the Partners IRB (protocol 2013P001477, Genomic approaches to gene discovery in rare neuromuscular diseases).
The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data pertains to unrelated individuals sequenced as part of various disease-specific and population genetic studies and serves as a reference set of allele frequencies for severe disease studies. This collection references transcript information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AA = amino acid; N = sample size; n.a. = not applicable; n.t. = not tested; CZ = Czech Republic; GE = Germany; LT = Lithuania. The cumulative assessment is based on the results of various effect prediction algorithms; for details, see S4 Table.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Summary of clinical features and disease/candidate variants identified. The major clinical features and the disease/candidate variants are listed, together with the prediction scores and classifications of each variant's damaging effect. PolyPhen-2 (Pph2): D, probably damaging; P, possibly damaging; B, benign. LRT: D, deleterious; N, neutral; U, unknown. MutationTaster (MT): A, disease causing automatic; D, disease causing; N, polymorphism; P, polymorphism automatic. The table also includes the ranking and ACMG criteria for each gene, the supporting evidence, a discussion of other variants, and the frequency/number of each variant in our internal CMG database, the Atherosclerosis Risk in Communities Study (ARIC), the ExAC database, the 1000 Genomes Project and the NHLBI GO Exome Sequencing Project (ESP). pLI: probability of loss-of-function (LoF) intolerance.
Table S2. Categorization of families based on major clinical features. Y: the family has this clinical feature; N: the family does not have this clinical feature. Families with brain malformations were not counted in the ID/DD groups, even if this feature was present. Percentages of families with each feature are shown at the bottom of the table.
Table S3. Categorization of disease genes/candidates by major clinical features.
Table S4. AOH metrics for the probands carrying known or candidate disease genes.
Table S5. Raw ddPCR data for RPS6KC1 in family 025.
Table S6. Homologs of disease genes/candidates between human and fruit fly. The HCOP website (http://www.genenames.org/cgi-bin/hcop) was used to identify the fly homologs of the identified disease/candidate genes, listed in the upper panel. These fly homologs were then used to search for additional human homologs, identifying paralogs of the original human genes, as shown in the bottom part of the list. (XLS 165 kb)
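The two-step homolog lookup described for Table S6 (human gene to fly homolog, then fly homolog back to all human homologs) can be sketched as follows. This is a minimal illustration, assuming the homolog pairs have already been exported from HCOP into two dictionaries; the function name and the placeholder gene names (HGENE1, flyA and so on) are hypothetical and are not taken from the table.

```python
from __future__ import annotations


def find_paralogs_via_fly(human_to_fly: dict[str, set[str]],
                          fly_to_human: dict[str, set[str]],
                          query_genes: list[str]) -> dict[str, set[str]]:
    """For each query human gene, collect other human genes sharing a fly homolog.

    human_to_fly and fly_to_human are hypothetical lookup tables, assumed to be
    built beforehand from HCOP homolog assignments.
    """
    paralogs: dict[str, set[str]] = {}
    for gene in query_genes:
        hits: set[str] = set()
        # Step 1: human gene -> its fly homolog(s)
        for fly_gene in human_to_fly.get(gene, set()):
            # Step 2: fly homolog -> every human gene homologous to it
            hits |= fly_to_human.get(fly_gene, set())
        hits.discard(gene)  # a gene is not its own paralog
        paralogs[gene] = hits
    return paralogs


# Placeholder data for illustration only
human_to_fly = {"HGENE1": {"flyA"}, "HGENE2": {"flyB"}}
fly_to_human = {"flyA": {"HGENE1", "HGENE3"}, "flyB": {"HGENE2"}}
print(find_paralogs_via_fly(human_to_fly, fly_to_human, ["HGENE1", "HGENE2"]))
```

Genes with no fly homolog, or whose fly homolog maps back only to themselves, end up with an empty paralog set.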
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Minor allele frequency in the African American HCM cohort, African American controls and individuals with African ancestry from the ExAC database.
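For a bi-allelic site, the minor allele frequency in each group follows directly from the alternate-allele count and the total number of called alleles. The sketch below assumes those two numbers are available per group; the names `ac` and `an` are modeled on common VCF INFO conventions and, like the function name, are assumptions rather than the column names of this dataset.

```python
def minor_allele_frequency(ac: int, an: int) -> float:
    """Minor allele frequency at a bi-allelic site.

    ac: alternate-allele count (assumed input)
    an: total number of called alleles, i.e. 2 x genotyped individuals
        for an autosomal site (assumed input)
    """
    if an <= 0:
        raise ValueError("AN must be positive")
    if not 0 <= ac <= an:
        raise ValueError("AC must lie between 0 and AN")
    af = ac / an             # alternate-allele frequency
    return min(af, 1 - af)   # the minor allele is whichever is rarer
```

Taking the minimum makes the result independent of which allele the reference genome happens to carry.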
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Statistical tests for Hardy-Weinberg equilibrium are important elementary tools in genetic data analysis. X-chromosomal variants have long been tested by applying autosomal test procedures to females only, and gender is usually not considered when testing autosomal variants for equilibrium. Recently, we proposed specific X-chromosomal exact test procedures for bi-allelic variants that include the hemizygous males, as well as autosomal tests that consider gender. In this paper, we extend that work to variants with multiple alleles. A full enumeration algorithm is used for the exact calculations for triallelic variants; for variants with many alternate alleles, we use a permutation test. Some empirical examples using data from the 1000 Genomes Project are discussed.
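The bi-allelic X-chromosomal exact test that this work extends can be sketched by full enumeration. Under HWE and conditional on the total count n_A of allele A, with n_m males and n_f females (n_t = n_m + 2 n_f alleles), the probability of observing m_A, m_B hemizygous males and f_AA, f_AB, f_BB females is assumed here to be P = n_A! n_B! n_f! n_m! 2^(f_AB) / (m_A! m_B! f_AA! f_AB! f_BB! n_t!). The p-value sums the probabilities of all configurations no more likely than the observed one. This is a minimal sketch under that standard conditional formulation; it covers only two alleles, not the triallelic enumeration or permutation test introduced in the paper, and all function names are hypothetical.

```python
from math import exp, lgamma, log
from typing import Iterator, Tuple

Config = Tuple[int, int, int, int, int]  # (mA, mB, fAA, fAB, fBB)


def _log_fact(n: int) -> float:
    return lgamma(n + 1)


def log_prob(mA: int, mB: int, fAA: int, fAB: int, fBB: int) -> float:
    """Log probability of one male/female genotype configuration under HWE,
    conditional on the allele count and the numbers of males and females."""
    nm, nf = mA + mB, fAA + fAB + fBB
    nA = mA + 2 * fAA + fAB
    nB = mB + 2 * fBB + fAB
    nt = nm + 2 * nf  # males carry one X allele, females two
    return (_log_fact(nA) + _log_fact(nB) + _log_fact(nf) + _log_fact(nm)
            + fAB * log(2)
            - _log_fact(mA) - _log_fact(mB)
            - _log_fact(fAA) - _log_fact(fAB) - _log_fact(fBB)
            - _log_fact(nt))


def configurations(nA: int, nm: int, nf: int) -> Iterator[Config]:
    """Every configuration with nA copies of allele A among nm males and nf females."""
    for mA in range(max(0, nA - 2 * nf), min(nm, nA) + 1):
        rA = nA - mA  # A alleles that must be carried by females
        # fAB has the parity of rA and is bounded so that fAA and fBB stay >= 0
        for fAB in range(rA % 2, min(rA, 2 * nf - rA) + 1, 2):
            fAA = (rA - fAB) // 2
            yield mA, nm - mA, fAA, fAB, nf - fAA - fAB


def x_exact_hwe_pvalue(mA: int, mB: int, fAA: int, fAB: int, fBB: int) -> float:
    """Exact p-value: total probability of configurations as or less likely than observed."""
    nm, nf = mA + mB, fAA + fAB + fBB
    nA = mA + 2 * fAA + fAB
    observed = log_prob(mA, mB, fAA, fAB, fBB)
    p = 0.0
    for cfg in configurations(nA, nm, nf):
        lp = log_prob(*cfg)
        if lp <= observed + 1e-7:  # small tolerance for floating-point ties
            p += exp(lp)
    return min(p, 1.0)
```

With no males the probability reduces to the usual autosomal exact-test formula for the females, which is one way to sanity-check the enumeration.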
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Control frequency refers to the allele frequency of the same variant in the subjects included in the ExAC database (60,706 unrelated individuals). Progenitor phenotype refers only to the progenitor carrying the alteration. Genomic coordinates: hg19 assembly. F: female; M: male; Mat: maternal; Pat: paternal; NA: not available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
A table of previously reported Sanfilippo Type B incidence rates.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
MECP2 genetic variant data from ExAC in CSV format. FAIR machine-readable metadata is available at: http://purl.org/biosemantics-lumc/rettbase/fdp
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Background: The prevalence of dementia in Parkinson disease (PD) increases dramatically with advancing age, approaching 80% in patients who survive 20 years with the disease. Increasing evidence suggests clinical, pathological and genetic overlap of PD with Alzheimer disease (AD), dementia with Lewy bodies and frontotemporal dementia. However, the contribution of dementia-causing genes to PD risk, cognitive impairment and dementia in PD is not fully established.
Objective: To assess the contribution of coding variants in Mendelian dementia-causing genes to the risk of developing PD, and their effect on the cognitive performance of PD patients.
Methods: We analyzed the coding regions of the amyloid-beta precursor protein (APP), Presenilin 1 and 2 (PSEN1, PSEN2) and Granulin (GRN) genes in 1,374 PD cases and 973 controls, using pooled-DNA targeted sequencing, human exome-chip and whole-exome sequencing (WES) data, with single-variant and gene-based (SKAT-O and burden test) analyses. Global cognitive function was assessed using the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA). The effect of coding variants in dementia-causing genes on cognitive performance was tested by multiple regression analysis adjusting for gender, disease duration, age at dementia assessment, study site and APOE carrier status.
Results: Known AD pathogenic mutations in PSEN1 (p.A79V) and PSEN2 (p.V148I) were found in 0.3% of all PD patients. There was a significant burden of rare, likely damaging variants in GRN and PSEN1 in PD patients compared with frequencies in the European population from the ExAC database. Multiple regression analysis revealed that PD patients carrying rare variants in APP, PSEN1, PSEN2 or GRN had lower cognitive test scores than non-carrier PD patients (p = 2.0 × 10⁻⁴), independent of age at PD diagnosis, age at evaluation, APOE status or recruitment site.
Conclusions: Pathogenic mutations in the AD-causing genes PSEN1 and PSEN2 are found in sporadic PD patients. PD patients with cognitive decline carry rare variants in dementia-causing genes. Variants in genes causing Mendelian neurodegenerative diseases exhibit pleiotropic effects.
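The comparison of rare-variant burden in PD patients against ExAC European frequencies can be illustrated with a simple collapsing test: pool the alternate-allele counts of qualifying rare variants in a gene for cases and for the reference, then apply a one-sided Fisher's exact test to the resulting 2x2 table. This is a minimal sketch under that assumption. It is not necessarily the burden or SKAT-O procedure the authors used, the function name is hypothetical, and pooling counts from a public reference ignores differences in coverage and variant calling between datasets.

```python
from math import comb


def fisher_one_sided_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: cases, reference. Columns: qualifying rare alleles, all other alleles.
    Returns P(X >= a), where X is the number of qualifying alleles in cases
    under the hypergeometric null with all margins fixed (enrichment in cases).
    """
    row1 = a + b          # total alleles observed in cases
    col1 = a + c          # total qualifying alleles in both groups
    n = a + b + c + d     # grand total of alleles
    upper = min(row1, col1)
    # Exact integer arithmetic; math.comb returns 0 when k > n
    numerator = sum(comb(col1, k) * comb(n - col1, row1 - k)
                    for k in range(a, upper + 1))
    return numerator / comb(n, row1)
```

A call would pass the case qualifying alleles, the remaining case alleles, and the same two numbers for the reference population.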
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Background: With the expanded availability of next-generation sequencing (NGS)-based clinical genetic tests, clinicians seeking to test patients with Mendelian diseases must weigh the superior coverage of targeted gene panels against the greater number of genes included in whole exome sequencing (WES) when choosing a first-tier testing approach. Here, we use an in silico analysis to predict the analytic sensitivity of WES, using pathogenic variants identified on targeted NGS panels as a reference.
Methods: The nucleotide positions corresponding to 1533 different alterations classified as pathogenic or likely pathogenic on targeted NGS multi-gene panel tests in our laboratory were interrogated in data from 100 randomly selected clinical WES samples to quantify the sequence coverage at each position. These pathogenic variants represented 91 genes implicated in hereditary cancer, X-linked intellectual disability, primary ciliary dyskinesia, Marfan syndrome/aortic aneurysms, cardiomyopathies and arrhythmias.
Results: When coverage was assessed in each of the 100 WES samples for each pathogenic variant (153,300 individual assessments), 99.7% (n = 152,798) of assessments would likely have detected the variant. All pathogenic variants had at least some coverage on exome sequencing, and 97.3% (n = 1491) were detectable in all 100 individuals. For the remaining 42 pathogenic variants, the number of WES samples with adequate coverage ranged from 35 to 99. Factors such as location in GC-rich, repetitive or homologous regions likely explain why some of these alterations were not detected in all samples. To validate these findings, a similar analysis was performed against coverage data from 60,706 exomes available through the Exome Aggregation Consortium (ExAC). This validation confirmed that 98.6% (91,743,296/93,062,298) of variant-exome assessments demonstrated adequate depth for detection.
Conclusions: Results from this in silico analysis suggest that exome sequencing may achieve a diagnostic yield similar to panel-based testing for Mendelian diseases.
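The two summary statistics in this abstract (the share of variant-sample assessments with adequate depth, and the share of variants adequately covered in every sample) can be computed from a depth matrix as sketched below. The abstract does not state the depth cut-off that counted as "adequate"; the 20x default and the function name are placeholder assumptions. The last lines check the reported denominators, which equal 1533 variants multiplied by the number of exomes.

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[Sequence[int]],
                     min_depth: int = 20) -> Tuple[float, float]:
    """depths[i][j] is the read depth at pathogenic variant i in exome sample j.

    min_depth is an assumed threshold for "adequate" coverage.
    Returns (fraction of variant-sample assessments with adequate depth,
             fraction of variants adequately covered in every sample).
    """
    assessed = covered = fully_covered = 0
    for row in depths:
        ok = [d >= min_depth for d in row]
        assessed += len(ok)
        covered += sum(ok)
        fully_covered += all(ok)  # True counts as 1
    return covered / assessed, fully_covered / len(depths)


# Denominators reported in the abstract
assert 1533 * 100 == 153_300        # variants x in-house WES samples
assert 1533 * 60_706 == 93_062_298  # variants x ExAC exomes
print(f"{152_798 / 153_300:.1%}")         # 99.7%
print(f"{1491 / 1533:.1%}")               # 97.3%
print(f"{91_743_296 / 93_062_298:.1%}")   # 98.6%
```

The printed percentages reproduce the rounded figures quoted in the Results, confirming that the ExAC validation was counted per variant-exome pair rather than per variant.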