Facebook
TwitterDatabase that aggregates exome and genome sequencing data from large-scale sequencing projects. The gnomAD data set contains individuals sequenced using multiple exome capture methods and sequencing chemistries. Raw data from the projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects.
Facebook
Twitterhttps://bioregistry.io/spdx:CC0-1.0https://bioregistry.io/spdx:CC0-1.0
The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community (from https://gnomad.broadinstitute.org).
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects Sign up for the gnomAD mailing list here. This dataset was derived from summary data from gnomAD release 3.1, available on the Registry of Open Data on AWS for ready enrollment into the Data Lake as Code.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
基因组组装
Facebook
TwitterThe Genome Aggregation Database (gnomAD) is maintained by an international coalition of investigators to aggregate and harmonize data from large-scale sequencing projects. These public datasets are available in VCF format in Google Cloud Storage and in Google BigQuery as integer range partitioned tables . Each dataset is sharded by chromosome meaning variants are distributed across 24 tables (indicated with “_chr*” suffix). Utilizing the sharded tables reduces query costs significantly. Variant Transforms was used to process these VCF files and import them to BigQuery. VEP annotations were parsed into separate columns for easier analysis using Variant Transforms’ annotation support . These public datasets are included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. Find out more in our blog post, Providing open access to gnomAD on Google Cloud . Questions? Contact gcp-life-sciences-discuss@googlegroups.com.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variant data for those variants used in the European ancestry mutational burden analysis which fall within nominally significant disrupted genes. (XLS)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variant data for those variants used in the Hispanic mutational burden analysis which fall within nominally significant disrupted genes. (XLS)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene Ontology (GO) enrichment analysis.
Facebook
TwitterTHIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 9, 2023. An aggregated data platform for genome sequencing data created by a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 61,486 unrelated individuals sequenced as part of various disease-specific and population genetic studies. They have removed individuals affected by severe pediatric disease, so this data set should serve as a useful reference set of allele frequencies for severe disease studies. All of the raw data from these projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects. They ask that you not publish global (genome-wide) analyses of these data until after the ExAC flagship paper has been published, estimated to be in early 2015. If you''re uncertain which category your analyses fall into, please email them. The aggregation and release of summary data from the exomes collected by the Exome Aggregation Consortium has been approved by the Partners IRB (protocol 2013P001477, Genomic approaches to gene discovery in rare neuromuscular diseases).
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of nominally significant disrupted gene combinations that contained RDVs in the same sample. (XLSX)
Facebook
TwitterIntroductionGenetic variants may influence Toll-like receptor (TLR) signaling in the immune response to human papillomavirus (HPV) infection and lead to cervical cancer. In this study, we investigated the pattern of TLR expression in the transcriptome of HPV-positive and HPV-negative cervical cancer samples and looked for variants potentially related to TLR gene alterations in exomes from different populations.Materials and methodsA cervical tissue sample from 28 women, which was obtained from the Gene Expression Omnibus database, was used to examine TLR gene expression. Subsequently, the transcripts related to the TLRs that showed significant gene expression were queried in the Genome Aggregation Database to search for variants in more than 5,728 exomes from different ethnicities.ResultsCancer and HPV were found to be associated (p<0.0001). TLR1(p = 0.001), TLR3(p = 0.004), TLR4(221060_s_at)(p = 0.001), TLR7(p = 0.001;p = 0.047), TLR8(p = 0.002) and TLR10(p = 0.008) were negatively regulated, while TLR4(1552798_at)(p<0.0001) and TLR6(p = 0.019) were positively regulated in HPV-positive patients (p<0.05). The clinical significance of the variants was statistically significant for TLR1, TLR3, TLR6 and TLR8 in association with ethnicity. Genetic variants in different TLRs have been found in various ethnic populations. Variants of the TLR gene were of the following types: TLR1(5_prime_UTR), TLR4(start_lost), TLR8(synonymous;missense) and TLR10(3_prime_UTR). The “missense” variant was found to have a risk of its clinical significance being pathogenic in South Asian populations (OR = 56,820[95%CI:40,206,80,299]).ConclusionThe results of this study suggest that the variants found in the transcriptomes of different populations may lead to impairment of the functional aspect of TLRs that show significant gene expression in cervical cancer samples caused by HPV.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of 20,038 human-specific residues from 7052 different proteins, and the amino acid in the rest of the non-human primates.
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Environmental DNA (eDNA) provides a promising supplement to traditional sampling methods for population genetic inferences, but current studies have almost entirely focused on short mitochondrial markers. Here, we develop one mitochondrial and one nuclear set of target capture probes for the whale shark (Rhincodon typus) and test them on seawater samples collected in Qatar to investigate the potential of target capture for eDNA-based population studies. The mitochondrial target capture successfully retrieved ~235x (90x-352x per base position) coverage of the whale shark mitogenome. Using a minor allele frequency of 5%, we find 29 variable sites throughout the mitogenome, indicative of at least five contributing individuals. We also retrieved numerous mitochondrial reads from an abundant non-target species mackerel tuna (Euthynnus affinis), showing a clear relation between sequence similarity to the capture probes and the number of captured reads. The nuclear target capture probes retrieved only few reads and polymorphic variants from the whale shark, but we successfully obtained millions of reads and thousands of polymorphic variants with different allele frequencies from E. affinis. We demonstrate that target capture of complete mitochondrial genomes and thousands of nuclear loci is possible from aquatic eDNA samples. Our results highlight that careful probe design, taking into account the range of divergence between target and non-target sequences as well as presence of non-target species at the sampling site, is crucial to consider. Environmental DNA sampling coupled with target capture approaches provide an efficient means with which to retrieve population genomic data from aggregating and spawning aquatic species.
Methods This data is the raw sequencing output for a mitochondrial capture and a nuclear capture using custom-made myBaits target capture. Both captures are based on a single sample (two one-liter eDNA samples combined after extraction) collected in Qatari waters in the middle of a whale shark aggregation, in an area with an expected high abundance of Euthynnus affinis. There is thus no need for demultiplexing.
The sequencing was performed on a MiSeq with 301 bp PE sequencing.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Projected AAMD sample sizes required to test for significant association between Type 1 CFI rare variants and disease in different ethnicities defined by gnomAD.
Facebook
TwitterDatabase that aggregates sample information for reference samples (e.g. Coriell Cell lines) and samples for which data exist in one of the EBI''''s assay databases such as ArrayExpress, the European Nucleotide Archive or PRoteomics Identificates DatabasE. It provides links to assays for specific samples, and accepts direct submissions of sample information. The goals of the BioSample Database include: # recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; # minimizing data entry efforts for EBI database submitters by enabling submitting sample descriptions once and referencing them later in data submissions to assay databases and # supporting cross database queries by sample characteristics. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples (at) ebi.ac.uk. Current data sources: * European Nucleotide Archive (424,811 samples) * PRIDE (17,001 samples) * ArrayExpress (1,187,884 samples) * ENCODE cell lines (119 samples) * CORIELL cell lines (27,002 samples) * Thousand Genome (2,628 samples) * HapMap (1,417 samples) * IMSR (248,660 samples)
Facebook
TwitterThe Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data pertains to unrelated individuals sequenced as part of various disease-specific and population genetic studies and serves as a reference set of allele frequencies for severe disease studies. This collection references gene information.
Facebook
TwitterDatabase of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Starting in 2003, it has also started listing all links contained in the NAR Webserver issue. The different types of information available in this portal: * Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here. * Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources. * DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here. * Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops. * Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data. * Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome. * Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed. * Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses. * Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites. * Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here. * RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.
Facebook
TwitterAutosomal recessive early-onset parkinsonism is clinically and genetically heterogeneous. Mutations of three genes, PRKN, PINK1, and DJ-1 cause pure phenotypes usually characterized by levodopa-responsive Parkinson's disease. By contrast, mutations of other genes, including ATP13A2, PLA2G6, FBXO7, DNAJC6, SYNJ1, VPS13C, and PTRHD1, cause rarer, more severe diseases with a poor response to levodopa, generally with additional atypical features. We performed data mining on a gene panel or whole-exome sequencing in 460 index cases with early-onset (≤ 40 years) Parkinson's disease, including 57 with autosomal recessive disease and 403 isolated cases. We identified two isolated cases carrying biallelic mutations of SYNJ1 (double-heterozygous p.D791fs/p.Y232H and homozygous p. Y832C mutations) and two siblings with the recurrent homozygous p.R258Q mutation. All four variants were absent or rare in the Genome Aggregation Database, were predicted to be deleterious on in silico analysis and were found to be highly conserved between species. The patient with both the previously unknown p.D791fs and p.Y232H mutations presented with dystonia-parkinsonism accompanied by a frontal syndrome and oculomotor disturbances at the age of 39. In addition, two siblings from an Algerian consanguineous family carried the homozygous p.R258Q mutation and presented generalized tonic-clonic seizures during childhood, with severe intellectual disability, followed by progressive parkinsonism during their teens. By contrast, the isolated patient with the homozygous p. Y832C mutation, diagnosed at the age of 20, had typical parkinsonism, with no atypical symptoms and slow disease progression. Our findings expand the mutational spectrum and phenotypic profile of SYNJ1-related parkinsonism.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and Inferred Networks accompanying the manuscript entitled - “Aggregation of recount3 RNA-seq data improves the inference of consensus and context-specific gene co-expression networks”
Authors: Prashanthi Ravichandran, Princy Parsana, Rebecca Keener, Kaspar Hansen, Alexis Battle
Affiliations: Johns Hopkins University School of Medicine, Johns Hopkins University Department of Computer Science, Johns Hopkins University Bloomberg School of Public Health
Description:
This folder includes data produced in the analysis contained in the manuscript and inferred consensus and context-specific networks from graphical lasso and WGCNA with varying numbers of edges. Contents include:
all_metadata.rds: File including meta-data columns of study accession ID, sample ID, assigned tissue category, cancer status and disease status obtained through manual curation for the 95,484 RNA-seq samples used in the study.
all_counts.rds: log2 transformed RPKM normalized read counts for 5999 genes and 95,484 RNA-seq samples which was utilized for dimensionality reduction and data exploration
precision_matrices.zip: Zipped folder including networks inferred by graphical lasso for different experiments presented in the paper using weighted covariance aggregation following PC correction.
The networks can be found as follows. First, select the folder corresponding to the network of interest - for example, Blood, this will then include two or more folders which indicate the data aggregation utilized, select the folder corresponding appropriate level of data aggregation - either all samples/ GTEx for blood-specific networks, this includes precision matrices inferred across a range of penalization parameters. To view the precision matrix inferred for a particular value of the penalization parameter X, select the file labeled lambda_X.rds
For select networks, we have included the computed centrality measures which can be accessed at centrality_X.rds for a particular value of the penalization parameter X.
We have also included .rds files that list the hub genes from the consensus networks inferred from non-cancerous samples at “normal_hubs.rds”, and the consensus networks inferred from cancerous samples at “cancer_hubs.rds”
The file “context_specific_selected_networks.csv” includes the networks that were selected for downstream biological interpretation based on the scale-free criterion which is also summarized in the Supplementary Tables.
WGCNA.zip: A zipped folder containing gene modules inferred from WGCNA for sequentially aggregated GTEx, SRA, and blood studies. Select the data aggregated, and the number of studies based on folder names. For example, blood networks inferred from 20 studies can be accessed at blood/consensus/net_20. The individual networks correspond to distinct cut heights, and include information on the cut height used, the genes that the network was inferred over merged module labels, and merged module colors.
Facebook
TwitterLevels of sociability are continuously distributed in the general population, and decreased sociability represents an early manifestation of several brain disorders. Here, we investigated the genetic underpinnings of sociability in the population.
Main question of our research: 1. Are there common genetic variants that are associated with sociability in the general population? 2. Are genetic variants that are associated with sociability also associated with neuropsychiatric disorders?
Type of data uploaded in this repository: The UK Biobank project (see https://www.ukbiobank.ac.uk/) is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. The raw data that this project is based on comes from the publically available UK Biobank set, which is very large and is therefore not provided here. Here we only provide the results from our analysis, that is also described here: https://www.biorxiv.org/content/10.1101/781195v2 and currently in revision in a scientific journal. In the dataset you will find the association of 9327396 genetic variants with the phenotype sociability. This dataset is not applicable to be opened with Excel, and can best be opened on a cluster computer or using specfic software.
Subjects The UK Biobank (UKBB) is a major population-based cohort from the United Kingdom that includes individuals aged between 37 and 73 years. We constructed a sociability measure based on the the aggregation of scores per participant on four questions from the UKBB database that link to sociability, including (1) a question about the frequency of friend/family visits, (2) a question on the number and type of social venues that are visited, (3) a question about worrying after social embarrassment and (4) a question about feeling lonely, leading to a sociability score ranging from 0-4. Participants were excluded if they had somatic problems that could be related to social withdrawal (BMI < 15 or BMI > 40, narcolepsy (all the time), stroke, severe tinnitus, deafness or brain-related cancers) or if they answered that they had “No friends/family outside household” or “Do not know” or “Prefer not to answer” to any of the questions.
SNP genotyping and quality control Details about the available genome-wide genotyping data for UKBB participants have been reported previously (PMID: 30305743). We used third-release genotyping data (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Briefly, 49,950 participants were genotyped using the UK BiLEVE Axiom Array and 438,427 participants were genotyped using UK Biobank Axiom Array. Genotypes were imputed into the dataset using the Haplotype Reference Consortium (HRC), and the UK10K haplotype resource. To account for ethnicity, we included only those individuals that identified themselves as "white" by self-report and plotted the Principal Components (PC) provided by the UKBB, excluding individuals considered to be outliers according to PCs 1 and 2. Genetic relatedness calculated with KING kinship and provided by the UKBB (https://kenhanscombe.github.io/ukbtools/articles/explore-ukb-data.html ; http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/UKBiobank_genotyping_QC_documentation-web.pdf) was used to identify first and second-degree relatives. Subsequently ´families´ (i.e. clusters of related individuals above an IBD>0.125 threshold) were created and only one individual from each of these created ‘families’ was included in the analysis. If self-reported sex and SNP-based sex differed, individuals were excluded from further analysis. Single nucleotide polymorphisms (SNPs) with minor allele frequency <0.005, Hardy-Weinberg equilibrium test P value<1e−6, missing genotype rate >0.05, and imputation quality of INFO <0.8 were excluded. In the current study, all analyses are based on 342,461 participants of European ancestry for which both genotype data and sociability scores were available.
Genome-wide association analysis Genome-wide association analysis with the imputed marker dosages was performed in PLINK1.9, using a linear regression model with the sociability measure as the dependent variable and including sex, age, 10 first PCs, assessment center, and genotype batch as covariates. SNPs were considered significantly associated if they had p-value < 5e-8. Associated loci were considered independent of each other at r2 0.6 and lead SNPs were classified as the SNP with the smallest association p-value and at r2 0.1, using a 250kb window. The summary statistics come from the plink2 linear regression analysis.
Facebook
TwitterDatabase that aggregates exome and genome sequencing data from large-scale sequencing projects. The gnomAD data set contains individuals sequenced using multiple exome capture methods and sequencing chemistries. Raw data from the projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects.