Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Concordance of genotypes represented in VCF and gVCF files with those detected by the MI RISK Plus kit.
https://ega-archive.org/dacs/EGAC00001000259
Short read whole genome sequencing (WGS) VCF files for the NIHR BioResource Rare Diseases WGS project – Participants from the Hypertrophic Cardiomyopathy (HCM) Rare Disease domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VCF file containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV, with one row per individual mutation. The columns are, in order: viral genome accession ID; nucleotide position in the genome (POS); mutation ID (left blank in all rows); reference nucleotide; identified mutation; quality, filter, and information (all left blank); format (GT in all rows); a column for the reference genome (0 in all rows, referring to the reference nucleotide column); and one column per isolate genome. In each row, an isolate column records whether the nucleotide at the position given in the POS column is non-mutant (0) or the mutant given in the identified mutation column (1). The file is tab-delimited, with 22546 rows (including the row of column names) and 30690 columns.
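A minimal sketch of how a file with this layout could be read, assuming the first row that is not a `##` meta line holds the column names and that every column after the nine fixed VCF fields is a genotype column coded 0 or 1. The function name and the toy accessions and positions are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_mutations_per_isolate(handle):
    """Count, for each genotype column, the rows coded '1' (mutant)."""
    counts = Counter()
    sample_names = None
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("##"):
            continue  # skip blank lines and meta-information lines
        if sample_names is None:
            # Assumption: the first remaining row is the header.
            # Columns 1-9 are the fixed fields plus FORMAT; the rest are
            # the reference-genome column and the isolate columns.
            sample_names = row[9:]
            continue
        for name, genotype in zip(sample_names, row[9:]):
            if genotype == "1":
                counts[name] += 1
    return counts


# Toy input with made-up names; blank ID, QUAL, FILTER and INFO fields
# mirror the description above.
example = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
    "REFGENOME\tisolate_A\tisolate_B\n"
    "ref\t241\t\tC\tT\t\t\t\tGT\t0\t1\t0\n"
    "ref\t3037\t\tC\tT\t\t\t\tGT\t0\t1\t1\n"
)
counts = count_mutations_per_isolate(io.StringIO(example))
print(dict(counts))
```

Reading row by row keeps memory use flat even with tens of thousands of columns, which is why the sketch avoids loading the whole table at once.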
The file was generated to test whether the five most common mutations in each of the SARS-CoV-2 genome replication complex proteins nsp7, nsp8, nsp12, and nsp14 significantly affect the mutation density of the virus over time, and whether they affect synonymous and nonsynonymous mutation densities differently. We found that mutations in nsp14, an exonuclease with error-correcting capabilities, are the most likely to be correlated with increased mutational load across the genome compared to wildtype SARS-CoV-2. These results were obtained in three steps: identifying the frequency of mutations across all isolates in the genomic regions of interest; analyzing which of the twenty mutations (five per nsp) have a statistically meaningful relationship with the mutation density in the M and E genes (chosen because they are under little selective pressure); and identifying the synonymous and nonsynonymous genomic SNV density for isolates with any of the statistically meaningful mutations, as well as for isolates with none of the identified mutations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset makes the files of the UCSC Genome Browser (genome.ucsc.edu) GRCh37 public session NA12878 WES Benchmark available in a single dataset so that they can be used in other applications or genome browsers such as IGV. All genomic variant calls in all VCF files were decomposed and normalized with vt. This dataset contains:
Genome in a Bottle (GIAB) version 3.3.2 high confidence (HC) variant calls and genomic regions for HapMap individual NA12878:
GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz
GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz.tbi
GIAB_v3.3.2_NA12878_HC_regions.bed
HapMap individual NA12878 WES variant calls (VCF) and capture regions (BED) from diagnostic laboratories:
ARUP whole exome sequencing data (HiSeq 2000), publicly available from the NCBI GeT-RM Browser
converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz
converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz.tbi
ARUP_SeqCap_EZ_Exome.bed
UCSF whole exome sequencing data (HiSeq 2500), publicly available from the NCBI GeT-RM Browser
converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz
converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz.tbi
UCSF_WES_Agilent_V4_Custom.bed
Whole exome data (NextSeq 500) sequenced in the CHEO diagnostic laboratory
CHEO_NA12878_WES_S1dataset.vcf.gz
CHEO_NA12878_WES_S1dataset.vcf.gz.tbi
Agilent_CRE_v2.bed
Genomic coordinates (BED) of OMIM genes for which a molecular basis of the associated disease is known (as of September 2019):
Omim_Genes.bed
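One way the BED files above could be combined with the VCF calls is to test whether a variant position falls inside a set of regions, for example the GIAB high-confidence regions. The following is a sketch under stated assumptions: the helper names are hypothetical, the coordinates are invented, and it relies only on the general conventions that BED intervals are 0-based and half-open while VCF positions are 1-based.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Return {chrom: sorted, merged list of (start, end)} from BED lines."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and browser/track lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlapping or adjacent: extend
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(regions, chrom, pos):
    """True if the 1-based VCF position lies inside any BED interval."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Index of the last interval whose start is <= zero_based.
    i = bisect.bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


# Invented coordinates, for illustration only.
regions = load_bed(["1\t100\t200", "1\t150\t300", "2\t0\t10"])
print(in_regions(regions, "1", 101), in_regions(regions, "1", 100))
```

Merging the intervals first means a single binary search per query is enough, which matters when checking many variants against a large capture or confidence region file.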
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data.
Verticillium dahliae is an important soil-borne pathogen causing Verticillium wilt. It is also the primary causal agent of Potato Early Dying, a disease complex involving the root-lesion nematode. Here, we report the whole-genome sequencing of 192 isolates of V. dahliae originating from the major potato production areas across Canada. Our results yielded a resource of 277,010 genetic variations that will be useful for genetic analyses and revealed the presence of two major lineages, both present in all provinces but exhibiting differences in regional prevalence.
Filtered WGS reads (fastp) were aligned to the Verticillium dahliae reference (https://www.ncbi.nlm.nih.gov/assembly/95341/GCA_000150675.2) with BWA. Variants were called with freebayes v1.3.6 and annotated with SnpEff.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The American chestnut (Castanea dentata) is a functionally extinct tree species that was decimated by an invasive fungal pathogen in the early 20th century. An understanding of the genomic architecture of local adaptation in wild American chestnut was necessary to deploy locally adapted, disease-resistant American chestnut populations. Here, we characterize the genomic basis of climate adaptation in remnant wild American chestnut, develop new computational methods, and evaluate the adaptive genomic content captured within backcross breeding populations. Whole genome re-sequencing data of 356 trees from Sandercock et al. (2022), coupled with genotype-environment association methods, identified 18483 climate-associated loci.
Methods: VCF file: The ~21 million SNP dataset from Sandercock et al. (2022) was first imputed using BEAGLE and filtered to remove SNPs with MAF < 0.05. Climate-associated loci were then identified using the RDA and LFMM2 genotype-environment association methods. Seed zone shape files: Three seed zones were identified using the ~18k climate-associated loci. These regions partition the chestnut range into geographic seed zones that reflect relatively homogeneous areas with respect to multivariate adaptive genomic variation. They can be used to conserve germplasm ex situ and to guide subsequent breeding crosses that lead to climate-matched restoration populations. gmbigxhorn.jtl.map.2022.csv is a genetic map generated from American chestnut backcross genotyping-by-sequencing data. R code is included for estimating the average migration distance for each seed zone under future climate change conditions.
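The MAF filter mentioned in the methods can be illustrated with a short sketch. It assumes biallelic SNPs with genotypes written in the usual VCF style (`0/1`, `0|0`, `./.`); the function names and example genotypes are hypothetical, and this is not the authors' filtering code.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one biallelic SNP, or None if nothing was called."""
    alt = 0
    called = 0
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele, not counted
            called += 1
            if allele != "0":
                alt += 1
    if called == 0:
        return None
    alt_freq = alt / called
    return min(alt_freq, 1 - alt_freq)


def passes_maf(genotypes, threshold=0.05):
    """Keep a SNP only if its MAF is at or above the threshold."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf >= threshold


# Made-up genotypes for two SNPs.
common_snp = ["0|0"] * 8 + ["0|1"] * 2   # 2 alt alleles out of 20
rare_snp = ["0|0"] * 19 + ["0|1"]        # 1 alt allele out of 40
print(passes_maf(common_snp), passes_maf(rare_snp))
```

Taking the smaller of the two allele frequencies makes the filter symmetric, so a SNP whose alternate allele is nearly fixed is removed just like one whose alternate allele is rare.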
Database of phenotype-genotype associations for humans. Used by clinical researchers to store standardized phenotypic information, diagnosis, and pedigree data and then run analyses on VCF files from individuals, families, or cohorts with suspected Mendelian disease.
Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease in cats, often leading to congestive heart failure, arterial thromboembolism, and sudden cardiac death. The genetics of feline HCM are poorly understood, and the limited genetic discoveries remain breed- or family-specific. We aimed to identify novel causative or disease-modifying variants in a large cohort of cats reflective of the general cat population. In a second cohort, we sought to characterize transcriptomic differences between HCM-affected cats and healthy controls. DNA was isolated from 138 domestic cats (109 HCM and 29 controls). No single variant or combination of variants of high, moderate, or modifying impact was identified in genome-wide analysis as causing HCM or modifying its severity. Several rare high- and moderate-impact variants in genes associated with human HCM were detected in diseased cats. In a second cohort, left ventricular (LV), interventricular septal (IVS), and left atrial (LA) tissues...
WGS data generation: A total of 1-2 mL of whole blood was collected from the cephalic, saphenous, or jugular vein into EDTA blood collection tubes. DNA was isolated either from whole blood or from buffy coats after whole blood centrifugation at 2000 rpm for 15 minutes. Genomic DNA isolation was performed using commercially available kits (Gentra Puregene Blood kit, QIAGEN, Hilden, Germany; ArchivePure, 5Prime) following the respective manufacturer's protocol. High-quality unfragmented DNA was selected by a combination of 1% agarose gel visualization and spectrophotometric confirmation (a 260/280 ratio of ~1.8 and a concentration of > 50 ng/uL; NanoDrop One/One, Thermofisher, Waltham, MA, USA). Samples were stored at -20°C until ready for shipment to Theragen Bio Co., Ltd, Gyeonggi-do, Republic of Korea for WGS. Paired-end DNA libraries were generated with a TruSeq DNA Nano library prep kit.
Samples were then pooled and sequenced at ~30x coverage on the Illumina NovaSeq6000 platf...
# Unraveling the genetics of feline hypertrophic cardiomyopathy: A multiomics study of 138 cats
Dataset DOI: 10.5061/dryad.cjsxksnjh
1. A population-level VCF of polymorphic SNP and indel variants called among 138 domestic cats with and without hypertrophic cardiomyopathy (HCM). The VCF was generated by mapping paired WGS fastq reads to the Fca126 reference genome with bwa mem and calling variants following GATK4 best practices. Variant annotations were generated with Ensembl's VEP based on Fca126 gene and exon boundaries. The VCF file contains meta-information lines, followed by a header line naming the fixed fields and the per-sample columns, and then data lines that each describe a variant at a genomic position. The fixed fields include chromosome (CHROM), position (POS), identifier (ID), the reference base(s) (REF), alternate base(s) (ALT), quality (QUAL), filter status (FILTER), and additional information ...
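The fixed fields listed above can be split out of a data line with a few lines of code. The sketch below follows the general VCF layout (eight tab-separated fixed fields, with INFO as semicolon-separated `key=value` pairs or bare flags); the class name, function name, and example line are hypothetical, and FORMAT and per-sample columns are deliberately ignored.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: list
    qual: str
    filter_status: str
    info: dict


def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue  # empty or missing INFO
        key, sep, value = entry.partition("=")
        # Flag entries have no '=' and are stored as True.
        info_dict[key] = value if sep else True
    return VcfRecord(chrom, int(pos), variant_id, ref,
                     alt.split(","), qual, flt, info_dict)


# Invented example line, not taken from the dataset.
record = parse_vcf_line("A1\t12345\t.\tG\tA,T\t50.0\tPASS\tAC=2;DB\n")
print(record.chrom, record.pos, record.alt, record.info)
```

QUAL is left as a string here because it may be a missing value (`.`); converting it to a number is a choice left to the caller.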
These datasets contain phenotypic and genotypic data from three connected populations of common bean (Phaseolus vulgaris L.) that were used to identify the genomic regions controlling the phenotypic response to Bean Leaf Crumple Virus (BLCrV). The first is the Andean by Meso (AxM) population, which contains 190 individuals derived from bi-parental crosses between Andean and Mesoamerican breeding lines. The AxM population included 120 additional breeding lines of Andean and Mesoamerican origin that were used as checks for their response against other viral diseases, such as Bean Golden Yellow Mosaic Virus (BGYMV). The second is a pre-breeding population (termed P135-136) composed of 111 lines obtained from two-way and three-way crosses between elite Andean lines and sources of resistance against viral diseases. The third population is a panel of 186 Mesoamerican breeding lines assembled from a collection of elite materials from the Mesoamerican breeding pipeline at CIAT. The AxM population was evaluated in three yield trials in Palmira (Colombia) between 2013 and 2015 for flowering, maturity time, and yield. All three populations were evaluated in three BLCrV trials in Pradera (Colombia), where the disease pressure is naturally high. The AxM population and the Mesoamerican panel were genotyped by sequencing (GBS), and these datasets contain their corresponding genotypic matrices in variant call format (VCF, v4.2), with sequence variants mapped against the reference genome of P. vulgaris (G19833, v2.1). A joint genotypic matrix with all available GBS data from these three populations is also included. The population P135-136 was genotyped with the DArTag targeted genotyping service offered by Diversity Arrays Technology (DArT PL, Bruce ACT, Australia), and the genotypic matrix is similarly included in VCF format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VCF files submitted for each group/pipeline.
The increasing prevalence of vector-borne diseases around the world highlights the pressing need for an in-depth exploration of the genetic and environmental factors that shape the adaptability and widespread distribution of mosquito populations. This research focuses on Culex tarsalis, a principal vector for various viral diseases, including West Nile Virus (WNV). Through the development of a new reference genome and the examination of Restriction-Site Associated DNA sequencing (RAD-seq) data from over 300 individuals and 28 locations, we demonstrate that variables such as temperature, evaporation rates, and the density of vegetation significantly impact the genetic makeup of Cx. tarsalis populations. Among the alleles most strongly associated with environmental factors is a nonsynonymous mutation in a key gene related to circadian rhythms. These results offer new insights into the mechanisms of spread and adaptation in a key North American vector species, which is poised to beco...
Sample Collection
Individual mosquitoes were trapped and collected from 28 different locations across the United States and Canada as part of the North American Mosquito Project (NAMP). All samples used in this study were collected in 2012 between the months of April and October.
Genome Sequencing, Assembly, and Annotation
An F4 population was used to generate the reference genome assembly, and high molecular weight DNA was extracted and sequenced on a Pacific Biosciences (PacBio) RS II (University of Delaware). Thirty-five SMRT cells were generated. The resulting reads provided 76X coverage of the ~790Mb Cx. tarsalis genome, and were assembled with MECAT. Gene annotation was completed by MAKER using EST and protein data from the Culex quinquefasciatus and Aedes aegypti mosquitoes. Sequences were downloaded from the NCBI Taxonomy database and both Trinotate and InterProScan were used for functional annotation of the MAKER predicted genes.
The annotated assembly was ass...
# Climate adaptation and genetic differentiation in the mosquito species Culex tarsalis
https://doi.org/10.5061/dryad.51c59zwh3
The data were stored in 8 different files.
bi_20missing_filtSNP_maf_005.recode.vcf
File Details
bi_10missing_filtSNP_maf_005.recode.vcf
culex.60x.contigs.fasta (from header ##reference=...)
QD, MQ, FS, ReadPosRankSum, RGQ, PL, SB, END, NON_REF) suggest generation with GATK/HaplotypeCaller (gVCF→VCF workflow).
13-2, 13-3, 13-4, 13-5, 13-6, 13-7, 13-8
Data Description
- The VCF contains standard columns:
  * #CHROM – Contig/chromosome
  * POS – 1-based position
  * ID – Variant identifier
  * REF – Reference allele
  * ALT – Alternate allele...
https://ega-archive.org/dacs/EGAC50000000693
We profile the whole transcriptome (bulk RNA-seq) of 7 patient-derived Sézary Syndrome (SS) cells to identify expression patterns, functional programs, and expressed gene mutations that may provide clues to new therapeutic options for SS patients. The libraries were sequenced on a NextSeq500 (Illumina) with a paired-end read length of 2x75bp. Raw data (FASTQ) and processed data (VCF), including all raw called variants, are available.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Divergent selection in the face of gene flow is usually associated with a heterogeneous genomic landscape of divergence in nascent species pairs. However, multiple factors, such as divergent selection and local recombination rate variation, can influence the formation of these genomic islands. This conundrum can be addressed by examining the genomic landscapes of species pairs that are still in the early stages of divergence. In this study, population genomics analyses were undertaken using wide-ranging sampling and whole-genome resequencing data from 96 unrelated individuals of Kentish plover (Charadrius alexandrinus) and white-faced plover (C. dealbatus). We suggest that the two species exhibit varying levels of population admixture along the Chinese coast and on Taiwan Island. Genome-wide analyses for introgression indicate that ancient introgression occurred in the Taiwan population, and that recurrent gene flow is still ongoing in mainland coastal populations. Furthermore, we identified a few genomic regions with significant levels of interspecific differentiation and local recombination suppression. These regions contain several genes potentially associated with disease resistance, coloration, and regulation of plumage moulting, and thus may be connected to the phenotypic and ecological divergence of the two nascent species. Overall, our findings suggest that divergent selection in low-recombination regions may be the main force shaping the genomic islands in the two incipient shorebird species.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the number of dbSNP, ClinVar and GWAScat sites represented using VCF, gVCF and eVCF files.
Mucocele formation in dogs is a unique and enigmatic muco-obstructive disease of the gallbladder caused by amassment of abnormal mucus that bears striking pathological similarity to cystic fibrosis. We investigated the role of CFTR in the pathogenesis of this disease. The location and frequency of disease-associated variants in the coding region of CFTR were compared using whole genome sequence data from 2,642 dogs representing breeds at low risk, at high risk, or with confirmed disease. Expression, localization, and ion transport activity of CFTR were quantified in control and mucocele gallbladders by NanoString, Western blotting, immunofluorescence imaging, and studies in Ussing chambers. Our results establish significant loss of CFTR-dependent anion secretion by mucocele gallbladder mucosa. A significantly lower quantity of CFTR protein was demonstrated relative to E-cadherin in mucocele compared to control gallbladder mucosa. Immunofluorescence identified CFTR along the apical membrane o...
We used the Whole Animal Genome Sequencing (WAGS) pipeline to identify short nucleotide variants in a dataset of 2,642 dogs encompassing both private and public resources, including 1,971 genomes from the Dog10K project. Briefly, the WAGS pipeline used the Burrows-Wheeler Alignment tool-MEM to map paired-end reads to the UU_Cfam_GSD_1.0 reference genome. Variant calling was executed with the Genome Analysis Toolkit (GATK4), and Ensembl's Variant Effect Predictor (VEP, RRID:SCR_007931) predicted variant annotations and consequences. From the resulting VEP-processed VCF file, we extracted CFTR genic variants plus variants within 1Kb of the flanking sequence that passed filters. Subsequently, non-reference allele frequencies were calculated for each variant within the control, risk, and affected dog groups.
# Acquired dysfunction of CFTR underlies cystic fibrosis-like disease of the canine gallbladder.
https://doi.org/10.5061/dryad.2rbnzs7xq
This dataset includes supplementary materials for the manuscript entitled Acquired dysfunction of CFTR underlies cystic fibrosis-like disease of the canine gallbladder.
Supplemental Figure S1 illustrates sample procurement and the appearance of the gallbladder from each of 9 dogs from which mucosal RNA was extracted for targeted gene expression analysis. Samples of lumen mucosa were obtained by excision from regions devoid of mucus or from which mucus could be gently removed. Panel A shows a gallbladder during sampling and panel B after removal of the sample. The remaining panels show each of the 9 individual mucocele gallbladders used for mucosal RNA sample collection. Pictures are immediately post-cholecystectomy, followed by opening of the gallbladder to expose the lumen.
**Supplemental Table S1...
The mosquito Aedes aegypti is the primary vector of many human arboviruses such as dengue, yellow fever, chikungunya, and Zika, which affect millions of people worldwide. Population genetics studies on this mosquito have been important in understanding its invasion pathways and success as a vector of human disease. The Axiom aegypti1 SNP chip was developed from a sample of geographically diverse Ae. aegypti populations to facilitate genomic studies on this species. Here we evaluate the utility of the Axiom aegypti1 SNP chip for population genetics and compare it with a low-depth shotgun sequencing approach using mosquitoes from the species' native (Africa) and invasive range (outside Africa). These analyses indicate that the results from the SNP chip are highly reproducible and have a higher sensitivity to capture alternative alleles than a low-coverage whole-genome sequencing approach. Although the SNP chip suffers from ascertainment bias, results from population structure, ancestry,...
DNA from individual Aedes aegypti mosquitoes was extracted and used for genotyping at 50,000 loci distributed along the species genome, using the Axiom Aegypti1 SNP chip (Life Technologies Corporation CAT#550481). Files "all_snps_G3Dryad" and "Replicas_SNPchip" contain all 50,000 SNPs genotyped, prior to filtering. File "50k_SNPs_30_samples_LD_MAF_miss_FINAL" contains the SNPs after applying filters in Plink 1.9 (https://www.cog-genomics.org/plink/) for linkage disequilibrium (LD: --indep-pairwise 50 10 0.3), minor allele frequency (MAF: --maf 0.1), and missing data (--geno 0.1).
# Genotypes of Aedes aegypti mosquitoes derived from SNP chip and low-coverage whole genome sequencing for platform cross-validation
https://doi.org/10.5061/dryad.m0cfxppbd
Files: Replicas_SNPchip
SNP chip data generated from 20 individual Aedes aegypti mosquitoes from Sudan and Sri Lanka using the Axiom Aegypti1 array (Life Technologies Corporation CAT#550481). Each mosquito was genotyped in triplicate, independently, on different chips. All 50,000 loci genotyped are included, prior to any filtering.
Files: all_snps_G3Dryad
SNP chip data generated from 13 individual Aedes aegypti mosquitoes from populations worldwide using the Axiom Aegypti1 array (Life Technologies Corporation CAT#550481). All 50,000 loci genotyped are included, prior to any filtering.
File: 50k_SNPs_30_samples_LD_MAF_miss_FINAL.vcf.gz
Variant call format (VCF) file containing 30 individual *Aedes aegypti* genotypes used for population genetic analysis, with five individu...
Tursiops SNP dataset
SNP genotype, vcf file. Mapped to the Tursiops truncatus genome (GCA_001922835.1).
mappedQC.fil5.vcf
Tursiops ref_seqF
Forward reference sequences
Tur_1.fasta
Tursiops ref_seqR
Reverse reference sequences
Tur_2.fasta
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
T-box transcription factor 1. Enables protein homodimerization activity and sequence-specific double-stranded DNA binding activity. Involved in several processes, including chordate embryonic development, parathyroid gland development, and soft palate development. Predicted to be active in chromatin and nucleus. Implicated in several diseases, including DiGeorge syndrome, congenital heart disease (multiple), hypoparathyroidism, sensorineural hearing loss, and velocardiofacial syndrome. Biomarker of congenital heart disease. This gene is a member of a phylogenetically conserved family of genes that share a common DNA-binding domain, the T-box. T-box genes encode transcription factors involved in the regulation of developmental processes. This gene product shares 98% amino acid sequence identity with the mouse ortholog. DiGeorge syndrome (DGS)/velocardiofacial syndrome (VCFS), a common congenital disorder characterized by neural-crest-related developmental defects, has been associated with deletions of chromosome 22q11.2, where this gene has been mapped. Studies using mouse models of DiGeorge syndrome suggest a major role for this gene in the molecular etiology of DGS/VCFS. Several alternatively spliced transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Jul 2008]