73 datasets found
  1. Concordance of genotypes represented in VCF and gVCF files with those...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    xls
    Updated Jun 1, 2023
    + more versions
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Concordance of genotypes represented in VCF and gVCF files with those detected by the MI RISK Plus kit. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Concordance of genotypes represented in VCF and gVCF files with those detected by the MI RISK Plus kit.

  2. NIHR BioResource Rare Diseases WGS project - Hypertrophic Cardiomyopathy...

    • ega-archive.org
    + more versions
    Cite
    NIHR BioResource Rare Diseases WGS project - Hypertrophic Cardiomyopathy (HCM) Rare Disease domain (VCF data) [Dataset]. https://www.ega-archive.org/datasets/EGAD00001007885
    License

    https://ega-archive.org/dacs/EGAC00001000259

    Description

    Short read whole genome sequencing (WGS) VCF files for the NIHR BioResource Rare Diseases WGS project – Participants from the Hypertrophic Cardiomyopathy (HCM) Rare Disease domain

  3. Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 3, 2023
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t004
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data.

  4. SARS-CoV-2 GISAID isolates (2020-06-17) genotyping VCF

    • data.mendeley.com
    Updated Jul 25, 2020
    + more versions
    Cite
    Doğa Eskier (2020). SARS-CoV-2 GISAID isolates (2020-06-17) genotyping VCF [Dataset]. http://doi.org/10.17632/63t5c7xb4c.1
    Dataset updated
    Jul 25, 2020
    Authors
    Doğa Eskier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VCF file containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV, separated by individual mutations. The columns correspond to viral genome accession ID, nucleotide position in the genome, mutation ID (left blank in all rows), reference nucleotide, identified mutation, quality, filter, and information columns (all left blank), format (GT in all rows), column corresponding to reference genome (all 0, referring to reference nucleotide column), and columns corresponding to isolate genomes, with each row identifying the nucleotide in the POS column, and whether it is non-mutant (0), or the mutant indicated in the identified mutation column (1). The file is tab delimited, with 22546 rows including the names, and 30690 columns.
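    The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the author's code: the accession, isolate names, positions, and the use of "." for the blank columns are made-up assumptions, and only the column order is taken from the description.

```python
import csv
import io

# Hypothetical three-isolate excerpt that follows the described column order.
# "." stands in for the columns the description says are left blank (assumption).
SAMPLE = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tREF_GENOME\tiso_A\tiso_B\tiso_C\n"
    "acc1\t100\t.\tC\tT\t.\t.\t.\tGT\t0\t1\t0\t1\n"
    "acc1\t200\t.\tG\tA\t.\t.\t.\tGT\t0\t0\t0\t1\n"
)


def mutant_counts(handle):
    """Count, for each site, how many isolate columns carry the mutant call (1)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Isolate columns start after FORMAT and the reference-genome column.
    first_isolate = header.index("FORMAT") + 2
    counts = {}
    for row in reader:
        site = (int(row[1]), row[3], row[4])  # (POS, REF, ALT)
        counts[site] = sum(1 for call in row[first_isolate:] if call == "1")
    return counts


counts = mutant_counts(io.StringIO(SAMPLE))
for site, n in counts.items():
    print(site, n)
```

    For the real file, the in-memory sample would be replaced by an open file handle; with about 30,690 columns per row, streaming row by row as done here avoids loading the whole matrix at once.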

    The file was generated to test the hypothesis whether the five most common mutations in the SARS-CoV-2 genome replication complex proteins, nsps 7, 8, 12, and 14, significantly affect the mutation density of the virus over time and whether these affect the synonymous and nonsynonymous mutation densities differently. We discovered that mutations in nsp14, an exonuclease with error correcting capabilities, are most likely to be correlated with increased mutational load across the genome compared to wildtype SARS-CoV-2. These results were obtained by identifying the frequency of mutations across all isolates in genomic regions of interest, analyzing which of the twenty mutations (five per nsp) have a statistically meaningful relationship with the mutation density in the M and E genes (chosen due to being under little selective pressure), and identifying the synonymous and nonsynonymous genomic SNV density for isolates with any of the statistically meaningful mutations, as well as isolates with none of the identified mutations.

  5. NA12878 WES Benchmark dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 31, 2020
    Cite
    Pranckeviciene Erinija (2020). NA12878 WES Benchmark dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3597726
    Dataset updated
    May 31, 2020
    Dataset provided by
    Vilnius University
    Authors
    Pranckeviciene Erinija
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset makes available the UCSC Genome Browser (genome.ucsc.edu) GRCh37 genome build public session NA12878 WES Benchmark files in a single dataset so that these files can be used in other applications or genome browsers such as IGV. All genomic variant calls in all VCF files were decomposed and normalized with vt. This dataset contains:

    Genome in a bottle (GIAB) version 3.3.2 high confidence (HC) variant calls and genomic regions for HapMap individual NA12878 :

    GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz

    GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz.tbi

    GIAB_v3.3.2_NA12878_HC_regions.bed

    HapMap individual NA12878 WES variant calls (VCF) and capture regions (BED) from diagnostic laboratories :

    ARUP whole exome sequencing data (HiSeq 2000) publicly available from NCBI GeT-RM Browser

    converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz

    converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz.tbi

    ARUP_SeqCap_EZ_Exome.bed

    UCSF whole exome sequencing data (HiSeq 2500) publicly available from NCBI GeT-RM Browser

    converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz

    converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz.tbi

    UCSF_WES_Agilent_V4_Custom.bed

    Whole exome data (NextSeq 500) sequenced in CHEO diagnostic laboratory

    CHEO_NA12878_WES_S1dataset.vcf.gz

    CHEO_NA12878_WES_S1dataset.vcf.gz.tbi

    Agilent_CRE_v2.bed

    Genomic coordinates (BED) of OMIM genes for which a molecular basis of the associated disease is known (as of September 2019) :

    Omim_Genes.bed
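    The files above pair variant calls (VCF) with region files (BED), and the two formats use different coordinate conventions: VCF positions are 1-based, while BED intervals are 0-based and half-open. The sketch below shows one way to test whether a VCF position falls inside a BED region. The function names and the example interval are hypothetical, and the lookup assumes the intervals on each chromosome do not overlap.

```python
from bisect import bisect_right


def load_bed(lines):
    """Read BED lines into {chrom: sorted list of (start, end)} (0-based, half-open)."""
    regions = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def in_regions(regions, chrom, pos):
    """Return True if the 1-based VCF position lies inside a BED interval.

    Assumes non-overlapping intervals per chromosome.
    """
    intervals = regions.get(chrom, [])
    zero_based = pos - 1  # convert VCF POS to BED coordinates
    # Find the last interval whose start is <= zero_based.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


# Made-up interval: BED 100-200 covers VCF positions 101 through 200.
regions = load_bed(["1\t100\t200\n"])
print(in_regions(regions, "1", 101), in_regions(regions, "1", 201))
```

    Restricting each call set to the intersection of its capture regions and the high-confidence regions before comparison is a common use of such a check; whether the dataset authors did exactly this is not stated in the description.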

  6. Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data.

  7. Annotated VCF of 192 Verticillium dahliae isolates

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 16, 2025
    Cite
    Benjamin Mimee; Joel Lafond-Lapalme; Mario Tenuta (2025). Annotated VCF of 192 Verticillium dahliae isolates [Dataset]. http://doi.org/10.5061/dryad.g79cnp5v0
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Benjamin Mimee; Joel Lafond-Lapalme; Mario Tenuta
    Time period covered
    Jan 1, 2023
    Description

    Verticillium dahliae is an important soil-borne pathogen causing Verticillium wilt. It is also the primary causal agent of Potato Early Dying, a disease complex involving the root-lesion nematode. Here, we report the whole-genome sequencing of 192 isolates of V. dahliae originating from the major potato production areas across Canada. Our results yielded a resource of 277,010 genetic variations that will be useful for genetic analyses and revealed the presence of two major lineages, both present in all provinces but exhibiting differences in regional prevalence.

    Filtered WGS reads (fastp) were aligned to the Verticillium dahliae reference (https://www.ncbi.nlm.nih.gov/assembly/95341/GCA_000150675.2) with BWA. Variants were called with freebayes v1.3.6 and annotated with snpEff.

  8. Data from: A genome-guided strategy for climate resilience in American...

    • agdatacommons.nal.usda.gov
    • data.niaid.nih.gov
    • +1more
    bin
    Updated Aug 20, 2025
    + more versions
    Cite
    Alexander Sandercock; Jared Westbrook; Qian Zhang; Jason Holliday (2025). Data from: A genome-guided strategy for climate resilience in American chestnut restoration populations [Dataset]. http://doi.org/10.5281/zenodo.10676843
    Available download formats: bin
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Sandercock; Jared Westbrook; Qian Zhang; Jason Holliday
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The American chestnut (Castanea dentata) is a functionally extinct tree species that was decimated by an invasive fungal pathogen in the early 20th century. An understanding of the genomic architecture of local adaptation in wild American chestnut was necessary in order to deploy locally adapted, disease-resistant American chestnut populations. Here, we characterize the genomic basis of climate adaptation in remnant wild American chestnut, develop new computational methods, and evaluate the adaptive genomic content captured within backcross breeding populations. Whole genome re-sequencing data of 356 trees from Sandercock et al. (2022) coupled with genotype-environment association methods identified 18483 climate associated loci.

    Methods:

    VCF file: The ~21 million SNP dataset from Sandercock et al. (2022) was first imputed using BEAGLE and filtered to remove SNPs with MAF < 0.05. Climate associated loci were then identified using RDA and LFMM2 genotype-environment association methods.

    Seed zone shape files: Three seed zones were identified using the ~18k climate associated loci. These regions partition the chestnut range into geographic seed zones that reflect relatively homogeneous areas with respect to multivariate adaptive genomic variation. These regions can be used to conserve germplasm ex situ and guide subsequent breeding crosses that lead to climate-matched restoration populations.

    Genetic map: gmbigxhorn.jtl.map.2022.csv is a genetic map generated from American chestnut backcross genotyping-by-sequencing data.

    R code: for estimating the average migration distance for each seed zone under future climate change conditions.
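    The MAF < 0.05 filter mentioned in the description can be illustrated as follows. This is a sketch under stated assumptions, not the authors' pipeline: it assumes biallelic sites with VCF-style GT strings, and the function name and example genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Compute MAF from GT strings such as "0/1" or "1|1" at a biallelic site.

    Missing alleles (".") are skipped. Returns 0.0 if nothing was called.
    """
    alt = 0
    total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    if total == 0:
        return 0.0
    freq = alt / total
    return min(freq, 1.0 - freq)


MAF_THRESHOLD = 0.05  # threshold taken from the dataset description

# Made-up genotypes: one heterozygote among 20 diploid samples.
rare_site = ["0/0"] * 19 + ["0/1"]
common_site = ["0/0", "0|1", "1/1", "./."]

for name, site in (("rare", rare_site), ("common", common_site)):
    maf = minor_allele_frequency(site)
    print(name, round(maf, 3), "keep" if maf >= MAF_THRESHOLD else "drop")
```

    Taking the minimum of the alternate frequency and its complement makes the result independent of which allele happens to be labelled as the reference.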

  9. PhenoDB

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Oct 23, 2024
    Cite
    (2024). PhenoDB [Dataset]. http://identifiers.org/RRID:SCR_016551
    Dataset updated
    Oct 23, 2024
    Description

    Database for phenotype genotype associations for humans. Used by clinical researchers to store standardized phenotypic information, diagnosis, and pedigree data and then run analyses on VCF files from individuals, families or cohorts with suspected Mendelian disease.

  10. Unraveling the genetics of feline hypertrophic cardiomyopathy: A multiomics...

    • search.dataone.org
    • datadryad.org
    Updated Jun 27, 2025
    Cite
    Michael Vandewege; Joanna Kaplan; Victor Rivas; Jalena Wouters; Samantha Harris; Kathryn Meurs; Joshua Stern (2025). Unraveling the genetics of feline hypertrophic cardiomyopathy: A multiomics study of 138 cats [Dataset]. http://doi.org/10.5061/dryad.cjsxksnjh
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Michael Vandewege; Joanna Kaplan; Victor Rivas; Jalena Wouters; Samantha Harris; Kathryn Meurs; Joshua Stern
    Description

    Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease in cats, often leading to congestive heart failure, arterial thromboembolism, and sudden cardiac death. The genetics of feline HCM are poorly understood, and limited genetic discoveries remain breed or family-specific. We aimed to identify novel causative or disease-modifying variants in a large cohort of cats reflective of the general cat population. In a second cohort, we sought to characterize transcriptomic differences between HCM-affected cats and healthy controls. DNA was isolated from 138 domestic cats (109 HCM and 29 controls). No single or combination of variants of high, moderate, or modifying impact were identified in genome-wide analysis to cause or modify the disease severity of HCM. Several rare high and moderate-impact variants in genes associated with human HCM were detected in diseased cats. In a second cohort, left ventricular (LV), interventricular septal (IVS), and left atrial (LA) tissues..., WGS data generation A total of 1-2 mL of whole blood were collected from the cephalic, saphenous, or jugular vein into EDTA blood collection tubes. DNA was either isolated from whole blood or from buffy coats after whole blood centrifugation at 2000 rpm for 15 minutes. Genomic DNA isolation was performed using commercially available kits (Gentra Puregene Blood kit, QIAGEN, Hilden Germany; ArchivePure;5Prime) and by following the respective manufacturer’s protocol. High-quality unfragmented DNA was selected by a combination of 1% agarose gel visualization and spectrophotometric confirmation (a 260/280 ratio of ~1.8 and a concentration of > 50 ng/uL; NanoDrop One/One, Thermofisher, Waltham, GA, USA). Samples were stored at -20°C until ready for shipment to Theragen Bio Co., Ltd, Gyeonggi-do, Republic of Korea for WGS. Paired-end DNA libraries were generated with a TruSeq DNA Nano library prep kit. 
Samples were then pooled and sequenced at ~30x coverage on the Illumina NovaSeq6000 platf..., # Unraveling the genetics of feline hypertrophic cardiomyopathy: A multiomics study of 138 cats

    Dataset DOI: 10.5061/dryad.cjsxksnjh

    Description of the data and file structure

    Data available

    1. A population-level VCF of polymorphic SNP and indel variants called among 138 domestic cats with and without hypertrophic cardiomyopathy (HCM). The VCF was generated by mapping paired WGS FASTQ reads to the Fca126 reference genome with bwa mem and calling variants through GATK4 best practices. Variant annotations were generated with Ensembl's VEP based on Fca126 gene and exon boundaries. The VCF file contains meta-information lines, followed by a header line that names the fixed fields and the samples; the subsequent data lines detail variants at genomic positions. The fixed fields include chromosome (CHROM), position (POS), identifier (ID), the reference base(s) (REF), alternate base(s) (ALT), quality (QUAL), filter status (FILTER), and additional information ...,
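    The three kinds of lines described above (meta-information, header, data) can be separated with a short parser. The sketch below is illustrative only: the sample names, contig name, and values are made up, and a production workflow would more likely rely on an established VCF library.

```python
def parse_vcf(lines):
    """Split VCF text into (meta_lines, sample_names, records)."""
    meta, samples, records = [], [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):  # meta-information line
            meta.append(line)
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):  # header line: 9 fixed columns, then samples
            samples = fields[9:]
            continue
        info = {}
        if fields[7] != ".":
            for item in fields[7].split(";"):
                key, _, value = item.partition("=")
                info[key] = value or True  # flags without "=" become True
        keys = fields[8].split(":") if len(fields) > 8 else []
        records.append({
            "CHROM": fields[0],
            "POS": int(fields[1]),
            "ID": fields[2],
            "REF": fields[3],
            "ALT": fields[4].split(","),
            "QUAL": fields[5],
            "FILTER": fields[6],
            "INFO": info,
            "SAMPLES": {
                name: dict(zip(keys, value.split(":")))
                for name, value in zip(samples, fields[9:])
            },
        })
    return meta, samples, records


# Hypothetical two-sample example; not taken from the dataset.
EXAMPLE = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcat_01\tcat_02\n",
    "A1\t12345\t.\tG\tA,T\t50\tPASS\tDP=30;DB\tGT:DP\t0/1:14\t1/2:16\n",
]
meta, samples, records = parse_vcf(EXAMPLE)
print(samples, records[0]["ALT"], records[0]["SAMPLES"]["cat_02"]["GT"])
```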

  11. Replication data for: Genetic analyses for the response to Bean Leaf Crumple...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniel Ariza-Suarez; Beat Keller; Anna Spescha; Johan Steven Aparicio; Victor Mayor; Ana Elizabeth Portilla-Benavides; Hector Fabio Buendia; Juan Miguel Bueno; Bruno Studer; Bodo Raatz (2023). Replication data for: Genetic analyses for the response to Bean Leaf Crumple Virus (BLCrV) identify a candidate LRR-RLK gene [Dataset]. http://doi.org/10.7910/DVN/9JSMED
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Daniel Ariza-Suarez; Beat Keller; Anna Spescha; Johan Steven Aparicio; Victor Mayor; Ana Elizabeth Portilla-Benavides; Hector Fabio Buendia; Juan Miguel Bueno; Bruno Studer; Bodo Raatz
    Time period covered
    Jan 1, 2013 - Jan 1, 2020
    Description

    These datasets contain phenotypic and genotypic data from three connected populations of common bean (Phaseolus vulgaris L.) that were used to identify the genomic regions controlling the phenotypic response to Bean Leaf Crumple Virus (BLCrV). The first is the Andean by Meso (AxM) population, which contains 190 individuals derived from bi-parental crosses between Andean and Mesoamerican breeding lines. The AxM population included 120 additional breeding lines of Andean and Mesoamerican origin that were used as checks for their response against other viral diseases, such as Bean Golden Yellow Mosaic Virus (BGYMV). The second is a pre-breeding population (termed P135-136) composed of 111 lines that was obtained from two-way and three-way crosses between elite Andean lines and resistant sources against viral diseases. The third population is a panel of 186 Mesoamerican breeding lines assembled from a collection of elite materials from the Mesoamerican breeding pipeline at CIAT. The AxM population was evaluated in three yield trials in Palmira (Colombia) between 2013 and 2015 for flowering, maturity time and yield. All three populations were evaluated in three BLCrV trials in Pradera (Colombia), where the disease pressure is naturally high. The AxM and the Mesoamerican panel were genotyped by sequencing (GBS), and these datasets contain their corresponding genotypic matrices in variant-call format (VCF, v4.2) with sequence variants mapped against the reference genome of P. vulgaris (G19833, v2.1). A joint genotypic matrix with all available GBS data from these three populations is also included. The population P135-136 was genotyped with the DArTag targeted genotyping service offered by Diversity Arrays Technology (DArT PL, Bruce ACT, Australia), and the genotypic matrix is similarly included in VCF format.

  12. Raw VCF files

    • figshare.com
    application/gzip
    Updated Apr 13, 2023
    Cite
    Christina Cuomo (2023). Raw VCF files [Dataset]. http://doi.org/10.6084/m9.figshare.12693881.v1
    Available download formats: application/gzip
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christina Cuomo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VCF files submitted for each group/pipeline.

  13. Climate adaptation and genetic differentiation in the mosquito species Culex...

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    • +3more
    Updated Sep 26, 2025
    Cite
    Yunfei Liao; Touhid Islam; Rooksana Noorai; Jared Streich; Christopher Saski; Lee Cohnstaedt; Elizabeth Cooper (2025). Climate adaptation and genetic differentiation in the mosquito species Culex tarsalis [Dataset]. http://doi.org/10.5061/dryad.51c59zwh3
    Dataset updated
    Sep 26, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Yunfei Liao; Touhid Islam; Rooksana Noorai; Jared Streich; Christopher Saski; Lee Cohnstaedt; Elizabeth Cooper
    Description

    The increasing prevalence of vector-borne diseases around the world highlights the pressing need for an in-depth exploration of the genetic and environmental factors that shape the adaptability and widespread distribution of mosquito populations. This research focuses on Culex tarsalis, a principal vector for various viral diseases, including West Nile Virus (WNV). Through the development of a new reference genome and the examination of Restriction-Site Associated DNA sequencing (RAD-seq) data from over 300 individuals and 28 locations, we demonstrate that variables such as temperature, evaporation rates, and the density of vegetation significantly impact the genetic makeup of Cx. tarsalis populations. Among the alleles most strongly associated with environmental factors is a nonsynonymous mutation in a key gene related to circadian rhythms. These results offer new insights into the mechanisms of spread and adaptation in a key North American vector species, which is poised to beco..., Sample Collection Individual mosquitoes were trapped and collected from 28 different locations across the United States and Canada as part of the North American Mosquito Project (NAMP). All samples used in this study were collected in 2012 between the months of April and October. Genome Sequencing, Assembly, and Annotation An F4 population was used to generate the reference genome assembly, and high molecular weight DNA was extracted and sequenced on a Pacific Biosciences (PacBio) RS II (University of Delaware). Thirty-five SMRT cells were generated. The resulting reads provided 76X coverage of the ~790Mb Cx. tarsalis genome, and were assembled with MECAT. Gene annotation was completed by MAKER using EST and protein data from the Culex quinquefasciatus and Aedes aegypti mosquitoes. Sequences were downloaded from the NCBI Taxonomy database and both Trinotate and InterProScan were used for functional annotation of the MAKER predicted genes. 
The annotated assembly was ass..., , # Climate adaptation and genetic differentiation in the mosquito species Culex tarsalis

    https://doi.org/10.5061/dryad.51c59zwh3

    Description of the data and file structure

    The data were stored in 8 different files.

    1. bi_20missing_filtSNP_maf_005.recode.vcf

      File Details

      • File Name: bi_10missing_filtSNP_maf_005.recode.vcf
      • File Format: VCF (Variant Call Format) v4.2
      • Reference genome: culex.60x.contigs.fasta (from header ##reference=...)
      • Source software (inferred): GATK-style headers present (e.g., QD, MQ, FS, ReadPosRankSum, RGQ, PL, SB, END, NON_REF) suggest generation with GATK/HaplotypeCaller (gVCF→VCF workflow).
      • Samples: 7 individuals → 13-2, 13-3, 13-4, 13-5, 13-6, 13-7, 13-8

      Data Description

       -The VCF contains standard columns: * #CHROM – Contig/chromosome * POS – 1-based position * ID – Variant identifier * REF – Reference allele * ALT – Alternate allele...,
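    The file details above are read from the VCF header (the ##reference line, the INFO and FORMAT tags, and the sample names). A minimal sketch of pulling those items out of a header follows. The header text here is a made-up stand-in, including the reference path and sample names, and the function name is hypothetical.

```python
import re

# Matches the ID of ##INFO=<ID=...> and ##FORMAT=<ID=...> declarations.
META_ID = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")


def summarize_header(lines):
    """Collect file format, reference, declared INFO/FORMAT IDs, and sample names."""
    summary = {"fileformat": None, "reference": None,
               "INFO": [], "FORMAT": [], "samples": []}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##fileformat="):
            summary["fileformat"] = line.split("=", 1)[1]
        elif line.startswith("##reference="):
            summary["reference"] = line.split("=", 1)[1]
        elif line.startswith("##"):
            match = META_ID.match(line)
            if match:
                summary[match.group(1)].append(match.group(2))
        elif line.startswith("#CHROM"):
            summary["samples"] = line.split("\t")[9:]
            break  # data lines follow; the header is complete
    return summary


# Hypothetical header; the tag names are among those listed in the description.
HEADER = [
    "##fileformat=VCFv4.2\n",
    "##reference=example.fasta\n",
    '##INFO=<ID=QD,Number=1,Type=Float,Description="example">\n',
    '##INFO=<ID=MQ,Number=1,Type=Float,Description="example">\n',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="example">\n',
    '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="example">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
]
summary = summarize_header(HEADER)
print(summary)
```

    Tags such as QD, MQ, and PL appearing in the summary are the kind of evidence the description uses to infer a GATK-style workflow; the presence of a tag alone does not prove which tool wrote the file.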

  14. Raw data (FASTQ) and processed data (VCF) of 7 patient-derived Sézary...

    • ega-archive.org
    Updated Sep 1, 2025
    Cite
    (2025). Raw data (FASTQ) and processed data (VCF) of 7 patient-derived Sézary Syndrome (SS) cells [Dataset]. https://ega-archive.org/datasets/EGAD50000001646
    Dataset updated
    Sep 1, 2025
    License

    https://ega-archive.org/dacs/EGAC50000000693

    Description

    We profile the whole-transcriptome (bulk RNAseq) of 7 patient-derived Sézary Syndrome (SS) cells to identify expression patterns, functional programs and expressed gene mutations that may provide clues on new therapeutic options for SS patients. The libraries were sequenced on NextSeq500 (Illumina) with a paired-end read length of 2x75bp. Raw data (FASTQ) and obtained processed data (VCF) including all called raw variants are available.

  15. Data from: Divergent selection in low recombination regions shapes the...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    application/gzip, bin
    Updated Dec 15, 2023
    Cite
    Wenjun Zhou; Wenjun Zhou (2023). Divergent selection in low recombination regions shapes the genomic islands in two incipient shorebird species [Dataset]. http://doi.org/10.5061/dryad.4f4qrfjjp
    Available download formats: bin, application/gzip
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wenjun Zhou; Wenjun Zhou
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Divergent selection in the face of gene flow is usually associated with a heterogeneous genomic landscape of divergence in nascent species pairs. However, multiple factors, such as divergent selection and local recombination rate variation, can influence the formation of these genomic islands. This conundrum can be solved through examination of the genomic landscapes of species pairs that are still in the early stages of evolution. In this study, population genomics analyses were undertaken using a wide range of sampling and whole-genome resequencing data from 96 unrelated individuals of Kentish plover (Charadrius alexandrinus) and white-faced plover (C. dealbatus). We suggest that the two species exhibit varying levels of population admixture along the Chinese coast and on Taiwan Island. Genome-wide analyses for introgression indicate that ancient introgression had occurred in the Taiwan population, and recurrent gene flow is still ongoing in mainland coastal populations. Furthermore, we identified a few genomic regions with significant levels of interspecific differentiation and local recombination suppression, which contain several genes potentially associated with disease resistance, coloration, and regulation of plumage moulting, and thus may be connected to the phenotypic and ecological divergence of the two nascent species. Overall, our findings suggest that divergent selection in low recombination regions may be the main force in shaping the genomic islands in two incipient shorebird species.

  16. Comparison of the number of dbSNP, ClinVar and GWAScat sites represented...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Comparison of the number of dbSNP, ClinVar and GWAScat sites represented using VCF, gVCF and eVCF files. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t007
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the number of dbSNP, ClinVar and GWAScat sites represented using VCF, gVCF and eVCF files.

  17. Data from: Acquired dysfunction of CFTR underlies cystic fibrosis-like...

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    • +2more
    Updated Jul 20, 2024
    Cite
    Jody Gookin (2024). Acquired dysfunction of CFTR underlies cystic fibrosis-like disease of the canine gallbladder [Dataset]. http://doi.org/10.5061/dryad.2rbnzs7xq
    Dataset updated
    Jul 20, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Jody Gookin
    Description

    Mucocele formation in dogs is a unique and enigmatic muco-obstructive disease of the gallbladder caused by amassment of abnormal mucus that bears striking pathological similarity to cystic fibrosis. We investigated the role of CFTR in the pathogenesis of this disease. The location and frequency of disease-associated variants in the coding region of CFTR was compared using whole genome sequence data from 2,642 dogs representing breeds at low-risk, high-risk, or with confirmed disease. Expression, localization, and ion transport activity of CFTR was quantified in control and mucocele gallbladders by NanoString, Western blotting, immunofluorescence imaging, and studies in Ussing chambers. Our results establish significant loss of CFTR-dependent anion secretion by mucocele gallbladder mucosa. A significantly lower quantity of CFTR protein was demonstrated relative to E-cadherin in mucocele compared to control gallbladder mucosa. Immunofluorescence identified CFTR along the apical membrane o..., We used the Whole Animal Genome Sequencing (WAGS) pipeline to identify short nucleotide variants in a dataset of 2,642 dogs encompassing both private and public resources including 1,971 genomes from the Dog10K project. Briefly, the WAGS pipeline used Burrows-Wheeler Alignment tool-MEM to map paired-end reads to the UU_Cfam_GSD_1.0 reference genome. Variant calling was executed with Genome Analysis Toolkit (GATK4), and Ensembl’s Variant Effect Predictor (VEP, RRID:SCR_007931) predicted variant annotations and consequences. From the resulting VEP-processed VCF file, we extracted CFTR genic variants plus variants within 1Kb of the flanking sequence that passed filters. Subsequently, non-reference allele frequencies were calculated for each variant within the control, risk, and affected dog groups. , , # Acquired dysfunction of CFTR underlies cystic fibrosis-like disease of the canine gallbladder.

    https://doi.org/10.5061/dryad.2rbnzs7xq

    This dataset includes supplementary materials for the manuscript entitled Acquired dysfunction of CFTR underlies cystic fibrosis-like disease of the canine gallbladder.

    Description of the data and file structure

    Supplemental Figure S1 illustrates sample procurement and the appearance of the gallbladder from each of 9 dogs that had mucosal RNA extracted for targeted gene expression analysis. Samples of lumen mucosa were obtained by excision from regions devoid of mucus or from which mucus could be gently removed. Panels A and B show a gallbladder during sampling and after removal of the sample, respectively. The remaining panels show each of the 9 individual mucocele gallbladders used for mucosal RNA sample collection. Pictures were taken immediately post-cholecystectomy, after the gallbladder was opened to expose the lumen.

    **Supplemental Table S1...
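The per-group non-reference allele frequency calculation mentioned in the methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the sample tuples, and the genotype strings below are hypothetical, and only the group labels (control, risk, affected) come from the description. Genotypes are assumed to be VCF-style GT strings.

```python
from collections import defaultdict


def non_ref_allele_frequency(genotypes):
    """Fraction of called alleles that are not the reference allele.

    `genotypes` is an iterable of VCF-style GT strings such as "0/1",
    "1|1", or "./." (missing). Missing alleles are ignored.
    """
    alt = 0
    total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # skip missing calls
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else float("nan")


# Hypothetical genotypes for one variant, labelled by dog group.
samples = [
    ("control", "0/0"), ("control", "0/1"),
    ("risk", "0/1"), ("risk", "1/1"),
    ("affected", "1/1"), ("affected", "./."),
]

by_group = defaultdict(list)
for group, gt in samples:
    by_group[group].append(gt)

freqs = {group: non_ref_allele_frequency(gts) for group, gts in by_group.items()}
print(freqs)
```

Counting alleles rather than individuals keeps heterozygous and homozygous carriers weighted correctly, and skipping missing calls avoids deflating the frequency in groups with incomplete genotyping.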

  18.

    Genotypes of Aedes aegypti mosquitoes derived from SNP chip and low-coverage...

    • search.dataone.org
    • dataone.org
    • +3more
    Updated Oct 15, 2025
    Cite
    Andres Gomez-Palacio; Gen Morinaga; Paul Turner; Maria Victoria Micieli; Mohammed-Ahmed Elnour; Bashir Salim; Sinnathamby Noble Surendran; Ranjan Ramasamy; Jeffrey Powell; John Soghigian; Andrea Gloria-Soria (2025). Genotypes of Aedes aegypti mosquitoes derived from SNP chip and low-coverage whole genome sequencing for platform cross-validation [Dataset]. http://doi.org/10.5061/dryad.m0cfxppbd
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Andres Gomez-Palacio; Gen Morinaga; Paul Turner; Maria Victoria Micieli; Mohammed-Ahmed Elnour; Bashir Salim; Sinnathamby Noble Surendran; Ranjan Ramasamy; Jeffrey Powell; John Soghigian; Andrea Gloria-Soria
    Description

    The mosquito Aedes aegypti is the primary vector of many human arboviruses such as dengue, yellow fever, chikungunya, and Zika, which affect millions of people worldwide. Population genetics studies on this mosquito have been important in understanding its invasion pathways and success as a vector of human disease. The Axiom aegypti1 SNP chip was developed from a sample of geographically diverse Ae. aegypti populations to facilitate genomic studies on this species. Here we evaluate the utility of the Axiom aegypti1 SNP chip for population genetics and compare it with a low-depth shotgun sequencing approach using mosquitoes from the species’ native (Africa) and invasive range (outside Africa). These analyses indicate that the results from the SNP chip are highly reproducible and have a higher sensitivity to capture alternative alleles than a low-coverage whole-genome sequencing approach. Although the SNP chip suffers from ascertainment bias, results from population structure, ancestry,...

    DNA from individual Aedes aegypti mosquitoes was extracted and used for genotyping at 50,000 loci distributed along the species genome, using the Axiom Aegypti1 SNP chip (Life Technologies Corporation CAT#550481). Files "all_snps_G3Dryad" and "Replicas_SNPchip" contain all 50,000 SNPs genotyped, prior to filtering. File "50k_SNPs_30_samples_LD_MAF_miss_FINAL" contains the SNPs after applying filters in Plink 1.9 (https://www.cog-genomics.org/plink/) for linkage disequilibrium (LD: --indep-pairwise 50 10 0.3), minor allele frequency (MAF: --maf 0.1), and missing data (--geno 0.1).

    # Genotypes of Aedes aegypti mosquitoes derived from SNP chip and low-coverage whole genome sequencing for platform cross-validation

    https://doi.org/10.5061/dryad.m0cfxppbd

    Files: Replicas_SNPchip

    SNP chip data generated from 20 individual Aedes aegypti mosquitoes from Sudan and Sri Lanka using the Axiom Aegypti1 array (Life Technologies Corporation CAT#550481). Each mosquito was genotyped in triplicate, independently, on different chips. All 50,000 loci genotyped are included, prior to any filtering.

    Files: all_snps_G3Dryad

    SNP chip data generated from 13 individual Aedes aegypti mosquitoes from populations worldwide using the Axiom Aegypti1 array (Life Technologies Corporation CAT#550481). All 50,000 loci genotyped are included, prior to any filtering.

    File: 50k_SNPs_30_samples_LD_MAF_miss_FINAL.vcf.gz

    Variant call format (VCF) file containing 30 individual *Aedes aegypti* genotypes used for population genetic analysis, with five individu...
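To make the MAF and missing-data thresholds described for this dataset concrete, here is a minimal sketch of what the two per-SNP filters do. It is not PLINK and does not reproduce the authors' workflow: the LD-pruning step is omitted, and the function names, genotype coding (alternate-allele counts 0, 1, 2, with None for a missing call), and example SNPs are all hypothetical. Only the 0.1 thresholds come from the description.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # alternate-allele count (0, 1, 2) or None if missing


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the called genotypes."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def passes_filters(calls: Sequence[Genotype],
                   min_maf: float = 0.1,
                   max_missing: float = 0.1) -> bool:
    """Keep a SNP only if it is well genotyped and common enough."""
    return (missing_rate(calls) <= max_missing
            and minor_allele_frequency(calls) >= min_maf)


# Hypothetical genotypes for three SNPs across ten mosquitoes.
snps: Dict[str, List[Genotype]] = {
    "snp_a": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],        # common and fully called
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],        # minor allele too rare
    "snp_c": [0, 1, None, None, 2, 1, 0, 1, 1, 0],  # too many missing calls
}

kept = [name for name, calls in snps.items() if passes_filters(calls)]
print(kept)
```

In this toy input only snp_a survives: snp_b has a minor allele frequency of 0.05, and snp_c is missing calls in 20% of samples.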

  19.

    Data from: Genome-wide association study of an unusual dolphin mortality...

    • datadryad.org
    • data.niaid.nih.gov
    • +2more
    zip
    Updated Nov 29, 2018
    Cite
    Kimberley C. Batley; Jonathan Sandoval-Castillo; Catherine M. Kemper; Catherine R.M. Attard; Nikki Zanardo; Ikuko Tomo; Luciano B. Beheregaray; Luciana M. Möller (2018). Genome-wide association study of an unusual dolphin mortality event reveals candidate genes for susceptibility and resistance to cetacean morbillivirus [Dataset]. http://doi.org/10.5061/dryad.tk8774f
    Available download formats: zip
    Dataset updated
    Nov 29, 2018
    Dataset provided by
    Dryad
    Authors
    Kimberley C. Batley; Jonathan Sandoval-Castillo; Catherine M. Kemper; Catherine R.M. Attard; Nikki Zanardo; Ikuko Tomo; Luciano B. Beheregaray; Luciana M. Möller
    Time period covered
    Nov 29, 2018
    Area covered
    St. Vincent Gulf, South Australia
    Description

    Tursiops SNP dataset: SNP genotype, VCF file. Mapped to the Tursiops truncatus genome (GCA_001922835.1). File: mappedQC.fil5.vcf
    Tursiops ref_seqF: Forward reference sequences. File: Tur_1.fasta
    Tursiops ref_seqR: Reverse reference sequences. File: Tur_2.fasta

  20.

    TBX1

    • alliancegenome.org
    Updated Apr 16, 2025
    Cite
    Alliance of Genome Resources (2025). TBX1 [Dataset]. http://identifiers.org/HGNC:11592
    Dataset updated
    Apr 16, 2025
    Dataset authored and provided by
    Alliance of Genome Resources
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    T-box transcription factor 1. Enables protein homodimerization activity and sequence-specific double-stranded DNA binding activity. Involved in several processes, including chordate embryonic development; parathyroid gland development; and soft palate development. Predicted to be active in chromatin and nucleus. Implicated in several diseases, including DiGeorge syndrome; congenital heart disease (multiple); hypoparathyroidism; sensorineural hearing loss; and velocardiofacial syndrome. Biomarker of congenital heart disease. This gene is a member of a phylogenetically conserved family of genes that share a common DNA-binding domain, the T-box. T-box genes encode transcription factors involved in the regulation of developmental processes. This gene product shares 98% amino acid sequence identity with the mouse ortholog. DiGeorge syndrome (DGS)/velocardiofacial syndrome (VCFS), a common congenital disorder characterized by neural-crest-related developmental defects, has been associated with deletions of chromosome 22q11.2, where this gene has been mapped. Studies using mouse models of DiGeorge syndrome suggest a major role for this gene in the molecular etiology of DGS/VCFS. Several alternatively spliced transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Jul 2008]
