77 datasets found
  1. Standard VCF files

    • figshare.com
    application/gzip
    Updated Mar 30, 2023
    Cite
    Christina Cuomo (2023). Standard VCF files [Dataset]. http://doi.org/10.6084/m9.figshare.12729533.v1
    Available download formats
    application/gzip
    Dataset updated
    Mar 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christina Cuomo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standard VCF files.

  2. Merged VCF file from familial Meniere disease cohort

    • ega-archive.org
    Updated Aug 11, 2025
    Cite
    (2025). Merged VCF file from familial Meniere disease cohort [Dataset]. https://ega-archive.org/datasets/EGAD50000001682
    Dataset updated
    Aug 11, 2025
    License

    https://ega-archive.org/dacs/EGAC50000000708

    Description

    This dataset contains a merged VCF file generated from WES data of patients diagnosed with familial Meniere disease (FMD). Variant calling followed GATK best practices using the nf-core/Sarek pipeline (v3), and variants were filtered using genotype-level thresholds consistent with gnomAD filters. Multiallelic variants were split and INDELs were left-aligned during normalization. Variant Quality Score Recalibration (VQSR) was applied separately to SNVs and INDELs using well-established truth sets, with a 90% sensitivity threshold to maximize the detection of rare variants. Final variants were annotated with Ensembl VEP.
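
The multiallelic splitting step mentioned in this description can be illustrated with a minimal sketch. It is not the dataset authors' code: the helper name `split_multiallelic` is hypothetical, and it assumes a site-only record whose remaining columns carry no per-allele values. It does not recode genotypes or per-allele INFO fields and does not left-align INDELs; dedicated tools such as `bcftools norm` handle those cases.

```python
def split_multiallelic(line):
    """Split one tab-delimited VCF data line into one line per ALT allele.

    Assumption (illustrative only): columns after ALT hold no per-allele
    values, so they can be copied unchanged to every output line.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    rest = fields[5:]
    # One biallelic record per comma-separated ALT allele.
    return ["\t".join([chrom, pos, vid, ref, allele] + rest)
            for allele in alt.split(",")]


if __name__ == "__main__":
    for record in split_multiallelic("chr1\t100\t.\tA\tC,G\t50\tPASS\t."):
        print(record)
```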

  3. Raw VCF files

    • figshare.com
    application/gzip
    Updated Apr 13, 2023
    Cite
    Christina Cuomo (2023). Raw VCF files [Dataset]. http://doi.org/10.6084/m9.figshare.12693881.v1
    Available download formats
    application/gzip
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    figshare
    Authors
    Christina Cuomo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VCF files submitted for each group/pipeline.

  4. Merged VCF file from sporadic Meniere disease cohort

    • ega-archive.org
    Updated Aug 11, 2025
    Cite
    (2025). Merged VCF file from sporadic Meniere disease cohort [Dataset]. https://ega-archive.org/datasets/EGAD50000001683
    Dataset updated
    Aug 11, 2025
    License

    https://ega-archive.org/dacs/EGAC50000000708

    Description

    This dataset contains a merged VCF file generated from WES data of patients diagnosed with sporadic Meniere disease (SMD). Variant calling followed GATK best practices using the nf-core/Sarek pipeline (v3), and variants were filtered using genotype-level thresholds consistent with gnomAD filters. Multiallelic variants were split and INDELs were left-aligned during normalization. Variant Quality Score Recalibration (VQSR) was applied separately to SNVs and INDELs using well-established truth sets, with a 90% sensitivity threshold to maximize the detection of rare variants. Final variants were annotated with Ensembl VEP.

  5. SARS-CoV-2 GISAID isolates (2020-06-17) genotyping VCF

    • data.mendeley.com
    Updated Jul 25, 2020
    + more versions
    Cite
    Doğa Eskier (2020). SARS-CoV-2 GISAID isolates (2020-06-17) genotyping VCF [Dataset]. http://doi.org/10.17632/63t5c7xb4c.1
    Dataset updated
    Jul 25, 2020
    Authors
    Doğa Eskier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VCF file containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV, separated by individual mutations. The columns correspond to viral genome accession ID, nucleotide position in the genome, mutation ID (left blank in all rows), reference nucleotide, identified mutation, quality, filter, and information columns (all left blank), format (GT in all rows), column corresponding to reference genome (all 0, referring to reference nucleotide column), and columns corresponding to isolate genomes, with each row identifying the nucleotide in the POS column, and whether it is non-mutant (0), or the mutant indicated in the identified mutation column (1). The file is tab delimited, with 22546 rows including the names, and 30690 columns.

    The file was generated to test whether the five most common mutations in each of the SARS-CoV-2 replication complex proteins nsp7, nsp8, nsp12, and nsp14 significantly affect the mutation density of the virus over time, and whether they affect the synonymous and nonsynonymous mutation densities differently. We discovered that mutations in nsp14, an exonuclease with error-correcting capabilities, are most likely to be correlated with increased mutational load across the genome compared to wildtype SARS-CoV-2. These results were obtained by identifying the frequency of mutations across all isolates in genomic regions of interest, analyzing which of the twenty mutations (five per nsp) have a statistically meaningful relationship with the mutation density in the M and E genes (chosen because they are under little selective pressure), and identifying the synonymous and nonsynonymous genomic SNV density for isolates with any of the statistically meaningful mutations, as well as isolates with none of the identified mutations.
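
The column layout described for this file (nine fixed VCF columns, then a reference column and one column per isolate holding 0 or 1) lends itself to a simple per-isolate tally. The sketch below is only an illustration under stated assumptions: the function name `mutant_counts` and the demo column names are hypothetical, and the first row that does not start with `##` is assumed to be the names row.

```python
import io


def mutant_counts(handle):
    """Count, per genotype column, the rows where the value is "1" (mutant).

    Assumptions (not confirmed by the dataset description): lines starting
    with "##" are metadata, and the first remaining row holds column names.
    """
    samples = None
    counts = []
    for line in handle:
        if line.startswith("##") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if samples is None:
            # Columns 0-8 are the fixed VCF columns through FORMAT.
            samples = fields[9:]
            counts = [0] * len(samples)
            continue
        for i, genotype in enumerate(fields[9:9 + len(samples)]):
            if genotype == "1":
                counts[i] += 1
    return dict(zip(samples or [], counts))


if __name__ == "__main__":
    demo = (
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tREF\tISO_A\tISO_B\n"
        "ref\t241\t.\tC\tT\t.\t.\t.\tGT\t0\t1\t0\n"
        "ref\t3037\t.\tC\tT\t.\t.\t.\tGT\t0\t1\t1\n"
    )
    print(mutant_counts(io.StringIO(demo)))  # {'REF': 0, 'ISO_A': 2, 'ISO_B': 1}
```

With roughly 30,000 columns per row, reading line by line as above keeps memory use bounded by a single row.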

  6. NIHR BioResource Rare Diseases WGS project - Hypertrophic Cardiomyopathy...

    • ega-archive.org
    + more versions
    Cite
    NIHR BioResource Rare Diseases WGS project - Hypertrophic Cardiomyopathy (HCM) Rare Disease domain (VCF data) [Dataset]. https://www.ega-archive.org/datasets/EGAD00001007885
    License

    https://ega-archive.org/dacs/EGAC00001000259

    Description

    Short read whole genome sequencing (WGS) VCF files for the NIHR BioResource Rare Diseases WGS project – Participants from the Hypertrophic Cardiomyopathy (HCM) Rare Disease domain

  7. Concordance of genotypes represented in VCF and gVCF files with those...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Concordance of genotypes represented in VCF and gVCF files with those detected by the MI RISK Plus kit. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t001
    Available download formats
    xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Concordance of genotypes represented in VCF and gVCF files with those detected by the MI RISK Plus kit.

  8. NA12878 WES Benchmark dataset

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin
    Updated May 31, 2020
    Cite
    Pranckeviciene Erinija; Pranckeviciene Erinija (2020). NA12878 WES Benchmark dataset [Dataset]. http://doi.org/10.5281/zenodo.3597727
    Available download formats
    application/gzip, bin
    Dataset updated
    May 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pranckeviciene Erinija; Pranckeviciene Erinija
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset makes available the UCSC Genome Browser (genome.ucsc.edu) GRCh37 genome build public session NA12878 WES Benchmark files in a single dataset so that these files can be used in other applications or genome browsers such as IGV. All genomic variant calls in all VCF files were decomposed and normalized with vt. This dataset contains:

    1. Genome in a Bottle (GIAB) version 3.3.2 high confidence (HC) variant calls and genomic regions for HapMap individual NA12878:
      1. GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz
      2. GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz.tbi
      3. GIAB_v3.3.2_NA12878_HC_regions.bed
    2. HapMap individual NA12878 WES variant calls (VCF) and capture regions (BED) from diagnostic laboratories:
      • ARUP whole exome sequencing data (HiSeq 2000) publicly available from NCBI GeT-RM Browser
        1. converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz
        2. converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz.tbi
        3. ARUP_SeqCap_EZ_Exome.bed
      • UCSF whole exome sequencing data (HiSeq 2500) publicly available from NCBI GeT-RM Browser
        1. converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz
        2. converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz.tbi
        3. UCSF_WES_Agilent_V4_Custom.bed
      • Whole exome data (NextSeq 500) sequenced in CHEO diagnostic laboratory
        1. CHEO_NA12878_WES_S1dataset.vcf.gz
        2. CHEO_NA12878_WES_S1dataset.vcf.gz.tbi
        3. Agilent_CRE_v2.bed
    3. Genomic coordinates (BED) of OMIM genes for which a molecular basis of the associated disease is known (as of September 2019):
      • Omim_Genes.bed
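
Because this dataset pairs VCF call sets with BED region files, a common first step is checking whether a variant position falls inside a listed region. The sketch below is a hedged illustration, not part of the dataset: `in_regions` and the in-memory `regions` dictionary are hypothetical. It relies on the standard conventions that BED intervals are 0-based and half-open while VCF POS is 1-based.

```python
def in_regions(chrom, pos, regions):
    """Return True if a 1-based VCF position lies inside any BED interval.

    `regions` maps a chromosome name to a list of (start, end) tuples in
    BED coordinates (0-based start, end exclusive), so a 1-based position
    is covered when start < pos <= end.
    """
    return any(start < pos <= end for start, end in regions.get(chrom, []))


if __name__ == "__main__":
    # Hypothetical interval standing in for one line of a capture BED file.
    regions = {"1": [(100, 200)]}
    print(in_regions("1", 101, regions))  # first covered base
    print(in_regions("1", 201, regions))  # just past the interval
```

A linear scan is fine for a handful of intervals; for whole-exome BED files an interval index or a sorted sweep would be the usual choice.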

  9. Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t002
    Available download formats
    xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Genotyping of GWAS catalog sites using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data.

  10. Human Variant Annotation Datasets

    • console.cloud.google.com
    Updated Jul 16, 2022
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&inv=1&invt=Ab3i5A (2022). Human Variant Annotation Datasets [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/human-variant-annotation-public
    Dataset updated
    Jul 16, 2022
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    These datasets are important to genomics researchers because they characterize several aspects of what the scientific community has learned to date about human sequence variants. Making this human annotation data freely available in GCP will enable researchers to focus less on the data movement and management tasks associated with procuring this data and instead make immediate use of it: to better understand the clinical relevance of particular variants, such as disease-causing or protective variants (ClinVar), to search a catalog of SNPs that have been identified in the human genome (dbSNP), and to discover how frequently a particular variant occurs across the human population (1000Genomes, ESP, ExAC, gnomAD). This human annotation dataset contains a mirror of the original Variant Call Format (VCF) files from NCBI, the NHLBI Exome Sequencing Project (ESP), and Ensembl as Google Cloud Storage (GCS) objects. In addition, these human sequence variants have been translated into a variant table format and made available in Google BigQuery, giving researchers the ability to use cloud technology and code repositories such as the Verily Life Sciences Annotation Toolkit to perform analyses in parallel. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/month of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. The VCF mirror is hosted in Google Cloud Storage and is available free to use.

  11. VCF genotype of four maize DH populations

    • figshare.com
    txt
    Updated Jun 30, 2022
    Cite
    Min Wang (2022). VCF genotype of four maize DH populations [Dataset]. http://doi.org/10.6084/m9.figshare.20188547.v2
    Available download formats
    txt
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Min Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VCF genotype data for the paper "Genetic dissection of QTLs for starch content in four maize DH populations". Four DH populations (SC1, SC2, SC3 and SC4) were developed from F1 plants of crosses among eight corresponding parents (SC*-P*). All lines were genotyped with the GenoBaits Maize 1K marker panel that was developed by Mol Breeding Biotechnology Co., Ltd., Shijiazhuang, China (http://www.molbreeding.com/), based on a genotyping-by-target-sequencing platform in maize.

  12. Annotated VCF of 192 Verticillium dahliae isolates

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated May 20, 2025
    Cite
    Benjamin Mimee; Joel Lafond-Lapalme; Mario Tenuta (2025). Annotated VCF of 192 Verticillium dahliae isolates [Dataset]. http://doi.org/10.5061/dryad.g79cnp5v0
    Dataset updated
    May 20, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Benjamin Mimee; Joel Lafond-Lapalme; Mario Tenuta
    Time period covered
    Jan 1, 2023
    Description

    Verticillium dahliae is an important soil-borne pathogen causing Verticillium wilt. It is also the primary causal agent of the Potato Early Dying, a disease complex involving the root-lesion nematode. Here, we report the whole-genome sequencing of 192 isolates of V. dahliae originating from the major potato production areas across Canada. Our results yielded a resource of 277,010 genetic variations that will be useful for genetic analyses and revealed the presence of two major lineages, both present in all provinces but exhibiting differences in regional prevalence.

  13. Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne (2023). Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data. [Dataset]. http://doi.org/10.1371/journal.pone.0132180.t004
    Available download formats
    xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alberto Ferrarini; Luciano Xumerle; Francesca Griggio; Marianna Garonzi; Chiara Cantaloni; Cesare Centomo; Sergio Marin Vargas; Patrick Descombes; Julien Marquis; Sebastiano Collino; Claudio Franceschi; Paolo Garagnani; Benjamin A. Salisbury; John Max Harvey; Massimo Delledonne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Genotyping of known SNPs from ClinVar using the VCF and gVCF file formats and the number of homozygous reference sites and no-calls based on WGS data.

  14. Data from: A genome-guided strategy for climate resilience in American...

    • agdatacommons.nal.usda.gov
    • data.niaid.nih.gov
    • +1more
    bin
    Updated Aug 20, 2025
    Cite
    Alexander Sandercock; Jared Westbrook; Qian Zhang; Jason Holliday (2025). Data from: A genome-guided strategy for climate resilience in American chestnut restoration populations [Dataset]. http://doi.org/10.5281/zenodo.10676843
    Available download formats
    bin
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Zenodo
    Authors
    Alexander Sandercock; Jared Westbrook; Qian Zhang; Jason Holliday
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The American chestnut (Castanea dentata) is a functionally extinct tree species that was decimated by an invasive fungal pathogen in the early 20th century. An understanding of the genomic architecture of local adaptation in wild American chestnut was necessary in order to deploy locally adapted, disease-resistant American chestnut populations. Here, we characterize the genomic basis of climate adaptation in remnant wild American chestnut, develop new computational methods, and evaluate the adaptive genomic content captured within backcross breeding populations. Whole genome re-sequencing data of 356 trees from Sandercock et al. (2022) coupled with genotype-environment association methods identified 18483 climate associated loci.

    Methods: VCF file: The ~21 million SNP dataset from Sandercock et al. (2022) was first imputed using BEAGLE and filtered to remove SNPs with MAF < 0.05. Climate associated loci were then identified using RDA and LFMM2 genotype-environment association methods. Seed zone shape files: Three seed zones were identified using the ~18k climate associated loci. These regions partition the chestnut range into geographic seed zones that reflect relatively homogeneous areas with respect to multivariate adaptive genomic variation. These regions can be used to conserve germplasm ex situ and guide subsequent breeding crosses that lead to climate-matched restoration populations. gmbigxhorn.jtl.map.2022.csv is a genetic map generated from American chestnut backcross genotyping-by-sequencing data. R code for estimating the average migration distance for each seed zone under future climate change conditions.
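
The MAF < 0.05 filter mentioned in the methods can be sketched as follows. This is an illustration only, not the authors' pipeline: the function names are hypothetical, and the sketch assumes biallelic sites with diploid GT strings such as `0/1` or `1|1`, skipping missing alleles written as `.`.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of one biallelic site from GT strings."""
    alleles = [allele
               for gt in genotypes
               for allele in gt.replace("|", "/").split("/")
               if allele != "."]
    if not alleles:
        return 0.0  # no called alleles at this site
    alt_freq = sum(allele == "1" for allele in alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(genotypes, threshold=0.05):
    """Keep a site unless its MAF is below the threshold (MAF < 0.05 removed)."""
    return minor_allele_frequency(genotypes) >= threshold


if __name__ == "__main__":
    rare = ["0/0"] * 19 + ["0/1"]      # 1 ALT allele out of 40
    common = ["0/0", "0/1", "1/1", "./."]
    print(passes_maf(rare), passes_maf(common))
```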

  15. Exome sequencing data of the patients with liver disease

    • scidb.cn
    Updated Apr 10, 2025
    Cite
    Kenan Moral; Gulsum Kayhan; Tarik Duzenli; Sinan Sari; Mehmet Cindoruk; Nergiz Ekmen (2025). Exome sequencing data of the patients with liver disease [Dataset]. http://doi.org/10.57760/sciencedb.23199
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Kenan Moral; Gulsum Kayhan; Tarik Duzenli; Sinan Sari; Mehmet Cindoruk; Nergiz Ekmen
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Exome sequencing data (VCF files) of nine adult patients with liver diseases. For exome sequencing, the library was prepared using the Illumina DNA Prep with Exome 2.5 Enrichment product and sequenced on a NovaSeq 6000 instrument (Illumina, San Diego, CA). Reads were aligned to the GRCh38 Human Reference Genome using the Illumina DRAGEN Bio-IT Platform v3.9.

  16. Replication data for: Genetic analyses for the response to Bean Leaf Crumple...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 9, 2023
    Cite
    Daniel Ariza-Suarez; Beat Keller; Anna Spescha; Johan Steven Aparicio; Victor Mayor; Ana Elizabeth Portilla-Benavides; Hector Fabio Buendia; Juan Miguel Bueno; Bruno Studer; Bodo Raatz (2023). Replication data for: Genetic analyses for the response to Bean Leaf Crumple Virus (BLCrV) identify a candidate LRR-RLK gene [Dataset]. http://doi.org/10.7910/DVN/9JSMED
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Daniel Ariza-Suarez; Beat Keller; Anna Spescha; Johan Steven Aparicio; Victor Mayor; Ana Elizabeth Portilla-Benavides; Hector Fabio Buendia; Juan Miguel Bueno; Bruno Studer; Bodo Raatz
    Time period covered
    Jan 1, 2013 - Jan 1, 2020
    Description

    These datasets contain phenotypic and genotypic data from three connected populations of common bean (Phaseolus vulgaris L.) that were used to identify the genomic regions controlling the phenotypic response to Bean Leaf Crumple Virus (BLCrV). The first is the Andean by Meso (AxM) population, which contains 190 individuals derived from bi-parental crosses between Andean and Mesoamerican breeding lines. The AxM population included 120 additional breeding lines of Andean and Mesoamerican origin that were used as checks for their response against other viral diseases, such as Bean Golden Yellow Mosaic Virus (BGYMV). The second is a pre-breeding population (termed P135-136) composed of 111 lines that was obtained from two-way and three-way crosses between elite Andean lines and resistant sources against viral diseases. The third population is a panel of 186 Mesoamerican breeding lines assembled from a collection of elite materials from the Mesoamerican breeding pipeline at CIAT. The AxM population was evaluated in three yield trials in Palmira (Colombia) between 2013 and 2015 for flowering, maturity time and yield. All three populations were evaluated in three BLCrV trials in Pradera (Colombia), where the disease pressure is naturally high. The AxM population and the Mesoamerican panel were genotyped by sequencing (GBS), and these datasets contain their corresponding genotypic matrices in variant call format (VCF, v4.2) with sequence variants mapped against the reference genome of P. vulgaris (G19833, v2.1). A joint genotypic matrix with all available GBS data from these three populations is also included. The population P135-136 was genotyped with the DArTag targeted genotyping service offered by Diversity Arrays Technology (DArT PL, Bruce ACT, Australia), and the genotypic matrix is similarly included in VCF format.

  17. Climate adaptation and genetic differentiation in the mosquito species Culex...

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    • +3more
    Updated Jun 26, 2024
    Cite
    Yunfei Liao (2024). Climate adaptation and genetic differentiation in the mosquito species Culex tarsalis [Dataset]. http://doi.org/10.5061/dryad.51c59zwh3
    Dataset updated
    Jun 26, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Yunfei Liao
    Description

    The increasing prevalence of vector-borne diseases around the world highlights the pressing need for an in-depth exploration of the genetic and environmental factors that shape the adaptability and widespread distribution of mosquito populations. This research focuses on Culex tarsalis, a principal vector for various viral diseases including West Nile Virus (WNV). Through the development of a new reference genome and the examination of Restriction-Site Associated DNA sequencing (RAD-seq) data from over 300 individuals and 28 locations, we demonstrate that variables such as temperature, evaporation rates, and the density of vegetation significantly impact the genetic makeup of Cx. tarsalis populations. Among the alleles most strongly associated with environmental factors is a nonsynonymous mutation in a key gene related to circadian rhythms. These results offer new insights into the mechanisms of spread and adaptation in a key North American vector species, which is poised to become a g...

    Sample Collection: Individual mosquitoes were trapped and collected from 28 different locations across the United States and Canada as part of the North American Mosquito Project (NAMP). All samples used in this study were collected in 2012 between the months of April and October.

    Genome Sequencing, Assembly, and Annotation: An F4 population was used to generate the reference genome assembly, and high molecular weight DNA was extracted and sequenced on a Pacific Biosciences (PacBio) RS II (University of Delaware). Thirty-five SMRTcells were generated. The resulting reads provided 76X coverage of the ~790Mb Cx. tarsalis genome, and were assembled with MECAT. Gene annotation was completed by MAKER using EST and protein data from the Culex quinquefasciatus and Aedes aegypti mosquitoes. Sequences were downloaded from the NCBI Taxonomy database and both Trinotate and InterProScan were used for functional annotation of the MAKER predicted genes. The annotated assembly was assessed for complet...

    Culex tarsalis dataset

    https://doi.org/10.5061/dryad.51c59zwh3

    Description of the data and file structure

    The data were stored in 5 different files.

    1. bi_20missing_filtSNP_maf_005.recode.vcf is the filtered .vcf file used in the paper analysis.
    2. land_monthly_climate_2009_2012_partial_canada.nc is the file used to extract climate variable information for the samples based on the latitude and longitude.
    3. Culex-tarsalis-v1.0.a1-merged-2019-08-30-4-45-01.gff3 is used for annotating SNPs to genes.
    4. Culex-tarsalis-v1.0.a1.5d6405151b078-interproscan.tab is used for annotating protein function.
    5. Ctarsalis_sample_w_GPS_climate_average_new_filtered_id_region.csv contains sample information such as location and environmental variables.

    Code/Software

    The code that uses the data above is available on GitHub: https://github.com/Afei99357/Culex_Tarsalis_GWAS_manuscript.git

  18. SARS-CoV-2 GISAID UK-US isolates (2020-09-07) genotyping VCF

    • data.mendeley.com
    Updated Nov 16, 2020
    Cite
    Necla Koçhan (2020). SARS-CoV-2 GISAID UK-US isolates (2020-09-07) genotyping VCF [Dataset]. http://doi.org/10.17632/5dfj2hhnng.1
    Dataset updated
    Nov 16, 2020
    Authors
    Necla Koçhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    VCF files containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV and submitted from the UK and the US, separated by individual mutations. The columns correspond to viral genome accession ID, nucleotide position in the genome, mutation ID (left blank in all rows), reference nucleotide, identified mutation, quality, filter, and information columns (all left blank), format (GT in all rows), column corresponding to reference genome (all 0, referring to reference nucleotide column), and columns corresponding to isolate genomes, with each row identifying the nucleotide in the POS column, and whether it is non-mutant (0), or the mutant indicated in the identified mutation column (1). The files are tab delimited, with the UK file having 12696 rows including the names, and 18135 columns, and the US file having 15588 rows including the names, and 16277 columns.

    The files were generated to test whether different SARS-CoV-2 genes or protein-coding regions are positively or negatively selected differently between 14408C>T / 23403A>G double mutants and double wildtype isolates, using mutation rate models, and whether regional distributions affect the mutation rates. Our findings show that the RdRp coding region and the S gene show the highest amount of selection across viral generations, and that the synonymous and nonsynonymous mutation rates for individual genes can differ between countries.

  19. Variant calling analysis of cfDNA whole exome sequencing in neuroblastoma

    • ega-archive.org
    Cite
    Variant calling analysis of cfDNA whole exome sequencing in neuroblastoma [Dataset]. https://ega-archive.org/datasets/EGAD00001003803
    License

    https://ega-archive.org/dacs/EGAC00001000319

    Description

    This dataset contains VCF files from a variant calling analysis of 19 neuroblastoma patients. WES or WGS data of the primary tumor were compared to WES cfDNA analysis at the time of diagnosis and at a 2nd timepoint (complete remission, partial remission, disease progression or relapse). For 4 patients, WGS of germline, tumor at diagnosis and tumor at relapse DNA was performed on Illumina HiSeq2500, with 100-bp paired-end reads. For the other patients, WES was performed using either an Agilent SureSelect Human All Exon v5 or a Roche Nimblegen SeqCap EZ Exome V3 kit on Illumina HiSeq2000, with 100-bp paired-end reads. SNVs observed in any of the primary tumors or cfDNA samples studied by WES were targeted using a capture sequencing panel at all intermediate time points.

  20. PhenoDB

    • rrid.site
    • neuinfo.org
    • +2more
    Updated Aug 23, 2025
    + more versions
    Cite
    (2025). PhenoDB [Dataset]. http://identifiers.org/RRID:SCR_016551
    Dataset updated
    Aug 23, 2025
    Description

    Database for phenotype genotype associations for humans. Used by clinical researchers to store standardized phenotypic information, diagnosis, and pedigree data and then run analyses on VCF files from individuals, families or cohorts with suspected Mendelian disease.

Standard VCF files

49 scholarly articles cite this dataset.