100+ datasets found
  1. Sequence Read Archive (SRA)

    • healthdata.gov
    • datahub.hhs.gov
    • +3 more
    csv, xlsx, xml
    Updated Jul 2, 2021
    + more versions
    Cite
    datadiscovery.nlm.nih.gov (2021). Sequence Read Archive (SRA) [Dataset]. https://healthdata.gov/NIH/Sequence-Read-Archive-SRA-/pgqz-iwtp
    Available download formats: xml, csv, xlsx
    Dataset updated
    Jul 2, 2021
    Dataset provided by
    datadiscovery.nlm.nih.gov
    Description

    The Sequence Read Archive (SRA) stores sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.

  2. Sequencing Data for Hospital Metagenomes

    • catalog.data.gov
    • gimi9.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Sequencing Data for Hospital Metagenomes [Dataset]. https://catalog.data.gov/dataset/sequencing-data-for-hospital-metagenomes
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    U.S. EPA Office of Research and Development (ORD)
    Description

    FASTA files containing the sequence data for assembled contigs (FASTA), predicted genes (FASTA), and predicted proteins (FASTA), together with gene predictions (GFF v2). This dataset is not publicly accessible because the sequences have already been deposited in publicly available databases, so replication can be avoided. The data are also quite large, with numerous files associated with these entries, which are included in the links below. The data can be accessed through the following web links:

    https://www.ncbi.nlm.nih.gov/bioproject/PRJNA299404
    https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP065069
    http://enve-omics.ce.gatech.edu/data/showerheads

    Format: The data represent genome sequencing and assembly of 180 different contigs. This dataset is associated with the following publication: Soto-Giron, M.J., L. Rodriguez, C. Luo, M. Elk, H. Ryu, J. Santodomingo, and K. Konstantinidis. Biofilms on Hospital Shower Hoses: Characterization and Implications for Nosocomial Infections. APPLIED AND ENVIRONMENTAL MICROBIOLOGY. American Society for Microbiology, Washington, DC, USA, 82(9): 2872-2883, (2016).

  3. to be completed

    • data.niaid.nih.gov
    Updated Feb 21, 2018
    Cite
    (2018). to be completed [Dataset]. https://data.niaid.nih.gov/resources?id=ncbi_sra_erp014221
    Dataset updated
    Feb 21, 2018
    Description

    to be completed

  4. Pseudomonas sp. HOU2 predicted gene sequences

    • figshare.com
    txt
    Updated Jul 22, 2024
    Cite
    Van Hong Thi Dao; Son Truong Dinh (2024). Pseudomonas sp. HOU2 predicted gene sequences [Dataset]. http://doi.org/10.6084/m9.figshare.26325310.v1
    Available download formats: txt
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Van Hong Thi Dao; Son Truong Dinh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The whole genome of Pseudomonas sp. HOU2 was analyzed with RAST (Rapid Annotation using Subsystem Technology) (https://rast.nmpdr.org/) on 18 July 2024 to obtain the predicted HOU2 gene sequences, with the following options selected:

    • Genetic code: 11
    • Annotation scheme: RASTtk
    • Preserve gene calls: no
    • Automatically fix errors: yes
    • Fix frameshifts: yes
    • Backfill gaps: yes

    The NCBI Sequence Read Archive accession for Pseudomonas sp. HOU2 is SRR29666724 (https://www.ncbi.nlm.nih.gov/sra/SRR29666724). The NCBI complete genome of Pseudomonas sp. HOU2 is CP160398.1 (https://www.ncbi.nlm.nih.gov/nuccore/CP160398).

  5. CHOP3_NTM

    • data.niaid.nih.gov
    Updated Apr 21, 2021
    Cite
    (2021). CHOP3_NTM [Dataset]. https://data.niaid.nih.gov/resources?id=ncbi_sra_srp308850
    Dataset updated
    Apr 21, 2021
    Description

    Whole genome sequencing of nontuberculous mycobacteria (NTM) isolates.

  6. Processed SNP Data and Genomic Annotations for Cinnamomum

    • nde-dev.biothings.io
    Updated Mar 19, 2025
    Cite
    furan, alp (2025). Processed SNP Data and Genomic Annotations for Cinnamomum [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_15044923
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    furan, alp
    GENLİ, Gülistan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains SNP variant data and population genomics analysis results derived from RNA-Seq data of Cinnamomum species. The processed VCF files and population genetics outputs include SNP annotations, allele frequency distributions, linkage disequilibrium analysis, and Fst calculations. These data were used in the study titled "[Your Manuscript Title]" and are made publicly available for further research and validation.

    Included Files:

    Processed SNP datasets in VCF format

    SNP annotations and allele frequency tables

    Population genomics analysis results (Fst, LD decay, AFS)

    Usage: Researchers can use this dataset for comparative genomics, evolutionary studies, and population structure analysis of Cinnamomum species.

    NCBI SRA Data

    The raw RNA-Seq data used in this study were retrieved from the NCBI Sequence Read Archive (SRA) under the following accession numbers:

    SRR10063926

    SRR10063927

    SRR10063928

    SRR31477125

    SRR31477126

    SRR31477127

    The datasets can be accessed via the NCBI SRA database: https://www.ncbi.nlm.nih.gov/sra
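
As a minimal sketch, the run accessions listed above can be turned into per-run links. The URL pattern `https://www.ncbi.nlm.nih.gov/sra/<accession>` is an assumption based on how SRA run pages are commonly linked, and the helper name `sra_run_url` is hypothetical.

```python
import re

# Run accessions copied from the dataset description above.
RUNS = [
    "SRR10063926", "SRR10063927", "SRR10063928",
    "SRR31477125", "SRR31477126", "SRR31477127",
]

# Assumed shape of a run accession: SRR (NCBI), ERR (ENA) or DRR (DDBJ) plus digits.
RUN_PATTERN = re.compile(r"^[SED]RR\d+$")


def sra_run_url(accession: str) -> str:
    """Return a web link for a run accession (hypothetical helper)."""
    if not RUN_PATTERN.match(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    return f"https://www.ncbi.nlm.nih.gov/sra/{accession}"


for run in RUNS:
    print(sra_run_url(run))
```

Rejecting anything that does not look like a run accession keeps study or BioProject identifiers from being pasted into a run link by mistake.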

    Reference Genome

    The reference genome used for alignment and variant calling was obtained from:

    GenBank Accession: GCA_003546025.1

    Available at: NCBI Genome Database

    Variant Calling and Population Genetics Analysis Tools

    The variant calling, SNP annotation, and population genetics analyses were performed using the following tools:

    VCFtools: Danecek P, et al. (2011) "The variant call format and VCFtools." Bioinformatics. DOI: 10.1093/bioinformatics/btr330

    PLINK: Purcell S, et al. (2007) "PLINK: a tool set for whole-genome association and population-based linkage analyses." American Journal of Human Genetics. DOI: 10.1086/519795
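
To illustrate what an allele frequency table summarises, here is a rough sketch that computes the alternate-allele frequency at one site from VCF-style genotype strings. It does not call VCFtools or PLINK; the function name and the example genotypes are made up for illustration.

```python
def alt_allele_frequency(genotypes):
    """Alternate-allele frequency from GT strings such as '0/1' or '1|1'.

    Missing alleles ('.') are skipped. Any non-reference allele counts as
    alternate, which is a simplification for multiallelic sites.
    Returns None when no allele was called.
    """
    alt = 0
    called = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None


# Illustrative genotypes for one site (made-up values, not from the dataset).
example = ["0/0", "0/1", "1/1", "./."]
print(alt_allele_frequency(example))  # 3 alternate alleles out of 6 called -> 0.5
```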

  7. Chromosome assembly and preliminary gene and repeat annotations for Myzomela...

    • datadryad.org
    • data-staging.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 27, 2024
    Cite
    Elsie Shogren; Jason Sardell; Christina Muirhead; Emiliano Martí; Elizabeth Cooper; Robert Moyle; Daven Presgraves; Albert J. Uy (2024). Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome [Dataset]. http://doi.org/10.5061/dryad.612jm64c9
    Available download formats: zip
    Dataset updated
    Jul 27, 2024
    Dataset provided by
    Dryad
    Authors
    Elsie Shogren; Jason Sardell; Christina Muirhead; Emiliano Martí; Elizabeth Cooper; Robert Moyle; Daven Presgraves; Albert J. Uy
    Time period covered
    Jul 15, 2024
    Description

    Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome

    I. Files

    (GENOME)
    • Mt_v1.0_MAIN.fa.gz: Primary genome, (largely) scaffolded to chromosome-level, plus other primary assembled contigs
    • Mt_v1.0_MAIN.gff.gz: Simple gene annotations for primary genome, annotated using GeMoMa v1.8 and a zebra finch (bTaeGut1.4.pri) annotation reference
    • Mt_v1.0_extra.fa.gz: Additional contigs, not for use in most analyses but some may be of interest. This set is a combination of hand-identified haplotigs of the main genome, and assembler-identified "alternate" (haplotig) contigs

    (ORIGINAL_ASSEMBLY_CONTIGS)
    • Mt_hifi.asm.p.fa.gz: "primary" assembly contigs, output from hifiasm (v0.13-r308)
    • Mt_hifi.asm.a.fa.gz: "alternate" assembly contigs, output from hifiasm (v0.13-r308)

    (REPEAT_MASKING)
    • TElib_Myzo_preliminary.fa.gz: Preliminary Myzomela-tuned TE/repeat library, generated using RepeatModeler (v.2)
    • Mt_v1.0_MAIN_RM_sites_to_filter.txt: List of sites masked by RepeatM...

  8. Data relating to RNA sequence accessions at NCBI from Ross Sea...

    • bco-dmo.org
    • search.dataone.org
    csv
    Updated May 17, 2018
    Cite
    Rebecca J. Gast (2018). Data relating to RNA sequence accessions at NCBI from Ross Sea Dinoflagellates, Phaeocystis antarctica, Pyramimons tychotreta, and Micromonas polaris (CCMP 2099) (Kleptoplasty project) [Dataset]. https://www.bco-dmo.org/dataset/728427
    Available download formats: csv (16.59 KB)
    Dataset updated
    May 17, 2018
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    Rebecca J. Gast
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 1997 - Apr 7, 1998
    Variables measured
    lat, lon, temp, depth, isolate, Organism, BioSample, SRA_Study, replicate, Assay_Type, and 13 more
    Measurement technique
    Automated DNA Sequencer
    Description

    This dataset contains data related to RNA sequence genetic accessions at the National Center for Biotechnology Information (NCBI) including information about the host organism, collection location, and collection date.

    The accessions are the unprocessed Illumina MiSeq reads for the Ross Sea Dinoflagellate RNA-Seq experiments, Phaeocystis antarctica RNA-Seq experiments, and Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy experiments.

    Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the SRA accession number SRP090401 (BioProject PRJNA342459).

    Ross Sea Dinoflagellate RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP132912 (BioProject PRJNA428208).

    Phaeocystis antarctica RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP133243 (BioProject PRJNA434497).
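
The description above pairs study accessions (SRP...) with BioProject accessions (PRJNA...). A small sketch of telling such identifiers apart by prefix follows; the prefix table is an assumption that covers common cases only, and `classify_accession` is a hypothetical helper.

```python
import re

# Assumed prefix conventions; illustrative, not exhaustive.
ACCESSION_TYPES = [
    (re.compile(r"^[SED]RR\d+$"), "run"),
    (re.compile(r"^[SED]RX\d+$"), "experiment"),
    (re.compile(r"^[SED]RP\d+$"), "study"),
    (re.compile(r"^PRJ(NA|EB|DB)\d+$"), "bioproject"),
    (re.compile(r"^SAM(N|EA|D)\d+$"), "biosample"),
]


def classify_accession(accession):
    """Return a coarse type label for an accession, or 'unknown'."""
    for pattern, kind in ACCESSION_TYPES:
        if pattern.match(accession):
            return kind
    return "unknown"


# Accessions quoted in the description above.
for acc in ["SRP090401", "PRJNA342459", "SRP132912",
            "PRJNA428208", "SRP133243", "PRJNA434497"]:
    print(acc, classify_accession(acc))
```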

  9. Additional file 2 of SequencErr: measuring and suppressing sequencer errors...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Feb 5, 2024
    Cite
    Eric M. Davis; Yu Sun; Yanling Liu; Pandurang Kolekar; Ying Shao; Karol Szlachta; Heather L. Mulder; Dongren Ren; Stephen V. Rice; Zhaoming Wang; Joy Nakitandwe; Alexander M. Gout; Bridget Shaner; Salina Hall; Leslie L. Robison; Stanley Pounds; Jeffery M. Klco; John Easton; Xiaotu Ma (2024). Additional file 2 of SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data [Dataset]. http://doi.org/10.6084/m9.figshare.13636939.v1
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eric M. Davis; Yu Sun; Yanling Liu; Pandurang Kolekar; Ying Shao; Karol Szlachta; Heather L. Mulder; Dongren Ren; Stephen V. Rice; Zhaoming Wang; Joy Nakitandwe; Alexander M. Gout; Bridget Shaner; Salina Hall; Leslie L. Robison; Stanley Pounds; Jeffery M. Klco; John Easton; Xiaotu Ma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2: Supplementary Table S1. List of NCBI SRA ( https://www.ncbi.nlm.nih.gov/sra ) studies and associated platform, research institute and country.

  10. Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    xlsx, zip
    Updated Sep 28, 2022
    + more versions
    Cite
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2022). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,434 Salmonella enterica isolates [Dataset]. http://doi.org/10.5281/zenodo.7230091
    Available download formats: zip, xlsx
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
    Department Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany
    Authors
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies and respective 8,558-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,434 Salmonella enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 125 different serotypes are represented in this dataset, with Typhimurium (including monophasic), Enteritidis and Infantis being the most represented ones and, together, corresponding to 56.2% of the dataset.

    File “Se_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST and serotype.

    The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    The file “profiles/Se_profiles_wgMLST.tsv” is a tab-separated file with the 8,558-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Se_profiles_cgMLST_95.tsv”, “profiles/Se_profiles_cgMLST_98.tsv” and “profiles/Se_profiles_cgMLST_100.tsv” correspond to the 3,261-loci, 3,179-loci and 874-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

    Dataset selection and curation

    With the objective of creating a diverse dataset of S. enterica genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 1,779 samples associated with four BioProjects (PRJEB16326, PRJEB20997, PRJEB30335 and PRJEB39988). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,434 isolates passed this curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019). wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 8,558-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 8,558-loci wgMLST profiles of the 1,434 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keep schema loci called in at least 95%, 98% and 100% of the samples, resulting in 3,261-loci, 3,179-loci and 874-loci allelic matrices, respectively).
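
The site-inclusion idea described above (keep loci called in at least a given fraction of samples) can be sketched roughly as follows. This is not ReporTree's implementation; the function name, the in-memory layout, and the use of "0" as the marker for a missing call are all assumptions made for illustration.

```python
def loci_passing_inclusion(profiles, threshold, missing="0"):
    """Return loci called in at least `threshold` (0 to 1) of the samples.

    `profiles` maps sample name -> {locus: allele call}. The `missing`
    marker is an assumption; real profile tables may encode absent calls
    differently.
    """
    samples = list(profiles)
    loci = list(profiles[samples[0]])
    kept = []
    for locus in loci:
        called = sum(
            1 for s in samples if profiles[s].get(locus, missing) != missing
        )
        if called / len(samples) >= threshold:
            kept.append(locus)
    return kept


# Toy matrix with four samples and three loci (made-up values).
toy = {
    "s1": {"locusA": "1", "locusB": "2", "locusC": "0"},
    "s2": {"locusA": "1", "locusB": "0", "locusC": "0"},
    "s3": {"locusA": "3", "locusB": "2", "locusC": "5"},
    "s4": {"locusA": "2", "locusB": "4", "locusC": "0"},
}
print(loci_passing_inclusion(toy, 1.0))   # ['locusA']
print(loci_passing_inclusion(toy, 0.75))  # ['locusA', 'locusB']
```

Raising the threshold can only shrink the set of retained loci, which matches the decreasing locus counts reported for the 0.95, 0.98 and 1.0 settings.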

  11. Pantoea sp. PSNIH2 genome sequencing

    • data.niaid.nih.gov
    Updated Aug 25, 2020
    Cite
    (2020). Pantoea sp. PSNIH2 genome sequencing [Dataset]. https://data.niaid.nih.gov/resources?id=ncbi_sra_srp047062
    Dataset updated
    Aug 25, 2020
    Description

    Whole genome sequencing of Pantoea sp.

  12. Additional file 3 of SequencErr: measuring and suppressing sequencer errors...

    • springernature.figshare.com
    xlsx
    Updated Feb 5, 2024
    + more versions
    Cite
    Eric M. Davis; Yu Sun; Yanling Liu; Pandurang Kolekar; Ying Shao; Karol Szlachta; Heather L. Mulder; Dongren Ren; Stephen V. Rice; Zhaoming Wang; Joy Nakitandwe; Alexander M. Gout; Bridget Shaner; Salina Hall; Leslie L. Robison; Stanley Pounds; Jeffery M. Klco; John Easton; Xiaotu Ma (2024). Additional file 3 of SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data [Dataset]. http://doi.org/10.6084/m9.figshare.13636942.v1
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eric M. Davis; Yu Sun; Yanling Liu; Pandurang Kolekar; Ying Shao; Karol Szlachta; Heather L. Mulder; Dongren Ren; Stephen V. Rice; Zhaoming Wang; Joy Nakitandwe; Alexander M. Gout; Bridget Shaner; Salina Hall; Leslie L. Robison; Stanley Pounds; Jeffery M. Klco; John Easton; Xiaotu Ma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3: Supplementary Table S2. HiSeq datasets. Publicly accessible studies deposited in NCBI SRA ( https://www.ncbi.nlm.nih.gov/sra ) were reviewed to account for lost read names.

  13. Data from Readsynth: short-read simulation for consideration of...

    • search.dataone.org
    • datadryad.org
    Updated Jul 30, 2025
    Cite
    Ryan Kuster (2025). Data from Readsynth: short-read simulation for consideration of composition-biases in reduced metagenome sequencing approaches [Dataset]. http://doi.org/10.5061/dryad.nzs7h44zk
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Ryan Kuster
    Description

    Background: The application of reduced metagenomic sequencing approaches holds promise as a middle ground between targeted amplicon sequencing and whole metagenome sequencing approaches but has not been widely adopted as a technique. A major barrier to adoption is the lack of read simulation software built to handle characteristic features of these novel approaches. Reduced metagenomic sequencing (RMS) produces unique patterns of fragmentation per genome that are sensitive to restriction enzyme choice, and the non-uniform size selection of these fragments may introduce novel challenges to taxonomic assignment as well as relative abundance estimates.

    Results: Through the development and application of simulation software, readsynth, we compare simulated metagenomic sequencing libraries with existing RMS data to assess the influence of multiple library preparation and sequencing steps on downstream analytical results. Based on read depth per position, readsynth achieved 0.79 Pearson’s corre...

    Sequence data were collected and aggregated from publicly available NCBI SRA databases for raw sequence data (https://www.ncbi.nlm.nih.gov/sra) and NCBI RefSeq databases for reference genome assemblies (https://www.ncbi.nlm.nih.gov/refseq/). Downloaded reference genomes have been concatenated with the command-line "cat" command and indexed with the bwa index command.

    # readsynth_analysis

    https://doi.org/10.5061/dryad.nzs7h44zk

    The dataset contained here provides the necessary raw sequence data to perform analyses for the simulation software readsynth.

    The dataset includes the genomes and databases necessary to reproduce the steps in the github repository readsynth_analysis and correspond with that repository's "raw_data" directory.

    Description of the data and file structure

    The genome directory "raw_data" is broken into the following subdirectories (further descriptions below):

    .
    ├── helius
    │  └── all_2084
    │    ├── genomes
    │    └── genomes_combined
    ├── kraken_dbs
    │  ├── k2_pluspfp_20220607
    │  ├── snipen_bei_db
    │  │  └── library
    │  │    └── added
    │  └── sun_atcc_db
    │    └── library
    │      └── added
    ├── liu_RMS
    │  └── mock_community_estimate
    │    ├── 10M_bracken_profile
    │ ...
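
The description for this entry mentions concatenating reference genomes with `cat` before indexing. A rough Python equivalent of the concatenation step is sketched below; the function name, the `*.fa` pattern, and the file names are hypothetical, and the combined file would still need to be passed to `bwa index` separately.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate_fasta(genome_dir, output_path, pattern="*.fa"):
    """Concatenate FASTA files in sorted order, similar to `cat dir/*.fa > out`.

    Returns the names of the files that were combined.
    """
    output_path = Path(output_path)
    inputs = sorted(
        p for p in Path(genome_dir).glob(pattern)
        if p.resolve() != output_path.resolve()  # never read the output itself
    )
    with open(output_path, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return [p.name for p in inputs]


# Self-contained demo on throwaway files (made-up sequences).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "genome_b.fa").write_text(">b\nGGCC\n")
    Path(tmp, "genome_a.fa").write_text(">a\nACGT\n")
    combined = Path(tmp, "combined.fasta")
    print(concatenate_fasta(tmp, combined))
    print(combined.read_text())
```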
    
  14. De novo assembly of Ensete ventricosum genome sequence

    • figshare.com
    application/gzip
    Updated Jun 4, 2023
    Cite
    David Studholme (2023). De novo assembly of Ensete ventricosum genome sequence [Dataset]. http://doi.org/10.6084/m9.figshare.828488.v1
    Available download formats: application/gzip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David Studholme
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains a de novo assembly of the Ensete ventricosum genome based on whole-genome shotgun sequencing with Illumina HiSeq paired reads, assembled using SOAPdenovo. This assembly has, in part, been submitted to GenBank under accession number AMZH00000000.1 (http://www.ncbi.nlm.nih.gov/nuccore/AMZH00000000.1/). However, because of limitations on the number of supercontigs/contigs that GenBank will accept, we did not submit supercontigs and contigs shorter than 5 kb. The raw data are available from the Sequence Read Archive under accession number SRX202265 (see http://www.ncbi.nlm.nih.gov/sra?LinkName=nuccore_sra_wgs&from_uid=440571971). Data are described in this paper: Harrison, J.; Moore, K.A.; Paszkiewicz, K.; Jones, T.; Grant, M.R.; Ambacheew, D.; Muzemil, S.; Studholme, D.J. A Draft Genome Sequence for Ensete ventricosum, the Drought-Tolerant “Tree Against Hunger”. Agronomy 2014, 4, 13-33.
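
The 5 kb cutoff mentioned above amounts to a length filter over FASTA records. A minimal sketch follows, assuming 5 kb means 5,000 bases and that the cutoff is inclusive; the function names and toy sequences are hypothetical and are not the authors' submission pipeline.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_at_least(lines, min_length=5000):
    """Keep records whose sequence is at least `min_length` bases long."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


# Toy input: one 40-base contig and one 5,040-base contig (made-up sequences).
toy = [">short", "ACGT" * 10, ">long", "ACGT" * 1250, "ACGT" * 10]
print([header for header, _ in contigs_at_least(toy)])  # ['long']
```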

  15. Candida auris genome dataset

    • figshare.com
    xlsx
    Updated Dec 31, 2023
    Cite
    Vincenzo Di Pilato (2023). Candida auris genome dataset [Dataset]. http://doi.org/10.6084/m9.figshare.22233001.v1
    Available download formats: xlsx
    Dataset updated
    Dec 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vincenzo Di Pilato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global dataset of C. auris genome sequences (raw reads) generated with Illumina platforms following a paired-end approach. Selected records were retrieved from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra/?term=candida+auris, accessed on December 30, 2022) through the Run Selector tool. Inclusion criteria: assay type, WGS; organism, Candida auris; host, Homo sapiens; instrument, Illumina MiSeq, iSeq, HiSeq, NextSeq, NovaSeq; sequenced megabases, >190.
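
The inclusion criteria above can be read as a row filter over a run metadata table. The sketch below assumes each record is a dictionary with hypothetical keys (`assay_type`, `organism`, `host`, `instrument`, `megabases`); real exports may name these columns differently, and the example records are made up.

```python
# Instrument families named in the inclusion criteria.
ALLOWED_INSTRUMENT_FAMILIES = ("MiSeq", "iSeq", "HiSeq", "NextSeq", "NovaSeq")


def passes_criteria(record):
    """Apply the stated inclusion criteria to one run record."""
    instrument = record["instrument"].lower()
    return (
        record["assay_type"] == "WGS"
        and record["organism"].lower() == "candida auris"
        and record["host"].lower() == "homo sapiens"
        and any(f.lower() in instrument for f in ALLOWED_INSTRUMENT_FAMILIES)
        and record["megabases"] > 190
    )


# Made-up records for illustration only.
records = [
    {"run": "run_1", "assay_type": "WGS", "organism": "Candida auris",
     "host": "Homo sapiens", "instrument": "Illumina MiSeq", "megabases": 450},
    {"run": "run_2", "assay_type": "WGS", "organism": "Candida auris",
     "host": "Homo sapiens", "instrument": "Illumina MiSeq", "megabases": 120},
    {"run": "run_3", "assay_type": "RNA-Seq", "organism": "Candida auris",
     "host": "Homo sapiens", "instrument": "NextSeq 500", "megabases": 900},
]
selected = [r["run"] for r in records if passes_criteria(r)]
print(selected)  # ['run_1']
```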

  16. human genome sequencing

    • data.niaid.nih.gov
    Updated Mar 24, 2022
    Cite
    (2022). human genome sequencing [Dataset]. https://data.niaid.nih.gov/resources?id=ncbi_sra_srp364754
    Dataset updated
    Mar 24, 2022
    Description

    De novo mutations can cause the onset of disease. This subtype rarely shows symptoms in childhood and is easily overlooked. Expanding the known mutation spectrum of the gene can lay a foundation for the further application of mutation screening in genetic counseling.

  17. Combined antibiogram dataset from NCBI, ENA, BV-BRC, and more

    • zenodo.org
    zip
    Updated Jul 4, 2025
    Cite
    Anton Pashkov; César Aguilar (2025). Combined antibiogram dataset from NCBI, ENA, BV-BRC, and more [Dataset]. http://doi.org/10.5281/zenodo.15809334
    Available download formats: zip
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anton Pashkov; César Aguilar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The antibiograms.tsv.zip dataset collects antibiograms found in NCBI, ENA and BV-BRC. Each row corresponds to an antibiotic susceptibility test (AST) for a given sample against a specific antibiotic. The dataset is a table with 14 columns:

    1. biosample. A unique identifier for the sample from the NCBI BioSample database.
    2. sra_biosample. If given, a space-separated list of sample identifiers from the NCBI SRA Sample Database.
    3. species. The species the sample belongs to, and, in some cases, with subspecies information.
    4. antibiotic. The name of the antibiotic against which the sample is tested.
    5. phenotype. The interpreted phenotype from the AST standard used during testing.
    6. measurement_sign. If given, corresponds to the sign of the raw result from the AST. Its interpretation depends on the typing method.
    7. measurement_value. If given, corresponds to the value of the raw result from the AST. Its interpretation depends on the typing method.
    8. measurement_units. If given, corresponds to the units of the raw result from the AST.
    9. typing_method. Name of the technique used for AST.
    10. typing_platform. Name of the platform used for AST.
    11. standard. Testing standard used for the interpretation of the phenotype.
    12. genomes. Space-separated list of genome identifiers from the NCBI Genome Database (starting with GCA_ or GCF_), the BV-BRC Genome Database (starting with BVBRC_), or the ENA FTP Site (starting with ftp://ftp.sra.ebi.ac.uk/vol1/analysis/).
    13. reads. Space-separated list of read run identifiers from the NCBI SRA database.
    14. read_type. Space-separated list with the same length as the reads column, storing the type of read of each corresponding read run.

    The gn-genomes.zip file contains some extra genomes, with associated AST metadata in the metadata.xlsx file within it.
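
A short sketch of reading the 14-column table described above follows. It assumes the TSV has a header row using exactly the column names listed, which is not confirmed by the description; the example row is made up, including the accession-like strings and the `read_type` values.

```python
import csv
import io

# Columns described above as space-separated lists.
LIST_COLUMNS = ("sra_biosample", "genomes", "reads", "read_type")


def parse_antibiograms(handle):
    """Yield rows as dicts, splitting the space-separated list columns."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in LIST_COLUMNS:
            row[col] = row[col].split() if row.get(col) else []
        # read_type is described as having the same length as reads.
        if len(row["reads"]) != len(row["read_type"]):
            raise ValueError(f"reads/read_type mismatch for {row['biosample']}")
        yield row


COLUMNS = [
    "biosample", "sra_biosample", "species", "antibiotic", "phenotype",
    "measurement_sign", "measurement_value", "measurement_units",
    "typing_method", "typing_platform", "standard", "genomes", "reads",
    "read_type",
]
# Made-up example row; none of these values come from the dataset.
EXAMPLE_ROW = [
    "SAMN00000001", "", "Escherichia coli", "ampicillin", "resistant",
    ">", "32", "mg/L", "", "", "", "GCA_000000000.1",
    "SRR0000001 SRR0000002", "paired paired",
]
text = "\t".join(COLUMNS) + "\n" + "\t".join(EXAMPLE_ROW) + "\n"

rows = list(parse_antibiograms(io.StringIO(text)))
print(rows[0]["antibiotic"], rows[0]["reads"], rows[0]["read_type"])
```

Checking the `reads` and `read_type` lengths at parse time surfaces malformed rows early instead of letting them misalign later.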

  18. Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    bin, zip
    Updated Jul 24, 2023
    + more versions
    Cite
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2023). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,999 Escherichia coli isolates [Dataset]. http://doi.org/10.5281/zenodo.7230102
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies and respective 7,601-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,999 Escherichia coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 411 different serotypes are represented in this dataset, with O157:H7 being the most represented one, corresponding to 37.1% of the dataset.

    File “Ec_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST and serotype.

    The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    The file “profiles/Ec_profiles_wgMLST.tsv” is a tab-separated file with the 7,601-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Ec_profiles_cgMLST_95.tsv”, “profiles/Ec_profiles_cgMLST_98.tsv” and “profiles/Ec_profiles_cgMLST_100.tsv” correspond to the 2,826-loci, 2,704-loci and 465-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

    Dataset selection and curation

    With the objective of creating a diverse dataset of E. coli genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in previous works. Based on this information, we selected an initial dataset comprising 2,688 samples associated with three BioProjects (PRJNA230969, PRJEB27020 and PRJNA248042). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the remaining assemblies, we noticed that a considerable proportion was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,999 isolates passed this curation step and were included in the final dataset. In silico serotyping was performed with seq_typing v2.2. wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 7,601-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 7,601-loci wgMLST profiles of the 1,999 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keeping schema loci called in at least 95%, 98% and 100% of the samples, resulting in 2,826-loci, 2,704-loci and 465-loci allelic matrices, respectively).
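The site-inclusion rule described above (keep a locus only if it is called in at least a given fraction of samples) can be illustrated with a minimal sketch. This is not ReporTree's implementation: the function name `filter_loci_by_site_inclusion`, the in-memory dictionary layout, and the use of "0" as the missing-call code are all assumptions made for illustration.

```python
def filter_loci_by_site_inclusion(profiles, threshold, missing="0"):
    """Return the loci called in at least `threshold` (0-1) of the samples.

    profiles: dict mapping sample name -> dict mapping locus name -> allele call.
    missing:  value treated as "locus not called" (assumed to be "0" here).
    """
    samples = list(profiles)
    if not samples:
        return []
    # Take the locus order from the first sample; all samples share the same loci.
    loci = list(profiles[samples[0]])
    kept = []
    for locus in loci:
        called = sum(1 for s in samples if profiles[s][locus] != missing)
        if called / len(samples) >= threshold:
            kept.append(locus)
    return kept


# Toy allelic matrix: 4 samples, 3 loci (hypothetical values).
toy_profiles = {
    "s1": {"locusA": "1", "locusB": "2", "locusC": "0"},
    "s2": {"locusA": "1", "locusB": "3", "locusC": "0"},
    "s3": {"locusA": "4", "locusB": "0", "locusC": "5"},
    "s4": {"locusA": "2", "locusB": "2", "locusC": "7"},
}

strict_core = filter_loci_by_site_inclusion(toy_profiles, 1.0)
relaxed_core = filter_loci_by_site_inclusion(toy_profiles, 0.75)
```

With the toy matrix, the strict threshold keeps only the locus called in every sample, while the relaxed threshold also keeps the locus called in three of four samples. The same trade-off explains why the 100% schema in this dataset is much smaller than the 95% and 98% schemas.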

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  19. RNA-seq

    • data.niaid.nih.gov
    Updated Jul 4, 2020
    Cite
    (2020). RNA-seq [Dataset]. https://data.niaid.nih.gov/resources?id=ncbi_sra_srp269857
    Dataset updated
    Jul 4, 2020
    Description

    RNA-seq for human

  20. UniMelb Thylacine Genomics Repository

    • figshare.unimelb.edu.au
    application/gzip
    Updated May 30, 2023
    Cite
    CHARLES FEIGIN (2023). UniMelb Thylacine Genomics Repository [Dataset]. http://doi.org/10.26188/19351607.v3
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    The University of Melbourne
    Authors
    CHARLES FEIGIN
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This FigShare repository contains genomic datasets for the Thylacine Genomics Project at the University of Melbourne (VIC, Australia). Currently, 4 files arising from 2 publications are hosted on this repository. All assemblies arise from NCBI BioSample SAMN06049672.

    1) Feigin et al. 2018: Genome of the Tasmanian tiger provides insights into the evolution and demography of an extinct marsupial carnivore [https://www.nature.com/articles/s41559-017-0417-y]

    a) ThyCyn1.0: This assembly is the first de novo whole-genome assembly for the thylacine. It is a contig-level assembly and thus contains no scaffolds. It was generated from short-insert paired-end reads (https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=354646). The sequencing and data pre-processing strategy is discussed in the methods of Feigin et al. 2018. This assembly was used exclusively to estimate the genome size and G+C content of the thylacine genome.

    b) UniMelb_Thylacine_Refassem_1/GCA_007646695.1: Because of the highly fragmentary nature (low N50) of ThyCyn1.0, UniMelb_Thylacine_Refassem_1 was generated to perform all evolutionary analyses detailed in Feigin et al. 2018. UniMelb_Thylacine_Refassem_1 was generated by mapping thylacine reads against the repeat-masked version of the previous Tasmanian devil draft genome Devil_ref v7.0 and generating reference-guided scaffolds. This is not a complete genome assembly, as it is composed only of non-repetitive genomic regions and does not include indel differences between thylacine and devil. This was done to preserve the coordinate systems between thylacine and devil, permitting the use of the already-existing Tasmanian devil gene annotations. See the methods of Feigin et al. 2018 for details. The assembly is hosted on NCBI under BioProject PRJNA354646.

    2) Feigin et al. 2022: A chromosome-scale hybrid genome assembly of the extinct Tasmanian tiger (Thylacinus cynocephalus) [https://www.biorxiv.org/content/10.1101/2022.03.02.482690v1.full]

    a) ThyCyn2.0: This assembly is a chromosome-scale hybrid genome for the thylacine. It was generated by producing improved de novo contigs and short read-based scaffolds, which were then aligned to the Tasmanian devil reference genome mSarHar1.11. This assembly represents a substantial improvement in contiguity and completeness over both ThyCyn1.0 and UniMelb_Thylacine_Refassem_1. The assembly is hosted on NCBI under BioProject PRJNA354646.

    b) ThyCyn2.0 annotation: Associated with ThyCyn2.0, we have produced a set of homology-based gene annotations using a gene model liftover procedure (see Feigin et al. 2022 for details). Briefly, exons from the Tasmanian devil genome RefSeq annotation were aligned to the thylacine assembly and gene models were created by linking exons together, filtering for those that preserved the intron-exon structure of the reference devil annotation (with an allowable distance scaling factor of 4).
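The description above characterises ThyCyn1.0 as highly fragmentary by its low N50. As a reminder of what that statistic measures, here is a minimal sketch of the standard N50 definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the contig lengths are illustrative assumptions, not values from these assemblies.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("N50 is undefined for an empty list of lengths")


# Hypothetical contig lengths for two toy assemblies of equal total size.
fragmented = n50([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
contiguous = n50([10, 6, 4])
```

Both toy assemblies sum to 20, but the one made of many short contigs has a much lower N50, which is the sense in which a contig-level assembly is less contiguous than a scaffolded one.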
