84 datasets found
  1. Sequence Read Archive (SRA)

    • catalog.data.gov
    • healthdata.gov
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). Sequence Read Archive (SRA) [Dataset]. https://catalog.data.gov/dataset/sequence-read-archive-sra-54e4a
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The Sequence Read Archive (SRA) stores sequencing data from next-generation sequencing platforms, including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.

  2. Study Characteristics: In this table, all publicly available data that were...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Study Characteristics: In this table, all publicly available data that were aggregated for this study are described, along with their Sequence Read Archive bioproject numbers, sample descriptors and average number (#) of reads. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Study Characteristics: In this table, all publicly available data that were aggregated for this study are described, along with their Sequence Read Archive bioproject numbers, sample descriptors and average number (#) of reads.

  3. COVID-19 Genome Sequence Dataset

    • registry.opendata.aws
    • catalog.midasnetwork.us
    Updated Jul 9, 2020
    Cite
    National Library of Medicine (NLM) (2020). COVID-19 Genome Sequence Dataset [Dataset]. https://registry.opendata.aws/ncbi-covid-19/
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    National Library of Medicine (NLM) (http://nlm.nih.gov/)
    Description

    This repository within the ACTIV TRACE initiative houses a comprehensive collection of datasets related to SARS-CoV-2. SARS-CoV-2 Sequence Read Archive (SRA) files are processed with a pipeline optimized to identify genetic variations in viral samples, and the results are presented in Variant Call Format (VCF). Each VCF file corresponds to the accession ID of its parent SRA run. The data are also available in Parquet format, which makes them easier to search and filter with Amazon Athena. The SARS-CoV-2 Variant Calling Pipeline is designed to process new data every six hours, and the AWS ODP bucket is updated daily.
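
    The per-run VCF files follow the standard Variant Call Format, so each data line can be split into its eight fixed tab-separated columns. The Python sketch below is a minimal illustration under that assumption; it is not part of the ACTIV TRACE pipeline, and the example record (reference NC_045512.2, position 23403, INFO keys DP and AF) is hypothetical rather than taken from the bucket.

```python
from typing import Dict, List, Optional, Union

# Flag-type INFO entries (no "=") are stored as True; key=value entries keep the string value.
InfoValue = Union[str, bool]


def parse_vcf_data_line(line: str) -> Dict[str, object]:
    """Split one VCF data line (not a '#' header line) into its fixed columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    info_map: Dict[str, InfoValue] = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_map[key] = value if sep else True

    quality: Optional[float] = None if qual == "." else float(qual)
    alts: List[str] = alt.split(",")  # several ALT alleles are comma-separated
    return {
        "chrom": chrom,
        "pos": int(pos),  # VCF positions are 1-based
        "id": variant_id,
        "ref": ref,
        "alt": alts,
        "qual": quality,
        "filter": filt,
        "info": info_map,
    }


# Hypothetical record, for illustration only.
record = parse_vcf_data_line("NC_045512.2\t23403\t.\tA\tG\t60\tPASS\tDP=100;AF=0.98")
print(record["pos"], record["alt"], record["info"])
```

    For bulk queries, the Parquet copy searched through Amazon Athena avoids this per-line parsing; the sketch is only meant for inspecting individual VCF files.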

  4. Top 50 conserved aging predictive genes.

    • plos.figshare.com
    xlsx
    Updated Jun 9, 2023
    Cite
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Top 50 conserved aging predictive genes. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.s004
    Available download formats: xlsx
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table describes whether previous reports link these genes to aging or neurodegeneration phenotypes in humans or other model organisms. (XLSX)

  5. Aging correlated genes.

    • plos.figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Aging correlated genes. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.s001
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table depicts the aging correlated genes for humans and flies sorted according to their correlation coefficient. (XLSX)
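
    The description does not say which correlation coefficient was used. Assuming Pearson's r between each gene's expression and chronological age, a minimal Python sketch of scoring and sorting genes might look like this; the gene names and values are hypothetical, and descending order is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_age_correlation(
    expression: Dict[str, Sequence[float]], ages: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (gene, r) pairs sorted from most positive to most negative r."""
    scored = [(gene, pearson_r(values, ages)) for gene, values in expression.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression values for four samples of increasing age.
ages = [1.0, 2.0, 3.0, 4.0]
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [8.0, 6.0, 4.0, 2.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_genes_by_age_correlation(expression, ages)
```

    Sorting by absolute value instead (key=lambda item: abs(item[1])) would place strongly negative correlations next to strongly positive ones; which ordering the table uses is not stated.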

  6. Catalog of NCBI sequence read archive (SRA) data for salamanders at the...

    • portal.edirepository.org
    csv
    Updated Apr 9, 2024
    Cite
    Brett Addis; Madaline Cochrane; Winsor Lowe (2024). Catalog of NCBI sequence read archive (SRA) data for salamanders at the Hubbard Brook Experimental Forest 2012-2021 [Dataset]. http://doi.org/10.6073/pasta/6df7199d751ec81315395a042cbd8083
    Available download formats: csv (312,227 bytes), csv (220,695 bytes), csv (282,251 bytes)
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    EDI
    Authors
    Brett Addis; Madaline Cochrane; Winsor Lowe
    Time period covered
    2012 - 2021
    Variables measured
    strain, ecotype, isolate, lat_lon, cultivar, organism, Accession, BioProject, env_medium, sample_URL, and 8 more
    Description

    This project was designed to describe fine-scale population genetic differentiation of the stream salamander Gyrinophilus porphyriticus among five study streams in the Hubbard Brook Experimental Forest. The data are paired with intensive capture-recapture data to assess the direct fitness effects of individual genetic diversity, including the effects of individual multilocus heterozygosity on stage-specific survival probabilities.

       This dataset publishes a manifest of the genomic sequence reads submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). These samples are published at NCBI under the BioProject ID 1090913 (https://www.ncbi.nlm.nih.gov/bioproject/1090913). The tables here include sample metadata and the NCBI URLs to each sample.
    
       These data were gathered as part of the Hubbard Brook Ecosystem Study (HBES). The HBES is a collaborative effort at the Hubbard Brook Experimental Forest, which is operated and maintained by the USDA Forest Service, Northern Research Station.
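
    The manifest's variables include Accession, BioProject and lat_lon. Assuming lat_lon follows NCBI's BioSample convention of decimal degrees with hemisphere letters (for example "43.94 N 71.75 W"), the Python sketch below converts it to signed coordinates. The manifest rows are hypothetical placeholders, not values from the published CSV files.

```python
import csv
import io
from typing import Dict, Optional, Tuple


def parse_lat_lon(value: str) -> Optional[Tuple[float, float]]:
    """Convert 'DD.DD N DD.DD W' style text to signed (latitude, longitude).

    Returns None for values that do not match, such as 'missing'.
    """
    parts = value.split()
    if len(parts) != 4:
        return None
    lat_text, ns, lon_text, ew = parts
    if ns not in ("N", "S") or ew not in ("E", "W"):
        return None
    try:
        lat, lon = float(lat_text), float(lon_text)
    except ValueError:
        return None
    # Southern latitudes and western longitudes are negative.
    return (lat if ns == "N" else -lat, lon if ew == "E" else -lon)


# Hypothetical manifest rows using the column names listed for this dataset.
manifest = io.StringIO(
    "Accession,BioProject,lat_lon\n"
    "SAMPLE_1,1090913,43.94 N 71.75 W\n"
    "SAMPLE_2,1090913,missing\n"
)
coords: Dict[str, Optional[Tuple[float, float]]] = {
    row["Accession"]: parse_lat_lon(row["lat_lon"]) for row in csv.DictReader(manifest)
}
```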
    
  7. Data relating to RNA sequence accessions at NCBI from Ross Sea...

    • search.dataone.org
    • bco-dmo.org
    Updated Dec 5, 2021
    Cite
    Rebecca J. Gast (2021). Data relating to RNA sequence accessions at NCBI from Ross Sea Dinoflagellates, Phaeocystis antarctica, Pyramimons tychotreta, and Micromonas polaris (CCMP 2099) (Kleptoplasty project) [Dataset]. https://search.dataone.org/view/http%3A%2F%2Flod.bco-dmo.org%2Fid%2Fdataset%2F728427
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Rebecca J. Gast
    Time period covered
    Dec 1, 1997 - Apr 7, 1998
    Description

    This dataset contains data related to RNA sequence genetic accessions at the National Center for Biotechnology Information (NCBI) including information about the host organism, collection location, and collection date.

    The accessions are the unprocessed Illumina MiSeq reads for the Ross Sea Dinoflagellate RNA-Seq experiments, the Phaeocystis antarctica RNA-Seq experiments, and the Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy experiments.

    Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the SRA accession number SRP090401 (BioProject PRJNA342459).

    Ross Sea Dinoflagellate RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP132912 (BioProject PRJNA428208).

    Phaeocystis antarctica RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP133243 (BioProject PRJNA434497).

  8. Regression tables predicting chronological age.

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jun 4, 2023
    Cite
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Regression tables predicting chronological age. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.s002
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table reports the average R2, mean squared error, median absolute error, and R2 95% confidence interval across 1000 iterations of training/testing predictions. Each row represents a different way of selecting genetic features for age prediction, and each column represents a metric used to evaluate how effectively age is predicted. (XLSX)
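
    A minimal Python sketch of the listed metrics follows. It assumes the 95% confidence interval is the 2.5th to 97.5th percentile of the per-iteration R2 values; the source does not state how the interval or the train/test splits were computed, and the prediction model itself is omitted.

```python
import statistics
from typing import Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def median_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def summarize_r2(r2_per_iteration: Sequence[float]) -> Tuple[float, float, float]:
    """Mean R2 and an assumed percentile-based 95% interval across iterations."""
    return (
        statistics.fmean(r2_per_iteration),
        percentile(r2_per_iteration, 2.5),
        percentile(r2_per_iteration, 97.5),
    )
```

    Under these assumptions, calling summarize_r2 on the 1000 per-iteration R2 values would produce the R2 mean and interval for one row of the table.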

  9. Catalog of GenBank sequence read archive (SRA) entries of metagenomic DNA...

    • arcticdata.io
    • search.dataone.org
    Updated Mar 21, 2023
    Cite
    Beaufort Lagoon Ecosystems LTER; Byron C Crump; Colleen TE Kellogg; Kristina Baker; James W McClelland; Kenneth H Dunton (2023). Catalog of GenBank sequence read archive (SRA) entries of metagenomic DNA sequence analyses of bacterial and archaeal water column communities along the Eastern Beaufort Sea coast, North Slope, Alaska, 2012 [Dataset]. https://arcticdata.io/catalog/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fknb-lter-ble%2F19%2F1
    Dataset updated
    Mar 21, 2023
    Dataset provided by
    Arctic Data Center
    Authors
    Beaufort Lagoon Ecosystems LTER; Byron C Crump; Colleen TE Kellogg; Kristina Baker; James W McClelland; Kenneth H Dunton
    Time period covered
    Apr 17, 2012 - Aug 15, 2012
    Variables measured
    run, bases, depth, bytes_b, latitude, run_link, biosample, env_biome, longitude, site_name, and 14 more
    Description

    In contrast to temperate systems, Arctic lagoons that span the Alaska Beaufort Sea coast face extreme seasonality. Nine months of ice cover up to ∼1.7 m thick is followed by a spring thaw that introduces an enormous pulse of freshwater, nutrients, and organic matter into these lagoons over a relatively brief 2–3 week period. Prokaryotic communities link these subsidies to lagoon food webs through nutrient uptake, heterotrophic production, and other biogeochemical processes, but little is known about how the genomic capabilities of these communities respond to seasonal variability. This study characterizes the metabolic capabilities of microbial communities across three seasons in two lagoons and one open coastal site along the eastern Alaska Beaufort Sea coast. We used metagenomic DNA sequence data of bacterial and archaeal water column communities to identify genes of relevant biogeochemical pathways.

    This data package catalogs sequence read archive (SRA) entries available through GenBank BioProject PRJNA642637 at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA642637. This data package is associated with the following publication: Baker, Kristina D., Colleen T. E. Kellogg, James W. McClelland, Kenneth H. Dunton, and Byron C. Crump. “The Genomic Capabilities of Microbial Communities Track Seasonal Variation in Environmental Conditions of Arctic Lagoons.” Frontiers in Microbiology 12 (2021). https://doi.org/10.3389/fmicb.2021.601901.

    Environmental variables (physicochemical data from YSI and HOBO data loggers, as well as organic matter analysis and stable isotope data from discrete water samples) associated with this genomic dataset are available from the Arctic Data Center: Kenneth Dunton, Byron Crump, and James McClelland. Physical, chemical, and biological data from lagoons and open coastal waters in the nearshore environment of the eastern Alaska Beaufort Sea, 2011-2013. Arctic Data Center. doi:10.18739/A2DG13.

    To join the two datasets, use the provided site codes (column "site_name" here) and collection dates (column "collection_date" here) in each dataset. Instead of citing this package, which is a catalog, please cite the original GenBank data, journal article, or related Arctic Data Center dataset as appropriate. Citation guidance for the journal article and the related Arctic Data Center dataset is available on the respective publishers' websites.
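
    The join described above matches rows on the pair (site_name, collection_date). A minimal Python sketch of that inner join follows; the environmental dataset's column names ("Site", "Date", "salinity") and all row values are hypothetical, and both date columns are assumed to use the same text format.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def join_on_site_and_date(
    catalog: Iterable[Row],
    environment: Iterable[Row],
    env_site_col: str,
    env_date_col: str,
) -> List[Row]:
    """Inner-join catalog rows to environmental rows on (site, collection date)."""
    index: Dict[Tuple[str, str], List[Row]] = {}
    for env_row in environment:
        key = (env_row[env_site_col], env_row[env_date_col])
        index.setdefault(key, []).append(env_row)

    joined: List[Row] = []
    for cat_row in catalog:
        key = (cat_row["site_name"], cat_row["collection_date"])
        for env_row in index.get(key, []):
            merged = dict(env_row)
            merged.update(cat_row)  # catalog values win if a column name appears in both
            joined.append(merged)
    return joined


# Hypothetical rows for illustration only.
catalog_rows = [
    {"run": "RUN_1", "site_name": "SITE_A", "collection_date": "2012-04-17"},
    {"run": "RUN_2", "site_name": "SITE_B", "collection_date": "2012-08-15"},
]
environment_rows = [{"Site": "SITE_A", "Date": "2012-04-17", "salinity": "30"}]
result = join_on_site_and_date(catalog_rows, environment_rows, "Site", "Date")
```

    Catalog rows without a match (RUN_2 here) are dropped; a left join would instead keep them with empty environmental fields.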

  10. Whole genome sequencing of three North American large-bodied birds

    • datasets.ai
    • data.usgs.gov
    55
    Updated Sep 11, 2024
    Cite
    Department of the Interior (2024). Whole genome sequencing of three North American large-bodied birds [Dataset]. https://datasets.ai/datasets/whole-genome-sequencing-of-three-north-american-large-bodied-birds
    55Available download formats
    Dataset updated
    Sep 11, 2024
    Dataset authored and provided by
    Department of the Interior
    Description

    The data release details the samples, methods, and raw data used to generate high-quality genome assemblies for greater sage-grouse (Centrocercus urophasianus), white-tailed ptarmigan (Lagopus leucura), and trumpeter swan (Cygnus buccinator). The raw data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), the authoritative repository for public biological sequence data, and are not included in this data release. Instead, the accessions that link to those data via the NCBI portal (www.ncbi.nlm.nih.gov) are provided herein. The release consists of a single file, sample.metadata.txt, which maps NCBI accessions to the samples sequenced and the different types of sequencing performed to generate the assemblies and annotate their gene features.

  11. Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2023). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,999 Escherichia coli isolates [Dataset]. http://doi.org/10.5281/zenodo.7120058
    Available download formats: zip
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies and respective 7,601-locus whole-genome (wg) Multilocus Sequence Typing (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,999 Escherichia coli samples selected from the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 411 different serotypes are represented in this dataset, with O157:H7 being the most represented, corresponding to 37.1% of the dataset.

    File “Ec_metadata.xlsx” contains metadata for each isolate, including the ENA/SRA accession number, BioProject, and in-silico MLST ST and serotype.

    The directory “assemblies/” contains the genome assemblies (.fasta format) of all isolates listed in the metadata file.

    The file “profiles/Ec_profiles_wgMLST.tsv” is a tab-separated file with the 7,601-locus wgMLST profiles of each isolate listed in the metadata file. The files “profiles/Ec_profiles_cgMLST_95.tsv”, “profiles/Ec_profiles_cgMLST_98.tsv” and “profiles/Ec_profiles_cgMLST_100.tsv” contain the 2,826-locus, 2,704-locus and 465-locus cgMLST profiles of each isolate, respectively. These profiles were determined as explained below.

    Dataset selection and curation

    With the objective of creating a diverse dataset of E. coli genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 2,688 samples associated with three BioProjects (PRJNA230969, PRJEB27020 and PRJNA248042). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination, were performed with the same pipeline. All genome assemblies passing QC were included in the final dataset. Among the remaining assemblies, we noticed that a considerable proportion were flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated into the final dataset (those samples are labeled in the metadata file). In total, 1,999 isolates passed this curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2. The wgMLST profiles of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 7,601-locus INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 7,601-locus wgMLST profiles of the 1,999 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keeping schema loci called in at least 95%, 98% and 100% of the samples, resulting in 2,826-locus, 2,704-locus and 465-locus allelic matrices, respectively).
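
    The “--site-inclusion” rule keeps a locus when the fraction of samples with an allele call at that locus reaches the threshold. The Python sketch below illustrates that rule on a tiny matrix; it is not ReporTree's code, and it assumes a cleaned matrix in which called alleles are positive integers while anything else (for example "LNF" or "0") counts as missing. Locus names and allele values are hypothetical.

```python
from typing import Dict, List, Sequence


def is_called(allele: str) -> bool:
    """Assumption: a call is a positive integer allele ID; other codes are missing."""
    return allele.isdigit() and int(allele) > 0


def loci_passing_inclusion(
    profiles: Sequence[Dict[str, str]], loci: Sequence[str], threshold: float
) -> List[str]:
    """Keep loci called in at least `threshold` (0..1) of the samples."""
    n_samples = len(profiles)
    kept = []
    for locus in loci:
        n_called = sum(1 for profile in profiles if is_called(profile[locus]))
        if n_called / n_samples >= threshold:
            kept.append(locus)
    return kept


# Hypothetical four-sample allele matrix (one dict per sample).
loci = ["locus_1", "locus_2", "locus_3"]
profiles = [
    {"locus_1": "1", "locus_2": "4", "locus_3": "2"},
    {"locus_1": "2", "locus_2": "4", "locus_3": "LNF"},
    {"locus_1": "1", "locus_2": "0", "locus_3": "7"},
    {"locus_1": "3", "locus_2": "5", "locus_3": "LNF"},
]
core_100 = loci_passing_inclusion(profiles, loci, 1.0)
core_75 = loci_passing_inclusion(profiles, loci, 0.75)
```

    With 1,999 isolates, the 0.95 and 0.98 thresholds correspond to loci called in at least 1,900 and 1,960 samples respectively (0.95 × 1,999 = 1,899.05 and 0.98 × 1,999 = 1,959.02, rounded up), while 1.0 requires a call in every sample.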

  12. Sequencing Data Set of Sediment Layers

    • s.cnmilf.com
    Updated May 17, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Sequencing Data Set of Sediment Layers [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/sequencing-data-set-of-sediment-layers
    Dataset updated
    May 17, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    A table (DP_SRA.xlsx) has one row per sample, with columns for the BioSample accession number (NCBI), collection (date), library strategy, target (source), and sequencing (technology). The zip file (Genome_Set01.zip) contains nine (9) FASTA files (DP_bin_02.fasta, DP_bin_04.fasta, DP_bin_09.fasta, DP_bin_10.fasta, DP_bin_14.fasta, DP_bin_15.fasta, DP_bin_16a.fasta, DP_bin_20.fasta, DP_bin_23.fasta) with the contig sequences (i.e., bins) for each metagenome-assembled genome (MAG). These data are available from the NCBI Sequence Read Archive (SRA) under the BioProject (https://www.ncbi.nlm.nih.gov/bioproject) with accession number PRJNA646252 and the following BioSample numbers: SAMN15536103 to SAMN15536108. This dataset is associated with the following publication: Gomez-Alvarez, V., H. Liu, J. Pressman, and D. Wahman. Metagenomic Profile of Microbial Communities in a Drinking Water Storage Tank Sediment after Sequential Exposure to Monochloramine, Free Chlorine, and Monochloramine. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 1(5): 1283-1294, (2021).
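
    The BioSample numbers are given as a contiguous range. Assuming every number between the two endpoints belongs to this BioProject, as the range notation implies, a small Python sketch can expand the range into individual accessions while keeping the zero-padded width of the numeric part. The function name is hypothetical.

```python
import re
from typing import List

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(first: str, last: str) -> List[str]:
    """Expand e.g. SAMN15536103..SAMN15536108 (inclusive) into a list."""
    m_first = ACCESSION.fullmatch(first)
    m_last = ACCESSION.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("both accessions need the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # preserve leading zeros, e.g. SAMN06894543
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


biosamples = expand_accession_range("SAMN15536103", "SAMN15536108")
```

    Applied to the range SAMN06894543-SAMN06894566 cited for the soil metagenomes (BioProject PRJNA385596), the same function would return 24 accessions with the leading zero preserved.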

  13. Raw data of C and N measurements of agricultural soils in Rostock and...

    • repository.soilwise-he.eu
    • soilwise-he.containers.wur.nl
    Cite
    Raw data of C and N measurements of agricultural soils in Rostock and Freising in 2015 and links to the relevant metagenomic sequencing data in the NCBI Sequence Read Archive [Dataset]. https://repository.soilwise-he.eu/cat/collections/metadata:main/items/a121feba-bcf7-4f35-ab18-e963ebc0b7c2
    Description

    The availability of phosphorus (P) strongly influences crop yield and quality. However, as a result of agricultural practices, P has accumulated in soil, mostly in inaccessible forms. Bacteria play an important role in mobilizing P, and the release of P results more from the bacterial need for C and N than from an immediate need for P. We therefore postulated that the addition of carbon and N would stimulate P mobilization by bacteria, and we performed a metagenomic study of soils from two agricultural sites (Rostock, Freising) that had received either only mineral N fertilizer or mineral N plus organic fertilizer for more than 20 years. Metagenomic sequencing, followed by taxonomic and functional annotation of the sequences by BLAST searches against the NCBI-nr database (http://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (June 2011), revealed that, independent of site and season, the relative abundance of genes involved in P turnover was not significantly affected by the addition of fertilizers. However, the type of fertilization had a significant impact on the composition of bacterial families harboring genes coding for the different P transformation processes. This raises the possibility that fertilizers can substantially change P turnover efficiency by favoring different families. Additionally, none of the families involved in P turnover covered all investigated processes. So far, the most abundant genes in our study involved in acquiring external P sources were those involved in the solubilization and subsequent uptake of inorganic P. Therefore, promoting bacteria that play an essential role specifically in mobilizing hardly accessible P could help secure the P supply of plants in soils with low P input.

    The raw sequencing data are available at the Sequence Read Archive (SRA) under BioProject ID PRJNA385596 (SAMN06894543-SAMN06894566). Additionally, we determined dissolved organic nitrogen (DON) and carbon (DOC) contents by extracting the soil with 0.01 M CaCl2 solution (soil-to-liquid ratio 1:4), and microbial biomass carbon (Cmic) and nitrogen (Nmic) contents by a chloroform-fumigation-extraction procedure. Our data indicate that site affected these values more than treatment did, as Cmic, Nmic, DOC and DON were highly stable across the different fertilizer regimes. Only additional P fertilization slightly increased DOC values. Data are published in Grafe, M., Goers, M., von Tucher, S., Baum, C., Zimmer, D., Leinweber, P., Vestergaard, G., Kublik, S., Schloter, M., and Schulz, S.: Bacterial potentials for uptake, solubilization and mineralization of extracellular phosphorus in agricultural soils are highly stable under different fertilization regimes, Environ. Microbiol. Rep., 10, 320-327, https://doi.org/10.1111/1758-2229.12651

  14. Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 20, 2023
    Cite
    Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Hsu, Jonathan
    Stoop, Allart
    Description

    Table of Contents

    • Main Description
    • File Descriptions
    • Linked Files
    • Installation and Instructions

    1. Main Description

    This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code in the file marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required to run the script:

    Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

    File Descriptions

    The code can be downloaded and opened in RStudio. The file "marengo_code_for_paper_jan_2023.R" contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results of the differential gene expression (DGE) analysis, which were also used to create the DGE heatmap and the volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in Figure 5F.

    Linked Files

    This repository contains code for the analysis of a single-cell RNA-seq dataset. The dataset comprises raw FASTQ files as well as aligned files, which were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. The linked datasets are described below:

    Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

    Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

    Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

    Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file required to run the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

    Installation and Instructions

    The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

    Ensure you have R version 4.1.2 or higher for compatibility.

    Although it is not essential, you can use RStudio (Version 2022.12.0+353) to access and execute the code.

    1. Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).
    2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
    3. Set your working directory to where the following files are located:

    • marengo_code_for_paper_jan_2023.R
    • Install_Packages.R
    • Marengo_newID_March242023.rds
    • genes_for_heatmap_fig5F.xlsx
    • all_res_deg_for_heat_updated_march2023.txt

    You can use the following code to set the working directory in R:

    setwd("path/to/your/directory")  # hypothetical path: replace with the folder containing the files listed above

    4. Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment in which the code in "marengo_code_for_paper_jan_2023.R" can be executed.
    5. Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.
    6. Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.
    7. Execute the commands in "marengo_code_for_paper_jan_2023.R" to generate the plots.

  15. Novel 22 conserved aging predictive genes.

    • figshare.com
    xlsx
    Updated Jun 10, 2023
    Cite
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Novel 22 conserved aging predictive genes. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.s005
    Available download formats: xlsx
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table describes previous literature of listed genes, along with references. (XLSX)

  16. Catalog of GenBank sequence read archive (SRA) entries of 16S and 18S rRNA...

    • portal.edirepository.org
    csv
    Updated 2020
    Cite
    Colleen TE Kellogg; James W McClelland; Kenneth H Dunton; Byron C Crump (2020). Catalog of GenBank sequence read archive (SRA) entries of 16S and 18S rRNA genes from bacterial and protistan planktonic communities along the Eastern Beaufort Sea coast, North Slope, Alaska, 2011-2013 [Dataset]. http://doi.org/10.6073/pasta/0e4d75453560ab5c085c9b547be68731
    Available download formats: csv (563,446 bytes)
    Dataset updated
    2020
    Dataset provided by
    EDI
    Authors
    Colleen TE Kellogg; James W McClelland; Kenneth H Dunton; Byron C Crump
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0)

    Time period covered
    Aug 7, 2011 - Aug 14, 2013
    Area covered
    Variables measured
    run, bases, bytes, depth, latitude, organism, run_link, biosample, env_biome, longitude, and 23 more
    Description

    Microbial communities in the coastal Arctic Ocean experience extreme variability in organic matter and inorganic nutrients driven by seasonal shifts in sea ice extent and freshwater inputs. Lagoons border more than half of the Beaufort Sea coast and provide important habitats for migratory fish and seabirds; yet, little is known about the planktonic food webs supporting these higher trophic levels. To investigate seasonal changes in bacterial and protistan planktonic communities, amplicon sequences of 16S and 18S rRNA genes were generated from samples collected during periods of ice-cover (April), ice break-up (June), and open water (August) from shallow lagoons along the eastern Alaska Beaufort Sea coast from 2011 through 2013.

    This data package catalogs sequence read archive (SRA) entries available through GenBank BioProject PRJNA530074 at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA530074. This data package is associated with the following publication:

    Kellogg CTE, McClelland JW, Dunton KH and Crump BC (2019) Strong Seasonality in Arctic Estuarine Microbial Food Webs. Front. Microbiol. 10:2628. doi: 10.3389/fmicb.2019.02628

    Environmental variables (physicochemical data from YSI and HOBO data loggers, as well as organic matter analysis and stable isotope data from discrete water samples) associated with this genomic dataset are available from the Arctic Data Center:

    Kenneth Dunton, Byron Crump, and James McClelland. Physical, chemical, and biological data from lagoons and open coastal waters in the nearshore environment of the eastern Alaska Beaufort Sea, 2011-2013. Arctic Data Center. doi:10.18739/A2DG13.

    To join the two datasets together, please use the provided site codes (column "site_name" here) and collection dates (column "collection_date" here) in each dataset. Note that the site codes in this package are without hyphens (e.g. JAA) while site codes in the above environmental data package have hyphens (e.g. JA-A).
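    The join described above can be sketched with the Python standard library. This is an illustration, not part of the data package. It assumes both tables have been loaded as lists of dicts and use the same date format. It also assumes that the environmental table's key columns are likewise named `site_name` and `collection_date`; those names are hypothetical for the Arctic Data Center files and can be overridden. Hyphens are stripped from site codes so that `JA-A` matches `JAA`.

```python
def normalize_site(code: str) -> str:
    """Drop hyphens and surrounding whitespace so 'JA-A' and 'JAA' compare equal."""
    return code.replace("-", "").strip()


def join_on_site_and_date(sra_rows, env_rows,
                          env_site_key="site_name", env_date_key="collection_date"):
    """Inner-join catalog rows to environmental rows on (site code, collection date).

    The env_* key names are assumptions; columns from the environmental table
    are prefixed with 'env_' to avoid clobbering catalog columns.
    """
    index = {}
    for env in env_rows:
        key = (normalize_site(env[env_site_key]), env[env_date_key])
        index.setdefault(key, []).append(env)
    joined = []
    for sra in sra_rows:
        key = (normalize_site(sra["site_name"]), sra["collection_date"])
        for env in index.get(key, []):
            joined.append({**sra, **{f"env_{k}": v for k, v in env.items()}})
    return joined
```

    An inner join drops catalog rows with no environmental match. If dates are stored differently in the two packages, parse both to `datetime.date` before building the keys.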

    Instead of citing this package which is just a catalog, please cite the original GenBank data, journal article, or related Arctic Data Center dataset as appropriate. Citation guidance for the journal article and related Arctic Data Center dataset is available on the respective publishers' websites.

  17. Sequencing Data Set of Sediment Layers

    • datasets.ai
    • catalog.data.gov
    0, 53, 57
    Updated Aug 6, 2024
    U.S. Environmental Protection Agency (2024). Sequencing Data Set of Sediment Layers [Dataset]. https://datasets.ai/datasets/sequencing-data-set-of-sediment-layers
    Available download formats: 57, 53, 0
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    U.S. Environmental Protection Agency
    Description

    A table (DP_SRA.xlsx) lists one sample per row, with columns for the NCBI BioSample accession number, collection date, library strategy, target (source), and sequencing technology.

    The zip file (Genome_Set01.zip) contains nine (9) FASTA files (DP_bin_02.fasta, DP_bin_04.fasta, DP_bin_09.fasta, DP_bin_10.fasta, DP_bin_14.fasta, DP_bin_15.fasta, DP_bin_16a.fasta, DP_bin_20.fasta, DP_bin_23.fasta) holding the binned contig sequences of each metagenome-assembled genome (MAG).
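    To get a quick overview of the nine MAG bins, one option is to compute the contig count, total length, and N50 for each file. The following is a minimal standard-library sketch. It assumes the archive has been extracted into a directory whose path you pass in; that path is hypothetical. N50 here is the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half the bin size.

```python
import gzip
from pathlib import Path


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_lengths(lengths):
    """Contig count, total length and N50 for an iterable of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # first contig at which half the bin is covered
            n50 = length
            break
    return {"contigs": len(ordered), "total_bp": total, "n50": n50}


def bin_stats(path):
    """Summarize one bin file (plain or gzip-compressed FASTA)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        return summarize_lengths(len(seq) for _, seq in read_fasta(fh))


def summarize_bins(directory):
    """Map each DP_bin_*.fasta file in directory to its summary."""
    return {p.name: bin_stats(p) for p in sorted(Path(directory).glob("DP_bin_*.fasta"))}
```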

    These data are available from the NCBI Sequence Read Archive (SRA) under the BioProject (https://www.ncbi.nlm.nih.gov/bioproject) with accession number PRJNA646252 and the following BioSample numbers: SAMN15536103 to SAMN15536108.
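    The BioProject and BioSample accessions above can be queried programmatically through NCBI's E-utilities `esearch` endpoint. The following sketch builds the query URLs and expands the BioSample range SAMN15536103 to SAMN15536108 into its six accessions. The helper names are illustrative and not part of the dataset. `esearch` returns numeric Entrez UIDs in `idlist`, not SRR or SAMN accessions, so a follow-up `esummary` or `efetch` call (not shown) would be needed to map them back. NCBI asks clients without an API key to stay at or below three requests per second.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def accession_range(first, last):
    """Expand a contiguous accession range, e.g. SAMN15536103 to SAMN15536108."""
    prefix = first.rstrip("0123456789")
    if not last.startswith(prefix):
        raise ValueError("accessions must share a prefix")
    width = len(first) - len(prefix)
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def esearch_url(term, db="sra", retmax=100):
    """Build an esearch query URL that returns JSON."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def esearch_ids(term, db="sra"):
    """Fetch the Entrez UIDs matching term (requires network access)."""
    with urlopen(esearch_url(term, db)) as resp:
        return json.load(resp)["esearchresult"]["idlist"]
```

    For example, `esearch_ids("PRJNA646252")` searches the SRA database for the BioProject. `esearch_ids(" OR ".join(accession_range("SAMN15536103", "SAMN15536108")), db="biosample")` searches BioSample for the six listed samples.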

    This dataset is associated with the following publication: Gomez-Alvarez, V., H. Liu, J. Pressman, and D. Wahman. Metagenomic Profile of Microbial Communities in a Drinking Water Storage Tank Sediment after Sequential Exposure to Monochloramine, Free Chlorine, and Monochloramine. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 1(5): 1283-1294, (2021).

  18. Chromosome assembly and preliminary gene and repeat annotations for Myzomela...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 28, 2024
    Elsie Shogren; Jason Sardell; Christina Muirhead; Emiliano Martí; Elizabeth Cooper; Robert Moyle; Daven Presgraves; Albert J. Uy (2024). Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome [Dataset]. http://doi.org/10.5061/dryad.612jm64c9
    Dataset updated
    Jul 28, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Elsie Shogren; Jason Sardell; Christina Muirhead; Emiliano Martí; Elizabeth Cooper; Robert Moyle; Daven Presgraves; Albert J. Uy
    Description

    Secondary contact between closely related taxa represents a “moment of truth” for speciation: an opportunity to test the efficacy of reproductive isolation that evolved in allopatry and to identify the genetic, behavioral, and/or ecological barriers that separate species in sympatry. Sex chromosomes are known to rapidly accumulate differences between species, an effect that may be exacerbated for neo-sex chromosomes that are transitioning from autosomal to sex-specific inheritance. Here we report that, in the Solomon Islands, two closely related bird species in the honeyeater family, Myzomela cardinalis and Myzomela tristrami, carry neo-sex chromosomes and have come into recent secondary contact after ~1.1 million years of geographic isolation. Hybrids of the two species were first observed in sympatry ~100 years ago. To determine the genetic consequences of hybridization, we use population genomic analyses of individuals sampled in allopatry and in sympatry to characterize gene flow in the con...

    This data repository contains Myzomela tristrami reference genome files. The sequences associated with this assembly are available on the NCBI Sequence Read Archive at https://www.ncbi.nlm.nih.gov/sra/?term=SRA%20SRR29254783. We sequenced a M. tristrami female at the University of Delaware DNA Sequencing & Genotyping Center. HiFi libraries were prepared with the SMRTbell prep kit, followed by BluePippin size selection (15-20 kbp), before sequencing on a PacBio Sequel IIe. We generated a de novo assembly from the resulting long reads using hifiasm v0.13-r308 with default parameters (Cheng et al. 2021, 2022). We used GeMoMa (v1.8) and the annotation from the zebra finch genome bTaeGut1.4.pri to infer a rough annotation of genes in the Myzomela genome.
    We then used these rough annotations, comparing contigs against both zebra finch and the chicken genome bGalGal1.mat.broiler.GRCg7b, to infer synteny relationships, remove duplicate haplotigs, and, finally, scaffold contigs into chromosomes in Myzomel...

    Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome

    I. Files

    (GENOME)
    Mt_v1.0_MAIN.fa.gz: Primary genome, (largely) scaffolded to chromosome-level, plus other primary assembled contigs
    Mt_v1.0_MAIN.gff.gz: Simple gene annotations for primary genome, annotated using GeMoMa v1.8 and a zebra finch (bTaeGut1.4.pri) annotation reference
    Mt_v1.0_extra.fa.gz: Additional contigs, not for use in most analyses but some may be of interest. This set is a combination of hand-identified haplotigs of the main genome, and assembler-identified "alternate" (haplotig) contigs

    (ORIGINAL_ASSEMBLY_CONTIGS)
    Mt_hifi.asm.p.fa.gz: "primary" assembly contigs, output from hifiasm (v0.13-r308)
    Mt_hifi.asm.a.fa.gz: "alternate" assembly contigs, output from hifiasm (v0.13-r308)

    (REPEAT_MASKING)
    TElib_Myzo_preliminary.fa.gz: Preliminary Myzomela-tuned TE/repeat library, generated using RepeatModeler (v.2)
    Mt_v1.0_MAIN_RM_sites_to_filter.txt: List of sites masked by RepeatM...
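    The listing says the primary and alternate contig sets came from hifiasm. hifiasm writes its contigs as GFA assembly graphs, and its documentation shows a short awk command that converts the segment (S) lines to FASTA. The record does not say how Mt_hifi.asm.p.fa.gz and Mt_hifi.asm.a.fa.gz were produced. The following is one Python equivalent of that conversion, offered as a sketch; any input file name you pass to it (for example a `*.p_ctg.gfa` file) is hypothetical here.

```python
import gzip


def gfa_segments(lines):
    """Yield (name, sequence) for each GFA segment ('S') record that carries a sequence."""
    for line in lines:
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] != "*":  # '*' means the sequence is not stored
            yield fields[1], fields[2]


def gfa_to_fasta_gz(gfa_path, fasta_gz_path, width=80):
    """Write every segment of a GFA file as gzip-compressed, line-wrapped FASTA."""
    with open(gfa_path) as src, gzip.open(fasta_gz_path, "wt") as dst:
        for name, seq in gfa_segments(src):
            dst.write(f">{name}\n")
            for start in range(0, len(seq), width):
                dst.write(seq[start:start + width] + "\n")
```

    Only S lines are converted. Link (L) and read-placement (A) lines in the GFA are ignored because FASTA cannot represent them.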

  19. Genome assemblies and respective cgMLST profiles of a diverse dataset...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 24, 2023
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2023). Genome assemblies and respective cgMLST profiles of a diverse dataset comprising 1,874 Listeria monocytogenes isolates [Dataset]. http://doi.org/10.5281/zenodo.7116879
    Available download formats: zip
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies and respective 1,748-loci core-genome (cg) Multiple Locus Sequence Type (MLST) profiles [Pasteur schema (Moura et al. 2016) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,874 Listeria monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of Sequence Type [ST]). In total, 204 different STs are represented in this dataset, with ST121, ST6, ST9, ST1 and ST155 being in the top 5 and, together, corresponding to 37.9% of the dataset.

    File “Lm_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST.

    The directory “assemblies/” contains the genome assembly (.fasta format) of each isolate listed in the metadata file.

    The file “profiles/Lm_profile.tsv” is a tab-separated file with the 1,748-loci cgMLST profile of each isolate listed in the metadata file. These profiles were determined as explained below.
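    A common first use of a cgMLST profile table is to compute pairwise allelic distances, meaning the number of loci at which two isolates carry different alleles. The following is a minimal sketch. It assumes that the first column of the profile file holds the isolate ID and that the remaining columns hold one allele call per locus. It treats any call that is not a positive integer as missing. Non-numeric codes such as LNF (locus not found) fall into that category, and so does 0; treating 0 as missing is an assumption. It also assumes an `INF-` prefix marks a newly inferred allele and strips that prefix before comparing. Missing loci are skipped rather than counted as differences.

```python
import csv
from itertools import combinations


def load_profiles(path):
    """Read a tab-separated profile table; the first column holds the isolate ID."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        loci = next(reader)[1:]
        profiles = {row[0]: row[1:] for row in reader if row}
    return loci, profiles


def allele_or_none(value):
    """Integer allele ID, or None for missing or non-numeric calls (e.g. LNF)."""
    value = value.strip()
    if value.startswith("INF-"):  # assumed marker for a newly inferred allele
        value = value[len("INF-"):]
    if value.isdigit() and int(value) > 0:  # 0 treated as missing (assumption)
        return int(value)
    return None


def allelic_distance(a, b):
    """Number of loci called in both profiles whose allele IDs differ."""
    diff = 0
    for x, y in zip(a, b):
        ax, ay = allele_or_none(x), allele_or_none(y)
        if ax is not None and ay is not None and ax != ay:
            diff += 1
    return diff


def pairwise_distances(profiles):
    """Allelic distance for every unordered pair of isolate IDs."""
    return {(i, j): allelic_distance(profiles[i], profiles[j])
            for i, j in combinations(sorted(profiles), 2)}
```

    With 1,874 isolates this gives about 1.75 million pairs, which pure Python can handle but slowly. A vectorized version (for example with NumPy) is the usual next step.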

    Dataset selection and curation

    With the objective of creating a diverse dataset of L. monocytogenes genome assemblies, we collected information about the genetic diversity (STs) of the isolates available in the BIGSdb-Lm database at the beginning of this analysis (November 2021) and in previous works. Based on this information, we selected an initial dataset comprising 1,957 samples associated with three previous studies (Moura et al. 2016; Maury et al. 2017; Painset et al. 2019). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming, and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, and MLST ST determination were performed with the same pipeline. All genome assemblies passing QC were included in the final dataset. Among the remaining assemblies, we noticed that a considerable proportion were flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated into the final dataset (these samples are labeled in the metadata file). In total, 1,874 isolates passed the dataset curation step and were included in the final dataset. The cgMLST profile of each of these isolates was determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 1,748-loci Pasteur schema (Moura et al. 2016) available in chewie-NS (Mamede et al. 2022) and downloaded on June 23rd, 2022.
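    The inclusion rule described above can be written as a small predicate. It keeps assemblies that pass QC, plus assemblies whose only failed check is NumContamSNVs and whose percentage of reads assigned to the correct species exceeds 98%. The record type and field names below are hypothetical and do not reflect Aquamis's actual output format. The sketch also automates only the final rule, not the manual inspection of a random subset that preceded it.

```python
from dataclasses import dataclass, field


@dataclass
class AssemblyQC:
    """Hypothetical per-assembly QC summary (not Aquamis's real schema)."""
    isolate: str
    failed_checks: list = field(default_factory=list)  # names of failed QC criteria
    pct_reads_correct_species: float = 0.0             # % reads assigned to L. monocytogenes


def keep_assembly(qc: AssemblyQC, threshold: float = 98.0) -> bool:
    """Keep passing assemblies, and those failing only NumContamSNVs with > threshold % correct-species reads."""
    if not qc.failed_checks:
        return True
    only_contam_snvs = set(qc.failed_checks) == {"NumContamSNVs"}
    return only_contam_snvs and qc.pct_reads_correct_species > threshold
```

    The comparison is strict (`>`), matching the ">98%" wording; an assembly at exactly 98.0% is not recovered. By the counts above, 1,957 − 1,874 = 83 of the initial samples were not included.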

  20. Data from: Metagenomic and near full-length 16S rRNA sequence data in...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Apr 21, 2025
    Agricultural Research Service (2025). Data from: Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers [Dataset]. https://catalog.data.gov/dataset/data-from-metagenomic-and-near-full-length-16s-rrna-sequence-data-in-support-of-the-phylog-07c7d
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Amplicon sequencing utilizing next-generation platforms has significantly transformed how research is conducted, specifically in microbial ecology. However, primer and sequencing platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data are provided from 16S rRNA rumen community analyses. Specifically, data from the relative phylum-level abundances for the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundance of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided. Data are within this article, and raw ruminal MiSeq sequence data are available from the NCBI Sequence Read Archive (SRA Accession SRP047292). Additional descriptive information is associated with NCBI BioProject PRJNA261425 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA261425/).

    Resources in this dataset: Resource Title: NCBI Sequence Read Archive (SRA Accession SRP047292). File Name: Web Page, url: https://www.ncbi.nlm.nih.gov/sra/SRX704260 (1 ILLUMINA (Illumina MiSeq) run: 978,195 spots, 532.9M bases, 311.6Mb downloads).
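    The description mentions rarefaction curves but not how they were computed. One standard approach is Hurlbert's expected-richness formula. For a sample of N reads in which taxon i has N_i reads, the expected number of taxa in a random subsample of n reads is the sum over taxa of 1 − C(N − N_i, n) / C(N, n). The following is a sketch of that formula, not necessarily the method used for this dataset. It uses exact integer binomials, which is fine for modest read counts. At hundreds of thousands of reads, a log-space version using `math.lgamma` scales better.

```python
from math import comb


def expected_richness(counts, n):
    """Hurlbert's expected number of taxa in a random subsample of n reads."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(k, n) is 0 when n > k, so abundant taxa contribute exactly 1
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step):
    """(n, expected richness) points every `step` reads, ending at the full sample."""
    total = sum(counts)
    sizes = list(range(step, total, step)) + [total]
    return [(n, expected_richness(counts, n)) for n in sizes]
```

    A curve that flattens well before the full sample size suggests that further sequencing would add few new taxa, which is the sense in which rarefaction "demonstrates adequate coverage."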
