98 datasets found

Sequence Read Archive (SRA)
catalog.data.gov
Updated Jun 19, 2025
Cite
National Library of Medicine (2025). Sequence Read Archive (SRA) [Dataset]. https://catalog.data.gov/dataset/sequence-read-archive-sra-54e4a
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
The Sequence Read Archive (SRA) stores sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
Species included in the analysis, including environment (freshwater [FW] or...
datasetcatalog.nlm.nih.gov
Updated Dec 5, 2024
Cite
Barts, Nick; Wilson, Elizabeth J.; Tobler, Michael; Greenway, Ryan; Coffin, John L.; Johnson, James B.; Kelley, Joanna L.; Peña, Carlos M. Rodríguez (2024). Species included in the analysis, including environment (freshwater [FW] or saltwater [SW]), collection location, sample size (N), NCBI Sequence Read Archive (SRA) accession numbers, and study reference. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001362851
Dataset updated
Dec 5, 2024
Authors
Barts, Nick; Wilson, Elizabeth J.; Tobler, Michael; Greenway, Ryan; Coffin, John L.; Johnson, James B.; Kelley, Joanna L.; Peña, Carlos M. Rodríguez
Description
Species included in the analysis, including environment (freshwater [FW] or saltwater [SW]), collection location, sample size (N), NCBI Sequence Read Archive (SRA) accession numbers, and study reference.
Top 50 conserved aging predictive genes.
plos.figshare.com
xlsx
Updated Jun 9, 2023
Cite
Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Top 50 conserved aging predictive genes. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.s004
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0255085.s004
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This table describes whether previous reports exist linking these genes to aging or neurodegeneration phenotypes in Human or another model organism. (XLSX)
Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...
zenodo.org
xlsx, zip
Updated Sep 28, 2022
Cite
Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2022). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,434 Salmonella enterica isolates [Dataset]. http://doi.org/10.5281/zenodo.7230091
Available download formats: zip, xlsx
Unique identifier
https://doi.org/10.5281/zenodo.7230091
Dataset updated
Sep 28, 2022
Dataset provided by
Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal
Department Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany
Authors
Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset comprises the genome assemblies and respective 8,558-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,434 Salmonella enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 125 different serotypes are represented in this dataset, with Typhimurium (including monophasic), Enteritidis and Infantis being the most represented ones and, together, corresponding to 56.2% of the dataset.

File “Se_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST and serotype.

The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

The file “profiles/Se_profiles_wgMLST.tsv” is a tab-separated file with the 8,558-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Se_profiles_cgMLST_95.tsv”, “profiles/Se_profiles_cgMLST_98.tsv” and “profiles/Se_profiles_cgMLST_100.tsv” contain the 3,261-loci, 3,179-loci and 874-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

Dataset selection and curation

With the objective of creating a diverse dataset of S. enterica genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 1,779 samples associated with four BioProjects (PRJEB16326, PRJEB20997, PRJEB30335 and PRJEB39988). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the remaining assemblies, we noticed that a considerable proportion was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,434 isolates passed this curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019). wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 8,558-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 8,558-loci wgMLST profiles of the 1,434 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keep schema loci called in at least 95%, 98% and 100% of the samples, resulting in 3,261-loci, 3,179-loci and 874-loci allelic matrices, respectively).
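
The effect of a site-inclusion threshold can be illustrated with a minimal Python sketch. This is not ReporTree's implementation: the function name `filter_loci_by_call_rate`, the toy locus and sample names, and the use of "0" as the missing-call code are all assumptions made for illustration only.

```python
from typing import Dict, List

MISSING = "0"  # assumed code for "locus not called"; real matrices may differ


def filter_loci_by_call_rate(
    profiles: Dict[str, List[str]], loci: List[str], threshold: float
) -> List[str]:
    """Return the loci that are called in at least `threshold` of the samples."""
    n_samples = len(profiles)
    kept = []
    for idx, locus in enumerate(loci):
        # Count samples whose allele at this locus is not the missing code.
        called = sum(1 for alleles in profiles.values() if alleles[idx] != MISSING)
        if n_samples and called / n_samples >= threshold:
            kept.append(locus)
    return kept


# Toy allelic matrix: four samples, three loci (hypothetical names and values).
loci = ["locus_A", "locus_B", "locus_C"]
profiles = {
    "sample1": ["1", "2", "0"],
    "sample2": ["1", "0", "0"],
    "sample3": ["3", "2", "5"],
    "sample4": ["1", "2", "0"],
}

core_100 = filter_loci_by_call_rate(profiles, loci, 1.0)
core_75 = filter_loci_by_call_rate(profiles, loci, 0.75)
print(core_100)
print(core_75)
```

A stricter threshold can only keep the same loci or fewer, which matches the shrinking locus counts reported for the 0.95, 0.98 and 1.0 settings.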
Sequencing Data Set of Sediment Layers
catalog.data.gov
Updated May 17, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Sequencing Data Set of Sediment Layers [Dataset]. https://catalog.data.gov/dataset/sequencing-data-set-of-sediment-layers
Dataset updated
May 17, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
A table (DP_SRA.xlsx) lists one sample per row, with columns giving the BioSample accession number (NCBI), collection (date), library strategy, target (source), and sequencing (technology) for each individual sample. The zip file (Genome_Set01.zip) contains nine (9) FASTA files (DP_bin_02.fasta, DP_bin_04.fasta, DP_bin_09.fasta, DP_bin_10.fasta, DP_bin_14.fasta, DP_bin_15.fasta, DP_bin_16a.fasta, DP_bin_20.fasta, DP_bin_23.fasta) with the contig sequences (i.e. binning) for each metagenome-assembled genome (MAG). These data are available from the NCBI Sequence Read Archive (SRA) under the BioProject (https://www.ncbi.nlm.nih.gov/bioproject) with accession number PRJNA646252 and the following BioSample numbers: SAMN15536103 to SAMN15536108. This dataset is associated with the following publication: Gomez-Alvarez, V., H. Liu, J. Pressman, and D. Wahman. Metagenomic Profile of Microbial Communities in a Drinking Water Storage Tank Sediment after Sequential Exposure to Monochloramine, Free Chlorine, and Monochloramine. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 1(5): 1283-1294, (2021).
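
When a record gives BioSample numbers as a range, as in "SAMN15536103 to SAMN15536108", it can help to expand the range into individual accessions before looking them up. The Python sketch below does this under the assumption that the range is inclusive and contiguous, which should be verified against NCBI; the helper name `expand_accession_range` is hypothetical.

```python
import re
from typing import List

# An accession is treated here as letters followed by digits (e.g. SAMN15536103).
_ACCESSION = re.compile(r"^([A-Za-z]+)(\d+)$")


def expand_accession_range(first: str, last: str) -> List[str]:
    """Expand an inclusive accession range that shares one alphabetic prefix."""
    first_match = _ACCESSION.match(first)
    last_match = _ACCESSION.match(last)
    if first_match is None or last_match is None:
        raise ValueError("expected accessions made of letters followed by digits")
    prefix, start_digits = first_match.groups()
    last_prefix, end_digits = last_match.groups()
    if prefix != last_prefix:
        raise ValueError("both accessions must share the same prefix")
    start, end = int(start_digits), int(end_digits)
    if end < start:
        raise ValueError("range end precedes range start")
    width = len(start_digits)  # preserve any zero padding
    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


biosamples = expand_accession_range("SAMN15536103", "SAMN15536108")
for accession in biosamples:
    print(accession)
```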
Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...
zenodo.org
bin, zip
Updated Jul 24, 2023
Cite
Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2023). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 1,999 Escherichia coli isolates [Dataset]. http://doi.org/10.5281/zenodo.7230102
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7230102
Dataset updated
Jul 24, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset comprises the genome assemblies and respective 7,601-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 1,999 Escherichia coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of serotype). In total, 411 different serotypes are represented in this dataset, with O157:H7 being the most represented one, corresponding to 37.1% of the dataset.

File “Ec_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST and serotype.

The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

The file “profiles/Ec_profiles_wgMLST.tsv” is a tab-separated file with the 7,601-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Ec_profiles_cgMLST_95.tsv”, “profiles/Ec_profiles_cgMLST_98.tsv” and “profiles/Ec_profiles_cgMLST_100.tsv” contain the 2,826-loci, 2,704-loci and 465-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

Dataset selection and curation

With the objective of creating a diverse dataset of E. coli genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the Enterobase database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 2,688 samples associated with three BioProjects (PRJNA230969, PRJEB27020 and PRJNA248042). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the remaining assemblies, we noticed that a considerable proportion was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,999 isolates passed this curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2. wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 7,601-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 7,601-loci wgMLST profiles of the 1,999 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keep schema loci called in at least 95%, 98% and 100% of the samples, resulting in 2,826-loci, 2,704-loci and 465-loci allelic matrices, respectively).
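
The curation rule described above (keep assemblies that pass QC, and recover those that failed only on “NumContamSNVs” when more than 98% of reads match the correct species) can be written as a small Python predicate. This is only a sketch of the stated rule, not the authors' code: the `AssemblyQC` record, its field names and the sample identifiers are hypothetical, and the manual inspection step mentioned in the text is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssemblyQC:
    """Hypothetical per-assembly QC summary used for illustration."""
    sample: str
    pct_reads_correct_species: float
    failed_checks: List[str] = field(default_factory=list)


def include_in_final_dataset(qc: AssemblyQC, min_pct: float = 98.0) -> bool:
    """Apply the inclusion rule from the text to one assembly."""
    if not qc.failed_checks:
        return True  # passed QC outright
    # Recover only assemblies whose sole failure is the contamination SNV check.
    only_contam_failure = set(qc.failed_checks) == {"NumContamSNVs"}
    return only_contam_failure and qc.pct_reads_correct_species > min_pct


records = [
    AssemblyQC("iso_1", 99.7),
    AssemblyQC("iso_2", 99.1, ["NumContamSNVs"]),
    AssemblyQC("iso_3", 97.5, ["NumContamSNVs"]),
    AssemblyQC("iso_4", 99.9, ["NumContamSNVs", "AssemblyLength"]),
]
final_samples = [r.sample for r in records if include_in_final_dataset(r)]
print(final_samples)
```

Keeping the threshold as a parameter makes it easy to see how many assemblies a stricter or looser cut-off would recover.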

Acknowledgements

We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset
data.niaid.nih.gov
Updated Nov 20, 2023
Cite
Hsu, Jonathan; Stoop, Allart (2023). Repository for Single Cell RNA Sequencing Analysis of The EMT6 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10011621
Dataset updated
Nov 20, 2023
Authors
Hsu, Jonathan; Stoop, Allart
Description
Table of Contents

Main Description
File Descriptions
Linked Files
Installation and Instructions

1. Main Description

This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity." The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data. The following libraries are required for script execution:

Seurat, scRepertoire, ggplot2, stringr, dplyr, ggridges, ggrepel, ComplexHeatmap

File Descriptions

The code can be downloaded and opened in RStudio. The "marengo_code_for_paper_jan_2023.R" file contains all the code needed to reproduce the figures in the paper. The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from the DGE analysis, which were also used to create the heatmap with DGE and the volcano plots. The "genes_for_heatmap_fig5F.xlsx" file contains the genes included in the heatmap in figure 5F.

Linked Files

This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:

Gene Expression Omnibus (GEO) ID: GSE223311 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)

Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719

Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files in FASTQ format. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

Zenodo DOI: 10.5281/zenodo.7566113 (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)

Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This file is required to run the code. Submission type: Restricted Access. In order to gain access to the repository, you must contact the author.

Installation and Instructions

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

Ensure you have R version 4.1.2 or higher for compatibility.

Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

Download the "Rdata" or ".Rds" file from Zenodo (https://zenodo.org/record/7566113#.ZCcmvC2cbrJ) (Zenodo DOI: 10.5281/zenodo.7566113).

Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.

Set your working directory to where the following files are located:

marengo_code_for_paper_jan_2023.R
Install_Packages.R
Marengo_newID_March242023.rds
genes_for_heatmap_fig5F.xlsx
all_res_deg_for_heat_updated_march2023.txt

You can use the following code to set the working directory in R:

setwd(directory)

Open the file titled "Install_Packages.R" and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies in order to set up an environment where the code in "marengo_code_for_paper_jan_2023.R" can be executed.

Once the "Install_Packages.R" script has been successfully executed, restart RStudio or your IDE of choice.

Open the file "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice.

Execute the commands in "marengo_code_for_paper_jan_2023.R" in RStudio or your IDE of choice to generate the plots.
Bioinformatics Services Market Grows from USD 2.9 Billion to 10.7 Billion by...
media.market.us
Updated Oct 8, 2025
Cite
Market.us Media (2025). Bioinformatics Services Market Grows from USD 2.9 Billion to 10.7 Billion by 2033 [Dataset]. https://media.market.us/bioinformatics-services-market-news-2025/
Dataset updated
Oct 8, 2025
Dataset authored and provided by
Market.us Media
License
https://media.market.us/privacy-policy
Time period covered
2022 - 2032
Description
Overview

The Global Bioinformatics Services Market is projected to reach USD 10.7 billion by 2033, growing from USD 2.9 billion in 2023 at a CAGR of 13.9%. Growth is being driven by the rapid expansion of genomic and health data generation across research institutions, healthcare systems, and public-health agencies. The World Health Organization’s Global Genomic Surveillance Strategy has positioned bioinformatics as a core element in detecting and responding to health threats. This policy direction is reinforcing global demand for scalable analytical platforms, secure data sharing, and sustainable workflow solutions.
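
The quoted growth rate can be cross-checked from the two endpoint figures with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The Python sketch below assumes a 10-year span (2023 to 2033) and annual compounding; the function name `cagr` is illustrative.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the text: USD 2.9 billion (2023) to USD 10.7 billion (2033).
rate = cagr(2.9, 10.7, 10)

# Forward check: grow the 2023 figure at the quoted 13.9% for ten years.
projected_2033 = 2.9 * (1.0 + 0.139) ** 10

print(f"implied CAGR: {rate:.2%}")
print(f"projection at 13.9%: USD {projected_2033:.2f} billion")
```

Both directions agree with the stated figures to within rounding, so the 13.9% rate is consistent with the 2023 and 2033 endpoints.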

A fundamental growth catalyst is the declining cost of sequencing. According to the U.S. National Human Genome Research Institute, the cost per genome has decreased sharply since the late 2000s. As sequencing becomes more affordable, the number of samples increases, driving demand for downstream data storage, processing, and interpretation. Consequently, outsourcing bioinformatics tasks to specialized service providers has become more common and cost-effective.

Another major factor supporting market expansion is the rise in publicly available genomic data. The NIH Sequence Read Archive (SRA) surpassed 50 petabases of data by early 2024, requiring large-scale indexing, quality control, and reanalysis. This massive data load necessitates professional expertise and infrastructure, which are primarily offered by bioinformatics service companies.

The integration of genomics into healthcare systems is further strengthening market growth. The NHS Genomic Medicine Service in England is expanding clinical genomics applications in oncology and rare disease management. This transition creates sustained demand for validated bioinformatics pipelines, variant curation, and clinical reporting services. Healthcare institutions increasingly depend on external service providers for secure, clinical-grade analysis pipelines and data governance compliance, ensuring both accuracy and confidentiality in genomic interpretation.

Emerging Opportunities and Regional Investments

Public health initiatives and global investments are enhancing the bioinformatics services landscape. Programs like the U.S. CDC’s Advanced Molecular Detection and ECDC’s sequencing integration are driving large-scale genomic surveillance. These initiatives require ongoing analysis, pipeline standardization, and data-platform management, which are largely delivered through external service providers. As countries institutionalize sequencing, recurring demand for bioinformatics workflows and analytic services is expected to persist.

In low- and middle-income countries, international investment is expanding market opportunities. The World Bank’s genomic capacity-building programs in Africa are fostering sequencing and analytics infrastructure. These efforts include bioinformatics training and workflow design, ensuring long-term sustainability. Such projects significantly widen the global serviceable market for bioinformatics expertise. Similarly, large-scale national genomic initiatives like the NIH All of Us program generate billions of variants that require harmonization, annotation, and interpretation, sustaining demand for cloud-based data management and analytic platforms.

The growing focus on antimicrobial resistance (AMR) is also fueling bioinformatics adoption. Under WHO’s GLASS platform, countries are integrating whole-genome sequencing into AMR surveillance. This expansion is creating consistent demand for quality assurance, centralized analysis hubs, and workflow optimization. Furthermore, data governance reforms by the OECD and other regulatory bodies are facilitating secure secondary use of genomic data, promoting trust in data sharing and collaboration.

Strategic public funding further strengthens the market outlook. Horizon Europe’s Health Work Programme (2025) and NHGRI’s technology initiatives continue to fund large-scale, data-driven research, ensuring a steady flow of contracts for bioinformatics firms. Workforce development is also improving, with national systems such as NHS England expanding bioinformatics training. This capacity building not only supports in-house analytics but also increases outsourcing to handle peak workloads and specialized computational tasks.

In conclusion, the bioinformatics services market is benefiting from multiple converging factors: technological affordability, global health investments, regulatory clarity, and expanding data ecosystems. These structural developments are shaping a resilient, long-term demand environment for scalable, compliant, and high-quality bioinformatics services worldwide.

Figure: Bioinformatics Services Market Size Forecast (https://market.us/wp-content/uploads/2022/06/Bioinformatics-Services-Market-Size-Forecast-2.jpg)
Catalog of NCBI sequence read archive (SRA) data for salamanders at the...
portal.edirepository.org
csv
Updated Apr 9, 2024
Cite
Brett Addis; Madaline Cochrane; Winsor Lowe (2024). Catalog of NCBI sequence read archive (SRA) data for salamanders at the Hubbard Brook Experimental Forest 2012-2021 [Dataset]. http://doi.org/10.6073/pasta/6df7199d751ec81315395a042cbd8083
Available download formats: csv (312227 bytes), csv (220695 bytes), csv (282251 bytes)
Unique identifier
https://doi.org/10.6073/pasta/6df7199d751ec81315395a042cbd8083
Dataset updated
Apr 9, 2024
Dataset provided by
EDI
Authors
Brett Addis; Madaline Cochrane; Winsor Lowe
Time period covered
2012 - 2021
Variables measured
strain, ecotype, isolate, lat_lon, cultivar, organism, Accession, BioProject, env_medium, sample_URL, and 8 more
Description
This project was designed to describe fine-scale population genetic differentiation of the stream salamander Gyrinophilus porphyriticus among five study streams in the Hubbard Brook Experimental Forest. The data are paired with intensive capture-recapture data to assess direct fitness effects of individual genetic diversity, including effects of individual multilocus heterozygosity on stage-specific survival probabilities.

This dataset publishes a manifest of the genomic sequence reads submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). These samples are published at NCBI under the BioProject ID 1090913 (https://www.ncbi.nlm.nih.gov/bioproject/1090913). The tables here include sample metadata and the NCBI URLs to each sample. These data were gathered as part of the Hubbard Brook Ecosystem Study (HBES). The HBES is a collaborative effort at the Hubbard Brook Experimental Forest, which is operated and maintained by the USDA Forest Service, Northern Research Station.
Chromosome assembly and preliminary gene and repeat annotations for Myzomela...
datadryad.org
zip
Updated Jul 27, 2024
Cite
Elsie Shogren; Jason Sardell; Christina Muirhead; Emiliano Martí; Elizabeth Cooper; Robert Moyle; Daven Presgraves; Albert J. Uy (2024). Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome [Dataset]. http://doi.org/10.5061/dryad.612jm64c9
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.612jm64c9
Dataset updated
Jul 27, 2024
Dataset provided by
Dryad
Authors
Elsie Shogren; Jason Sardell; Christina Muirhead; Emiliano Martí; Elizabeth Cooper; Robert Moyle; Daven Presgraves; Albert J. Uy
Time period covered
Jul 15, 2024
Description
Chromosome assembly and preliminary gene and repeat annotations for Myzomela tristrami reference genome

I. Files

(GENOME)
Mt_v1.0_MAIN.fa.gz: Primary genome, (largely) scaffolded to chromosome-level, plus other primary assembled contigs
Mt_v1.0_MAIN.gff.gz: Simple gene annotations for primary genome, annotated using GeMoMa v1.8 and a zebra finch (bTaeGut1.4.pri) annotation reference
Mt_v1.0_extra.fa.gz: Additional contigs, not for use in most analyses but some may be of interest. This set is a combination of hand-identified haplotigs of the main genome, and assembler-identified "alternate" (haplotig) contigs

(ORIGINAL_ASSEMBLY_CONTIGS)
Mt_hifi.asm.p.fa.gz: "primary" assembly contigs, output from hifiasm (v0.13-r308)
Mt_hifi.asm.a.fa.gz: "alternate" assembly contigs, output from hifiasm (v0.13-r308)

(REPEAT_MASKING)
TElib_Myzo_preliminary.fa.gz: Preliminary Myzomela-tuned TE/repeat library, generated using RepeatModeler (v.2)
Mt_v1.0_MAIN_RM_sites_to_filter.txt: List of sites masked by RepeatM...
Repository for the single cell RNA sequencing data analysis for the human...
explore.openaire.eu
Updated Aug 26, 2023
Cite
Jonathan; Andrew; Pierre; Allart; Adrian (2023). Repository for the single cell RNA sequencing data analysis for the human manuscript. [Dataset]. http://doi.org/10.5281/zenodo.8286134
Unique identifier
https://doi.org/10.5281/zenodo.8286134
Dataset updated
Aug 26, 2023
Authors
Jonathan; Andrew; Pierre; Allart; Adrian
Description
This is the GitHub repository for the single cell RNA sequencing data analysis for the human manuscript. The following essential libraries are required for script execution:

Seurat, scRepertoire, ggplot2, dplyr, ggridges, ggrepel, ComplexHeatmap

Linked Files

This repository contains code for the analysis of a single cell RNA-seq dataset. The dataset contains raw FASTQ files as well as the aligned files that were deposited in GEO. Provided below are descriptions of the linked datasets:

1. Gene Expression Omnibus (GEO) ID: GSE229626

Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR). Description: This submission contains the matrix.mtx, barcodes.tsv, and genes.tsv files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).

2. Sequence read archive (SRA) repository

Title: Gene expression profile at single cell level of human T cells stimulated via antibodies against the T Cell Receptor (TCR). Description: This submission contains the raw sequencing (.fastq.gz) files, which are gzip-compressed text files in FASTQ format. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html). Please note that since the GSE submission is private, the raw data deposited at SRA may not be accessible until the embargo on GSE229626 has been lifted.

Installation and Instructions

The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:

Ensure you have R version 4.1.2 or higher for compatibility.

Although it is not essential, you can use RStudio (version 2022.12.0+353) for accessing and executing the code.

The following code can be used to set the working directory in R:

setwd(directory)

Steps:

1. Download the "Human_code_April2023.R" and "Install_Packages.R" R scripts, and the processed data from GSE229626.
2. Open RStudio (https://www.rstudio.com/tags/rstudio-ide/) or a similar integrated development environment (IDE) for R.
3. Set your working directory to where the following files are located: Human_code_April2023.R and Install_Packages.R.
4. Open the file titled Install_Packages.R and execute it in your R IDE. This script will attempt to install all the necessary packages and their dependencies.
5. Open the Human_code_April2023.R script and execute commands as necessary.
Data relating to RNA sequence accessions at NCBI from Ross Sea...
bco-dmo.org
csv
Updated May 17, 2018
Cite
Rebecca J. Gast (2018). Data relating to RNA sequence accessions at NCBI from Ross Sea Dinoflagellates, Phaeocystis antarctica, Pyramimons tychotreta, and Micromonas polaris (CCMP 2099) (Kleptoplasty project) [Dataset]. https://www.bco-dmo.org/dataset/728427
Explore at:
Available download formats: csv (16.59 KB)
Dataset updated
May 17, 2018
Dataset provided by
Biological and Chemical Data Management Office
Authors
Rebecca J. Gast
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 1, 1997 - Apr 7, 1998
Variables measured
lat, lon, temp, depth, isolate, Organism, BioSample, SRA_Study, replicate, Assay_Type, and 13 more
Measurement technique
Automated DNA Sequencer
Description
This dataset contains data related to RNA sequence genetic accessions at the National Center for Biotechnology Information (NCBI) including information about the host organism, collection location, and collection date.

The accessions are the unprocessed Illumina MiSeq reads for the Ross Sea Dinoflagellate RNA-Seq experiments, Phaeocystis antarctica RNA-Seq experiments, and Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy experiments.

Pyramimonas tychotreta & Micromonas polaris (CCMP 2099) mixotrophy RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the SRA accession number SRP090401 (BioProject PRJNA342459).

Ross Sea Dinoflagellate RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP132912 (BioProject PRJNA428208).

Phaeocystis antarctica RNA sequences are available through the NCBI Sequence Read Archive (SRA) under the accession number SRP133243 (BioProject PRJNA434497).
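The accession numbers quoted in these descriptions follow recognizable prefixes: SRP for an SRA study, SRX for an experiment, SRR for a run, PRJNA or PRJEB for a BioProject, and GSE for a GEO series. The sketch below sorts an accession into one of those kinds and, where this listing itself shows a link pattern, builds the matching NCBI URL. The helper names `classify_accession` and `ncbi_url` are hypothetical, and the URL templates are simply copied from links that appear in this listing (`/sra/<run or experiment>` and `/bioproject/<id>`); for other kinds the function returns `None` rather than guessing.

```python
import re

# Prefix patterns. E and D variants cover accessions issued by ENA and DDBJ.
PATTERNS = [
    ("sra_study", re.compile(r"^[SED]RP\d+$")),
    ("sra_experiment", re.compile(r"^[SED]RX\d+$")),
    ("sra_run", re.compile(r"^[SED]RR\d+$")),
    ("bioproject", re.compile(r"^PRJ(NA|EB|DB)\d+$")),
    ("geo_series", re.compile(r"^GSE\d+$")),
]

# URL shapes taken from links that appear in this listing.
URL_TEMPLATES = {
    "sra_run": "https://www.ncbi.nlm.nih.gov/sra/{acc}",
    "sra_experiment": "https://www.ncbi.nlm.nih.gov/sra/{acc}",
    "bioproject": "https://www.ncbi.nlm.nih.gov/bioproject/{acc}",
}


def classify_accession(accession):
    """Return the kind of accession, or 'unknown' if no pattern matches."""
    acc = accession.strip().upper()
    for kind, pattern in PATTERNS:
        if pattern.match(acc):
            return kind
    return "unknown"


def ncbi_url(accession):
    """Build an NCBI link for kinds with a known template, else None."""
    acc = accession.strip().upper()
    template = URL_TEMPLATES.get(classify_accession(acc))
    return template.format(acc=acc) if template else None


for acc in ["SRP090401", "PRJNA342459", "SRR29666724", "GSE229626"]:
    print(acc, classify_accession(acc), ncbi_url(acc))
```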
Whole genome sequencing of three North American large-bodied birds
gimi9.com
Updated Oct 26, 2023
Cite
(2023). Whole genome sequencing of three North American large-bodied birds [Dataset]. https://gimi9.com/dataset/data-gov_whole-genome-sequencing-of-three-north-american-large-bodied-birds/
Explore at:
Dataset updated
Oct 26, 2023
Area covered
United States
Description
The data release details the samples, methods, and raw data used to generate high-quality genome assemblies for greater sage-grouse (Centrocercus urophasianus), white-tailed ptarmigan (Lagopus leucura), and trumpeter swan (Cygnus buccinator). The raw data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), the authoritative repository for public biological sequence data, and are not included in this data release. Instead, the accessions that link to those data via the NCBI portal (www.ncbi.nlm.nih.gov) are provided herein. The release consists of a single file, sample.metadata.txt, which maps NCBI accessions to the samples sequenced and the different types of sequencing performed to generate the assemblies and annotate their gene features.
Pseudomonas sp. HOU2 predicted gene sequences
figshare.com
txt
Updated Jul 22, 2024
Cite
Van Hong Thi Dao; Son Truong Dinh (2024). Pseudomonas sp. HOU2 predicted gene sequences [Dataset]. http://doi.org/10.6084/m9.figshare.26325310.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.26325310.v1
Dataset updated
Jul 22, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Van Hong Thi Dao; Son Truong Dinh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The whole genome of Pseudomonas sp. HOU2 was analyzed with RAST (Rapid Annotation using Subsystem Technology) (https://rast.nmpdr.org/) on 18 July 2024, with the following options selected, to obtain the predicted HOU2 gene sequences:
Genetic code: 11
Annotation scheme: RASTtk
Preserve gene calls: no
Automatically fix errors: yes
Fix frameshifts: yes
Backfill gaps: yes
The NCBI Sequence Read Archive run for Pseudomonas sp. HOU2 is SRR29666724 (https://www.ncbi.nlm.nih.gov/sra/SRR29666724).
The NCBI complete genome of Pseudomonas sp. HOU2 is CP160398.1 (https://www.ncbi.nlm.nih.gov/nuccore/CP160398).
Data from: Metagenomic and near full-length 16S rRNA sequence data in...
agdatacommons.nal.usda.gov
bin
Updated Feb 9, 2024
Cite
Phillip R. Myer; MinSeok Kim; Harvey C. Freetly; Timothy P.L. Smith (2024). Data from: Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_Metagenomic_and_near_full-length_16S_rRNA_sequence_data_in_support_of_the_phylogenetic_analysis_of_the_rumen_bacterial_community_in_steers/24852534
Explore at:
Available download formats: bin
Dataset updated
Feb 9, 2024
Dataset provided by
Data in Brief
Authors
Phillip R. Myer; MinSeok Kim; Harvey C. Freetly; Timothy P.L. Smith
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
Amplicon sequencing utilizing next-generation platforms has significantly transformed how research is conducted, specifically in microbial ecology. However, primer and sequencing platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data are provided from 16S rRNA rumen community analyses. Specifically, data from the relative phylum-level abundances for the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundance of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided. Data are within this article, and raw ruminal MiSeq sequence data are available from the NCBI Sequence Read Archive (SRA Accession SRP047292). Additional descriptive information is associated with NCBI BioProject PRJNA261425: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA261425/

Resources in this dataset:
Resource Title: NCBI Sequence Read Archive (SRA Accession SRP047292).
File Name: Web Page, url: https://www.ncbi.nlm.nih.gov/sra/SRX704260
1 ILLUMINA (Illumina MiSeq) run: 978,195 spots, 532.9M bases, 311.6Mb downloads.
Replication Data for: Changes in DNA Methylation During Anoxia and...
dataverse.azure.uit.no
search.dataone.org
txt
Updated Aug 5, 2025
Cite
Magdalena Winklhofer; Øivind Andersen; Sjannie Lefevre (2025). Replication Data for: Changes in DNA Methylation During Anoxia and Reoxygenation in Crucian Carp Brain [Dataset]. http://doi.org/10.18710/GSHJEB
Explore at:
Available download formats: txt(29643726), txt(29646762), txt(29645800), txt(29639614), txt(29648691), txt(29642203), txt(29644166), txt(242971), txt(2026), txt(14315), txt(17150), txt(91177), txt(5837), txt(29645960), txt(29648493), txt(29651345), txt(282436), txt(227822), txt(115430), txt(3415), txt(29648375), txt(300150), txt(29647183), txt(13029)
Unique identifier
https://doi.org/10.18710/GSHJEB
Dataset updated
Aug 5, 2025
Dataset provided by
DataverseNO
Authors
Magdalena Winklhofer; Øivind Andersen; Sjannie Lefevre
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset funded by
Research Council of Norway
Sigma2
Description
This analysis comprised the identification of DNA methylation sites in the context of CpG islands and differentially methylated regions (DMRs) with MethylScore. Further, the mRNA sequencing data were analyzed to identify differentially expressed genes. Differentially expressed genes and identified DMRs were correlated. Finally, DMRs and their expression changes were characterized in their genomic context. All raw sequencing data used as input for the analyses that produced the data in this repository are deposited in the NCBI Sequence Read Archive (SRA) under BioProject ID PRJNA1163668 (http://www.ncbi.nlm.nih.gov/bioproject/1163668). The genome assembly and annotation data used here were obtained from DataverseNO (https://doi.org/10.18710/GXMSUH). This genome assembly is based on the raw sequencing data deposited under BioProject ID PRJNA1119394 (http://www.ncbi.nlm.nih.gov/bioproject/1119394). Together, these data were used to identify CpG sites genome-wide and further identify differentially methylated regions. The corresponding mRNA was used to identify transcriptional changes and enable a comparison of differentially methylated genes with differentially expressed genes. Scripts are available in the GitHub repository WholeGenomeBisulphiteSequencing (https://github.com/MagdalenaWinklhofer/WholeGenomeBisulphiteSequencing.git).
Aging correlated genes.
plos.figshare.com
xlsx
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill (2023). Aging correlated genes. [Dataset]. http://doi.org/10.1371/journal.pone.0255085.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0255085.s001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Joe L. Webb; Simon M. Moe; Andrew K. Bolstad; Elizabeth M. McNeill
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This table depicts the aging correlated genes for humans and flies sorted according to their correlation coefficient. (XLSX)
List of whole genome resequenced datasets available on Sequence Read...
datasetcatalog.nlm.nih.gov
Updated Jan 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Forcina, Giovanni; Sadanandan, Keren R.; Wu, Meng Yue; Low, Gabriel Weijie; Baldwin, Maude W.; Wu, Shaoyuan; Rheindt, Frank E.; van Grouw, Hein; Edwards, Scott V.; Gwee, Chyi Yin (2023). List of whole genome resequenced datasets available on Sequence Read Archive, European Nucleotide Archive or chickenSD for Gallus gallus and used in this study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001056782
Explore at:
Dataset updated
Jan 19, 2023
Authors
Forcina, Giovanni; Sadanandan, Keren R.; Wu, Meng Yue; Low, Gabriel Weijie; Baldwin, Maude W.; Wu, Shaoyuan; Rheindt, Frank E.; van Grouw, Hein; Edwards, Scott V.; Gwee, Chyi Yin
Description
Local chickens that do not conform to a breed are labelled as “village”. (XLSX)
Genome assemblies and respective wg/cgMLST profiles of a diverse dataset...
zenodo.org
bin, zip
Updated Jul 24, 2023
+ more versions
Cite
Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges (2023). Genome assemblies and respective wg/cgMLST profiles of a diverse dataset comprising 3,076 Campylobacter jejuni isolates [Dataset]. http://doi.org/10.5281/zenodo.7230105
Explore at:
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.7230105
Dataset updated
Jul 24, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Verónica Mixão; Holger Brendebach; Miguel Pinto; Daniel Sobral; João Paulo Gomes; Carlus Deneke; Simon Tausch; Vítor Borges
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset comprises the genome assemblies and respective 2,794-loci whole-genome (wg) Multiple Locus Sequence Type (MLST) profiles [INNUENDO schema (Llarena et al. 2018) available in chewie-NS (Mamede et al. 2022)] of a final set of 3,076 Campylobacter jejuni samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) at the beginning of the analysis (November 2021). This set of samples was carefully selected to cover a wide genetic diversity (assessed in terms of Sequence Type [ST]). In total, 476 different STs are represented in this dataset, with ST21, ST50, ST48, ST45 and ST257 being the most represented ones and, together, corresponding to 29.1% of the dataset.

File “Cj_metadata.xlsx” contains metadata information for each isolate, including ENA/SRA accession number, BioProject and in-silico MLST ST.

The directory “assemblies/” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

The file “profiles/Cj_profiles_wgMLST.tsv” is a tab-separated file with the 2,794-loci wgMLST profiles of each isolate presented in the metadata file. The files “profiles/Cj_profiles_cgMLST_95.tsv”, “profiles/Cj_profiles_cgMLST_98.tsv” and “profiles/Cj_profiles_cgMLST_100.tsv” contain the 1,012-loci, 987-loci and 29-loci cgMLST profiles of each isolate presented in the metadata file, respectively. These profiles were determined as explained below.

Dataset selection and curation

With the objective of creating a diverse dataset of C. jejuni genome assemblies, we collected information about the genetic diversity (serotype) of the isolates available in the PubMLST database at the beginning of this analysis (November 2021) and in other previous works. Based on this information, we selected an initial dataset comprising 3,539 samples. The majority of them are associated with the INNUENDO project (Llarena et al. 2018). The remaining ones are associated with five BioProjects (PRJEB31119, PRJEB38253, PRJEB40238, PRJEB4165 and PRJNA350537). Their WGS data were downloaded from ENA/SRA with fastq-dl v1.0.6. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination, were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 3,076 isolates passed this curation step and were included in the final dataset. wgMLST profiles of each of these isolates were determined with chewBBACA v2.8.5 (Silva et al. 2018), using the 2,794-loci INNUENDO schema available in chewie-NS (Llarena et al. 2018; Mamede et al. 2022) and downloaded on May 31st, 2022. Three cgMLST schemas were obtained with ReporTree v1.0.0 (Mixão et al. 2022) using the 2,794-loci wgMLST profiles of the 3,076 isolates as input and setting distinct “--site-inclusion” thresholds: 0.95, 0.98 and 1.0 (i.e., keeping schema loci called in at least 95%, 98% and 100% of the samples, resulting in 1,012-loci, 987-loci and 29-loci allelic matrices, respectively).
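The site-inclusion rule described above (keep a locus only if it is called in at least a given fraction of samples) can be illustrated with a short Python sketch. This is not ReporTree's implementation: the function name `loci_passing_site_inclusion` is hypothetical, and encoding a missing call as the string "0" is an assumption made for this example.

```python
def loci_passing_site_inclusion(loci, profiles, threshold, missing="0"):
    """Return the loci called in at least `threshold` of the samples.

    loci      -- list of locus names, in column order
    profiles  -- dict mapping sample name -> list of allele calls,
                 one per locus, in the same order as `loci`
    threshold -- fraction between 0 and 1 (e.g. 0.95, 0.98, 1.0)
    missing   -- value marking an uncalled locus (assumed to be "0" here)
    """
    n_samples = len(profiles)
    if n_samples == 0:
        return []
    kept = []
    for i, locus in enumerate(loci):
        # Count the samples in which this locus has a real allele call.
        called = sum(1 for alleles in profiles.values() if alleles[i] != missing)
        if called / n_samples >= threshold:
            kept.append(locus)
    return kept


# Toy input: 4 samples, 3 loci (all values invented for illustration).
toy_loci = ["locus_A", "locus_B", "locus_C"]
toy_profiles = {
    "sample1": ["1", "2", "0"],
    "sample2": ["1", "0", "0"],
    "sample3": ["3", "4", "5"],
    "sample4": ["1", "2", "0"],
}
print(loci_passing_site_inclusion(toy_loci, toy_profiles, 1.0))
print(loci_passing_site_inclusion(toy_loci, toy_profiles, 0.75))
```

Raising the threshold can only shrink the set of retained loci, which matches the ordering of the three matrices reported above (1,012, 987 and 29 loci for 0.95, 0.98 and 1.0).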

Acknowledgements

We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
Improving the efficiency of single cell genome sequencing based on...
data.niaid.nih.gov
zip
Updated Jan 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jing Tu; Zengyan Yang; Na Lu; Zuhong Lu (2022). Improving the efficiency of single cell genome sequencing based on overlapping pooling strategy [Dataset]. http://doi.org/10.5061/dryad.v6wwpzgwr
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.v6wwpzgwr
Dataset updated
Jan 24, 2022
Dataset provided by
Southeast University
Authors
Jing Tu; Zengyan Yang; Na Lu; Zuhong Lu
License
https://spdx.org/licenses/CC0-1.0.html
Description
Single cell genome sequencing has become a useful tool in medicine and biology studies. However, an independent library is required for each cell in single cell genome sequencing, so the cost grows in step with the number of cells. In this study, we report efficient single-cell copy number variation (CNV) analysis based on an overlapping pooling strategy together with a branch and bound (B&B) algorithm. Single cells are pooled with overlap before sequencing and are later sorted into specific types by estimating their CNV patterns with the B&B algorithm. Instead of constructing a library for each cell, a library is required only for each pool. As long as the number of pools is smaller than the number of cells, fewer libraries are needed and the cost is lower. Through computer simulations, we pooled 80 cells with overlap into 40 and 27 pools and classified them into cell types based on CNV pattern. The results showed that, on average, 84% of cells in 40 pools and 76.5% of cells in 27 pools were correctly classified, while only half or one-third of the sequencing libraries were required. Combined with traditional approaches, our method is expected to significantly improve the efficiency of single cell genome sequencing.

Methods

The dataset contains the statistics of the sequencing data and the copy number profiles of the single cells.

The single-cell sequencing data of 80 single cells from 7 tumor patients with Triple-Negative Breast Cancer (TNBC) were downloaded in FASTQ format from the National Center for Biotechnology Information (NCBI) [15] under Sequence Read Archive (SRA) accession SRP064210.

Basic statistics were computed on BAM files mapped to the human genome hg19. We followed the protocol put forward by Baslan et al. to obtain the copy number profiles of the single cells.
