100+ datasets found

GenBank
catalog.data.gov
healthdata.gov
Updated Jul 26, 2023
Cite
National Institutes of Health (NIH) (2023). GenBank [Dataset]. https://catalog.data.gov/dataset/genbank
Dataset provided by
National Institutes of Health (NIH)
Description
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is designed to provide and encourage access within the scientific community to the most up to date and comprehensive DNA sequence information.
High Throughput Genomic Sequences Division
rrid.site
neuinfo.org
Updated Apr 24, 2025
Cite
(2025). High Throughput Genomic Sequences Division [Dataset]. http://identifiers.org/RRID:SCR_002150
Unique identifier
https://identifiers.org/RRID:SCR_002150
Description
Database of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences. It was created to accommodate a growing need to make unfinished genomic sequence data rapidly available to the scientific community in a coordinated effort among the International Nucleotide Sequence databases, DDBJ, EMBL, and GenBank. Sequences are prepared for submission by using NCBI's software tools Sequin or tbl2asn. Each center has an FTP directory into which new or updated sequence files are placed. Sequence data in this division are available for BLAST homology searches against either the htgs database or the month database, which includes all new submissions for the prior month. Unfinished HTG sequences containing contigs greater than 2 kb are assigned an accession number and deposited in the HTG division. A typical HTG record might consist of all the first-pass sequence data generated from a single cosmid, BAC, YAC, or P1 clone, which together make up more than 2 kb and contain one or more gaps. A single accession number is assigned to this collection of sequences, and each record includes a clear indication of the status (phase 1 or 2) plus a prominent warning that the sequence data are unfinished and may contain errors. The accession number does not change as sequence records are updated; only the most recent version of a HTG record remains in GenBank.
Whole genome sequencing of three North American large-bodied birds
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Whole genome sequencing of three North American large-bodied birds [Dataset]. https://catalog.data.gov/dataset/whole-genome-sequencing-of-three-north-american-large-bodied-birds
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
The data release details the samples, methods, and raw data used to generate high-quality genome assemblies for greater sage-grouse (Centrocercus urophasianus), white-tailed ptarmigan (Lagopus leucura), and trumpeter swan (Cygnus buccinator). The raw data have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), the authoritative repository for public biological sequence data, and are not included in this data release. Instead, the accessions that link to those data via the NCBI portal (www.ncbi.nlm.nih.gov) are provided herein. The release consists of a single file, sample.metadata.txt, which maps NCBI accessions to the samples sequenced and the different types of sequencing performed to generate the assemblies and annotate their gene features.
ARS Microbial Genomic Sequence Database Server
agdatacommons.nal.usda.gov
datadiscoverystudio.org
Updated Feb 9, 2024
Cite
USDA Agricultural Research Service (2024). ARS Microbial Genomic Sequence Database Server [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/ARS_Microbial_Genomic_Sequence_Database_Server/24661200
Available download formats: bin
Dataset provided by
United States Department of Agriculture (http://usda.gov/)
Agricultural Research Service (https://www.ars.usda.gov/)
Authors
USDA Agricultural Research Service
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
This database server is supported in fulfilment of the research mission of the Mycotoxin Prevention and Applied Microbiology Research Unit at the National Center for Agricultural Utilization Research in Peoria, Illinois. The linked website provides access to gene sequence databases for various groups of microorganisms, such as Streptomyces species or Aspergillus species and their relatives, that are the product of ARS research programs. The sequence databases are organized in the BIGSdb (Bacterial Isolate Genomic Sequence Database) software package developed by Keith Jolley and Martin Maiden at Oxford University. Resources in this dataset: Resource Title: ARS Microbial Genomic Sequence Database Server. File Name: Web Page, url: http://199.133.98.43
Genomics England - Long Read Sequencing
healthdatagateway.org
Updated Mar 30, 2023
Cite
The 100,000 Genomes Project Protocol v3, Genomics England. doi:10.6084/m9.figshare.4530893.v3. 2017. Publications that use the Genomics England Database should include an author as: Genomics England Research Consortium. Please see publication policy. (2023). Genomics England - Long Read Sequencing [Dataset]. https://healthdatagateway.org/dataset/374
Available download formats: unknown
Dataset provided by
Genomics England
Authors
The 100,000 Genomes Project Protocol v3, Genomics England. doi:10.6084/m9.figshare.4530893.v3. 2017. Publications that use the Genomics England Database should include an author as: Genomics England Research Consortium. Please see publication policy.
License
https://www.genomicsengland.co.uk/about-gecip/joining-research-community/
Description
Contains tables related to long-reads sequencing data for 100,000 Genomes Project participants.

lrs_laboratory_sample: Data describing the characteristics and processing methods (DNA to library preparation) of samples from participants in the 100,000 Genomes Project for which long-reads sequencing has been carried out.

lrs_sequencing_data: This table includes data describing long-read sequencing of a subset of 100,000 Genomes Project participants and associated output, including paths to raw and BAM files.

cancer_ont_cohorts: Table listing participant ids, sample data, file paths and sequencing statistics for Oxford Nanopore cancer cohorts available in the Research Environment, along with corresponding matched germline and Illumina short reads files where available

rare_disease_pacbio_pilot: This is a dataset of 91 rare disease samples from the 100,000 Genomes Project re-sequenced with Pacific Biosciences (PacBio) as an example dataset to demonstrate the utility of their HiFi technology.
Data from: BBGD454: an Online Database for Blueberry Genomic Data...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). BBGD454: an Online Database for Blueberry Genomic Data Transcriptome analysis of Blueberry using 454 EST sequencing [Dataset]. https://catalog.data.gov/dataset/bbgd454-an-online-database-for-blueberry-genomic-data-transcriptome-analysis-of-blueberry--5783e
Dataset provided by
Agricultural Research Service
Description
NOTE: This dataset is no longer publicly available. This database houses over 500,000 sequences that were generated and assembled into approximately 15,000 contigs, annotated and functionally mapped to Gene Ontology (GO) terms. Blueberry (Vaccinium corymbosum) is a major berry crop in the United States. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities during an organism’s developmental stage(s) or its response to biotic or abiotic stresses. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. We have applied a high-throughput pyrosequencing technology (454 EST sequencing) for transcriptome profiling of blueberry during different stages of fruit development to gain an understanding of the genes that are up- or down-regulated during this process. We have also sequenced flower buds at four different stages of cold acclimation to gain a better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation, since extreme low temperatures are known to reduce crop yield and cause major losses to US farmers. We have also sequenced a leaf sample to compare its transcriptome profile with that of bud and fruit samples. Over 500,000 sequences were generated and assembled into approximately 15,000 contigs and were annotated and functionally mapped to Gene Ontology (GO) terms. A database was developed to house these sequences and their annotations. A web-based interface was also developed to allow collaborators to search or browse the data and aid in the analysis and interpretation of the data. The availability of these sequences will allow for future advances, such as the development of a blueberry microarray to study gene expression, and will aid in the blueberry genome sequencing effort that is underway.
This work was supported by grant 2008-51180-04861 from the USDA - Cooperative State Research, Education, and Extension Service (CSREES) Specialty Crop Research Initiative program.
Sequencing of Idd regions in the NOD mouse genome
neuinfo.org
scicrunch.org
Updated Jan 29, 2022
Cite
(2022). Sequencing of Idd regions in the NOD mouse genome [Dataset]. http://identifiers.org/RRID:SCR_001483
Unique identifier
https://identifiers.org/RRID:SCR_001483
Description
Genetic variations associated with type 1 diabetes identified by sequencing regions of the non-obese diabetic (NOD) mouse genome and comparing them with the same areas of a diabetes-resistant C57BL/6J reference mouse allowing identification of single nucleotide polymorphisms (SNPs) or other genomic variations putatively associated with diabetes in mice. Finished clones from the targeted insulin-dependent diabetes (Idd) candidate regions are displayed in the NOD clone sequence section of the website, where they can be downloaded either as individual clone sequences or larger contigs that make up the accession golden path (AGP). All sequences are publicly available via the International Nucleotide Sequence Database Collaboration. Two NOD mouse BAC libraries were constructed and the BAC ends sequenced. Clones from the DIL NOD BAC library constructed by RIKEN Genomic Sciences Centre (Japan) in conjunction with the Diabetes and Inflammation Laboratory (DIL) (University of Cambridge) from the NOD/MrkTac mouse strain are designated DIL. Clones from the CHORI-29 NOD BAC library constructed by Pieter de Jong (Children's Hospital, Oakland, California, USA) from the NOD/ShiLtJ mouse strain are designated CHORI-29. All NOD mouse BAC end-sequences have been submitted to the International Nucleotide Sequence Database Consortium (INSDC), deposited in the NCBI trace archive. They have generated a clone map from these two libraries by mapping the BAC end-sequences to the latest assembly of the C57BL/6J mouse reference genome sequence. These BAC end-sequence alignments can then be visualized in the Ensembl mouse genome browser where the alignments of both NOD BAC libraries can be accessed through the Distributed Annotation System (DAS). The Mouse Genomes Project has used the Illumina platform to sequence the entire NOD/ShiLtJ genome and this should help to position unaligned BAC end-sequences to novel non-reference regions of the NOD genome. 
Further information about the BAC end-sequences, such as their alignment, variation data and Ensembl gene coverage, can be obtained from the NOD mouse ftp site.
Sequence Set Browser
healthdata.gov
datadiscovery.nlm.nih.gov
Updated Sep 1, 2021
Cite
datadiscovery.nlm.nih.gov (2021). Sequence Set Browser [Dataset]. https://healthdata.gov/dataset/Sequence-Set-Browser/285u-dvtn
Available download formats: application/rdfxml, json, csv, application/rssxml, tsv, xml
Dataset provided by
datadiscovery.nlm.nih.gov
Description
This site is for browsing WGS (Whole Genome Shotgun) genomes, TSA (Transcriptome Shotgun Assemblies) and TLS (Targeted Locus Study) sets. WGS sequences are incomplete genomes that have been sequenced by a whole genome shotgun strategy. TSA sequences are transcript sequences that have been computationally assembled from primary RNA sequence data. TLS sequences are large-scale marker gene sequencing studies.

Please consult WGS Submission or TSA Submission pages for more details. https://www.ncbi.nlm.nih.gov/genbank/wgs https://www.ncbi.nlm.nih.gov/genbank/tsa
NCBI Genome Survey Sequences Database
scicrunch.org
Updated Apr 11, 2025
Cite
(2025). NCBI Genome Survey Sequences Database [Dataset]. http://identifiers.org/RRID:SCR_002146
Unique identifier
https://identifiers.org/RRID:SCR_002146
Description
Database of unannotated, short, single-read, primarily genomic sequences from GenBank, including random survey sequences, clone-end sequences, and exon-trapped sequences. The GSS division of GenBank is similar to the EST division, with the exception that most of the sequences are genomic in origin, rather than cDNA (mRNA). It should be noted that two classes (exon trapped products and gene trapped products) may be derived via a cDNA intermediate. Care should be taken when analyzing sequences from either of these classes, as a splicing event could have occurred and the sequence represented in the record may be interrupted when compared to genomic sequence. The GSS division contains (but is not limited to) the following types of data:
* random single pass read genome survey sequences
* cosmid/BAC/YAC end sequences
* exon trapped genomic sequences
* Alu PCR sequences
* transposon-tagged sequences
Although dbGSS sequences are incorporated into the GSS Division of GenBank, annotation in dbGSS is more comprehensive and includes detailed information about the contributors, experimental conditions, and genetic map locations.
DNA Sequencing Market Analysis North America, Europe, Asia, Rest of World...
technavio.com
Updated May 15, 2024
Cite
Technavio (2024). DNA Sequencing Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, China, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/dna-sequencing-market-industry-analysis
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description
DNA Sequencing Market Size 2024-2028

The DNA sequencing market size is forecast to increase by USD 17.34 billion at a CAGR of 20.01% between 2023 and 2028. The market is experiencing significant growth due to the increasing adoption of Next-Generation Sequencing (NGS) technologies.
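
As a rough cross-check of the forecast figures above, the sketch below back-computes the base-year market size implied by an absolute increase and a CAGR. It assumes five annual compounding periods (2023 to 2028) and that the USD 17.34 billion increase is measured from the 2023 base; neither assumption is confirmed by the report, and the function name `implied_base_size` is purely illustrative.

```python
def implied_base_size(increase: float, cagr: float, years: int) -> float:
    """Return the starting value V0 such that V0 * ((1 + cagr) ** years - 1) == increase."""
    if years <= 0 or cagr <= 0:
        raise ValueError("years and cagr must be positive")
    growth_factor = (1.0 + cagr) ** years
    return increase / (growth_factor - 1.0)


# Figures quoted in the description; the 5-year window is an assumption.
base_2023 = implied_base_size(increase=17.34, cagr=0.2001, years=5)
size_2028 = base_2023 + 17.34

print(f"implied 2023 base: USD {base_2023:.2f} billion")
print(f"implied 2028 size: USD {size_2028:.2f} billion")
```

If the report compounds over a different window or from a different base year, the implied sizes change accordingly, so these values are only an order-of-magnitude sanity check.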

This advanced methodology offers faster, more accurate, and cost-effective solutions compared to traditional Sanger sequencing. Bioinformatics tools, artificial intelligence, and machine learning are essential for interpreting DNA variations and tumor heterogeneity. The affordability of DNA sequencing is expanding its applications beyond genetic research, including diagnostics, personalized medicine, agriculture, and environmental analysis. The global DNA sequencing market is projected to reach substantial growth, driven by technological advancements, increasing demand for early disease detection, and the growing need for genetic research in various industries.

What will be the Size of the Market During the Forecast Period?

The market is experiencing significant growth in the healthcare sector, driven by the adoption of precision medicine and the increasing importance of genetic variation in Clinical Diagnosis and Drug Discovery. Next-Generation Sequencing (NGS) technologies, including Whole-Genome Sequencing (WGS), are revolutionizing the field of genomics by providing clinicians with unprecedented access to genetic information. Pharmacogenomics and bioinformatics play crucial roles in interpreting DNA variations and gene expression data from NGS platforms. The Consumables segment, which includes sequencing reagents and benchtop sequencers, is a major contributor to the market's growth. High-throughput sequencing technologies enable the analysis of large amounts of genomic data, leading to the discovery of new biomarkers for Oncology and other applications. Moreover, NGS is also used in Forensics and Reproductive Health, and Clinical Diagnostic Laboratories are increasingly adopting these technologies to improve accuracy and speed up Clinical Diagnosis. The market is expected to continue growing due to the increasing demand for Personalized Medicine and the need for faster sequencing speeds to keep up with the growing amount of genetic information being generated.

How is this market segmented and which is the largest segment?

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

End-user: Academic research; Clinical research; Hospitals and clinics; Pharmaceutical and biotechnology companies
Solution: Products; Services
Geography: North America (US); Europe (Germany, UK); Asia (China, Japan); Rest of World (ROW)

By End-user Insights

The academic research segment is estimated to witness significant growth during the forecast period.

The market is significantly driven by the healthcare sector, particularly in the context of cancer research and precision medicine. The increasing focus on genetic variation and gene expression in healthcare frameworks has led to an increase in demand for sequencing platforms and consumables in the pharmacogenomics and oncology segments. Next-Generation Sequencing (NGS) technologies, including Whole-Genome Sequencing (WGS) and Whole Exome Sequencing (WES), are revolutionizing clinical research and academic research by providing high-quality genomic data for tertiary analysis. The sequencing speed and access to genetic information are crucial for researchers and clinicians to improve our understanding of human disease pathogenesis and develop novel drug targets.

The academic research segment was valued at USD 3.31 billion in 2018 and showed a gradual increase during the forecast period.

Regional Analysis

North America is estimated to contribute 36% to the growth of the global market during the forecast period.

Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

The market is significantly influenced by the epidemiology of chronic diseases, with North America leading the market due to rising healthcare expenditure and the presence of key players in the pharmaceutical and biotechnology sectors. The region's clinical diagnostic laboratories and research labs utilize advanced benchtop sequencers and high-throughput instruments to discover biomarkers for drug development and clinical diagnosis. Skilled professionals in these facilities use nucleotides and consumables to analyze DNA sequences for personalized medici
PeanutBase
agdatacommons.nal.usda.gov
datasets.ai
Updated Feb 8, 2024
Cite
USDA Agricultural Research Service (2024). PeanutBase [Dataset]. http://doi.org/10.15482/USDA.ADC/1352915
Available download formats: bin
Unique identifier
https://doi.org/10.15482/USDA.ADC/1352915
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Authors
USDA Agricultural Research Service
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
PeanutBase (peanutbase.org) is the primary genetics and genomics database for cultivated peanut and its wild relatives. It houses information about genome sequences, genes and predicted functions, genetic maps, markers, links to germplasm resources, and maps of peanut germplasm origins. This resource is being developed for U.S. and International peanut researchers and breeders, with support from The Peanut Foundation and the many contributors that have made the Peanut Genomics Initiative possible. Funded by The Peanut Foundation as part of the Peanut Genomics Initiative. Additional support from USDA-ARS. Database developed and hosted by the USDA-ARS SoyBase and Legume Clade Database group at Ames, IA, with NCGR and other participants. Resources in this dataset: Resource Title: PeanutBase.org. File Name: Web Page, url: https://peanutbase.org Website pointer for PeanutBase.org - Genetic and genomic data to enable more rapid crop improvement in peanuts. The peanut genome has been sequenced and analyzed as part of the International Peanut Genomic Initiative, in order to accelerate breeding progress and get more productive, disease-resistant, stress-tolerant varieties to farmers. The two diploid progenitors have been sequenced and are available, along with predicted genes and descriptions. The genomes of the diploid progenitors will be used to help identify and assemble the similar chromosomes in cultivated peanut. Cultivated peanut, Arachis hypogaea, is an allotetraploid (2n=4x=40) that contains two complete genomes, labeled the A and B genomes. A. duranensis (2n=2x=20) has likely contributed the A genome, and A. ipaensis has likely contributed the B genome. It may be helpful to remember these two associations by using the mnemonic: "A" comes before "B" and "duranensis" comes before "ipaensis". Because of the difficulty of assembling a tetraploid genome, the two diploids, A. duranensis and A. ipaensis, have been sequenced and assembled first.
Together these provide a good initial basis for the tetraploid genome. Additionally, the two will help guide assembly of the tetraploid genome. Sequencing work on the tetraploid genome is underway; stay tuned for updates in 2015.
The tpm metabarcoding DNA sequence database for taxonomic allocations using...
data.niaid.nih.gov
Updated Oct 10, 2023
Cite
COURNOYER Benoît (2023). The tpm metabarcoding DNA sequence database for taxonomic allocations using RDP classifier implemented in DADA2. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4492210
Dataset provided by
MARJOLET Laurence
COURNOYER Benoît
POZZI Adrien C.M.
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The tpm metabarcoding DNA sequence database for taxonomic allocations using the Mothur and DADA2 bio-informatic tools

A.C.M. Pozzi1, R. Bouchali1, L. Marjolet1, B. Cournoyer1

1 University of Lyon, UMR Ecologie Microbienne Lyon (LEM), CNRS 5557, INRAE 1418, Université Claude Bernard Lyon 1, VetAgro Sup, Research Team “Bacterial Opportunistic Pathogens and Environment” (BPOE), 69280 Marcy L’Etoile, France.

Corresponding authors:

A.C.M. Pozzi, UMR Microbial Ecology, CNRS 5557, CNRS 1418, VetAgro Sup, Main building, aisle 3, 1st floor, 69280 Marcy-L’Etoile, France. Tel. (+33) 478 87 39 47. Fax. (+33) 472 43 12 23. Email: adrien.meynier_pozzi@vetagro-sup.fr

B. Cournoyer, UMR Microbial Ecology, CNRS 5557, CNRS 1418, VetAgro Sup, Main building, aisle 3, 1st floor, 69280 Marcy-L’Etoile, France. Tel. (+33) 478 87 56 47. Fax. (+33) 472 43 12 23. Email: benoit.cournoyer@vetagro-sup.fr

Keywords:

BACtpm, Bacteria, tpm, thiopurine-S-methyltransferase EC:2.1.1.67, Nucleotide sequences, PCR products, Next-Generation-Sequencing, OTHU

Description:

The tpm gene codes for the thiopurine-S-methyltransferase (TPMT), an enzyme that can detoxify metalloid-containing oxyanions and xenobiotics (Cournoyer et al., 1998). Bacterial TPMTs radiated apart from human and animal TPMTs, and showed a vertical evolution in line with the 16S rRNA gene molecular phylogeny (Favre‐Bonté et al., 2005).

The tpm database, named BACtpm, was designed to apply the tpm-metabarcoding analytical scheme published in Aigle et al. (2021). It includes the full tpm identifiers, GenBank accession numbers, complete taxonomic records (domain down to strain code) of about 215 nucleotide-long tpm sequences of 840 unique taxa belonging to 139 genera.

Nucleotide sequences of tpm (range: 190-233 nucleotides) were either retrieved from public repositories (GenBank) or made available by B. Cournoyer’s research group. Colin et al. (2020) described the PCR and high throughput Illumina Miseq DNA sequencing procedures used to produce tpm sequences.

BACtpm v.2.0.1 (June 2021 release) is made available under the Creative Commons Attribution 4.0 International Licence. It can be used for the taxonomic allocation of tpm sequences down to the species and strain levels. Data are stored in CSV format, enabling future users to reformat them to fit their specific needs.

Acknowledgments:

We thank the worldwide community of microbiologists who made contributions to public databases in the past decades, and made possible the elaboration of the BACtpm database. We also thank the Field Observatory in Urban Hydrology (OTHU, www.graie.org/othu/), Labex IMU (Intelligence des Mondes Urbains), the Greater Lyon Urban Community, the School of Integrated Watershed Sciences H2O'LYON, and the Lyon Urban School for their support in the development of this database. This work was funded by the French national research program for environmental and occupational health of ANSES under the terms of project “Iouqmer” EST 2016/1/120, l'Agence Nationale de la Recherche through ANR-16-CE32-0006, ANR-17-CE04-0010, ANR-17-EURE-0018 and ANR-17-CONV-0004, by the MITI CNRS project named Urbamic, and the French water agency for the Rhône, Mediterranean and Corsica areas through the Desir and DOmic projects. We thank former BPOE lab members who contributed to start and expand the BACtpm database: Céline COLINON, Romain MARTI, Emilie BOURGEOIS, Sébastien RIBUN and Yannick COLIN.

References:

Aigle, A., Colin, Y., Bouchali, R., Bourgeois, E., Marti, R., Ribun, S., Marjolet, L., Pozzi, A.C.M., Misery, B., Colinon, C., Bernardin-Souibgui, C., Wiest, L., Blaha, D., Galia, W., Cournoyer, B., 2021. Spatio-temporal variations in chemical pollutants found among urban deposits match changes in thiopurine S-methyltransferase-harboring bacteria tracked by the tpm metabarcoding approach. Sci. Total Environ. 767, 145425. https://doi.org/10.1016/j.scitotenv.2021.145425

Colin, Y., Bouchali, R., Marjolet, L., Marti, R., Vautrin, F., Voisin, J., Bourgeois, E., Rodriguez-Nava, V., Blaha, D., Winiarski, T., Mermillod-Blondin, F., Cournoyer, B., 2020. Coalescence of bacterial groups originating from urban runoffs and artificial infiltration systems among aquifer microbiomes. Hydrol. Earth Syst. Sci. 24, 4257–4273. https://doi.org/10.5194/hess-24-4257-2020

Cournoyer, B., Watanabe, S., Vivian, A., 1998. A tellurite-resistance genetic determinant from phytopathogenic pseudomonads encodes a thiopurine methyltransferase: evidence of a widely-conserved family of methyltransferases. Biochim. Biophys. Acta BBA - Gene Struct. Expr. 1397, 161–168. https://doi.org/10.1016/S0167-4781(98)00020-7 (Footnote to the title: The International Collaboration (IC) accession number of the DNA sequence is L49178.1.)

Favre‐Bonté, S., Ranjard, L., Colinon, C., Prigent‐Combaret, C., Nazaret, S., Cournoyer, B., 2005. Freshwater selenium-methylating bacterial thiopurine methyltransferases: diversity and molecular phylogeny. Environ. Microbiol. 7, 153–164. https://doi.org/10.1111/j.1462-2920.2004.00670.x
Currently active biological databases aiming to archive data related to oral...
plos.figshare.com
Updated Jun 6, 2024
Cite
Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin (2024). Currently active biological databases aiming to archive data related to oral biology. [Dataset]. http://doi.org/10.1371/journal.pone.0303628.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303628.t001
Dataset provided by
PLOS ONE
Authors
Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Currently active biological databases aiming to archive data related to oral biology.
Data from: Whole-genome sequence data and analysis of a Staphylococcus...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Mar 30, 2024
Cite
Agricultural Research Service (2024). Data from: Whole-genome sequence data and analysis of a Staphylococcus aureus strain SJTUF_J27 isolated from seaweed [Dataset]. https://catalog.data.gov/dataset/data-from-whole-genome-sequence-data-and-analysis-of-a-staphylococcus-aureus-strain-sjtuf--5d2cc
Dataset provided by
Agricultural Research Service
Description
The complete genome sequence data of S. aureus SJTUF_J27 isolated from seaweed in China is reported here. The size of the genome is 2.8 Mbp with 32.9% G+C content, consisting of 2614 coding sequences and 77 RNAs. A number of virulence factors, including antimicrobial resistance genes (fluoroquinolone, beta-lactams, fosfomycin, mupirocin, trimethoprim, and aminocoumarin) and the egc enterotoxin cluster, were found in the genome. In addition, the genes encoding metal-binding proteins and associated heavy metal resistance were identified. Phylogenetic data analysis, based upon genome-wide single nucleotide polymorphisms (SNPs), and comparative genomic evaluation with BLAST Ring Image Generator (BRIG) were performed for SJTUF_J27 and four S. aureus strains isolated from food. The completed genome data was deposited in NCBI's GenBank under the accession number CP019117, https://www.ncbi.nlm.nih.gov/nuccore/CP019117. Resources in this dataset: Resource Title: NCBI GenBank Accession CP019117.1: Staphylococcus aureus strain SJTUF_J27 chromosome, complete genome. File Name: Web Page, url: https://www.ncbi.nlm.nih.gov/nuccore/CP019117 With an average of 331-fold sequencing coverage, a genome of 2,804,759 bp with 32.9% G+C content was generated. RAST annotation of the genome revealed a total of 399 subsystems, 2614 coding sequences (80 of them related to virulence, disease and defense), and 77 RNAs. PathogenFinder showed the probability of this strain being a human pathogen was 98%. Bacteria and source DNA available from Xianming Shi, 800 Dongchuan Road, Shanghai, China, 200240. Annotation was added by the NCBI Prokaryotic Genome Annotation Pipeline (released 2013).
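
The G+C content quoted above is simply the fraction of G and C bases in the sequence. The following is a minimal sketch of that calculation, not the pipeline used by the dataset authors; the helper name `gc_content` and the toy sequence are illustrative assumptions, and only the genome size and G+C percentage come from the description.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C fraction of a DNA string, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / (gc + at)


# Toy sequence for illustration only; it is not taken from CP019117.
toy_sequence = "ATGCATTTAAGC"
print(f"toy G+C fraction: {gc_content(toy_sequence):.3f}")

# Reported figures from the description: 2,804,759 bp at 32.9% G+C.
genome_size_bp = 2_804_759
reported_gc_fraction = 0.329
approx_gc_bases = round(genome_size_bp * reported_gc_fraction)
print(f"approximate G+C bases implied by the reported figures: {approx_gc_bases}")
```

Applying `gc_content` to the full chromosome sequence downloaded from the GenBank record would be the way to reproduce the reported percentage; ambiguous bases such as N are excluded from the denominator in this sketch, which is one of several possible conventions.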
DNA Sequencing Technologies Market Report | Global Forecast From 2025 To...
dataintelo.com
pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). DNA Sequencing Technologies Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-dna-sequencing-technologies-market
Explore at:
Available download formats: pdf, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
DNA Sequencing Technologies Market Outlook

The global DNA sequencing technologies market size was valued at approximately USD 8.5 billion in 2023 and is projected to grow significantly, reaching around USD 21.4 billion by 2032, reflecting a robust compound annual growth rate (CAGR) of 10.9% during the forecast period. This surge in market size can be attributed to several growth factors, including advances in technology, decreasing costs of sequencing, and the expanding application of sequencing technologies across diverse fields. As demand for personalized medicine and precision agriculture continues to grow, the market is poised for substantial expansion over the next decade.
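The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below assumes nine compounding years (2023 to 2032); the report does not state which interval it used, so that count and the helper name `cagr` are assumptions of this example.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 8.5 billion (2023) to USD 21.4 billion (2032), assuming 9 compounding years.
rate = cagr(8.5, 21.4, 2032 - 2023)
print(f"Implied CAGR: {rate:.1%}")  # prints "Implied CAGR: 10.8%"
```

Under that assumption the implied rate is within about a tenth of a percentage point of the 10.9% stated in the report; a different base year would give a noticeably different figure.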

One of the primary growth factors driving the DNA sequencing technologies market is the continuous advancement in sequencing technologies themselves. The transition from traditional Sanger sequencing to next-generation sequencing (NGS) has revolutionized the field, offering shorter run times, higher throughput, and reduced costs. The advent of third-generation sequencing technologies further amplifies these advantages by providing even longer read lengths, real-time data generation, and more detailed genome assemblies. These technological strides have opened new avenues for research and clinical applications, thereby catalyzing market growth. Additionally, the ongoing miniaturization and automation of sequencing processes have made these technologies more accessible to a wider range of users, further fueling market expansion.

Another significant growth driver is the expanding application of DNA sequencing in clinical diagnostics and personalized medicine. With the increasing prevalence of genetic disorders and cancers, there is a pressing need for precise diagnostic tools that can facilitate early detection and personalized treatment plans. DNA sequencing technologies allow for comprehensive genomic profiling, enabling healthcare providers to tailor therapies based on an individual's genetic makeup. This personalized approach not only improves treatment outcomes but also minimizes adverse drug reactions, thereby driving the adoption of sequencing technologies in clinical settings. Furthermore, governmental and private investments in genomic research and the establishment of large-scale genomic databases are bolstering the market's growth prospects.

The role of DNA Sequencing Instruments in the market is pivotal as they serve as the backbone of sequencing technologies. These instruments have evolved significantly over the years, transitioning from bulky, complex machines to more compact and efficient devices. The advancements in DNA Sequencing Instruments have facilitated the shift towards high-throughput sequencing, enabling researchers to conduct large-scale genomic studies with greater ease and accuracy. This evolution has not only reduced the time and cost associated with sequencing but also expanded its accessibility to a broader range of laboratories and institutions. As the demand for genomic data continues to rise, the development and refinement of these instruments remain crucial to supporting the growing needs of the market.

The agricultural and animal research sectors are also significantly contributing to the growth of the DNA sequencing technologies market. The ability to sequence the genomes of various crops and livestock has profound implications for enhancing food security and sustainability. By identifying genetic markers associated with desirable traits such as drought resistance, pest resistance, and higher yield, researchers can expedite the development of improved plant and animal breeds. This application of DNA sequencing in agriculture not only supports global food supply chains but also aligns with rising environmental and sustainability concerns, thereby providing a robust impetus for market growth.

Regionally, the DNA sequencing technologies market exhibits varying dynamics. North America currently leads the market, driven by the presence of major biotechnology companies, advanced healthcare infrastructure, and substantial investment in genomic research. The region's strong focus on personalized medicine and favorable governmental policies further support market growth. Meanwhile, the Asia Pacific region is expected to witness the highest growth rate, with a projected CAGR exceeding 12%. This rapid expansion is attributed to the increasing adoption of sequencing technologies in emerging economies, rising investments in healthcare infrastructure, and growing emphasis on agricultural biotechnology. Europe also plays
Data from: Non-Human Genome Segmental Duplication Database
rrid.site
Updated Apr 11, 2025
Cite
(2025). Non-Human Genome Segmental Duplication Database [Dataset]. http://identifiers.org/RRID:SCR_000470
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_000470
Dataset updated
Apr 11, 2025
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 16, 2013. It contains information about segmental duplications in the genomes of chimpanzee, mouse, and rat. The criteria used to identify regions of segmental duplication are:
* Sequence identity of at least 90%
* Sequence length of at least 5 kb
* Not entirely composed of repetitive elements

BACKGROUND: The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies.

RESULTS: We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of "unmapped" chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.

CONCLUSION: Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis. The segmental duplication data and summary statistics are available for download and can also be visualized in a genome browser in the GBrowse section. Selected annotation tracks (except the segmental duplication track) have also been obtained from UCSC and loaded into the genome browser. Detailed information (e.g. overlapping genes, overlapping clones, detailed alignment) can be obtained by clicking on a duplication cluster in GBrowse. Both keyword search and BLAT search are available. Analyses based on previous genome assemblies can be found in the Previous Analyses section.

Recent Developments: The Non-Human Genome Segmental Duplication Database is continually updated, including the archived copies of the analysis of all previous genome assemblies, and will include all new species as they become available.

Acknowledgments: We thank The Centre for Applied Genomics at the Hospital for Sick Children (HSC) as well as collaborators worldwide. Supported by Genome Canada, the Howard Hughes Medical Institute International Scholar Program (to S.W.S.) and the HSC Foundation.
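The three selection criteria stated in this description (at least 90% identity, at least 5 kb in length, not entirely repetitive) can be expressed as a simple filter. The sketch below is illustrative only: the `Alignment` record, its field names, and the `repeat_fraction` value are hypothetical, and the database's actual BLAST-based pipeline is not reproduced here.

```python
from dataclasses import dataclass

MIN_IDENTITY = 0.90      # sequence identity of at least 90%
MIN_LENGTH_BP = 5_000    # sequence length of at least 5 kb


@dataclass(frozen=True)
class Alignment:
    """Hypothetical pairwise alignment record (field names are assumptions)."""
    name: str
    length_bp: int
    identity: float         # fraction of identical aligned bases, 0.0 to 1.0
    repeat_fraction: float  # fraction of the region masked as repeats, 0.0 to 1.0


def is_segmental_duplication(aln: Alignment) -> bool:
    """Apply the three criteria from the description to one alignment."""
    return (
        aln.identity >= MIN_IDENTITY
        and aln.length_bp >= MIN_LENGTH_BP
        and aln.repeat_fraction < 1.0  # not entirely repetitive elements
    )


# Made-up candidates for illustration; these are not data from the database.
candidates = [
    Alignment("candidate_a", 42_000, 0.97, 0.30),
    Alignment("candidate_b", 3_200, 0.99, 0.10),   # too short
    Alignment("candidate_c", 12_000, 0.85, 0.20),  # identity too low
    Alignment("candidate_d", 8_000, 0.95, 1.00),   # entirely repeats
]
kept = [aln.name for aln in candidates if is_segmental_duplication(aln)]
print(kept)  # ['candidate_a']
```

Keeping the thresholds as named constants makes it straightforward to rerun the filter with stricter or looser cut-offs.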
Genomic Data Analysis Service Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Genomic Data Analysis Service Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/genomic-data-analysis-service-market
Explore at:
Available download formats: pptx, csv, pdf
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Genomic Data Analysis Service Market Outlook

The global genomic data analysis service market size was valued at approximately $1.5 billion in 2023 and is projected to reach around $5.2 billion by 2032, growing at a CAGR of 15.2% during the forecast period. The market's robust growth is primarily driven by significant advancements in sequencing technologies, increased funding for genomics research, and the rising prevalence of genetic disorders and cancer, which necessitate precise and personalized medical interventions.

One of the primary growth factors for the genomic data analysis service market is the rapid advancement in sequencing technologies, particularly Next-Generation Sequencing (NGS). This technology has drastically reduced the cost and time required for sequencing, thereby making it more accessible for various applications such as clinical diagnostics, drug discovery, and personalized medicine. The continuous innovations in bioinformatics tools and computational biology have further enhanced the accuracy and speed of genomic data analysis, contributing to the market's expansion.

Another significant driver is the increasing prevalence of genetic disorders and personalized medicine's rising importance. With the growing understanding of the human genome, healthcare providers are increasingly adopting genomic data analysis to develop tailored treatment plans based on individual genetic profiles. This personalized approach not only improves treatment efficacy but also minimizes adverse effects, thereby boosting the demand for genomic data analysis services in clinical settings.

Government initiatives and funding in genomics research also play a crucial role in propelling the market forward. Numerous countries are investing heavily in genomics projects to better understand and combat various diseases at the genetic level. For instance, initiatives like the Precision Medicine Initiative in the United States and the 100,000 Genomes Project in the United Kingdom are fostering the adoption of genomic data analysis services. Such programs not only enhance research capabilities but also drive the market by creating a substantial demand for genomic data interpretation services.

Bioinformatics Services play a pivotal role in the genomic data analysis service market by providing essential computational tools and platforms that facilitate the interpretation of complex genomic data. As sequencing technologies advance and generate vast amounts of data, the need for sophisticated bioinformatics solutions becomes increasingly critical. These services enable researchers and healthcare providers to efficiently analyze and interpret genomic sequences, leading to more accurate diagnostics and personalized treatment plans. The integration of bioinformatics services into genomic data analysis workflows enhances the precision and speed of data interpretation, thereby driving the market's growth and expanding its applications across various sectors.

The regional outlook for the genomic data analysis service market indicates a significant growth trajectory across various parts of the world. North America holds the largest market share due to its advanced healthcare infrastructure, high funding for genomics research, and the presence of leading market players. Europe follows closely, with substantial investments in genomics projects and favorable government policies supporting genomic research. The Asia Pacific region is expected to witness the fastest growth over the forecast period, driven by increasing healthcare expenditure, rising awareness of personalized medicine, and significant investments in biotechnology sectors.

Service Type Analysis

The genomic data analysis service market can be segmented by service type into whole genome sequencing, exome sequencing, targeted sequencing, RNA sequencing, and others. Whole genome sequencing represents the comprehensive examination of an organism's entire genetic makeup, providing a complete map of all its genes. This service type is gaining traction due to its ability to offer extensive data that can be used for various applications, such as identifying genetic mutations linked to diseases, evolutionary studies, and population genetics. The decreasing costs of sequencing and the increasing speed and accuracy of sequencing technologies have further bolstered the adoption of whole genome sequencing services.

Exome sequencing, which focuses on sequenci
Data from: Paired omics Data Platform projects
data.niaid.nih.gov
doi.org
+2more
Updated Jan 1, 2023
Cite
Stefan Verhoeven (2023). Paired omics Data Platform projects [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3736430
Explore at:
Dataset updated
Jan 1, 2023
Dataset provided by
Marnix H. Medema
Justin J.J. van der Hooft
Stefan Verhoeven
Pieter C. Dorrestein
Michelle Schorn
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Paired Omics Data Platform is a community-based initiative standardizing links between genomic and metabolomics data in a computer readable format to further the field of natural products discovery. The goals are to link molecules to their producers, find large scale genome-metabolome associations, use genomic data to assist in structural elucidation of molecules, and provide a centralized database for paired datasets. This dataset contains the projects in http://pairedomicsdata.bioinformatics.nl/.

The JSON documents adhere to the http://pairedomicsdata.bioinformatics.nl/schema.json JSON schema.
Magnaporthe comparative Database
rrid.site
dknet.org
Updated Jan 29, 2022
Cite
(2022). Magnaporthe comparative Database [Dataset]. http://identifiers.org/RRID:SCR_003079
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003079 https://identifiers.org/RRID:SCR_003079/resolver?q=*&i=rrid
Dataset updated
Jan 29, 2022
Description
The Magnaporthe comparative genomics database provides access to multiple fungal genomes from the Magnaporthaceae family to facilitate comparative analysis. As part of the Broad Fungal Genome Initiative, the Magnaporthe comparative project includes the finished M. oryzae (formerly M. grisea) genome, as well as the draft assemblies of Gaeumannomyces graminis var. tritici and M. poae. It provides users with tools to BLAST search and browse genome regions (to retrieve DNA, find clones, and graphically view sequence regions), and provides gene indexes and genome statistics. We were funded to attempt 7X sequence coverage comprising paired-end reads from plasmids, Fosmids and BACs. Our strategy involves Whole Genome Shotgun (WGS) sequencing, in which sequence from the entire genome is generated and reassembled. Our specific aims are as follows:
1. Generate and assemble sequence reads yielding 7X coverage of the Magnaporthe oryzae genome through whole genome shotgun sequencing.
2. Generate and incorporate BAC and Fosmid end sequences into the genome assembly to provide a paired end on average every 2 kb.
3. Integrate the genome sequence with existing physical and genetic map information.
4. Perform automated annotation of the sequence assembly.
5. Distribute the sequence assembly and results of our annotation and analysis through a freely accessible, public web server and by deposition of the sequence assembly in GenBank.
UniPept pept2lca analysis of MLI samples.
figshare.com
txt
Updated Jun 29, 2023
Cite
Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn (2023). UniPept pept2lca analysis of MLI samples. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011163.s005
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pcbi.1011163.s005
Dataset updated
Jun 29, 2023
Dataset provided by
PLOS Computational Biology
Authors
Matthys G. Potgieter; Andrew J. M. Nel; Suereta Fortuin; Shaun Garnett; Jerome M. Wendoh; David L. Tabb; Nicola J. Mulder; Jonathan M. Blackburn
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.

Results

We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.

Conclusions

By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
