100+ datasets found
  1. NCBI Genbank Data Backbone File

    • smithsonian.figshare.com
    txt
    Updated Oct 10, 2023
    Cite
    Vanessa Gonzalez (2023). NCBI Genbank Data Backbone File [Dataset]. http://doi.org/10.25573/data.24280123.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 10, 2023
    Dataset provided by
    National Museum of Natural History
    Authors
    Vanessa Gonzalez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NCBI Genbank Data Backbone File for the Smithsonian Gap Analysis Tool: a data download of the NCBI database (https://www.ncbi.nlm.nih.gov/genbank/), formatted for use in the Smithsonian Gap Analysis Tool.

  2. all csv files used for analysis of NCBI data

    • figshare.com
    txt
    Updated Oct 30, 2023
    Cite
    Cassandre Pyne (2023). all csv files used for analysis of NCBI data [Dataset]. http://doi.org/10.6084/m9.figshare.24461239.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cassandre Pyne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All CSV files used for analysis of NCBI data.
    • All files with "WOAH" in the name contain the diseases and disease agents from WOAH's list (see manuscript for link).
    • All breed files (with breed names in the file name) are from web scraping.
    • MASTER_DATA_coordinates_FINAL_AUG_5: cleaned, mined data from NCBI.

  3. NCBI ASN.1 Format Summary

    • catalog.data.gov
    • healthdata.gov
    • +3 more
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). NCBI ASN.1 Format Summary [Dataset]. https://catalog.data.gov/dataset/ncbi-asn-1-format-summary
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    An International Standards Organization (ISO) data representation format used to achieve interoperability between platforms.

  4. NCBI Sequence Read Archive (SRA)

    • neuinfo.org
    • rrid.site
    Updated Oct 7, 2019
    Cite
    (2019). NCBI Sequence Read Archive (SRA) [Dataset]. http://identifiers.org/RRID:SCR_004891
    Explore at:
    Dataset updated
    Oct 7, 2019
    Description

    Repository of raw sequencing data from next-generation sequencing platforms, including the Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, Complete Genomics, and Pacific Biosciences SMRT. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Data submissions are welcome. This archive of high-throughput sequencing data is part of the international partnership of archives (INSDC) at NCBI, the European Bioinformatics Institute, and the DNA Data Bank of Japan. Data submitted to any of these three organizations are shared among them.

  5. Data from: Genomic structural differences between cattle and River Buffalo...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Genomic structural differences between cattle and River Buffalo identified through comparative genomic and transcriptomic analysis [Dataset]. https://catalog.data.gov/dataset/data-from-genomic-structural-differences-between-cattle-and-river-buffalo-identified-throu-10fbb
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Water buffalo (Bubalus bubalis L.) is an important livestock species worldwide. Like many other livestock species, water buffalo lacks a high-quality, continuous reference genome assembly, which is required for fine-scale comparative genomics studies. In this work, we present a dataset that characterizes genomic differences between the water buffalo genome and the extensively studied cattle (Bos taurus taurus) reference genome. The dataset was obtained after alignment of 14 river buffalo whole genome sequencing datasets to the cattle reference. It consists of 13,444 deletion CNV regions and 11,050 merged mobile element insertion (MEI) events within the upstream regions of annotated cattle genes. Gene expression data from cattle and buffalo are also presented for genes impacted by these regions.

    This study sought to characterize differences in gene content, regulation and structure between taurine cattle and river buffalo (2n=50) (one extant type of water buffalo) using the extensively annotated UMD3.1 cattle reference genome as a basis for comparisons. Using 14 WGS datasets from river buffalo, we identified 13,444 deletion CNV regions (Supplemental Table 1) in river buffalo that were not identified in cattle. We also present 11,050 merged mobile element insertion (MEI) events (Supplemental Table 2) in river buffalo, of which 568 are within the upstream regions of annotated cattle genes. Furthermore, our tissue transcriptomics analysis provides expression profiles of genes impacted by the MEI (Supplemental Tables 3–6) and CNV (Supplemental Table 7) events identified in this study. These data provide the genomic coordinates of the identified CNV deletions and MEI events. Additionally, normalized read counts of impacted genes, along with the adjusted p-values from statistical analysis, are presented (Supplemental Tables 3–6).

    • Genomic coordinates of identified CNV-deletion and MEI events, and Ensembl gene names of impacted genes (Supplemental Tables 1 and 2)
    • Gene expression profiles and statistical significance (adjusted p-values) of genes impacted by MEI in liver (Supplemental Tables 3 and 4)
    • Gene expression profiles and statistical significance (adjusted p-values) of genes impacted by MEI in muscle (Supplemental Tables 5 and 6)
    • Gene expression profiles and statistical significance (adjusted p-values) of genes impacted by CNV deletions in river buffalo (Supplemental Table 7)

    Public assessment of this dataset will allow for further analyses and functional annotation of genes that are potentially associated with phenotypic differences between cattle and water buffalo. Raw read data from whole genome and transcriptome sequencing were deposited to NCBI Bioprojects.

    Resources in this dataset:

    Resource Title: Genomic structural differences between cattle and River Buffalo identified through comparative genomic and transcriptomic analysis. File Name: Web Page, url: https://www.sciencedirect.com/science/article/pii/S2352340918305183

    Data in Brief article presenting a dataset that characterizes genomic differences between the water buffalo genome and the extensively studied cattle (Bos taurus taurus) reference genome. The dataset was obtained after alignment of 14 river buffalo whole genome sequencing datasets to the cattle reference. It consists of 13,444 deletion CNV regions and 11,050 merged mobile element insertion (MEI) events within the upstream regions of annotated cattle genes. Gene expression data from cattle and buffalo are also presented for genes impacted by these regions. Tables are with this article.

    Raw read data from whole genome and transcriptome sequencing were deposited to NCBI Bioprojects as follows:
    • PRJNA350833 (https://www.ncbi.nlm.nih.gov/bioproject/?term=350833)
    • PRJNA277147 (https://www.ncbi.nlm.nih.gov/bioproject/?term=277147)
    • PRJEB4351 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB4351)

  6. NCBI Virus

    • catalog.data.gov
    • data.virginia.gov
    • +3 more
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). NCBI Virus [Dataset]. https://catalog.data.gov/dataset/ncbi-virus
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset provided by
    National Library of Medicine
    Description

    NCBI Virus is an integrative, value-added resource designed to support retrieval, display and analysis of a curated collection of virus sequences and large sequence datasets. Its goal is to increase the usability of viral sequence data archived in GenBank and other NCBI repositories. This resource includes resources previously included in HIV-1, Human Protein Interaction Database, Influenza Virus Resource, and Virus Variation.

  7. Summary table of accuracy, precision, recall, f1-score, and AUC, and ROC...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 10, 2023
    Cite
    David C. Molik; DeAndre Tomlinson; Shane Davitt; Eric L. Morgan; Matthew Sisk; Benjamin Roche; Natalie Meyers; Michael E. Pfrender (2023). Summary table of accuracy, precision, recall, f1-score, and AUC, and ROC Score from the random forest classifier. [Dataset]. http://doi.org/10.1371/journal.pntd.0008755.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David C. Molik; DeAndre Tomlinson; Shane Davitt; Eric L. Morgan; Matthew Sisk; Benjamin Roche; Natalie Meyers; Michael E. Pfrender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary table of accuracy, precision, recall, f1-score, and AUC, and ROC Score from the random forest classifier.

  8. Data from: Transcriptomic and bioinformatics analysis of the early...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Transcriptomic and bioinformatics analysis of the early time-course of the response to prostaglandin F2 alpha in the bovine corpus luteum [Dataset]. https://catalog.data.gov/dataset/data-from-transcriptomic-and-bioinformatics-analysis-of-the-early-time-course-of-the-respo-cd938
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    RNA expression analysis was performed on the corpus luteum tissue at five time points after prostaglandin F2 alpha treatment of midcycle cows using an Affymetrix Bovine Gene v1 Array. The normalized linear microarray data was uploaded to the NCBI GEO repository (GSE94069). Subsequent statistical analysis determined differentially expressed transcripts ± 1.5-fold change from saline control with P ≤ 0.05. Gene ontology of differentially expressed transcripts was annotated by DAVID and Panther. Physiological characteristics of the study animals are presented in a figure. Bioinformatic analysis by Ingenuity Pathway Analysis was curated, compiled, and presented in tables. A dataset comparison with similar microarray analyses was performed, and bioinformatics analyses by Ingenuity Pathway Analysis, DAVID, Panther, and String of differentially expressed genes from each dataset, as well as of the differentially expressed genes common to all three datasets, were curated, compiled, and presented in tables. Finally, a table comparing four bioinformatics tools' predictions of functions associated with genes common to all three datasets is presented. These data have been further analyzed and interpreted in the companion article "Early transcriptome responses of the bovine mid-cycle corpus luteum to prostaglandin F2 alpha includes cytokine signaling".

    Resources in this dataset:

    Resource Title: Supporting information as Excel spreadsheets and tables. File Name: Web Page, url: http://www.sciencedirect.com/science/article/pii/S2352340917304031?via=ihub#s0070

  9. Covid-19 Research Articles (NCBI)

    • kaggle.com
    zip
    Updated Jan 31, 2021
    Cite
    Abrar (2021). Covid-19 Research Articles (NCBI) [Dataset]. https://www.kaggle.com/abrarmisk/covid19-research-articles-ncbi
    Explore at:
    Available download formats: zip (709063 bytes)
    Dataset updated
    Jan 31, 2021
    Authors
    Abrar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I collected about 1200 Covid-19 research articles from the NCBI.NLM.NIH website to be utilized in ML algorithms/ Data Analysis such as Sentiment Analysis, Time Series, Recommender System and/or Classification.

    Content

    • link: URL to the research article
    • title: research article
    • keywords: words under which the research article is categorized
    • dates: publication date online
    • abstract: a brief summary of the article (methods & hypothesis included)
    • conclusion: findings of the research

    Note: For the sake of time, I left some columns with 'null' string values. It's your choice to filter the values and use what is most appropriate for your ML model.

    Note: I didn't include authors/contributors, as they wouldn't serve a purpose in this dataset.

    Inspiration

    I am interested in knowing the focus of those studies (by analyzing word frequencies) as well as analyzing the volume of publications over time.

  10. 1000 Cannabis Genomes Project

    • kaggle.com
    zip
    Updated Feb 26, 2019
    Cite
    Google BigQuery (2019). 1000 Cannabis Genomes Project [Dataset]. https://www.kaggle.com/bigquery/genomics-cannabis
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 26, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Cannabis is a genus of flowering plants in the family Cannabaceae.

    Source: https://en.wikipedia.org/wiki/Cannabis

    Content

    In October 2016, Phylos Bioscience released a genomic open dataset of approximately 850 strains of Cannabis via the Open Cannabis Project. In combination with other genomics datasets made available by Courtagen Life Sciences, Michigan State University, NCBI, Sunrise Medicinal, University of Calgary, University of Toronto, and Yunnan Academy of Agricultural Sciences, the total amount of publicly available data exceeds 1,000 samples taken from nearly as many unique strains.

    https://medium.com/google-cloud/dna-sequencing-of-1000-cannabis-strains-publicly-available-in-google-bigquery-a33430d63998

    These data were retrieved from the National Center for Biotechnology Information’s Sequence Read Archive (NCBI SRA), processed using the BWA aligner and FreeBayes variant caller, indexed with the Google Genomics API, and exported to BigQuery for analysis. Data are available directly from Google Cloud Storage at gs://gcs-public-data--genomics/cannabis, as well as via the Google Genomics API as dataset ID 918853309083001239, and an additional duplicated subset of only transcriptome data as dataset ID 94241232795910911, as well as in the BigQuery dataset bigquery-public-data:genomics_cannabis.

    All tables in the Cannabis Genomes Project dataset have a suffix like _201703. The suffix is referred to as [BUILD_DATE] in the descriptions below. The dataset is updated frequently as new releases become available.

    The following tables are included in the Cannabis Genomes Project dataset:

    Sample_info contains fields extracted for each SRA sample, including the SRA sample ID and other data that give indications about the type of sample. Sample types include: strain, library prep methods, and sequencing technology. See SRP008673 for an example of upstream sample data. SRP008673 is the University of Toronto sequencing of Cannabis Sativa subspecies Purple Kush.

    MNPR01_reference_[BUILD_DATE] contains reference sequence names and lengths for the draft assembly of Cannabis Sativa subspecies Cannatonic produced by Phylos Bioscience. This table contains contig identifiers and their lengths.

    MNPR01_[BUILD_DATE] contains variant calls for all included samples and types (genomic, transcriptomic) aligned to the MNPR01_reference_[BUILD_DATE] table. Samples can be found in the sample_info table. The MNPR01_[BUILD_DATE] table is exported using the Google Genomics BigQuery variants schema. This table is useful for general analysis of the Cannabis genome.

    MNPR01_transcriptome_[BUILD_DATE] is similar to the MNPR01_[BUILD_DATE] table, but it includes only the subset transcriptomic samples. This table is useful for transcribed gene-level analysis of the Cannabis genome.
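
    As a rough illustration of the [BUILD_DATE] suffix convention described above, the Python sketch below assembles fully qualified table names. The helper name `cannabis_table` is hypothetical, and the dot-separated `project.dataset.table` spelling is an assumption about how the tables would be referenced in a query rather than something stated in this listing; `201703` is simply the example suffix quoted above.

```python
# Hypothetical helper for the naming convention described in the listing.
# PROJECT and DATASET are taken from "bigquery-public-data:genomics_cannabis";
# the dotted form of the full name is an assumption for illustration.
PROJECT = "bigquery-public-data"
DATASET = "genomics_cannabis"

# Table prefixes named in the listing.
PREFIXES = ("sample_info", "MNPR01_reference", "MNPR01", "MNPR01_transcriptome")


def cannabis_table(prefix: str, build_date: str) -> str:
    """Return the full table name for a prefix and a YYYYMM build date."""
    if not (build_date.isdigit() and len(build_date) == 6):
        raise ValueError(f"expected a YYYYMM build date, got {build_date!r}")
    return f"{PROJECT}.{DATASET}.{prefix}_{build_date}"


# Build the names of all four tables for one release.
tables = {prefix: cannabis_table(prefix, "201703") for prefix in PREFIXES}
```

    Because the suffix changes with each release, keeping it as a single parameter avoids hard-coding a build date in every query.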

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: http://opencannabisproject.org/
    Category: Genomics
    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://www.ncbi.nlm.nih.gov/home/about/policies.shtml - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
    Update frequency: As additional data are released to GenBank
    View in BigQuery: https://bigquery.cloud.google.com/dataset/bigquery-public-data:genomics_cannabis
    View in Google Cloud Storage: gs://gcs-public-data--genomics/cannabis

    Banner Photo by Rick Proctor from Unsplash.

    Inspiration

    Which Cannabis samples are included in the variants table?

    Which contigs in the MNPR01_reference_[BUILD_DATE] table have the highest density of variants?

    How many variants does each sample have at the THC Synthase gene (THCA1) locus?

  11. Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jul 8, 2021
    Cite
    Bianca Habermann; Margaux Haering (2021). Extended data tables to Haering and Habermann, F1000Res, RNfuzzyApp: an R shiny RNA-seq data analysis app for visualisation, differential expression analysis, time-series clustering and enrichment analysis [Dataset]. http://doi.org/10.5061/dryad.8pk0p2nnd
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 8, 2021
    Dataset provided by
    Dryad
    Authors
    Bianca Habermann; Margaux Haering
    Time period covered
    Jul 6, 2021
    Description

    Details on data processing and analysis can be found in the associated article.

  12. NCBI Gene

    • bioregistry.io
    • integbio.jp
    Updated Nov 9, 2021
    Cite
    (2021). NCBI Gene [Dataset]. http://identifiers.org/wikidata:P351
    Explore at:
    Dataset updated
    Nov 9, 2021
    Description

    Entrez Gene is the NCBI's database for gene-specific information, focusing on completely sequenced genomes, those with an active research community to contribute gene-specific information, or those that are scheduled for intense sequence analysis.

  13. Dengue Virus - 4 Complete Genomes

    • kaggle.com
    zip
    Updated Jun 29, 2020
    Cite
    Jubayer Hossain (2020). Dengue Virus - 4 Complete Genomes [Dataset]. https://www.kaggle.com/jhossain/dengue-virus-4-complete-genomes
    Explore at:
    Available download formats: zip (14886 bytes)
    Dataset updated
    Jun 29, 2020
    Authors
    Jubayer Hossain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    According to the WHO, dengue is a fast-emerging, pandemic-prone viral disease in many parts of the world. Dengue flourishes in urban poor areas, suburbs, and the countryside but also affects more affluent neighborhoods in tropical and subtropical countries.

    Dengue virus complete genome datasets are publicly available and accessible on NCBI. NCBI makes these data public, enabling a transparent look into this genome information.

    Content

    The dengue virus (DEN) comprises four distinct serotypes (DEN-1, DEN-2, DEN-3, and DEN-4), which belong to the genus Flavivirus, family Flaviviridae.

    NCBI Reference

    • DEN-1:NC_001477
    • DEN-2:NC_001474
    • DEN-3:NC_001475
    • DEN-4:NC_002640
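
    The four reference accessions above could be turned into download links with a short Python sketch like the one below. It only builds URLs for the NCBI E-utilities `efetch` endpoint and performs no network access; the function name `efetch_fasta_url` is hypothetical, and the choice of the `nuccore` database with FASTA text output is an assumption about what a reader would want to retrieve.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Serotype -> RefSeq accession, copied from the list above.
DENGUE_REFSEQ = {
    "DEN-1": "NC_001477",
    "DEN-2": "NC_001474",
    "DEN-3": "NC_001475",
    "DEN-4": "NC_002640",
}


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL requesting one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # assumption: nucleotide database
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


# One URL per serotype; fetching them is left to the reader's HTTP client.
urls = {serotype: efetch_fasta_url(acc) for serotype, acc in DENGUE_REFSEQ.items()}
```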

    Acknowledgements

    Image by Gerd Altmann from Pixabay

    Inspiration

    Some insights could be:
    1. Similarity and dissimilarity of DEN-1, DEN-2, DEN-3, and DEN-4
    2. Phylogenetic analysis

  14. Data from: DNA sequences used to analyze evolutionary rates of transient...

    • catalog.data.gov
    Updated Sep 17, 2025
    Cite
    U.S. Geological Survey (2025). DNA sequences used to analyze evolutionary rates of transient receptor potential (Trp) genes in tetrapods [Dataset]. https://catalog.data.gov/dataset/dna-sequences-used-to-analyze-evolutionary-rates-of-transient-receptor-potential-trp-genes
    Explore at:
    Dataset updated
    Sep 17, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    The dataset consists of two file types. The first type consists of sets of orthologous gene sequences of the transient receptor potential (Trp) superfamily of genes in FASTA format. The sequence data were obtained from the Gene database of the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov/gene), aligned at the codon level, and trimmed to a defined region present in each gene family member, termed the Trp box and C-terminal region. The sequences are grouped into files by gene and by taxonomic group. The second file type consists of phylogenetic tree files in Newick format that correspond with the taxa in the paired sequence file. Phylogenetic tree files in this format are often required for estimating the rate of evolution of a gene over an evolutionary period. The phylogenetic trees do not include or require branch lengths and were not generated from the Trp gene sequences themselves but from mitochondrial genome sequences or the literature as described herein. While the choice of genes, taxa, and tree labels was dictated by specific hypotheses for interpretive analysis, these files can be used for other purposes and modified accordingly. In addition to these two data components, there is also a descriptive file (list.of.accessions.txt) that links the taxon names in each sequence file to a permanent accession in the NCBI database from which the analyzed sequence was derived. The original accession identifiers were not used in the data files for readability and to maximize compatibility with different software packages that may not interpret special characters equivalently. Note that the particular species for which the sequences were obtained are not germane to the analysis objectives, as long as they are representative of their taxonomic group and the sequences have low levels of missing data.
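
    Since each Newick tree is described as corresponding to the taxa in its paired FASTA file, a reader might want to confirm that pairing before running a rate analysis. The Python sketch below is one minimal way to do so, under stated assumptions: taxon labels are unquoted and free of parentheses, commas, colons, and semicolons; the tree carries no internal node labels that need checking; and all function names and the toy inputs are hypothetical rather than taken from the dataset.

```python
import re


def fasta_names(fasta_text: str) -> list[str]:
    """Return the first word of each FASTA header line (the text after '>')."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def newick_leaves(newick: str) -> list[str]:
    """Return leaf labels: names that directly follow '(' or ','.

    Assumes simple unquoted labels; branch lengths (':0.1'), if any,
    are excluded because ':' ends a label.
    """
    found = re.findall(r"[(,]\s*([^(),:;]+)", newick)
    return [name.strip() for name in found if name.strip()]


def unmatched_taxa(fasta_text: str, newick: str) -> tuple[list[str], list[str]]:
    """Return (names only in the FASTA file, names only in the tree)."""
    seqs, leaves = set(fasta_names(fasta_text)), set(newick_leaves(newick))
    return sorted(seqs - leaves), sorted(leaves - seqs)


# Toy inputs for illustration only; not taken from the dataset.
toy_fasta = ">human\nATGGCC\n>mouse\nATGGCA\n>chicken\nATGGCT\n"
toy_tree = "((human,mouse),chicken);"
only_in_fasta, only_in_tree = unmatched_taxa(toy_fasta, toy_tree)
```

    Two empty lists indicate that the sequence file and the tree name the same taxa.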

  15. The organisms used to validate the tool available for download in the NCBI...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Bruno Merlin; Jorianne Thyeska Castro Alves; Pablo Henrique Caracciolo Gomes de Sá; Mônica Silva de Oliveira; Larissa Maranhão Dias; Gislenne da Silva Moia; Victória Cardoso dos Santos; Adonney Allan de Oliveira Veras (2023). The organisms used to validate the tool available for download in the NCBI database. [Dataset]. http://doi.org/10.1371/journal.pcbi.1008797.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Bruno Merlin; Jorianne Thyeska Castro Alves; Pablo Henrique Caracciolo Gomes de Sá; Mônica Silva de Oliveira; Larissa Maranhão Dias; Gislenne da Silva Moia; Victória Cardoso dos Santos; Adonney Allan de Oliveira Veras
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The organisms used to validate the tool available for download in the NCBI database.

  16. Data from: Meta-Analysis of Public RNA Sequencing Data of Abscisic...

    • figshare.com
    bin
    Updated Feb 17, 2024
    Cite
    Mitsuo Shintani (2024). Meta-Analysis of Public RNA Sequencing Data of Abscisic Acid-Related Abiotic Stresses in Arabidopsis thaliana [Dataset]. http://doi.org/10.6084/m9.figshare.22566583.v3
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 17, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mitsuo Shintani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    • File 1 - Metadata for Curated Datasets: This file contains the metadata for the curated datasets used in the meta-analysis, including Sequence Read Archive (SRA) study ID, run ID, sample tissue, treatment type, treatment time, and sequence library type.
    • File 2 - TPM Data for Gene Expression under Stress Conditions: This file contains the transcripts per million (TPM) data for five different treatment types (ABA, Salt, Dehydration, Mannitol, and Cold).
    • File 3 - TN-Ratio Data for Gene Expression under Stress Conditions: This file contains the TN-ratio data, which represents the ratio of gene expression between stress-treated (T) and non-treated (N) samples.
    • File 4 - TN-Score Data for Gene Expression under Stress Conditions: This file contains the TN-score data, calculated by subtracting the number of downregulated experiments from the number of upregulated experiments. The TN-score was used to assess changes in gene expression under stress conditions across experiments.
    • File 5a - Lists of Upregulated Genes for Each of the Five Stress Treatment Types: This file contains the lists of upregulated genes identified in the meta-analysis for each of the five stress treatment types.
    • File 5b - Lists of Downregulated Genes for Each of the Five Stress Treatment Types: This file contains the lists of downregulated genes identified in the meta-analysis for each of the five stress treatment types.
    • File 6 - Enrichment Analysis of Differentially Expressed Genes for Five Stress Treatment Types: Gene set enrichment analysis of the genes regulated under the five treatments is shown in A–J, indicating upregulated and downregulated genes in the ABA (A, B), salt (C, D), dehydration (E, F), mannitol (G, H), and cold (I, J) treatments, respectively.
    • File 7a - Overlap of Commonly Regulated Genes across ABA, Salt, and Dehydration Treatments: This file contains the lists of commonly regulated genes across three stress treatments: ABA, Salt, and Dehydration.
    • File 7b - The Results of Enrichment Analysis for Commonly Regulated Genes across ABA, Salt, and Dehydration Treatments: This file contains the results of the enrichment analysis focusing on 166 upregulated and 66 downregulated genes that are commonly regulated across three different stress treatments: ABA, Salt, and Dehydration.
    • File 8a - Overlap of Commonly Upregulated Genes across ABA, Salt, Dehydration, Mannitol, and Cold Treatments: This file contains the lists of commonly upregulated genes across five stress treatments: ABA, Salt, Dehydration, Mannitol, and Cold.
    • File 8b - Overlap of Commonly Downregulated Genes across ABA, Salt, Dehydration, Mannitol, and Cold Treatments: This file contains the lists of commonly downregulated genes across five stress treatments: ABA, Salt, Dehydration, Mannitol, and Cold.
    • File 9a - Overlap of Commonly Upregulated Genes across ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia Treatments: This file contains the lists of commonly upregulated genes across six stress treatments: ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia.
    • File 9b - Overlap of Commonly Downregulated Genes across ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia Treatments: This file contains the lists of commonly downregulated genes across six stress treatments: ABA, Salt, Dehydration, Mannitol, Cold, and Hypoxia.
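
    The TN-ratio and TN-score described for Files 3 and 4 can be sketched in a few lines of Python. The listing gives only the definitions (a treated/non-treated expression ratio, and upregulated minus downregulated experiment counts); the pseudocount of 1 added to each TPM value and the 2-fold threshold used below are assumptions made for illustration, not values stated here, and the function names are hypothetical.

```python
def tn_ratio(treated_tpm: float, control_tpm: float, pseudocount: float = 1.0) -> float:
    """Ratio of treated (T) to non-treated (N) expression for one experiment.

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    return (treated_tpm + pseudocount) / (control_tpm + pseudocount)


def tn_score(ratios: list[float], fold: float = 2.0) -> int:
    """Number of upregulated experiments minus number of downregulated ones.

    An experiment counts as upregulated when its TN-ratio is >= fold and as
    downregulated when it is <= 1/fold; the threshold itself is an assumption.
    """
    up = sum(1 for r in ratios if r >= fold)
    down = sum(1 for r in ratios if r <= 1.0 / fold)
    return up - down


# Toy (treated, control) TPM pairs for one gene across four experiments.
pairs = [(9.0, 1.0), (1.0, 1.0), (0.0, 3.0), (7.0, 1.0)]
ratios = [tn_ratio(t, n) for t, n in pairs]
score = tn_score(ratios)
```

    A positive score means the gene was upregulated in more experiments than it was downregulated, which is how the score summarizes a trend across studies.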

  17. Data from: Whole-genome sequence data and analysis of a Staphylococcus...

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    bin
    Updated Nov 21, 2025
    Cite
    Yanping Xie; Yiping He; Sandeep Ghatak; Peter Irwin; Xianghe Yan; Terence Strobaugh; Andrew Gehring (2025). Data from: Whole-genome sequence data and analysis of a Staphylococcus aureus strain SJTUF_J27 isolated from seaweed [Dataset]. http://doi.org/10.1016/j.dib.2018.08.084
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Data In Brief
    Authors
    Yanping Xie; Yiping He; Sandeep Ghatak; Peter Irwin; Xianghe Yan; Terence Strobaugh; Andrew Gehring
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete genome sequence data of S. aureus SJTUF_J27 isolated from seaweed in China is reported here. The size of the genome is 2.8 Mbp with 32.9% G+C content, consisting of 2614 coding sequences and 77 RNAs. A number of virulence factors, including antimicrobial resistance genes (fluoroquinolone, beta-lactams, fosfomycin, mupirocin, trimethoprim, and aminocoumarin) and the egc enterotoxin cluster, were found in the genome. In addition, the genes encoding metal-binding proteins and associated heavy metal resistance were identified. Phylogenetic data analysis, based upon genome-wide single nucleotide polymorphisms (SNPs), and comparative genomic evaluation with BLAST Ring Image Generator (BRIG) were performed for SJTUF_J27 and four S. aureus strains isolated from food. The completed genome data was deposited in NCBI's GenBank under the accession number CP019117, https://www.ncbi.nlm.nih.gov/nuccore/CP019117.

    Resources in this dataset:

    Resource Title: NCBI GenBank Accession CP019117.1: Staphylococcus aureus strain SJTUF_J27 chromosome, complete genome. File Name: Web Page, url: https://www.ncbi.nlm.nih.gov/nuccore/CP019117

    With an average of 331-fold sequencing coverage, a genome size of 2,804,759 bp with 32.9% G+C content was generated. RAST annotation of the genome revealed a total of 399 subsystems, 2614 coding sequences (80 of them related to virulence, disease and defense), and 77 RNAs. PathogenFinder showed the probability of this strain being a human pathogen was 98%. Bacteria and source DNA are available from Xianming Shi, 800 Dongchuan Road, Shanghai, China, 200240. Annotation was added by the NCBI Prokaryotic Genome Annotation Pipeline (released 2013).

  18. Classic confusion matrix to visually analyze the classification performance...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 11, 2023
    Cite
    David C. Molik; DeAndre Tomlinson; Shane Davitt; Eric L. Morgan; Matthew Sisk; Benjamin Roche; Natalie Meyers; Michael E. Pfrender (2023). Classic confusion matrix to visually analyze the classification performance of an algorithm. [Dataset]. http://doi.org/10.1371/journal.pntd.0008755.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David C. Molik; DeAndre Tomlinson; Shane Davitt; Eric L. Morgan; Matthew Sisk; Benjamin Roche; Natalie Meyers; Michael E. Pfrender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classic confusion matrix to visually analyze the classification performance of an algorithm.

  19. NCBI database of Genotypes and Phenotypes (dbGap)

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). NCBI database of Genotypes and Phenotypes (dbGap) [Dataset]. http://identifiers.org/RRID:SCR_002709
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Database developed to archive and distribute clinical data and results from studies that have investigated the interaction of genotype and phenotype in humans. It archives and distributes the results of studies including genome-wide association studies, medical sequencing, molecular diagnostic assays, and associations between genotype and non-clinical traits.

  20. Data from: Metagenomic and near full-length 16S rRNA sequence data in...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +1more
    bin
    Updated Feb 9, 2024
    Cite
    Phillip R. Myer; MinSeok Kim; Harvey C. Freetly; Timothy P.L. Smith (2024). Data from: Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_Metagenomic_and_near_full-length_16S_rRNA_sequence_data_in_support_of_the_phylogenetic_analysis_of_the_rumen_bacterial_community_in_steers/24852534
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Data in Brief
    Authors
    Phillip R. Myer; MinSeok Kim; Harvey C. Freetly; Timothy P.L. Smith
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Amplicon sequencing utilizing next-generation platforms has significantly transformed how research is conducted, specifically in microbial ecology. However, primer and sequencing platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data is provided from 16S rRNA rumen community analyses. Specifically, data from the relative phylum-level abundances for the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundance of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided. Data is within this article and raw ruminal MiSeq sequence data is available from the NCBI Sequence Read Archive (SRA Accession SRP047292). Additional descriptive information is associated with NCBI BioProject PRJNA261425, http://www.ncbi.nlm.nih.gov/bioproject/PRJNA261425/

    Resources in this dataset:

    Resource Title: NCBI Sequence Read Archive (SRA Accession SRP047292).
    File Name: Web Page, url: https://www.ncbi.nlm.nih.gov/sra/SRX704260

    1 ILLUMINA (Illumina MiSeq) run: 978,195 spots, 532.9M bases, 311.6Mb downloads.
