100+ datasets found
  1. Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera...

    • ri.conicet.gov.ar
    • datosdeinvestigacion.conicet.gov.ar
    Updated May 7, 2024
    Cite
    Rozadilla, Gastón; Mccarthy, Cristina Beryl (2024). Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province) [Dataset]. https://ri.conicet.gov.ar/handle/11336/234791
    Dataset updated
    May 7, 2024
    Authors
    Rozadilla, Gastón; Mccarthy, Cristina Beryl
    Area covered
    Tucumán Province
    Description

    Spodoptera frugiperda is a noctuid moth that devastates various crops including corn, rice and cotton, and is found in most of the American continent. The purpose of this study was to integrate gene expression data from S. frugiperda guts and their associated metatranscriptomes, under natural and controlled conditions. For this, four S. frugiperda samples from the province of Tucumán (Argentina; subtropical region) were analysed. Specimens were obtained from different environments, altitudes and food sources, namely: 1) a transgenic maize (Zea mays) field at 495 m.a.s.l. where insecticides and fertilisers were applied (named MM; 26°49’50”S; 65°16’59.4”W); 2) Sorghum halepense at 495 m.a.s.l. (MS; 26°49’50”S; 65°16’59.4”W); 3) a maize field at 2283 m.a.s.l. where no insecticides or fertilisers were used (TV; 26°55’40.75”S; 65°45’19.90”W); and 4) a colony established from larvae originally collected from the same transgenic maize field as Sf_MM, reared for 9 generations under controlled conditions on an artificial diet adapted from [8], without the addition of antibiotics (BT). For all samples, total RNA extracted from fifth instar larvae guts (two digestive tracts per sample) was submitted to a modified one-step reverse transcription and polymerase chain reaction sequence-independent amplification procedure, as described previously. High-throughput pyrosequencing of the samples was performed using a Roche GS FLX (Macrogen Inc., Korea), yielding ~1 Gb of metatranscriptomic reads with lengths of 50 to 1600 bases (nt) (652 nt average). Raw sequence reads were trimmed to remove nucleotides derived from the amplification primers using a custom application.
    Below follows an outline of the main steps we followed to create the uploaded databases:

    I. Sequences were compared locally to a combined nucleotide database (nt16SLep = “Non-redundant” nucleotide sequence (nt) database + 16S rRNA gene (16S) database + Lepidopteran whole genome shotgun (Lep) projects completed at the time of the analysis) using BLASTN (Altschul et al., 1990) with a 1e-50 cutoff E-value, and to the protein database (nr = non-redundant protein sequence) using Diamond (Buchfink et al., 2014) with a 1e-17 cutoff E-value.

    II. The homology search results were then processed as follows:

    Step A: The output files from both homology searches were processed with MEGAN, a software which performs taxonomic binning and assigns sequences to taxa using the Lowest Common Ancestor (LCA)-assignment algorithm (Huson et al., 2007). Taxonomic and functional assignments performed by MEGAN for each sequence were then exported using a MEGAN functionality.

    Note: MEGAN computes a “species profile” by finding the lowest node in the NCBI taxonomy that encompasses the set of hit taxa and assigns the sequence to the taxon represented by that lowest node. With this approach, every sequence is assigned to some taxon; if the sequence aligns very specifically only to a single taxon, then it is assigned to that taxon; the less specifically a sequence hits taxa, the higher up in the taxonomy it is placed.

    Step B: The output files from both homology searches were also processed with a custom bash script. This script parses the homology search output files and generates two files (one for each homology search) containing the name of each sequence, its best hit (or no hit) and the corresponding E-value.

    III. Create local database (Step C): All this information (from the exported MEGAN files and from the bash script output files) was then used to create a local SQLite database which included all the available information for each sequence (from both homology searches).
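As a rough illustration of Steps B and C, the sketch below picks the best hit per sequence from a homology search output and loads the results into SQLite. It is not the authors' script (they used a custom bash script). It assumes the searches were written in the standard 12-column tabular format, where column 11 is the E-value, and the table and column names (`reads`, `blastn_hit`, `diamond_hit`) are hypothetical.

```python
import sqlite3


def best_hits(tabular_text):
    """Return {query_id: (subject_id, evalue)}, keeping the lowest E-value per query.

    Assumes 12-column tab-separated output: query in column 1,
    subject in column 2, E-value in column 11.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def build_db(conn, read_ids, blastn_hits, diamond_hits):
    """Create one row per sequence; a missing hit is stored as NULL ("no hit")."""
    conn.execute(
        """CREATE TABLE reads (
               read_id        TEXT PRIMARY KEY,
               blastn_hit     TEXT,
               blastn_evalue  REAL,
               diamond_hit    TEXT,
               diamond_evalue REAL)"""
    )
    for rid in read_ids:
        n_hit, n_ev = blastn_hits.get(rid, (None, None))
        p_hit, p_ev = diamond_hits.get(rid, (None, None))
        conn.execute(
            "INSERT INTO reads VALUES (?, ?, ?, ?, ?)",
            (rid, n_hit, n_ev, p_hit, p_ev),
        )
    conn.commit()


if __name__ == "__main__":
    # Made-up example records, not data from the study.
    blastn_text = (
        "read1\tsubjA\t99.0\t200\t1\t0\t1\t200\t1\t200\t1e-60\t300\n"
        "read1\tsubjB\t99.5\t200\t1\t0\t1\t200\t1\t200\t1e-80\t350\n"
    )
    conn = sqlite3.connect(":memory:")
    build_db(conn, ["read1", "read2"], best_hits(blastn_text), {})
    for row in conn.execute("SELECT * FROM reads ORDER BY read_id"):
        print(row)
```

Keeping one row per sequence, with NULL for a missing hit, means sequences without any match remain queryable alongside those with hits from one or both searches.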

  2. Transcriptomic databases.

    • plos.figshare.com
    xls
    Updated Jun 19, 2023
    Cite
    Rohan Lowe; Neil Shirley; Mark Bleackley; Stephen Dolan; Thomas Shafee (2023). Transcriptomic databases. [Dataset]. http://doi.org/10.1371/journal.pcbi.1005457.t005
    Available download formats: xls
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rohan Lowe; Neil Shirley; Mark Bleackley; Stephen Dolan; Thomas Shafee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transcriptomic databases.

  3. Transcriptomic databases and qRT-PCR dataset

    • figshare.com
    xlsx
    Updated Apr 12, 2024
    Cite
    David Dannhauser; Sabrina Napoletano (2024). Transcriptomic databases and qRT-PCR dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25592961.v1
    Available download formats: xlsx
    Dataset updated
    Apr 12, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David Dannhauser; Sabrina Napoletano
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for the calculation of a universal and minimized miRNA signature.

  4. Statistics on the reference transcriptomic database.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 12, 2012
    Cite
    Coustau, Christine; Duval, David; Reichhart, Jean-Marc; Wajnberg, Eric; Dubreuil, Géraldine; Gourbal, Benjamin; Deleury, Emeline; Elangovan, Namasivayam; Gouzy, Jérôme; Baron, Olga Lucia (2012). Statistics on the reference transcriptomic database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001127306
    Dataset updated
    Mar 12, 2012
    Authors
    Coustau, Christine; Duval, David; Reichhart, Jean-Marc; Wajnberg, Eric; Dubreuil, Géraldine; Gourbal, Benjamin; Deleury, Emeline; Elangovan, Namasivayam; Gouzy, Jérôme; Baron, Olga Lucia
    Description

    Statistics on the reference transcriptomic database.

  5. Human Transcriptome Database for Alternative Splicing

    • neuinfo.org
    • dknet.org
    Updated Jan 29, 2022
    Cite
    (2022). Human Transcriptome Database for Alternative Splicing [Dataset]. http://identifiers.org/RRID:SCR_013305
    Dataset updated
    Jan 29, 2022
    Description

    A specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. H-DBAS offers unique data and a viewer for human alternative splicing (AS) analysis. It contains:

    • Genome-wide representative alternative splicing variants (RASVs) identified from the following datasets:
        • H-Inv full-length cDNAs (resource summary): H-Invitational cDNA dataset
        • H-Inv all transcripts (resource summary): Published human mRNA dataset
        • Mouse full-length cDNAs (resource summary): Mouse cDNA dataset
    • RASVs affecting protein functions such as protein motif, GO, subcellular localization signal and transmembrane domain
    • Conserved RASVs compared with mouse genome and the full-length cDNAs (H-Inv full-length cDNAs only)

  6. The thirty-seven differentially expressed genes identified from the three...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 30, 2023
    Cite
    Xin-jie Tian; Yan Long; Jiao Wang; Jing-wen Zhang; Yan-yan Wang; Wei-min Li; Yu-fa Peng; Qian-hua Yuan; Xin-wu Pei (2023). The thirty-seven differentially expressed genes identified from the three libraries. [Dataset]. http://doi.org/10.1371/journal.pone.0131455.t006
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xin-jie Tian; Yan Long; Jiao Wang; Jing-wen Zhang; Yan-yan Wang; Wei-min Li; Yu-fa Peng; Qian-hua Yuan; Xin-wu Pei
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    —, no hits in the specific database. The thirty-seven differentially expressed genes identified from the three libraries.

  7. Table1_Preclinical species gene expression database: Development and...

    • datasetcatalog.nlm.nih.gov
    Updated Jan 17, 2023
    Cite
    Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G. (2023). Table1_Preclinical species gene expression database: Development and meta-analysis.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001025224
    Dataset updated
    Jan 17, 2023
    Authors
    Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G.
    Description

    The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited due to our understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.

  8. Characteristics and details of four primary transcriptome databases.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 7, 2014
    Cite
    Tan, Xue-Mei; Tao, Xiang; Wang, Haiyan; Lai, Xian-Jun; Zhang, Yi-Zheng; Gu, Ying-Hong; Yan, Lang (2014). Characteristics and details of four primary transcriptome databases. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001191775
    Dataset updated
    Mar 7, 2014
    Authors
    Tan, Xue-Mei; Tao, Xiang; Wang, Haiyan; Lai, Xian-Jun; Zhang, Yi-Zheng; Gu, Ying-Hong; Yan, Lang
    Description

    XS18-v: transcriptome from a mixed sample of roots, stems and leaves in cultivar Xushu 18. XS18-f: transcriptome from flowers in cultivar Xushu 18. GS87-r: transcriptome from roots in cultivar Guangshu 87. JS6-r: transcriptome from roots in cultivar Jingshu 6.

  9. Integration of Proteomics and Transcriptomics Data Sets for the Analysis of...

    • acs.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes (2023). Integration of Proteomics and Transcriptomics Data Sets for the Analysis of a Lymphoma B‑Cell Line in the Context of the Chromosome-Centric Human Proteome Project [Dataset]. http://doi.org/10.1021/acs.jproteome.5b00474.s001
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A comprehensive study of the molecular active landscape of human cells can be undertaken to integrate two different but complementary perspectives: transcriptomics, and proteomics. After the genome era, proteomics has emerged as a powerful tool to simultaneously identify and characterize the compendium of thousands of different proteins active in a cell. Thus, the Chromosome-centric Human Proteome Project (C-HPP) is promoting a full characterization of the human proteome combining high-throughput proteomics with the data derived from genome-wide expression profiling of protein-coding genes. Here we present a full proteomic profiling of a human lymphoma B-cell line (Ramos) performed using a nanoUPLC-LTQ-Orbitrap Velos proteomic platform, combined with an in-depth transcriptomic profiling of the same cell type. Data are available via ProteomeXchange with identifier PXD001933. Integration of the proteomic and transcriptomic data sets revealed a 94% overlap in the proteins identified by both -omics approaches. Moreover, functional enrichment analysis of the proteomic profiles showed an enrichment of several functions directly related to the biological and morphological characteristics of B-cells. In turn, about 30% of all protein-coding genes present in the whole human genome were identified as being expressed by the Ramos cells (stable average of 30% genes along all the chromosomes), revealing the size of the protein expression-set present in one specific human cell type. Additionally, the identification of missing proteins in our data sets has been reported, highlighting the power of the approach. Also, a comparison between neXtProt and UniProt database searches has been performed. In summary, our transcriptomic and proteomic experimental profiling provided a high coverage report of the expressed proteome from a human lymphoma B-cell type with a clear insight into the biological processes that characterized these cells. In this way, we demonstrated the usefulness of combining -omics for a comprehensive characterization of specific biological systems.

  10. Transcriptome data in seven databases in the annotate success rate...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 8, 2019
    Cite
    Meng, Xianhong; Kong, Jie; Luan, Sheng; Li, Xupeng; Shi, Xiaoli; Chen, Baolong; Dong, Lijun; Sui, Juan; Cao, Baoxiang; Luo, Kun; Cao, Jiawang (2019). Transcriptome data in seven databases in the annotate success rate statistics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000089760
    Dataset updated
    Apr 8, 2019
    Authors
    Meng, Xianhong; Kong, Jie; Luan, Sheng; Li, Xupeng; Shi, Xiaoli; Chen, Baolong; Dong, Lijun; Sui, Juan; Cao, Baoxiang; Luo, Kun; Cao, Jiawang
    Description

    Transcriptome data in seven databases in the annotate success rate statistics.

  11. Migratory Locust EST Database

    • neuinfo.org
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). Migratory Locust EST Database [Dataset]. http://identifiers.org/RRID:SCR_008201
    Dataset updated
    Jan 29, 2022
    Description

    The migratory locust (Locusta migratoria) is an orthopteran pest and a representative member of hemimetabolous insects. Its transcriptomic data provide invaluable information for molecular entomology study of the insect and pave the way for comparative studies of other medically, agronomically, and ecologically relevant insects. This first transcriptomic database of the locust (LocustDB) has been developed, building necessary infrastructures to integrate, organize, and retrieve data that are either currently available or to be acquired in the future. It currently hosts 45,474 high quality EST sequences from the locust, which were assembled into 12,161 unigenes. This database contains original sequence data, including homologous/orthologous sequences, functional annotations, pathway analysis, and codon usage, based on conserved orthologous groups (COG), gene ontology (GO), protein domain (InterPro), and functional pathways (KEGG). It also provides information from comparative analysis based on data from the migratory locust and five other invertebrate species, such as the silkworm, the honeybee, the fruitfly, the mosquito and the nematode. It starts with the first transcriptome information for an orthopteran and hemimetabolous insect and will be extended to provide a framework for incorporation of incoming genomic data of relevant insect groups and a workbench for cross-species comparative studies.

  12. Data from: CrusTome: A transcriptome database resource for large-scale...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 15, 2023
    Cite
    Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L. (2023). CrusTome: A transcriptome database resource for large-scale analyses across Crustacea [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7730439
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution
    Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, & Department of Biological Sciences and Institute of Environment, Florida International University
    Department of Biology, University of Oklahoma
    Department of Biology, Colorado State University
    Authors
    Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L.
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CrusTome_v0.1.0 Prerelease

    /ReadMe - this file
    /crustome_aa_BLAST.tar.gz - CrusTome database of amino acid sequences in BLAST format
    /crustome_aa_DIAMOND.tar.gz - CrusTome database of amino acid sequences in DIAMOND format
    /crustome_mrna_BLAST.tar.gz - CrusTome database of mRNA sequences in BLAST format
    /dict - Dictionary file to translate species IDs. For usage with sed/awk see link to GitHub site below

    Please note, most of the data files contained in this DOI are compressed into GZip files (.gz extension). Mac and Linux OSs can extract this file type natively. Windows OS requires software to extract the archive. 7-Zip (http://www.7-zip.org) is free and open source software that will allow Windows PCs to open and decompress the archive.
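For readers who prefer a platform-independent route, a `.tar.gz` archive can also be unpacked with Python's standard `tarfile` module. The sketch below is only an illustration: the function name `extract_archive` and the paths in the usage line are hypothetical, and the internal layout of the CrusTome archives is not described in this record. Only extract archives from sources you trust.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive and list the extracted files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    # Return relative paths of all regular files now under dest.
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    import sys

    # Usage: python extract.py <archive.tar.gz> <output_dir>
    if len(sys.argv) == 3:
        for name in extract_archive(sys.argv[1], sys.argv[2]):
            print(name)
```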

    Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

    CrusTome: A transcriptome database resource for large-scale analyses across Crustacea

    Transcriptomes from non-traditional model organisms often harbor a wealth of unexplored data. Examining these datasets can lead to clarity and novel insights in traditional systems, as well as to discoveries across a multitude of fields. Despite significant advances in DNA sequencing technologies and in their adoption, access to genomic and transcriptomic resources for non-traditional model organisms remains limited. Crustaceans, for example, being amongst the most numerous, diverse, and widely distributed taxa on the planet, often serve as excellent systems to address ecological, evolutionary, and organismal questions. While they are ubiquitously present across environments, and of economic and food security importance, they remain severely underrepresented in publicly available sequence databases. Here, we present CrusTome, a multi-species, multi-tissue, transcriptome database of 201 assembled mRNA transcriptomes (189 crustaceans, 30 of which were previously unpublished, and 12 ecdysozoan outgroups) as an evolving, and publicly available resource. This database is suitable for evolutionary, ecological, and functional studies that employ genomic/transcriptomic techniques and datasets. CrusTome is presented in BLAST and DIAMOND formats, providing robust datasets for sequence similarity searches, orthology assignments, phylogenetic inference, etc., and thus allowing for straight-forward incorporation into existing custom pipelines for high-throughput analyses.

    For questions regarding released datasets contact: Corresponding Author: Jorge L. Perez-Moreno (Colorado State University) jorgepm@colostate.edu / jpere645@fiu.edu

    https://github.com/invertome/crustome

    Funder Information

    Supported by National Science Foundation grants to DLM (IOS-1922701) and DSD (IOS-1922755). In addition, this work was partially funded by two grants awarded from the National Science Foundation: Doctoral Dissertation Improvement Grant (#1701835) awarded to JPM and HBG and the Division of Environmental Biology Bioluminescence and Vision grant (DEB-1556059) awarded to HBG. Samples in the FICC were collected by grants from The Gulf of Mexico Research Initiative (GOMRI), Florida Institute of Oceanography Shiptime Funding awarded to HBG and DMD; the National Science Foundation Division of Environmental Biology Grant 1556059 awarded to HBG; and the National Oceanic and Atmospheric Administration Ocean Exploration Research (NOAA-OER 2015) grant awarded to HBG.

  13. DataSheet3_Preclinical species gene expression database: Development and...

    • datasetcatalog.nlm.nih.gov
    Updated Jan 17, 2023
    Cite
    Liguori, Michael J.; Blomme, Eric A. G.; Mittelstadt, Scott; Van Vleet, Terry R.; Krause, Caitlin; Mahalingaiah, Prathap Kumar; Rendino, Lauren; Peterson, Richard; Kowalkowski, Kenneth; Suwada, Kinga; Vo, Andy (2023). DataSheet3_Preclinical species gene expression database: Development and meta-analysis.csv [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001025205
    Dataset updated
    Jan 17, 2023
    Authors
    Liguori, Michael J.; Blomme, Eric A. G.; Mittelstadt, Scott; Van Vleet, Terry R.; Krause, Caitlin; Mahalingaiah, Prathap Kumar; Rendino, Lauren; Peterson, Richard; Kowalkowski, Kenneth; Suwada, Kinga; Vo, Andy
    Description

    The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited due to our understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.

  14. Additional file 1: of Bridging the gap between reference and real...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 31, 2023
    Cite
    Antonin Morillon; Daniel Gautheret (2023). Additional file 1: of Bridging the gap between reference and real transcriptomes [Dataset]. http://doi.org/10.6084/m9.figshare.8223203.v1
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antonin Morillon; Daniel Gautheret
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Table S1. Overview of major eukaryotic transcriptome databases. Table S2. Large-scale RNA-seq projects (human). Table S3. Sequencing methods providing insight on specific events shown in Fig 2. Table S4. Transcript variations related to cancer and other diseases; and software for retrieving these variations from RNA-seq data. (XLSX 18 kb)

  15. Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer...

    • search.dataone.org
    • datadryad.org
    Updated Oct 23, 2025
    Cite
    Shreyansh Priyadarshi; Camellia Mazumder; Sayan Biswas; Bhavesh Neekhra; Debayan Gupta; Shubhasis Haldar (2025). Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks [Dataset]. http://doi.org/10.5061/dryad.zw3r228jc
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Shreyansh Priyadarshi; Camellia Mazumder; Sayan Biswas; Bhavesh Neekhra; Debayan Gupta; Shubhasis Haldar
    Description

    Evidence before this study

    We conducted an extensive literature search using Google Scholar without language restrictions, employing search terms such as “(Predicting OR Classifying OR Annotating) and (cancer hallmarks) AND (Deep OR Machine Learning) OR (Artificial Intelligence OR AI).” Despite notable advances in molecular oncology and computational methodologies, a critical gap remains: no existing machine learning or deep learning framework comprehensively predicts cancer hallmarks from tumor biopsy samples. Current research primarily targets specific molecular pathways associated with individual hallmarks, leaving clinicians without an integrated model to interpret hallmark activity at the level of an individual tumor. Moreover, the absence of wet-lab techniques capable of annotating all cancer hallmarks in biopsy samples has further impeded progress, limiting the clinical utility of hallmark-related insights for precision oncology.

    Added value of this study

    This study introdu...

    Dataset Collection and Processing

    We utilized a large-scale dataset comprising 2.7 million single-cell transcriptomes derived from 14 tumor types, collected from 922 patients across 51 independent studies conducted globally. This dataset was sourced from the Weizmann Institute's 3CA repository.

    Quality Control

    Before generating synthetic datasets for model training, the raw single-cell transcriptomic data underwent a rigorous quality control (QC) process. Cells with over 15% mitochondrial transcript content, fewer than 200, or more than 6,000 expressed mRNA transcripts were excluded to ensure data reliability.

    Gene Set Curation

    Gene sets representing cancer hallmarks were compiled from multiple databases, retaining only genes identified in at least two independent sources. This selection was refined through manual literature reviews to exclude genes without direct or indirect roles in hallmark-related pathways.

    Digital Scoring

    Using the curated gene sets, Digital Scores were...

    Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks

    https://doi.org/10.5061/dryad.zw3r228jc

    Description of the data and file structure

    Data Description: Experimental Efforts

    This dataset comprises single-cell transcriptomic data from the Weizmann 3CA repository, encompassing 2.7 million single-cell transcriptomes from 14 tumor types, collected from 922 patients across 51 global studies. The primary objective of the experimental efforts was to generate synthetic datasets for training and validating computational models to identify and analyze cancer hallmarks at the single-cell resolution.

    Single-cell RNA sequencing (scRNA-seq) data underwent a rigorous quality control process to ensure reliability and biological relevance. This included exclusion criteria based on mitochondrial transcript content (>15%) and mRNA transcript counts (<200 or >6,000 transcripts). Gene sets corresponding to 10 estab...
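The exclusion criteria quoted above can be written as a simple per-cell filter. This is a minimal sketch, not the authors' pipeline: the function and field names are hypothetical, and it assumes each cell is summarised by a count of expressed mRNA transcripts and a mitochondrial fraction between 0 and 1. The record does not say exactly how those quantities were computed.

```python
def passes_qc(n_transcripts, mito_fraction,
              min_transcripts=200, max_transcripts=6000, max_mito=0.15):
    """Keep a cell unless it has >15% mitochondrial content,
    fewer than 200, or more than 6,000 expressed transcripts."""
    if mito_fraction > max_mito:
        return False
    return min_transcripts <= n_transcripts <= max_transcripts


def filter_cells(cells):
    """Return only the cells (dicts with hypothetical keys) that pass QC."""
    return [
        c for c in cells
        if passes_qc(c["n_transcripts"], c["mito_fraction"])
    ]


if __name__ == "__main__":
    # Made-up example cells, not values from the dataset.
    cells = [
        {"id": "cell_a", "n_transcripts": 1500, "mito_fraction": 0.05},
        {"id": "cell_b", "n_transcripts": 150, "mito_fraction": 0.02},
        {"id": "cell_c", "n_transcripts": 7000, "mito_fraction": 0.03},
        {"id": "cell_d", "n_transcripts": 2500, "mito_fraction": 0.30},
    ]
    print([c["id"] for c in filter_cells(cells)])
```

Cells sitting exactly on a threshold (200 or 6,000 transcripts, 15% mitochondrial content) are kept here, since the description excludes only values beyond those limits.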

  16. Data from: BBGD454: an Online Database for Blueberry Genomic Data...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). BBGD454: an Online Database for Blueberry Genomic Data Transcriptome analysis of Blueberry using 454 EST sequencing [Dataset]. https://catalog.data.gov/dataset/bbgd454-an-online-database-for-blueberry-genomic-data-transcriptome-analysis-of-blueberry--5783e
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    NOTE: This dataset is no longer publicly available. This database houses over 500,000 sequences that were generated and assembled into approximately 15,000 contigs, annotated and functionally mapped to Gene Ontology (GO) terms. Blueberry (Vaccinium corymbosum) is a major berry crop in the United States. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities during an organism’s developmental stage(s) or its response to biotic or abiotic stresses. This sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. We have applied a high-throughput pyrosequencing technology (454 EST sequencing) for transcriptome profiling of blueberry during different stages of fruit development to gain an understanding of the genes that are up- or down-regulated during this process. We have also sequenced flower buds at four different stages of cold acclimation to gain a better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation, since extreme low temperatures are known to reduce crop yield and cause major losses to US farmers. We have also sequenced a leaf sample to compare its transcriptome profile with that of bud and fruit samples. Over 500,000 sequences were generated and assembled into approximately 15,000 contigs and were annotated and functionally mapped to Gene Ontology (GO) terms. A database was developed to house these sequences and their annotations. A web-based interface was also developed to allow collaborators to search and browse the data and to aid in its analysis and interpretation. The availability of these sequences will allow for future advances, such as the development of a blueberry microarray to study gene expression, and will aid in the blueberry genome sequencing effort that is underway. 
This work was supported by grant 2008-51180-04861 from the USDA - Cooperative State Research, Education, and Extension Service (CSREES) Specialty Crop Research Initiative program.

  17. Data from: BaRTv1.0: an improved barley reference transcript dataset to...

    • zenodo.org
    • osti.gov
    zip
    Updated Jan 24, 2020
    Cite
    Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson (2020). BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq [Dataset]. http://doi.org/10.5281/zenodo.3360434
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background
    Computational assembly, quantification of gene expression and splicing analysis from RNA-seq data are time-consuming and vary considerably. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but they require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants.

    Results
    A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts – BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, with transcripts assembled and aligned to the barley cv. Morex reference genome (Mascher et al., 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al., 2011) determined transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in the 5’ and 3’ UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data of five barley organs/tissues. This analysis identified 20,972 significant differentially expressed genes, 2,791 differentially alternatively spliced genes and 2,768 transcripts with differential transcript usage.

    Conclusion
    A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, are less fragmented and have improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.

  18. s

    Cerebellar Development Transcriptome Database

    • scicrunch.org
    • dknet.org
    • +2more
    Updated Oct 17, 2019
    Cite
    (2019). Cerebellar Development Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_013096
    Explore at:
    Dataset updated
    Oct 17, 2019
    Description

    Transcriptomic information (spatiotemporal gene expression profile data) on the postnatal cerebellar development of mice (C57B/6J & ICR). It is a tool for mining cerebellar genes and gene expression, and provides a portal to relevant bioinformatics links. The mouse cerebellar circuit develops through a series of cellular and morphological events, including neuronal proliferation and migration, axonogenesis, dendritogenesis, and synaptogenesis, all within three weeks after birth, and each event is controlled by a specific gene group whose expression profile must be encoded in the genome. To elucidate the genetic basis of cerebellar circuit development, CDT-DB analyzes spatiotemporal gene expression by using in situ hybridization (ISH) for cellular resolution and by using fluorescence differential display and microarrays (GeneChip) for developmental time series resolution. The CDT-DB not only provides a cross-search function for large amounts of experimental data (ISH brain images, GeneChip graph, RT-PCR gel images), but also includes a portal function by which all registered genes have been provided with hyperlinks to websites of many relevant bioinformatics regarding gene ontology, genome, proteins, pathways, cell functions, and publications. Thus, the CDT-DB is a useful tool for mining potentially important genes based on characteristic expression profiles in particular cell types or during a particular time window in developing mouse brains.

  19. n

    LegumeIP

    • neuinfo.org
    Cite
    LegumeIP [Dataset]. http://identifiers.org/RRID:SCR_008906
    Explore at:
    Description

    LegumeIP is an integrative database and bioinformatics platform for comparative genomics and transcriptomics to facilitate the study of gene function and genome evolution in legumes, and ultimately to generate molecular-based breeding tools to improve the quality of crop legumes. LegumeIP currently hosts large-scale genomics and transcriptomics data, including:
    * Genomic sequences of three model legumes, i.e. Medicago truncatula, Glycine max (soybean) and Lotus japonicus, and two reference plant species, Arabidopsis thaliana and Populus trichocarpa, with annotation based on the UniProt TrEMBL, InterProScan, Gene Ontology and KEGG databases. LegumeIP covers a total of 222,217 protein-coding gene sequences.
    * Large-scale gene expression data compiled from 104 array hybridizations from L. japonicus, 156 array hybridizations from the M. truncatula gene atlas database, and 14 RNA-Seq-based gene expression profiles from G. max on different tissues, including four common tissues: Nodule, Flower, Root and Leaf.
    * Systematic synteny analysis among M. truncatula, G. max, L. japonicus and A. thaliana.
    * Reconstruction of gene families and gene family-wide phylogenetic analysis across the five hosted species.
    LegumeIP features comprehensive search and visualization tools to enable flexible queries on gene annotation, gene family, synteny and relative abundance of gene expression.

  20. n

    Microarray DB

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). Microarray DB [Dataset]. http://identifiers.org/RRID:SCR_008525
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    A tool for mapping transcriptome data and for creating a database with an overview of the entire pathway, a web-based resource consisting of a web-application for the visualization of complex omics data onto KEGG pathways to overview all entities in the context of cellular pathways, and databases created with the software to visualize a series of microarray data. The web-application accepts transcriptome, proteome, metabolome, or the combination of these data as input, and because of this scalability it is advantageous for the visualization of cell simulation results. Several databases of transcriptome data obtained at Mori Laboratory, Nara Institute of Science and Technology, Japan, are also presented.

Cite
Rozadilla, Gastón; Mccarthy, Cristina Beryl (2024). Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province) [Dataset]. https://ri.conicet.gov.ar/handle/11336/234791

Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province)

Explore at:
Dataset updated
May 7, 2024
Authors
Rozadilla, Gastón; Mccarthy, Cristina Beryl
Area covered
Tucumán Province
Description

Spodoptera frugiperda is a noctuid moth that devastates various crops including corn, rice and cotton, and is found in most of the American continent. The purpose of this study was to integrate gene expression data from S. frugiperda guts and their associated metatranscriptomes, under natural and controlled conditions. For this, four S. frugiperda samples from the province of Tucumán (Argentina; subtropical region) were analysed. Specimens were obtained from different environments, altitudes and food sources, namely: 1) a transgenic maize (Zea mays) field at 495 m.a.s.l. where insecticides and fertilisers were applied (named MM; 26°49’50”S; 65°16’59.4”W); 2) Sorghum halepense at 495 m.a.s.l. (MS; 26°49’50”S; 65°16’59.4”W); 3) a maize field at 2283 m.a.s.l. where no insecticides or fertilisers were used (TV; 26°55’40.75”S; 65°45’19.90”W); and 4) a colony established from larvae originally collected from the same transgenic maize field as Sf_MM, reared for 9 generations under controlled conditions on an artificial diet adapted from [8], without the addition of antibiotics (BT). For all samples, total RNA extracted from fifth instar larvae guts (two digestive tracts per sample), was submitted to a modified one-step reverse transcription and polymerase chain reaction sequence-independent amplification procedure, as described previously. High-throughput pyrosequencing of the samples was performed using a Roche GS FLX (Macrogen Inc., Korea), yielding ~1 Gb of metatranscriptomic reads with lengths of 50 to 1600 bases (nt) (652 nt average). Raw sequence reads were trimmed to remove nucleotides derived from the amplification primers using a custom application. 
Below follows an outline of the main steps we followed to create the uploaded databases:

I. Sequences were compared locally to a combined nucleotide database (nt16SLep = “Non-redundant” nucleotide sequence (nt) database + 16S rRNA gene (16S) database + Lepidopteran whole genome shotgun (Lep) projects completed at the time of the analysis) using BLASTN (Altschul et al., 1990) with a 1e-50 cutoff E-value, and to the protein database (nr = non-redundant protein sequence) using Diamond (Buchfink et al., 2014) with a 1e-17 cutoff E-value.

II. The homology search results were then processed as follows:

Step A: The output files from both homology searches were processed with MEGAN, a software which performs taxonomic binning and assigns sequences to taxa using the Lowest Common Ancestor (LCA)-assignment algorithm (Huson et al., 2007). Taxonomic and functional assignments performed by MEGAN for each sequence were then exported using a MEGAN functionality.

Note: MEGAN computes a “species profile” by finding the lowest node in the NCBI taxonomy that encompasses the set of hit taxa and assigns the sequence to the taxon represented by that lowest node. With this approach, every sequence is assigned to some taxon; if the sequence aligns very specifically only to a single taxon, then it is assigned to that taxon; the less specifically a sequence hits taxa, the higher up in the taxonomy it is placed.

Step B: The output files from both homology searches were also processed with a custom bash script. This script parses the homology search output files and generates two files (one for each homology search) containing the name of each sequence, its best hit (or no hit) and the corresponding E-value.

III. Create local database (Step C): All this information (from the exported MEGAN files and from the bash script output files) was then used to create a local SQLite database which included all the available information for each sequence (from both homology searches).
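
As an illustration of Step B and Step C, the following Python sketch extracts the best hit per sequence from tabular homology-search output and loads the result into SQLite. It is a stand-in written for clarity, not the authors' bash script or database schema. It assumes the searches were saved in the 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11) that both BLASTN and Diamond can produce; the description does not state which output format was used. The function names, table name and column names are hypothetical.

```python
import sqlite3


def best_hits(tabular_lines, max_evalue):
    """Return {query_id: (subject_id, evalue)}, keeping the lowest E-value per query.

    Assumes 12-column tabular output: query ID, subject ID, ..., E-value in
    column 11 (index 10). Hits above max_evalue are ignored.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def build_database(conn, read_ids, nt_hits, nr_hits):
    """Store one row per read with its best nucleotide and protein hit (or 'no hit')."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "read_id TEXT PRIMARY KEY, "
        "nt_best_hit TEXT, nt_evalue REAL, "
        "nr_best_hit TEXT, nr_evalue REAL)"
    )
    rows = []
    for read_id in read_ids:
        nt_subject, nt_evalue = nt_hits.get(read_id, ("no hit", None))
        nr_subject, nr_evalue = nr_hits.get(read_id, ("no hit", None))
        rows.append((read_id, nt_subject, nt_evalue, nr_subject, nr_evalue))
    conn.executemany("INSERT OR REPLACE INTO reads VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
```

With the cutoffs given in step I, `best_hits` would be called with `max_evalue=1e-50` for the BLASTN output and `max_evalue=1e-17` for the Diamond output. The taxonomic and functional assignments exported from MEGAN in Step A could be added as further columns keyed on the same read ID.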
