100+ datasets found

Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera...
ri.conicet.gov.ar
datosdeinvestigacion.conicet.gov.ar
Updated May 7, 2024
Rozadilla, Gastón; Mccarthy, Cristina Beryl (2024). Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province) [Dataset]. https://ri.conicet.gov.ar/handle/11336/234791
Dataset updated
May 7, 2024
Authors
Rozadilla, Gastón; Mccarthy, Cristina Beryl
Area covered
Tucumán Province
Description
Spodoptera frugiperda is a noctuid moth that devastates various crops, including corn, rice and cotton, and is found throughout most of the Americas. The purpose of this study was to integrate gene expression data from S. frugiperda guts and their associated metatranscriptomes, under natural and controlled conditions. For this, four S. frugiperda samples from the province of Tucumán (Argentina; subtropical region) were analysed. Specimens were obtained from different environments, altitudes and food sources, namely: 1) a transgenic maize (Zea mays) field at 495 m.a.s.l. where insecticides and fertilisers were applied (named MM; 26°49’50”S; 65°16’59.4”W); 2) Sorghum halepense at 495 m.a.s.l. (MS; 26°49’50”S; 65°16’59.4”W); 3) a maize field at 2283 m.a.s.l. where no insecticides or fertilisers were used (TV; 26°55’40.75”S; 65°45’19.90”W); and 4) a colony established from larvae originally collected from the same transgenic maize field as Sf_MM, reared for 9 generations under controlled conditions on an artificial diet adapted from [8], without the addition of antibiotics (BT). For all samples, total RNA extracted from fifth-instar larval guts (two digestive tracts per sample) was subjected to a modified one-step reverse transcription and polymerase chain reaction sequence-independent amplification procedure, as described previously. High-throughput pyrosequencing of the samples was performed on a Roche GS FLX (Macrogen Inc., Korea), yielding ~1 Gb of metatranscriptomic reads with lengths of 50 to 1,600 nucleotides (nt; 652 nt on average). Raw sequence reads were trimmed to remove nucleotides derived from the amplification primers using a custom application.
Below is an outline of the main steps we followed to create the uploaded databases:
I. Sequences were compared locally to a combined nucleotide database (nt16SLep = “Non-redundant” nucleotide sequence (nt) database + 16S rRNA gene (16S) database + Lepidopteran whole genome shotgun (Lep) projects completed at the time of the analysis) using BLASTN (Altschul et al., 1990) with an E-value cutoff of 1e-50, and to the protein database (nr = non-redundant protein sequence database) using Diamond (Buchfink et al., 2014) with an E-value cutoff of 1e-17.
II. The homology search results were then processed as follows:
Step A: The output files from both homology searches were processed with MEGAN, a software tool that performs taxonomic binning and assigns sequences to taxa using the Lowest Common Ancestor (LCA) assignment algorithm (Huson et al., 2007). The taxonomic and functional assignments made by MEGAN for each sequence were then exported using MEGAN's export functionality.
Note: MEGAN computes a “species profile” by finding the lowest node in the NCBI taxonomy that encompasses the set of hit taxa, and assigns the sequence to the taxon represented by that node. With this approach, every sequence is assigned to some taxon: if the sequence aligns very specifically to a single taxon only, it is assigned to that taxon; the less specifically a sequence hits taxa, the higher up in the taxonomy it is placed.
Step B: The output files from both homology searches were also processed with a custom bash script. This script parses the homology search output files and generates two files (one for each homology search) containing the name of each sequence, its best hit (or no hit) and the corresponding E-value.
III. Create local database (Step C): All this information (from the exported MEGAN files and from the bash script output files) was then used to create a local SQLite database containing all the available information for each sequence from both homology searches.
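The LCA placement rule described in the MEGAN note can be illustrated with a minimal Python sketch. It assumes a toy, heavily simplified taxonomy given as a child-to-parent map (many intermediate NCBI ranks are omitted and the tree is for illustration only). MEGAN itself works on the full NCBI taxonomy and applies filtering parameters such as a minimum score and a top-percent window before computing the LCA; those filters are not modelled here.

```python
from typing import Dict, List, Optional, Sequence

# Toy child -> parent map: a heavily simplified stand-in for the NCBI taxonomy
# (intermediate ranks omitted; illustration only, not real MEGAN input).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Enterobacterales": "Bacteria",
    "Escherichia coli": "Enterobacterales",
    "Klebsiella pneumoniae": "Enterobacterales",
    "Eukaryota": "root",
    "Spodoptera frugiperda": "Eukaryota",
}


def lineage(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def lowest_common_ancestor(
    hit_taxa: Sequence[str], parent: Dict[str, Optional[str]] = PARENT
) -> Optional[str]:
    """Lowest node whose subtree contains every hit taxon (None if there are no hits)."""
    if not hit_taxa:
        return None
    lineages = [lineage(t, parent) for t in hit_taxa]
    lca: Optional[str] = None
    # Walk all root-to-taxon paths in parallel and stop at the first disagreement.
    for nodes in zip(*lineages):
        if all(n == nodes[0] for n in nodes):
            lca = nodes[0]
        else:
            break
    return lca


print(lowest_common_ancestor(["Escherichia coli"]))                           # Escherichia coli
print(lowest_common_ancestor(["Escherichia coli", "Klebsiella pneumoniae"]))  # Enterobacterales
print(lowest_common_ancestor(["Escherichia coli", "Spodoptera frugiperda"]))  # root
```

A read that hits only one species stays at that species, while a read whose hits span bacteria and the insect host is pushed up to the root, matching the "less specific, higher up" behaviour described in the note.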
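Steps B and C can be sketched in Python as follows; the original pipeline used a custom bash script whose code is not given here, so this is only an approximation under stated assumptions. It assumes both BLASTN and Diamond results were written in the 12-column BLAST tabular format (outfmt 6), that the "best hit" is the one with the lowest E-value (ties broken by the higher bit score), and that the SQLite table and column names are hypothetical rather than the authors' schema.

```python
import sqlite3
from typing import Dict, Iterable, Optional, Tuple

# (subject ID, E-value); (None, None) means "no hit"
BestHit = Tuple[Optional[str], Optional[float]]


def best_hits(
    tabular_lines: Iterable[str], read_ids: Iterable[str], max_evalue: float
) -> Dict[str, BestHit]:
    """Best hit per read from 12-column BLAST tabular (outfmt 6) lines (assumed format)."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:  # enforce the search cutoff (e.g. 1e-50 or 1e-17)
            continue
        current = best.get(qseqid)
        # Lower E-value wins; on ties, the higher bit score wins.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[qseqid] = (sseqid, evalue, bitscore)
    return {
        rid: ((best[rid][0], best[rid][1]) if rid in best else (None, None))
        for rid in read_ids
    }


def build_database(
    db_path: str,
    read_ids: Iterable[str],
    nt_hits: Dict[str, BestHit],
    nr_hits: Dict[str, BestHit],
    megan_taxa: Dict[str, str],
) -> sqlite3.Connection:
    """One row per read combining both searches and the MEGAN taxon (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reads (
               read_id TEXT PRIMARY KEY,
               nt_best_hit TEXT, nt_evalue REAL,
               nr_best_hit TEXT, nr_evalue REAL,
               megan_taxon TEXT)"""
    )
    rows = [
        (rid, *nt_hits.get(rid, (None, None)), *nr_hits.get(rid, (None, None)),
         megan_taxa.get(rid))
        for rid in read_ids
    ]
    conn.executemany("INSERT OR REPLACE INTO reads VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Toy usage with made-up read and subject IDs.
reads = ["read1", "read2"]
nt = best_hits(["read1\thit_A\t98.5\t600\t9\t0\t1\t600\t1\t600\t1e-120\t1050"], reads, 1e-50)
nr = best_hits([], reads, 1e-17)
conn = build_database(":memory:", reads, nt, nr, {"read1": "Enterobacterales"})
print(conn.execute("SELECT * FROM reads ORDER BY read_id").fetchall())
```

Keeping one row per read, with NULLs for "no hit", mirrors the description that every sequence gets an entry from both homology searches.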
Transcriptomic databases.
plos.figshare.com
Updated Jun 19, 2023
Rohan Lowe; Neil Shirley; Mark Bleackley; Stephen Dolan; Thomas Shafee (2023). Transcriptomic databases. [Dataset]. http://doi.org/10.1371/journal.pcbi.1005457.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1005457.t005
Dataset updated
Jun 19, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Rohan Lowe; Neil Shirley; Mark Bleackley; Stephen Dolan; Thomas Shafee
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Transcriptomic databases.
Transcriptomic databases and qRT-PCR dataset
figshare.com
Updated Apr 12, 2024
David Dannhauser; Sabrina Napoletano (2024). Transcriptomic databases and qRT-PCR dataset [Dataset]. http://doi.org/10.6084/m9.figshare.25592961.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.25592961.v1
Dataset updated
Apr 12, 2024
Dataset provided by
figshare (http://figshare.com/)
Authors
David Dannhauser; Sabrina Napoletano
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset for the calculation of a universal and minimized miRNA signature.
Statistics on the reference transcriptomic database.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Mar 12, 2012
Coustau, Christine; Duval, David; Reichhart, Jean-Marc; Wajnberg, Eric; Dubreuil, Géraldine; Gourbal, Benjamin; Deleury, Emeline; Elangovan, Namasivayam; Gouzy, Jérôme; Baron, Olga Lucia (2012). Statistics on the reference transcriptomic database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001127306
Dataset updated
Mar 12, 2012
Authors
Coustau, Christine; Duval, David; Reichhart, Jean-Marc; Wajnberg, Eric; Dubreuil, Géraldine; Gourbal, Benjamin; Deleury, Emeline; Elangovan, Namasivayam; Gouzy, Jérôme; Baron, Olga Lucia
Description
Statistics on the reference transcriptomic database.
Human Transcriptome Database for Alternative Splicing
neuinfo.org
dknet.org
Updated Jan 29, 2022
(2022). Human Transcriptome Database for Alternative Splicing [Dataset]. http://identifiers.org/RRID:SCR_013305
Unique identifier
https://identifiers.org/RRID:SCR_013305
Dataset updated
Jan 29, 2022
Description
A specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. H-DBAS offers unique data and a viewer for human alternative splicing (AS) analysis. It contains:
* Genome-wide representative alternative splicing variants (RASVs) identified from the following datasets:
  * H-Inv full-length cDNAs (resource summary): H-Invitational cDNA dataset
  * H-Inv all transcripts (resource summary): Published human mRNA dataset
  * Mouse full-length cDNAs (resource summary): Mouse cDNA dataset
* RASVs affecting protein functions, such as protein motifs, GO terms, subcellular localization signals and transmembrane domains
* Conserved RASVs compared with the mouse genome and the full-length cDNAs (H-Inv full-length cDNAs only)
The thirty-seven differentially expressed genes identified from the three...
plos.figshare.com
datasetcatalog.nlm.nih.gov
Updated May 30, 2023
Xin-jie Tian; Yan Long; Jiao Wang; Jing-wen Zhang; Yan-yan Wang; Wei-min Li; Yu-fa Peng; Qian-hua Yuan; Xin-wu Pei (2023). The thirty-seven differentially expressed genes identified from the three libraries. [Dataset]. http://doi.org/10.1371/journal.pone.0131455.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0131455.t006
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Xin-jie Tian; Yan Long; Jiao Wang; Jing-wen Zhang; Yan-yan Wang; Wei-min Li; Yu-fa Peng; Qian-hua Yuan; Xin-wu Pei
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The thirty-seven differentially expressed genes identified from the three libraries (—: no hits in the specific database).
Table1_Preclinical species gene expression database: Development and...
datasetcatalog.nlm.nih.gov
Updated Jan 17, 2023
Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G. (2023). Table1_Preclinical species gene expression database: Development and meta-analysis.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001025224
Dataset updated
Jan 17, 2023
Authors
Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G.
Description
The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited by our incomplete understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross-species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.
Characteristics and details of four primary transcriptome databases.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Mar 7, 2014
Tan, Xue-Mei; Tao, Xiang; Wang, Haiyan; Lai, Xian-Jun; Zhang, Yi-Zheng; Gu, Ying-Hong; Yan, Lang (2014). Characteristics and details of four primary transcriptome databases. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001191775
Dataset updated
Mar 7, 2014
Authors
Tan, Xue-Mei; Tao, Xiang; Wang, Haiyan; Lai, Xian-Jun; Zhang, Yi-Zheng; Gu, Ying-Hong; Yan, Lang
Description
XS18-v: transcriptome from a mixed sample of roots, stems and leaves in cultivar Xushu 18. XS18-f: transcriptome from flowers in cultivar Xushu 18. GS87-r: transcriptome from roots in cultivar Guangshu 87. JS6-r: transcriptome from roots in cultivar Jjingshu 6.
Integration of Proteomics and Transcriptomics Data Sets for the Analysis of...
acs.figshare.com
Updated Jun 1, 2023
Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes (2023). Integration of Proteomics and Transcriptomics Data Sets for the Analysis of a Lymphoma B‑Cell Line in the Context of the Chromosome-Centric Human Proteome Project [Dataset]. http://doi.org/10.1021/acs.jproteome.5b00474.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jproteome.5b00474.s001
Dataset updated
Jun 1, 2023
Dataset provided by
ACS Publications
Authors
Paula Díez; Conrad Droste; Rosa M. Dégano; María González-Muñoz; Nieves Ibarrola; Martín Pérez-Andrés; Alba Garin-Muga; Víctor Segura; Gyorgy Marko-Varga; Joshua LaBaer; Alberto Orfao; Fernando J. Corrales; Javier De Las Rivas; Manuel Fuentes
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
A comprehensive study of the molecular active landscape of human cells can be undertaken to integrate two different but complementary perspectives: transcriptomics and proteomics. After the genome era, proteomics has emerged as a powerful tool to simultaneously identify and characterize the compendium of thousands of different proteins active in a cell. Thus, the Chromosome-centric Human Proteome Project (C-HPP) is promoting a full characterization of the human proteome combining high-throughput proteomics with the data derived from genome-wide expression profiling of protein-coding genes. Here we present a full proteomic profiling of a human lymphoma B-cell line (Ramos) performed using a nanoUPLC-LTQ-Orbitrap Velos proteomic platform, combined with an in-depth transcriptomic profiling of the same cell type. Data are available via ProteomeXchange with identifier PXD001933. Integration of the proteomic and transcriptomic data sets revealed a 94% overlap in the proteins identified by both -omics approaches. Moreover, functional enrichment analysis of the proteomic profiles showed an enrichment of several functions directly related to the biological and morphological characteristics of B-cells. In turn, about 30% of all protein-coding genes present in the whole human genome were identified as being expressed by the Ramos cells (a stable average of 30% of genes across all chromosomes), revealing the size of the protein expression set present in one specific human cell type. Additionally, the identification of missing proteins in our data sets has been reported, highlighting the power of the approach. Also, a comparison between neXtProt and UniProt database searches has been performed. In summary, our transcriptomic and proteomic experimental profiling provided a high-coverage report of the expressed proteome from a human lymphoma B-cell type, with clear insight into the biological processes that characterize these cells.
In this way, we demonstrated the usefulness of combining -omics for a comprehensive characterization of specific biological systems.
Transcriptome data in seven databases in the annotate success rate...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Apr 8, 2019
Meng, Xianhong; Kong, Jie; Luan, Sheng; Li, Xupeng; Shi, Xiaoli; Chen, Baolong; Dong, Lijun; Sui, Juan; Cao, Baoxiang; Luo, Kun; Cao, Jiawang (2019). Transcriptome data in seven databases in the annotate success rate statistics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000089760
Dataset updated
Apr 8, 2019
Authors
Meng, Xianhong; Kong, Jie; Luan, Sheng; Li, Xupeng; Shi, Xiaoli; Chen, Baolong; Dong, Lijun; Sui, Juan; Cao, Baoxiang; Luo, Kun; Cao, Jiawang
Description
Transcriptome data in seven databases in the annotate success rate statistics.
Migratory Locust EST Database
neuinfo.org
scicrunch.org
Updated Jan 29, 2022
(2022). Migratory Locust EST Database [Dataset]. http://identifiers.org/RRID:SCR_008201
Unique identifier
https://identifiers.org/RRID:SCR_008201
Dataset updated
Jan 29, 2022
Description
The migratory locust (Locusta migratoria) is an orthopteran pest and a representative member of the hemimetabolous insects. Its transcriptomic data provide invaluable information for molecular entomology studies of the insect and pave the way for comparative studies of other medically, agronomically, and ecologically relevant insects. This first transcriptomic database of the locust (LocustDB) has been developed to build the infrastructure needed to integrate, organize, and retrieve data that are either currently available or to be acquired in the future. It currently hosts 45,474 high-quality EST sequences from the locust, which were assembled into 12,161 unigenes. This database contains original sequence data, including homologous/orthologous sequences, functional annotations, pathway analysis, and codon usage, based on conserved orthologous groups (COG), gene ontology (GO), protein domains (InterPro), and functional pathways (KEGG). LocustDB also provides information from comparative analysis based on data from the migratory locust and five other invertebrate species: the silkworm, the honeybee, the fruitfly, the mosquito and the nematode. It starts with the first transcriptome information for an orthopteran and hemimetabolous insect and will be extended to provide a framework for incorporating incoming genomic data from relevant insect groups and a workbench for cross-species comparative studies.
Data from: CrusTome: A transcriptome database resource for large-scale...
data.niaid.nih.gov
zenodo.org
Updated Mar 15, 2023
Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L. (2023). CrusTome: A transcriptome database resource for large-scale analyses across Crustacea [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7730439
Dataset updated
Mar 15, 2023
Dataset provided by
Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution
Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, & Department of Biological Sciences and Institute of Environment, Florida International University
Department of Biology, Colorado State University
Department of Biology, University of Oklahoma
Authors
Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L.
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
CrusTome_v0.1.0 Prerelease
/ReadMe - this file
/crustome_aa_BLAST.tar.gz - CrusTome database of amino acid sequences in BLAST format
/crustome_aa_DIAMOND.tar.gz - CrusTome database of amino acid sequences in DIAMOND format
/crustome_mrna_BLAST.tar.gz - CrusTome database of mRNA sequences in BLAST format
/dict - Dictionary file to translate species IDs. For usage with sed/awk, see the link to the GitHub site below.
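The ReadMe refers to the GitHub site for sed/awk usage of the dict file, and the file's exact layout is not described here. As a hedged Python alternative, the sketch below assumes a hypothetical two-column, tab-separated layout (species ID, then species name) and that IDs appear in sequence headers delimited by non-alphanumeric characters; the IDs in the example are invented. Check the real file and the GitHub instructions before relying on it.

```python
import re
from typing import Dict, Iterable


def load_dictionary(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical 'species_ID<TAB>species name' lines into a mapping."""
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        species_id, name = line.split("\t", 1)
        mapping[species_id] = name
    return mapping


def translate(text: str, mapping: Dict[str, str]) -> str:
    """Replace every whole-token species ID in `text` with its species name."""
    if not mapping:
        return text
    # Longest IDs first; an ID must not be flanked by letters or digits.
    alternatives = "|".join(map(re.escape, sorted(mapping, key=len, reverse=True)))
    pattern = re.compile(r"(?<![A-Za-z0-9])(" + alternatives + r")(?![A-Za-z0-9])")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Hypothetical IDs and names, for illustration only.
species = load_dictionary(
    ["sp001\tHomarus americanus\n", "# comment\n", "sp0012\tCallinectes sapidus\n"]
)
print(translate(">sp001_contig42 similar to sp0012_contig7", species))
```

Note that names containing spaces change how downstream tools split FASTA headers into tokens, so replacing spaces with underscores may be preferable in practice.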

Please note: most of the data files contained in this DOI are compressed into GZip files (.gz extension). Mac and Linux operating systems can extract this file type natively. Windows requires additional software to extract the archive; 7-Zip (http://www.7-zip.org) is free and open-source software that allows Windows PCs to open and decompress the archive.
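Because the archives are gzip-compressed tar files (.tar.gz), they can also be unpacked on any operating system with Python's standard-library `tarfile` module, without extra software. A minimal sketch follows; the destination directory name is an arbitrary choice, and the path-safety check is a defensive addition that the ReadMe does not mention.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive: str, dest: str) -> List[str]:
    """Unpack a .tar.gz archive into `dest` and return the member names."""
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest_path / member.name).resolve()
            # Refuse members whose paths would escape the destination directory.
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):  # extraction filters: Python 3.12+ and some backports
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
        return [member.name for member in members]


# Example (archive name taken from the ReadMe; adjust paths as needed):
# extract_archive("crustome_aa_DIAMOND.tar.gz", "crustome_db")
```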

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

CrusTome: A transcriptome database resource for large-scale analyses across Crustacea

Transcriptomes from non-traditional model organisms often harbor a wealth of unexplored data. Examining these datasets can lead to clarity and novel insights in traditional systems, as well as to discoveries across a multitude of fields. Despite significant advances in DNA sequencing technologies and in their adoption, access to genomic and transcriptomic resources for non-traditional model organisms remains limited. Crustaceans, for example, being amongst the most numerous, diverse, and widely distributed taxa on the planet, often serve as excellent systems to address ecological, evolutionary, and organismal questions. While they are ubiquitously present across environments, and of economic and food security importance, they remain severely underrepresented in publicly available sequence databases. Here, we present CrusTome, a multi-species, multi-tissue, transcriptome database of 201 assembled mRNA transcriptomes (189 crustaceans, 30 of which were previously unpublished, and 12 ecdysozoan outgroups) as an evolving, and publicly available resource. This database is suitable for evolutionary, ecological, and functional studies that employ genomic/transcriptomic techniques and datasets. CrusTome is presented in BLAST and DIAMOND formats, providing robust datasets for sequence similarity searches, orthology assignments, phylogenetic inference, etc., and thus allowing for straight-forward incorporation into existing custom pipelines for high-throughput analyses.
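As an illustration of incorporating CrusTome into a custom pipeline, the sketch below builds and optionally runs a DIAMOND `blastp` search against the amino acid database. It assumes DIAMOND is installed and on the PATH, and that the database extracted from crustome_aa_DIAMOND.tar.gz is named `crustome_aa.dmnd` (a hypothetical file name); the query file name, E-value and thread count are illustrative choices, not values recommended by the authors.

```python
import shutil
import subprocess
from typing import List


def diamond_blastp_command(
    db: str, query: str, out: str, evalue: float = 1e-5, threads: int = 4
) -> List[str]:
    """Argument list for a DIAMOND protein search with BLAST tabular (format 6) output."""
    return [
        "diamond", "blastp",
        "--db", db,           # DIAMOND database (.dmnd); file name is hypothetical
        "--query", query,     # protein FASTA file
        "--out", out,         # tabular results
        "--outfmt", "6",
        "--evalue", str(evalue),
        "--threads", str(threads),
    ]


def run_diamond(cmd: List[str]) -> None:
    """Run the command, failing clearly if DIAMOND is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


cmd = diamond_blastp_command("crustome_aa.dmnd", "my_queries.faa", "hits.tsv")
print(" ".join(cmd))
# run_diamond(cmd)  # uncomment once DIAMOND and the input files are available
```

Building the argument list separately from running it keeps the command easy to log and test without DIAMOND installed.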

For questions regarding released datasets contact: Corresponding Author: Jorge L. Perez-Moreno (Colorado State University) jorgepm@colostate.edu / jpere645@fiu.edu

https://github.com/invertome/crustome

PLEASE CITE:

Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

Funder Information

Supported by National Science Foundation grants to DLM (IOS-1922701) and DSD (IOS-1922755). In addition, this work was partially funded by two grants awarded from the National Science Foundation: Doctoral Dissertation Improvement Grant (#1701835) awarded to JPM and HBG and the Division of Environmental Biology Bioluminescence and Vision grant (DEB-1556059) awarded to HBG. Samples in the FICC were collected by grants from The Gulf of Mexico Research Initiative (GOMRI), Florida Institute of Oceanography Shiptime Funding awarded to HBG and DMD; the National Science Foundation Division of Environmental Biology Grant 1556059 awarded to HBG; and the National Oceanic and Atmospheric Administration Ocean Exploration Research (NOAA-OER 2015) grant awarded to HBG.
DataSheet3_Preclinical species gene expression database: Development and...
datasetcatalog.nlm.nih.gov
Updated Jan 17, 2023
Liguori, Michael J.; Blomme, Eric A. G.; Mittelstadt, Scott; Van Vleet, Terry R.; Krause, Caitlin; Mahalingaiah, Prathap Kumar; Rendino, Lauren; Peterson, Richard; Kowalkowski, Kenneth; Suwada, Kinga; Vo, Andy (2023). DataSheet3_Preclinical species gene expression database: Development and meta-analysis.csv [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001025205
Dataset updated
Jan 17, 2023
Authors
Liguori, Michael J.; Blomme, Eric A. G.; Mittelstadt, Scott; Van Vleet, Terry R.; Krause, Caitlin; Mahalingaiah, Prathap Kumar; Rendino, Lauren; Peterson, Richard; Kowalkowski, Kenneth; Suwada, Kinga; Vo, Andy
Description
The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited by our incomplete understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross-species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.
Additional file 1: of Bridging the gap between reference and real...
springernature.figshare.com
datasetcatalog.nlm.nih.gov
Updated May 31, 2023
Antonin Morillon; Daniel Gautheret (2023). Additional file 1: of Bridging the gap between reference and real transcriptomes [Dataset]. http://doi.org/10.6084/m9.figshare.8223203.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.8223203.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare (http://figshare.com/)
Authors
Antonin Morillon; Daniel Gautheret
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Table S1. Overview of major eukaryotic transcriptome databases. Table S2. Large-scale RNA-seq projects (human). Table S3. Sequencing methods providing insight on specific events shown in Fig 2. Table S4. Transcript variations related to cancer and other diseases; and software for retrieving these variations from RNA-seq data. (XLSX 18 kb)
Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer...
search.dataone.org
datadryad.org
Updated Oct 23, 2025
Shreyansh Priyadarshi; Camellia Mazumder; Sayan Biswas; Bhavesh Neekhra; Debayan Gupta; Shubhasis Haldar (2025). Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks [Dataset]. http://doi.org/10.5061/dryad.zw3r228jc
Unique identifier
https://doi.org/10.5061/dryad.zw3r228jc
Dataset updated
Oct 23, 2025
Dataset provided by
Dryad Digital Repository
Authors
Shreyansh Priyadarshi; Camellia Mazumder; Sayan Biswas; Bhavesh Neekhra; Debayan Gupta; Shubhasis Haldar
Description
Evidence before this study: We conducted an extensive literature search using Google Scholar without language restrictions, employing search terms such as “(Predicting OR Classifying OR Annotating) and (cancer hallmarks) AND (Deep OR Machine Learning) OR (Artificial Intelligence OR AI).” Despite notable advances in molecular oncology and computational methodologies, a critical gap remains: no existing machine learning or deep learning framework comprehensively predicts cancer hallmarks from tumor biopsy samples. Current research primarily targets specific molecular pathways associated with individual hallmarks, leaving clinicians without an integrated model to interpret hallmark activity at the level of an individual tumor. Moreover, the absence of wet-lab techniques capable of annotating all cancer hallmarks in biopsy samples has further impeded progress, limiting the clinical utility of hallmark-related insights for precision oncology.
Added value of this study: This study introdu...
Dataset Collection and Processing: We utilized a large-scale dataset comprising 2.7 million single-cell transcriptomes derived from 14 tumor types, collected from 922 patients across 51 independent studies conducted globally. This dataset was sourced from the Weizmann Institute's 3CA repository.
Quality Control: Before generating synthetic datasets for model training, the raw single-cell transcriptomic data underwent a rigorous quality control (QC) process. Cells with over 15% mitochondrial transcript content, fewer than 200, or more than 6,000 expressed mRNA transcripts were excluded to ensure data reliability.
Gene Set Curation: Gene sets representing cancer hallmarks were compiled from multiple databases, retaining only genes identified in at least two independent sources. This selection was refined through manual literature reviews to exclude genes without direct or indirect roles in hallmark-related pathways.
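The "at least two independent sources" rule in the Gene Set Curation step amounts to a simple vote across databases. A minimal Python sketch, assuming each source is given as a list of gene symbols; the source names and gene lists below are hypothetical toy inputs, and the subsequent manual literature review is not modelled.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def consensus_genes(sources: Dict[str, Iterable[str]], min_sources: int = 2) -> Set[str]:
    """Genes listed by at least `min_sources` distinct sources."""
    votes: Counter = Counter()
    for genes in sources.values():
        votes.update(set(genes))  # a gene counts at most once per source
    return {gene for gene, n in votes.items() if n >= min_sources}


toy_sources = {  # hypothetical source names and gene lists
    "source_A": ["VEGFA", "KDR", "KDR"],
    "source_B": ["VEGFA", "FLT1"],
    "source_C": ["KDR"],
}
print(sorted(consensus_genes(toy_sources)))
```

Deduplicating within each source before counting prevents a gene listed twice in one database from passing the two-source threshold on its own.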
Digital Scoring: Using the curated gene sets, Digital Scores were...
# Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer hallmarks

https://doi.org/10.5061/dryad.zw3r228jc

Description of the data and file structure

Data Description: Experimental Efforts

This dataset comprises single-cell transcriptomic data from the Weizmann 3CA repository, encompassing 2.7 million single-cell transcriptomes from 14 tumor types, collected from 922 patients across 51 global studies. The primary objective of the experimental efforts was to generate synthetic datasets for training and validating computational models to identify and analyze cancer hallmarks at the single-cell resolution.

Single-cell RNA sequencing (scRNA-seq) data underwent a rigorous quality control process to ensure reliability and biological relevance. This included exclusion criteria based on mitochondrial transcript content (>15%) and mRNA transcript counts (<200 or >6,000 transcripts). Gene sets corresponding to 10 estab...
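The quality-control thresholds described above translate into a simple per-cell filter. The Python sketch below is an illustration under stated assumptions: each cell is a vector of counts over named genes, mitochondrial genes are recognised by an `MT-` name prefix (a common naming convention, assumed here), and the 200/6,000 limits are applied to the number of genes with non-zero counts, which is one possible reading of "expressed mRNA transcripts".

```python
from typing import List, Sequence


def passes_qc(
    counts: Sequence[int],
    gene_names: Sequence[str],
    max_mito_fraction: float = 0.15,
    min_detected: int = 200,
    max_detected: int = 6000,
    mito_prefix: str = "MT-",  # assumed naming convention for mitochondrial genes
) -> bool:
    """True if a cell is kept: <=15% mitochondrial counts and 200-6,000 detected genes."""
    total = sum(counts)
    if total == 0:
        return False
    prefix = mito_prefix.upper()
    mito = sum(c for c, g in zip(counts, gene_names) if g.upper().startswith(prefix))
    detected = sum(1 for c in counts if c > 0)  # assumed reading of "expressed transcripts"
    return mito / total <= max_mito_fraction and min_detected <= detected <= max_detected


def filter_cells(cells: Sequence[Sequence[int]], gene_names: Sequence[str]) -> List[int]:
    """Indices of cells that pass QC."""
    return [i for i, counts in enumerate(cells) if passes_qc(counts, gene_names)]


genes = ["MT-CO1"] + [f"GENE{i}" for i in range(299)]  # toy gene names
healthy = [10] + [1] * 299            # ~3% mitochondrial, 300 genes detected
stressed = [100] + [1] * 299          # ~25% mitochondrial: excluded
sparse = [0] + [1] * 150 + [0] * 149  # only 150 genes detected: excluded
print(filter_cells([healthy, stressed, sparse], genes))
```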
Data from: BBGD454: an Online Database for Blueberry Genomic Data...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Agricultural Research Service (2025). BBGD454: an Online Database for Blueberry Genomic Data Transcriptome analysis of Blueberry using 454 EST sequencing [Dataset]. https://catalog.data.gov/dataset/bbgd454-an-online-database-for-blueberry-genomic-data-transcriptome-analysis-of-blueberry--5783e
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
NOTE: This dataset is no longer publicly available. This database houses over 500,000 sequences that were generated and assembled into approximately 15,000 contigs, annotated and functionally mapped to Gene Ontology (GO) terms. Blueberry (Vaccinium corymbosum) is a major berry crop in the United States. Next-generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities during an organism’s developmental stage(s) or its response to biotic or abiotic stresses. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. We have applied a high-throughput pyrosequencing technology (454 EST sequencing) for transcriptome profiling of blueberry during different stages of fruit development to gain an understanding of the genes that are up- or down-regulated during this process. We have also sequenced flower buds at four different stages of cold acclimation to gain a better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation, since extreme low temperatures are known to reduce crop yield and cause major losses to US farmers. We have also sequenced a leaf sample to compare its transcriptome profile with that of bud and fruit samples. Over 500,000 sequences were generated and assembled into approximately 15,000 contigs, which were annotated and functionally mapped to Gene Ontology (GO) terms. A database was developed to house these sequences and their annotations. A web-based interface was also developed to allow collaborators to search/browse the data and to aid in the analysis and interpretation of the data. The availability of these sequences will allow for future advances, such as the development of a blueberry microarray to study gene expression, and will aid in the blueberry genome sequencing effort that is underway.
This work was supported by grant 2008-51180-04861 from the USDA - Cooperative State Research, Education, and Extension Service (CSREES) Specialty Crop Research Initiative program.
Data from: BaRTv1.0: an improved barley reference transcript dataset to...
zenodo.org
osti.gov
zip
Updated Jan 24, 2020
Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson (2020). BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq [Dataset]. http://doi.org/10.5281/zenodo.3360434
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3360434
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
Computational assembly of RNA-seq data and the subsequent quantification of gene expression and splicing are time-consuming, and their results vary considerably. Recent fast, non-alignment tools such as Kallisto and Salmon overcome these problems, but they require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available for plants.

Results
A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts, BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, with transcripts assembled and aligned to the barley cv. Morex reference genome (Mascher et al., 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al., 2011) were used to determine transcript coverage, and high-resolution RT-PCR validated the alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods served as benchmarks to select an optimal barley RTD. A variant, BaRTv1.0-QUASI (Quantification of Alternatively Spliced Isoforms), was also created to overcome inaccurate quantification caused by variation in the 5’ and 3’ UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data from five barley organs/tissues. This analysis identified 20,972 significantly differentially expressed genes, 2,791 differentially alternatively spliced genes and 2,768 transcripts with differential transcript usage.
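Salmon and kallisto report transcript abundance in transcripts per million (TPM), and differential transcript usage looks at how a gene's expression is split among its isoforms. A minimal sketch of both quantities follows, assuming toy read counts and transcript lengths (all names and numbers are hypothetical). It uses plain transcript lengths where the real tools use effective lengths, and it is not the BaRTv1.0-QUASI pipeline.

```python
from collections import defaultdict


def tpm(counts, lengths):
    """Transcripts per million: reads per base, rescaled so all values sum to 1e6.

    Plain lengths are used here for simplicity; Salmon and kallisto use effective lengths.
    """
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    return {t: rate / total * 1e6 for t, rate in rates.items()}


def isoform_usage(abundance, transcript_to_gene):
    """Fraction of each gene's total abundance contributed by each of its transcripts."""
    gene_totals = defaultdict(float)
    for t, value in abundance.items():
        gene_totals[transcript_to_gene[t]] += value
    usage = {}
    for t, value in abundance.items():
        total = gene_totals[transcript_to_gene[t]]
        usage[t] = value / total if total > 0 else 0.0
    return usage


# Hypothetical toy data: two isoforms of geneA, one transcript of geneB.
counts = {"geneA.1": 500, "geneA.2": 100, "geneB.1": 400}
lengths = {"geneA.1": 1000, "geneA.2": 500, "geneB.1": 2000}
transcript_to_gene = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}

abundance = tpm(counts, lengths)
usage = isoform_usage(abundance, transcript_to_gene)
for t in sorted(abundance):
    print(f"{t}\tTPM={abundance[t]:.0f}\tusage={usage[t]:.2f}")
```

A shift in the usage values of geneA.1 and geneA.2 between two conditions is the kind of change that differential transcript usage tests look for; the statistical testing itself is outside the scope of this sketch.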

Conclusion
A high-confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared with current barley transcripts, BaRTv1.0 transcripts are generally longer and less fragmented, and their improved gene models are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.
Cerebellar Development Transcriptome Database
scicrunch.org
dknet.org
Updated Oct 17, 2019
(2019). Cerebellar Development Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_013096
Unique identifier
https://identifiers.org/RRID:SCR_013096
Dataset updated
Oct 17, 2019
Description
The Cerebellar Development Transcriptome Database (CDT-DB) provides transcriptomic information (spatiotemporal gene expression profile data) on the postnatal cerebellar development of mice (C57BL/6J and ICR). It is a tool for mining cerebellar genes and gene expression, and provides a portal to relevant bioinformatics links. The mouse cerebellar circuit develops through a series of cellular and morphological events, including neuronal proliferation and migration, axonogenesis, dendritogenesis and synaptogenesis, all within three weeks after birth; each event is controlled by a specific gene group whose expression profile must be encoded in the genome. To elucidate the genetic basis of cerebellar circuit development, CDT-DB analyzes spatiotemporal gene expression using in situ hybridization (ISH) for cellular resolution, and fluorescence differential display and microarrays (GeneChip) for developmental time-series resolution. CDT-DB not only provides a cross-search function over large amounts of experimental data (ISH brain images, GeneChip graphs, RT-PCR gel images), but also acts as a portal: every registered gene is hyperlinked to many relevant bioinformatics websites covering gene ontology, genomes, proteins, pathways, cell functions and publications. CDT-DB is therefore a useful tool for mining potentially important genes based on characteristic expression profiles in particular cell types or during particular time windows in the developing mouse brain.
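As a minimal sketch of the kind of time-window mining described above, the following picks out genes whose expression peaks within a chosen postnatal window. The gene names, sampling days and expression values are hypothetical, and this is not the CDT-DB query interface.

```python
# Hypothetical sampling scheme: postnatal days at which expression was measured.
TIME_POINTS = [0, 3, 7, 12, 21]

# Made-up expression profiles, one value per time point.
profiles = {
    "gene_early": [9.0, 7.5, 4.0, 2.0, 1.0],
    "gene_mid": [1.0, 3.0, 8.5, 6.0, 2.0],
    "gene_late": [0.5, 1.0, 2.0, 4.0, 9.0],
}


def peak_day(values, days=TIME_POINTS):
    """Return the time point at which a profile reaches its maximum."""
    best_index = max(range(len(values)), key=lambda i: values[i])
    return days[best_index]


def genes_peaking_in(profiles, start, end):
    """Genes whose peak expression falls within [start, end] days, inclusive."""
    return sorted(g for g, v in profiles.items() if start <= peak_day(v) <= end)


print(genes_peaking_in(profiles, 5, 14))
```

Peak time is only one possible profile feature; the same filter pattern could be applied to, for example, the onset of expression or a rise between two time points.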
LegumeIP
neuinfo.org
LegumeIP [Dataset]. http://identifiers.org/RRID:SCR_008906
Unique identifier
https://identifiers.org/RRID:SCR_008906
Description
LegumeIP is an integrative database and bioinformatics platform for comparative genomics and transcriptomics, intended to facilitate the study of gene function and genome evolution in legumes and, ultimately, to generate molecular breeding tools to improve the quality of crop legumes. LegumeIP currently hosts large-scale genomics and transcriptomics data, including:
* Genomic sequences of three model legumes, Medicago truncatula, Glycine max (soybean) and Lotus japonicus, and of two reference plant species, Arabidopsis thaliana and Populus trichocarpa, annotated using the UniProt TrEMBL, InterProScan, Gene Ontology and KEGG databases. LegumeIP covers a total of 222,217 protein-coding gene sequences.
* Large-scale gene expression data compiled from 104 array hybridizations from L. japonicus, 156 array hybridizations from the M. truncatula gene atlas database, and 14 RNA-Seq-based gene expression profiles from G. max, covering different tissues including four common tissues: nodule, flower, root and leaf.
* Systematic synteny analysis among M. truncatula, G. max, L. japonicus and A. thaliana.
* Reconstruction of gene families and gene family-wide phylogenetic analysis across the five hosted species.
LegumeIP provides comprehensive search and visualization tools for flexible queries on gene annotation, gene families, synteny and the relative abundance of gene expression.
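LegumeIP combines array data (L. japonicus, M. truncatula) with RNA-Seq profiles (G. max), whose absolute values are on different scales. One simple way to compare such profiles is to express each gene's signal as a share of its total across the four common tissues. The sketch below assumes that normalisation; how LegumeIP itself computes relative abundance is not stated in its description, and the values are made up.

```python
# The four tissues shared by the hosted expression data sets.
TISSUES = ("Nodule", "Flower", "Root", "Leaf")


def relative_abundance(profile):
    """Each tissue's share of a gene's total signal across the four common tissues."""
    total = sum(profile[t] for t in TISSUES)
    if total == 0:
        return {t: 0.0 for t in TISSUES}
    return {t: profile[t] / total for t in TISSUES}


# Made-up profiles on very different scales: array intensities vs RNA-seq values.
array_gene = {"Nodule": 8000.0, "Flower": 500.0, "Root": 1000.0, "Leaf": 500.0}
rnaseq_gene = {"Nodule": 80.0, "Flower": 5.0, "Root": 10.0, "Leaf": 5.0}

print(relative_abundance(array_gene))
print(relative_abundance(rnaseq_gene))
```

Both made-up profiles reduce to the same nodule-dominated pattern, which is the point of comparing relative rather than absolute values across platforms.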
Microarray DB
neuinfo.org
scicrunch.org
Updated Jan 29, 2022
(2022). Microarray DB [Dataset]. http://identifiers.org/RRID:SCR_008525
Unique identifier
https://identifiers.org/RRID:SCR_008525
Dataset updated
Jan 29, 2022
Description
A tool for mapping transcriptome data and for creating databases that give an overview of entire pathways. This web-based resource consists of a web application that visualizes complex omics data on KEGG pathways, so that all entities can be viewed in the context of cellular pathways, together with databases created with the software to visualize series of microarray data. The web application accepts transcriptome, proteome or metabolome data, or a combination of these, as input; because of this scalability, it is also well suited to visualizing the results of cell simulations. Several databases of transcriptome data obtained at the Mori Laboratory, Nara Institute of Science and Technology, Japan, are also provided.
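The description does not show how values are drawn onto pathway maps; a minimal sketch of the general idea of overlaying measurements onto a pathway's entities could look like this. The pathway membership (a hand-picked subset of E. coli glycolysis genes), the log2 fold-change values, the threshold and the colour scheme are illustrative assumptions, not KEGG or Microarray DB data formats.

```python
# Illustrative pathway: a hand-picked subset of E. coli glycolysis genes.
PATHWAY = {
    "name": "Glycolysis (illustrative subset)",
    "members": ["pgi", "pfkA", "fbaA", "tpiA", "gapA"],
}

# Made-up log2 fold changes; tpiA is deliberately left unmeasured.
log2_fold_change = {"pgi": 1.4, "pfkA": -2.1, "fbaA": 0.2, "gapA": 3.0}


def node_colour(value, threshold=1.0):
    """Map one measurement to a display colour for a pathway node."""
    if value is None:
        return "grey"  # entity not measured in this data set
    if value >= threshold:
        return "red"  # increased
    if value <= -threshold:
        return "blue"  # decreased
    return "white"  # change within +/- threshold


def overlay(pathway, data, threshold=1.0):
    """Colour every pathway member by looking up its identifier in `data`."""
    return {entity: node_colour(data.get(entity), threshold) for entity in pathway["members"]}


for entity, colour in overlay(PATHWAY, log2_fold_change).items():
    print(entity, colour)
```

Because the overlay only looks entities up by identifier, transcript, protein and metabolite values could be merged into one dictionary, which loosely mirrors the mixed-input support described above.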

Rozadilla, Gastón; Mccarthy, Cristina Beryl (2024). Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province) [Dataset]. https://ri.conicet.gov.ar/handle/11336/234791

Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province)

Dataset updated

May 7, 2024

Authors

Rozadilla, Gastón; Mccarthy, Cristina Beryl

Area covered

Tucumán Province

Description

Spodoptera frugiperda is a noctuid moth that devastates various crops, including corn, rice and cotton, and is found throughout most of the Americas. The purpose of this study was to integrate gene expression data from S. frugiperda guts with their associated metatranscriptomes, under natural and controlled conditions. To this end, four S. frugiperda samples from the province of Tucumán (Argentina; subtropical region) were analysed. Specimens were obtained from different environments, altitudes and food sources, namely: 1) a transgenic maize (Zea mays) field at 495 m.a.s.l. where insecticides and fertilisers were applied (named MM; 26°49′50″S, 65°16′59.4″W); 2) Sorghum halepense at 495 m.a.s.l. (MS; 26°49′50″S, 65°16′59.4″W); 3) a maize field at 2283 m.a.s.l. where no insecticides or fertilisers were used (TV; 26°55′40.75″S, 65°45′19.90″W); and 4) a colony established from larvae originally collected from the same transgenic maize field as Sf_MM, reared for nine generations under controlled conditions on an artificial diet adapted from [8], without the addition of antibiotics (BT). For all samples, total RNA extracted from fifth-instar larval guts (two digestive tracts per sample) was subjected to a modified one-step reverse transcription and polymerase chain reaction sequence-independent amplification procedure, as described previously. High-throughput pyrosequencing of the samples was performed on a Roche GS FLX (Macrogen Inc., Korea), yielding ~1 Gb of metatranscriptomic reads with lengths of 50 to 1,600 nucleotides (nt; average 652 nt). Raw sequence reads were trimmed with a custom application to remove nucleotides derived from the amplification primers.
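The custom trimming application is not described further. As a minimal sketch, assuming the amplification leaves a known tag sequence at the 5′ end of a read and its reverse complement at the 3′ end, exact-match trimming and a read-length summary (minimum, maximum, mean, as reported above) could look like this. PRIMER_TAG is a hypothetical placeholder, not the primer used in the study, and real trimmers also tolerate mismatches and partial tags.

```python
# Hypothetical placeholder tag; NOT the primer used in the study.
PRIMER_TAG = "GCCGGAGCTCTGCAGATATC"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read(seq, tag=PRIMER_TAG):
    """Remove an exact 5' match to `tag` and an exact 3' match to its reverse complement."""
    if tag and seq.startswith(tag):
        seq = seq[len(tag):]
    tail = reverse_complement(tag)
    if tail and seq.endswith(tail):
        seq = seq[: -len(tail)]
    return seq


def length_summary(reads):
    """(minimum, maximum, mean) read length for a non-empty collection of reads."""
    lengths = [len(r) for r in reads]
    if not lengths:
        raise ValueError("no reads")
    return min(lengths), max(lengths), sum(lengths) / len(lengths)


reads = [
    PRIMER_TAG + "ACGTACGTAC" + reverse_complement(PRIMER_TAG),
    "TTTTGGGGCCCCAAAA",  # no primer-derived sequence to remove
]
trimmed = [trim_read(r) for r in reads]
print(trimmed)
print(length_summary(trimmed))
```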
Below is an outline of the main steps we followed to create the uploaded databases:

I. Sequences were compared locally against a combined nucleotide database (nt16SLep = “Non-redundant” nucleotide sequence (nt) database + 16S rRNA gene (16S) database + Lepidopteran whole genome shotgun (Lep) projects completed at the time of the analysis) using BLASTN (Altschul et al., 1990) with an E-value cutoff of 1e-50, and against the protein database (nr = non-redundant protein sequence) using Diamond (Buchfink et al., 2014) with an E-value cutoff of 1e-17.

II. The homology search results were then processed as follows:

Step A: The output files from both homology searches were processed with MEGAN, software that performs taxonomic binning and assigns sequences to taxa using the Lowest Common Ancestor (LCA) assignment algorithm (Huson et al., 2007). The taxonomic and functional assignments made by MEGAN for each sequence were then exported using a MEGAN export function.

Note: MEGAN computes a “species profile” by finding the lowest node in the NCBI taxonomy that encompasses the set of hit taxa, and assigns the sequence to the taxon represented by that node. With this approach, every sequence is assigned to some taxon: if a sequence aligns very specifically to a single taxon, it is assigned to that taxon; the less specifically a sequence hits taxa, the higher up in the taxonomy it is placed.

Step B: The output files from both homology searches were also processed with a custom bash script. This script parses the homology search output files and generates two files (one per homology search) containing the name of each sequence, its best hit (or no hit) and the corresponding E-value.

III. Create local database (Step C): All this information (from the exported MEGAN files and the bash script output files) was then used to create a local SQLite database containing all the available information for each sequence from both homology searches.
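Steps A to C can be sketched end to end on toy data. This is a minimal Python sketch under stated assumptions, not the authors' code: they used MEGAN for Step A and a bash script for Step B, and both are replaced here by short Python functions. The toy taxonomy, read and subject names, and the SQLite table layout are hypothetical; both search outputs are assumed to be in the 12-column BLAST-style tabular format (format 6); and MEGAN's real LCA also filters hits (for example by score) before computing the ancestor.

```python
import sqlite3

# Toy taxonomy (child -> parent); a simplified stand-in for the NCBI taxonomy.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Enterobacteriaceae": "Bacteria",
    "Escherichia coli": "Enterobacteriaceae",
    "Klebsiella pneumoniae": "Enterobacteriaceae",
    "Eukaryota": "root",
    "Spodoptera frugiperda": "Eukaryota",
}


def lineage(taxon):
    """Path from the root down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path[::-1]


def lowest_common_ancestor(hit_taxa):
    """Step A (sketch): deepest node shared by the lineages of all hit taxa."""
    lca = None
    for level in zip(*(lineage(t) for t in hit_taxa)):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca or "no hit"


def best_hits(tabular_lines, query_ids):
    """Step B (sketch): best hit per query from 12-column tabular output.

    Columns 1, 2, 11 and 12 are query id, subject id, E-value and bit score.
    Lowest E-value wins; ties go to the higher bit score.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or (evalue, -bitscore) < (best[query][1], -best[query][2]):
            best[query] = (subject, evalue, bitscore)
    return {q: best[q][:2] if q in best else ("no hit", None) for q in query_ids}


def build_database(conn, taxa, nt_hits, nr_hits):
    """Step C (sketch): one row per sequence, combining both homology searches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences (name TEXT PRIMARY KEY, taxon TEXT, "
        "nt_best_hit TEXT, nt_evalue REAL, nr_best_hit TEXT, nr_evalue REAL)"
    )
    rows = [(name, taxon, *nt_hits[name], *nr_hits[name]) for name, taxon in taxa.items()]
    conn.executemany("INSERT OR REPLACE INTO sequences VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


# Hypothetical search output: read1 hits two subjects in nt and one protein in nr.
nt_output = [
    "read1\tsubjA\t98.0\t500\t10\t0\t1\t500\t1\t500\t1e-120\t900",
    "read1\tsubjB\t95.0\t500\t25\t0\t1\t500\t1\t500\t1e-80\t700",
]
nr_output = ["read1\tprotX\t90.0\t160\t16\t0\t1\t480\t1\t160\t1e-40\t300"]
queries = ["read1", "read2"]

taxa = {
    "read1": lowest_common_ancestor(["Escherichia coli", "Klebsiella pneumoniae"]),
    "read2": lowest_common_ancestor([]),
}
conn = sqlite3.connect(":memory:")
build_database(conn, taxa, best_hits(nt_output, queries), best_hits(nr_output, queries))
for row in conn.execute("SELECT * FROM sequences ORDER BY name"):
    print(row)
```

A read whose hits span two families lands on their shared parent (here Enterobacteriaceae), matching the note above that less specific hits are placed higher in the taxonomy; reads without hits keep a "no hit" label and NULL E-values.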

Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera...

Transcriptomic databases.

Transcriptomic databases and qRT-PCR dataset

Statistics on the reference transcriptomic database.

Human Transcriptome Database for Alternative Splicing

The thirty-seven differentially expressed genes identified from the three...

Table1_Preclinical species gene expression database: Development and...

Characteristics and details of four primary transcriptome databases.

Integration of Proteomics and Transcriptomics Data Sets for the Analysis of...

Transcriptome data in seven databases in the annotate success rate...

Migratory Locust EST Database

Data from: CrusTome: A transcriptome database resource for large-scale...

DataSheet3_Preclinical species gene expression database: Development and...

Additional file 1: of Bridging the gap between reference and real...

Synthetic bulk RNA-Seq transcriptomic profiles representing 10 Cancer...

Description of the data and file structure

Data Description: Experimental Efforts

Data from: BBGD454: an Online Database for Blueberry Genomic Data...

Data from: BaRTv1.0: an improved barley reference transcript dataset to...

Cerebellar Development Transcriptome Database

LegumeIP

Microarray DB

Metatranscriptomic and transcriptomic databases (DB4S) of Spodoptera frugiperda larvae guts from Northern Argentina (Tucumán province)