100+ datasets found

Human Transcriptome Database for Alternative Splicing
neuinfo.org
scicrunch.org
+2more
Updated Jun 4, 2024
Cite
(2024). Human Transcriptome Database for Alternative Splicing [Dataset]. http://identifiers.org/RRID:SCR_013305
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_013305
Dataset updated
Jun 4, 2024
Description
A specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. H-DBAS offers unique data and a viewer for human AS analysis. It contains:
* Genome-wide representative alternative splicing variants (RASVs) identified from the following datasets:
  * H-Inv full-length cDNAs (resource summary): H-Invitational cDNA dataset
  * H-Inv all transcripts (resource summary): Published human mRNA dataset
  * Mouse full-length cDNAs (resource summary): Mouse cDNA dataset
* RASVs affecting protein functions such as protein motif, GO, subcellular localization signal and transmembrane domain
* Conserved RASVs compared with mouse genome and the full-length cDNAs (H-Inv full-length cDNAs only)
Data from: CrusTome: A transcriptome database resource for large-scale...
data.niaid.nih.gov
Updated Mar 15, 2023
Cite
Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L. (2023). CrusTome: A transcriptome database resource for large-scale analyses across Crustacea [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7730439
Explore at:
Dataset updated
Mar 15, 2023
Dataset provided by
Department of Biology, Colorado State University
Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution
Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, & Department of Biological Sciences and Institute of Environment, Florida International University
Department of Biology, University of Oklahoma
Authors
Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CrusTome_v0.1.0 Prerelease
/ReadMe - this file
/crustome_aa_BLAST.tar.gz - CrusTome database of amino acid sequences in BLAST format
/crustome_aa_DIAMOND.tar.gz - CrusTome database of amino acid sequences in DIAMOND format
/crustome_mrna_BLAST.tar.gz - CrusTome database of mRNA sequences in BLAST format
/dict - Dictionary file to translate species IDs. For usage with sed/awk see link to GitHub site below

Please note, most of the data files contained in this DOI are compressed into GZip files (.gz extension). Mac and Linux operating systems can extract this file type natively. Windows requires software to extract the archive. 7-Zip (http://www.7-zip.org) is free and open source software that will allow Windows PCs to open and decompress the archive.
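
As an alternative to a desktop tool, the archives can also be unpacked from a script. The following is a minimal sketch using only the Python standard library; the function name `extract_archive` and the destination directory are illustrative assumptions, not part of the CrusTome release.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Hypothetical helper for illustration; not part of the CrusTome release.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    dest_resolved = dest.resolve()

    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would be written outside the destination.
        for member in members:
            target = (dest_resolved / member.name).resolve()
            if target != dest_resolved and dest_resolved not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(path=dest_resolved)

    return [member.name for member in members]
```

For example, `extract_archive("crustome_aa_BLAST.tar.gz", "crustome_aa_BLAST")` would unpack the amino acid BLAST database into a local folder, assuming the archive has already been downloaded to the working directory.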

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

CrusTome: A transcriptome database resource for large-scale analyses across Crustacea

Transcriptomes from non-traditional model organisms often harbor a wealth of unexplored data. Examining these datasets can lead to clarity and novel insights in traditional systems, as well as to discoveries across a multitude of fields. Despite significant advances in DNA sequencing technologies and in their adoption, access to genomic and transcriptomic resources for non-traditional model organisms remains limited. Crustaceans, for example, being amongst the most numerous, diverse, and widely distributed taxa on the planet, often serve as excellent systems to address ecological, evolutionary, and organismal questions. While they are ubiquitously present across environments, and of economic and food security importance, they remain severely underrepresented in publicly available sequence databases. Here, we present CrusTome, a multi-species, multi-tissue, transcriptome database of 201 assembled mRNA transcriptomes (189 crustaceans, 30 of which were previously unpublished, and 12 ecdysozoan outgroups) as an evolving, and publicly available resource. This database is suitable for evolutionary, ecological, and functional studies that employ genomic/transcriptomic techniques and datasets. CrusTome is presented in BLAST and DIAMOND formats, providing robust datasets for sequence similarity searches, orthology assignments, phylogenetic inference, etc., and thus allowing for straight-forward incorporation into existing custom pipelines for high-throughput analyses.

For questions regarding released datasets contact: Corresponding Author: Jorge L. Perez-Moreno (Colorado State University) jorgepm@colostate.edu / jpere645@fiu.edu

https://github.com/invertome/crustome

PLEASE CITE:

Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

Funder Information

Supported by National Science Foundation grants to DLM (IOS-1922701) and DSD (IOS-1922755). In addition, this work was partially funded by two grants awarded from the National Science Foundation: Doctoral Dissertation Improvement Grant (#1701835) awarded to JPM and HBG and the Division of Environmental Biology Bioluminescence and Vision grant (DEB-1556059) awarded to HBG. Samples in the FICC were collected by grants from The Gulf of Mexico Research Initiative (GOMRI), Florida Institute of Oceanography Shiptime Funding awarded to HBG and DMD; the National Science Foundation Division of Environmental Biology Grant 1556059 awarded to HBG; and the National Oceanic and Atmospheric Administration Ocean Exploration Research (NOAA-OER 2015) grant awarded to HBG.
Songbird Brain Transcriptome Database
rrid.site
dknet.org
+1more
Updated Nov 12, 2025
Cite
(2025). Songbird Brain Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_006182
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006182
Dataset updated
Nov 12, 2025
Description
Database containing cDNA clone information of the brains of songbirds. These clones are annotated with behavioral information, as well as links to information of homologous genes of other species. The database includes over 91,000 zebra finch brain cDNAs (2009) sequenced by Duke, ESTIMA, and Rockefeller research groups. The project is a collaborative effort of the Jarvis Laboratory of Duke University, Duke Bioinformatics, and The Genomics group of RIKEN, with Erich D. Jarvis as P.I. and Kazuhiro Wada as Co-P.I. Microarrays with the cDNAs in this database are available at Duke http://mgm.duke.edu/genome/dna_micro/core/spotted.htm and through the NIH Neurosciences Microarray Consortium http://arrayconsortium.tgen.org/np2/public/overview.jsp
Genome-Wide Functional Analysis of the Cotton Transcriptome by Creating an...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Fuliang Xie; Guiling Sun; John W. Stiller; Baohong Zhang (2023). Genome-Wide Functional Analysis of the Cotton Transcriptome by Creating an Integrated EST Database [Dataset]. http://doi.org/10.1371/journal.pone.0026980
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0026980
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Fuliang Xie; Guiling Sun; John W. Stiller; Baohong Zhang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A total of 28,432 unique contigs (25,371 in consensus contigs and 3,061 as singletons) were assembled from all 268,786 cotton ESTs currently available. Several in silico approaches [comparative genomics, Blast, Gene Ontology (GO) analysis, and pathway enrichment by Kyoto Encyclopedia of Genes and Genomes (KEGG)] were employed to investigate global functions of the cotton transcriptome. Cotton EST contigs were clustered into 5,461 groups with a maximum cluster size of 196 members. A total of 27,956 indel mutants and 149,616 single nucleotide polymorphisms (SNPs) were identified from consensus contigs. Interestingly, many contigs with significantly high frequencies of indels or SNPs encode transcription factors and protein kinases. In a comparison with six model plant species, cotton ESTs show the highest overall similarity to grape. A total of 87 cotton miRNAs were identified; 59 of these have not been reported previously from experimental or bioinformatics investigations. We also predicted 3,260 genes as miRNAs targets, which are associated with multiple biological functions, including stress response, metabolism, hormone signal transduction and fiber development. We identified 151 and 4,214 EST-simple sequence repeats (SSRs) from contigs and raw ESTs respectively. To make these data widely available, and to facilitate access to EST-related genetic information, we integrated our results into a comprehensive, fully downloadable web-based cotton EST database (www.leonxie.com).
Cerebellar Development Transcriptome Database
dknet.org
scicrunch.org
+2more
Updated Jan 29, 2022
Cite
(2022). Cerebellar Development Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_013096
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_013096 https://identifiers.org/RRID:SCR_013096/resolver
Dataset updated
Jan 29, 2022
Description
Transcriptomic information (spatiotemporal gene expression profile data) on the postnatal cerebellar development of mice (C57BL/6J & ICR). It is a tool for mining cerebellar genes and gene expression, and provides a portal to relevant bioinformatics links. The mouse cerebellar circuit develops through a series of cellular and morphological events, including neuronal proliferation and migration, axonogenesis, dendritogenesis, and synaptogenesis, all within three weeks after birth, and each event is controlled by a specific gene group whose expression profile must be encoded in the genome. To elucidate the genetic basis of cerebellar circuit development, CDT-DB analyzes spatiotemporal gene expression by using in situ hybridization (ISH) for cellular resolution and by using fluorescence differential display and microarrays (GeneChip) for developmental time series resolution. The CDT-DB not only provides a cross-search function for large amounts of experimental data (ISH brain images, GeneChip graphs, RT-PCR gel images), but also includes a portal function by which all registered genes are hyperlinked to many relevant bioinformatics websites covering gene ontology, genome, proteins, pathways, cell functions, and publications. Thus, the CDT-DB is a useful tool for mining potentially important genes based on characteristic expression profiles in particular cell types or during a particular time window in developing mouse brains.
Transcriptome Shotgun Assembly (TSA) Sequence Database and Submissions
catalog.data.gov
datadiscovery.nlm.nih.gov
+3more
Updated Jun 19, 2025
Cite
National Library of Medicine (2025). Transcriptome Shotgun Assembly (TSA) Sequence Database and Submissions [Dataset]. https://catalog.data.gov/dataset/transcriptome-shotgun-assembly-tsa-sequence-database-and-submissions-822d5
Explore at:
Dataset updated
Jun 19, 2025
Dataset provided by
National Library of Medicine
Description
TSA is an archive of computationally assembled transcript sequences from primary data such as ESTs and Next Generation Sequencing Technologies. The overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cloned cDNAs. The primary sequence data used in the assemblies must have been experimentally determined by the same submitter. TSA sequence records differ from GenBank records because there are no physical counterparts to the assemblies.
Data from: BBGD454: an Online Database for Blueberry Genomic Data...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). BBGD454: an Online Database for Blueberry Genomic Data Transcriptome analysis of Blueberry using 454 EST sequencing [Dataset]. https://catalog.data.gov/dataset/bbgd454-an-online-database-for-blueberry-genomic-data-transcriptome-analysis-of-blueberry--5783e
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
NOTE: This dataset is no longer publicly available. This database houses over 500,000 sequences that were generated and assembled into approximately 15,000 contigs, annotated and functionally mapped to Gene Ontology (GO) terms. Blueberry (Vaccinium corymbosum) is a major berry crop in the United States. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities during an organism’s developmental stage(s) or its response to biotic or abiotic stresses. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. We have applied a high-throughput pyrosequencing technology (454 EST sequencing) for transcriptome profiling of blueberry during different stages of fruit development to gain an understanding of the genes that are up- or down-regulated during this process. We have also sequenced flower buds at four different stages of cold acclimation to gain a better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation, since extreme low temperatures are known to reduce crop yield and cause major losses to US farmers. We have also sequenced a leaf sample to compare its transcriptome profile with that of bud and fruit samples. Over 500,000 sequences were generated and assembled into approximately 15,000 contigs and were annotated and functionally mapped to Gene Ontology (GO) terms. A database was developed to house these sequences and their annotations. A web-based interface was also developed to allow collaborators to search/browse the data and aid in the analysis and interpretation of the data. The availability of these sequences will allow for future advances, such as the development of a blueberry microarray to study gene expression, and will aid in the blueberry genome sequencing effort that is underway.
This work was supported by grant 2008-51180-04861 from the USDA - Cooperative State Research, Education, and Extension Service (CSREES) Specialty Crop Research Initiative program.
Data from: A crustacean annotated transcriptome (CAT) database
datasetcatalog.nlm.nih.gov
figshare.com
Updated Sep 7, 2020
Cite
Nong, Wenyan; Qiu, Jian-Wen; Chu, Ka-Hou; Chai, Zacary Y. H.; Qin, Jing; Hui, Jerome Ho Lam; Yan, Mak Kai; Chow, Billy Kwok Chong; Jiang, Xiaosen (2020). A crustacean annotated transcriptome (CAT) database [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000453200
Explore at:
Dataset updated
Sep 7, 2020
Authors
Nong, Wenyan; Qiu, Jian-Wen; Chu, Ka-Hou; Chai, Zacary Y. H.; Qin, Jing; Hui, Jerome Ho Lam; Yan, Mak Kai; Chow, Billy Kwok Chong; Jiang, Xiaosen
Description
Background: Decapods are an order of crustaceans which includes shrimps, crabs, lobsters and crayfish. They occur worldwide and are of great scientific interest as well as being of ecological and economic importance in fisheries and aquaculture. However, our knowledge of their biology mainly comes from the group which is most closely related to crustaceans – insects. Here we produce a de novo transcriptome database, crustacean annotated transcriptome (CAT) database, spanning multiple tissues and the life stages of seven crustaceans.

Description: A total of 71 transcriptome assemblies from six decapod species and a stomatopod species, including the coral shrimp Stenopus hispidus, the cherry shrimp Neocaridina davidi, the redclaw crayfish Cherax quadricarinatus, the spiny lobster Panulirus ornatus, the red king crab Paralithodes camtschaticus, the coconut crab Birgus latro, and the zebra mantis shrimp Lysiosquillina maculata, were generated. Differential gene expression analyses within species were generated as a reference and included in a graphical user interface database at http://cat.sls.cuhk.edu.hk/. Users can carry out gene name searches and also access gene sequences based on a sequence query using the BLAST search function.

Conclusions: The data generated and deposited in this database offer a valuable resource for the further study of these crustaceans, as well as being of use in aquaculture development.
Integrated Tumor Transcriptome Array and Clinical data Analysis
neuinfo.org
scicrunch.org
+2more
Updated Jan 8, 2006
Cite
(2006). Integrated Tumor Transcriptome Array and Clinical data Analysis [Dataset]. http://identifiers.org/RRID:SCR_008182
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_008182
Dataset updated
Jan 8, 2006
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 6/12/25. ITTACA is a database created for Integrated Tumor Transcriptome Array and Clinical data Analysis. ITTACA centralizes public datasets containing both gene expression and clinical data and currently focuses on the types of cancer that are of particular interest to the Institut Curie: breast carcinoma, bladder carcinoma, and uveal melanoma. ITTACA is developed by the Institut Curie Bioinformatics group and the Molecular Oncology group of UMR144 CNRS/Institut Curie. A web interface allows users to carry out different class comparison analyses, including comparison of expression distribution profiles, tests for differential expression, and patient survival analyses. Users can define their own patient groups according to clinical data or gene expression levels. The different functionalities implemented in ITTACA are:
- Testing whether one or more genes of your choice are differentially expressed between two groups of samples exhibiting distinct phenotypes (Student and Wilcoxon tests).
- The detection of genes differentially expressed (Significance Analysis of Microarrays) between two groups of samples.
- The creation of histograms which represent the expression level according to a clinical parameter for each sample.
- The computation of Kaplan-Meier survival curves for each group.
ITTACA has been developed to be a useful tool for comparing personal results to the existing results in the field of transcriptome studies with microarrays.
CATdb: a Complete Arabidopsis Transcriptome database
rrid.site
dknet.org
+2more
Updated Jan 29, 2022
Cite
(2022). CATdb: a Complete Arabidopsis Transcriptome database [Dataset]. http://identifiers.org/RRID:SCR_007582
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007582
Dataset updated
Jan 29, 2022
Description
CATdb collects together all the information on transcriptome experiments done at URGV with CATMA micro arrays. All data in CATdb come from the URGV micro array platforms. Common procedures are used including any steps from the experiment design to the statistical analyses. Directed through a WEB interface, biologists enter the standard description of each experimental step (extraction, labelling, hybridization and scanning). Then, normalization and statistical analyses are done following a set of selected methods depending on the experimental design and array types.
Cowpea genome and transcriptome data resource
researchdata.edu.au
datadownload
Updated Jun 6, 2018
Cite
Anna Koltunow; Jen Taylor; Steven Henderson; Andrew Spriggs (2018). Cowpea genome and transcriptome data resource [Dataset]. http://doi.org/10.4225/08/5B1723666D6A5
Explore at:
Available download formats: datadownload
Unique identifier
https://doi.org/10.4225/08/5B1723666D6A5
Dataset updated
Jun 6, 2018
Dataset provided by
CSIRO http://www.csiro.au/
Authors
Anna Koltunow; Jen Taylor; Steven Henderson; Andrew Spriggs
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of cowpea (Vigna unguiculata (L.) Walp). For each of two varieties of cowpea (IT97K-499-35, IT86D-1010), this collection contains the following datasets:

i) genomic survey assemblies based on Illumina sequencing
ii) transcriptome assemblies
iii) raw DNA and RNA sequence data feeding into the above assemblies
iv) in silico gene predictions and predicted gene sequences derived from IT86D-1010 and IT97K-499-35
v) mapping to the Vigna unguiculata v1.0 reference genome (http://phytozome.jgi.doe.gov/)
Acheta domesticus Transcriptome
agdatacommons.nal.usda.gov
bin
Updated Mar 11, 2025
Cite
USDA (2025). Acheta domesticus Transcriptome [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Acheta_domesticus_Transcriptome/25085654
Explore at:
Available download formats: bin
Dataset updated
Mar 11, 2025
Dataset provided by
National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/
Authors
USDA
License
https://rightsstatements.org/vocab/UND/1.0/
Description
Sequencing major development stages of the house cricket, Acheta domesticus, will provide the first large transcriptome database on a species that is being developed for the food and feed industry. Obtaining information on specific gene sets will enable us to engineer this and other insect species to optimize traits related to improved nutritional content and disease resistance.
Brain Transcriptome Database
scicrunch.org
Cite
Brain Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_014457
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_014457
Description
A platform that allows users to visualize and analyze transcriptome data related to the genetics that underlie the development, function, and dysfunction stages and states of the brain. Users can search for cerebellar development genes by name, ID, keyword, expression, and tissue specificity. Search results include general information, links, temporal, spatial, and tissue information, and gene category.
Long non-coding RNA transcriptome database of Malus sieversii infected with...
scidb.cn
Updated Nov 14, 2025
Cite
Xiaojie Liu; Daoyuan Zhang (2025). Long non-coding RNA transcriptome database of Malus sieversii infected with Valsa mali [Dataset]. http://doi.org/10.57760/sciencedb.31349
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.31349
Dataset updated
Nov 14, 2025
Dataset provided by
Science Data Bank
Authors
Xiaojie Liu; Daoyuan Zhang
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Malus sieversii (Xinjiang wild apple) is the ancestral species of cultivated apples worldwide, boasting abundant genetic resources and serving as a high-quality gene bank for the molecular breeding of cultivated apples. The data presented herein is the lncRNA transcriptome data obtained via PacBio sequencing technology, following the infection of M. sieversii branches with Valsa mali (causal agent of Valsa canker). Compared with lncRNA research in animals, studies on plant lncRNAs are relatively scarce, and those focusing on woody plants are even rarer. Therefore, this dataset can enrich the lncRNA data of woody plants and lay a foundation for lncRNA research in apples.
Data from: BaRTv1.0: an improved barley reference transcript dataset to...
zenodo.org
osti.gov
zip
Updated Jan 24, 2020
Cite
Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson (2020). BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq [Dataset]. http://doi.org/10.5281/zenodo.3360434
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3360434
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
Time-consuming computational assembly, quantification of gene expression, and splicing analysis from RNA-seq data vary considerably. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but these tools require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants.

Results
A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts – BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, with transcripts assembled and aligned to the barley cv. Morex reference genome (Mascher et al., 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al., 2011) determined transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in 5’ and 3’ UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data of five barley organs/tissues. This analysis identified 20,972 significant differentially expressed genes, 2,791 differentially alternatively spliced genes and 2,768 transcripts with differential transcript usage.

Conclusion
A high confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, have less fragmentation and improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.
Ramonda serbica de novo transcriptome database
data.niaid.nih.gov
Updated Mar 10, 2022
Cite
Marija Vidovic (2022). Ramonda serbica de novo transcriptome database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6341872
Explore at:
Dataset updated
Mar 10, 2022
Dataset provided by
Institute of Molecular Genetics and Genetic Engineering University of Belgrade
Authors
Marija Vidovic
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ramonda serbica de novo transcriptome database obtained from desiccated and hydrated leaves.
GeneSpeed- A Database of Unigene Domain Organization
rrid.site
test2.scicrunch.org
+1more
Cite
GeneSpeed- A Database of Unigene Domain Organization [Dataset]. http://identifiers.org/RRID:SCR_002779
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002779
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 16, 2013. Database and customized tools to study the PFAM protein domain content of the transcriptome for all expressed genes of Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans tethered to both a genomics array repository database and a range of external information resources. GeneSpeed has merged information from several existing data sets including the Gene Ontology Consortium, InterPro, Pfam, Unigene, as well as micro-array datasets. GeneSpeed is a database of PFAM domain homology contained within Unigene. Because Unigene is a non-redundant dbEST database, this provides a wide encompassing overview of the domain content of the expressed transcriptome. We have structured the GeneSpeed Database to include a rich toolset allowing the investigator to study all domain homology, no matter how remote. As a result, homology cutoff score decisions are determined by the scientist, not by a computer algorithm. This quality is one of the novel defining features of the GeneSpeed database giving the user complete control of database content. In addition to a domain content toolset, GeneSpeed provides an assortment of links to external databases, a unique and manually curated Transcription Factor Classification list, as well as links to our newly evolving GeneSpeed BetaCell Database. GeneSpeed BetaCell is a micro-array depository combined with custom array analysis tools created with an emphasis around the meta analysis of developmental time series micro-array datasets and their significance in pancreatic beta cells.
Data from: AmpuBase: a transcriptome database for eight species of apple...
data.niaid.nih.gov
zenodo.org
+1more
zip
Updated Jan 19, 2019
Cite
Jack C. H. Ip; Huawei Mu; Qian Chen; Jin Sun; Santiago Ituarte; Horacio Heras; Bert Van Bocxlaer; Monthon Ganmanee; Xin Huang; Jian-Wen Qiu (2019). AmpuBase: a transcriptome database for eight species of apple snails (Gastropoda: Ampullariidae) [Dataset]. http://doi.org/10.5061/dryad.117cf
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.117cf
Dataset updated
Jan 19, 2019
Dataset provided by
Hong Kong University of Science and Technology
Hong Kong Baptist University
Universidad Nacional de La Plata
Université de Lille
King Mongkut's Institute of Technology Ladkrabang
HKBU Institute of Research and Continuing Education, Shenzhen, China
Authors
Jack C. H. Ip; Huawei Mu; Qian Chen; Jin Sun; Santiago Ituarte; Horacio Heras; Bert Van Bocxlaer; Monthon Ganmanee; Xin Huang; Jian-Wen Qiu
License
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Description
Background: Gastropoda, with approximately 80,000 living species, is the largest class of Mollusca. Among gastropods, apple snails (family Ampullariidae) have members that are widely distributed in tropical and subtropical freshwater ecosystems and are ecologically and economically important. They exhibit various morphological and physiological adaptations to their respective habitats, which make them ideal candidates for studying adaptation, population divergence, speciation, and larger-scale patterns of diversity, including biogeography of native and invasive populations. The limited availability of genomic data, however, hinders in-depth ecological and evolutionary studies of these non-model organisms.

Results: Using Illumina HiSeq platforms, we sequenced 1,220 million reads for seven species of apple snails. Together with the RNA-Seq data of two apple snails, we conducted de novo transcriptome assembly of eight species covering five genera of Ampullariidae, including representatives of the Old World and New World lineages. There were 20,730 to 35,828 unigenes with predicted open reading frames for the eight species, with N50 (shortest sequence length at 50% of the unigenes) ranging from 1,320 to 1,803 bp. Between 69.7% and 80.2% of these unigenes were functionally annotated by searching against the NCBI non-redundant, Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes databases. With these data we developed AmpuBase, a relational database that features online BLAST for DNA/protein sequences, keyword search for unigenes/functional terms, and download functions for sequences and whole transcriptomes.

Conclusions: In summary, we have generated comprehensive transcriptome data for multiple ampullariid genera and species, and created a publicly accessible database with a user-friendly interface to facilitate future basic and applied studies on ampullariids, and comparative molecular studies with other invertebrates.
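The N50 values quoted in the Results above summarize assembly contiguity. As a minimal sketch, assuming the conventional definition (the length L such that sequences of length L or longer account for at least half of all assembled bases), the statistic could be computed as follows. The function name `n50` and the toy lengths are illustrative only and are not taken from the AmpuBase pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (an assumption here, not confirmed by the
    dataset description): walk the sequences from longest to shortest
    and return the length at which the running total first reaches
    half of the summed length of all sequences.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up unigene lengths (in bp).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 weights long sequences more heavily than a median does, it is usually reported alongside the number of unigenes, as in the description above.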
Master Coral database used in USVI SCTLD Transmission Experiment Gene...
data.niaid.nih.gov
Updated Apr 19, 2023
Kelsey Beavers (2023). Master Coral database used in USVI SCTLD Transmission Experiment Gene Expression Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7838979
Dataset updated
Apr 19, 2023
Dataset provided by
University of Texas at Arlington
Authors
Kelsey Beavers
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
U.S. Virgin Islands
Description
The Master Coral Database FASTA file comprises previously published genome-derived predicted gene models and transcriptomes spanning a wide diversity of coral families. Transcriptomes are from Davies et al., 2016 (doi: 10.3389/fmars.2016.00112); Kirk et al., 2018 (doi: 10.1111/mec.14934); Moya et al., 2012 (doi: 10.1111/j.1365-294X.2012.05554.x); and van de Water et al., 2018 (doi: 10.1111/mec.14489).
Data from: EukProt: a database of genome-scale predicted proteins across the...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Mar 23, 2022
de Vargas, Colomban; Muñoz-Gómez, Sergio A.; Strassert, Jürgen; Richter, Daniel; Wideman, Jeremy G.; Burki, Fabien; Poh, Yu-Ping; Berney, Cédric; Herman, Emily K. (2022). EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotes [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000204120
Dataset updated
Mar 23, 2022
Authors
de Vargas, Colomban; Muñoz-Gómez, Sergio A.; Strassert, Jürgen; Richter, Daniel; Wideman, Jeremy G.; Burki, Fabien; Poh, Yu-Ping; Berney, Cédric; Herman, Emily K.
Description
Version 3 (22 November, 2021)

See https://doi.org/10.24072/pcjournal.173 for a detailed description of the database.
See http://evocellbio.com/eukprot/ for a BLAST database, interactive plots of BUSCO scores and ‘The Comparative Set’ (TCS): A selected subset of EukProt for comparative genomics investigations. Protein sequence FASTA files of the TCS are available at https://doi.org/10.6084/m9.figshare.21586065.
See https://github.com/beaplab/EukProt for utility scripts, annotations, and all the files necessary to build the tree in Figures 1 and 3 (from the DOI above).
Scroll to the end of this page for changes since version 2.
Are we missing anything? Please let us know!

EukProt is a database of published and publicly available predicted protein sets selected to represent the breadth of eukaryotic diversity, currently including 993 species from all major supergroups as well as orphan taxa. The goal of the database is to provide a single, convenient resource for gene-based research across the spectrum of eukaryotic life, such as phylogenomics and gene family evolution. Each species is placed within the UniEuk taxonomic framework in order to facilitate downstream analyses, and each data set is associated with a unique, persistent identifier to facilitate comparison and replication among analyses. The database is regularly updated, and all versions will be permanently stored and made available via FigShare. The current version has a number of updates, notably ‘The Comparative Set’ (TCS), a reduced taxonomic set with high estimated completeness while maintaining a substantial phylogenetic breadth, which comprises 196 predicted proteomes. A BLAST web server and graphical displays of data set completeness are available at http://evocellbio.com/eukprot/.

We invite the community to provide suggestions for new data sets and new annotation features to be included in subsequent versions, with the goal of building a collaborative resource that will promote research to understand eukaryotic diversity and diversification.

This release contains 5 files:

EukProt_proteins.v03.2021_11_22.tgz: 993 protein data sets, for species with either a genome (375) or single-cell genome (56), a transcriptome (498), a single-cell transcriptome (47), or an EST assembly (17).

EukProt_genome_annotations.v03.2021_11_22.tgz: gene annotations, in GFF format, as produced by EukMetaSanity (https://github.com/cjneely10/EukMetaSanity) for 40 genomes lacking publicly available protein annotations. The proteins predicted from these annotations are included in the proteins file.

EukProt_included_data_sets.v03.2021_11_22.txt and EukProt_not_included_data_sets.v03.2021_11_22.txt: tables of information on data sets either included (993 data sets) or not included (163) in the database. Tab-delimited; multiple entries in the same cell are comma-delimited; missing data is represented with the “N/A” value. With the following columns:
* EukProt_ID: the unique identifier associated with the data set. This will not change among versions. If a new data set becomes available for the species, it will be assigned a new unique identifier.
* Name_to_Use: the name of the species for protein/genome annotation/assembled transcriptome files.
* Strain: the strain(s) of the species sequenced.
* Previous_Names: any previous names that this species was known by.
* Replaces_EukProt_ID/Replaced_by_EukProt_ID: if the data set changes with respect to an earlier version, the EukProt ID of the data set that it replaces (in the included table) or that it is replaced by (in the not_included table).
* Genus_UniEuk, Epithet_UniEuk, Supergroup_UniEuk, Taxogroup1_UniEuk, Taxogroup2_UniEuk: taxonomic identifiers at different levels of the UniEuk taxonomy (Berney et al. 2017, DOI: 10.1111/jeu.12414, based on Adl et al. 2019, DOI: 10.1111/jeu.12691).
* Taxonomy_UniEuk: the full lineage of the species in the UniEuk taxonomy (semicolon-delimited).
* Merged_Strains: whether multiple strains of the same species were merged to create the data set.
* Data_Source_URL: the URL(s) from which the data were downloaded.
* Data_Source_Name: the name of the data set (as assigned by the data source).
* Paper_DOI: the DOI(s) of the paper(s) that published the data set.
* Actions_Prior_to_Use: the action(s) that were taken to process the publicly available files in order to produce the data set in this database. Actions taken (see our manuscript for more details):
  * ‘assemble mRNA’: Trinity v. 2.8.4, http://trinityrnaseq.github.io/
  * ‘CD-HIT’: v. 4.6, http://weizhongli-lab.org/cd-hit/
  * ‘extractfeat’, ‘seqret’, ‘transeq’, ‘trimseq’: from EMBOSS package v. 6.6.0.0, http://emboss.sourceforge.net/
  * ‘translate mRNA’: Transdecoder v. 5.3.0, http://transdecoder.github.io/
  * ‘gffread’: v.0.12.3 https://github.com/gpertea/gffread
  * ‘predict genes’: EukMetaSanity https://github.com/cjneely10/EukMetaSanity (cloned on 21 September, 2021)
  All parameter values were default, unless otherwise specified.
* Data_Source_Type: the type of the source data (possible types: EST, transcriptome, single-cell transcriptome, genome, single-cell genome).
* Notes: additional information on the data set (including why it is replaced by/is replacing another data set, or why it was not included).
* Columns_Modified_Since_Previous_Version: column(s) in this file modified for the data set since the previous release. Not listed: modifications to the Notes column or to new columns added in this version.
* Alternative_Strain_Names: non-exhaustive list of alternative names for the sequenced strain for this data set.
* 18S_Sequence_GenBank_ID: GenBank identifier for the strain sequenced in the data set. When multiple strains were sequenced, identifiers are separated with a comma, in the same order as the Strain column. Ranges of identifiers for the same strain are separated by a hyphen. ‘N/A’ indicates either that there is no GenBank sequence for the strain or that all available sequences are not full-length (< 1,500 bp).
* 18S_Sequence: 18S for the strain derived from publicly available sequences associated with the data set, in the case where a GenBank sequence is not available.
* 18S_Sequence_Source: the source for the sequence in the 18S_Sequence column, if any.
* 18S_Sequence_Other_Strain_GenBank_ID: GenBank identifier for 18S sequence(s) from other strains of the same species as the data set.
* 18S_Sequence_Other_Strain_Name: strain name(s) for the sequences in the 18S_Sequence_Other_Strain_GenBank_ID column.
* 18S_and_Taxonomy_Notes: additional information on the values in the 18S_Sequence columns.

Changes since version 2
* There are 324 new data sets included. 57 of these replace data sets from version 2.
* 40 newly published data sets were added to the list that are not included in the database (annotated in the Notes column with the reasons they were not included).
* Instead of unannotated genomes (for published genomes lacking protein predictions), we now include predicted proteins and gene annotations (in GFF3 format).
* All sequences within each file are now assigned a standardized, unique identifier based on the data set’s EukProt_ID and on the type of data (protein or transcriptome).
* Illegal characters are removed from sequences.
* In the UniEuk_Taxonomy field, single quotes are now used instead of double quotes, to be consistent with other UniEuk databases (EukMap, EukRibo).
* Changes to metadata of individual data sets (in the included and not_included tables) with respect to the previous version are now listed in the Columns_Modified_Since_Previous_Version column.
* The Taxogroup_UniEuk column has been split into the Taxogroup1_UniEuk and Taxogroup2_UniEuk columns. This resulted in the Supergroup_UniEuk column changing for Opisthokonta.
* In addition, the following new columns have been added (see our manuscript for details): Alternative_Strain_Names, 18S_Sequence_GenBank_ID, 18S_Sequence, 18S_Sequence_Source, 18S_Sequence_Other_Strain_GenBank_ID, 18S_Sequence_Other_Strain_Name, 18S_and_Taxonomy_Notes.

EukProt_assembled_transcriptomes.v03.2021_11_22.tgz: assembled transcriptome contigs, for 126 species with publicly available mRNA sequence reads but no publicly available assembly. The proteins predicted from these assemblies are included in the proteins file.

Sequence names in the proteins and transcriptomes files have standardized, unique identifiers with the following format:
>[EukProt ID]_[Name_to_Use]_[Type abbreviation][Counter] [Previous header contents]
Type abbreviations are P (protein) and T (transcriptome).

All characters not in the following list are removed from nucleic acid sequences: ACGTNUKSYMWRBDHV
All characters not in the following list are removed from protein sequences: ABCDEFGHIKLMNPQRSTUVWYZX*
Lists of legal characters are from: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp
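The conventions described above (the standardized sequence header, the lists of legal characters, and the comma-delimited table cells with “N/A” for missing data) can be illustrated with a short Python sketch. This is not EukProt's own tooling. The function names, the example header, and the example ID are hypothetical, and the sketch assumes that the EukProt ID contains no underscore, that the counter is purely numeric, and that character filtering is case-sensitive as the lists are written.

```python
import re

# Legal characters, copied from the lists in the description above.
NUCLEOTIDE_CHARS = frozenset("ACGTNUKSYMWRBDHV")
PROTEIN_CHARS = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*")

# >[EukProt ID]_[Name_to_Use]_[Type abbreviation][Counter] [Previous header contents]
# Assumptions: the ID has no underscore and the counter is all digits.
HEADER_RE = re.compile(
    r"^>?(?P<eukprot_id>[^_\s]+)_(?P<name>\S+)_"
    r"(?P<type>[PT])(?P<counter>\d+)(?:\s+(?P<previous>.*))?$"
)


def parse_header(line):
    """Split a standardized header line into its named parts."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a standardized header: {line!r}")
    parts = match.groupdict()
    parts["previous"] = parts["previous"] or ""
    return parts


def sanitize(sequence, legal_chars):
    """Drop every character that is not in the given legal set."""
    return "".join(ch for ch in sequence if ch in legal_chars)


def parse_cell(cell):
    """Turn one table cell into a list; 'N/A' (missing) becomes []."""
    cell = cell.strip()
    if not cell or cell == "N/A":
        return []
    return [item.strip() for item in cell.split(",")]


# Hypothetical header, for illustration only.
header = parse_header(">EP00001_Example_species_P000042 original header text")
print(header["eukprot_id"], header["name"], header["type"], header["counter"])
print(sanitize("ACGT-acgt\nNN", NUCLEOTIDE_CHARS))
print(parse_cell("strain A, strain B"))
```

Splitting every cell on commas is only appropriate for the multi-valued columns; free-text columns such as Notes may contain ordinary commas and are better left unsplit.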



Human Transcriptome Database for Alternative Splicing

RRID:SCR_013305, nif-0000-02935, OMICS_01887, Human Transcriptome Database for Alternative Splicing (RRID:SCR_013305), H-DBAS, H-DBAS - Human-transcriptome DataBase for Alternative Splicing


69 scholarly articles cite this dataset (View in Google Scholar)



Human Transcriptome Database for Alternative Splicing

Data from: CrusTome: A transcriptome database resource for large-scale...

Songbird Brain Transcriptome Database

Genome-Wide Functional Analysis of the Cotton Transcriptome by Creating an...

Cerebellar Development Transcriptome Database

Transcriptome Shotgun Assembly (TSA) Sequence Database and Submissions

Data from: BBGD454: an Online Database for Blueberry Genomic Data...

Data from: A crustacean annotated transcriptome (CAT) database

Integrated Tumor Transcriptome Array and Clinical data Analysis

CATdb: a Complete Arabidopsis Transcriptome database

Cowpea genome and transcriptome data resource

Acheta domesticus Transcriptome

Brain Transcriptome Database

Long non-coding RNA transcriptome database of Malus sieversii infected with...

Data from: BaRTv1.0: an improved barley reference transcript dataset to...

Ramonda serbica de novo transcriptome database

GeneSpeed- A Database of Unigene Domain Organization

Data from: AmpuBase: a transcriptome database for eight species of apple...

Master Coral database used in USVI SCTLD Transmission Experiment Gene...

Data from: EukProt: a database of genome-scale predicted proteins across the...
