100+ datasets found
  1. Human Transcriptome Database for Alternative Splicing

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jun 4, 2024
    Cite
    (2024). Human Transcriptome Database for Alternative Splicing [Dataset]. http://identifiers.org/RRID:SCR_013305
    Description

    A specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. H-DBAS offers unique data and a viewer for human AS analysis. It contains:

    • Genome-wide representative alternative splicing variants (RASVs) identified from the following datasets:
        • H-Inv full-length cDNAs (resource summary): H-Invitational cDNA dataset
        • H-Inv all transcripts (resource summary): Published human mRNA dataset
        • Mouse full-length cDNAs (resource summary): Mouse cDNA dataset
    • RASVs affecting protein functions such as protein motif, GO, subcellular localization signal and transmembrane domain
    • Conserved RASVs compared with mouse genome and the full-length cDNAs (H-Inv full-length cDNAs only)

  2. Data from: CrusTome: A transcriptome database resource for large-scale...

    • data.niaid.nih.gov
    Updated Mar 15, 2023
    Cite
    Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L. (2023). CrusTome: A transcriptome database resource for large-scale analyses across Crustacea [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7730439
    Dataset provided by
    Department of Biology, Colorado State University
    Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution
    Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, & Department of Biological Sciences and Institute of Environment, Florida International University
    Department of Biology, University of Oklahoma
    Authors
    Perez-Moreno, Jorge L.; Kozma, Mihika T.; DeLeo, Danielle M.; Bracken-Grissom, Heather D.; Durica, David S.; Mykles, Donald L.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CrusTome_v0.1.0 Prerelease

    /ReadMe - this file
    /crustome_aa_BLAST.tar.gz - CrusTome database of amino acid sequences in BLAST format
    /crustome_aa_DIAMOND.tar.gz - CrusTome database of amino acid sequences in DIAMOND format
    /crustome_mrna_BLAST.tar.gz - CrusTome database of mRNA sequences in BLAST format
    /dict - Dictionary file to translate species IDs. For usage with sed/awk see link to GitHub site below

    Please note, most of the data files contained in this DOI are compressed into GZip files (.gz extension). Mac and Linux operating systems can extract this file type natively. Windows requires additional software to extract the archive. 7-Zip (http://www.7-zip.org) is free and open source software that allows Windows PCs to open and decompress the archive.
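    As a platform-independent alternative to the tools mentioned in the note above, a .tar.gz archive can also be unpacked with Python's standard library. The following is a minimal sketch, not part of the CrusTome release; the function name `extract_archive` and the paths in the usage comment are illustrative assumptions.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a gzip-compressed tar archive and return its member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(path=dest, **kwargs)
    return names


# Hypothetical usage (the destination directory name is an assumption):
# extract_archive("crustome_aa_BLAST.tar.gz", "crustome_aa_BLAST")
```

    The same call works on Windows, macOS and Linux, so no separate archive tool is needed when Python is already installed.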


    Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

    CrusTome: A transcriptome database resource for large-scale analyses across Crustacea

    Transcriptomes from non-traditional model organisms often harbor a wealth of unexplored data. Examining these datasets can lead to clarity and novel insights in traditional systems, as well as to discoveries across a multitude of fields. Despite significant advances in DNA sequencing technologies and in their adoption, access to genomic and transcriptomic resources for non-traditional model organisms remains limited. Crustaceans, for example, being amongst the most numerous, diverse, and widely distributed taxa on the planet, often serve as excellent systems to address ecological, evolutionary, and organismal questions. While they are ubiquitously present across environments, and of economic and food security importance, they remain severely underrepresented in publicly available sequence databases. Here, we present CrusTome, a multi-species, multi-tissue, transcriptome database of 201 assembled mRNA transcriptomes (189 crustaceans, 30 of which were previously unpublished, and 12 ecdysozoan outgroups) as an evolving, and publicly available resource. This database is suitable for evolutionary, ecological, and functional studies that employ genomic/transcriptomic techniques and datasets. CrusTome is presented in BLAST and DIAMOND formats, providing robust datasets for sequence similarity searches, orthology assignments, phylogenetic inference, etc., and thus allowing for straight-forward incorporation into existing custom pipelines for high-throughput analyses.
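    To illustrate how similarity-search results against a database like this might feed a custom pipeline, the sketch below parses tabular search output and keeps the best-scoring hit per query. It assumes the search was run with the standard 12-column tabular output (format 6) that both BLAST+ and DIAMOND can produce; the function name `best_hits` and the sample rows are invented for illustration and do not come from CrusTome.

```python
import csv
import io

# Standard column order of 12-column tabular (format 6) search output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text: str) -> dict[str, dict[str, str]]:
    """Return the highest-bitscore hit for each query sequence."""
    best: dict[str, dict[str, str]] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Invented example rows (query and subject IDs are placeholders).
sample = (
    "q1\tsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "q1\tsB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "q2\tsC\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t60\n"
)
for query, hit in best_hits(sample).items():
    print(query, hit["sseqid"], hit["bitscore"])
```

    Keeping only the top hit per query is one common simplification; orthology assignment usually needs stricter criteria such as reciprocal best hits.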

    For questions regarding released datasets contact: Corresponding Author: Jorge L. Perez-Moreno (Colorado State University) jorgepm@colostate.edu / jpere645@fiu.edu

    https://github.com/invertome/crustome

    PLEASE CITE:

    Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.

    Funder Information

    Supported by National Science Foundation grants to DLM (IOS-1922701) and DSD (IOS-1922755). In addition, this work was partially funded by two grants awarded from the National Science Foundation: Doctoral Dissertation Improvement Grant (#1701835) awarded to JPM and HBG and the Division of Environmental Biology Bioluminescence and Vision grant (DEB-1556059) awarded to HBG. Samples in the FICC were collected by grants from The Gulf of Mexico Research Initiative (GOMRI), Florida Institute of Oceanography Shiptime Funding awarded to HBG and DMD; the National Science Foundation Division of Environmental Biology Grant 1556059 awarded to HBG; and the National Oceanic and Atmospheric Administration Ocean Exploration Research (NOAA-OER 2015) grant awarded to HBG.

  3. Songbird Brain Transcriptome Database

    • rrid.site
    • dknet.org
    • +1more
    Updated Nov 12, 2025
    Cite
    (2025). Songbird Brain Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_006182
    Description

    Database containing cDNA clone information of the brains of songbirds. These clones are annotated with behavioral information, as well as links to information of homologous genes of other species. The database includes over 91,000 zebra finch brain cDNAs (2009) sequenced by Duke, ESTIMA, and Rockefeller research groups. The project is a collaborative effort of the Jarvis Laboratory of Duke University, Duke Bioinformatics, and The Genomics group of RIKEN, with Erich D. Jarvis as P.I. and Kazuhiro Wada as Co-P.I. Microarrays with the cDNAs in this database are available at Duke http://mgm.duke.edu/genome/dna_micro/core/spotted.htm and through the NIH Neurosciences Microarray Consortium http://arrayconsortium.tgen.org/np2/public/overview.jsp

  4. Genome-Wide Functional Analysis of the Cotton Transcriptome by Creating an...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Fuliang Xie; Guiling Sun; John W. Stiller; Baohong Zhang (2023). Genome-Wide Functional Analysis of the Cotton Transcriptome by Creating an Integrated EST Database [Dataset]. http://doi.org/10.1371/journal.pone.0026980
    Available download formats: xls
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Fuliang Xie; Guiling Sun; John W. Stiller; Baohong Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A total of 28,432 unique contigs (25,371 in consensus contigs and 3,061 as singletons) were assembled from all 268,786 cotton ESTs currently available. Several in silico approaches [comparative genomics, BLAST, Gene Ontology (GO) analysis, and pathway enrichment by Kyoto Encyclopedia of Genes and Genomes (KEGG)] were employed to investigate global functions of the cotton transcriptome. Cotton EST contigs were clustered into 5,461 groups with a maximum cluster size of 196 members. A total of 27,956 indel mutants and 149,616 single nucleotide polymorphisms (SNPs) were identified from consensus contigs. Interestingly, many contigs with significantly high frequencies of indels or SNPs encode transcription factors and protein kinases. In a comparison with six model plant species, cotton ESTs show the highest overall similarity to grape. A total of 87 cotton miRNAs were identified; 59 of these have not been reported previously from experimental or bioinformatics investigations. We also predicted 3,260 genes as miRNA targets, which are associated with multiple biological functions, including stress response, metabolism, hormone signal transduction and fiber development. We identified 151 and 4,214 EST-simple sequence repeats (SSRs) from contigs and raw ESTs, respectively. To make these data widely available, and to facilitate access to EST-related genetic information, we integrated our results into a comprehensive, fully downloadable web-based cotton EST database (www.leonxie.com).

  5. Cerebellar Development Transcriptome Database

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Cerebellar Development Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_013096
    Description

    Transcriptomic information (spatiotemporal gene expression profile data) on the postnatal cerebellar development of mice (C57BL/6J & ICR). It is a tool for mining cerebellar genes and gene expression, and provides a portal to relevant bioinformatics links. The mouse cerebellar circuit develops through a series of cellular and morphological events, including neuronal proliferation and migration, axonogenesis, dendritogenesis, and synaptogenesis, all within three weeks after birth, and each event is controlled by a specific gene group whose expression profile must be encoded in the genome. To elucidate the genetic basis of cerebellar circuit development, CDT-DB analyzes spatiotemporal gene expression by using in situ hybridization (ISH) for cellular resolution and by using fluorescence differential display and microarrays (GeneChip) for developmental time series resolution. The CDT-DB not only provides a cross-search function for large amounts of experimental data (ISH brain images, GeneChip graphs, RT-PCR gel images), but also includes a portal function by which all registered genes are hyperlinked to many relevant bioinformatics websites covering gene ontology, genomes, proteins, pathways, cell functions, and publications. Thus, the CDT-DB is a useful tool for mining potentially important genes based on characteristic expression profiles in particular cell types or during a particular time window in developing mouse brains.

  6. Transcriptome Shotgun Assembly (TSA) Sequence Database and Submissions

    • catalog.data.gov
    • datadiscovery.nlm.nih.gov
    • +3more
    Updated Jun 19, 2025
    Cite
    National Library of Medicine (2025). Transcriptome Shotgun Assembly (TSA) Sequence Database and Submissions [Dataset]. https://catalog.data.gov/dataset/transcriptome-shotgun-assembly-tsa-sequence-database-and-submissions-822d5
    Dataset provided by
    National Library of Medicine
    Description

    TSA is an archive of computationally assembled transcript sequences from primary data such as ESTs and Next Generation Sequencing Technologies. The overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cloned cDNAs. The primary sequence data used in the assemblies must have been experimentally determined by the same submitter. TSA sequence records differ from GenBank records because there are no physical counterparts to the assemblies.

  7. Data from: BBGD454: an Online Database for Blueberry Genomic Data...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). BBGD454: an Online Database for Blueberry Genomic Data Transcriptome analysis of Blueberry using 454 EST sequencing [Dataset]. https://catalog.data.gov/dataset/bbgd454-an-online-database-for-blueberry-genomic-data-transcriptome-analysis-of-blueberry--5783e
    Dataset provided by
    Agricultural Research Service
    Description

    NOTE: This dataset is no longer publicly available. This database houses over 500,000 sequences that were generated and assembled into approximately 15,000 contigs, annotated and functionally mapped to Gene Ontology (GO) terms. Blueberry (Vaccinium corymbosum) is a major berry crop in the United States. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities during an organism’s developmental stage(s) or its response to biotic or abiotic stresses. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. We have applied a high-throughput pyrosequencing technology (454 EST sequencing) for transcriptome profiling of blueberry during different stages of fruit development to gain an understanding of the genes that are up- or down-regulated during this process. We have also sequenced flower buds at four different stages of cold acclimation to gain a better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation, since extreme low temperatures are known to reduce crop yield and cause major losses to US farmers. We have also sequenced a leaf sample to compare its transcriptome profile with that of bud and fruit samples. A database was developed to house these sequences and their annotations. A web-based interface was also developed to allow collaborators to search/browse the data and aid in the analysis and interpretation of the data. The availability of these sequences will allow for future advances, such as the development of a blueberry microarray to study gene expression, and will aid in the blueberry genome sequencing effort that is underway.

    This work was supported by grant 2008-51180-04861 from the USDA - Cooperative State Research, Education, and Extension Service (CSREES) Specialty Crop Research Initiative program.

  8. Data from: A crustacean annotated transcriptome (CAT) database

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Sep 7, 2020
    Cite
    Nong, Wenyan; Qiu, Jian-Wen; Chu, Ka-Hou; Chai, Zacary Y. H.; Qin, Jing; Hui, Jerome Ho Lam; Yan, Mak Kai; Chow, Billy Kwok Chong; Jiang, Xiaosen (2020). A crustacean annotated transcriptome (CAT) database [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000453200
    Authors
    Nong, Wenyan; Qiu, Jian-Wen; Chu, Ka-Hou; Chai, Zacary Y. H.; Qin, Jing; Hui, Jerome Ho Lam; Yan, Mak Kai; Chow, Billy Kwok Chong; Jiang, Xiaosen
    Description

    Background
    Decapods are an order of crustaceans which includes shrimps, crabs, lobsters and crayfish. They occur worldwide and are of great scientific interest as well as being of ecological and economic importance in fisheries and aquaculture. However, our knowledge of their biology mainly comes from the group which is most closely related to crustaceans – insects. Here we produce a de novo transcriptome database, crustacean annotated transcriptome (CAT) database, spanning multiple tissues and the life stages of seven crustaceans.

    Description
    A total of 71 transcriptome assemblies from six decapod species and a stomatopod species, including the coral shrimp Stenopus hispidus, the cherry shrimp Neocaridina davidi, the redclaw crayfish Cherax quadricarinatus, the spiny lobster Panulirus ornatus, the red king crab Paralithodes camtschaticus, the coconut crab Birgus latro, and the zebra mantis shrimp Lysiosquillina maculata, were generated. Differential gene expression analyses within species were generated as a reference and included in a graphical user interface database at http://cat.sls.cuhk.edu.hk/. Users can carry out gene name searches and also access gene sequences based on a sequence query using the BLAST search function.

    Conclusions
    The data generated and deposited in this database offers a valuable resource for the further study of these crustaceans, as well as being of use in aquaculture development.

  9. Integrated Tumor Transcriptome Array and Clinical data Analysis

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 8, 2006
    Cite
    (2006). Integrated Tumor Transcriptome Array and Clinical data Analysis [Dataset]. http://identifiers.org/RRID:SCR_008182
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented on 6/12/25. ITTACA is a database created for Integrated Tumor Transcriptome Array and Clinical data Analysis. ITTACA centralizes public datasets containing both gene expression and clinical data and currently focuses on the types of cancer that are of particular interest to the Institut Curie: breast carcinoma, bladder carcinoma, and uveal melanoma. ITTACA is developed by the Institut Curie Bioinformatics group and the Molecular Oncology group of UMR144 CNRS/Institut Curie. A web interface allows users to carry out different class comparison analyses, including comparison of expression distribution profiles, tests for differential expression, and patient survival analyses. Users can define their own patient groups according to clinical data or gene expression levels. The different functionalities implemented in ITTACA are:

    • Testing whether one or more genes of your choice are differentially expressed between two groups of samples exhibiting distinct phenotypes (Student and Wilcoxon tests).
    • Detecting genes differentially expressed between two groups of samples (Significance Analysis of Microarrays).
    • Creating histograms that represent the expression level according to a clinical parameter for each sample.
    • Computing Kaplan-Meier survival curves for each group.

    ITTACA has been developed to be a useful tool for comparing personal results to the existing results in the field of transcriptome studies with microarrays.
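    For readers unfamiliar with the survival curves mentioned in this description, the sketch below shows the standard Kaplan-Meier product-limit estimator, S(t) = Π (1 - d_i / n_i) over event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number of patients still at risk. This is a generic textbook illustration, not ITTACA's implementation; the function name and the toy data are assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data: five patients, two of them censored (event flag 0).
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"t={t}: S={s:.2f}")
```

    Computing one such curve per patient group, as the description outlines, allows the groups' survival to be compared visually.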

  10. CATdb: a Complete Arabidopsis Transcriptome database

    • rrid.site
    • dknet.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). CATdb: a Complete Arabidopsis Transcriptome database [Dataset]. http://identifiers.org/RRID:SCR_007582
    Description

    CATdb collects all the information on transcriptome experiments done at URGV with CATMA microarrays. All data in CATdb come from the URGV microarray platforms. Common procedures are used for every step, from experiment design to statistical analysis. Guided by a web interface, biologists enter the standard description of each experimental step (extraction, labelling, hybridization and scanning). Then, normalization and statistical analyses are done following a set of selected methods depending on the experimental design and array types.

  11. Cowpea genome and transcriptome data resource

    • researchdata.edu.au
    datadownload
    Updated Jun 6, 2018
    Cite
    Anna Koltunow; Jen Taylor; Steven Henderson; Andrew Spriggs (2018). Cowpea genome and transcriptome data resource [Dataset]. http://doi.org/10.4225/08/5B1723666D6A5
    Available download formats: datadownload
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Anna Koltunow; Jen Taylor; Steven Henderson; Andrew Spriggs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of Cowpea (Vigna unguiculata (L.) Walp). For each of two varieties of cowpea (IT97K-499-35, IT86D-1010) this collection contains the following datasets:

    i) genomic survey assemblies based on Illumina sequencing
    ii) transcriptome assemblies
    iii) raw DNA and RNA sequence data feeding into the above assemblies
    iv) in silico gene predictions and predicted gene sequences derived from IT86D-1010 and IT97K-499-35
    v) mapping to the Vigna unguiculata v1.0 reference genome (http://phytozome.jgi.doe.gov/)

  12. Acheta domesticus Transcriptome

    • agdatacommons.nal.usda.gov
    bin
    Updated Mar 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    USDA (2025). Acheta domesticus Transcriptome [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Acheta_domesticus_Transcriptome/25085654
    Available download formats: bin
    Dataset provided by
    National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)
    Authors
    USDA
    License

    https://rightsstatements.org/vocab/UND/1.0/

    Description

    Sequencing major development stages of the house cricket, Acheta domesticus, will provide the first large transcriptome database on a species that is being developed for the food and feed industry. Obtaining information on specific gene sets will enable us to engineer this and other insect species to optimize traits related to improved nutritional content and disease resistance.

  13. Brain Transcriptome Database

    • scicrunch.org
    Cite
    Brain Transcriptome Database [Dataset]. http://identifiers.org/RRID:SCR_014457
    Description

    A platform that allows users to visualize and analyze transcriptome data related to the genetics that underlie the development, function, and dysfunction stages and states of the brain. Users can search for cerebellar development genes by name, ID, keyword, expression, and tissue specificity. Search results include general information, links, temporal, spatial, and tissue information, and gene category.

  14. Long non-coding RNA transcriptome database of Malus sieversii infected with...

    • scidb.cn
    Updated Nov 14, 2025
    Cite
    Xiaojie Liu; Daoyuan Zhang (2025). Long non-coding RNA transcriptome database of Malus sieversii infected with Valsa mali [Dataset]. http://doi.org/10.57760/sciencedb.31349
    Dataset provided by
    Science Data Bank
    Authors
    Xiaojie Liu; Daoyuan Zhang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Malus sieversii (Xinjiang wild apple) is the ancestral species of cultivated apples worldwide, boasting abundant genetic resources and serving as a high-quality gene bank for the molecular breeding of cultivated apples. The data presented herein is the lncRNA transcriptome data obtained via PacBio sequencing technology, following the infection of M. sieversii branches with Valsa mali (causal agent of Valsa canker). Compared with lncRNA research in animals, studies on plant lncRNAs are relatively scarce, and those focusing on woody plants are even rarer. Therefore, this dataset can enrich the lncRNA data of woody plants and lay a foundation for lncRNA research in apples.

  15. Data from: BaRTv1.0: an improved barley reference transcript dataset to...

    • zenodo.org
    • osti.gov
    zip
    Updated Jan 24, 2020
    Cite
    Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson (2020). BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq [Dataset]. http://doi.org/10.5281/zenodo.3360434
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Paulo Rapazote-Flores; Micha Bayer; Linda Milne; Claus-Dieter Mayer; John Fuller; Wenbin Guo; Pete E Hedley; Jenny Morris; Claire Halpin; Jason Kam; Sarah M McKim; Monika Zwirek; M Cristina Casao; Abdellah Barakate; Miriam Schreiber; Gordon Stephen; Runxuan Zhang; John WS Brown; Robbie Waugh; Craig Simpson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background
    Time-consuming computational assembly and quantification of gene expression and splicing analysis from RNA-seq data vary considerably. Recent fast non-alignment tools such as Kallisto and Salmon overcome these problems, but these tools require a high-quality, comprehensive reference transcript dataset (RTD), which is rarely available in plants.

    Results
    A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts – BaRTv1.0) has been generated. BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments, with transcripts assembled and aligned to the barley cv. Morex reference genome (Mascher et al., 2017). Full-length cDNAs from the barley variety Haruna nijo (Matsumoto et al., 2011) determined transcript coverage, and high-resolution RT-PCR validated alternatively spliced (AS) transcripts of 86 genes in five different organs and tissues. These methods were used as benchmarks to select an optimal barley RTD. BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in 5’ and 3’ UTR ends of transcripts. BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data of five barley organs/tissues. This analysis identified 20,972 significant differentially expressed genes, 2,791 differentially alternatively spliced genes and 2,768 transcripts with differential transcript usage.

    Conclusion
    A high confidence barley reference transcript dataset consisting of 60,444 genes with 177,240 transcripts has been generated. Compared to current barley transcripts, BaRTv1.0 transcripts are generally longer, have less fragmentation and improved gene models that are well supported by splice junction reads. Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS.

  16. Ramonda serbica de novo transcriptome database

    • data.niaid.nih.gov
    Updated Mar 10, 2022
    Cite
    Marija Vidovic (2022). Ramonda serbica de novo transcriptome database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6341872
    Dataset provided by
    Institute of Molecular Genetics and Genetic Engineering University of Belgrade
    Authors
    Marija Vidovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ramonda serbica de novo transcriptome database obtained from desiccated and hydrated leaves.

  17. GeneSpeed- A Database of Unigene Domain Organization

    • rrid.site
    • test2.scicrunch.org
    • +1more
    Cite
    GeneSpeed- A Database of Unigene Domain Organization [Dataset]. http://identifiers.org/RRID:SCR_002779
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented on July 16, 2013. Database and customized tools to study the PFAM protein domain content of the transcriptome for all expressed genes of Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans tethered to both a genomics array repository database and a range of external information resources. GeneSpeed has merged information from several existing data sets including the Gene Ontology Consortium, InterPro, Pfam, Unigene, as well as micro-array datasets. GeneSpeed is a database of PFAM domain homology contained within Unigene. Because Unigene is a non-redundant dbEST database, this provides a wide encompassing overview of the domain content of the expressed transcriptome. We have structured the GeneSpeed Database to include a rich toolset allowing the investigator to study all domain homology, no matter how remote. As a result, homology cutoff score decisions are determined by the scientist, not by a computer algorithm. This quality is one of the novel defining features of the GeneSpeed database giving the user complete control of database content. In addition to a domain content toolset, GeneSpeed provides an assortment of links to external databases, a unique and manually curated Transcription Factor Classification list, as well as links to our newly evolving GeneSpeed BetaCell Database. GeneSpeed BetaCell is a micro-array depository combined with custom array analysis tools created with an emphasis around the meta analysis of developmental time series micro-array datasets and their significance in pancreatic beta cells.

  18. n

    Data from: AmpuBase: a transcriptome database for eight species of apple...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Jan 19, 2019
    Cite
    Jack C. H. Ip; Huawei Mu; Qian Chen; Jin Sun; Santiago Ituarte; Horacio Heras; Bert Van Bocxlaer; Monthon Ganmanee; Xin Huang; Jian-Wen Qiu (2019). AmpuBase: a transcriptome database for eight species of apple snails (Gastropoda: Ampullariidae) [Dataset]. http://doi.org/10.5061/dryad.117cf
    Available download formats: zip
    Dataset updated
    Jan 19, 2019
    Dataset provided by
    Hong Kong University of Science and Technology
    Hong Kong Baptist University
    Universidad Nacional de La Plata
    Université de Lille
    King Mongkut's Institute of Technology Ladkrabang
    HKBU Institute of Research and Continuing Education, Shenzhen, China
    Authors
    Jack C. H. Ip; Huawei Mu; Qian Chen; Jin Sun; Santiago Ituarte; Horacio Heras; Bert Van Bocxlaer; Monthon Ganmanee; Xin Huang; Jian-Wen Qiu
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Background: Gastropoda, with approximately 80,000 living species, is the largest class of Mollusca. Among gastropods, apple snails (family Ampullariidae) have members that are widely distributed in tropical and subtropical freshwater ecosystems and are ecologically and economically important. They exhibit various morphological and physiological adaptations to their respective habitats, which make them ideal candidates for studying adaptation, population divergence, speciation, and larger-scale patterns of diversity, including biogeography of native and invasive populations. The limited availability of genomic data, however, hinders in-depth ecological and evolutionary studies of these non-model organisms.

    Results: Using Illumina HiSeq platforms, we sequenced 1,220 million reads for seven species of apple snails. Together with the RNA-Seq data of two apple snails, we conducted de novo transcriptome assembly of eight species covering five genera of Ampullariidae, including representatives of the Old World and New World lineages. There were 20,730 to 35,828 unigenes with predicted open reading frames for the eight species, with N50 (shortest sequence length at 50% of the unigenes) ranging from 1,320 to 1,803 bp. Between 69.7% and 80.2% of these unigenes were functionally annotated by searching against databases of NCBI’s non-redundant, Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes. With these data we developed AmpuBase, a relational database that features online BLAST for DNA/protein sequences, keyword search for unigenes/functional terms, and download functions for sequences and whole transcriptomes.

    Conclusions: In summary, we have generated comprehensive transcriptome data for multiple ampullariid genera and species, and created a publicly accessible database with a user-friendly interface to facilitate future basic and applied studies on ampullariids, and comparative molecular studies with other invertebrates.
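
The Results above report N50 values for the assembled unigenes. As a minimal sketch, assuming the conventional N50 definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length), the statistic could be computed from a list of unigene lengths as follows. The function name `n50` and the sample lengths are illustrative and are not taken from AmpuBase.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumes the conventional definition: sort lengths from longest to
    shortest and return the length at which the running total first
    reaches half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical unigene lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Sorting once and accumulating keeps the calculation linear after the sort, which is sufficient for assemblies of tens of thousands of unigenes.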

  19. Z

    Master Coral database used in USVI SCTLD Transmission Experiment Gene...

    • data.niaid.nih.gov
    Updated Apr 19, 2023
    Cite
    Kelsey Beavers (2023). Master Coral database used in USVI SCTLD Transmission Experiment Gene Expression Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7838979
    Dataset updated
    Apr 19, 2023
    Dataset provided by
    University of Texas at Arlington
    Authors
    Kelsey Beavers
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    U.S. Virgin Islands
    Description

    The Master Coral Database FASTA file comprises previously published genome-derived predicted gene models and transcriptomes spanning a wide diversity of coral families. Transcriptomes are from Davies et al., 2016 (DOI: 10.3389/fmars.2016.00112); Kirk et al., 2018 (DOI: 10.1111/mec.14934); Moya et al., 2012 (DOI: 10.1111/j.1365-294X.2012.05554.x); and van de Water et al., 2018 (DOI: 10.1111/mec.14489).

  20. f

    Data from: EukProt: a database of genome-scale predicted proteins across the...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Mar 23, 2022
    Cite
    de Vargas, Colomban; Muñoz-Gómez, Sergio A.; Strassert, Jürgen; Richter, Daniel; Wideman, Jeremy G.; Burki, Fabien; Poh, Yu-Ping; Berney, Cédric; Herman, Emily K. (2022). EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotes [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000204120
    Dataset updated
    Mar 23, 2022
    Authors
    de Vargas, Colomban; Muñoz-Gómez, Sergio A.; Strassert, Jürgen; Richter, Daniel; Wideman, Jeremy G.; Burki, Fabien; Poh, Yu-Ping; Berney, Cédric; Herman, Emily K.
    Description

    Version 3 (22 November, 2021)

    See https://doi.org/10.24072/pcjournal.173 for a detailed description of the database.

    See http://evocellbio.com/eukprot/ for a BLAST database, interactive plots of BUSCO scores and ‘The Comparative Set’ (TCS): A selected subset of EukProt for comparative genomics investigations. Protein sequence FASTA files of the TCS are available at https://doi.org/10.6084/m9.figshare.21586065.

    See https://github.com/beaplab/EukProt for utility scripts, annotations, and all the files necessary to build the tree in Figures 1 and 3 (from the DOI above).

    Scroll to the end of this page for changes since version 2.

    Are we missing anything? Please let us know!

    EukProt is a database of published and publicly available predicted protein sets selected to represent the breadth of eukaryotic diversity, currently including 993 species from all major supergroups as well as orphan taxa. The goal of the database is to provide a single, convenient resource for gene-based research across the spectrum of eukaryotic life, such as phylogenomics and gene family evolution. Each species is placed within the UniEuk taxonomic framework in order to facilitate downstream analyses, and each data set is associated with a unique, persistent identifier to facilitate comparison and replication among analyses. The database is regularly updated, and all versions will be permanently stored and made available via FigShare. The current version has a number of updates, notably ‘The Comparative Set’ (TCS), a reduced taxonomic set with high estimated completeness while maintaining a substantial phylogenetic breadth, which comprises 196 predicted proteomes. A BLAST web server and graphical displays of data set completeness are available at http://evocellbio.com/eukprot/.

    We invite the community to provide suggestions for new data sets and new annotation features to be included in subsequent versions, with the goal of building a collaborative resource that will promote research to understand eukaryotic diversity and diversification.

    This release contains 5 files:

    • EukProt_proteins.v03.2021_11_22.tgz: 993 protein data sets, for species with either a genome (375) or single-cell genome (56), a transcriptome (498), a single-cell transcriptome (47), or an EST assembly (17).
    • EukProt_genome_annotations.v03.2021_11_22.tgz: gene annotations, in GFF format, as produced by EukMetaSanity (https://github.com/cjneely10/EukMetaSanity) for 40 genomes lacking publicly available protein annotations. The proteins predicted from these annotations are included in the proteins file.
    • EukProt_assembled_transcriptomes.v03.2021_11_22.tgz: assembled transcriptome contigs, for 126 species with publicly available mRNA sequence reads but no publicly available assembly. The proteins predicted from these assemblies are included in the proteins file.
    • EukProt_included_data_sets.v03.2021_11_22.txt and EukProt_not_included_data_sets.v03.2021_11_22.txt: tables of information on data sets either included (993 data sets) or not included (163) in the database. Tab-delimited; multiple entries in the same cell are comma-delimited; missing data is represented with the “N/A” value.

    The two tables have the following columns:

    • EukProt_ID: the unique identifier associated with the data set. This will not change among versions. If a new data set becomes available for the species, it will be assigned a new unique identifier.
    • Name_to_Use: the name of the species for protein/genome annotation/assembled transcriptome files.
    • Strain: the strain(s) of the species sequenced.
    • Previous_Names: any previous names that this species was known by.
    • Replaces_EukProt_ID/Replaced_by_EukProt_ID: if the data set changes with respect to an earlier version, the EukProt ID of the data set that it replaces (in the included table) or that it is replaced by (in the not_included table).
    • Genus_UniEuk, Epithet_UniEuk, Supergroup_UniEuk, Taxogroup1_UniEuk, Taxogroup2_UniEuk: taxonomic identifiers at different levels of the UniEuk taxonomy (Berney et al. 2017, DOI: 10.1111/jeu.12414, based on Adl et al. 2019, DOI: 10.1111/jeu.12691).
    • Taxonomy_UniEuk: the full lineage of the species in the UniEuk taxonomy (semicolon-delimited).
    • Merged_Strains: whether multiple strains of the same species were merged to create the data set.
    • Data_Source_URL: the URL(s) from which the data were downloaded.
    • Data_Source_Name: the name of the data set (as assigned by the data source).
    • Paper_DOI: the DOI(s) of the paper(s) that published the data set.
    • Actions_Prior_to_Use: the action(s) that were taken to process the publicly available files in order to produce the data set in this database. Actions taken (see our manuscript for more details):
        • ‘assemble mRNA’: Trinity v. 2.8.4, http://trinityrnaseq.github.io/
        • ‘CD-HIT’: v. 4.6, http://weizhongli-lab.org/cd-hit/
        • ‘extractfeat’, ‘seqret’, ‘transeq’, ‘trimseq’: from EMBOSS package v. 6.6.0.0, http://emboss.sourceforge.net/
        • ‘translate mRNA’: Transdecoder v. 5.3.0, http://transdecoder.github.io/
        • ‘gffread’: v.0.12.3 https://github.com/gpertea/gffread
        • ‘predict genes’: EukMetaSanity https://github.com/cjneely10/EukMetaSanity (cloned on 21 September, 2021)
      All parameter values were default, unless otherwise specified.
    • Data_Source_Type: the type of the source data (possible types: EST, transcriptome, single-cell transcriptome, genome, single-cell genome).
    • Notes: additional information on the data set (including why it is replaced by/is replacing another data set, or why it was not included).
    • Columns_Modified_Since_Previous_Version: column(s) in this file modified for the data set since the previous release. Not listed: modifications to the Notes column or to new columns added in this version.
    • Alternative_Strain_Names: non-exhaustive list of alternative names for the sequenced strain for this data set.
    • 18S_Sequence_GenBank_ID: GenBank identifier for the strain sequenced in the data set. When multiple strains were sequenced, identifiers are separated with a comma, in the same order as the Strain column. Ranges of identifiers for the same strain are separated by a hyphen. ‘N/A’ indicates either that there is no GenBank sequence for the strain or that all available sequences are not full-length (< 1,500 bp).
    • 18S_Sequence: 18S for the strain derived from publicly available sequences associated with the data set, in the case where a GenBank sequence is not available.
    • 18S_Sequence_Source: the source for the sequence in the 18S_Sequence column, if any.
    • 18S_Sequence_Other_Strain_GenBank_ID: GenBank identifier for 18S sequence(s) from other strains of the same species as the data set.
    • 18S_Sequence_Other_Strain_Name: strain name(s) for the sequences in the 18S_Sequence_Other_Strain_GenBank_ID column.
    • 18S_and_Taxonomy_Notes: additional information on the values in the 18S_Sequence columns.

    Changes since version 2

    • There are 324 new data sets included. 57 of these replace data sets from version 2.
    • 40 newly published data sets were added to the list that are not included in the database (annotated in the Notes column with the reasons they were not included).
    • Instead of unannotated genomes (for published genomes lacking protein predictions), we now include predicted proteins and gene annotations (in GFF3 format).
    • All sequences within each file are now assigned a standardized, unique identifier based on the data set’s EukProt_ID and on the type of data (protein or transcriptome).
    • Illegal characters are removed from sequences.
    • In the UniEuk_Taxonomy field, single quotes are now used instead of double quotes, to be consistent with other UniEuk databases (EukMap, EukRibo).
    • Changes to metadata of individual data sets (in the included and not_included tables) with respect to the previous version are now listed in the Columns_Modified_Since_Previous_Version column.
    • The Taxogroup_UniEuk column has been split into the Taxogroup1_UniEuk and Taxogroup2_UniEuk columns. This resulted in the Supergroup_UniEuk column changing for Opisthokonta.
    • In addition, the following new columns have been added (see our manuscript for details): Alternative_Strain_Names, 18S_Sequence_GenBank_ID, 18S_Sequence, 18S_Sequence_Source, 18S_Sequence_Other_Strain_GenBank_ID, 18S_Sequence_Other_Strain_Name, 18S_and_Taxonomy_Notes.

    Sequence names in the proteins and transcriptomes files have standardized, unique identifiers with the following format:

    >[EukProt ID]_[Name_to_Use]_[Type abbreviation][Counter] [Previous header contents]

    Type abbreviations are P (protein) and T (transcriptome).

    All characters not in the following list are removed from nucleic acid sequences: ACGTNUKSYMWRBDHV

    All characters not in the following list are removed from protein sequences: ABCDEFGHIKLMNPQRSTUVWYZX*

    Lists of legal characters are from: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp
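
The header format and legal-character lists described above lend themselves to a short parsing example. The following is a minimal sketch, not part of the EukProt utility scripts: the function names `parse_header` and `clean_sequence` are hypothetical, as is the sample header. It assumes that the EukProt ID contains no underscore, that the counter is purely numeric, and that character filtering is case-sensitive (the description lists uppercase characters only).

```python
import re

# Legal characters as listed in the dataset description.
NUCLEOTIDE_CHARS = frozenset("ACGTNUKSYMWRBDHV")
PROTEIN_CHARS = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*")

# Assumed shape of the first header token:
#   [EukProt ID]_[Name_to_Use]_[Type abbreviation][Counter]
# Name_to_Use may itself contain underscores, so the type and counter
# are anchored at the end of the token.
HEADER_RE = re.compile(
    r"^(?P<eukprot_id>[^_\s]+)_(?P<name_to_use>\S+)_(?P<type>[PT])(?P<counter>\d+)$"
)


def parse_header(line):
    """Split a FASTA header line into its documented components."""
    parts = line.strip().lstrip(">").split(None, 1)
    if not parts:
        raise ValueError("empty header line")
    match = HEADER_RE.match(parts[0])
    if match is None:
        raise ValueError(f"header does not match the expected format: {line!r}")
    fields = match.groupdict()
    # Everything after the first whitespace is the previous header contents.
    fields["previous_header"] = parts[1] if len(parts) > 1 else ""
    return fields


def clean_sequence(sequence, legal_chars):
    """Drop every character that is not in the given legal set."""
    return "".join(ch for ch in sequence if ch in legal_chars)


if __name__ == "__main__":
    # Hypothetical header, constructed only to illustrate the format.
    print(parse_header(">EP00001_Genus_species_P000042 original header text"))
    print(clean_sequence("MKV-L?*", PROTEIN_CHARS))
```

Anchoring the type abbreviation and counter at the end of the first token avoids mis-splitting species names that contain underscores.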

(2024). Human Transcriptome Database for Alternative Splicing [Dataset]. http://identifiers.org/RRID:SCR_013305

Human Transcriptome Database for Alternative Splicing

RRID:SCR_013305, nif-0000-02935, OMICS_01887, Human Transcriptome Database for Alternative Splicing (RRID:SCR_013305), H-DBAS, H-DBAS - Human-transcriptome DataBase for Alternative Splicing

Explore at:
69 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 4, 2024
