A specialized database for human alternative splicing (AS) based on H-Invitational full-length cDNAs. H-DBAS offers unique data and a viewer for human AS analysis. It contains:
* Genome-wide representative alternative splicing variants (RASVs) identified from the following datasets:
  * H-Inv full-length cDNAs (resource summary): H-Invitational cDNA dataset
  * H-Inv all transcripts (resource summary): Published human mRNA dataset
  * Mouse full-length cDNAs (resource summary): Mouse cDNA dataset
* RASVs affecting protein functions such as protein motifs, GO terms, subcellular localization signals and transmembrane domains
* Conserved RASVs compared with the mouse genome and the full-length cDNAs (H-Inv full-length cDNAs only)
Transcriptomic information (spatiotemporal gene expression profile data) on the postnatal cerebellar development of mice (C57BL/6J and ICR). It is a tool for mining cerebellar genes and gene expression, and provides a portal to relevant bioinformatics links. The mouse cerebellar circuit develops through a series of cellular and morphological events, including neuronal proliferation and migration, axonogenesis, dendritogenesis, and synaptogenesis, all within three weeks after birth. Each event is controlled by a specific gene group whose expression profile must be encoded in the genome. To elucidate the genetic basis of cerebellar circuit development, CDT-DB analyzes spatiotemporal gene expression by using in situ hybridization (ISH) for cellular resolution and by using fluorescence differential display and microarrays (GeneChip) for developmental time series resolution. The CDT-DB not only provides a cross-search function for large amounts of experimental data (ISH brain images, GeneChip graphs, RT-PCR gel images), but also serves as a portal: every registered gene is hyperlinked to relevant bioinformatics websites covering gene ontology, genomes, proteins, pathways, cell functions, and publications. Thus, the CDT-DB is a useful tool for mining potentially important genes based on characteristic expression profiles in particular cell types or during a particular time window in developing mouse brains.
TSA is an archive of computationally assembled transcript sequences from primary data such as ESTs and Next Generation Sequencing Technologies. The overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cloned cDNAs. The primary sequence data used in the assemblies must have been experimentally determined by the same submitter. TSA sequence records differ from GenBank records because there are no physical counterparts to the assemblies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CrusTome_v0.1.0 Prerelease
/ReadMe - this file
/crustome_aa_BLAST.tar.gz - CrusTome database of amino acid sequences in BLAST format
/crustome_aa_DIAMOND.tar.gz - CrusTome database of amino acid sequences in DIAMOND format
/crustome_mrna_BLAST.tar.gz - CrusTome database of mRNA sequences in BLAST format
/dict - Dictionary file to translate species IDs. For usage with sed/awk see link to Github site below
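The species-ID translation that the /dict file supports can be sketched in Python as an alternative to sed/awk. This is a minimal sketch under stated assumptions: the two-column, tab-separated layout (ID, then species name), the `SP001`-style IDs and the header format are all hypothetical and are not taken from the CrusTome release, so check the actual file and the GitHub instructions before adapting it.

```python
import re


def load_species_dict(lines):
    """Parse 'ID<TAB>species name' pairs into a dict.

    The tab-separated layout is an assumption made for illustration;
    the real /dict file may use a different format.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        species_id, name = line.split("\t", 1)
        mapping[species_id] = name
    return mapping


def translate_header(header, mapping):
    """Replace species IDs that appear as whole tokens in one header line."""
    if not mapping:
        return header
    # Try longer IDs first and require non-alphanumeric neighbours,
    # so that a short ID never matches inside a longer identifier.
    ids = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(" + "|".join(re.escape(i) for i in ids) + r")(?![A-Za-z0-9])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], header)


# Hypothetical dictionary entries and header, for illustration only.
example_dict = load_species_dict([
    "SP001\tHomarus_americanus",
    "SP002\tCallinectes_sapidus",
])
example_header = translate_header(">SP001_TRINITY_DN1_c0_g1_i1", example_dict)
```

The same function can be applied to every line of a BLAST or DIAMOND tabular result, since it only rewrites tokens that match a known ID.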
Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.
CrusTome: A transcriptome database resource for large-scale analyses across Crustacea
Transcriptomes from non-traditional model organisms often harbor a wealth of unexplored data. Examining these datasets can lead to clarity and novel insights in traditional systems, as well as to discoveries across a multitude of fields. Despite significant advances in DNA sequencing technologies and in their adoption, access to genomic and transcriptomic resources for non-traditional model organisms remains limited. Crustaceans, for example, being amongst the most numerous, diverse, and widely distributed taxa on the planet, often serve as excellent systems to address ecological, evolutionary, and organismal questions. While they are ubiquitously present across environments, and of economic and food security importance, they remain severely underrepresented in publicly available sequence databases. Here, we present CrusTome, a multi-species, multi-tissue, transcriptome database of 201 assembled mRNA transcriptomes (189 crustaceans, 30 of which were previously unpublished, and 12 ecdysozoan outgroups) as an evolving, and publicly available resource. This database is suitable for evolutionary, ecological, and functional studies that employ genomic/transcriptomic techniques and datasets. CrusTome is presented in BLAST and DIAMOND formats, providing robust datasets for sequence similarity searches, orthology assignments, phylogenetic inference, etc., and thus allowing for straight-forward incorporation into existing custom pipelines for high-throughput analyses.
For questions regarding released datasets contact: Corresponding Author: Jorge L. Perez-Moreno (Colorado State University) jorgepm@colostate.edu / jpere645@fiu.edu
https://github.com/invertome/crustome
PLEASE CITE:
Pérez-Moreno JL, Kozma MT, DeLeo DM, Bracken-Grissom HD, Durica DS, Mykles DL. 2023. CrusTome: A transcriptome database resource for large-scale analyses across Crustacea. G3: Genes, Genomes, Genetics.
Funder Information
Supported by National Science Foundation grants to DLM (IOS-1922701) and DSD (IOS-1922755). In addition, this work was partially funded by two grants awarded from the National Science Foundation: Doctoral Dissertation Improvement Grant (#1701835) awarded to JPM and HBG and the Division of Environmental Biology Bioluminescence and Vision grant (DEB-1556059) awarded to HBG. Samples in the FICC were collected by grants from The Gulf of Mexico Research Initiative (GOMRI), Florida Institute of Oceanography Shiptime Funding awarded to HBG and DMD; the National Science Foundation Division of Environmental Biology Grant 1556059 awarded to HBG; and the National Oceanic and Atmospheric Administration Ocean Exploration Research (NOAA-OER 2015) grant awarded to HBG.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Decapods are an order of crustaceans which includes shrimps, crabs, lobsters and crayfish. They occur worldwide and are of great scientific interest as well as being of ecological and economic importance in fisheries and aquaculture. However, our knowledge of their biology mainly comes from the group which is most closely related to crustaceans: insects. Here we produce a de novo transcriptome database, the crustacean annotated transcriptome (CAT) database, spanning multiple tissues and the life stages of seven crustaceans.
Description: A total of 71 transcriptome assemblies from six decapod species and a stomatopod species, including the coral shrimp Stenopus hispidus, the cherry shrimp Neocaridina davidi, the redclaw crayfish Cherax quadricarinatus, the spiny lobster Panulirus ornatus, the red king crab Paralithodes camtschaticus, the coconut crab Birgus latro, and the zebra mantis shrimp Lysiosquillina maculata, were generated. Differential gene expression analyses within species were generated as a reference and included in a graphical user interface database at http://cat.sls.cuhk.edu.hk/. Users can carry out gene name searches and also access gene sequences based on a sequence query using the BLAST search function.
Conclusions: The data generated and deposited in this database offer a valuable resource for the further study of these crustaceans, as well as being of use in aquaculture development.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pearl millet (Pennisetum glaucum, also known as Cenchrus americanus) is a C4 cereal crop that can tolerate stress conditions including drought, high temperature and nutrient-poor soils. In previous studies, transcriptomes of pearl millet were analyzed by RNA sequencing (RNA-Seq) to understand the mechanisms regulating its development and its tolerance to such conditions. We collected RNA-Seq reads from as many of these studies in the NCBI (National Center for Biotechnology Information) BioProject database as possible, and mapped them to the pearl millet reference genome to obtain read counts and transcripts per million (TPM) for each pearl millet gene. Here, the resulting count and TPM data as well as the attributes of the samples used for the RNA-Seq are provided. These data can be updated when a new study with RNA-Seq of pearl millet samples becomes available.
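The relationship between read counts and TPM can be illustrated with a short worked example. This is a minimal sketch of the standard TPM definition; the dataset description does not state which tool or which length measure (gene length or effective length) was used, so the function name, the inputs and the numbers below are illustrative assumptions only.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM).

    Standard definition: divide each count by the gene length to get a
    per-base rate, then scale the rates so that they sum to one million.
    Which length measure the dataset used is not stated in the source.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)  # no reads mapped to any gene
    return [r / total * 1e6 for r in rates]


# Three hypothetical genes in one sample.
example_tpm = counts_to_tpm(counts=[100, 200, 300], lengths_bp=[1000, 2000, 1000])
```

In this example the first two genes have the same per-base rate (0.1 reads per base) and therefore the same TPM, even though the second gene has twice as many reads.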
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of cowpea (Vigna unguiculata (L.) Walp). For each of two varieties of cowpea (IT97K-499-35, IT86D-1010) this collection contains the following datasets:
i) genomic survey assemblies based on Illumina sequencing
ii) transcriptome assemblies
iii) raw DNA and RNA sequence data feeding into the above assemblies
iv) in silico gene predictions and predicted gene sequences derived from IT86D-1010 and IT97K-499-35
v) mapping to the Vigna unguiculata v1.0 reference genome (http://phytozome.jgi.doe.gov/)
A Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New Resource for Understanding Brain Development and Function
Understanding the cell-cell interactions that control CNS development and function has long been limited by the lack of methods to cleanly separate astrocytes, neurons, and oligodendrocytes. Here we describe the first method for the isolation and purification of developing and mature astrocytes from mouse forebrain. This method takes advantage of the expression of S100β by astrocytes. We used fluorescence-activated cell sorting (FACS) to isolate EGFP-positive cells from transgenic mice that express EGFP under the control of an S100β promoter. By depletion of astrocytes and oligodendrocytes we obtained purified populations of neurons, while by panning with oligodendrocyte-specific antibodies we obtained purified populations of oligodendrocytes. Using GeneChip Arrays we then created a transcriptome database of the expression levels of over 20,000 genes by gene profiling these three main CNS neural cell types at postnatal ages day 1 to 30. This database provides the first global characterization of the genes expressed by mammalian astrocytes in vivo and is the first direct comparison between the astrocyte, neuron, and oligodendrocyte transcriptomes. We demonstrate that Aldh1L1, a highly expressed astrocyte gene, is a highly specific antigenic marker for astrocytes with a substantially broader, and therefore potentially more useful, pattern of astrocyte expression than the traditional astrocyte marker GFAP. This transcriptome database of acutely isolated and highly pure populations of astrocytes, neurons and oligodendrocytes is a resource for the neuroscience community: it provides improved cell type specific markers and supports a better understanding of neural development, function, and disease.
We acutely purified mouse astrocytes from early postnatal ages (P1) to later postnatal ages (P30), when astrocyte differentiation is morphologically complete (Bushong et al., 2004), and acutely purified mouse OL-lineage cells from stages ranging from OPCs to newly differentiated OLs to myelinating OLs. We extracted RNA from each of these highly purified, acutely isolated cell types and used GeneChip Arrays to determine the expression levels of over 20,000 genes and construct a comprehensive database of cell type specific gene expression in the mouse forebrain. Analysis of this database confirms cell type specific expression of many well characterized and functionally important genes. In addition, we have identified thousands of new cell type enriched genes, thereby providing important new information about astrocyte, OL, and neuron interactions, metabolism, development, and function. This database provides a comparison of the genome-wide transcriptional profiles of the main CNS cell types and is a resource to the neuroscience community for better understanding the development, physiology, and pathology of the CNS.
Keywords: Developmental CNS Cell type comparison
FACS purification of astrocytes: Dissociated forebrains from S100β-EGFP mice were resuspended in panning buffer (DPBS containing 0.02% BSA and 12.5 U/ml DNase) and sequentially incubated on the following panning plates: secondary antibody only plate to deplete microglia, O4 plate to deplete OLs, PDGFRα plate to deplete OPCs, and a second O4 plate to deplete any remaining OLs. This procedure was sufficient to deplete all OL-lineage cells from animals P8 and younger; in older animals that had begun to myelinate, however, additional depletion of OLs and myelin debris was accomplished as follows. The nonadherent cells from the last O4 dish were harvested by centrifugation, and the cells were resuspended in panning buffer containing GalC, MOG, and O1 supernatant and incubated for 15 minutes at room temperature.
The cell suspension was washed and then resuspended in panning buffer containing 20 μg donkey anti-mouse APC for 15 minutes. The cells were washed and resuspended in panning buffer containing propidium iodide (PI). EGFP+ astrocytes were then purified by fluorescence-activated cell sorting (FACS). Dead cells were gated out using high PI staining and forward light scatter. Astrocytes were identified based on high EGFP fluorescence and negative APC fluorescence from indirect immunostaining for OL markers GalC, MOG, and O1. Cells were sorted twice and routinely yielded >99.5% purity based on reanalysis of double sorted cells.
FACS purification of neurons: EGFP- cells were the remaining forebrain cells after microglia, OLs, and astrocytes had been removed, and were primarily composed of neurons and, to a lesser extent, endothelial cells (we estimate < 4% endothelial cells at P7 and < 20% endothelial cells at P17). EGFP- cells from S100β-EGFP dissociated forebrain were FACS purified in parallel with astrocyte purification and were sorted based on their negative EGFP fluorescence. Cells were sorted twice and routinely yielded >99.9% purity. In independent preparations, the EGFP- cell population was additionally depleted of endothelial cells and pericytes by sequentially labeling with biotin-BSL1 lectin and streptavidin-APC while also labeling for OL markers as described above. Cells were sorted twice and routinely yielded >99.9% purity.
Panning purification of oligodendrocyte lineage cells: Dissociated mouse forebrains were resuspended in panning buffer. In order to deplete microglia, the single-cell suspension was sequentially panned on four BSL1 panning plates. The cell suspension was then sequentially incubated on two PDGFRα plates (to purify and deplete OPCs), one A2B5 plate (to deplete any remaining OPCs), two MOG plates (to purify and deplete myelinating OLs), and one GalC plate (to purify the remaining PDGFRα-, MOG- OLs).
The adherent cells on the first PDGFRα, MOG, and GalC plates were washed to remove all antigen-negative nonadherent cells. The cells were then lysed while still attached to the panning plate with Qiagen RLT lysis buffer, and total RNA was purified. Purified OPCs were >95% NG2 positive and 0% MOG positive. Purified Myelin OLs were 100% MOG positive, >95% MBP positive, and 0% NG2 positive. Purified GalC OLs depleted of OPCs and Myelin OLs were <10% MOG positive and ~50% weakly NG2 positive, a reflection of their recent development as early OLs.
Data normalization and analysis: Raw image files were processed using Affymetrix GCOS and the MAS 5.0 algorithm. Intensity data were normalized per chip to a target intensity (TGT) value of 500, and expression data and absent/present calls for individual probe sets were determined. Gene expression values were normalized and modeled across arrays using the dChip software package with invariant-set normalization and a PM model (www.dchip.org; Li and Wong, 2001). The 29 samples were grouped into 9 sample types: Astros P7-P8, Astros P17, Astros P17-gray matter (P17g), Neurons P7, Neurons P17, Neurons-endothelial cell depleted (P7n, P17n), OPCs, GalC-OLs, and MOG-OLs. Gene filtering was performed to select probe sets that were consistently expressed in at least one cell type, where consistently expressed was defined as being called present and having a MAS 5.0 intensity level greater than 200 in at least two-thirds of the samples in the cell type. We identified 20,932 of the 45,037 probe sets that were consistently expressed in at least one of the nine cell types. The Significance Analysis of Microarrays (SAM) method (Tusher et al., 2001) was used to determine genes that were significantly differentially expressed between different cell types (see Supplemental Table S2 for SAM cell type groupings). Clustering was performed using the hclust method with complete linkage in R.
Expression values were transformed for clustering by computing a mean expression value for the gene using those samples in the corresponding SAM statistical analysis, and then subtracting the mean from expression intensities. In order to preserve the log2 scale of the data, unless otherwise indicated, no normalization by variance was performed. Plots were created using the gplots package in R. The Bioconductor software package (Gentleman et al., 2004) was used throughout the expression analyses. Functional analyses were performed through the use of Ingenuity Pathways Analysis (Ingenuity® Systems, www.ingenuity.com).
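The gene filtering rule and the mean-centering step described above can be sketched in Python. This is an illustrative sketch, not the authors' code (the original analysis used dChip, R and Bioconductor): the function names, the "P" label for a present call and all example values are assumptions introduced here.

```python
def consistently_expressed(calls, intensities, min_intensity=200.0):
    """Apply the filter to the samples of ONE cell type.

    A sample passes when its detection call is present ("P", an assumed
    label) and its MAS 5.0 intensity exceeds min_intensity. The probe set
    is consistently expressed when at least two-thirds of samples pass.
    """
    passing = sum(
        1 for call, value in zip(calls, intensities)
        if call == "P" and value > min_intensity
    )
    # Integer comparison avoids floating-point issues with 2/3.
    return 3 * passing >= 2 * len(calls)


def keep_probe_set(groups):
    """Keep a probe set if it is consistently expressed in at least one cell type.

    groups maps a cell type name to a (calls, intensities) pair.
    """
    return any(consistently_expressed(c, v) for c, v in groups.values())


def mean_center(log2_values, reference_indices):
    """Subtract the mean of the reference samples from every value.

    No scaling by variance is applied, so differences stay on the log2 scale.
    """
    ref_mean = sum(log2_values[i] for i in reference_indices) / len(reference_indices)
    return [v - ref_mean for v in log2_values]


# Hypothetical probe set measured in two cell types with three samples each.
example_groups = {
    "Astros P17": (["P", "P", "A"], [350.0, 210.0, 900.0]),
    "OPCs": (["P", "P", "P"], [350.0, 150.0, 120.0]),
}
example_keep = keep_probe_set(example_groups)
example_centered = mean_center([8.0, 10.0, 12.0], reference_indices=[0, 1])
```

The centered values could then be passed to a hierarchical clustering routine with complete linkage, which is the role hclust plays in the original R analysis.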
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
NOTE: This dataset is no longer publicly available.
This database houses over 500,000 sequences that were generated and assembled into approximately 15,000 contigs, annotated and functionally mapped to Gene Ontology (GO) terms. Blueberry (Vaccinium corymbosum) is a major berry crop in the United States. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snapshot of transcriptional activities during an organism’s developmental stage(s) or its response to biotic or abiotic stresses. Such application of this new sequencing technique allows for high-throughput, genome-wide experimental verification of known and novel transcripts. We have applied a high-throughput pyrosequencing technology (454 EST sequencing) for transcriptome profiling of blueberry during different stages of fruit development to gain an understanding of the genes that are up- or down-regulated during this process. We have also sequenced flower buds at four different stages of cold acclimation to gain a better understanding of the genes and biochemical pathways that are up- or down-regulated during cold acclimation, since extreme low temperatures are known to reduce crop yield and cause major losses to US farmers. We have also sequenced a leaf sample to compare its transcriptome profile with that of bud and fruit samples. Over 500,000 sequences were generated and assembled into approximately 15,000 contigs and were annotated and functionally mapped to Gene Ontology (GO) terms. A database was developed to house these sequences and their annotations. A web-based interface was also developed to allow collaborators to search and browse the data and to aid in the analysis and interpretation of the data. The availability of these sequences will allow for future advances, such as the development of a blueberry microarray to study gene expression, and will aid in the blueberry genome sequencing effort that is underway.
This work was supported by grant 2008-51180-04861 from the USDA - Cooperative State Research, Education, and Extension Service (CSREES) Specialty Crop Research Initiative program.
A platform that allows users to visualize and analyze transcriptome data related to the genetics underlying the development, function, and dysfunction of the brain. Users can search for cerebellar development genes by name, ID, keyword, expression, and tissue specificity. Search results include general information, links, temporal, spatial, and tissue information, and gene category.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate target gene expression at the post-transcriptional level. In addition, miRNA activity can be controlled by a recently discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs and block their binding to the specific target transcript, leading to increased mRNA expression. Because miRNA-eTM-target mRNA regulation modules are involved in a wide range of biological processes, the need for a comprehensive eTM database has grown. Apart from miRSponge, which holds a limited number of Arabidopsis eTM entries, no database or repository has yet been developed and released for plant eTMs. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository, eTMs were identified from high-throughput RNA-sequencing data of 11 plant species. Each transcriptome library is first mapped to the corresponding plant genome, and long non-coding RNA (lncRNA) transcripts are then characterized. Additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Using the lncRNA and miRNA sources, a total of 2,728 eTMs were then predicted. Our regularly updated database, PeTMbase, provides high-quality information on miRNA:eTM modules and will aid functional genomics studies, particularly those on miRNA regulatory networks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of the EST-SSRs that were identified in the transcriptome data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Little millet (Panicum sumatrense), a native of Chhattisgarh, belongs to the minor millet group and is primarily known as a climate-resilient and nutritionally rich crop. However, because few omics studies have been carried out on the crop, the scientific community has largely remained unaware of its potential, which has limited its use in crop improvement programs. In view of global warming, erratic climate change, nutritional security, and the limited genetic information available, the Little Millet Transcriptome Database (LMTdb) (https://igkv.ac.in/xenom/index.aspx) was conceptualized upon completion of the transcriptome sequencing of little millet, with the aim of deciphering the genetic signatures of this largely unknown crop. The database was developed to provide information about the most comprehensive part of the genome, the ‘Transcriptome’. It includes transcriptome sequence information, functional annotation, microsatellite markers, DEGs, and pathway information. The database is a freely available resource that gives breeders and scientists a portal to search, browse, and query data to facilitate functional and applied omics studies in millet crops.
CATdb collects all the information on transcriptome experiments done at URGV with CATMA microarrays. All data in CATdb come from the URGV microarray platforms. Common procedures are used for every step, from experimental design to statistical analysis. Guided by a web interface, biologists enter the standard description of each experimental step (extraction, labelling, hybridization and scanning). Normalization and statistical analyses are then performed with a set of methods selected according to the experimental design and array type.
A high-quality barley gene reference transcript dataset (BaRTv1.0, Rapazote-Flores et al. 2019) was used to quantify gene and transcript abundances from 22 RNA-seq experiments, covering 843 separate samples. Using the abundance data we developed a Barley Expression Database (EoRNA* – Expression of RNA) to underpin a visualisation tool that displays comparative gene and transcript abundance data on demand as transcripts per million (TPM) across all samples and all the genes. EoRNA provides gene and transcript models for all of the transcripts contained in BaRTv1.0, and these can be conveniently identified through either BaRT or HORVU gene names, or by direct BLAST of query sequences. Browsing the quantification data reveals cultivar, tissue and condition specific gene expression and shows changes in the proportions of individual transcripts that have arisen via alternative splicing. TPM values can be easily extracted to allow users to determine the statistical significance of observed transcript abundance variation among samples or perform meta-analyses on multiple RNA-seq experiments.
* Eòrna is the Scottish Gaelic word for barley.
This research was supported and developed by the Scottish Government Rural and Environment Science and Analytical Services division (RESAS), with funding from the Biotechnology and Biological Sciences Research Council (BBSRC) (BB/I00663X/1: A draft sequence of the barley genome) and ERC project 669182 'SHUFFLE' to RW.
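The change in the proportions of individual transcripts mentioned in the EoRNA description can be derived from extracted TPM values. The sketch below is an assumption-laden illustration: the function name and the transcript identifiers are hypothetical, and it does not reproduce how EoRNA itself computes or displays these proportions.

```python
def transcript_proportions(tpm_by_transcript):
    """Return each transcript's share of its gene's total TPM in one sample.

    tpm_by_transcript maps a transcript ID to its TPM value; all
    transcripts are assumed to belong to the same gene.
    """
    total = sum(tpm_by_transcript.values())
    if total == 0:
        # Gene not expressed in this sample: report zero shares.
        return {tid: 0.0 for tid in tpm_by_transcript}
    return {tid: tpm / total for tid, tpm in tpm_by_transcript.items()}


# Hypothetical gene with two isoforms measured in two samples.
example_leaf = transcript_proportions({"geneA.1": 30.0, "geneA.2": 10.0})
example_root = transcript_proportions({"geneA.1": 5.0, "geneA.2": 15.0})
```

Comparing the two results shows an isoform switch: geneA.1 dominates in the first sample and geneA.2 in the second, even though the per-transcript TPM values alone do not make the shift obvious.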
Ampullaridae_alignment: Three-gene alignment file of 38 ampullariids (Ampullariidae_alignment.nex)
Eightsp_alignment: Three-gene alignment file of eight targeted species
Ampullariids transcriptome: Assembled transcriptomes of eight ampullariids
https://rightsstatements.org/vocab/UND/1.0/
Sequencing major development stages of the house cricket, Acheta domesticus, will provide the first large transcriptome database on a species that is being developed for the food and feed industry. Obtaining information on specific gene sets will enable us to engineer this and other insect species to optimize traits related to improved nutritional content and disease resistance.
https://api.github.com/licenses/agpl-3.0
R code in [Code]NTCdb: single-cell transcriptome database of human inflammatory diseas
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of functional annotation of the assembled unigenes.