Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA expression matrices - FPKM from STAR, cufflinks and TPM from STAR, RSEM, Toil. Fusion results from deFuse, SOAPfuse, fusioncatcher, STAR-Fusion.
RNA-seq results of WT and SIRT5-KO A549-ACE2 cells infected or mock-infected with SARS-CoV-2 for 3 days at MOI = 0.1. The first tab shows normalized counts and log2 fold change (l2fc) between the different conditions. Tabs 2–4 show results of gene ontology analysis between the different conditions. (XLSX)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA-seq data for analysing differential gene expression. The data are from bacteria (E. coli) and were subsampled to 1% of the original data size. Six FASTQ files and two reference files (genome sequence and annotations) are included.
Pearl millet (Pennisetum glaucum, also known as Cenchrus americanus) is a C4 cereal crop that can tolerate stressed conditions including drought, high temperature and nutrient-poor conditions. In previous studies, transcriptomes of pearl millet were studied by RNA sequencing (RNA-Seq) to understand the mechanisms regulating its development and tolerance to such stressed conditions. We collected RNA-Seq reads from as many such studies in the NCBI (National Center for Biotechnology Information) BioProject database as possible, and mapped them to the pearl millet reference genome to obtain read counts and transcripts per million (TPM) for each pearl millet gene. Here, the resulting count and TPM data as well as the attributes of the samples used for the RNA-Seq are provided. These data can be updated when a new study with RNA-Seq of pearl millet samples becomes available.
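As a rough illustration of how read counts relate to TPM, the following minimal sketch normalizes each count by gene length and rescales so the values sum to one million. The counts and gene lengths are hypothetical, and this is not the pipeline used to produce the dataset.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase: normalise each count by its gene length.
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rpk)
    if total == 0:
        return [0.0 for _ in rpk]
    # Rescale so that the values of one sample sum to one million.
    return [r / total * 1_000_000 for r in rpk]


# Hypothetical counts and gene lengths (in kilobases) for three genes.
example = tpm([100, 200, 300], [1.0, 2.0, 3.0])
print(example)
```

Because the three hypothetical genes have the same count per kilobase, they receive equal TPM values even though their raw counts differ.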
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.
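The general idea of normalization by subsampling without replacement can be sketched as follows: every library is reduced to a common total depth by drawing reads at random, with each read drawn at most once. This is only a generic illustration under assumed inputs (a gene-to-count dictionary, a fixed seed), not the authors' implementation.

```python
import random


def subsample_counts(counts, target, seed=0):
    """Subsample a gene -> count dict to `target` total reads without replacement."""
    total = sum(counts.values())
    if target > total:
        raise ValueError("target depth exceeds the library size")
    # Expand counts to one entry per read so each read can be drawn at most once.
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    drawn = rng.sample(reads, target)
    out = {gene: 0 for gene in counts}
    for gene in drawn:
        out[gene] += 1
    return out


# Hypothetical library of 100 reads, reduced to a depth of 40 reads.
library = {"geneA": 50, "geneB": 30, "geneC": 20}
sub = subsample_counts(library, 40)
print(sub)
```

Expanding to one entry per read is simple but memory-hungry; for real libraries a multivariate hypergeometric draw would be a more economical way to get the same distribution.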
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).
Resources in this dataset:
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
matrix of gene counts* (matrix.mtx.gx)
gene names (features.tsv.gz)
cell IDs (barcodes.tsv.gz)
*The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
nCount_RNA = the number of transcripts detected in a cell
nFeature_RNA = the number of genes detected in a cell
Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
prcntMito = percent mitochondrial reads in a cell
Scrublet = doublet probability score assigned to a cell
seurat_clusters = cluster ID assigned to a cell
PaperIDs = sample ID for a cell
celltypes = cell type ID assigned to a cell
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.
Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
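Selecting a subset of cells by cluster ID, such as the CD4 T cells in clusters 0, 3, 4 and 28, could look roughly like the sketch below. It assumes the metadata .csv has a header row containing the documented column names Loupe and seurat_clusters; the rows shown are made up, and the real file layout may differ.

```python
import csv
import io


def barcodes_in_clusters(meta_csv_text, clusters):
    """Return cell barcodes (Loupe column) whose seurat_clusters value is in `clusters`."""
    wanted = {str(c) for c in clusters}
    reader = csv.DictReader(io.StringIO(meta_csv_text))
    return [row["Loupe"] for row in reader if row["seurat_clusters"] in wanted]


# Made-up rows that only mimic the documented column names.
toy_meta = """Loupe,seurat_clusters,celltypes
AAAC-1,0,typeA
AAAG-1,6,typeB
AAAT-1,28,typeA
"""

cd4 = barcodes_in_clusters(toy_meta, [0, 3, 4, 28])
print(cd4)  # ['AAAC-1', 'AAAT-1']
```

The returned barcodes could then be matched against the rows of a coordinates file to re-assign the published t-SNE or UMAP positions.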
BioXpress is a gene expression and cancer association database in which the expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, International Cancer Genome Consortium, Expression Atlas and publications. BioXpress can be searched by gene name or cancer type. To search the database by gene name, select the appropriate identifier type from the dropdown menu and type in the corresponding identifier in the adjacent text box. The results are computed and presented to the user with information such as variable expression levels and tumor expression. To search by cancer type, select the desired type from the dropdown menu, such as "Cancer Type", "Significant", "Expression", "Adjusted p-value" and "p-value". Results are shown in a graph displaying the top 10 differentially expressed genes for the specified cancer type in terms of the frequency of significantly altered expression between the tumor and normal pairs.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Small RNA-seq: Libraries of small RNAs were constructed using TruSeq Small RNA Library Preparation Kits (Illumina) according to the manufacturer’s protocols, and then sequenced on an Illumina HiSeq 2000 at Bioacme (Wuhan, China). Quality control and adapter trimming were performed using Trim Galore (v0.6.4) with default parameters, and reads between 17 and 70 in length were used for subsequent analyses. The sequenced reads were aligned to the human genome (GRCh38) using Bowtie2 (v2.3.5.1). Reads that could not be aligned to the human genome (GRCh38) were then aligned to agshRNAs. The sequences and lengths of reads aligned to agshRNAs were then determined using an in-house script. Reads mapped to the genome were further categorized as miRNA, tRNA, snoRNA, etc. using htseq-count (v0.11.2). The annotation file was downloaded from DASHR 2.0 (https://dashr2.lisanwanglab.org/).
RNA-seq: Libraries of total RNAs were constructed using MGIEasy RNA Library Preparation Kits (BGI) according to the manufacturer’s protocols, and then sequenced on an MGISEQ2000 (BGI) at Wuhan Institute of Virology, CAS. Quality control and adapter trimming were performed using Trim Galore (v0.6.4) with default parameters. All reads were mapped to the human genome (GRCh38) using Hisat2 (v2.1.0) and then annotated using htseq-count (v0.11.2). The annotation file was downloaded from NCBI. Annotated read counts were transformed into counts per million reads (CPM). Statistical significance was evaluated using an unpaired t test and adjusted using Bonferroni correction. Differentially expressed genes (DEGs) were defined by |log2FC| > 1 and P.adj < 0.05.
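The last steps of this description (CPM transformation, Bonferroni adjustment, and the DEG thresholds) can be sketched as below. This is a minimal illustration with hypothetical inputs, not the authors' script; the per-gene p-values are taken as given rather than computed by a t test.

```python
def cpm(counts):
    """Scale the read counts of one sample to counts per million reads."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]


def call_degs(genes, log2fc, p_values, lfc_cut=1.0, alpha=0.05):
    """Keep genes with |log2FC| > lfc_cut and adjusted p-value < alpha."""
    p_adj = bonferroni(p_values)
    return [g for g, l, p in zip(genes, log2fc, p_adj)
            if abs(l) > lfc_cut and p < alpha]


# Hypothetical gene names, fold changes and unadjusted p-values.
genes = ["g1", "g2", "g3"]
degs = call_degs(genes, [2.0, -1.5, 0.3], [0.001, 0.04, 0.0001])
print(degs)  # ['g1']
```

In this toy input, g2 passes the fold-change cutoff but fails after adjustment (0.04 × 3 = 0.12), and g3 is significant but changes too little.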
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The peripheral blood mononuclear cell (PBMC) samples were collected from patients infected with dengue virus (DENV) at four time points: two days and one day before defervescence (febrile phase), at defervescence (critical phase), and at two-week convalescence. The raw and filtered matrix files were generated using CellRanger version 3.0.2 (10x Genomics, USA) with the reference human genome GRCh38 1.2.0. Potential contamination by ambient RNAs was corrected using SoupX. Low-quality cells, including cells with more than 10% mitochondrial gene expression and doublets/multiplets, were excluded using Seurat and doubletFinder, respectively. The individual samples were then integrated using the SCTransform method with 3,000 gene features. Principal component analysis (PCA) was performed, and cells were clustered using the Louvain algorithm with multi-level refinement. The gene expression level of each cell was normalized using the LogNormalize method in Seurat. Cell types were annotated using the canonical marker genes described in the original paper; see the related link below.
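Two of the steps above, the 10% mitochondrial cutoff and LogNormalize, can be illustrated with a small sketch. It assumes Seurat's default scale factor of 10,000 and uses invented cells; it is not the code used to process this dataset.

```python
import math


def passes_mito_filter(percent_mito, max_percent=10.0):
    """Keep a cell only if its mitochondrial percentage does not exceed the cutoff."""
    return percent_mito <= max_percent


def log_normalize(cell_counts, scale_factor=10_000):
    """LogNormalize-style transform: log1p(count / total counts * scale factor)."""
    total = sum(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# Invented cells: name -> (percent mitochondrial reads, gene counts).
cells = {
    "cell1": (3.2, [0, 5, 5]),
    "cell2": (15.0, [1, 1, 8]),
}

kept = [name for name, (mito, _) in cells.items() if passes_mito_filter(mito)]
normalized = {name: log_normalize(cells[name][1]) for name in kept}
print(kept)
```

Zero counts stay at zero after the transform, which keeps the normalized matrix sparse.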
Dataset from the NeurIPS 2021 challenge, similar to the Kaggle 2022 competition "Open Problems - Multimodal Single-Cell Integration: Predict how DNA, RNA & protein measurements co-vary in single cells": https://www.kaggle.com/competitions/open-problems-multimodal
The data comprise single-cell ATAC-seq (https://en.wikipedia.org/wiki/ATAC-seq#Single-cell_ATAC-seq) and single-cell RNA-seq (https://en.wikipedia.org/wiki/Single-cell_transcriptomics#Single-cell_RNA-seq) measurements.
Single-cell RNA sequencing data form a matrix in which rows correspond to cells and columns to genes (or vice versa). Each value of the matrix shows how strongly the corresponding gene is expressed in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
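A toy version of such a matrix, with hypothetical cell and gene names, shows both orientations:

```python
# Hypothetical labels; rows are cells and columns are genes.
cells = ["cell1", "cell2"]
genes = ["GeneA", "GeneB", "GeneC"]
matrix = [
    [0, 3, 1],  # counts for cell1
    [2, 0, 0],  # counts for cell2
]


def expression(cell, gene):
    """Look up the count of `gene` in `cell` in the cells x genes matrix."""
    return matrix[cells.index(cell)][genes.index(gene)]


def transpose(m):
    """Switch between the cells x genes and genes x cells orientations."""
    return [list(col) for col in zip(*m)]


genes_by_cells = transpose(matrix)
print(expression("cell1", "GeneB"))  # 3
```

Which orientation a tool expects varies, so it is worth checking the shape against the number of cells and genes before analysis.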
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
(For companion dataset on CITE-seq = scRNA-seq + Proteomics, see: https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021)
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122
Expression profiling by high throughput sequencing; Genome binding/occupancy profiling by high throughput sequencing
Summary: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.
Overall design: Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.
Contributor(s): Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM
Citation: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html
Other single-cell RNA-seq datasets can be found on Kaggle: see https://www.kaggle.com/alexandervc/datasets or search Kaggle for "scRNA-seq".
Single-cell RNA sequencing is an important technology in modern biology; see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see the review in Nature Methods by P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Searching scholar.google for "challenges in single cell rna sequencing" (https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart) gives many interesting and highly cited articles, for example:
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This collection of data is part of the RNA-Seq (HiSeq) PANCAN dataset. It is a random extraction of gene expressions of patients having different types of tumor: BRCA, KIRC, COAD, LUAD, and PRAD. Each sample contains the expression of 20,531 genes for a patient diagnosed with one of the following cancers:
| Code | Tumor Name |
|---|---|
| BRCA | Breast invasive carcinoma (breast cancer) |
| KIRC | Kidney renal clear cell carcinoma (kidney) |
| COAD | Colon adenocarcinoma (colon) |
| LUAD | Lung adenocarcinoma (lung) |
| PRAD | Prostate adenocarcinoma (prostate) |
Files:
data.csv: Gene expression matrix X (881 samples × 20,531 genes)
label.csv: True class label for each sample y (881 labels)
Source: UCI ML Repository – Gene Expression Cancer RNA-Seq Data
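Pairing the expression matrix with its labels might look like the sketch below. The file layout is an assumption (a header row, with the first column of each file holding a sample ID), and the toy contents, sample IDs and gene column names are invented; check the real data.csv and label.csv before relying on it.

```python
import csv
import io


def load_labelled(data_text, label_text):
    """Return (X, y): expression rows and their class labels, matched by sample ID."""
    data_rows = list(csv.reader(io.StringIO(data_text)))[1:]    # skip header row
    label_rows = list(csv.reader(io.StringIO(label_text)))[1:]  # skip header row
    labels = {row[0]: row[1] for row in label_rows}
    X, y = [], []
    for row in data_rows:
        X.append([float(v) for v in row[1:]])  # assumed: first column is the sample ID
        y.append(labels[row[0]])
    return X, y


# Invented contents mimicking the assumed layout.
toy_data = "id,gene_0,gene_1\ns0,0.0,2.5\ns1,1.2,0.0\n"
toy_labels = "id,Class\ns0,BRCA\ns1,LUAD\n"

X, y = load_labelled(toy_data, toy_labels)
print(y)  # ['BRCA', 'LUAD']
```

Matching by sample ID rather than by row position guards against the two files being sorted differently.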
Premise of the study: The root apex is an important region involved in environmental sensing, but comprises a very small part of the root. Obtaining root apex transcriptomes is therefore challenging when the samples are limited. The feasibility of using tiny root sections for transcriptome analysis was examined, comparing RNA sequencing (RNA-Seq) to microarrays in characterizing genes that are relevant to spaceflight.
Methods: Arabidopsis thaliana Columbia ecotype (Col-0) roots were sectioned into Zone 1 (0.5 mm; root cap and meristematic zone) and Zone 2 (1.5 mm; transition, elongation, and growth-terminating zone). Differential gene expression in each was compared.
Results: Both microarrays and RNA-Seq proved applicable to the small samples. A total of 4180 genes were differentially expressed (with fold changes of 2 or greater) between Zone 1 and Zone 2. In addition, 771 unique genes and 19 novel transcriptionally active regions were identified by RNA-Seq that were not detected in microarrays. However, microarrays detected spaceflight-relevant genes that were missed in RNA-Seq.
Discussion: Single root tip subsections can be used for transcriptome analysis using either RNA-Seq or microarrays. Both RNA-Seq and microarrays provided novel information. These data suggest that techniques for dealing with small, rare samples from spaceflight can be further enhanced, and that RNA-Seq may miss some spaceflight-relevant changes in gene expression.
https://choosealicense.com/no-permission/
Human RNA-Seq data set GSM2819712 stored in NCBI (GEO)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Scripts and data for the paper: Consequences and opportunities arising due to sparser single-cell RNA-seq datasets
With the number of cells measured in single-cell RNA sequencing (scRNA-seq) datasets increasing exponentially and concurrent increased sparsity due to more zero counts being measured for many genes, we demonstrate here that downstream analyses on binary-based gene expression give similar results as count-based analyses. Moreover, a binary representation scales up to ~ 50-fold more cells that can be analyzed using the same computational resources. We also highlight the possibilities provided by binarized scRNA-seq data. Development of specialized tools for bit-aware implementations of downstream analytical tasks will enable a more fine-grained resolution of biological heterogeneity.
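A binary representation of this kind can be sketched as follows: each count becomes 1 if the gene was detected and 0 otherwise, and the resulting bits can be packed eight to a byte. This is a generic illustration with a made-up count vector, not the code accompanying the paper.

```python
def binarize(counts):
    """Map each count to 1 (gene detected) or 0 (zero count)."""
    return [1 if c > 0 else 0 for c in counts]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, eight values per byte."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Made-up counts for ten genes in one cell.
cell = [0, 4, 0, 0, 1, 0, 12, 0, 0, 3]
bits = binarize(cell)
packed = pack_bits(bits)
print(bits, len(packed))
```

Ten values fit in two bytes here, which hints at where the memory savings of a bit-level representation come from.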
(Tab 1) DESeq2 statistics for all genes. (Tab 2–6) Lists of up- and down-regulated genes from Fig 6. The first column lists the KH gene IDs for all genes that were selected for a particular analysis shown in each figure panel. The KH gene IDs for all relevant genes that were significantly up- or down-regulated are shown in blue or red font. Boxes indicate the most highly impacted subset of genes as shown in the Fig 6 tables. For these genes, KH IDs were paired with KY IDs and human homologs. (Tab 7) Referenced Foxf target genes. (XLSX)
For methodological details, see S1 Text, paragraph "RNA-Seq Analysis". (XLSX)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data provided here are part of a Galaxy Training Network tutorial that analyzes RNA-Seq data from a study published by Brooks et al. 2011 to identify genes and exons that are regulated by the Pasilla gene.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The spreadsheets include pairwise comparisons of gene expression levels under specified conditions. Abbreviations used are as follows: PV for Parvalbumin; Sst for Somatostatin; CFC for Contextual Fear Conditioning; PTZ for Pentylenetetrazol; WT for wild type; cKO for conditional Nr4a1 knockouts; and HC for home cage.
https://ega-archive.org/dacs/EGAC00001000145
Variants derived from mapped whole transcriptome RNA-Seq data from 476 human samples of early stage urothelial carcinoma.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data repository for the scMappR manuscript:
Abstract from bioRxiv (https://www.biorxiv.org/content/10.1101/2020.08.24.265298v1.full).
RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit its widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by integrating cell-type expression data generated by scRNA-seq and existing deconvolution methods. After benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type specific changes that occur during kidney regeneration. We found that scMappR appropriately assigned DEGs to cell-types involved in kidney regeneration, including a relatively small proportion of immune cells. While scMappR can work with any user supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its use with bulk RNA-seq data alone. Overall, scMappR is a user-friendly R package that complements traditional differential expression analysis available at CRAN.