80 datasets found

Data from: MOESM8 of Benchmarking principal component analysis for...
springernature.figshare.com
application/x-gzip
Updated May 31, 2023
Cite
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido (2023). MOESM8 of Benchmarking principal component analysis for large-scale single-cell RNA-sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.11662170.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.11662170.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 8: Pair plots of all the PCA implementations (PBMCs).
MOESM11 of Benchmarking principal component analysis for large-scale...
springernature.figshare.com
application/x-gzip
Updated May 30, 2023
Cite
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido (2023). MOESM11 of Benchmarking principal component analysis for large-scale single-cell RNA-sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.11662101.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.11662101.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 11: Pair plots of all the PCA implementations (Brain).
RNA-Seq with and without RNase treatment in PCa cell lines
data.niaid.nih.gov
Updated Oct 27, 2020
Cite
Chen S; Xu X; Sores F; Hansen He H (2020). RNA-Seq with and without RNase treatment in PCa cell lines [Dataset]. https://data.niaid.nih.gov/resources?id=gse113120
Dataset updated
Oct 27, 2020
Dataset provided by
University of Toronto
Authors
Chen S; Xu X; Sores F; Hansen He H
Description
Standard RNA analyses using microarrays and low-coverage polyadenylation-enriched RNA sequencing (RNA-Seq) cannot fully characterize the complexity of the cancer transcriptome. To fully elucidate the transcriptome of prostate tumours, we performed ultra-deep total RNA-Seq on 144 localized prostate tumours with long-term clinical follow-up. Analysis of linear RNAs identified a transcriptomic subtype associated with the aggressive intraductal carcinoma subhistology, and a fusion gene profile that differentiates localized from metastatic prostate cancers. Analysis of back-splicing events identified widespread RNA circularization, with the average tumour expressing 7,140 distinct circular RNAs. The degree of aberrant circRNA production is correlated with disease progression in multiple clinical cohorts. Loss-of-function screens identified 11.3% of the screened circRNAs as essential to prostate cancer proliferation, and for 93.6% of these, their parental linear genes are not required for proliferation. Follow-up studies on circCSNK1G3 revealed its role in regulating cell cycle progression. Ultra-deep transcriptome sequencing thus provides a more comprehensive view of the linear and circular transcriptional and functional landscapes of localized prostate cancer. RNA-seq with rRNA depletion and random reverse transcription (RT) primers was performed with or without RNase R treatment in five PCa cell lines: LNCaP, 22Rv1, V16A, PC-3 and 42D.
Fan-Brainson Polycomb RNA-Seq Analysis
figshare.com
zip
Updated Nov 22, 2022
Cite
Robert M Flight (2022). Fan-Brainson Polycomb RNA-Seq Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.13179989.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.13179989.v1
Dataset updated
Nov 22, 2022
Dataset provided by
figshare
Authors
Robert M Flight
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PCA and correlation clustering analysis of RNA-Seq data.
Stemformatics
dknet.org
Updated Apr 11, 2025
Cite
(2025). Stemformatics [Dataset]. http://identifiers.org/RRID:SCR_017002/resolver/mentions?q=&i=rrid
Unique identifier
https://identifiers.org/RRID:SCR_017002 https://identifiers.org/RRID:SCR_017002/resolver/mentions?q=&i=rrid
Dataset updated
Apr 11, 2025
Description
Gene expression data portal developed for the stem cell community, containing public gene expression datasets derived from microarray, RNA sequencing and single-cell profiling technologies. The portal lets users visualize and download curated stem cell data. It provides easy-to-use, intuitive tools for biologists to explore data visually, including interactive gene expression profiles, principal component analysis plots and hierarchical clusters, among others.
Table1_nPCA: a linear dimensionality reduction method using a multilayer...
frontiersin.figshare.com
docx
Updated Jan 8, 2024
Cite
Juzeng Li; Yi Wang (2024). Table1_nPCA: a linear dimensionality reduction method using a multilayer perceptron.DOCX [Dataset]. http://doi.org/10.3389/fgene.2023.1290447.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fgene.2023.1290447.s001
Dataset updated
Jan 8, 2024
Dataset provided by
Frontiers
Authors
Juzeng Li; Yi Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Linear dimensionality reduction techniques are widely used in many applications. The goal of dimensionality reduction is to eliminate noise in the data and extract its main features. Several dimensionality reduction methods have been developed, such as linear principal component analysis (PCA), nonlinear t-distributed stochastic neighbor embedding (t-SNE), and the deep-learning-based autoencoder (AE). However, PCA only determines the projection direction with the highest variance, t-SNE is sometimes only suitable for visualization, and AE and nonlinear methods discard the linear projection. Results: To retain the linear projection of raw data and generate a better dimensionality reduction result for either visualization or downstream analysis, we present neural principal component analysis (nPCA), an unsupervised deep learning approach capable of retaining richer information from raw data as a promising improvement to PCA. To evaluate the nPCA algorithm, we compare performance on 10 public datasets and 6 single-cell RNA sequencing (scRNA-seq) datasets of the pancreas, benchmarking our method against other classic linear dimensionality reduction methods. Conclusion: We concluded that nPCA is a competitive alternative method for dimensionality reduction tasks.
Intuitive Graphical Visualization of Transcriptomes by Nonlinear...
explore.openaire.eu
Updated Jul 1, 2020
Cite
Yajun Liu (2020). Intuitive Graphical Visualization of Transcriptomes by Nonlinear Dimensionality Reduction Exposes Relatedness between Human Placenta Tissues [Dataset]. http://doi.org/10.5281/zenodo.3988909
Unique identifier
https://doi.org/10.5281/zenodo.3988909
Dataset updated
Jul 1, 2020
Authors
Yajun Liu
Description
Aims: Principal component analysis (PCA) is a widely used dimensionality reduction technique in the life sciences, usually used to create two-dimensional visualizations of geometric morphological measurement data. However, because PCA cannot summarize nonlinear dependencies between variables, interesting biological information may be distorted or lost in these graphs. Nonlinear alternatives exist, but their effectiveness has never been tested on placental transcriptomic data. Methods and Results: In this study, first-trimester chorionic villus and decidua tissues were collected from 6 healthy women. Transcriptome data were acquired by RNA-seq, and the expression levels of trophoblast-specific transcription factors were identified by immunofluorescence. Differentially expressed genes between chorionic villus and decidua tissues and their related biological functions were identified. After that, we performed PCA on these 12 samples. Furthermore, 18 published transcriptome datasets (425 samples in total) of human pregnancy-related tissues (including chorionic villus and decidua, term placenta, endometrium, in vitro cell lines, etc.) were collected from public databases and analyzed. At the same time, we compared two of the most widely used dimensionality reduction (DR) methods to generate 2D maps for visualization of these data. We compared the effects of different parameter settings and commonly used manifold learning methods on the results. Conclusions: The results indicate that the nonlinear method preserves the small differences between subtypes of placental tissue better than PCA. Although public RNA-seq data are available for chorionic villus and decidua tissue, this is the first time that RNA-seq data were obtained from chorionic villus and decidua derived from the same patient.
The datasets and analysis provide a useful resource for researchers studying the maternal-fetal interface and the establishment of pregnancy.
MOESM15 of Benchmarking principal component analysis for large-scale...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido (2023). MOESM15 of Benchmarking principal component analysis for large-scale single-cell RNA-sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.11662113.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.11662113.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 15: Crashed jobs caused by out-of-memory errors.
FabT_+or-Tween_S.pyogenesM28_Transcriptomic analysis
data.mendeley.com
Updated May 14, 2024
Cite
Clara Lambert (2024). FabT_+or-Tween_S.pyogenesM28_Transcriptomic analysis [Dataset]. http://doi.org/10.17632/68bhhsy2p4.1
Unique identifier
https://doi.org/10.17632/68bhhsy2p4.1
Dataset updated
May 14, 2024
Authors
Clara Lambert
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We used RNA-seq to identify how membrane fatty acid changes might impact and explain the virulence defect during host infection. WT and mFabT expression was compared in THY, and in THY-Tween (as C18:1Δ9 source), which activates WT FabT repression. RNA isolation and Illumina RNA-seq sequencing: GAS strains were cultured at 37°C in THY or THY-Tween, and cells were harvested during exponential growth (OD600 between 0.4 and 0.5). Independent triplicate cultures were prepared for each condition. For RNA preparation, 2 volumes of RNA protect (Qiagen) were added to cultures prior to centrifugation (10 min, 12,000 g), and total RNA was extracted after lysing bacteria by a 30 min 15 mg.ml-1 lysozyme, 300 U.ml-1 mutanolysin treatment at 20°C followed by two cycles of Fast-prep (power 6, 30 s) at 4°C. RNA extraction (Macherey-Nagel RNA extraction kit; Germany) was done according to the supplier's instructions. RNA integrity was analyzed using an Agilent Bioanalyzer (Agilent Biotechnologies, CA, USA). 23S and 16S rRNA were depleted from the samples using the MICROBExpress Bacterial mRNA enrichment kit (Invitrogen, France); depletion was checked on an Agilent Bioanalyzer (Agilent Biotechnologies). Libraries were prepared using an Illumina TS kit. Libraries were sequenced, generating 10,000,000 to 20,000,000 75-bp-long reads per sample. RNA-Seq data analysis: The MGAS6180 strain sequence (NCBI), which is nearly identical to M28PF1, was used as a reference sequence to map sequencing reads using the STAR software (2.5.2b) BIOCONDA (Anaconda Inc). RNA-seq data were analyzed using the hclust function and a principal component analysis in R 3.5.1 (version 2018-07-02). For differential expression analysis, normalization and statistical analyses were performed using the SARTools package and DESeq2; p-values were calculated and adjusted for multiple testing using the false discovery rate controlling procedure.
We used UpsetR to visualize set intersections in a matrix layout comprising the mFabT versus the WT strain grown in THY and in THY-Tween, and growth in THY-Tween versus THY for each strain.
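The multiple-testing adjustment mentioned in the description above can be illustrated with a short sketch. It assumes the "false discovery rate controlling procedure" is the Benjamini-Hochberg step-up method; the description does not name the procedure, and the function name and example p-values below are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the dataset description only says "false discovery rate
    controlling procedure"; BH is used here as the illustrative choice.
    """
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    ranked = np.minimum(ranked, 1.0)
    adjusted = np.empty(n)
    adjusted[order] = ranked                       # restore the input order
    return adjusted

# Hypothetical p-values, not taken from the dataset.
example_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(example_p)
print(adjusted_p)
```

Genes would then typically be called significant when the adjusted value falls below a chosen threshold; the threshold used in this study is not stated in the description.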
Th1 Polarization of JIA Synovial Fluid T Cells
data.niaid.nih.gov
url
Updated Aug 25, 2023
Cite
Lauren Henderson (2023). Th1 Polarization of JIA Synovial Fluid T Cells [Dataset]. http://doi.org/10.21430/M3LNUWU2LK
Available download formats: url
Unique identifier
https://doi.org/10.21430/M3LNUWU2LK
Dataset updated
Aug 25, 2023
Dataset provided by
AMP Network
Authors
Lauren Henderson
License
https://www.immport.org/agreement
Description
Oligoarticular juvenile idiopathic arthritis (oligo JIA) is the most common form of chronic inflammatory arthritis in children, yet the cause of this disease remains unknown. We hoped that through disease pathway characterization we could better understand immune responses in oligo JIA and suggest means of controlling arthritic flares. To do this, we conducted detailed immunophenotyping of the joint-infiltrating CD4+ T cells found in the joints of affected patients and assessed the stability of Tregs in oligo JIA. This was done with flow cytometry, bulk and single-cell RNA sequencing, DNA methylation studies, and Treg suppression assays. We enrolled 34 patients with oligo JIA, defined by ILAR criteria, who provided synovial fluid (SF) and peripheral blood (PB) samples. PB was also obtained from 8 pediatric and 9 adult controls. Flow cytometry was used to evaluate the T cell compartment in oligo JIA. Memory CD4, memory CD8, and gamma delta T cells were enriched in oligo JIA joints. The frequencies of CD8+ memory T (Tmem) cells expressing the Th1 cytokine (IFNgamma) and chemokine receptor (CXCR3) in oligo JIA SF were assessed and compared to control PB. Similarly, the proportion of gamma delta T lymphocytes expressing CXCR3 was compared between the two groups. We characterized CD4+ Tmem with additional flow cytometry studies. Paired PB and SF samples confirmed enrichment of CXCR3+ and IFNgamma+ CD4+ Tmem in oligo JIA joints. To assess for non-classical Th1 cells that jointly express Th1 and Th17 features, we evaluated the fraction of CD4+ T cells producing IFNgamma and IL-17. Because T cell stimulation alters expression of the Th17-associated chemokine receptor CCR6, CD161 was used as an alternative marker of Th17 cells. To further understand gene expression in CD4+ T cells in oligo JIA, Tregs and Teffs from patients and controls were assessed with bulk RNA sequencing (RNA-seq).
Principal component analysis (PCA) of the transcriptomic data segregated samples by compartment (PB versus SF) and cell type (Teff versus Treg), even for patients receiving methotrexate. Gene set enrichment analysis (GSEA) was used to examine IFNgamma signaling gene sets in SF Tregs and SF Teffs compared to PB Tregs and PB Teffs. Gene sets related to antigen presentation, T cell receptor (TCR) signaling, and type I interferons were also examined in SF Tregs and SF Teffs. To assess for the possibility of cytokine-producing SF Tregs, which would indicate that these cells may have been reprogrammed to an effector population, we evaluated the transcriptomic signature of Tregs in oligo JIA. Treg-associated transcripts remained significantly elevated in SF Tregs compared to PB Tregs. To determine the stability and functionality of Th1-skewed (CXCR3+) SF Tregs, we used methylation studies and suppression assays. Because co-expression of Th1- and Th17-related genes and the robustness of the Treg transcriptomic signature in the sub-population of Tregs with Th1 features cannot be determined from bulk RNA-seq data, single-cell RNA sequencing (scRNA-seq) and TCR repertoire analysis were performed on sorted Tregs and Teffs from the SF of 2 oligo JIA patients. Lastly, complete TCR data (paired CDR3α and CDR3β sequences) were recovered for a total of 5,509 cells (89% of the single-cell transcriptomic dataset).
DataSheet2_Dimensionality Reduction and Louvain Agglomerative Hierarchical...
frontiersin.figshare.com
txt
Updated Jun 9, 2023
Cite
Soumita Seth; Saurav Mallik; Tapas Bhadra; Zhongming Zhao (2023). DataSheet2_Dimensionality Reduction and Louvain Agglomerative Hierarchical Clustering for Cluster-Specified Frequent Biomarker Discovery in Single-Cell Sequencing Data.CSV [Dataset]. http://doi.org/10.3389/fgene.2022.828479.s002
Available download formats: txt
Unique identifier
https://doi.org/10.3389/fgene.2022.828479.s002
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Soumita Seth; Saurav Mallik; Tapas Bhadra; Zhongming Zhao
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The major domains of interest in single-cell RNA sequencing analysis are identification of existing and novel cell types, characterization of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity among cells. Single-cell clustering plays an important role in addressing these questions. Cluster identification in high-dimensional single-cell sequencing data faces some challenges due to the nature of the data. Dimensionality reduction models can help solve the problem. Here, we introduce a framework for discovering cluster-specific frequent biomarkers that uses dimensionality reduction and Louvain hierarchical agglomerative clustering for single-cell RNA sequencing data analysis. First, we pre-filtered the features detected in too few cells and the cells with too few features. Then we created a Seurat object to store data and analysis together, and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method “LogNormalize” for data normalization. Next, we identified features that are highly variable from cell to cell in our dataset. Then, we applied a linear transformation and a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project the high-dimensional data to an optimal low-dimensional space. After identifying fifty “significant” principal components (PCs) based on strong enrichment of low p-value features, we applied the graph-based clustering algorithm Louvain to cluster cells using the 10 top significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1,872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.8851.
After detecting the cell clusters, we found 3871 cluster-specific biomarkers using a statistical expression feature extraction tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST), with a log2FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found the 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
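The pre-filtering and global-scaling normalization steps described above can be sketched in a few lines of NumPy. This is not the authors' Seurat code: the function names, thresholds, and toy matrix are hypothetical, and the normalization follows the usual definition of LogNormalize (divide each cell's counts by the cell total, multiply by a scale factor, apply log1p), with a scale factor of 10,000 assumed.

```python
import numpy as np

def prefilter(counts, min_cells, min_features):
    """Drop features seen in too few cells, then cells with too few features.

    counts: features x cells matrix of raw counts (hypothetical layout).
    """
    keep_features = (counts > 0).sum(axis=1) >= min_cells
    counts = counts[keep_features]
    keep_cells = (counts > 0).sum(axis=0) >= min_features
    return counts[:, keep_cells]

def log_normalize(counts, scale_factor=10_000):
    """Global-scaling normalization: per-cell proportion * scale factor, then log1p."""
    totals = counts.sum(axis=0, keepdims=True)
    totals = np.where(totals == 0, 1.0, totals)   # guard against empty cells
    return np.log1p(counts / totals * scale_factor)

# Toy 3 features x 3 cells matrix; the values are made up for illustration.
raw = np.array([[10.0, 0.0, 5.0],
                [0.0, 0.0, 0.0],
                [90.0, 0.0, 5.0]])
filtered = prefilter(raw, min_cells=1, min_features=1)
normalized = log_normalize(filtered)
print(filtered.shape)   # the all-zero feature and the empty cell are dropped
```

After normalization, every retained cell sums to the same scale factor on the pre-log scale, which is what makes expression values comparable across cells with different sequencing depth.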
The c.119-123dup5bp mutation in human gamma-C-crystallin destabilizes the...
search.dataone.org
datadryad.org
Updated Feb 26, 2025
Cite
James Hejtmancik (2025). The c.119-123dup5bp mutation in human gamma-C-crystallin destabilizes the protein and activates the unfolded protein response to cause highly variable cataracts [Dataset]. http://doi.org/10.5061/dryad.rn8pk0pmm
Unique identifier
https://doi.org/10.5061/dryad.rn8pk0pmm
Dataset updated
Feb 26, 2025
Dataset provided by
Dryad Digital Repository
Authors
James Hejtmancik
Description
Ordered cellular architecture and high concentrations of stable crystallins are required for the lens to maintain transparency. Here we investigate the molecular mechanism of cataractogenesis of the CRYGC c.119-123dupGCGGC (p.Cys42AlafsX63) (CRYGC5bpd) mutation. Lenses were extracted from wild-type and transgenic mice carrying the CRYGC5bpdup minigene and RNA was isolated and converted into cDNA. Expression of genes in the unfolded protein response (UPR) pathways was estimated by qRT-PCR and RNA seq and pathway analysis was carried out using the Qiagen IPA website. P3W Transgenic mice exhibited phenotypic diversity with a dimorphic population of severe and clear lenses. PCA of RNA seq data showed separate clustering of wild-type, clear CRYGC5bpd, and severe CRYGC5bpd lenses. Transgenic mice showed differential upregulation in Master regulator Grp78 (Hspa5) and downstream targets in the PERK-dependent UPR pathway including Atf4 and Chop (Ddit3), but not GADD34. Thus, high levels of CRYGC...

RNA isolation, cDNA synthesis, and qRT-PCR

Total RNA was isolated using an RNA isolation kit (The RNeasy Plus Mini Kit; Qiagen, Valencia, CA) and quantified using a spectrophotometer (Nanodrop 2000C; ThermoFisher). A first-strand cDNA was synthesized from approximately 0.5mg of total RNA by cDNA synthesis kit (Super III first-strand synthesis for RT PCR kit; Invitrogen) according to the manufacturer's protocol. qRT-PCR was performed using Applied Biosystems ViiA7 Real-Time PCR system with the following amplification conditions: an initial incubation of the samples at 50°C for 2min and denaturation at 95°C 15min followed by 40 cycles of denaturation, annealing, and extension at 95°C 15sec, 60°C 30sec, and 72°C 30sec. Gapdh was used as an endogenous control for normalizing the target mRNA. The relative expression of each target gene was calculated using the 2^(∆∆Ct) method. The primers were standardized, and efficiencies were tested before performing qRT-PCR.

RNA-Seq

About 200ng of ...

# The c.119-123dup5bp mutation in human gamma-C-crystallin destabilizes the protein and activates the unfolded protein response to cause highly variable cataracts

https://doi.org/10.5061/dryad.rn8pk0pmm

Description of the data and file structure

The RNASeq and qRT-PCR files included refer to mice transgenic for a CRYGC c.119-123dupGCGGC (p.Cys42AlafsX63) (CRYGC5bpd) mutation. Lenses were extracted from wild-type and transgenic mice carrying the CRYGC5bpdup minigene and RNA was isolated and converted into cDNA and submitted to Novogene for RNASeq analysis. The descriptions of the mice are given in Ma, Z. et al. Overexpression of human γC-crystallin 5bp duplication Disrupts Lens Morphology in Transgenic Mice. Invest Ophthalmol Vis Sci 52, 5269-5375 (2011).

Identification of CRYGC as the causative gene is described in Ren, Z. et al. A 5-base insertion in the γC-crystallin gene is associated with autosomal dominant variable zonular pulveru...
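The relative quantification used in the qRT-PCR methods of this record, with Gapdh as the endogenous control, can be illustrated with a small worked example. The sketch assumes the standard 2^(-ΔΔCt) formulation, with the negative sign in the exponent; the record's text shows the exponent without a sign, so that convention is an assumption. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method (assumed convention).

    "ref" is the endogenous control gene (Gapdh in the record above);
    "control" is the calibrator group, e.g. wild-type lenses.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the test sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the calibrator
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: target crosses threshold 2 cycles earlier,
# relative to the reference gene, in the test sample than in the calibrator.
fold_change = relative_expression(22.0, 18.0, 25.0, 19.0)
print(fold_change)  # 4.0
```

The formula assumes roughly 100% amplification efficiency for both primer pairs, which is consistent with the record's note that primer efficiencies were tested beforehand.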
Processed data for the sRNA landscape chapter
data.niaid.nih.gov
Updated Mar 22, 2023
Cite
marc galland (2023). Processed data for the sRNA landscape chapter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4277026
Dataset updated
Mar 22, 2023
Dataset provided by
petra bleeker
marc galland
Michelle van der Gragt
Description
Processed data to be used in analyses related to the sRNA landscape.

1) small RNA processed data from stem trichomes: 2020-12-03_results_small_rna_stem_trichomes.tar.gz

Original small RNA-seq fastq files: available at https://doi.org/10.5281/zenodo.4105911

Software: small-rna-seq-pipeline v0.4.3 available at https://zenodo.org/record/3773230

2) small RNA processed data from bald stem, leaf primordium and leaf:

xxxx === to be added === xxx

3) mRNA-seq processed data (raw and scaled counts) from different tissues (stem trichomes, bald stem, leaf, leaf primordium): 20201117_snakemake_messenger_rnaseq_trichomes_and_other_tissues.tar.gz

Original mRNA-seq fastq files:

Stem trichomes of Moneymaker: dataset available here

Stem trichomes of LA0716: dataset available here

Stem trichomes of PI127826: dataset available here

Bald stems, leaf primordia and leaves of Moneymaker, LA0716 and PI127826: datasets are available here. Samples S28 to S48 were used.

Software: Snakemake RNA-seq release 0.3.4

raw_counts.parsed.tsv: contains the raw counts that can be used for differential expression analysis (e.g. with DESeq2).

scaled_counts.tsv: contains counts that are scaled between samples. This can be used for heatmap creation or PCA analysis for instance. NOT for differential analysis.

samples.tsv: a file listing the fastq files analysed.

config.yaml: a file that contains the parameters used when running the pipeline.
Muscle mRNA profiles from subjects enrolled in the ATLAS clinical study...
data.niaid.nih.gov
Updated Jun 1, 2022
Cite
Rinsch C; D'Amico D; Singh A (2022). Muscle mRNA profiles from subjects enrolled in the ATLAS clinical study (NCT03464500) [Dataset]. https://data.niaid.nih.gov/resources?id=gse197273
Dataset updated
Jun 1, 2022
Dataset provided by
Amazentis SA
Authors
Rinsch C; D'Amico D; Singh A
Description
Purpose: muscle transcriptomics of subjects supplemented with Urolithin A at different doses or placebo for 4 months. Method: RNA-seq was performed using Illumina HiSeq 4000 sequencing; single read 1 x 50 bp. The quantification of mRNA from the RNA-seq FASTQ files was performed using Salmon. Sample-wise quant.sf files containing raw transcript-level read estimates were read into R, v. 4.0.3, and were combined into a data matrix. Transcripts with very low total counts (< 10) across all samples were filtered out. The data were transformed using the variance stabilizing transformation (VST) method of R package DESeq2, v. 1.30.0. The top 10,000 transcripts with the highest variance across all samples were used for principal component analysis (PCA) using DESeq2. Data transformation and PCA were also done separately for each treatment group. Based on the PCAs, probable outlier samples were excluded and new PCAs were plotted without these samples. The raw transcript-level read count estimates were read into R and summarized to gene-level counts based on the provided transcript and gene ID annotations using the summarizeToGene function of R package tximport, v. 1.18.0. The DESeqDataSetFromTximport function of DESeq2 was then used to construct a DESeqDataSet object for DE analysis. Pre-filtering was applied before the DE analysis by excluding genes with < 10 total counts across samples. Subset DE analysis was performed, contrasting visit time D120 with baseline (BL) and adjusting for the subject effect. The normalization and DE analysis were done separately for the three different treatment groups. The independent filtering option of DESeq2 was enabled (default), filtering out genes with very low counts that are thus unlikely to show significant evidence. R package biomaRt, v. 2.46.0, was used for annotating the results with HGNC gene symbols, gene descriptions and gene biotypes. DESeq2-normalised expression values of all the samples in the given comparison were added to the result tables.
A non-adjusted p-value of 0.05 was used to filter the results by statistical significance. Results were also generated using the DESeq2 function lfcShrink, which shrinks the log2 fold change (LFC) estimates toward zero when the information for a gene is low (such as in cases with low counts or high dispersion values) but has little effect on genes with high counts. The shrunken log2FC values were subsequently used for visualisation and for ranking the genes.
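The filter, transform, select, and project sequence described above can be sketched as follows. This is an illustration in NumPy rather than the DESeq2 workflow the record describes: a log2(x + 1) transform stands in for the variance stabilizing transformation, the counts are simulated, and the function name and the small `n_top` value are hypothetical.

```python
import numpy as np

def top_variance_pca(expr, n_top=10_000, n_components=2):
    """PCA on the n_top most variable rows of a features x samples matrix.

    expr is assumed to be already transformed (VST in the record above;
    a log transform in this sketch). Features are centered but not scaled.
    """
    variances = expr.var(axis=1, ddof=1)
    top = np.argsort(variances)[::-1][:n_top]      # highest-variance features
    x = expr[top].T                                # samples x features
    x = x - x.mean(axis=0)                         # center each feature
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)                # variance fraction per PC
    return scores, explained[:n_components]

# Simulated counts (50 transcripts x 6 samples); not real data.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(50, 6)).astype(float)
counts[0] = 0                                      # one transcript with no reads
kept = counts[counts.sum(axis=1) >= 10]            # drop total counts < 10
transformed = np.log2(kept + 1.0)                  # stand-in for VST (assumption)
scores, explained = top_variance_pca(transformed, n_top=20)
print(scores.shape, explained)
```

Plotting the first two columns of `scores` gives the kind of sample-level PCA plot that the record uses to flag probable outlier samples.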
MOESM25 of Benchmarking principal component analysis for large-scale...
springernature.figshare.com
html
Updated Jun 1, 2023
Cite
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido (2023). MOESM25 of Benchmarking principal component analysis for large-scale single-cell RNA-sequencing [Dataset]. http://doi.org/10.6084/m9.figshare.11662146.v1
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.11662146.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Koki Tsuyuzaki; Hiroyuki Sato; Kenta Sato; Itoshi Nikaido
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 25: Comparison of normalizing size factors.
Data from: Glucocorticoid activity regulates glucocorticoid receptor isoform...
researchdata.edu.au
pdf
Updated Feb 24, 2022
Cite
Ms Zarqa Saif; Honorary Professor Vicki Clifton; Dr Sahar Keshvari; Dr Jack Lockett; Dr Adam Ewing; Associate Professor Warrick Inder (2022). Glucocorticoid activity regulates glucocorticoid receptor isoform expression and downstream gene transcription in humans [Dataset]. http://doi.org/10.48610/EE0C38E
Available download formats: pdf (2693314)
Unique identifier
https://doi.org/10.48610/EE0C38E
Dataset updated
Feb 24, 2022
Dataset provided by
The University of Queensland
Authors
Ms Zarqa Saif; Honorary Professor Vicki Clifton; Dr Sahar Keshvari; Dr Jack Lockett; Dr Adam Ewing; Associate Professor Warrick Inder
License
http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
Description
Supplementary materials for manuscript to be published:
- Complete list of R/Python packages used in analysis
- Table S1 – Pearson correlations between isoform expression at 0800 h. C = cytoplasmic, N = nuclear. p < 0.05*; p < 0.01**; p < 0.001***; p < 0.0001****.
- Table S2 – Pearson correlations between isoform expression at 1200 h. C = cytoplasmic, N = nuclear. p < 0.05*; p < 0.01**; p < 0.001***; p < 0.0001****.
- Table S3 – RNA-Seq gene expression correlations with nuclear GR isoform expression at 0800 h (Spearman’s, FDR q < 0.05)
- Table S4 – RNA-Seq gene expression correlations with cytoplasmic GR isoform expression at 0800 h (Spearman’s, FDR q < 0.05)
- Figure S1 – Differences in GR isoform expression between groups at 1200 h for cytoplasmic (A) and nuclear (B) locations, assessed using linear regression with adjustment for age, menopausal status and BMI, post-hoc comparisons using Tukey’s test. p < 0.05*, p < 0.01**, p < 0.001***, p < 0.0001****.
- Figure S2 – Principal component analysis of isoform expression at 0800 h across groups; 89.6% of the variance in the data is explained by PC1 and PC2. Participants cluster together based on isoform expression. Note: GRP expression and participants not expressing all other isoforms were excluded from the analysis, as PCA requires a complete dataset.
- Figure S3 – Spearman correlations between RNASeq read count and nuclear GR isoform expression displayed in a Venn diagram. Each oval represents the isoform labelled and intersections represent correlations shared by the isoforms included. Counts represent the number (%) of genes correlated with each isoform/combination of isoforms (total = 78).
- Figure S4 – Spearman correlations between RNASeq read count and cytoplasmic GR isoform expression. UpSet plot showing the number of genes correlated with each isoform (left histogram, Set Size), the combinations of isoforms (grey dots), and the number of genes correlated with that combination (upper histogram, Interaction Size).
- Figure S5 – Pathway enrichment analysis of genes correlated with at least one GR isoform (cytoplasmic or nuclear, n = 218; FDR q < 0.05). Nodes represent gene pathways (Gene Ontology: Biological Process or Reactome Pathway) with size based on the total number of genes in the pathway; edges represent overlap between pathways with size based on the number of overlapping genes. Full list of genes included in the analysis available in supplementary data (Table S3 and S4).
- Figure S6 – Pathway enrichment analysis of genes differentially expressed in other groups compared with AI (n = 91; FDR q < 0.05). Nodes represent gene pathways (Gene Ontology: Biological Process or Reactome Pathway) with size and colour intensity based on the total number of genes in the pathway; edges represent overlap between pathways with size based on the number of overlapping genes.
- Supplementary References list
Cancer prediction
kaggle.com
zip
Updated Sep 14, 2020
Cite
Abhishek Parashar (2020). Cancer prediction [Dataset]. https://www.kaggle.com/abhiparashar/cancer-prediction
Explore at:
Available download formats: zip (4522896 bytes)
Dataset updated
Sep 14, 2020
Authors
Abhishek Parashar
Description
Context

Pancreatic Adenocarcinoma (PAAD) is the third most common cause of death from cancer, with an overall 5-year survival rate of less than 5%, and is predicted to become the second leading cause of cancer mortality in the United States by 2030. Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and DNA are nucleic acids, and, along with lipids, proteins and carbohydrates, constitute the four major macromolecules essential for all known forms of life.

Content

RNA-Seq (RNA sequencing) is a sequencing technique that measures the quantity of RNA in a biological sample at a given moment. This dataset contains normalized RNA-Seq reads for pancreatic cancer tumors, covering ~20,000 genes across 185 tumors. The file format is GCT, a tab-delimited format used for sharing gene expression data together with metadata (details for each sample).

● The R package cmapR can be used for reading GCTs in R.
● The Python package cmapPy can be used for reading GCTs in Python.
● Phantasus is an open source tool used to visualise GCT files, make various plots, and apply algorithms such as clustering and PCA, among others.
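
To make the GCT layout concrete, here is a minimal sketch of a reader written with only the Python standard library. It assumes the plain GCT v1.2 layout (a `#1.2` version line, a dimensions line, then a header row with `Name` and `Description` followed by the sample IDs); whether the file in this dataset uses v1.2 or a later version with embedded metadata is not stated here, so treat that as an assumption. The function name `read_gct_v12` and the example values are made up for illustration, and for real files the cmapR or cmapPy packages listed above are the better choice.

```python
import csv
import io


def read_gct_v12(text):
    """Parse GCT v1.2 text into (row_ids, descriptions, sample_ids, matrix).

    Hypothetical helper for illustration; assumes every value is numeric.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#1.2":
        raise ValueError("only GCT v1.2 is handled in this sketch")
    # Second line: number of data rows (genes) and data columns (samples).
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])
    reader = csv.reader(io.StringIO("\n".join(lines[2:])), delimiter="\t")
    header = next(reader)
    sample_ids = header[2:2 + n_cols]  # skip the Name and Description columns
    row_ids, descriptions, matrix = [], [], []
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        row_ids.append(fields[0])
        descriptions.append(fields[1])
        matrix.append([float(v) for v in fields[2:2 + n_cols]])
    if len(matrix) != n_rows or len(sample_ids) != n_cols:
        raise ValueError("dimensions do not match the header line")
    return row_ids, descriptions, sample_ids, matrix


# Made-up two-gene, three-sample example (not taken from the dataset).
EXAMPLE = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "GENE_A\tna\t1.0\t2.5\t0.0\n"
    "GENE_B\tna\t4.0\t0.5\t3.0\n"
)
row_ids, descriptions, sample_ids, matrix = read_gct_v12(EXAMPLE)
```

The result is a genes-by-samples matrix as nested lists, which would typically be transposed to samples-by-genes before clustering or PCA.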

Acknowledgements

Source - Pancreatic cancer survival analysis defines a signature that predicts outcome - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084949/
Data from: TLR and IMD signaling pathways from Caligus rogercresseyi...
data.niaid.nih.gov
search.dataone.org
+2 more
zip
Updated Jan 6, 2014
Cite
Valentina Valenzuela-Muñoz; Cristian Gallardo-Escárate (2014). TLR and IMD signaling pathways from Caligus rogercresseyi (Crustacea: Copepoda): in silico gene expression and SNPs discovery [Dataset]. http://doi.org/10.5061/dryad.sm32j
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.sm32j
Dataset updated
Jan 6, 2014
Dataset provided by
University of Concepción
Authors
Valentina Valenzuela-Muñoz; Cristian Gallardo-Escárate
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
X Region, Chile
Description
The Toll and IMD signaling pathways represent one of the first lines of innate immune defense in invertebrates like Drosophila. However, for crustaceans like Caligus rogercresseyi, there is very little genomic information and, consequently, little understanding of immune mechanisms. Massive sequencing data obtained for three developmental stages of C. rogercresseyi were used to evaluate in silico the expression patterns and the presence of SNP variants in genes involved in the Toll and IMD pathways. Through RNA-seq analysis, which used 20 contigs corresponding to relevant genes of the Toll and IMD pathways, overexpression of genes linked to the Toll pathway, such as toll3 and Dorsal, was observed in the copepod stage. For the chalimus and adult stages, overexpression of genes in both pathways, such as Akirin and Tollip and IAP and Toll9, respectively, was observed. In addition, PCA inferred that the immune response mechanism was more developed in the chalimus and adult stages, as evidenced by a relation between these two stages and the genes of both pathways. Moreover, 136 SNPs were identified for 20 contigs in genes of the Toll and IMD pathways. This study provides transcriptomic information about the immune response mechanisms of Caligus, thus providing a foundation for the development of new control strategies through blocking the innate immune response.
Additional file 6 of Robust principal component analysis for accurate...
springernature.figshare.com
xlsx
Updated Feb 5, 2024
Cite
Xiaoying Chen; Bo Zhang; Ting Wang; Azad Bonni; Guoyan Zhao (2024). Additional file 6 of Robust principal component analysis for accurate outlier sample detection in RNA-Seq data [Dataset]. http://doi.org/10.6084/m9.figshare.12586252.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.12586252.v1
Dataset updated
Feb 5, 2024
Dataset provided by
figshare
Authors
Xiaoying Chen; Bo Zhang; Ting Wang; Azad Bonni; Guoyan Zhao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 6. Supplemental Table 6: List of genes validated by qRT-PCR with primer sequences, results and DEG status in each analysis strategy.
Scripts for Analysis
figshare.com
txt
Updated Jul 18, 2018
Cite
Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6783569.v2
Dataset updated
Jul 18, 2018
Dataset provided by
figshare
Authors
Sneddon Lab UCSF
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Scripts used for analysis of V1 and V2 Datasets.
- seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
- merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
- subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
- seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
- clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
- subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
- seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
- merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
- prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
- monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
- monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
- monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
- CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
- CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
