Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 8 Pair plots of all the PCA (PBMCs) implementations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 11 Pair plots of all the PCA (Brain) implementations.
Standard RNA analyses using microarrays and low-coverage polyadenylation-enriched RNA sequencing (RNA-Seq) cannot fully characterize the complexity of the cancer transcriptome. To fully elucidate the transcriptome of prostate tumours, we performed ultra-deep total RNA-Seq on 144 localized prostate tumours with long-term clinical follow-up. Analysis of linear RNAs identified a transcriptomic subtype associated with the aggressive intraductal carcinoma subhistology, and a fusion gene profile that differentiates localized from metastatic prostate cancers. Analysis of back-splicing events identified widespread RNA circularization, with the average tumour expressing 7,140 distinct circular RNAs. The degree of aberrant circRNA production is correlated with disease progression in multiple clinical cohorts. Loss-of-function screens identified 11.3% of the screened circRNAs as essential to prostate cancer proliferation, and for 93.6% of these, their parental linear genes are not required for proliferation. Follow-up studies on circCSNK1G3 revealed its role in regulating cell cycle progression. Ultra-deep transcriptome sequencing thus provides a more comprehensive view of the linear and circular transcriptional and functional landscapes of localized prostate cancer. RNA-seq with rRNA depletion and random reverse transcription (RT) primers was performed with or without RNase R treatment in five PCa cell lines: LNCaP, 22Rv1, V16A, PC-3 and 42D.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PCA and correlation clustering analysis of RNA-Seq data.
Gene expression data portal developed for stem cell community, containing public gene expression datasets derived from microarray, RNA sequencing and single cell profiling technologies. Portal to visualize and download curated stem cell data. Provides easy to use and intuitive tools for biologists to visually explore data, including interactive gene expression profiles, principal component analysis plots and hierarchical clusters, among others.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Linear dimensionality reduction techniques are widely used in many applications. The goal of dimensionality reduction is to eliminate noise from the data and extract its main features. Several dimension reduction methods have been developed, such as linear-based principal component analysis (PCA), nonlinear-based t-distributed stochastic neighbor embedding (t-SNE), and deep-learning-based autoencoder (AE). However, PCA only determines the projection direction with the highest variance, t-SNE is sometimes only suitable for visualization, and AE and nonlinear methods discard the linear projection. Results: To retain the linear projection of raw data and generate a better result of dimension reduction either for visualization or downstream analysis, we present neural principal component analysis (nPCA), an unsupervised deep learning approach capable of retaining richer information of raw data as a promising improvement to PCA. To evaluate the performance of the nPCA algorithm, we compare its performance on 10 public datasets and 6 single-cell RNA sequencing (scRNA-seq) datasets of the pancreas, benchmarking our method against other classic linear dimensionality reduction methods. Conclusion: We concluded that the nPCA method is a competitive alternative method for dimensionality reduction tasks.
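To make the notion of a linear projection onto the highest-variance directions concrete, here is a minimal sketch of ordinary PCA computed via the singular value decomposition. It is not the nPCA method described above; the function name `pca_project` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X (samples x features) onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T               # linear projection of the centered data
    explained = (S ** 2) / np.sum(S ** 2)    # fraction of variance per component
    return scores, components, explained[:n_components]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data (hypothetical): 50 samples, 5 features, first feature dominates.
    X = rng.normal(size=(50, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
    scores, components, explained = pca_project(X)
    print(scores.shape, explained)
```

Because the projection is a plain matrix product, each component can be read back as a weighted combination of the original features, which is the property the abstract says nonlinear methods give up.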
Aims: Principal component analysis (PCA) is a widely used dimensionality reduction technique in the life sciences, usually used to create two-dimensional visualizations of geometric morphological measurement data. However, because PCA cannot summarize nonlinear dependencies between variables, interesting biological information may be distorted or lost in these graphs. Nonlinear alternatives exist, but their effectiveness has never been tested on placental transcriptomic data. Methods and Results: In this study, first-trimester chorionic villus and decidua tissues were collected from 6 healthy women. Transcriptome data were acquired by RNA-seq, and the expression levels of trophoblast-specific transcription factors were identified by immunofluorescence. Differentially expressed genes between chorionic villus and decidua tissues and their related biological functions were identified. We then performed PCA on these 12 samples. Furthermore, 18 published transcriptome datasets (425 samples in total) of human pregnancy-related tissues (including chorionic villus and decidua, term placenta, endometrium, in vitro cell lines, etc.) were collected from public databases and analyzed. We also compared two of the most widely used dimensionality reduction (DR) methods for generating 2D maps to visualize these data, and compared the effects of different parameter settings and commonly used manifold learning methods on the results. Conclusions: The results indicate that the nonlinear method preserves the small differences between subtypes of placental tissue better than PCA. Although public RNA-seq data are available for chorionic villus and decidua tissue, this is the first time that RNA-seq data were obtained from chorionic villus and decidua derived from the same patient.
The datasets and analysis provide a useful resource for researchers in the field of the maternal-fetal interface and the establishment of pregnancy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 15 Crashed jobs caused by out-of-memory errors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We used RNA-seq to identify how membrane fatty acid changes might impact and explain the virulence defect during host infection. WT and mFabT expression was compared in THY, and in THY-Tween (as C18:1Δ9 source), which activates WT FabT repression. RNA isolation and Illumina RNA-seq sequencing: GAS strains were cultured at 37°C in THY or THY-Tween, and cells were harvested during exponential growth (OD600 between 0.4 and 0.5). Independent triplicate cultures were prepared for each condition. For RNA preparation, 2 volumes of RNA protect (Qiagen) were added to cultures prior to centrifugation (10 min, 12,000 g), and total RNA was extracted after lysing bacteria by a 30 min treatment with 15 mg.ml-1 lysozyme and 300 U.ml-1 mutanolysin at 20°C, followed by two cycles of Fast-prep (power 6, 30 s) at 4°C. RNA extraction (Macherey-Nagel RNA extraction kit; Germany) was done according to supplier instructions. RNA integrity was analyzed using an Agilent Bioanalyzer (Agilent Biotechnologies, Ca., USA). 23S and 16S rRNA were depleted from the samples using the MICROBExpress Bacterial mRNA enrichment kit (Invitrogen, France); depletion was controlled on an Agilent Bioanalyzer (Agilent Biotechnologies). Libraries were prepared using an Illumina TS kit. Libraries were sequenced, generating 10,000,000 to 20,000,000 75-bp-long reads per sample. RNA-Seq data analysis: The MGAS6180 strain sequence (NCBI), which is nearly identical to M28PF1, was used as a reference sequence to map sequencing reads using the STAR software (2.5.2b) from BIOCONDA (Anaconda Inc). RNA-seq data were analyzed using the hclust function and a principal component analysis in R 3.5.1 (version 2018-07-02). For differential expression analysis, normalization and statistical analyses were performed using the SARTools package and DESeq2; p-values were calculated and adjusted for multiple testing using the false discovery rate controlling procedure.
We used UpSetR to visualize set intersections in a matrix layout comprising the mFabT versus the WT strain grown in THY and in THY-Tween, and growth in THY-Tween versus THY for each strain.
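The description above mentions adjusting p-values with a false discovery rate controlling procedure without naming it. As a sketch, assuming the Benjamini-Hochberg procedure (an assumption on our part, not stated in the description), the adjustment can be written in plain Python. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called significant when its adjusted value falls below the chosen FDR level.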
https://www.immport.org/agreement
Oligoarticular juvenile idiopathic arthritis (oligo JIA) is the most common form of chronic inflammatory arthritis in children; yet, the cause of this disease remains unknown. We hoped that through disease pathway characterization, we could better understand immune responses in oligo JIA and generate suggestions for means of controlling arthritic flares in oligo JIA. To do this, we conducted detailed immunophenotyping of joint-infiltrating CD4+ T cells and assessed the stability of Tregs found in the joints of affected patients. This was done with flow cytometry, bulk and single-cell RNA sequencing, DNA methylation studies, and Treg suppression assays. We enrolled 34 patients with oligo JIA, defined by ILAR criteria, who provided synovial fluid (SF) and peripheral blood (PB) samples. PB was also obtained from 8 pediatric and 9 adult controls. Flow cytometry was used to evaluate the T cell compartment in oligo JIA. Memory CD4, memory CD8, and gamma delta T cells were enriched in oligo JIA joints. The frequencies of CD8+ memory T (Tmem) cells expressing the Th1 cytokine (IFNgamma) and chemokine receptor (CXCR3) in oligo JIA SF were assessed and compared to control PB. Similarly, the proportion of gamma delta T lymphocytes expressing CXCR3 was compared between our two groups. We characterized CD4+ Tmem with additional flow cytometry studies. Paired PB and SF samples confirmed enrichment of CXCR3+ and IFNgamma+ CD4+ Tmem in oligo JIA joints. To assess for non-classical Th1 cells that jointly express Th1 and Th17 features, we evaluated the fraction of CD4+ T cells producing IFNgamma and IL-17. Because T cell stimulation alters expression of the Th17-associated chemokine receptor CCR6, CD161 was used as an alternative marker of Th17 cells. To further understand gene expression in CD4+ T cells in oligo JIA, Tregs and Teffs from patients and controls were assessed with bulk RNA sequencing (RNA-seq).
Principal component analysis (PCA) of the transcriptomic data segregated samples by compartment (PB versus SF) and cell type (Teff versus Treg), even for patients receiving methotrexate. Gene set enrichment analysis (GSEA) was used to examine IFNgamma signaling gene sets in SF Tregs and SF Teffs compared to PB Tregs and PB Teffs. Gene sets related to antigen presentation, T cell receptor (TCR) signaling, and type I interferons were also examined in SF Tregs and SF Teffs. To assess for the possibility of cytokine-producing SF Tregs, indicating that these cells may have been reprogrammed to an effector population, we evaluated the transcriptomic signature of Tregs in oligo JIA. Treg-associated transcripts remained significantly elevated in SF Tregs compared to PB Tregs. To determine the stability and functionality of Th1-skewed (CXCR3+) SF Tregs, we used methylation studies and suppression assays. Because co-expression of Th1- and Th17-related genes and the robustness of the Treg transcriptomic signature in the sub-population of Tregs with Th1 features cannot be determined from bulk RNA-seq data, single-cell RNA sequencing (scRNA-seq) and TCR repertoire analysis were performed on sorted Tregs and Teffs from the SF of 2 oligo JIA patients. Lastly, complete TCR data (paired CDR3α and CDR3β sequences) were recovered for a total of 5,509 cells (89% of the single-cell transcriptomic dataset).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The major domains of interest in single-cell RNA sequencing analysis are identification of existing and novel cell types, depiction of cells, cell fate prediction, classification of several types of tumor, and investigation of heterogeneity among cells. Single-cell clustering plays an important role in addressing these questions. Cluster identification in high-dimensional single-cell sequencing data is challenging because of the nature of the data, and dimensionality reduction models can help. Here, we introduce a framework for discovering potential cluster-specific frequent biomarkers using dimensionality reduction and hierarchical agglomerative (Louvain) clustering for single-cell RNA sequencing data analysis. First, we pre-filtered the features detected in too few cells and the cells with too few features. Then we created a Seurat object to store data and analysis together and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method “LogNormalize” for data normalization. Next, we computed the features that are highly variable from cell to cell in our dataset. Then, we applied a linear transformation and a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project the high-dimensional data to an optimal low-dimensional space. After identifying fifty “significant” principal components (PCs) based on strong enrichment of low p-value features, we implemented the graph-based clustering algorithm Louvain for cell clustering on the 10 top significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.885 1.
After detecting the cell clusters, we found 3871 cluster-specific biomarkers using a statistical expression feature extraction tool for single-cell sequencing data, Model-based Analysis of Single-cell Transcriptomics (MAST), with a log2FC threshold of 0.25 and a minimum feature detection of 25%. From these cluster-specific biomarkers, we found the 1892 most frequent markers, i.e., overlapping biomarkers. We performed degree hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach to efficiently discover cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
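The global-scaling normalization step named above can be illustrated with a short sketch. As we understand Seurat's documented "LogNormalize" behaviour, each cell's counts are divided by that cell's total, multiplied by a scale factor (10,000 by default), and transformed with log1p. The pure-Python function below is an illustrative re-implementation under that assumption, not Seurat code, and the name `log_normalize` is hypothetical.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Normalize raw counts per cell; counts is a list of cells (lists of counts)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all zeros (such cells are normally filtered out).
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

if __name__ == "__main__":
    # Two hypothetical cells with three features each.
    print(log_normalize([[3, 0, 7], [1, 1, 2]]))
```

Scaling by each cell's total removes differences in sequencing depth between cells before the highly variable features and PCA are computed.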
Ordered cellular architecture and high concentrations of stable crystallins are required for the lens to maintain transparency. Here we investigate the molecular mechanism of cataractogenesis of the CRYGC c.119-123dupGCGGC (p.Cys42AlafsX63) (CRYGC5bpd) mutation. Lenses were extracted from wild-type and transgenic mice carrying the CRYGC5bpdup minigene, and RNA was isolated and converted into cDNA. Expression of genes in the unfolded protein response (UPR) pathways was estimated by qRT-PCR and RNA-seq, and pathway analysis was carried out using the Qiagen IPA website. P3W transgenic mice exhibited phenotypic diversity with a dimorphic population of severe and clear lenses. PCA of RNA-seq data showed separate clustering of wild-type, clear CRYGC5bpd, and severe CRYGC5bpd lenses. Transgenic mice showed differential upregulation of the master regulator Grp78 (Hspa5) and downstream targets in the PERK-dependent UPR pathway, including Atf4 and Chop (Ddit3), but not GADD34. Thus, high levels of CRYGC... RNA isolation, cDNA synthesis, and qRT-PCR: Total RNA was isolated using an RNA isolation kit (The RNeasy Plus Mini Kit; Qiagen, Valencia, CA) and quantified using a spectrophotometer (Nanodrop 2000C; ThermoFisher). First-strand cDNA was synthesized from approximately 0.5 mg of total RNA with a cDNA synthesis kit (Super III first-strand synthesis for RT PCR kit; Invitrogen) according to the manufacturer's protocol. qRT-PCR was performed using the Applied Biosystems ViiA7 Real-Time PCR system with the following amplification conditions: an initial incubation of the samples at 50°C for 2 min and denaturation at 95°C for 15 min, followed by 40 cycles of denaturation, annealing, and extension at 95°C for 15 sec, 60°C for 30 sec, and 72°C for 30 sec. Gapdh was used as an endogenous control for normalizing the target mRNA. The relative expression of each target gene was calculated using the 2^(−ΔΔCt) method. The primers were standardized, and efficiencies were tested before performing qRT-PCR.
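The relative-expression calculation can be sketched as a small worked example. It assumes the standard form of the method, 2^(−ΔΔCt) with a negative exponent, and roughly 100% primer efficiency. The function name and the Ct values are hypothetical and are not taken from the dataset.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    # dCt: target Ct normalized to the endogenous control (Gapdh in the text).
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # ddCt: normalized sample relative to the normalized calibrator (e.g. wild type).
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)

if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 2 cycles earlier
    # in the transgenic sample than in wild type, relative to the reference.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

A lower Ct means more starting template, which is why the exponent carries a minus sign: a ΔΔCt of −2 corresponds to a four-fold increase.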
RNA-Seq: About 200 ng of ...
The c.119-123dup5bp mutation in human gamma-C-crystallin destabilizes the protein and activates the unfolded protein response to cause highly variable cataracts
https://doi.org/10.5061/dryad.rn8pk0pmm
The RNASeq and qRT-PCR files included refer to mice transgenic for a CRYGC c.119-123dupGCGGC (p.Cys42AlafsX63) (CRYGC5bpd) mutation. Lenses were extracted from wild-type and transgenic mice carrying the CRYGC5bpdup minigene and RNA was isolated and converted into cDNA and submitted to Novogene for RNASeq analysis. The descriptions of the mice are given in Ma, Z. et al. Overexpression of human γC-crystallin 5bp duplication Disrupts Lens Morphology in Transgenic Mice. Invest Ophthalmol Vis Sci 52, 5269-5375 (2011).
Identification of CRYGC as the causative gene is described in Ren, Z. et al. A 5-base insertion in the γC-crystallin gene is associated with autosomal dominant variable zonular pulveru...
Processed data to be used in analyses related to the sRNA landscape.
1) small RNA processed data from stem trichomes: 2020-12-03_results_small_rna_stem_trichomes.tar.gz
Original small RNA-seq fastq files: available at https://doi.org/10.5281/zenodo.4105911
Software: small-rna-seq-pipeline v0.4.3 available at https://zenodo.org/record/3773230
2) small RNA processed data from bald stem, leaf primordium and leaf:
xxxx === to be added === xxx
3) mRNA-seq processed data (raw and scaled counts) from different tissues (stem trichomes, bald stem, leaf, leaf primordium): 20201117_snakemake_messenger_rnaseq_trichomes_and_other_tissues.tar.gz
Original mRNA-seq fastq files:
Stem trichomes of Moneymaker: dataset available here
Stem trichomes of LA0716: dataset available here
Stem trichomes of PI127826: dataset available here
Bald stems, leaf primordia and leaves of Moneymaker, LA0716 and PI127826: datasets are available here. Samples S28 to S48 were used.
Software: Snakemake RNA-seq release 0.3.4
raw_counts.parsed.tsv: contains the raw counts that can be used for differential expression analysis (e.g. with DESeq2).
scaled_counts.tsv: contains counts that are scaled between samples. This can be used for heatmap creation or PCA analysis for instance. NOT for differential analysis.
samples.tsv: a file listing the fastq files analysed.
config.yaml: a file that contains the parameters used when running the pipeline.
Purpose: muscle transcriptomics of subjects supplemented with Urolithin A at different doses or placebo for 4 months. Method: RNA-seq was performed using Illumina HiSeq 4000 sequencing (single read, 1 x 50 bp). The quantification of mRNA from the RNA-seq FASTQ files was performed using Salmon. Sample-wise quant.sf files containing raw transcript-level read estimates were read into R v. 4.0.3 and combined into a data matrix. Transcripts with very low total counts (< 10) across all samples were filtered out. The data were transformed using the variance stabilizing transformation (VST) method of the R package DESeq2, v. 1.30.0. The top 10,000 transcripts with the highest variance across all samples were used for principal component analysis (PCA) using DESeq2. Data transformation and PCA were also done separately for each treatment group. Based on the PCAs, probable outlier samples were excluded and new PCAs were plotted without these samples. The raw transcript-level read count estimates were read into R and summarized to gene-level counts based on the provided transcript and gene ID annotations using the summarizeToGene function of the R package tximport, v. 1.18.0. The DESeqDataSetFromTximport function of DESeq2 was then used to construct a DESeqDataSet object for DE analysis. Pre-filtering was applied before the DE analysis by excluding genes with < 10 total counts across samples. Subset DE analysis was performed, contrasting visit time D120 with baseline (BL) and adjusting for the subject effect. The normalization and DE analysis were done separately for the three different treatment groups. The independent filtering option of DESeq2 was enabled (default), filtering out genes with very low counts that are thus unlikely to show significant evidence. The R package biomaRt, v. 2.46.0, was used for annotating the results with HGNC gene symbols, gene descriptions and gene biotypes. DESeq2-normalised expression values of all the samples in the given comparison were added to the result tables.
A non-adjusted p-value threshold of 0.05 was used to filter the results by statistical significance. Results were also generated using the DESeq2 function lfcShrink, which shrinks the log2 fold change (LFC) estimates toward zero when the information for a gene is low (such as with low counts or high dispersion values) but has little effect on genes with high counts. The shrunken log2FC values were subsequently used for visualisation and ranking the genes. Muscle mRNA profiles from subjects enrolled in the ATLAS clinical study (NCT03464500).
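The two selection steps described above (dropping features with fewer than 10 total counts, then keeping the most variable features for PCA) can be sketched in plain Python. This is an illustration only: the original analysis was done in R with DESeq2 and ranked variance on VST-transformed values, whereas this sketch ranks whatever values it is given. The function names and the toy matrix are hypothetical.

```python
from statistics import pvariance

def filter_low_counts(matrix, min_total=10):
    """Keep features whose total count across samples is at least min_total."""
    return {name: row for name, row in matrix.items() if sum(row) >= min_total}

def top_variable(matrix, n_top):
    """Return the names of the n_top features with the highest variance."""
    ranked = sorted(matrix, key=lambda name: pvariance(matrix[name]), reverse=True)
    return ranked[:n_top]

if __name__ == "__main__":
    # Hypothetical feature-by-sample counts (three samples).
    counts = {
        "tx_a": [0, 1, 2],      # total 3, removed by the pre-filter
        "tx_b": [10, 50, 90],
        "tx_c": [20, 20, 20],   # passes the filter but has zero variance
        "tx_d": [5, 6, 7],
    }
    kept = filter_low_counts(counts)
    print(top_variable(kept, n_top=2))
```

In the described workflow the second step would use `n_top=10000` on the transformed matrix before PCA.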
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 25 Comparison of normalizing size factors.
http://guides.library.uq.edu.au/deposit_your_data/terms_and_conditions
Supplementary materials for manuscript to be published:
- Complete list of R/Python packages used in analysis
- Table S1 – Pearson correlations between isoform expression at 0800 h. C = cytoplasmic, N = nuclear. p < 0.05*; p < 0.01**; p < 0.001***; p < 0.0001****.
- Table S2 – Pearson correlations between isoform expression at 1200 h. C = cytoplasmic, N = nuclear. p < 0.05*; p < 0.01**; p < 0.001***; p < 0.0001****.
- Table S3 – RNA-Seq gene expression correlations with nuclear GR isoform expression at 0800 h (Spearman’s, FDR q < 0.05)
- Table S4 – RNA-Seq gene expression correlations with cytoplasmic GR isoform expression at 0800 h (Spearman’s, FDR q < 0.05)
- Figure S1 – Differences in GR isoform expression between groups at 1200 h for cytoplasmic (A) and nuclear (B) locations, assessed using linear regression with adjustment for age, menopausal status and BMI, with post-hoc comparisons using Tukey’s test. p < 0.05*, p < 0.01**, p < 0.001***, p < 0.0001****.
- Figure S2 – Principal component analysis of isoform expression at 0800 h across groups; 89.6% of the variance in the data is explained by PC1 and PC2, and participants cluster together based on isoform expression. Note: GRP expression and participants not expressing all other isoforms were excluded from the analysis, as PCA requires a complete dataset.
- Figure S3 – Spearman correlations between RNASeq read count and nuclear GR isoform expression displayed in a Venn diagram. Each oval represents the isoform labelled, and intersections represent correlations shared by the isoforms included. Counts represent the number (%) of genes correlated with each isoform/combination of isoforms (total = 78).
- Figure S4 – Spearman correlations between RNASeq read count and cytoplasmic GR isoform expression. UpSet plot showing the number of genes correlated with each isoform (left histogram, Set Size), the combinations of isoforms (grey dots), and the number of genes correlated with that combination (upper histogram, Interaction Size).
- Figure S5 – Pathway enrichment analysis of genes correlated with at least one GR isoform (cytoplasmic or nuclear, n = 218; FDR q < 0.05). Nodes represent gene pathways (Gene Ontology: Biological Process or Reactome Pathway) with size based on the total number of genes in the pathway; edges represent overlap between pathways with size based on the number of overlapping genes. Full list of genes included in the analysis available in supplementary data (Table S3 and S4).
- Figure S6 – Pathway enrichment analysis of genes differentially expressed in other groups compared with AI (n = 91; FDR q < 0.05). Nodes represent gene pathways (Gene Ontology: Biological Process or Reactome Pathway) with size and colour intensity based on the total number of genes in the pathway; edges represent overlap between pathways with size based on the number of overlapping genes.
- Supplementary References list
Pancreatic Adenocarcinoma (PAAD) is the third most common cause of death from cancer, with an overall 5-year survival rate of less than 5%, and is predicted to become the second leading cause of cancer mortality in the United States by 2030. Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and DNA are nucleic acids, and, along with lipids, proteins and carbohydrates, constitute the four major macromolecules essential for all known forms of life.
RNA-Seq (RNA sequencing) is a sequencing technique used to detect the quantity of RNA in a biological sample at a given moment. Here we have a dataset of normalized RNA sequencing reads for pancreatic cancer tumors. The measurements cover ~20,000 genes for 185 pancreatic cancer tumors. The file format is GCT, a tab-delimited format used for sharing gene expression data together with sample metadata (details for each sample).
● The R package cmapR can be used for reading GCTs in R.
● The Python package cmapPy can be used for reading GCTs in Python.
● Phantasus is an open-source tool used to visualise GCT files, make various plots, and apply algorithms such as clustering and PCA, among others.
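For readers who want to see what a GCT file looks like on disk, here is a minimal standard-library sketch that reads the simpler GCT v1.2 layout: a `#1.2` version line, a line with the row and column counts, a header beginning with `Name` and `Description`, then one row per gene. Whether this dataset's file is v1.2 or the metadata-carrying v1.3 is not stated, so this is an assumption; for real work the packages listed above are the better choice. The function name `read_gct_v12` and the sample content are hypothetical.

```python
import io

def read_gct_v12(handle):
    """Parse a GCT v1.2 stream into (sample_names, {gene_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\r\n").split("\t")
    samples = header[2:]                     # first two columns: Name, Description
    data = {}
    for line in handle:
        if not line.strip():
            continue                         # skip blank trailing lines
        fields = line.rstrip("\r\n").split("\t")
        data[fields[0]] = [float(v) for v in fields[2:]]
    if len(data) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions do not match the GCT header")
    return samples, data

if __name__ == "__main__":
    # Hypothetical two-gene, two-sample file held in memory.
    text = "#1.2\n2\t2\nName\tDescription\tS1\tS2\nKRAS\tna\t1.5\t2.0\nTP53\tna\t0.5\t3.0\n"
    samples, data = read_gct_v12(io.StringIO(text))
    print(samples, data)
```

The dimension check against the second header line catches truncated downloads early.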
Source - Pancreatic cancer survival analysis defines a signature that predicts outcome - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084949/
https://spdx.org/licenses/CC0-1.0.html
The Toll and IMD signaling pathways represent one of the first lines of innate immune defense in invertebrates like Drosophila. However, for crustaceans like Caligus rogercresseyi, there is very little genomic information and, consequently, little understanding of immune mechanisms. Massive sequencing data obtained for three developmental stages of C. rogercresseyi were used to evaluate in silico the expression patterns and presence of SNP variants in genes involved in the Toll and IMD pathways. Through RNA-seq analysis, which used 20 contigs corresponding to relevant genes of the Toll and IMD pathways, overexpression of genes linked to the Toll pathway, such as toll3 and Dorsal, was observed in the copepod stage. For the chalimus and adult stages, overexpression of genes in both pathways, such as Akirin and Tollip, and IAP and Toll9, respectively, was observed. On the other hand, PCA statistical analysis inferred that in the chalimus and adult stages, the immune response mechanism was more developed, as evidenced by a relation between these two stages and the genes of both pathways. Moreover, 136 SNPs were identified for 20 contigs in genes of the Toll and IMD pathways. This study provides transcriptomic information about the immune response mechanisms of Caligus, thus providing a foundation for the development of new control strategies through blocking the innate immune response.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 6. Supplemental Table 6: List of genes validated by qRT-PCR with primer sequences, results and DEG status in each analysis strategy.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Scripts used for analysis of V1 and V2 Datasets.
seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.