Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Warden and Wu Preprint: v1
In general, this primarily focuses on the following types of comparisons:
Differential expression methods include the following:
The most common preprocessing strategies include STAR, TopHat2, and Salmon. However, a limited amount of additional processing with HISAT2, kallisto, Bowtie2 (+eXpress), and Bowtie1 (+RSEM) is also provided.
Most STAR and TopHat2 alignments use htseq-count for quantification, as well as running cuffdiff (for single variable 2-group comparisons). However, a limited amount of additional processing with featureCounts is also provided.
Most STAR and TopHat2 alignments start with the public forward reads, even if paired-end data was available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EdgeR results from MMGs. Differential expression results calculated by edgeR for MMG counts produced by the stage 2 analysis. Can be downloaded from [43]. (XLSX 428 kb)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset corresponds to GSE152641, a whole-blood RNA-seq study of COVID-19 patients and healthy controls.
It includes expression data processed through edgeR on a Galaxy server, hence the title “COVID-19 DGE (GSE152641) edgeR Galaxy Server”.
The original GSE152641 study profiled peripheral blood from 62 SARS-CoV-2 (COVID-19) patients and 24 healthy controls, for a total of 86 samples.
The dataset captures host transcriptomic (gene expression) responses to SARS-CoV-2 infection, enabling analysis of differentially expressed genes (DEGs) in COVID-19 vs healthy individuals.
This resource can be used to: identify DEGs, perform immune-cell deconvolution / infiltration analysis, compare COVID-19 transcriptomic signatures with other viral infections, perform downstream pathway analysis, co-expression analysis, or machine learning / biomarker discovery.
Because the original study also compared COVID-19 responses to other viral infections (six viruses: influenza, RSV, HRV, Ebola, Dengue, SARS), the dataset is useful for comparative transcriptomic studies of immune response across infections, though here only the COVID-19 whole-blood data from GSE152641 are included.
The data are human (Homo sapiens) whole-blood bulk RNA-seq.
The underlying gene expression matrix is a count matrix (digital gene expression), suitable for downstream normalization, differential expression (edgeR, DESeq2, limma-voom, etc.), and other transcriptomics analyses.
This dataset enables reproducible computational analyses, for example detection of DEGs, immune cell composition estimation, pathway enrichment, and classifier / signature building for COVID-19.
As such, it can serve as a resource for researchers interested in COVID-19 immunology, biomarker discovery, host response profiling, comparative viral transcriptomics, or meta-analysis with other publicly available datasets.
All required data files (metadata, counts or processed tables as uploaded) are made available to facilitate reanalysis and transparent computational workflows.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EdgeR results from unique counts. Differential expression results calculated by edgeR for gene counts produced by the stage 1 analysis. Can be downloaded from [43]. (XLSX 2159 kb)
Figure S1, Venn diagram showing the number of differentially expressed genes identified by two versions of Cuffdiff2.
Figure S2, The effects of biological replicates on the differential expression analysis for Cuffdiff v2.0.2.
Figure S3, The detected fold changes of all the differentially expressed genes identified by three tools were compared and shown, including DESeq vs. edgeR (top panel), DESeq vs. Cuffdiff2 (middle panel) and edgeR vs. Cuffdiff2 (bottom panel).
File S1, Analysis pipelines, methods and examples of commands for differential expression analysis, subsampling fastq files and generating SAM/BAM files based on simulated count values.
File S2, The raw count values for genes with high fold changes that were picked up by edgeR but not by DESeq. Genes with high fold changes (the absolute value of log2 fold changes larger than 2) identified as DEGs by edgeR but not by DESeq are listed in the file. The gene ID, the log2 fold changes (logFC) and FDR from DESeq, the logFC and FDR from edgeR, and the raw count values for the four replicates of sample K (K1–K4) and sample N (N1–N4) are shown in each of the columns.
Table S1, Numbers of reads for the human hbr and uhr samples from the MAQC dataset.
Table S2, Numbers of reads for the mouse neurosphere samples for treatment groups of K and N (the K_N dataset).
Table S3, The number of reads for each individual sample of the LCL3 dataset.
Table S4, The definition for TP, FP, TN, FN, TPR and FPR.
Table S5, The false positive rate for Cuffdiff2, DESeq and edgeR based on the LCL1 dataset.
(ZIP)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data set 1. Transcript expression across human RNA-Seq samples: estimated read counts. The file contains estimated read counts, generated by kallisto (https://pachterlab.github.io/kallisto/), for human transcripts and RNA-Seq samples used in this study (see Additional file 2 of the accompanying publication). The format is a compressed (GZIP) tab-separated transcript-by-sample matrix. Ensembl transcript identifiers and a combined Sequence Read Archive study/sample name identifier serve as row and column names, respectively.
Data set 2. Transcript expression across murine RNA-Seq samples: estimated read counts. As in Data set 1, but for mouse transcripts.
Data set 3. Transcript expression across simian RNA-Seq samples: estimated read counts. As in Data set 1, but for chimpanzee transcripts.
Data set 4. Transcript expression across human RNA-Seq samples: estimated transcript abundances. As in Data set 1, but instead of read counts, transcript abundances in transcripts per million (TPM), as estimated by kallisto (https://pachterlab.github.io/kallisto/), are listed. Format, column and row names as in Data set 1.
Data set 5. Transcript expression across murine RNA-Seq samples: estimated transcript abundances. As in Data set 4, but for mouse transcripts.
Data set 6. Transcript expression across simian RNA-Seq samples: estimated transcript abundances. As in Data set 4, but for chimpanzee transcripts.
Data set 7. Differential expression analyses across human RNA-Seq sample groups: log fold changes. The file contains log fold changes, inferred by edgeR (http://bioconductor.org/packages/release/bioc/html/edgeR.html), for human genes and the RNA-Seq sample group contrasts listed in Additional file 3 of the accompanying publication in a compressed (GZIP) TSV gene-by-comparison matrix. Ensembl gene identifiers and a descriptive contrast identifier serve as row and column names, respectively.
Data set 8. Differential expression analyses across murine RNA-Seq sample groups: log fold changes. As in Data set 7, but for mouse genes.
Data set 9. Differential expression analyses across simian RNA-Seq sample groups: log fold changes. As in Data set 7, but for chimpanzee genes.
Data set 10. Differential expression analyses across human RNA-Seq sample groups: false discovery rates. The file contains false discovery rates (FDR) for the differential expression analyses summarized in Data set 7. Format, column and row names as in Data set 7.
Data set 11. Differential expression analyses across murine RNA-Seq sample groups: false discovery rates. As in Data set 10, but for mouse genes.
Data set 12. Differential expression analyses across simian RNA-Seq sample groups: false discovery rates. As in Data set 10, but for chimpanzee genes.
Data set 13. Quantification of alternative splicing events across human RNA-Seq samples. The file contains ‘percent spliced in’ (PSI) values computed by SUPPA (https://github.com/comprna/SUPPA) for annotated alternative splicing events (inferred from the transcript annotation of the human genome, Ensembl release 84; http://www.ensembl.org/). The format is a compressed (GZIP) tab-separated transcript-by-sample matrix. SUPPA-provided event identifiers and a combined Sequence Read Archive study/sample name identifier serve as row and column names, respectively.
Data set 14. Quantification of alternative splicing events across murine RNA-Seq samples. As in Data set 13, but for mouse alternative splicing events.
Data set 15. Differential splicing analyses across human RNA-Seq sample groups: differences in ‘percent spliced in’ (ΔPSI). The file contains ΔPSI values for human alternative splicing events (as in Data set 13). The RNA-Seq sample group contrasts are listed in Additional file 3 of the accompanying publication. Values were inferred by SUPPA’s diffSplice functionality (https://github.com/comprna/SUPPA). The format is a compressed (GZIP) tab-separated gene-by-comparison matrix. SUPPA event identifiers and a descriptive contrast identifier serve as row and column names, respectively.
Data set 16. Differential splicing analyses across murine RNA-Seq sample groups: differences in ‘percent spliced in’ (ΔPSI). As in Data set 15, but for mouse alternative splicing events.
Data set 17. Differential splicing analyses across human RNA-Seq sample groups: P values. The file contains P values for the differential splicing analysis of human alternative splicing events summarized in Data set 15. Format, column and row names as in Data set 15.
Data set 18. Differential splicing analyses across murine RNA-Seq sample groups: P values. The file contains P values for the differential splicing analysis of mouse alternative splicing events summarized in Data set 16. Format, column and row names as in Data set 15.
Data set 19. Transcript expression across murine RNA-Seq time course data: estimated read counts. As in Data set 2, but for the time course data generated for the accompanying publication.
Data set 20. Trans
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The study was approved by the Health and Wellness Research Ethics Committee (HW-REC), Cape Peninsula University of Technology (CPUT), and the Stellenbosch University (SU) Health Research Ethics Committee (reference: CPUT/HW-REC 2015/H15; CPUT/HW-REC 2017/H20). Student ethics approval was also granted by the CPUT REC (CPUT/HW-REC 2017/H20). Site approval was provided by the Chief Executive Office and Medical Services Manager/Research Coordinator to conduct research at TBH, Tygerberg, CPT, WC, SA in accordance with the Provincial Policy and TBH Notice No 40/2009 (reference: CPUT/HW-REC 2015/H15). The study was also registered with the WC Government National Health Research Database (reference: 2016RP18364).
The study involved (1) medical-file-based data used to (i) observe epidemiological alignment of inhalation injury with similar clinical settings and other LMICs (comparative tests between parameter subgroups), (ii) determine clinical markers for mortality and the significance of inhalation injury in relation to mortality (using Fisher's exact test, Spearman and/or Pearson's correlation coefficient and partial least squares regression) and (iii) determine clinical markers for inhalation injury (using Fisher's exact test, Spearman and/or Pearson's correlation coefficient and partial least squares regression). These analyses were performed on the data set named 'Demographic, injury, and clinical data of all samples' in Data set 1.
In addition, (2) human whole blood was used for RNA sequencing to determine predictive miRNAs for inhalation injury by using (i) the Illumina platform for RNA sequencing, (ii) sRNAbench and Bowtie for sequence alignment to the human genome, (iii) edgeR and DESeq2 pipelines for differential expression analysis, and (iv) Fisher's exact test for comparison of DE miRNAs between mild and severe inhalation injury. The data sets named 'Demographic, injury, clinical, and total RNA-related data of all exemplar samples' and 'Demographic, injury, clinical, and total RNA-related of all exemplar samples that passed QC for Sequencing' contain the samples used for RNA sequencing.
Finally, (3) DE miRNA meeting the threshold criteria, i.e., overlapped between edgeR and DESeq2, fold change ≤1.5 and Padj value
Background
Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis, or preprocessing, normalization and differential gene expression in the case of microarray analysis, in order to give a global insight into pipeline performances.
Methods
Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data.
Results
The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat’s overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had an overall mapping rate of only 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). An exception was the correlation results of Cufflinks, which were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results.
Conclusion
In conclusion, the combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and by Sailfish, generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tables “Genes” (list of genome fragments identified by TopHat2, their genomic location, RPKM values in six samples and annotation), “Cufflinks” (differentially expressed genes identified by Cufflinks), “EdgeR” (differentially expressed genes identified by edgeR), “UpRegDEGS_Cufflinks & EdgeR” (differentially expressed genes identified by both Cufflinks and edgeR, upregulated in the BLP line), and “DownRegDEGS_Cufflinks & EdgeR” (differentially expressed genes identified by both Cufflinks and edgeR, downregulated in the BLP line). (XLSX 2557 kb)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.
This dataset shows the results and raw data of the RNA-sequencing and transcriptomic analysis for 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as mentioned on the names of each file).
Due to size restrictions, this RNA-Seq dataset will be published on Zenodo in 3 parts. This third part contains the data for the non-exercised mice, including the fastq (R1 and R2) that were extracted from alignment index files (bam - see below), and the differentially expressed genes (tsv files). Fastq files were extracted by our team from the alignment indexes (bam) files, as follows:
1. Starting with the original file (Number.Aligned.sortedByCoord.out.bam), using samtools, we sorted by name:
samtools sort -n Number.Aligned.sortedByCoord.out.bam -o Number.Aligned.namesorted.bam
2. We extracted the paired reads into 2 separate files for R1 and R2, and any singleton or orphaned reads into additional RS and R0 files, respectively (many of the RS and R0 files were empty and not added here due to size constraints):
samtools fastq -1 Number_R1.fastq -2 Number_R2.fastq -0 Number_R0.fastq -s Number_RS.fastq Number.Aligned.namesorted.bam
3. We compressed all of the files into ‘.gz’ extension using gzip:
gzip -9 Number_R1.fastq
The .bam and RS/R0 files were not added due to size constraints but are available upon request.
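The three steps above can be sketched as a small Python helper that builds the command lines for one sample. This is only an illustration: the function name `bam_to_fastq_commands` and the `prefix` argument (standing in for "Number") are hypothetical, and passing the name-sorted BAM from step 1 as the input of step 2 is an assumption. The helper only builds the command strings; it does not run samtools or gzip.

```python
import shlex


def bam_to_fastq_commands(prefix: str) -> list[str]:
    """Build the shell commands for one sample (hypothetical helper).

    `prefix` stands in for "Number" in the file names of the steps above.
    """
    coord_sorted = f"{prefix}.Aligned.sortedByCoord.out.bam"
    name_sorted = f"{prefix}.Aligned.namesorted.bam"
    fastq = {tag: f"{prefix}_{tag}.fastq" for tag in ("R1", "R2", "R0", "RS")}

    # Step 1: sort the alignments by read name so that mates are adjacent.
    sort_cmd = ["samtools", "sort", "-n", coord_sorted, "-o", name_sorted]

    # Step 2: split reads into R1/R2, "other" (R0) and singleton (RS) files.
    # Assumption: the name-sorted BAM from step 1 is the input here.
    fastq_cmd = [
        "samtools", "fastq",
        "-1", fastq["R1"], "-2", fastq["R2"],
        "-0", fastq["R0"], "-s", fastq["RS"],
        name_sorted,
    ]

    # Step 3: compress the paired FASTQ files at maximum compression.
    gzip_cmds = [["gzip", "-9", fastq[tag]] for tag in ("R1", "R2")]

    return [shlex.join(cmd) for cmd in (sort_cmd, fastq_cmd, *gzip_cmds)]


if __name__ == "__main__":
    for line in bam_to_fastq_commands("Number"):
        print(line)
```

Building the commands as argument lists and joining them with `shlex.join` keeps file names with unusual characters safely quoted if the lines are later passed to a shell.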
Upstream workflow performed by TCAG (SickKids):
2. RNA-Seq Library and Reference Genome Information
Type of library: stranded, paired end
Genome reference sequence: GRCm39, M31 Gencode gene models.
3. Read Pre-processing, Alignment and Obtaining Gene Counts
3.1 Read Pre-processing
The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0. Trim Galore is running Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:
-q 25 – the reads are trimmed from the 3' end base by base, trimming stops if the quality of the base is greater than 25;
--clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;
--stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is required for trimming;
--length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;
--paired – only pairs of reads are retained (for paired-end reads only, not for single reads).
The type of adaptor is automatically detected by screening the first 1 million sequences of the first specified file for the first 12/13 nucleotides of the standard Illumina or Nextera primers and the sequence from the start of the primer to the 3' end of the read is trimmed.
The quality of the trimmed reads is re-assessed with FastQC.
The trimmed reads are also screened for presence of rRNA and mtRNA sequences using FastQ-Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).
To assess the read distribution and positional read duplication, and to confirm the strandedness of the alignments, we use the RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2. The distribution of reads across exonic, intronic and intergenic sequences is assessed by the read_distribution.py program, infer_experiment.py is used for confirming strandedness, and read_duplication.py is used to obtain the positional read duplication (percentage of reads mapping to exactly the same genomic location). A sufficient proportion of reads should map to the exonic sequences (ideally > 70-80%). A large number of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other issues with RNA preparation. For stranded RNA-seq experiments the majority of the reads should map exclusively to one strand, same or opposite to the transcript, depending on the library preparation method. For non-stranded experiments the reads should be equally distributed across both strands.
3.2. Read Alignment
The raw trimmed reads are aligned to the reference genome using the STAR aligner, v.2.6.0c. (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are contained in the .bam files. The “.bam” together with the “.bai” files can be used for viewing of the alignments in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).
3.3. Obtaining Gene Counts
The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). Reads are assigned to genes by htseq-count in the “intersection-nonempty” mode, i.e. if a read overlaps with two overlapping genes and the overlap to gene A is greater than the overlap to gene B, the read is counted towards gene A, while if a read overlaps equally with gene A and gene B, then it is not counted towards either gene. htseq-count does not count reads with multiple alignments, to avoid introducing bias in the expression results; only uniquely mapping reads are counted.
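The assignment rule described above can be illustrated with a minimal Python sketch. It follows the set-based description in the HTSeq documentation (for each aligned position, take the set of genes overlapping it, then intersect all non-empty sets); it is not the htseq-count implementation itself. The function name `assign_read` and the input representation (one set of gene names per aligned base) are assumptions made for illustration.

```python
def assign_read(position_features):
    """Assign one read to a gene, sketching the "intersection-nonempty" rule.

    position_features: iterable of sets, one per aligned base of the read,
    each holding the names of the genes that overlap that base.
    Returns a gene name, "__no_feature" or "__ambiguous".
    """
    # Ignore positions that overlap no annotated gene at all.
    nonempty = [set(s) for s in position_features if s]
    if not nonempty:
        return "__no_feature"

    # Keep only the genes that are present at every annotated position.
    common = set.intersection(*nonempty)
    if len(common) == 1:
        return next(iter(common))
    if not common:
        return "__no_feature"
    return "__ambiguous"


if __name__ == "__main__":
    # Read lies fully in gene A and only partly in gene B: counted for A.
    print(assign_read([{"A", "B"}, {"A", "B"}, {"A"}]))
    # Read overlaps A and B equally at every base: not counted for either.
    print(assign_read([{"A", "B"}, {"A", "B"}]))
```

In this sketch a read that overlaps gene A along its whole length but gene B only in part is counted for A, and a read covered equally by both genes is reported as ambiguous and therefore counted for neither, which matches the behaviour described in the text.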
4. Pre-processing, Alignment and Gene Counts QC
MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in MultiQC_Report_*.html file.
5. DGE Analysis with edgeR
Differential expression was done with the edgeR R package v.3.28.1, using R v.3.6.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). The data set was filtered to retain only genes whose gene counts were >50 in at least 3 samples. This is intended to remove genes that are not expressed, or expressed at a very low level.
The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.
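The expression filter described above (gene counts >50 in at least 3 samples) can be sketched in a few lines of Python. This is an illustration only, not the edgeR code used for the dataset: the function name `keep_expressed` and the dictionary layout (gene ID mapped to a list of per-sample raw counts) are assumptions, and TMM normalization is not reproduced here.

```python
def keep_expressed(counts, min_count=50, min_samples=3):
    """Keep genes whose raw count exceeds `min_count` in at least
    `min_samples` samples.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c > min_count for c in row) >= min_samples
    }


if __name__ == "__main__":
    # Hypothetical toy counts for four samples.
    toy = {
        "geneA": [120, 75, 51, 3],   # above 50 in three samples: kept
        "geneB": [50, 50, 50, 50],   # never strictly above 50: dropped
        "geneC": [500, 400, 2, 1],   # above 50 in only two samples: dropped
    }
    print(sorted(keep_expressed(toy)))
```

The comparison is strict (`>`), following the ">50" wording of the text.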
EdgeR Results Legend:
· GeneID – Ensembl Gene ID;
· Chr.Start.End - gene coordinates;
· GeneName, GeneType, etc. – Gene attributes, derived from the genome annotation;
· logFC - Log2 Fold Change (use this column for selection of DEGs);
· logCPM - Log2 Counts Per Million, average for all libraries;
· LR – Statistic calculated by the LR-Test;
· PValue - Differential expression P value;
· FDR – Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);
· (columns labeled with sample names) – Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
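For readers unfamiliar with the FDR column, the Benjamini-Hochberg adjustment named in the legend can be sketched as follows. This is a plain-Python illustration of the standard procedure, not the code path used inside edgeR; the function name `benjamini_hochberg` is chosen here for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        print(f"p={p:.3f}  FDR={q:.3f}")
```

Each adjusted value is the raw p-value scaled by n/rank, capped at 1 and made monotone so that a gene with a smaller p-value never receives a larger FDR than a gene with a larger p-value.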
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Conventional (bulk) RNA-sequencing was performed on unfractionated cell suspension or snap frozen whole tissue material. Total RNA was isolated with TRIzol reagent followed by purification over PureLink RNA Mini Kit columns (Invitrogen). RNA-seq was performed using a polyA-enriched strand-specific library construction protocol (doi: 10.1016/j.ccell.2016.02.009) and paired-end 75bp sequencing on an Illumina HiSeq 2500 instrument.
Raw reads were aligned to the reference human genome assembly GRCh37 (hg19) using STAR (v2.5.2.a). To improve spliced alignment, STAR was provided with exon junction coordinates from the reference annotations (Gencode v19). We applied a modified version of a bioinformatics workflow for normalization of raw read counts and differential gene expression analysis (doi: 10.12688/f1000research.9005.3). Gene-level read counts were quantified using HTSEQ-count (v0.11.0; intersection-strict, reverse mode) (doi: 10.1093/bioinformatics/btu638). Genes showing low read counts (i.e., genes not showing counts per million (cpm) > 1.0 in at least 10% of samples) were removed from further analysis. Raw counts from expressed genes were then TMM-normalized and scaled to counts per million (CPM) using the edgeR (v3.22.2) package (doi: 10.1093/bioinformatics/btp616).
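The low-count filter described above (cpm > 1.0 in at least 10% of samples) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the workflow's own code: the function names `cpm_table` and `filter_low_counts` and the dictionary layout are hypothetical, and library sizes are taken as plain column sums without the TMM scaling that edgeR applies.

```python
def cpm_table(counts):
    """Convert raw counts to counts per million using column sums.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values()))) if counts else 0
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            row[j] * 1e6 / lib_sizes[j] if lib_sizes[j] else 0.0
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


def filter_low_counts(counts, min_cpm=1.0, min_fraction=0.10):
    """Keep genes with CPM above `min_cpm` in at least `min_fraction`
    of the samples; returns the raw counts of the retained genes."""
    cpms = cpm_table(counts)
    kept = {}
    for gene, row in cpms.items():
        hits = sum(value > min_cpm for value in row)
        if row and hits >= min_fraction * len(row):
            kept[gene] = counts[gene]
    return kept


if __name__ == "__main__":
    # Hypothetical toy counts for two samples with 1,000,000 reads each.
    toy = {
        "geneA": [10, 0],
        "geneB": [999_989, 999_999],
        "geneC": [1, 1],
    }
    print(sorted(filter_low_counts(toy)))
```

The filter works on CPM rather than raw counts so that the threshold scales with sequencing depth; the retained genes are returned as raw counts because normalization is applied after filtering in the workflow described above.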
Sample IDs correspond to those referenced in Wang X et al, Nature Communications (2022).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each zipped folder contains results files from reanalysis of public data in our publication, "mirrorCheck: an R package facilitating informed use of DESeq2’s lfcShrink() function for differential gene expression analysis of clinical samples" (see also the Collection description). These files were produced by rendering the Quarto documents provided in the supplementary data with the publication (one per dataset). The Quarto code for the 3 main analyses (COVID, BRCA and Cell line datasets) performed differential gene expression (DGE) analysis using both DESeq2 with lfcShrink() via our R package mirrorCheck, and edgeR. Each zipped folder here contains 2 folders, one for each DGE analysis. Since DESeq2 was run on data without prior data cleaning, with prefiltering, or after Surrogate Variable Analysis, the 'mirrorCheck output' folders themselves contain 3 sub-folders titled 'DESeq_noclean', 'DESeq_prefilt' and 'DESeq_sva'. The COVID dataset also has a folder with results from Gene Set Enrichment Analysis. Finally, the fourth folder contains results from a tutorial/vignette-style supplementary file using the Bioconductor "parathyroidSE" dataset. This analysis only utilised DESeq2, with both data cleaning methods and testing two different design formulae, resulting in 5 sub-folders in the zipped folder.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Tanycytes around the third ventricle and the adjacent mediobasal hypothalamus are crucial components for triggering photoperiodic responses in breeding and metabolism. In mammals, tanycytes are known to regulate hypothalamic thyroid hormone conversion, a process which is linked to seasonal reproduction. They are further involved in retinoic acid signalling, neurogenesis, and nutritional gatekeeping, all of which have been linked to the photoperiodic regulation of metabolism. The region is neuroanatomically conserved between mammals and birds but, apart from the hypothalamic thyroid hormone conversion, little is known about the functional roles of tanycytes in birds. We therefore aimed to give a comprehensive characterisation of gene expression in avian tanycytes and surrounding cells under different photoperiodic reproductive and metabolic states. For this purpose, we used the Svalbard ptarmigan (Lagopus muta hyperborea), a high-Arctic bird species which shows pronounced seasonal rhythms in breeding and body mass. We applied a simple photoperiodic extension protocol to short-day adapted birds to trigger a long-day response, which is marked by initiation of breeding and loss in body mass. After several weeks under a long photoperiod, the innate development of photorefractoriness led to a reversal to the short-day phenotype, marked by the termination of breeding and gain in body mass. We sampled birds at different seasonal states and used laser capture and RNA-seq to correlate the seasonal phenotype with gene expression in tanycytes and the surrounding area. This dataset includes behavioural data, gene expression data (raw and cpm) and results from statistical analyses and GO enrichment analyses. The edgeR RNA-seq script is also attached.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Output data from RNA-Seq analyses of various datasets used in the study "SMG1:SMG8:SMG9-complex integrity maintains robustness of nonsense-mediated mRNA decay".
Data is organized by zip folders for each individual RNA-Seq dataset, please see SMG189_datasets.csv or SMG189_samples.csv for dataset/sample metadata. Each zip folder contains the output of Salmon transcript quantification, DESeq2 DGE and edgeR DTE analyses, Log files, quality control analyses and metadata files (design.txt and experiment.txt).
Combined with the scripts and additional information found at https://github.com/boehmv/2024_SMG189, these data should allow the analysis and plots in the manuscript to be reproduced.
Additional helper files (e.g. annotation) used in the analyses and too big for GitHub are provided here as well. E.g. 2024-10-28_SMG189_datasources.rds contains the DESeq2 and edgeR output ready for import via R.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Transcriptomes were assembled de novo from pools of adult aphids that were feeding on sorghum and switchgrass. Reads from all replicates were pooled, normalized in silico to 25X coverage, and assembled using Trinity. Only the most abundant isoform for each unigene was retained for annotation, and unigenes with transcripts per million mapped reads (TPM) less than 0.5 were removed from the dataset. The remaining unigenes were annotated using Trinotate with BLASTP comparisons against the Swiss-Prot/UniProt database. In addition, Pfam-A assignments were computed using HMMER, signal peptide predictions were performed using SignalP, and transmembrane domain predictions were performed using TMHMM. Gene ontology (GO) assignments were retrieved from Trinotate using the highest scoring BLASTP matches as queries. [Note: Supplemental files 1-6 added 2/5/2019]
Resources in this dataset:
Resource Title: Trinotate annotations for unigenes assembled from Schizaphis graminum (greenbug). File Name: GB_annotation_trinotatefinal.xls Resource Description: Note: Data file is large and may take time to load. Resource Software Recommended: Excel, url: https://products.office.com/en-us/excel
Resource Title: Trinotate annotations of unigenes assembled from Sipha flava (yellow sugarcane aphid). File Name: YSA_annotation_trinotatefinal.xls Resource Description: Note: Data file is large and may take time to load. Resource Software Recommended: Excel, url: https://products.office.com/en-us/excel
Resource Title: Supplemental Data 1: Differential expression analysis of starvation and BCK60 sorghum treatments in S. graminum at 12 and 24 hours. File Name: Supplemental Data 1.xlsx Resource Description: Reads were mapped back to the transcriptome assembly using bowtie2 and RSEM and differential expression analysis was performed using edgeR as described in the Materials and Methods. Unigenes with log fold-change values > 0 are upregulated in the BCK60 treatment relative to the starvation treatment while unigenes with log fold-change values < 0 are downregulated in this comparison. Log fold-change and FDR corrected-p-value thresholds were set at 0.25 and 0.05, respectively. Tab labeled 12 represents 12 hr post infestation and tab labeled 24 represents 24 hr post infestation. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
Resource Title: Supplemental Data 2. Differential expression analysis of starvation and BCK60 sorghum treatments in S. flava at 12 and 24 hours. File Name: Supplemental Data 2.xlsx Resource Description: Reads were mapped back to the transcriptome assembly using bowtie2 and RSEM and differential expression analysis was performed using edgeR as described in the Materials and Methods. Unigenes with log fold-change values > 0 are upregulated in the BCK60 treatment relative to the starvation treatment while unigenes with log fold-change values < 0 are downregulated in this comparison. Log fold-change and FDR corrected-p-value thresholds were set at 0.25 and 0.05, respectively. Tab labeled 12 represents 12 hr post infestation and tab labeled 24 represents 24 hr post infestation. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
Resource Title: Supplemental Data 3. Differential expression analysis of Summer switchgrass and BCK60 sorghum treatments in S. graminum at 12 and 24 hours. File Name: Supplemental Data 3.xlsx Resource Description: Reads were mapped back to the transcriptome assembly using bowtie2 and RSEM and differential expression analysis was performed using edgeR as described in the Materials and Methods. Unigenes with log fold-change values > 0 are upregulated in the Summer treatment relative to the BCK60 treatment while unigenes with log fold-change values < 0 are downregulated in this comparison. Log fold-change and FDR corrected-p-value thresholds were set at 0.25 and 0.05, respectively. Tab labeled 12 represents 12 hr post infestation and tab labeled 24 represents 24 hr post infestation. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
Resource Title: Supplemental Data 4. Differential expression analysis of Summer switchgrass and BCK60 sorghum treatments in S. flava at 12 and 24 hours. File Name: Supplemental Data 4.xlsx Resource Description: Reads were mapped back to the transcriptome assembly using bowtie2 and RSEM and differential expression analysis was performed using edgeR as described in the Materials and Methods. Unigenes with log fold-change values > 0 are upregulated in the Summer treatment relative to the BCK60 treatment while unigenes with log fold-change values < 0 are downregulated in this comparison. Log fold-change and FDR corrected-p-value thresholds were set at 0.25 and 0.05, respectively. Tab labeled 12 represents 12 hr post infestation and tab labeled 24 represents 24 hr post infestation. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
Resource Title: Supplemental Data 5. Differential expression analysis of Kanlow switchgrass and BCK60 sorghum treatments in S. graminum at 12 and 24 hours. File Name: Supplemental Data 5.xlsx Resource Description: Reads were mapped back to the transcriptome assembly using bowtie2 and RSEM and differential expression analysis was performed using edgeR as described in the Materials and Methods. Unigenes with log fold-change values > 0 are upregulated in the Kanlow treatment relative to the BCK60 treatment while unigenes with log fold-change values < 0 are downregulated in this comparison. Log fold-change and FDR corrected-p-value thresholds were set at 0.25 and 0.05, respectively. Tab labeled 12 represents 12 hr post infestation and tab labeled 24 represents 24 hr post infestation. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
Resource Title: Supplemental Data 6. Differential expression analysis of Kanlow switchgrass and BCK60 sorghum treatments in S. flava at 12 and 24 hours. File Name: Supplemental Data 6.xlsx Resource Description: Reads were mapped back to the transcriptome assembly using bowtie2 and RSEM and differential expression analysis was performed using edgeR as described in the Materials and Methods. Unigenes with log fold-change values > 0 are upregulated in the Kanlow treatment relative to the BCK60 treatment while unigenes with log fold-change values < 0 are downregulated in this comparison. Log fold-change and FDR corrected-p-value thresholds were set at 0.25 and 0.05, respectively. Tab labeled 12 represents 12 hr post infestation and tab labeled 24 represents 24 hr post infestation. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
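The sign convention and thresholds described for the Supplemental Data files can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the thresholds are applied as |log fold-change| >= 0.25 and FDR <= 0.05 (the description does not say whether the log fold-change cutoff is absolute or whether the bounds are inclusive), and the record type, field names, and unigene identifiers are hypothetical.

```python
from dataclasses import dataclass

# Thresholds quoted in the dataset description.
LOGFC_THRESHOLD = 0.25
FDR_THRESHOLD = 0.05


@dataclass
class UnigeneResult:
    """One row of a differential expression table (hypothetical layout)."""
    unigene_id: str
    log_fc: float  # log fold-change of the treatment relative to the reference
    fdr: float     # FDR-corrected p-value


def classify(result: UnigeneResult,
             logfc_threshold: float = LOGFC_THRESHOLD,
             fdr_threshold: float = FDR_THRESHOLD) -> str:
    """Label a unigene as 'up', 'down', or 'not significant'.

    Assumption: the log fold-change cutoff applies to the absolute value
    and both bounds are inclusive.
    """
    if result.fdr > fdr_threshold or abs(result.log_fc) < logfc_threshold:
        return "not significant"
    # Positive values mean higher expression in the treatment named first
    # in the comparison (for example, BCK60 relative to starvation).
    return "up" if result.log_fc > 0 else "down"


if __name__ == "__main__":
    rows = [
        UnigeneResult("unigene_0001", 1.20, 0.010),
        UnigeneResult("unigene_0002", -0.80, 0.030),
        UnigeneResult("unigene_0003", 0.10, 0.001),
        UnigeneResult("unigene_0004", 2.00, 0.200),
    ]
    for row in rows:
        print(row.unigene_id, classify(row))
```

Each tab (12 hr and 24 hr post infestation) would be classified separately, since the description reports the two time points as independent comparisons.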
Acidiphilium sp. C61 cultures were cultivated in APPW+YE+Glucose medium with 0 µM or 10 µM PEA. RNA was extracted and libraries were prepared using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina. Data were demultiplexed by the GATC sequencing company and adaptors were trimmed with Trim Galore. After trimming, reads were quality-controlled with sickle and mRNA reads were sorted with SortMeRNA. mRNA transcripts were mapped to the assembled genome of Acidiphilium sp. C61 and a read count table was produced with featureCounts. Differential gene expression analysis was done with the edgeR package.
This dataset comprises mRNA that was extracted from Laternula elliptica developmental stages (blastula to juvenile) and sequenced (n=3 pools of 200 individuals per stage). The resulting sequence data were analysed and the following results files and analysis scripts are available here: results files from differential gene expression analysis in edgeR (directory = edgeR_DE) and results files from WGCNA analysis (directory = WGCNA). Data collection was carried out at Hangar Cove, Rothera Point, Adelaide Island, in Ryder Bay, from 2018-04-25 to 2018-09-25 by researchers with the British Antarctic Survey. The data were collected as part of research on the developmental biology of molluscs. This work was supported by UKRI Natural Environment Research Council (NERC) Core Funding to the British Antarctic Survey, a DTG Studentship (Project Reference: NE/J500173/1) and a Junior Research Fellowship to VAS from Wolfson College, University of Cambridge.
See the Materials & Methods section "Differential expression analysis of RNAseq data" of DOI: 10.7554/eLife.61630
We used the R package edgeR (Robinson, McCarthy, & Smyth, 2010) to compare RNA sequencing profiles between FLCNPOS and FLCNNEG replicates, as well as between TP53POS and TP53NEG replicates. This involved reading in the gene-level counts, computing library-size normalization factors using the trimmed mean of M-values (TMM) method, and then fitting a model to estimate the group effect. The resulting p-values were corrected for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) step-up procedure (Benjamini & Hochberg, 1995).
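The Benjamini-Hochberg step-up correction mentioned above can be sketched in a few lines of Python. This is a generic illustration of the procedure, not the code used in the study (which relied on edgeR in R); the function name and the example p-values are made up for demonstration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For m tests, the p-value with ascending rank k is scaled to p * m / k,
    and a running minimum taken from the largest p-value downwards keeps
    the adjusted values monotone and bounded by 1.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up: walk from the largest p-value (rank m) to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p={p:.3f}  adjusted={q:.3f}")
```

A gene would then be called significant at a chosen FDR level, for example 0.05, if its adjusted value does not exceed that level.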
https://spdx.org/licenses/CC0-1.0.html
We have developed GmPcides from a peptidomimetic dihydrothiazolo ring-fused 2-pyridone scaffold that have antimicrobial activity against a broad spectrum of Gram-positive pathogens. Here we examine the treatment efficacy of GmPcides using Streptococcus pyogenes skin and soft tissue infection (SSTI) and biofilm formation models. Screening our compound library for minimal inhibitory (MIC) and minimal bactericidal (MBC) concentrations identified GmPcide PS757 as highly active against S. pyogenes. Treatment of S. pyogenes biofilm with PS757 revealed robust efficacy against all phases of biofilm formation by preventing initial biofilm development, ceasing biofilm maturation and eradicating mature biofilm. In a murine model of S. pyogenes SSTI, subcutaneous delivery of PS757 resulted in reduced levels of tissue damage, decreased bacterial burdens and accelerated rates of wound-healing, which were associated with down-regulation of key virulence factors, including M protein and the SpeB cysteine protease. These data demonstrate that GmPcides show considerable promise for treating S. pyogenes infections.
Methods
RNA Sequencing. Microplate (96-well) culture in C medium was conducted as described above with the addition of 0.4 µM PS757 or vehicle (DMSO). At 24 hrs, multiple wells were harvested and pooled for further processing, with the experiment repeated in triplicate. RNA was extracted with the Direct-zol RNA Miniprep Plus Kit (Zymo Research, R2072), and the quality of the purified RNA was determined by spectroscopy (NanoDrop 2000, Thermo Fisher). Libraries for Illumina sequencing were prepared using the FastSelect RNA kit (Qiagen, 334222) according to the manufacturer's protocol, and sequences were determined using an Illumina NovaSeq 6000. Basecalls and demultiplexing were performed with Illumina's bcl2fastq software and a custom python demultiplexing program with a maximum of one mismatch in the indexing read.
RNA-seq reads were then aligned to the Ensembl release 101 primary assembly with STAR version 2.7.9a (1). Gene counts were derived from the number of uniquely aligned unambiguous reads by Subread:featureCounts version 2.0.3 (2). Isoform expression of known Ensembl transcripts was quantified with Salmon version 1.5.2 (3). Sequencing performance was assessed for the total number of aligned reads, total number of uniquely aligned reads, and features detected. The ribosomal fraction, known junction saturation, and read distribution over known gene models were quantified with RSeQC version 4.0 (4).
Comparative Transcriptomic Analysis. All gene counts obtained from RNA-seq were then imported into the R/Bioconductor package edgeR (5) and TMM normalization size factors were calculated to adjust for differences in library size. Ribosomal genes and genes not expressed at greater than one count-per-million in at least the smallest group size minus one samples were excluded from further analysis. The TMM size factors and the matrix of counts were then imported into the R/Bioconductor package limma (6). Weighted likelihoods based on the observed mean-variance relationship of every gene and sample were calculated for all samples and the count matrix was transformed to moderated log2-counts-per-million with limma's voomWithQualityWeights (7). The performance of all genes was assessed with plots of the residual standard deviation of every gene against their average log-count, with a robustly fitted trend line of the residuals. Differential expression analysis was then performed to analyze for differences between conditions, with results filtered for only those genes with Benjamini-Hochberg false-discovery rate adjusted p-values less than or equal to 0.05. A principal component analysis (PCA) was performed on differential expression data to distinguish differences between conditions (8).
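The expression filter described above (keep a gene only if it exceeds one count-per-million in at least the smallest group size minus one samples) can be illustrated with a small Python sketch. It is an approximation for illustration, not the pipeline's code: counts-per-million are computed here from raw library sizes without TMM scaling, the guard that requires at least one sample is an added assumption, and all function names and numbers are hypothetical.

```python
from typing import List, Sequence


def counts_per_million(counts: Sequence[int],
                       library_sizes: Sequence[int]) -> List[float]:
    """Scale one gene's raw counts by each sample's total read count."""
    return [c / n * 1_000_000 for c, n in zip(counts, library_sizes)]


def keep_gene(counts: Sequence[int],
              library_sizes: Sequence[int],
              group_sizes: Sequence[int],
              min_cpm: float = 1.0) -> bool:
    """Apply the 'smallest group size minus one' expression filter.

    A gene is kept when its CPM is strictly greater than `min_cpm` in at
    least (smallest group size - 1) samples. The max(..., 1) guard is an
    assumption so that groups of size one still require one sample.
    """
    required = max(min(group_sizes) - 1, 1)
    cpm_values = counts_per_million(counts, library_sizes)
    expressed = sum(1 for value in cpm_values if value > min_cpm)
    return expressed >= required


if __name__ == "__main__":
    # Two conditions in triplicate, so a gene needs CPM > 1 in >= 2 samples.
    libs = [2_000_000] * 6
    groups = [3, 3]
    print(keep_gene([5, 0, 3, 0, 0, 0], libs, groups))  # two samples pass
    print(keep_gene([5, 0, 0, 0, 0, 0], libs, groups))  # only one sample passes
```

With a triplicate design such as the one described for this experiment, the filter reduces to requiring expression above the cutoff in at least two samples.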
To find the significantly regulated genes, the limma voomWithQualityWeights transformed log2-counts-per-million expression data were then analyzed via weighted gene correlation network analysis with the R/Bioconductor package WGCNA (9). Briefly, all genes were correlated with each other by Pearson correlation and clustered by expression similarity into unsigned modules using a power threshold empirically determined from the data. An eigengene was then created for each de novo cluster and its expression profile was then correlated across all coefficients of the model matrix. Because these clusters of genes were created by expression profile rather than known functional similarity, the clustered modules were given the names of random colors, where grey is the only module that has a pre-existing definition: it contains genes that do not cluster well with others. The information for all clustered genes for each module was then combined with their respective statistical significance results from limma to determine whether or not those features were also found to be significantly differentially expressed.
References
1. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
2. Y. Liao, G. K. Smyth, W. Shi, featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 (2014).
3. R. Patro, G. Duggal, M. I. Love, R. A. Irizarry, C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14, 417-419 (2017).
4. L. Wang, S. Wang, W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184-2185 (2012).
5. M. D. Robinson, D. J. McCarthy, G. K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140 (2010).
6. M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, G. K. Smyth, limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015).
7. R. Liu, A. Z. Holik, S. Su, N. Jansz, K. Chen, H. S. Leong, M. E. Blewitt, M. L. Asselin-Labat, G. K. Smyth, M. E. Ritchie, Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res 43, e97 (2015).
8. Z. Zou, R. F. Potter, W. H. t. McCoy, J. A. Wildenthal, G. L. Katumba, P. J. Mucha, G. Dantas, J. P. Henderson, E. coli catheter-associated urinary tract infections are associated with distinctive virulence and biofilm gene determinants. JCI Insight 8, (2023).
9. P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Transcriptome Shotgun Sequencing (RNA-seq) has been readily embraced by geneticists and molecular ecologists alike. As with all high-throughput technologies, it is critical to understand which analytic strategies are best suited and which parameters may bias the interpretation of the data. Here we use a comprehensive simulation approach to explore how various features of the transcriptome (complexity, degree of polymorphism π, alternative splicing), technological processing (sequencing error ε, library normalization) and bioinformatic workflow (de novo vs. mapping assembly, reference genome quality) impact transcriptome quality and inference of differential gene expression (DE). We find that transcriptome assembly and gene expression profiling (edgeR vs. baySeq software) work well even in the absence of a reference genome, and are robust across a broad range of parameters. We advise against library normalization, and in most situations advocate mapping assemblies to an annotated genome ...