Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides data files and R code to accompany the article "Differential methylation analysis of reduced representation bisulfite sequencing experiments using edgeR", published by F1000Research.
The data consist of reduced representation BS-seq methylation profiles of epithelial populations from the mouse mammary gland, with n=2 biological replicates for each of three cell populations.
RNA-seq expression profiles of luminal and basal mammary epithelial populations are also provided.
The R code undertakes a differential methylation analysis of the BS-seq profiles and demonstrates a strong negative correlation between the differential methylation and differential expression results.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Differentially expressed genes in the synovial tissue from FVIII-KO mice after hemarthrosis and FVIII treatment. Synovial tissue was harvested at baseline, day 3 and day 14 post-injury. RNA was purified and analyzed by RNA sequencing using an Illumina NextSeq500 platform (75 bp; single-end). The R/Bioconductor packages tximport, edgeR, and limma were used: estimated counts from RSEM were imported, trimmed mean of M-values (TMM) normalization was applied, and the limma-voom method was used for differential expression analysis (criteria: adjusted p-value
Figure S1, Venn diagram showing the number of differentially expressed genes identified by two versions of Cuffdiff2.
Figure S2, The effects of biological replicates on the differential expression analysis for Cuffdiff v2.0.2.
Figure S3, Comparison of the detected fold changes of all the differentially expressed genes identified by the three tools: DESeq vs. edgeR (top panel), DESeq vs. Cuffdiff2 (middle panel) and edgeR vs. Cuffdiff2 (bottom panel).
File S1, Analysis pipelines, methods and examples of commands for differential expression analysis, subsampling fastq files and generating SAM/BAM files based on simulated count values.
File S2, Raw count values for genes with high fold changes that were picked up by edgeR but not by DESeq. Genes with high fold changes (absolute log2 fold change larger than 2) identified as DEGs by edgeR but not by DESeq are listed in the file. The columns show the gene ID, the log2 fold change (logFC) and FDR from DESeq, the logFC and FDR from edgeR, and the raw count values for the four replicates of sample K (K1–K4) and sample N (N1–N4).
Table S1, Numbers of reads for the human hbr and uhr samples from the MAQC dataset.
Table S2, Numbers of reads for the mouse neurosphere samples for treatment groups K and N (the K_N dataset).
Table S3, The number of reads for each individual sample of the LCL3 dataset.
Table S4, The definitions of TP, FP, TN, FN, TPR and FPR.
Table S5, The false positive rate for Cuffdiff2, DESeq and edgeR based on the LCL1 dataset.
(ZIP)
Triops newberryi transcriptome annotations
An Excel file containing all of the functional characterizations from the different annotation databases, including Swiss-Prot, Pfam-A, TIGRFAM, and SUPERFAMILY.
File: Horn_etal_Tnewberryi Transcriptome Annotations.xlsx
edgeR R-script
An R file including the code used to run the differential expression analysis in the package edgeR.
File: edgR_RScript.R
Triops newberryi transcriptome read counts
An Excel file containing the raw read counts for the contigs identified by the annotations as coding sequences. These read counts are the input for the edgeR package to perform differential gene expression analysis.
File: Tnewberry Transcriptome Cds Read Counts.xlsx
Triops newberryi CateGOrizer results
An Excel file containing the results from the program CateGOrizer. The file indicates the GO class ID, the term, and how many contigs were classified under a specific GO term. This raw data was used to make Figure 1 in the Horn et al. manuscript.
File: Triops newberryi CateGOrizer Results.xlsx
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Becker muscular dystrophy (BMD) is a rare X-linked recessive neuromuscular disorder, frequently caused by in-frame deletions in the DMD gene that result in the production of a truncated, yet functional, dystrophin protein. The consequences of BMD-causing in-frame deletions on the organism are difficult to predict, especially in regard to long-term prognosis. Here, we used CRISPR-Cas9 to generate a new Dmd Δ52-55 mouse model by deleting exons 52-55 in the Dmd gene, resulting in a BMD-like in-frame deletion. To delineate the long-term effects of this deletion, we studied these mice over 52 weeks by performing histology and echocardiography analyses and assessing motor functions. To further delineate the effects of the exons 52-55 in-frame deletion, we performed RNA-Seq pre- and post-exercise and identified several differentially expressed pathways that could explain the abnormal muscle phenotype observed at 52 weeks in the BMD model.
This dataset contains the results and raw data of the RNA-sequencing and transcriptomic analysis for 52-week-old exercised and non-exercised mice (4 BMD, 4 WT and 4 DMD, as indicated in the name of each file).
Due to size restrictions, this RNA-Seq dataset will be published on Zenodo in 3 parts. This third part contains the data for the non-exercised mice, including the fastq files (R1 and R2) that were extracted from the alignment (bam) files (see below), and the differentially expressed genes (tsv files). Fastq files were extracted by our team from the alignment (bam) files, as follows:
1. Starting with the original file (Number.Aligned.sortedByCoord.out.bam), using samtools, we sorted by name:
samtools sort -n Number.Aligned.sortedByCoord.out.bam -o Number.Aligned.namesorted.bam
2. We extracted the paired reads into 2 separate files for R1 and R2, and any singleton or orphaned reads into additional RS and R0 files, respectively (many of the RS and R0 files were empty and not added here due to size constraints):
samtools fastq -1 Number_R1.fastq -2 Number_R2.fastq -0 Number_R0.fastq -s Number_RS.fastq Number.Aligned.namesorted.bam
3. We compressed all of the files to '.gz' format using gzip:
gzip -9 Number_R1.fastq
The .bam and RS/R0 files were not added due to size constraints but are available upon request.
Upstream workflow performed by TCAG (SickKids):
2. RNA-Seq Library and Reference Genome Information
Type of library: stranded, paired end
Genome reference sequence: GRCm39, M31 Gencode gene models.
3. Read Pre-processing, Alignment and Obtaining Gene Counts
3.1 Read Pre-processing
The sequencing data is in FASTQ format. The quality of the data is assessed using FastQC v.0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Adaptors are trimmed using Trim Galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) v. 0.5.0, which runs Cutadapt (https://cutadapt.readthedocs.org/en/stable/) v. 1.10. Trim Galore is run with the following parameters:
-q 25 – the reads are trimmed from the 3' end base by base; trimming stops if the quality of the base is greater than 25;
--clip_R1 6, --clip_R2 6 – clip the first 6 nucleotides from the 5' ends of read 1 and read 2;
--stringency 5 – an overlap of at least 5 nucleotides with the Illumina primer sequence is needed for trimming;
--length 40 – any read that is shorter than 40 nucleotides as a result of trimming is discarded;
--paired – only pairs of reads are retained (for paired-end reads only, not for single reads).
The type of adaptor is automatically detected by screening the first 1 million sequences of the first specified file for the first 12/13 nucleotides of the standard Illumina or Nextera primers and the sequence from the start of the primer to the 3' end of the read is trimmed.
The quality of the trimmed reads is re-assessed with FastQC.
The trimmed reads are also screened for presence of rRNA and mtRNA sequences using FastQ-Screen v.0.10.0 (http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/).
To assess the read distribution and positional read duplication, and to confirm the strandedness of the alignments, we use the RSeQC package (http://rseqc.sourceforge.net/), v. 2.6.2. The distribution of reads across exonic, intronic and intergenic sequences is assessed by the read_distribution.py program, infer_experiment.py is used for confirming strandedness, and read_duplication.py is used to obtain the positional read duplication (percentage of reads mapping to exactly the same genomic location). A sufficient proportion of reads should map to the exonic sequences (ideally > 70-80%). A large number of reads mapping to intronic sequences in a poly-A mRNA library suggests a significant presence of pre-mRNA or other issues with RNA preparation. For stranded RNA-seq experiments the majority of the reads should map exclusively to one strand, same or opposite to the transcript, depending on the library preparation method. For non-stranded experiments the reads should be equally distributed across both strands.
3.2. Read Alignment
The trimmed reads are aligned to the reference genome using the STAR aligner, v.2.6.0c (https://github.com/alexdobin/STAR, https://academic.oup.com/bioinformatics/article/29/1/15/272537). The alignments are contained in the .bam files. The ".bam" files together with the ".bai" files can be used for viewing the alignments in the Integrative Genomics Viewer (IGV, http://software.broadinstitute.org/software/igv/).
3.3. Obtaining Gene Counts
The filtered STAR alignments are processed to extract raw read counts for genes using htseq-count v.0.6.1p2 (HTSeq, http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). Assigning reads to genes by htseq-count is done in the mode "intersection-nonempty", i.e. if a read overlaps with two overlapping genes and the overlap to gene A is greater than the overlap to gene B, the read is counted towards gene A, while if a read overlaps equally with gene A and gene B, then it is not counted towards either gene. To avoid introducing bias in the expression results, htseq-count does not count reads with multiple alignments; only uniquely mapping reads are counted.
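The assignment rule can be illustrated with a small sketch. It follows the positional-set formulation given in the HTSeq documentation (take the set of genes covering each base of the read, ignore bases covered by no gene, and intersect the remaining sets). The function name, the half-open interval representation and the toy gene models are assumptions made for illustration; this is not the htseq-count implementation and ignores strand, CIGAR operations and mate pairs.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def assign_read(read: Interval, genes: Dict[str, List[Interval]]) -> str:
    """Assign a read to a gene using an intersection-nonempty style rule.

    Returns the gene name, '__no_feature' or '__ambiguous'.
    """
    start, end = read
    common = None  # running intersection of the non-empty per-base gene sets
    for pos in range(start, end):
        here = {name for name, exons in genes.items()
                if any(s <= pos < e for s, e in exons)}
        if not here:
            continue  # bases covered by no gene are skipped
        common = here if common is None else common & here
    if not common:
        return "__no_feature"
    if len(common) == 1:
        return next(iter(common))
    return "__ambiguous"


# Hypothetical toy annotation: gene A and gene B overlap on [50, 100).
toy_genes = {"A": [(0, 100)], "B": [(50, 150)]}
print(assign_read((10, 60), toy_genes))   # partly A only, partly A and B
print(assign_read((60, 90), toy_genes))   # entirely inside the A/B overlap
```

In this toy case the first read is assigned to gene A, because some of its bases are covered by A alone, while the second read lies wholly in the shared region and is left unassigned as ambiguous.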
4. Pre-processing, Alignment and Gene Counts QC
MultiQC (https://multiqc.info/) is a reporting tool that aggregates statistics generated by bioinformatics analyses across multiple samples. MultiQC v. 1.14 was used to generate a consolidated report from FastQC screening of both untrimmed and trimmed reads, and from RSeQC, FastQ Screen, STAR and htseq-count results. The MultiQC report is contained in MultiQC_Report_*.html file.
5. DGE Analysis with edgeR
Differential expression analysis was performed with the edgeR R package v.3.28.1 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html), using R v.3.6.1. The data set was filtered to retain only genes whose counts were >50 in at least 3 samples. This is intended to remove genes that are not expressed, or are expressed at a very low level.
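The filtering rule (count >50 in at least 3 samples) can be sketched as follows. This is a plain-Python illustration, not the R code used for the analysis; the function name, the dictionary layout and the toy counts are assumptions.

```python
def keep_expressed(counts, min_count=50, min_samples=3):
    """Keep genes whose raw count exceeds min_count in at least min_samples samples.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    """
    return {gene: row for gene, row in counts.items()
            if sum(c > min_count for c in row) >= min_samples}


# Hypothetical toy counts for four samples.
toy_counts = {
    "gene1": [60, 70, 80, 0],   # above 50 in three samples: kept
    "gene2": [60, 70, 10, 0],   # above 50 in only two samples: dropped
    "gene3": [0, 0, 0, 0],      # not expressed: dropped
}
print(sorted(keep_expressed(toy_counts)))
```

Note that the threshold is strict (a count of exactly 50 does not pass), matching the ">50" wording above.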
The method used for normalizing the data was TMM, implemented by the calcNormFactors(y) function. All samples were normalized and filtered together. The glmLRT functionality in edgeR was used for the differential expression tests, with sample group taken into account.
edgeR Results Legend:
· GeneID - Ensembl Gene ID;
· Chr.Start.End - gene coordinates;
· GeneName, GeneType, etc. - Gene attributes, derived from the genome annotation;
· logFC - Log2 Fold Change (use this column for selection of DEGs);
· logCPM - Log2 Counts Per Million, average for all libraries;
· LR - Statistic calculated by the LR-Test;
· PValue - Differential expression P value;
· FDR - Differential expression False Discovery Rate, calculated by the Benjamini-Hochberg method (use this column for selection of DEGs);
· (columns labeled with sample names) - Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) for the given samples.
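For readers unfamiliar with the FPKM unit in the last column, the standard definition can be written as a one-line calculation. How the report derived gene lengths and library sizes is not stated here, so the function name and the example numbers below are assumptions.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments / (gene_length_bp / 1e3) / (total_mapped_fragments / 1e6)
    """
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 100 fragments, 2 kb long, in a library of 10 million fragments.
print(fpkm(100, 2000, 10_000_000))
```

Because FPKM normalises for both gene length and sequencing depth, it is useful for inspecting expression levels, whereas the logFC and FDR columns are the ones intended for selecting DEGs.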
https://spdx.org/licenses/CC0-1.0.html
We have developed GmPcides from a peptidomimetic dihydrothiazolo ring-fused 2-pyridone scaffold that have antimicrobial activity against a broad spectrum of Gram-positive pathogens. Here we examine the treatment efficacy of GmPcides using skin and soft tissue infection (SSTI) and biofilm formation models by Streptococcus pyogenes. Screening our compound library for minimal inhibitory (MIC) and minimal bactericidal (MBC) concentrations identified GmPcide PS757 as highly active against S. pyogenes. Treatment of S. pyogenes biofilm with PS757 revealed robust efficacy against all phases of biofilm formation by preventing initial biofilm development, ceasing biofilm maturation and eradicating mature biofilm. In a murine model of S. pyogenes SSTI, subcutaneous delivery of PS757 resulted in reduced levels of tissue damage, decreased bacterial burdens and accelerated rates of wound-healing, which were associated with down-regulation of key virulence factors, including M protein and the SpeB cysteine protease. These data demonstrate that GmPcides show considerable promise for treating S. pyogenes infections.
Methods
RNA Sequencing. Microplate (96-well) culture in C medium was conducted as described above with the addition of 0.4 µM PS757 or vehicle (DMSO). At 24 hrs, multiple wells were harvested and pooled for further processing, with the experiment repeated in triplicate. Extraction of RNA utilized the Direct-zol RNA Miniprep Plus Kit (Zymo Research, R2072), with the quality of the purified RNA determined by spectroscopy (NanoDrop 2000, Thermo Fisher). Libraries for Illumina sequencing were prepared using the FastSelect RNA kit (Qiagen, 334222) according to the manufacturer's protocol, and sequences were determined using an Illumina NovaSeq 6000. Basecalls and demultiplexing were performed with Illumina's bcl2fastq software and a custom Python demultiplexing program with a maximum of one mismatch in the indexing read.
RNA-seq reads were then aligned to the Ensembl release 101 primary assembly with STAR version 2.7.9a (1). Gene counts were derived from the number of uniquely aligned unambiguous reads by Subread featureCounts version 2.0.3 (2). Isoform expression of known Ensembl transcripts was quantified with Salmon version 1.5.2 (3). Sequencing performance was assessed for the total number of aligned reads, total number of uniquely aligned reads, and features detected. The ribosomal fraction, known junction saturation, and read distribution over known gene models were quantified with RSeQC version 4.0 (4).
Comparative Transcriptomic Analysis. All gene counts obtained from RNA-seq were then imported into the R/Bioconductor package edgeR (5), and TMM normalization size factors were calculated to adjust for differences in library size. Ribosomal genes, and genes not expressed at greater than one count-per-million in at least the smallest group size minus one samples, were excluded from further analysis. The TMM size factors and the matrix of counts were then imported into the R/Bioconductor package limma (6). Weighted likelihoods based on the observed mean-variance relationship of every gene and sample were calculated for all samples, and the count matrix was transformed to moderated log2-counts-per-million with limma's voomWithQualityWeights (7). The performance of all genes was assessed with plots of the residual standard deviation of every gene against its average log-count, with a robustly fitted trend line of the residuals. Differential expression analysis was then performed to analyze for differences between conditions, with results filtered for only those genes with Benjamini-Hochberg false-discovery rate adjusted p-values less than or equal to 0.05. A principal component analysis (PCA) was performed on differential expression data to distinguish differences between conditions (8).
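The expression filter described above (more than one count-per-million in at least "smallest group size minus one" samples) can be sketched as below. This is a simplified plain-Python illustration: it uses raw library sizes rather than TMM-adjusted effective library sizes, it does not handle the removal of ribosomal genes, and all names and numbers are assumptions.

```python
from collections import Counter


def cpm(counts):
    """Counts-per-million per gene and sample, using raw library sizes (an assumption)."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {gene: [c * 1e6 / lib_sizes[j] for j, c in enumerate(row)]
            for gene, row in counts.items()}


def filter_by_cpm(counts, groups, min_cpm=1.0):
    """Keep genes with CPM > min_cpm in at least (smallest group size - 1) samples."""
    min_samples = min(Counter(groups).values()) - 1
    values = cpm(counts)
    return {gene: counts[gene] for gene, row in values.items()
            if sum(v > min_cpm for v in row) >= min_samples}


# Hypothetical toy data: three treated and three vehicle samples.
groups = ["PS757"] * 3 + ["DMSO"] * 3
toy_counts = {
    "bulk": [999_980] * 6,          # stands in for the rest of the transcriptome
    "a": [10, 10, 10, 0, 0, 0],     # expressed in one whole group: kept
    "b": [10, 0, 0, 0, 0, 0],       # expressed in a single sample: dropped
    "c": [10, 10, 10, 20, 20, 20],  # expressed everywhere: kept
}
print(sorted(filter_by_cpm(toy_counts, groups)))
```

With triplicates the threshold is two samples, so a gene detected in only one replicate is removed while a gene expressed in one condition only is retained.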
To find the significantly regulated genes, the limma voomWithQualityWeights transformed log2-counts-per-million expression data was then analyzed via weighted gene correlation network analysis with the R/Bioconductor package WGCNA (9). Briefly, all genes were correlated across each other by Pearson correlations and clustered by expression similarity into unsigned modules using a power threshold empirically determined from the data. An eigengene was then created for each de novo cluster and its expression profile was then correlated across all coefficients of the model matrix. Because these clusters of genes were created by expression profile rather than known functional similarity, the clustered modules were given the names of random colors, where grey is the only module that has any pre-existing definition of containing genes that do not cluster well with others. The information for all clustered genes for each module was then combined with their respective statistical significance results from limma to determine whether or not those features were also found to be significantly differentially expressed.
References
1. A. Dobin, C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, T. R. Gingeras, STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
2. Y. Liao, G. K. Smyth, W. Shi, featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 (2014).
3. R. Patro, G. Duggal, M. I. Love, R. A. Irizarry, C. Kingsford, Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14, 417-419 (2017).
4. L. Wang, S. Wang, W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184-2185 (2012).
5. M. D. Robinson, D. J. McCarthy, G. K. Smyth, edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140 (2010).
6. M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, G. K. Smyth, limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015).
7. R. Liu, A. Z. Holik, S. Su, N. Jansz, K. Chen, H. S. Leong, M. E. Blewitt, M. L. Asselin-Labat, G. K. Smyth, M. E. Ritchie, Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res 43, e97 (2015).
8. Z. Zou, R. F. Potter, W. H. t. McCoy, J. A. Wildenthal, G. L. Katumba, P. J. Mucha, G. Dantas, J. P. Henderson, E. coli catheter-associated urinary tract infections are associated with distinctive virulence and biofilm gene determinants. JCI Insight 8, (2023).
9. P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data and an R script for the analysis of life history and transcriptomic responses to single-dimensional changes in resource (chickpea-27°C) and temperature (cowpea-35°C) and to multi-dimensional environmental changes in resource and temperature (chickpea-35°C) in a pest beetle, Callosobruchus maculatus (control treatment = cowpea-27°C). The dataset contains life history data collected in laboratory conditions (tab 1), logFC data (RNA-sequencing; Novogene Co. Ltd.) for Spearman rank correlation tests between treatments (tabs 2-4), read count data (RNA-sequencing; Novogene Co. Ltd.) for differential expression analysis using edgeR (R1-5 = Four samples at cowpea-27°C; R7-12 = Four samples at cowpea-35°C; R13-17 = Four samples at chickpea-27°C; R25-28 = Four samples at chickpea-35°C; tab 5) and edgeR output data for plotting in R (tab 6).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review.
A Linux computer running CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.
R (v4.0.0 or newer) https://www.r-project.org/
Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html
CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome
STARsolo (v2.7.10a) https://github.com/alexdobin/STAR
sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools
Seurat (v3.0.0 or newer) https://satijalab.org/seurat/
edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/
limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/
mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html
Reference packages generated by 10x Genomics are also required for this analysis and can be downloaded from the following link (the 2020-A version of the individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest
Once everything is installed, run the shell script 'test-all-new.bash' to perform all the analyses carried out in the paper. This script will automatically download the mixture scRNA-seq data from the SRA database, and it will output a text file called 'test-all.log' that contains all the screen outputs and speed/accuracy results of CellRanger, STARsolo and cellCounts.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
mRNA-sequencing raw data.
Method: We collected algae after 72 h exposure to 10 mg/L AuNP and AuNS for RNA-seq to analyze mRNA expression. Chlamydomonas_reinhardtii (Version: CC-503 cw92 mt+). The differentially expressed transcripts and genes were selected using the criteria log2 (fold change) ≥ 1 or log2 (fold change) ≤ -1 and p value < 0.05 with the R package edgeR (https://bioconductor.org/packages/edgeR).
Results: RNA sequencing identified 9 upregulated and 38 downregulated differentially expressed genes (DEGs) in the 10 mg/L AuNP treated cells, impairing photosynthesis and energy storage via the photosystem II subunit S1 (PSBS1)/early light-inducible protein (ELI3) pathway. In contrast, the AuNS group exhibited 246 upregulated and 145 downregulated DEGs, affecting membrane integrity and nitrogen metabolism through the nitrate reductase (NIT1)/aminomethyl transferase (AMT1)/protein kinase domain-containing protein (A0A2K3CRU5) pathway.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each zipped folder contains results files from reanalysis of public data in our publication, "mirrorCheck: an R package facilitating informed use of DESeq2's lfcShrink() function for differential gene expression analysis of clinical samples" (see also the Collection description). These files were produced by rendering the Quarto documents provided in the supplementary data with the publication (one per dataset). The Quarto code for the 3 main analyses (COVID, BRCA and Cell line datasets) performed differential gene expression (DGE) analysis using both DESeq2 with lfcShrink() via our R package mirrorCheck, and edgeR. Each zipped folder here contains 2 folders, one for each DGE analysis. Since DESeq2 was run on data without prior data cleaning, with prefiltering, or after Surrogate Variable Analysis, the 'mirrorCheck output' folders themselves contain 3 sub-folders titled 'DESeq_noclean', 'DESeq_prefilt' and 'DESeq_sva'. The COVID dataset also has a folder with results from Gene Set Enrichment Analysis. Finally, the fourth folder contains results from a tutorial/vignette-style supplementary file using the Bioconductor "parathyroidSE" dataset. This analysis only utilised DESeq2, with both data cleaning methods and testing two different design formulae, resulting in 5 sub-folders in the zipped folder.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 6: Table S3. Differentially Expressed Genes analysis obtained using the edgeR R package. Each sheet addresses the three biological questions raised in this study. In DEG columns, it is indicated whether a gene is differentially expressed based on the convention of this analysis, where a gene is considered differentially expressed if it meets statistical significance (FDR < 0.05) and a Fold Change threshold (|log2FC| > 1).
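The DEG convention stated above (FDR < 0.05 and |log2FC| > 1) amounts to a simple two-condition test per gene. A minimal sketch follows; the function name and the "up"/"down"/"not DE" labels are assumptions and may not match the labels used in the spreadsheet.

```python
def classify_deg(log2fc, fdr, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Label a gene as 'up', 'down' or 'not DE' from its log2 fold change and FDR."""
    if fdr < fdr_cutoff and abs(log2fc) > lfc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "not DE"


# Hypothetical rows: (log2FC, FDR)
for row in [(2.3, 0.001), (-1.7, 0.02), (0.8, 0.001), (3.0, 0.2)]:
    print(row, classify_deg(*row))
```

Both inequalities are strict here, so a gene sitting exactly on either threshold is not called differentially expressed.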
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
RNA sequencing data from the prefrontal cortex (PFC) and hippocampus (HIPP) of male (12 weeks old) hemizygous CAGHERV-Wenv mice (C57BL6/J;129P2/Ola-Hprt mice; n = 3) relative to wild-type (n = 3) littermates. Total RNA was extracted from prefrontal and hippocampal samples using the SPLIT RNA extraction kit (Lexogen, Austria) following the manufacturer's recommendations and was sent to the Functional Genomics Center in Zurich (FGCZ) for quality control and RNA sequencing. The quality of the isolated RNA was determined with a Fragment Analyzer (Agilent, Santa Clara, California, USA). Only those samples with a 260 nm/280 nm ratio between 1.8–2.1, a 28S/18S ratio within 1.5–2, and RIN (>8) values qualified for a Poly-A enrichment strategy in order to generate the sequencing libraries applying the TruSeq mRNA Stranded Library Prep Kit (Illumina, Inc, California, USA). After Poly-A selection using Oligo-dT beads the mRNA was reverse-transcribed into cDNA. The cDNA was fragmented, end-repaired and poly-adenylated before ligation of TruSeq UD Indices (IDT, Coralville, Iowa, USA). The quality and quantity of the amplified sequencing libraries were validated using a Fragment Analyzer SS NGS Fragment Kit (1–6000 bp) (Agilent, Waldbronn, Germany). The equimolar pool of the samples was spiked into a NovaSeq6000 run targeting ~15M reads per sample on a S1 FlowCell (Novaseq S1 Reagent Kit, 100 cycles, Illumina, Inc, California, USA). Reads were quality-checked with FastQC. Sequencing adapters were removed with Trimmomatic, and reads were aligned to the reference genome and transcriptome of Mus musculus (GENCODE, GRCm38.p5) with STAR v2.7.3. Distribution of the reads across genomic isoform expression was quantified using the R package GenomicRanges from Bioconductor Version 3.10. Minimum mapping quality, as well as minimum feature overlaps, was set to 10. Multi-overlaps were allowed.
Differentially expressed genes (DEGs) were identified using the R package edgeR from Bioconductor Version 3.10, using a generalized linear model (glm) regression, a quasi-likelihood (QL) differential expression test and the trimmed means of M-values (TMM) normalization.
Mimulus luteus transcript IDs and sequences
Mluteus_transcript_IDs_and_sequences.fa
Mimulus guttatus transcript IDs and sequences
Mguttatus_transcript_IDs_and_sequences.fa
JS1: M. luteus var. luteus inbred line EY7 - petal transcriptome "T2" - sample JS1
JS2: M. guttatus inbred line IM767 - petal transcriptome "T2" - sample JS2
JS3: M. guttatus inbred line IM767 - leaf transcriptome "T2" - sample JS3
JS4: M. luteus var. luteus inbred line EY7 - calyx transcriptome "T2" - sample JS4
JS5: M. guttatus inbred line IM767 - stem transcriptome "T2" - sample JS5
JS6: M. luteus var. luteus inbred line EY7 - leaf transcriptome "T2" - sample JS6
JS7: M. guttatus inbred line IM767 - calyx transcriptome "T2" - sample JS7
JS8: M. luteus var. luteus inbred line EY7 - leaf transcriptome "T2" - sample JS8
JS9: M. luteus var. luteus inbred line EY7 - stem transcriptome "T2" - sample JS9
Mg_T1: Mimulus guttatus inbred line CG - transcriptome T1 - complete annotated transcriptome
Mimulus_guttatus_complete_w_single_exons_standard_renamed_g...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One of the fundamental aspects of genomic research is the identification of differentially expressed (DE) genes between two conditions. In the past decade, numerous DE analysis tools have been developed, employing various normalization methods and statistical modelling approaches. In this article, we introduce DElite, an R package that leverages the capabilities of four state-of-the-art DE tools: edgeR, limma, DESeq2, and dearseq. DElite returns the outputs of the four tools with a single command line, thus providing a simplified way for non-expert users to perform DE analysis. Furthermore, DElite provides a statistically combined output of the four tools, and in vitro validations support the improved performance of these combination approaches for the detection of DE genes in small datasets. Finally, DElite offers comprehensive and well-documented plots and tables at each stage of the analysis, thus facilitating result interpretation. Although DElite has been designed with the intention of being accessible to users without extensive expertise in bioinformatics or statistics, the underlying code is open source and structured in such a way that it can be customized by advanced users to meet their specific requirements. DElite is freely available for download from https://gitlab.com/soc-fogg-cro-aviano/DElite.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bovine respiratory disease (BRD) is a multifactorial disease complex and the leading infectious disease in post-weaned beef cattle. Clinical manifestations of BRD are recognized in beef calves within a high-risk setting, commonly associated with weaning, shipping, and novel feeding and housing environments. However, the understanding of complex host immune interactions and genomic mechanisms involved in BRD susceptibility remains elusive. Utilizing high-throughput RNA-sequencing, we contrasted the at-arrival blood transcriptomes of 6 beef cattle that ultimately developed BRD against 5 beef cattle that remained healthy within the same herd, differentiating BRD diagnosis from production metadata and treatment records. We identified 135 differentially expressed genes (DEGs) using the differential gene expression tools edgeR and DESeq2. Thirty-six of the DEGs shared between these two analysis platforms were prioritized for investigation of their relevance to infectious disease resistance using WebGestalt, STRING, and Reactome. Biological processes related to inflammatory response, immunological defense, lipoxin metabolism, and macrophage function were identified. Production of specialized pro-resolving mediators (SPMs) and endogenous metabolism of angiotensinogen were increased in animals that resisted BRD. Protein-protein interaction modeling of gene products with significantly higher expression in cattle that naturally acquire BRD identified molecular processes involving microbial killing. Accordingly, identification of DEGs in whole blood at arrival revealed a clear distinction between calves that went on to develop BRD and those that resisted BRD. These results provide novel insight into host immune factors that are present at the time of arrival and that confer protection from BRD.
See material & methods section "differential expression analysis of RNAseq data" of DOI: 10.7554/eLife.61630
We used the R package edgeR (Robinson, McCarthy, & Smyth, 2010) to compare RNA sequencing profiles between FLCNPOS and FLCNNEG replicates, as well as between TP53POS and TP53NEG. This involved reading in the gene-level counts, computing library size normalizing factors using the trimmed-mean of M-values (TMM) method and then fitting a model to estimate the group effect. Obtained p-values were corrected for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) step-up procedure (Benjamini & Hochberg, 1995).
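The Benjamini-Hochberg step-up adjustment mentioned above can be sketched in a few lines. This is a plain-Python illustration of the standard procedure, not the code used in the study; the function name and example p-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum from the largest p-value downward keeps the adjusted values monotone, which is what makes this a step-up procedure.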
https://spdx.org/licenses/CC0-1.0.html
Invasive species provide an opportune system to investigate how populations respond to new or changing environments. While the impacts of invasive species increase annually, many gaps remain in our understanding of how these species invade, adapt, and thrive in the areas to which they are introduced. Using the perennial forb Gypsophila paniculata as a study system, we aimed to investigate how invasive species respond to different environments. Baby’s breath (Gypsophila paniculata) was introduced to North America in the late 1800s and has since spread throughout the northwestern United States and western Canada. We used an RNA-seq approach to explore how molecular processes may be contributing to the success of invasive G. paniculata populations that are thought to share similar genetic backgrounds across distinct habitats. Transcription profiles were constructed for root, stem, and leaf tissue from seedlings collected from a sand dune ecosystem in Petoskey, MI (PSMI) and a sagebrush ecosystem in Chelan, WA (CHWA). Using these data we assessed differential gene expression between the two populations and identified SNPs within differentially expressed genes. We identified 1,146 transcripts that were differentially expressed across all tissues between the two populations. GO processes enriched by genes displaying higher expression in PSMI were associated with increased nutrient starvation, while enriched processes in CHWA were associated with abiotic stress. Only 7.4% of the differentially expressed genes across all three tissues contained SNPs differing in allele frequencies of at least 0.5 between the populations. In addition, common garden studies found the two populations differed in germination rate and seedling emergence success, but not in above- and below-ground tissue allocation. Our results suggest that the success of invasive G. paniculata across these two environments is likely the result of plasticity in molecular processes responding to different environmental conditions, although some genetic divergence may also be contributing to these differences.
Methods
RNA Extraction. We collected 16 G. paniculata seedlings from CHWA (June 8, 2018) and 15 seedlings from PSMI (June 1, 2018). We then dissected seedlings into three tissue types (root, stem, and leaf), placed tissue in RNAlater™ (Thermo Fisher Scientific, Waltham, MA), and flash-froze them in an ethanol and dry ice bath. We extracted total RNA from frozen tissue using a standard TRIzol® (Thermo Fisher Scientific) extraction protocol (https://assets.thermofisher.com/TFS-Assets/LSG/manuals/trizol_reagent.pdf). We resuspended the extracted RNA pellet in DNase/RNase free water. The samples were then treated with DNase to remove any residual DNA using a DNA-Free Kit (Invitrogen, Carlsbad, CA). We assessed RNA quality with a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA) and NanoDrop™ 2000 (Thermo Fisher Scientific). RNA Integrity Number (RIN) values for individuals used in this study ranged from 6.1 to 8.3. However, because both chloroplast and mitochondrial rRNA can artificially deflate RIN values in plant leaf tissue, we deemed these values to be sufficient for further analysis based upon visualization of the 18S and 28S fragment peaks (see Babu & Gassmann, 2016). This resulted in high quality total RNA from 10 PSMI leaf, 10 PSMI stem, 10 PSMI root, 10 CHWA leaf, 9 CHWA stem, and 10 CHWA root samples.
cDNA Library Construction and Sequencing. Prior to sequencing, all samples were treated with a Ribo-Zero rRNA Removal Kit (Illumina, San Diego, CA). cDNA libraries were constructed using the Collibri Stranded Library Prep Kit (Thermo Fisher Scientific) before being sequenced on a NovaSeq 6000 (Illumina) using S1 and S2 flow cells. Sequencing was performed using a 2 x 100 bp paired-end read format and produced approximately 60 million reads per sample, with 94% of reads having a Q-score >30.
Transcriptome Assembly. Prior to transcriptome assembly, read quality was assessed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapters and bases with a quality score less than 20 were first removed from the raw reads using Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Next, rRNAs were identified using SortMeRNA (mean rRNA percent content of 5.31%) (Kopylova, Noé, & Touzet, 2012). A reference transcriptome was then assembled de novo using non-rRNA reads from all samples and Trinity v2.8.2 (Grabherr et al. 2011; Haas et al. 2013), with a normalized max read coverage of 100, a minimum k-mer coverage of 10, and k-mer size set to 32. The assembled transcriptome was annotated using Trinotate v3.1.1. Trinotate was given open reading frames (ORFs) predicted from TransDecoder and transcript homology (blastx and blastp) to the manually curated UniProt database (Bryant et al., 2017). The final assembly consisted of 223,810 putative genes and 474,313 putative transcripts (N50 = 3,121) from the 59 samples.
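The N50 statistic reported for the assembly can be illustrated with a minimal Python sketch. It is not taken from the Trinity toolkit; the function name `n50` and the example lengths are hypothetical. N50 is taken here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative sketch; the function name and inputs are assumptions.
    """
    total = sum(lengths)
    running = 0
    # Accumulate lengths from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical transcript lengths in base pairs.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```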
Differential Expression. To quantify transcript expression, reads were mapped back to the assembly using bowtie and quantified using the RSEM method as implemented in Trinity. Counts were generated for genes and transcripts. We then tested for differential gene expression using edgeR v3.22.5 in R v3.5.2 (Robinson, McCarthy, & Smyth, 2010; R Development Core Team, 2017). First, however, the count data was filtered and only transcripts with greater than 10 counts in at least 10 samples were included. Following filtering, 111,042 genes (49.61%) and 188,108 transcripts (39.66%) remained. Considering tissue type, 127,591 transcripts remained in the data from 20 root samples (26.90%), 125,261 transcripts remained in the 19 stem tissue samples (26.41%), and 112,499 transcripts remained in the 20 leaf tissue samples (23.72%). For differential expression testing, the data were stratified by tissue and filtered transcripts were then fit to the negative binomial (NB) model and tested using the quasi-likelihood F test with TMM (trimmed mean of M values) normalization. To be considered significantly differentially expressed, transcripts needed to have an adjusted p-value (BH method) below 0.05 and a log2 fold change greater than 2. For transcripts that were differentially expressed, we identified Gene Ontology (GO) biological processes that were either over- or under-represented using the PANTHER classification system v14.1, where transcripts were assessed against the Arabidopsis thaliana database (http://pantherdb.org/webservices/go/overrep.jsp). 
In addition, for those transcripts that were differentially expressed across all three tissues, we converted the UniProt IDs of the transcripts to GO biological process IDs using the online database bioDBnet (https://biodbnet-abcc.ncifcrf.gov/db/db2db.php), and used the metacoder package v0.3.3 (Foster, Sharpton, Grünwald, 2017) in R v3.6.0 to construct heat trees to visualize the relationship of our differentially expressed transcripts across GO biological process hierarchies.
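The filtering and significance criteria described above can be sketched in Python as follows. This is not the authors' edgeR code: the function names are hypothetical, and the sketch assumes that the log2 fold change threshold applies to the absolute value, which the text does not state explicitly.

```python
def passes_count_filter(counts, min_count=10, min_samples=10):
    """Keep a transcript with more than `min_count` counts in at least `min_samples` samples."""
    return sum(1 for c in counts if c > min_count) >= min_samples


def is_significant(adjusted_p, log2_fc, alpha=0.05, min_abs_log2_fc=2.0):
    """Apply the significance criteria: adjusted p-value below alpha and
    |log2 fold change| above the threshold.

    Using the absolute value is an assumption made for this sketch.
    """
    return adjusted_p < alpha and abs(log2_fc) > min_abs_log2_fc


# Hypothetical counts for one transcript across 12 samples.
counts = [15, 22, 11, 0, 3, 40, 18, 12, 30, 25, 19, 14]
print(passes_count_filter(counts))   # ten samples exceed 10 counts
print(is_significant(0.01, -3.2))    # strongly down-regulated and significant
```

Transcripts would be filtered first, and the significance check would then be applied to the BH-adjusted p-values and fold changes returned by the model fit.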
Single Nucleotide Polymorphism (SNP) Variant Calling. We used the HaplotypeCaller tool from GATK4 to identify potential SNPs that were present in transcripts that were differentially expressed between populations (McKenna et al., 2010; DePristo et al. 2011). The bowtie mapped files were used to jointly genotype all 59 samples simultaneously with a minimum base quality and mapping quality of 30. Variant data was visualized using the vcfR package v1.8.0 (Knaus & Grünwald 2017). We identified variants associated with non-synonymous SNPs, synonymous SNPs, 5’ and 3’ UTR SNPs, 5’ and 3’ UTR indels, frame-shift and in-frame indels, premature stop codons or changes in stop codons, and changes in start codons, and calculated population diversity estimates for all SNP types. The effect prediction was done using custom scripts (which can be found in the Dryad repository) and the Transdecoder predicted annotation in conjunction with the base change. We set a hard filter for the SNPs so that only those with QD scores > 2, MQ scores > 50, SOR scores < 3, and ReadPosRankSum values between -5 and 3 passed. We then calculated the allele frequencies for each SNP within PSMI and CHWA. For the subsequent evaluation, we focused on SNPs that had potential functional effects (i.e., they were not listed as ‘synonymous’ or ‘unclassified’), were in transcripts differentially expressed between PSMI and CHWA across all three tissues, and that exhibited differences in SNP allele frequencies between the populations by at least 0.5. We used the R package metacoder v0.3.3 to visualize the GO biological process hierarchies associated with transcripts containing these SNPs.
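The hard filter and the allele-frequency comparison described above can be sketched in Python as follows. This is not the custom code deposited in the Dryad repository: the function names are hypothetical, the rank-sum bounds are assumed to be inclusive, and allele frequency is computed here simply as alternate-allele count over total called alleles.

```python
def passes_hard_filter(qd, mq, sor, read_pos_rank_sum):
    """Apply the stated thresholds: QD > 2, MQ > 50, SOR < 3,
    and a read position rank sum between -5 and 3.

    Treating the rank-sum bounds as inclusive is an assumption.
    """
    return qd > 2 and mq > 50 and sor < 3 and -5 <= read_pos_rank_sum <= 3


def allele_frequency(alt_allele_count, total_allele_count):
    """Alternate-allele frequency within one population (simplified)."""
    return alt_allele_count / total_allele_count


def differs_between_populations(freq_a, freq_b, min_difference=0.5):
    """True when the allele frequencies differ by at least `min_difference`."""
    return abs(freq_a - freq_b) >= min_difference


# Hypothetical SNP: annotation values plus allele counts in each population.
snp_ok = passes_hard_filter(qd=12.4, mq=58.0, sor=1.1, read_pos_rank_sum=0.3)
psmi_freq = allele_frequency(15, 20)
chwa_freq = allele_frequency(5, 20)
print(snp_ok, differs_between_populations(psmi_freq, chwa_freq))
```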
Germination Trial. On August 11, 2018 we returned to our sample sites in CHWA and PSMI and collected seeds from 20 plants per location. This date was chosen because Rice, Martínez-Oquendo, & McNair (2019) previously determined that this collection time can yield over 90% seed germination for G. paniculata collected from Empire, MI. To collect seeds, we manually broke seed pods off and placed them inside paper envelopes in bags half-filled with silica beads. We stored bags in the dark at 20 to 23˚C until the germination trial began one month later. We counted one hundred seeds from each of twenty plants per population and placed them in a petri dish lined with filter paper (n = 2,000 seeds per population). We established a control dish using 100 seeds from the ‘Early Snowball’ commercial cultivar (G. paniculata) sold by W. Atlee Burpee & Co in 2018, known to have germination percentages in excess of 90%. Incubators had a 12:12h dark:light photoperiod and growth chamber conditions were set at 20˚C with 114 μmol m-2 s-1 photosynthetically active radiation from fluorescent light bulbs. Each day we randomized petri dish locations within the incubator to avoid bias in temperature or light regimes. We conducted this study for fourteen days,
Supporting Information for Day-night gene expression reveals circadian gene disco as a candidate for diel-niche evolution in moths.
Authors: Yash Sondhi, Rebeccah L. Messcher, Anthony J. Bellantuano, Caroline G. Storer, Scott D. Cinel, R. Keating Godfrey, Andrew J. Mongue, Yi-Ming Weng, Deborah Glass, Ryan A. St Laurent, Chris A. Hamilton, Chandra Earl, Colin J. Brislawn, Ian J. Kitching, Seth M. Bybee, Jamie C. Theobald, Akito Y. Kawahara
Supp_dataset_1: EdgeR: EdgeR sample metadata, analysis parameters (config files for RasFlow), differentially expressed gene sets, annotated DEG sets with Bombyx mori annotations for Anisota pellucida (Ap) and Dryocampa rubicunda (Dr). Overlapping genes between Anisota and Dryocampa. Analysis performed with de-novo assembly versions 5 for both species.
Supp_dataset_2: DESeq2: DESeq2 sample metadata, analysis parameters (R script), differentially expressed gene sets, and annotated DEG sets with Bombyx mori annotations for Anisota pellucida (Ap) and Dryocampa rubicunda (Dr). Unique and overlapping genes between Anisota and Dryocampa. Analysis performed with de-novo assembly versions 5 for both species.
Supp_dataset_3: Supporting_Table_Common_RNAseq: Overlapping genes for both analyses with a pivot table summary of the different unique Bombyx genes and the number of transcripts mapped to each.
Supp_dataset_4: Gene_modules_WGCNA: WGCNA identified modules for Anisota and Dryocampa individual count data and combined modules along with annotations of the (grey60, tan, turquoise and blue) modules.
Supp_dataset_5: GO_analyses: TopGO, ShinyGo and Revigo analyses
Supp_dataset_6: Analyses_combined: All analyses results combined and annotated, with FC<0 and FC<2 for DEGs. Note: to combine analyses, fold change signs were switched for EdgeR.
Supp_dataset_7: Overlap_common_genes_annotated_with_sequence: Overlapping transcripts annotated with sequences for both species and Bombyx mori. Note: to combine analyses, fold change signs were switched for EdgeR.
Supp_dataset_8_EggNOG_annotations: EggNog annotations of the various sequences in dataset 7 and the GO terms used to query this dataset
Supp_datatset_9: GO_lookup_genes: Genes recovered from the GO cross referencing with EggNOG annotations
Supp_dataset_10: Genes of interest for which conservation and protein models were constructed
Supp_dataset_11: Assembly_codes: 11a: List of moths and assembly codes used 11b: List of insects and assembly codes used for Orthofinder searches
Supp_dataset_12_Alphafold_Bmor_models: Alphafold predicted models for Bombyx genes of interest
Supp_dataset_13_Conservation analyses: Consurf predicted models and conservation analyses for insects and moths
Supp_dataset_14_PyMOL: PyMOL files showcasing the overlapping protein structures
Supp_dataset_15_Protocols: Modified RNA Extraction Protocols
Supp_dataset_16_Assemblies: De-novo assemblies and Peptide Files
Supp_dataset_17_Samples: Sample RNA extraction and Library Preparation metadata
https://spdx.org/licenses/CC0-1.0.html
Recent studies have shown that one of the parental subgenomes in ancient polyploids is generally more dominant, both retaining more genes and being more highly expressed, a phenomenon termed subgenome dominance. The genomic features that determine how quickly and which subgenome dominates within a newly formed polyploid remain poorly understood. To investigate the rate of subgenome dominance emergence, we examined gene expression, gene methylation, and transposable element (TE) methylation in a natural, less than 140-year-old allopolyploid (Mimulus peregrinus), a resynthesized inter-species triploid hybrid (M. robertsii), a resynthesized allopolyploid (M. peregrinus), and diploid progenitors (M. guttatus and M. luteus). We show that subgenome expression dominance occurs instantly following the hybridization of two divergent genomes and significantly increases over generations. Additionally, CHH methylation levels are significantly reduced in regions near genes and within TEs in the first generation hybrid, intermediate in the resynthesized allopolyploid, and are repatterned differently between the dominant and recessive subgenomes in the natural allopolyploid. In addition, subgenome differences in levels of TE methylation mirror the increase in expression bias observed over the generations following hybridization. These findings provide important insights into genomic and epigenomic shock that occurs following hybridization and polyploid events.
Supplementary File 1
Transcriptome assembly of A. fragariae in fasta format.
Note: all 147,621 sequences, including splicing isoforms, are included in this file.
Supplementary File 2
Annotation table of A. fragariae transcriptome.
Note: Annotation was performed on the longest isoforms only (48,541 sequences). Information in this file includes sequence names, closest match to known genes in the nr database (description), lowest e-value to known genes, number of gene ontology (GO) terms assigned, enzyme code and names, detailed GO information, and InterPro annotation.
Supplementary File 3
Gene expression data of A. fragariae under desiccated and control conditions.
Note: Genes with low expression were removed. Fragments per kilobase of transcript per million mapped reads (FPKM) values for each condition were averaged across three biological replicates. In the fold change column, positive and negative values indicate genes were up-regulated and down-regulated ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides data files and R code to accompany the article Differential methylation analysis of reduced representation bisulfite sequencing experiments using edgeR published by F1000Research.
The data consists of Reduced Representation BS-seq methylation profiles of epithelial populations from the mouse mammary gland, with n=2 biological replicates for each of three cell populations.
RNA-seq expression profiles of luminal and basal mammary epithelial populations are also provided.
The R code undertakes a differential methylation analysis of the BS-seq profiles and demonstrates a strong negative correlation between the differential methylation and differential expression results.