Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains example input data, including raw images, codebooks, parameters, and segmentation labels needed to run the FISH spatial transcriptomics pipeline tool PIPEFISH. The datasets contained are:
In situ sequencing (ISS) of a whole coronal slice of a mouse brain (50 genes). Publication:
Gataric, M., Park, J.S., Li, T., Vaskivskyi, V., Svedlund, J., Strell, C., Roberts, K., Nilsson, M., Yates, L.R., Bayraktar, O. and Gerstung, M., 2021. PoSTcode: Probabilistic image-based spatial transcriptomics decoder. bioRxiv, pp.2021-10.
MERFISH of human U2-OS cell cultures (130 genes). Publication:
Moffitt, J.R., Hao, J., Wang, G., Chen, K.H., Babcock, H.P. and Zhuang, X., 2016. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proceedings of the National Academy of Sciences, 113(39), pp.11046-11051.
seqFISH of a developing mouse embryo (351 genes). Publication:
Lohoff, T., Ghazanfar, S., Missarova, A., Koulena, N., Pierson, N., Griffiths, J.A., Bardot, E.S., Eng, C.H., Tyser, R.C.V., Argelaguet, R. and Guibentif, C., 2022. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nature Biotechnology, 40(1), pp.74-85.
To format the inputs correctly, run the prep_input.py script for the dataset you wish to use from within the directory that contains the script.
Memory requirements for each dataset:
iss_mouse_brain - 3GB
merfish_human_u2os - 7GB
seqfish_mouse_embryo - 37GB
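The memory figures above can be turned into a quick pre-flight check before starting a run. The sketch below is not part of PIPEFISH: the function names `required_gb`, `has_enough_memory` and `available_gb` are hypothetical, and reading `/proc/meminfo` assumes a Linux host.

```bash
#!/bin/bash
# Hypothetical helper, not part of PIPEFISH: look up the memory requirement
# (in GB) listed above for each example dataset.
required_gb() {
  case "$1" in
    iss_mouse_brain)      echo 3 ;;
    merfish_human_u2os)   echo 7 ;;
    seqfish_mouse_embryo) echo 37 ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the given amount of free memory (whole GB) covers the dataset.
has_enough_memory() {
  local dataset="$1" avail="$2" needed
  needed=$(required_gb "$dataset") || return 1
  [ "$avail" -ge "$needed" ]
}

# Assumption: Linux, where /proc/meminfo reports MemAvailable in kB.
available_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Example use:
#   has_enough_memory seqfish_mouse_embryo "$(available_gb)" || echo "not enough memory"
```

Passing the available memory as an argument keeps the check easy to reuse on systems where it has to be obtained some other way.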
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LncRNAs are RNA molecules longer than 200 nucleotides that do not encode proteins. Experimental studies have demonstrated the diversity and importance of lncRNA functions in plants, including involvement in the regulation of gene expression and in the homeostasis of plant physiological parameters. However, structural and functional features are known for only a small number of lncRNAs and have been experimentally confirmed only in single cases. To expand knowledge about lncRNAs in other species, computational pipelines have recently been actively developed that run standardized data processing steps without user intervention up to the final result. This makes it possible to implement wider functionality for lncRNA identification and analysis. In the present work, we propose ICAnnoLncRNA, a pipeline for the automatic prediction, classification, and annotation of plant lncRNAs. The pipeline was applied to the analysis of 877 maize transcriptome libraries. More than 9 million lncRNAs were predicted and classified into 3 classes with respect to their localization in the genome, structural features of lncRNAs, tissue specificity, and homology with other organisms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
gedepir is an R package that simplifies the use of deconvolution tools within a complete transcriptomics analysis pipeline. It simplifies the definition of an end-to-end analysis pipeline with a set of base functions that are connected through the pipe syntax used in the magrittr, tidyr or dplyr packages. This example dataset comprises 50 pseudo-bulk samples.
Overview
This item contains references and test datasets for the Cactus pipeline.
Cactus (Chromatin ACcessibility and Transcriptomics Unification Software) is an mRNA-Seq and ATAC-Seq analysis pipeline that aims to provide advanced molecular insights on the conditions under study.
Test datasets
The test datasets contain all data needed to run Cactus in each of the 4 supported organisms. This includes ATAC-Seq and mRNA-Seq data (.fastq.gz), parameter files (.yml) and design files (.tsv). They were created for each species by downloading publicly available datasets with fetchngs (Ewels et al., 2020) and subsampling reads to the minimum required to have enough DAS (Differential Analysis Subsets) for enrichment analysis.
Datasets downloaded:
Worm and Humans: GSE98758
Fly: GSE149339
Mouse: GSE193393
References
One of the goals of Cactus is to make the analysis as simple and fast as possible for the user while providing detailed insights on molecular mechanisms. This is achieved by parsing all needed references for the 4 ENCODE (Dunham et al., 2012; Stamatoyannopoulos et al., 2012; Luo et al., 2020) and modENCODE (The modENCODE Consortium et al., 2010; Gerstein et al., 2010) organisms (human, M. musculus, D. melanogaster and C. elegans). This parsing step was done with a Nextflow pipeline, with most tools encapsulated within containers for improved efficiency and reproducibility and to allow the creation of customized references.
Genomic sequences and annotations were downloaded from Ensembl (Cunningham et al., 2022). The ENCODE API (Luo et al., 2020) was used to download the ChIP-Seq profiles of 2,714 Transcription Factors (TFs) (Landt et al., 2012; Boyle et al., 2014) and chromatin states in the form of 899 ChromHMM profiles (Boix et al., 2021; van der Velde et al., 2021) and 6 HiHMM profiles (Ho et al., 2014). Slim annotations (cell, organ, development, and system) were parsed and used to create groups of ChIP-Seq profiles that share the same annotations, allowing users to analyze only ChIP-Seq profiles relevant to their study. 2,779 TF motifs were obtained from the Cis-BP database (Lambert et al., 2019). GO terms and KEGG pathways were obtained via the R packages AnnotationHub (Morgan and Shepherd, 2021) and clusterProfiler (Yu et al., 2012; Wu et al., 2021), respectively.
Documentation
More information on how to use Cactus and how references and test datasets were created is available on the documentation website: https://github.com/jsalignon/cactus.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of four samples of GEO accession GSE119855 with the IBU RNA-seq pipeline
Additional file 9. Table 8: Excitome_freq_matrix.csv. Computationally predicted ADAR editing sites found in the Psychiatric Disorders study and confirmed with PCR (Zhu et al. 2012) were combined with editing sites from the RADAR database that were previously compared in Alzheimer's disease (Khermesh et al., 2016) to create a list containing 151 editing sites located in 91 genes. If a specific editing site is found in the dbSNP database, a reference number (rs) is included, along with the genomic location, strand orientation, an annotation detailing the type of amino acid substitution, the amino acid position, the codon position on the mRNA, and two columns showing the minimum cutoff values for expression and editing read depth used to determine the accuracy of variant calling.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pipelines for transcriptome analyses conducted as part of "Community context and pCO2 impact the transcriptome of the "helper" bacterium Alteromonas in co-culture with picocyanobacteria" (Barreto Filho et al., 2022). The provided code, documentation, input and output files include all the information needed to replicate our findings.
The following results abstract describes these data along with related datasets which can be accessed from the "Related Datasets" section of this page.
Many microbial photoautotrophs depend on heterotrophic bacteria for accomplishing essential functions. Environmental changes, however, could alter or eliminate such interactions. We investigated the effects of changing pCO2 on gene expression in co-cultures of 3 strains of picocyanobacteria (Synechococcus strains CC9311 and WH8102 and Prochlorococcus strain MIT9312) paired with the ‘helper’ bacterium Alteromonas macleodii EZ55. Co-culture with cyanobacteria resulted in a much higher number of up- and down-regulated genes in EZ55 than pCO2 by itself. Pathway analysis revealed significantly different expression of genes involved in carbohydrate metabolism, stress response, and chemotaxis, with different patterns of up- or down-regulation in co-culture with different cyanobacterial strains. Gene expression patterns of organic and inorganic nutrient transporter and catabolism genes in EZ55 suggested resources available in the culture media were altered under elevated (800 ppm) pCO2 conditions. Altogether, changing expression patterns were consistent with the possibility that the composition of cyanobacterial excretions changed under the two pCO2 regimes, causing extensive ecophysiological changes in both members of the co-cultures. Additionally, significant downregulation of oxidative stress genes in MIT9312/EZ55 cocultures at 800 ppm pCO2 were consistent with a link between the predicted reduced availability of photorespiratory byproducts (i.e., glycolate/2PG) under this condition and observed reductions in internal oxidative stress loads for EZ55, providing a possible explanation for the previously observed lack of “help” provided by EZ55 to MIT9312 under elevated pCO2. The data and code stored in this archive will allow the reconstruction of our analysis pipelines. Additionally, we provide annotation mapping files and other resources for conducting transcriptomic analyses with Alteromonas sp. EZ55.
https://spdx.org/licenses/CC0-1.0.html
Forests face an escalating threat from the increasing frequency of extreme drought events driven by climate change. To address this challenge, it is crucial to understand how widely distributed species of economic or ecological importance may respond to drought stress. Here, we used RNA-sequencing to investigate transcriptome responses at increasing levels of water stress in white spruce (Picea glauca (Moench) Voss), distributed across North America. We began by generating an expanded transcriptome assembly emphasizing short-term drought stress at different developmental stages. We also analyzed differential gene expression at four time points over 22 days in a controlled drought stress experiment involving 2-year-old plants and three genetically unrelated clones. De novo transcriptome assembly and gene expression analysis revealed a total of 33,287 transcripts (18,934 annotated unique genes), with 4,425 unique drought-responsive genes. Many transcripts that had predicted functions associated with photosynthesis, cell wall organization, and water transport were down-regulated under drought conditions, while transcripts linked to abscisic acid response and defense response were up-regulated. Our study highlights a previously uncharacterized effect of drought stress on lipid metabolism genes in conifers and significant changes in the expression of several transcription factors, suggesting a regulatory response potentially linked to drought response or acclimation. Our research represents a fundamental step in unraveling the molecular mechanisms underlying short-term drought responses in white spruce seedlings. In addition, it provides a valuable source of new genetic data that could contribute to genetic selection strategies aimed at enhancing the drought resistance and resilience of white spruce to changing climates. 
Methods
This de novo transcriptome was assembled from RNA-Seq data obtained from three distinct experiments involving the collection of Picea glauca foliage. Sample types were selected to cover a wide range of conditions, with a particular focus on drought conditions, and represent diverse genes that are regulated in response to stress. A total of 16 samples came from a common garden experiment belonging to the International Diversity Experiment Network with Trees (IDENT) network, where eight trees had been subjected to water exclusion and eight others to summer irrigation since 2014. Six other samples came from a greenhouse experiment with a budworm-induced biotic stress treatment. Six samples were from a greenhouse drought stress experiment on young clonal seedlings including three water-stressed and three well-watered seedlings. For details please refer to the following paper: Ribeyre et al. De novo transcriptome assembly and discovery of drought-responsive genes in eastern white spruce (Picea glauca). Submitted to Frontiers in Plant Science.
De novo assembly
For each sample, clean reads were used to produce a transcriptome assembly using the SGA (Simpson and Durbin, 2012) and IDBA-UD assemblers (Peng et al., 2012) integrated within the a5 pipeline (Coil et al., 2015). Transcriptome assemblies were then scaffolded with one another using LINKS 1.8.6 (Warren et al., 2015). The resulting consensus assembly was then scaffolded again with a previously published Picea glauca transcriptome assembly (Rigault et al., 2011) using LINKS 1.8.6, and sequences shorter than 500 bp were removed as they were not likely to code for functional proteins.
Simpson, J. T., and Durbin, R. (2012). Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549–556. doi: 10.1101/gr.126953.111.
Peng, C., Ma, Z., Lei, X., Zhu, Q., Chen, H., Wang, W., et al. (2011). A drought-induced pervasive increase in tree mortality across Canada’s boreal forests. Nat. Clim. Change 1, 467–471. doi: 10.1038/nclimate1293.
Coil, D., Jospin, G., and Darling, A. E. (2015). A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data. Bioinforma. Oxf. Engl. 31, 587–589. doi: 10.1093/bioinformatics/btu661.
Rigault, P., Boyle, B., Lepage, P., Cooke, J. E. K., Bousquet, J., and MacKay, J. J. (2011). A white spruce gene catalog for conifer genome analyses. Plant Physiol. 157, 14–28. doi: 10.1104/pp.111.179663.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3. Trimmed Mean of M-values.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pipeline for phylogenetic analysis of the GlcDEF, GOX/LOX, and tsar genes conducted as part of "Community context and pCO2 impact the transcriptome of the "helper" bacterium Alteromonas in co-culture with picocyanobacteria" (Barreto Filho et al., 2022). The provided code, documentation, input and output files include all the information needed to replicate our findings.
The following results abstract describes these data along with related datasets which can be accessed from the "Related Datasets" section of this page.
Many microbial photoautotrophs depend on heterotrophic bacteria for accomplishing essential functions. Environmental changes, however, could alter or eliminate such interactions. We investigated the effects of changing pCO2 on gene expression in co-cultures of 3 strains of picocyanobacteria (Synechococcus strains CC9311 and WH8102 and Prochlorococcus strain MIT9312) paired with the ‘helper’ bacterium Alteromonas macleodii EZ55. Co-culture with cyanobacteria resulted in a much higher number of up- and down-regulated genes in EZ55 than pCO2 by itself. Pathway analysis revealed significantly different expression of genes involved in carbohydrate metabolism, stress response, and chemotaxis, with different patterns of up- or down-regulation in co-culture with different cyanobacterial strains. Gene expression patterns of organic and inorganic nutrient transporter and catabolism genes in EZ55 suggested resources available in the culture media were altered under elevated (800 ppm) pCO2 conditions. Altogether, changing expression patterns were consistent with the possibility that the composition of cyanobacterial excretions changed under the two pCO2 regimes, causing extensive ecophysiological changes in both members of the co-cultures. Additionally, significant downregulation of oxidative stress genes in MIT9312/EZ55 cocultures at 800 ppm pCO2 were consistent with a link between the predicted reduced availability of photorespiratory byproducts (i.e., glycolate/2PG) under this condition and observed reductions in internal oxidative stress loads for EZ55, providing a possible explanation for the previously observed lack of “help” provided by EZ55 to MIT9312 under elevated pCO2. The data and code stored in this archive will allow the reconstruction of our analysis pipelines. Additionally, we provide annotation mapping files and other resources for conducting transcriptomic analyses with Alteromonas sp. EZ55.
Dataset created in the study "A Spatial Transcriptomics Atlas of the Malaria-infected Liver Indicates a Crucial Role for Lipid Metabolism and Hotspots of Inflammatory Cell Infiltration"
Structure
ST_berghei_liver
contains data generated during stpipeline analysis and imaging on the 2k arrays Spatial Transcriptomics platform, as well as data necessary for and resulting from hepaquery analysis. These samples include 38 sections in total, of which 8 are from mice (n=4) infected with sporozoites for 12h, 5 sections from control mice (n=3) at 12h, 7 sections from mice (n=4) infected with sporozoites for 24h and 4 sections from control mice (n=3) at 24h, as well as 8 samples of mice (n=2) infected with sporozoites for 38h and control mice (n=2) for 38h.
STUtiility_mus_pb_ST.RDS contains the Seurat object generated with the STUtility package from the ST data of the 38 liver sections whose data is stored in ST_berghei_liver
visium_berghei_liver
contains data generated with the spaceranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include 8 sections in total, of which 1 was infected with sporozoites for 12h, 1 is a control section at 12h, 1 was infected with sporozoites for 24h and 1 is a control section at 24h, as well as 2 sporozoite-infected sections and 2 control sections at 38h.
V10S29-135_B1 contains spaceranger output for section 1 for infected and control sections at 12h post-infection
V10S29-135_C1 contains spaceranger output for section 1 for infected and control sections at 24h post-infection
V10S29-135_D1 contains spaceranger output for section 2 for infected and control sections at 38h post-infection
se_visium.RDS contains the Seurat object generated with the STUtility package from the ST data of the 38 liver sections whose data is stored in visium_berghei_liver
snSeq_berghei_liver
contains data generated with the cellranger pipeline and imaging using the Visium spatial transcriptomics platform. These samples include single nuclei of 2 infected and control mice after 12h, 2 infected and control mice after 24h, 2 infected and control mice after 38h, and 2 uninfected mice prior to a challenge.
cellranger_cnt_out contains feature count matrix information from cell ranger output
final_merged_curated_annotations_270623.RDS contains the Seurat object generated with the STUtility package from the ST data of the 38 liver sections whose data is stored in snSeq_berghei_liver.tar.gz
raw images.zip contains raw images for supplementary figures 20-22
adjusted images.zip contains brightness and contrast adjusted images for supplementary figures 20-22
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Spatial Transcriptomics (ST) data matching with Matrix Assisted Laser Desorption/Ionization - Mass Spectrometry Imaging (MALDI-MSI). This data is complementary to data contained in the same project. Files with the same identifiers in the two datasets originated from the very same tissue section and can be combined in a multimodal ST-MSI object. For more information about the dataset please see our manuscript posted on bioRxiv (doi: https://doi.org/10.1101/2023.01.26.525195). This dataset includes ST data from 19 tissue sections, including human post-mortem and mouse samples. The spatial transcriptomics data was generated using the Visium protocol (10x Genomics). The murine tissue sections come from three different mice unilaterally injected with 6-OHDA. 6-OHDA is a neurotoxin that, when injected in the brain, can selectively destroy dopaminergic neurons. We used this mouse model to show the applicability of the technology that we developed, named Spatial Multimodal Analysis (SMA). Using our technology on these mouse brain tissue sections we were able to detect both dopamine with MALDI-MSI and the corresponding gene expression with ST. This dataset also includes one human post-mortem striatum sample that was placed on one Visium slide across the four capture areas. This sample was analyzed with a different ST protocol named RRST (Mirzazadeh, R., Andrusivova, Z., Larsson, L. et al. Spatially resolved transcriptomic profiling of degraded and challenging fresh frozen samples. Nat Commun 14, 509 (2023). https://doi.org/10.1038/s41467-023-36071-5), where probes capturing the whole transcriptome are first hybridized in the tissue section and then spatially detected. Each tissue section contained in the dataset has been given a unique identifier that is composed of the Visium array ID and capture area ID of the Visium slide that the tissue section was placed on.
This unique identifier is included in the file names of all the files belonging to the same tissue section, including the MALDI-MSI files published in the other dataset included in this project. In this dataset you will find the following files for each tissue section:
- raw files: these are the read one fastq files (containing the pattern *R1*fastq.gz in the file name), read two fastq files (containing the pattern *R2*fastq.gz in the file name) and the raw microscope images (containing the pattern Spot.jpg in the file name). These are the only files needed to run the Space Ranger pipeline, which is freely available for any user (please see the 10x Genomics website for information on how to install and run Space Ranger);
- processed data files: we provide processed data files of three types: a) Space Ranger outputs that were used to produce the figures in our publication; b) manual annotation tables in csv format produced using Loupe Browser 6 (csv tables with file names ending in _RegionLoupe.csv, _filter.csv, _dopamine.csv, _lesion.csv or _region.csv); c) json files that we used as input for Space Ranger in the cases where the automatic tissue detection included in the pipeline failed to recognize the tissue or the fiducials.
Using these processed files the user can reproduce the figures of our publication without having to restart from the raw data files. The MALDI-MSI analyses preceding ST were performed with different matrices in different tissue sections. We used 1) 9-aminoacridine (9-AA) for detection of metabolites in negative ionization mode, 2) 2,5-dihydroxybenzoic acid (DHB) for detection of metabolites in positive ionization mode, 3) 4-(anthracen-9-yl)-2-fluoro-1-ethylpyridin-1-ium iodide (FMP-10), which charge-tags molecules with phenolic hydroxyls and/or primary amines, including neurotransmitters. The information about which matrix was sprayed on the tissue sections and other information about the samples is included in the metadata table.
We also used three types of control samples:
- standard Visium: samples processed with standard Visium (i.e. no matrix spraying, no MALDI-MSI, protocol as recommended by 10x Genomics with no exceptions)
- internal controls (iCTRL): samples neither sprayed with any matrix nor processed with MALDI-MSI, but located on the same Visium slide where other samples were processed with MALDI-MSI
- FMP-10-iCTRL: sample sprayed with FMP-10, and then processed as an iCTRL.
This and other information is provided in the metadata table.
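Because every file name carries the tissue-section identifier, the raw Space Ranger inputs for one section can be collected with simple glob patterns. The following is only a sketch: the function name `raw_files_for_section` is hypothetical, it assumes that all files sit in one directory, that the identifier precedes the read or image suffix in the file name, and that read-two files follow the usual *R2*fastq.gz naming.

```bash
#!/bin/bash
# Hypothetical helper: list the raw Space Ranger inputs for one tissue section.
# Assumptions: a flat directory, and file names of the form
# <anything><identifier><anything>{R1,R2}<anything>fastq.gz or ...Spot.jpg.
raw_files_for_section() {
  local dir="$1" id="$2" f
  for f in "$dir"/*"$id"*R1*fastq.gz "$dir"/*"$id"*R2*fastq.gz "$dir"/*"$id"*Spot.jpg; do
    # Skip patterns that matched nothing (the glob is then left unexpanded).
    [ -e "$f" ] && printf '%s\n' "${f##*/}"
  done
  return 0
}

# Example use (the identifier shown is made up):
#   raw_files_for_section ./raw V00X00-000_A1
```

The output lists read-one files first, then read-two files, then images, which makes it easy to check that a section has all three kinds of input before running Space Ranger.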
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. Supplementary Table 1. A configuration table listing key parameters used in CoRMAP.
Additional file 6. Table 5: GOterms "peripheral nervous system development". Gene ontology term containing 82 genes involved in establishment of blood-nerve barrier, lateral line ganglion development, peripheral nervous system neuron differentiation, postganglionic parasympathetic fiber development and Schwann cell differentiation.
Each of 70 cell samples, either at the control condition or treated with FDA-approved cancer drugs, is sequenced by the single-ended random-primed mRNA-sequencing method with a read length of 100 base pairs, and a total of 70 raw sequence data files in the FASTQ format are generated. These sequence data files are then analyzed by a high-performance computational pipeline, and ranked lists of gene signatures and biological processes related to drug-induced cardiotoxicity are generated for each drug. The raw sequence datasets and the analysis results have been carefully controlled for data quality, and they are made publicly available at the Gene Expression Omnibus (GEO) database repository of NIH. As such, this broad drug-stimulated transcriptomic dataset is valuable for the prediction of drug toxicities and their mitigations.
Background
Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or, in the case of microarray analysis, preprocessing, normalization and differential gene expression, in order to give a global insight into pipeline performances.
Methods
Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data.
Results
The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. Tophat’s overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had an overall mapping rate of only 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). The exception was Cufflinks, whose correlation results were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results.
Conclusion
In conclusion, the combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and by Sailfish, generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.
Single-cell RNA sequencing (scRNA-seq) has advanced our understanding of cell types and their heterogeneity within the human liver, but the spatial organization at single-cell resolution has not yet been described. Here we apply multiplexed error robust fluorescent in situ hybridization (MERFISH) to map the zonal distribution of hepatocytes, resolve subsets of macrophage and mesenchymal populations, and investigate the relationship between hepatocyte ploidy and gene expression within the healthy human liver. We next integrated spatial information from MERFISH with the more complete transcriptome produced by single-nucleus RNA sequencing (snRNA-seq), revealing zonally enriched receptor-ligand interactions. Finally, analysis of fibrotic liver samples identified two hepatocyte populations that are not restricted to zonal distribution and expand with injury. Together these spatial maps of the healthy and fibrotic liver provide a deeper understanding of the cellular and spatial remodeling t...
Two measurement modalities were used to generate these data, including multiplexed error robust fluorescence in situ hybridization (MERFISH) and single-nucleus RNA sequencing (snRNAseq).
# MERFISH and snRNAseq data from Watson, Paul et al
This README file contains information on the data deposited for the manuscript "Spatial transcriptomics of healthy and fibrotic human liver at single-cell resolution" by Watson, Paul and colleagues.
Multiple anndata structures are provided as h5ad files for different datasets. These anndata structures were generated with the scanpy pipeline (v1.8.1) and can be loaded in Python with the associated tools. These include: (1) adata_healthy_merfish.h5ad (2) adata_healthy_diseased_merfish.h5ad (3) adata_healthy_merfish_nucseq.h5ad (4) adata_healthy_nucseq.h5ad
Each anndata frame contains distinctive values for the respective data set as follows:
(1) adata_healthy_merfish.h5ad This structure contains data from healthy patient samples which were imaged with MERFISH. Raw data is stored in the adata.raw.X while adata.X is normalized by the total counts per cell, scaled to a uniform value, and then converted to logarithm...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Counts, lengths, TPM, and FPKM per gene and per transcript. All CCLE and NCI-60 cell lines are specified as Cellosaurus IDs.
Everything was re-processed with the nf-core/rnaseq pipeline (version 3.10.1) in the STAR/Salmon setting. GRCh38 was used for the human fastq files (CCLE, NCI-60) and GRCm39 for the mouse files.
CCLE (1019 cell lines)
Raw fastq files were downloaded from the NCBI SRA Run selector as BioProject PRJNA523380 using the SRA toolkit. Sequences were first prefetched and then the fastq files were generated with:
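```{bash}
#!/bin/bash
# Convert each prefetched run listed in the accession file to gzipped FASTQ.
while read -r run
do
  echo "$run"
  fasterq-dump "$run"
  echo gzipping
  gzip "$run"*.fastq
done < SRR_Acc_List_CCLE.txt
```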
Then, the directories created by the prefetch step were deleted.
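The whole cycle described above (prefetch, conversion to FASTQ, compression, clean-up) could look roughly like the following for a single run. This is a sketch rather than the script that was actually used: the function name `fetch_run` is hypothetical, and it assumes that `prefetch` stores each run in a directory named after its accession inside the current working directory, which is the SRA toolkit's default layout.

```bash
#!/bin/bash
# Hypothetical helper: download one SRA run, convert it to gzipped FASTQ,
# and delete the prefetched run directory afterwards.
fetch_run() {
  local run="$1"
  prefetch "$run" || return 1        # assumed to write ./<run>/
  fasterq-dump "$run" || return 1    # writes <run>.fastq or <run>_1.fastq and <run>_2.fastq
  gzip "$run"*.fastq || return 1     # the directory ./<run> does not match this glob
  rm -rf -- "./$run"                 # remove the prefetched directory
}

# Example use with the accession list named above:
#   while read -r run; do fetch_run "$run"; done < SRR_Acc_List_CCLE.txt
```

Stopping at the first failing step keeps a partially downloaded run directory in place, so a failed accession can be inspected or retried.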
Afterwards, the FASTQ files were processed with the nf-core/rnaseq pipeline using this command:
```{bash}
nextflow run nf-core/rnaseq --input CCLE_samplesheet.csv --outdir CCLE/nf_core/ --multiqc_title CCLE_star_salmon -c CCLE_nextflow.config -profile singularity,slurm --fasta ensembl107_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf ensembl107_GRCh38/Homo_sapiens.GRCh38.107.gtf -r 3.10.1
```
For the output data, SRR accession numbers were mapped back to the cell line names using the SRARunTable metadata information. These cell line names were then mapped to Cellosaurus IDs.
The metadata file contains the Cellosaurus ID, the SRR accession numbers, the cell line names, metadata from SRA (BioProject, BioSample, Experiment), and metadata from Cellosaurus (cell line name, synonyms, diseases, cross references, BTO ID, CLO ID, sex, category, organism, comments).
NCI-60 (60 cell lines)
As for CCLE, fastq files were downloaded from the NCBI SRA Run selector as BioProject PRJNA433861 using the SRA toolkit.
Afterwards, the FASTQ files were processed with the nf-core/rnaseq pipeline using the same command settings as above:
```{bash}
nextflow run nf-core/rnaseq --input NCI60_samplesheet.csv --outdir NCI60/nf_core/ --multiqc_title NCI60_star_salmon -c NCI60_nextflow.config -profile singularity,slurm --fasta ensembl107_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf ensembl107_GRCh38/Homo_sapiens.GRCh38.107.gtf -r 3.10.1
```
For the output data, SRR accession numbers were mapped back to the cell line names using the SRARunTable metadata information. These cell line names were then mapped to Cellosaurus IDs.
The metadata file contains the Cellosaurus ID, the SRR accession numbers, the cell line names, metadata from SRA (BioProject, BioSample, Experiment), and metadata from Cellosaurus (cell line name, synonyms, diseases, cross references, BTO ID, CLO ID, sex, category, organism, comments).
PDAC mouse data (401 samples)
These data were generated by the MRI (University Hospital rechts der Isar, Munich). The data generation strategy is described in PMC6097607. The read_1 FASTQ files contain all the cDNA, while the read_2 FASTQ files contain only UMIs. Hence, only the read_1 files were used.
The FASTQ files were processed with the nf-core/rnaseq pipeline using this command:
```{bash}
nextflow run nf-core/rnaseq --input MRI_PDAC_samplesheet.csv --outdir MRI_PDAC/nf_core/ --multiqc_title MRI_star_salmon -c MRI_PDAC_nextflow.config -profile singularity,slurm --fasta ensembl110_GRCm39/Mus_musculus.GRCm39.dna_sm.primary_assembly.fa.gz --gtf ensembl110_GRCm39/Mus_musculus.GRCm39.110.gtf.gz -r 3.10.1
```
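Because only read_1 is used, the samplesheet passed via `--input` lists each sample as single-end. A minimal sketch of how such a file could be generated is shown here. The function name, the `<sample>_read_1.fastq.gz` naming pattern and the directory layout are assumptions for illustration, and the strandedness value must be chosen to match the actual library preparation.

```bash
#!/usr/bin/env bash
# Sketch: write a single-end nf-core/rnaseq samplesheet from read_1 files only.
# Assumptions: files are named <sample>_read_1.fastq.gz (hypothetical naming)
# and the library strandedness is supplied by the caller.

make_single_end_samplesheet() {
    local fastq_dir="$1"
    local strandedness="$2"    # forward, reverse or unstranded
    local fq sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for fq in "$fastq_dir"/*_read_1.fastq.gz; do
        [ -e "$fq" ] || continue    # skip if the glob matched nothing
        sample=$(basename "$fq" _read_1.fastq.gz)
        # fastq_2 is left empty so the sample is treated as single-end
        echo "${sample},${fq},,${strandedness}"
    done
}
```

For example, `make_single_end_samplesheet fastq/ <strandedness> > MRI_PDAC_samplesheet.csv` would produce one row per read_1 file and ignore the read_2 files that hold only UMIs.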
The metadata file contains information about the experiments and the oncogenes, genotypes and morphology (epithelial/mesenchymal/fibroblast contamination).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Clinical symptoms that persist for at least three months after infection by SARS-CoV-2, i.e. post-acute sequelae of SARS-CoV-2 (PASC), are an escalating global health problem. The mechanisms underlying post-COVID are still unclear; in particular, there is a lack of large studies of patients whose chronic symptoms persist for several years after a mild COVID-19 infection. The aim of this study was to investigate possible molecular signatures and persistent SARS-CoV-2 gene fragments in patients with PASC up to 28 months after a mild infection.
Summary
We analyzed the gene expression profile in PBMCs from 60 middle-aged post-COVID patients and 50 age-matched controls, all of whom experienced a mild SARS-CoV-2 infection between March 2020 and February 2022. The uploaded data consist of a count table and sample information and can be used for gene expression analysis of patients and controls.
Generation of Data
Sequencing libraries were prepared from 500ng/μg of polyA selected RNA using the TruSeq stranded mRNA library preparation kit (cat# 20020595, Illumina Inc.). Unique dual indexes (cat# 20022371, Illumina Inc.) were used. The library preparation was performed according to the manufacturer's protocol (#1000000040498). Sequencing was performed with paired-end 150 bp reads on a NovaSeq X Plus system, using a 10B flow cell and XLEAP-SBS sequencing chemistry. Samples were analyzed with the nf-core RNA sequencing pipeline release 3.15.1 (nf-co.re/rnaseq). In brief, the pipeline processes raw data from FastQ inputs, aligns the reads, generates counts relative to genes or transcripts and performs extensive quality control of the results.
Data
Count Data: counts generated by the nf-core RNA sequencing pipeline release 3.15.1 (nf-co.re/rnaseq), as described under Generation of Data.
Sample Data: Information regarding SampleName, Sample and Condition
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset associated with:
Adrien Hallou, Ruiyang He, Benjamin David Simons and Bianca Dumitrascu. A computational pipeline for spatial mechano-transcriptomics. bioRxiv 2023.08.03.551894; doi: https://doi.org/10.1101/2023.08.03.551894
Licence
This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.