https://spdx.org/licenses/etalab-2.0.html
Six tomato accessions with contrasting responses to high temperature (HT) were selected. The ovary transcriptomes of plants grown under HT and control conditions were analyzed and differentially expressed genes were identified. Plant materials: For the RNA-seq study, six accessions were used. Three are tolerant to heat stress: Cervil (one of the parents of the MAGIC population), the F1 hybrid Cervil x Levovil (CerLev) and Nagcarlan, which is known to be tolerant to heat stress (Xu et al., 2017a). Three are susceptible to heat stress, chosen among the MAGIC lines (MT102, MT54) and their parental lines (Levovil). Plants were grown in a greenhouse in Avignon in 2018, under the same conditions as the core collection (control and HT conditions) described in Bineau et al. (submitted). RNA extraction: Ovaries were collected from flowers of the six accessions just before the petals fully opened (1-2 days after pollination) and immediately frozen in liquid nitrogen. Sampling was performed over three weeks, with each weekly sample corresponding to one biological replicate. The three biological replicates per accession (each corresponding to a minimum of 5 ovaries) were ground separately. RNA was extracted using the Spectrum Plant Total RNA kit (Sigma-Aldrich, Saint-Quentin Fallavier, France) following the manufacturer's protocol and treated with the On-Column DNase I Digestion Set (Sigma-Aldrich) to remove any remaining genomic DNA. RNA purity and integrity were assessed on a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Illkirch, France) and a Bioanalyzer 2100 (Agilent Technologies, Les Ulis, France), respectively. RNAseq: Library construction and sequencing (100 bp paired-end) were subcontracted to BGI. The minimal, maximal, and average amounts of raw sequencing data per sample were estimated to be 12.2 × 10⁹ bp, 13.6 × 10⁹ bp, and 12.56 × 10⁹ bp, respectively.
Raw sequencing data quality was assessed using FASTQC v.0.11.8 (Andrews, 2011) and the reports were aggregated with MULTIQC v.1.7 (Ewels et al., 2016). Sequences were trimmed using FASTP v.0.20.0 (Chen et al., 2018). On average, the cleaning steps removed 4.06% of the data (min = 4.04%, max = 4.08%). The remaining data were aligned to the tomato reference genome (Heinz 1706, v.4.0) using STAR v.2.6.1b in two-pass mode, with the tomato gene models (annotation v4.1) provided to support the mapping process. Alignments were filtered to keep only concordantly mapped reads using Samtools v.1.9 (Li et al., 2009), and read counts per gene were generated for each library using HTSEQ v.0.11.2 (Anders et al., 2015). On average, 94.8% of read pairs were uniquely mapped per library (min = 91.5%, max = 96.2%) and 2.1% were multi-mapped (min = 1.7%, max = 2.7%). The differential gene expression analysis was performed using the DiCoExpress workspace (Lambert et al., 2020). For further explanations, we refer to the DiCoExpress manual. The dataset contains the raw data, the R scripts used, and the main results of the analysis. To obtain specific contrasts for a single accession, the scripts must be rerun.
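As a rough illustration of the sequencing volumes quoted above, the per-sample base counts can be converted into approximate numbers of read pairs. This is only a sketch: it assumes every read is a full 100 bp and every fragment yields two reads, and the function and constant names are hypothetical, not part of the dataset's scripts.

```python
# Sketch: convert the reported raw data volume per sample into read pairs.
# Assumption (not from the source): all reads are full-length 100 bp reads.

READ_LENGTH_BP = 100   # 100 bp paired-end sequencing, as stated in the description
READS_PER_PAIR = 2     # one forward and one reverse read per fragment

# Raw data volume per sample (in bases), as reported in the description.
RAW_BASES = {"min": 12.2e9, "max": 13.6e9, "mean": 12.56e9}


def estimate_read_pairs(total_bases, read_length=READ_LENGTH_BP,
                        reads_per_pair=READS_PER_PAIR):
    """Return the approximate number of read pairs for a given base count."""
    return total_bases / (read_length * reads_per_pair)


for label, bases in RAW_BASES.items():
    pairs = estimate_read_pairs(bases)
    print(f"{label}: about {pairs / 1e6:.1f} million read pairs")
```

Under these assumptions the average sample corresponds to roughly 63 million read pairs before trimming.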
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of RNA-seq and sRNA-seq datasets from four libraries.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset details the scRNASeq and TCR-Seq analysis of sorted PD-1+ CD8+ T cells from patients with melanoma treated with checkpoint therapy (anti-PD-1 monotherapy and anti-PD-1 & anti-CTLA-4 combination therapy) at baseline and after the first cycle of therapy. A major publication using this dataset is accessible here: (reference)
*experimental design
Single-cell RNA sequencing was performed using 10x Genomics with feature barcoding technology to multiplex cell samples from different patients undergoing mono or dual therapy, so that they could be loaded into one well to reduce costs and minimize technical variability. Hashtag oligomers (oligos) were obtained from BioLegend, purified and already oligo-conjugated, in TotalSeq-C format. Cells were thawed and counted, and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibodies were used at 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 × g for 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS, filtered through 40 µm cell strainers and then sorted. Sorted cells were counted and approximately 75,000 cells were processed through the 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled in the quantities needed to obtain sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2 × 100 bp paired-end format.
*extract protocol
PBMCs were thawed and counted, and 20 million cells per patient and time point were used for staining. Cells were stained with barcoded antibodies together with a staining solution containing antibodies against CD3, CD4, CD8, PD-1/IgG4 and fixable viability dye (eBioscience) prior to FACS sorting. Barcoded antibodies were used at 0.5 µg per million cells, as recommended by the manufacturer (BioLegend) for flow cytometry applications. After staining, cells were washed twice in PBS containing 2% BSA and 0.01% Tween 20, followed by centrifugation (300 × g for 5 min at 4 °C) and supernatant exchange. After the final wash, cells were resuspended in PBS, filtered through 40 µm cell strainers and then sorted. Sorted cells were counted and approximately 75,000 cells were processed through the 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions.
*library construction protocol
Sorted cells were counted and approximately 75,000 cells were processed through 10x Genomics single-cell V(D)J workflow according to the manufacturer’s instructions. Gene expression, hashing and TCR libraries were pooled to desired quantities to obtain the sequencing depths of 15,000 reads per cell for gene expression libraries and 5,000 reads per cell for hashing and TCR libraries. Libraries were sequenced on a NovaSeq 6000 flow cell in a 2X100 paired-end format.
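The per-cell sequencing depths stated above translate into a total read budget once a number of recovered cells is assumed. The sketch below shows that arithmetic; the figure of 10,000 recovered cells is purely hypothetical (the description only gives the number of cells processed), and the names are illustrative.

```python
# Sketch: total reads needed per library type for a given number of cells.
# Target depths come from the description; the cell count is an assumption.

READS_PER_CELL = {
    "gene_expression": 15_000,  # reads per cell, gene expression library
    "hashing": 5_000,           # reads per cell, hashtag (HTO) library
    "tcr": 5_000,               # reads per cell, TCR library
}


def read_budget(n_cells, reads_per_cell=READS_PER_CELL):
    """Return the number of reads required per library for n_cells cells."""
    return {library: n_cells * depth for library, depth in reads_per_cell.items()}


hypothetical_cells = 10_000  # assumed number of recovered cells, not from the source
budget = read_budget(hypothetical_cells)
for library, reads in budget.items():
    print(f"{library}: {reads:,} reads")
print(f"total: {sum(budget.values()):,} reads")
```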
*library strategy
scRNA-seq and scTCR-seq
*data processing step
Pre-processing of sequencing results to generate count matrices (gene expression and HTO barcode counts) was performed using the 10x Genomics Cell Ranger pipeline.
Further processing was done with Seurat (cell and gene filtering, hashtag identification, clustering, differential gene expression analysis based on gene expression).
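To make the idea of hashtag identification concrete, the following sketch assigns a cell to a sample from its HTO counts with a simple count-and-ratio rule. It is a deliberately simplified heuristic, not the method Seurat uses, and the thresholds (`min_count`, `min_ratio`) and hashtag names are assumptions chosen only for illustration.

```python
# Sketch: naive hashtag (HTO) assignment for one cell.
# NOT Seurat's algorithm; a simplified threshold rule for illustration only.


def classify_cell(hto_counts, min_count=10, min_ratio=3.0):
    """Return the winning hashtag name, 'doublet', or 'negative'.

    hto_counts maps hashtag name -> raw count for a single cell.
    min_count and min_ratio are hypothetical thresholds.
    """
    ranked = sorted(hto_counts.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < min_count:
        return "negative"  # no hashtag detected above background
    top_tag, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    # Two hashtags both clearly present and of similar magnitude: likely a doublet.
    if second_count >= min_count and top_count < min_ratio * second_count:
        return "doublet"
    return top_tag


print(classify_cell({"HASH_1": 120, "HASH_2": 3}))   # clear single hashtag
print(classify_cell({"HASH_1": 100, "HASH_2": 50}))  # two hashtags present
print(classify_cell({"HASH_1": 4, "HASH_2": 2}))     # background only
```

In practice a dedicated, statistically grounded demultiplexing method should be preferred over fixed thresholds like these.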
*genome build/assembly
Alignment was performed using prebuilt Cell Ranger human reference GRCh38.
*processed data files format and content
RNA counts and HTO counts are in sparse matrix format and TCR clonotypes are in csv format.
Datasets were merged and analyzed by Seurat and the analyzed objects are in rds format.
file name | file checksum
PD1CD8_160421_filtered_feature_bc_matrix.zip | da2e006d2b39485fd8cf8701742c6d77
PD1CD8_190421_filtered_feature_bc_matrix.zip | e125fc5031899bba71e1171888d78205
PD1CD8_160421_filtered_contig_annotations.csv | 927241805d507204fbe9ef7045d0ccf4
PD1CD8_190421_filtered_contig_annotations.csv | 8ca544d27f06e66592b567d3ab86551e
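The listed checksums can be used to confirm that downloaded files are intact. The sketch below assumes the checksums are MD5 digests (they are 32 hexadecimal characters, but the source does not name the algorithm) and that the files sit in a local directory; the function names are illustrative.

```python
# Sketch: verify downloaded files against the listed checksums.
# Assumption: the checksums are MD5 digests (the source does not say so explicitly).
import hashlib
from pathlib import Path

EXPECTED_CHECKSUMS = {
    "PD1CD8_160421_filtered_feature_bc_matrix.zip": "da2e006d2b39485fd8cf8701742c6d77",
    "PD1CD8_190421_filtered_feature_bc_matrix.zip": "e125fc5031899bba71e1171888d78205",
    "PD1CD8_160421_filtered_contig_annotations.csv": "927241805d507204fbe9ef7045d0ccf4",
    "PD1CD8_190421_filtered_contig_annotations.csv": "8ca544d27f06e66592b567d3ab86551e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_directory(directory, expected=EXPECTED_CHECKSUMS):
    """Return {file name: True/False}; False if the file is missing or differs."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling `verify_directory(".")` in the download folder returns a dictionary marking each file as verified or not.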
*processed data file | antibodies/tags
PD1CD8_160421_filtered_feature_bc_matrix.zip | none
PD1CD8_160421_filtered_feature_bc_matrix.zip | TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M1_base_monotherapy
PD1CD8_160421_filtered_contig_annotations.csv | none
PD1CD8_190421_filtered_feature_bc_matrix.zip | none
PD1CD8_190421_filtered_feature_bc_matrix.zip | TotalSeq™-C0251 anti-human Hashtag 1 Antibody - (HASH_1) - M2_base_monotherapy
PD1CD8_190421_filtered_contig_annotations.csv | none
This dataset contains all the results of the analysis of the sequencing data, performed using 10x Genomics Cell Ranger and custom R scripts. The analysis code is available on GitHub (https://github.com/CIML-bioinformatic/MIlab_LTaTreg-Thymus). Description of the sample production: Single-cell suspensions of thymus were obtained by pressing organs through a 70-μm nylon mesh cell strainer with a plastic plunger in PBS containing 5% BSA and 1 mM EDTA. Thymic cell suspensions were then incubated with red blood cell lysis buffer for 3 minutes at room temperature and enriched for CD4+ thymocytes by depletion of CD8+ and CD11c+ cells using the AutoMACS® Pro Separator with the Deplete program (Miltenyi). We used cell hashing with hashtag oligonucleotides (HTO) to multiplex the two samples from 2 individual mice. Cell surface staining used for gating cells from the Treg lineage and staining with distinct barcoded anti-mouse CD45 antibodies (BioLegend; A0304 and A0305 for Foxp3eGFP and Foxp3eGFPxLta-/- Treg cells, respectively) were performed in PBS containing 5% BSA and 1 mM EDTA for 30 min on ice. For each sample, cells from the Treg lineage (Live Dead-CD4+CD8-CCR6-CD3e+) expressing either CD25, Foxp3eGFP or both were bulk-sorted with a BD FACS Aria II. Sorted cell samples (20,000 Foxp3eGFP Treg cells and 20,000 Foxp3eGFPxLta-/- Treg cells) were pooled with a target of 10,000 captured cells and loaded in a single capture well for the subsequent 10x Genomics Single Cell 3’ v3.1 workflow. Library preparation was performed according to the manufacturer’s instructions (single cell 3’ v3.1 protocol, 10x Genomics). Briefly, cells were resuspended in the master mix and loaded together with partitioning oil and gel beads into the chip to generate gel beads-in-emulsion (GEMs). The poly-A RNA from the cell lysate contained in each GEM was reverse-transcribed to cDNA, which contains an Illumina R1 primer sequence, a Unique Molecular Identifier (UMI) and the 10x Barcode.
The pooled barcoded cDNA was then cleaned up with Silane DynaBeads and amplified by PCR, and the appropriately sized fragments were selected with SPRIselect reagent. The pellet and supernatant fractions were separated for subsequent HTO and gene expression library construction. During library construction, the Illumina R2 primer sequence, paired-end constructs with P5 and P7 sequences and a sample index were added. The HTO library was constructed using Truseq D701 and D702 sequences containing i7 indexes. The resulting libraries were pooled and sequenced on an Illumina NextSeq2000 platform with a P2 flow cell (100 cycles). mRNA and Cell Hashing FASTQ raw files were processed using Cell Ranger v6.0.1 (10x Genomics Inc.) with default parameters to perform alignment, filtering, barcode counting and unique molecular identifier (UMI) counting. Reads were aligned to the mouse mm10 genome. A total of 4,798 cells were identified, with a mean of 44,015 reads per cell and a median of 1,992 genes per cell.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
Background: Assembling genes from next-generation sequencing data is not only time consuming but computationally difficult, particularly for taxa without a closely related reference genome. Assembling even a draft genome using de novo approaches can take days, even on a powerful computer, and these assemblies typically require data from a variety of genomic libraries. Here we describe software that will alleviate these issues by rapidly assembling genes from distantly related taxa using a single library of paired-end reads: aTRAM, automated Target Restricted Assembly Method. The aTRAM pipeline uses a reference sequence, BLAST, and an iterative approach to target and locally assemble the genes of interest.
Results: Our results demonstrate that aTRAM rapidly assembles genes across distantly related taxa. In comparative tests with a closely related taxon, aTRAM assembled the same sequence as reference-based and de novo approaches, taking on average < 1 min per gene. As a test case with divergent sequences, we assembled >1,000 genes from six taxa ranging from 25 – 110 million years divergent from the reference taxon. The gene recovery was between 97 – 99% from each taxon.
Conclusions: aTRAM can quickly assemble genes across distantly-related taxa, obviating the need for draft genome assembly of all taxa of interest. Because aTRAM uses a targeted approach, loci can be assembled in minutes depending on the size of the target. Our results suggest that this software will be useful in rapidly assembling genes for phylogenomic projects covering a wide taxonomic range, as well as other applications. The software is freely available: http://www.github.com/juliema/aTRAM
Usage notes
Alignments from Pediculus schaeffi assemblies
This zip file contains folders with alignments from the assembly of Pediculus schaeffi genes using aTRAM, reference-based and de novo approaches.
The alignments include each of these assemblies compared with the reference species Pediculus humanus as well as to each of the other assembly methods.
DRYAD_PedSchaeffi_Data.zip
Alignments from aTRAM assemblies across taxa
Alignments of 1,107 aTRAM gene assemblies from six taxa. Each gene and each taxon is aligned to the reference sequence of Pediculus humanus.
DRYAD_AllTaxon_Alignments.zip
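The iterative, target-restricted idea described in the abstract (find reads matching a reference, then use what was found as the next query) can be illustrated with a toy example. The sketch below is not aTRAM's implementation: it replaces BLAST with exact shared k-mers, skips the assembly step entirely, and uses an invented sequence; all names are hypothetical.

```python
# Toy sketch of iterative, target-restricted read recruitment.
# NOT aTRAM itself: BLAST is replaced by exact shared k-mers and no assembly is done.


def kmers(sequence, k):
    """Return the set of all k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def recruit_reads(seed, reads, k=6, max_iterations=5):
    """Iteratively collect reads that share a k-mer with the seed or with
    reads recruited in earlier iterations."""
    bait = kmers(seed, k)
    recruited = set()
    for _ in range(max_iterations):
        new_hits = {
            index for index, read in enumerate(reads)
            if index not in recruited and kmers(read, k) & bait
        }
        if not new_hits:
            break  # nothing new matched: the target stopped growing
        recruited |= new_hits
        for index in new_hits:
            bait |= kmers(reads[index], k)  # recruited reads become new bait
    return [reads[index] for index in sorted(recruited)]


# Invented example locus; the seed covers only its first 12 bases.
LOCUS = "ATGGCGTACCTGAAGTCCGATTGACCAGTTCGGA"
SEED = LOCUS[:12]
READS = [LOCUS[6:20], LOCUS[14:28], LOCUS[22:34], "TTTTTTTTTTTTTT"]

print(recruit_reads(SEED, READS, max_iterations=1))  # only the read overlapping the seed
print(recruit_reads(SEED, READS))                    # all three reads from the locus
```

A single pass recovers only reads overlapping the seed, whereas iterating extends the recruited set along the locus, which is the point of the iterative approach.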
The HMAP database (http://www.hull.ac.uk/hmap) is an open access facility that currently comprises time series of commercial catches covering the period 1611-2000. It is a growing resource that extends to more than 240,000 records and more than 100 species. Data are mostly recovered from archives, tax records, custom records or surveys. The facility includes a web guide to the database (the Data Directory) and a web library of dataset downloads (the Data Library), while users can create customized datasets through the HMAP Portal, which is an interactive facility for searching the database. A significant proportion of these holdings are currently available through OBIS. HMAP is a distributed data contributor and the constituent datasets have been mapped to the OBIS schema using DiGIR since 2004.
The HMAP program (http://www.hmapcoml.org) is the historical component of the Census of Marine Life (CoML). It is a multidisciplinary, collaborative project which aims to enhance knowledge and understanding of how and why the diversity, distribution and abundance of marine life in the world's oceans changes over the long term. The HMAP program is currently composed of 9 datasets, 3 of which focus on trawl records from Southeast Australia, one on world whaling, 2 on Northwest Atlantic, and 3 on catch data from Norwegian and North and Baltic seas.
https://spdx.org/licenses/CC0-1.0.html
Sugar kelp (Saccharina latissima) is an ecologically and increasingly economically important kelp, distributed from temperate to Arctic rocky shores. However, S. latissima is presently threatened by ongoing climate changes. Genetic variations have previously been identified across S. latissima populations. However, little is known regarding the genetic basis for adaptation and acclimation to different environmental conditions. In this study, a common garden experiment was performed with sporophytes originating from North-Norway (NN), Mid-Norway (MN), and South-Norway (SN), representing areas with highly different temperatures and photoperiods. Transcriptomic analyses revealed significant variation in the gene expression of cultures from North-Norway, associated with low temperature and long photoperiods, compared to Mid- and South-Norway. Genes that were differentially expressed under different photoperiod and temperature conditions included genes linked to photosynthesis, chlorophyll biosynthesis, heat response, growth, protein synthesis, and translation. However, the transcriptional responses to variations in photoperiod and temperature differed between different populations of S. latissima (NN, MN, and SN), indicating genotypic adaptations. Overall, our study provides deeper insight into the local adaptations of S. latissima populations along the Norwegian Coast with implications for the conservation of natural populations. Methods A common garden experiment (CGE) was carried out between 2 February – 13 March 2018 at the University of Bergen, with material of S. latissima collected in North-Norway (69° 38’ N, 18° 57’ E) (NN), Mid-Norway (63° 43’ N, 8° 49’ E) (MN) and South-Norway (60° 16’ N, 5° 13’ E) (SN). Tissue pieces carrying mature sori with sporangia were cut from 10-12 Saccharina latissima, collected at 1-5 m depth between 8-12 January 2018 at all three sites in Norway. 
The material from NN and MN was wrapped in moist paper with cooling elements, transported to the laboratory and seeded on the day of arrival, on two successive days. The tissue with sori was treated as described in Forbord et al. (2018) to prevent diatom growth in the cultures, and thereafter submerged in glass beakers with cool (12°C) sterile sea water and stirred until spore release was observed under a microscope. The spore solution was applied to tagged, clean and heat-treated granite stones (10 x 10 cm). After 10 min of exposure to the spore solution, the stones were placed in sterile sea water. The material from SN was kept in moist paper in a fridge overnight to provoke spore release and was otherwise treated as described by Forbord et al. (2018). Additional plates for checking the development of each culture were seeded in the same way. The seeded stones were transferred to a climate room. Each batch of gametophytic culture (NN, MN and SN) was grown in separate tanks with running sea water, but with similar photon fluence rates (around 50 µE m⁻² s⁻¹, measured with a spherical sensor) and temperature (9-10°C) conditions. The running sea water of the climate rooms is taken from 100 m depth, filtered through a sand filter, and treated with UV light. The development of each culture batch was checked regularly, and on 1 February minute sporophytes were observed in all three batches. The CGE was carried out in a climate room divided into three compartments by complete enclosures of black, opaque plastic, where each compartment had two tanks (30 cm x 50 cm x 25 cm (height)) with running seawater, representing replicates. The conditions of the three compartments were set to mimic the temperature and day-length conditions of NN, MN and SN in mid-May: NN with 4°C and 24 h light, MN with 6°C and 19 h light and SN with 9°C and 17 h light. The temperature conditions are within the ranges measured in April-May in the three regions by Forbord et al. (2020) or in mid-May according to Sætre (1973), which represent a period of rapid growth of S. latissima. The granite stones with growing sporophytes were added to the tanks with three stones in each, representing the NN, MN and SN genotypes. The flow rate of running sea water was about 1 L per 20 s in all the tanks, and photon fluence rates were kept at similar levels (46-50 µE m⁻² s⁻¹). The tanks were rinsed regularly to prevent diatom growth, and the temperatures of the tanks were checked every second or third day and adjusted when needed. Small variations, normally within 1°C, occurred during the experiment. On termination, the stones were photographed, and two or three of the largest sporophytes per stone were sampled, placed in RNAlater in a fridge overnight, and frozen at -80°C.
RNA-seq library / dataset
Total RNA was extracted from the flash-frozen kelp tissue using 2% CTAB and 2M DTT, followed by separation of DNA/RNA from lipids and proteins using chloroform:isoamyl alcohol (24:1). The supernatant was removed, precipitated using isopropanol, pelleted, washed with absolute alcohol, and finally resuspended in Low TE buffer. RNA was isolated from the extractions by removing DNA using the TURBO DNA-free™ Kit (Invitrogen). The quality and quantity of the extracted RNA were checked using a Qubit RNA HS kit and an Agilent RNA HS kit on a 4200 TapeStation System. RNA libraries were constructed using the NEBNext® Ultra II RNA Library Prep Kit for Illumina® (New England Biolabs) and quality checked using the Agilent 4200 TapeStation System. Paired-end sequencing (2x75 bp) of transcriptomes was performed on an Illumina HiSeq4000, giving ca. 30 million reads per sample. RNA-seq data were available from 35 samples. Within the RNA-seq subset, sample replicates were categorized according to genotype and temperature/light conditions. Specifically, for the SN genotype there were 4 samples each at 6°C/19 h and 4°C/24 h, along with 3 reference samples at 9°C/17 h.
For the MN genotype, the sample distribution was 3 samples at 9°C/17 h, 5 samples at 4°C/24 h and 5 reference samples at 6°C/19 h. For the NN genotype there were 3 reference samples at 4°C/24 h, 4 samples at 6°C/19 h, and 4 samples at 9°C/17 h. Transcripts were aligned to a reference genome obtained through the France Génomique National infrastructure project Phaeoexplorer (ANR-10-INBS-09).
Bioinformatics analysis
The quality of the raw sequencing reads was assessed with the quality control software FastQC (version 0.11.9). The reads were trimmed using Trim Galore (version 0.6.6). The minimum read length cutoff was set to 60 bp, since shorter reads often have low quality (Krueger, 2015). Trim Galore was otherwise run with default settings, i.e. a Phred score threshold of 20 and a maximum error rate of 0.1, to obtain clean reads of high quality. MultiQC was used to summarize the results of the read quality trimming done with Trim Galore (Ewels et al., 2016). Trimmed reads were aligned to the reference genome of S. latissima using STAR (version 2.7.10a) (Dobin et al., 2015; Denoeud et al., 2024). Prior to alignment, the reference genome was indexed by running --runMode genomeGenerate. No Gene Transfer Format (GTF) file was available, and the provided General Feature Format (GFF) file had to be handled in STAR in a way that avoided problems when running the sequence alignment. This was solved by replacing --sjdbGTFfile with --sjdbGTFtagExonParentTranscript. The setting instructs STAR to use the “Parent” tag in the GFF file to link exons to transcripts. The featureCounts software (version 2.0.1) was used to count the number of reads mapping to each gene in the Saccharina latissima reference genome. Pairwise comparisons of differential gene expression were performed using DESeq2 (version 1.34.0) (Love et al., 2014). Reference replicates were compared against replicate groups with the same genotype under different conditions.
An absolute log2 fold change (LFC, log2FoldChange) ≥ 2 and a p-value < 0.05 were used as thresholds to determine significant differences in gene expression between groups. The identifier from the GFF file was used to retrieve annotations from InterProScan. A GO enrichment analysis was carried out using topGO (version 2.46.0) to identify overrepresented GO terms.
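The significance thresholds stated above (absolute log2 fold change of at least 2 and p-value below 0.05) amount to a simple filter over a table of differential expression results. The sketch below applies that filter to a few invented records; the gene identifiers, values and function names are hypothetical and do not come from the dataset.

```python
# Sketch: apply the stated thresholds (|LFC| >= 2, p-value < 0.05) to DE results.
# The example records are invented for illustration.


def classify_gene(log2_fold_change, p_value, lfc_cutoff=2.0, alpha=0.05):
    """Return 'up', 'down' or 'not significant' for one gene."""
    if p_value < alpha and abs(log2_fold_change) >= lfc_cutoff:
        return "up" if log2_fold_change > 0 else "down"
    return "not significant"


example_results = [
    {"gene": "gene_A", "log2FoldChange": 3.1, "pvalue": 0.001},
    {"gene": "gene_B", "log2FoldChange": -2.0, "pvalue": 0.04},
    {"gene": "gene_C", "log2FoldChange": 1.5, "pvalue": 0.0001},
    {"gene": "gene_D", "log2FoldChange": 4.2, "pvalue": 0.2},
]

for row in example_results:
    status = classify_gene(row["log2FoldChange"], row["pvalue"])
    print(row["gene"], status)
```

Note that gene_C fails on effect size and gene_D on the p-value, so both thresholds must be met for a gene to count as differentially expressed.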
https://spdx.org/licenses/CC0-1.0.html
Giant viruses (phylum Nucleocytoviricota) are globally distributed in aquatic ecosystems. They play significant roles as evolutionary drivers of eukaryotic plankton and regulators of global biogeochemical cycles. However, we lack knowledge about their native hosts, hindering our understanding of their lifecycle and ecological importance. Here, we used single-cell RNA-seq and samples from an induced E. huxleyi bloom during a mesocosm experiment to link giant viruses with their protist hosts. We observe active giant virus infections in multiple host lineages, including members of the algal groups Chrysophyceae and Prymnesiophyceae, as well as heterotrophic flagellates in the class Katablepharidaceae. Katablepharids were infected with a rare Imitevirales-07 giant virus lineage expressing cell fate regulation genes. Analysis of the temporal dynamics of this host-virus interaction indicated a role for the Imitevirales-07 in the collapse of the host Katablepharid population. Our results demonstrate that single-cell RNA-seq can be used to identify previously undescribed host-virus interactions and study their ecological relevance.
Methods
Mesocosm core setup and sampling procedure
Samples were obtained during the AQUACOSM VIMS-Ehux mesocosm experiment in Raunefjorden near Bergen, Norway (60°16′11″N; 5°13′07″E), in May 2018. Seven bags were filled with 11 m³ of water from the fjord, containing natural plankton communities. Algal blooms were induced by nutrient addition and monitored for 24 days, as previously described [23]. Ten samples were collected from four bags, as follows: from bag 3, on days 15 and 20 (named B3T15 and B3T20, respectively); from bag 4, on days 13, 15, 19, and 20 (named B4T13, B4T15, B4T19, and B4T20, respectively); from bag 6, on day 17 (named B6T17); from bag 7, on days 16, 17, and 18 (named B7T16, B7T17, and B7T18, respectively). Samples were initially filtered as follows: 2 liters of water were filtered with a 20 µm mesh and collected in a glass bottle.
The cells were then concentrated through gentle gravity filtration on a 3 µm polycarbonate filter (Whatman), mounted on a reusable bottle top filter holder (Thermo Fisher). The biomass on the filter was regularly resuspended by gentle pipetting. For samples B7T16, B7T18, B4T15, B3T15, B6T17, B7T17, and B4T19, the 2 liters of seawater were concentrated down to 100 ml, distributed in two 50 ml tubes, which corresponds to a 200 times concentration. For B4T13, the concentration factor was 140 times. For B4T20 and B3T20, the concentration factor was 100 times. The different concentration factors are explained by filter clogging and various field constraints, including processing time. For all samples except B3T20, the 50 ml tubes were centrifuged for 4 min at 2500 × g, after which the supernatant was discarded. Pellets corresponding to the same day and same bag were pooled and resuspended in a final volume of 200 µl of chilled PBS. Then, 1800 µl of pre-chilled high-performance liquid chromatography (HPLC) grade 100% methanol was added drop by drop to the concentrated biomass. For B3T20, the concentrated biomass was centrifuged for 4 min at 2500 × g and resuspended in 100 µl of chilled PBS, to which 900 µl of chilled HPLC grade 100% methanol was added. Samples were then incubated for 15 minutes on ice and stored at -80°C until further analysis.
Library preparation and RNA-seq sequencing using 10X Genomics
For analysis by 10X Genomics, tubes were defrosted and gently mixed, and 1.7 ml of each sample was transferred into an Eppendorf LoBind tube and centrifuged at 4°C for 3 min at 3000 × g. The PBS/methanol mix was discarded and replaced by 400 µl of PBS. Cell concentration was measured using an iCyt Eclipse flow cytometer (SONY) based on forward scatter. Cell concentration ranged from 1044 cells ml⁻¹ to 9855 cells ml⁻¹.
All concentrations were brought to 1000 cells ml-1 to target 7000 cells recovery, according to the 10X Genomics Cell Suspension Volume Calculator Table provided in the user guide. The cellular suspension was loaded onto Next GEM Chip G targeting 7000 cells and then ran on a Chromium Controller instrument to generate GEM emulsion (10x Genomics). Single-cell 3' RNA-seq libraries were generated according to the manufacturer's protocol (10x Genomics Chromium Single Cell 3' Reagent Kit User Guide v3/v3.1 Chemistry) on different occasions: B4T19 and B7T17 in January 2020 and B3T15, B3T20, B4T13, B4T15, B4T20, B6T17, B7T16, and B7T18 in August 2020 with 12 cycles for cDNA amplification and 15 cycles for library amplification. Library concentrations and quality were measured using the Qubit dsDNA High Sensitivity Assay kit (Life Technologies, Carlsbad, CA). Libraries were pooled according to targeted cell number, aiming for a minimum of 20,000 reads per cell. Pooled libraries were sequenced using the NextSeq® 500 High Output kit (75 cycles). Bioinformatic pipeline A step-by-step description of the bioinformatic pipeline from this step onward, including all in-house scripts used, is detailed in the GitHub repository under github.com/vardilab/host-virus-pairing. Detection of infected cells in the single-cell RNA-seq data using a custom viral genes database To detect viral transcripts, a reference was built from a database of highly conserved genes6 from all NCLDV in the Giant Virus Database9, such as family B DNA polymerase, RNA polymerase subunits, and the major capsid protein. The genes were clustered using CD-HIT v. 4.6.6 at 90% nucleotide identity To remove redundancy43. From this database of 34866 genes, a reference was created using the 10X Genomics Cell Ranger mkref command. The Cell Ranger Software Suite (v. 
5.0.0) was used to perform barcode processing (demultiplexing) and single-cell unique molecular identifier (UMI) counting on the raw reads from 47391 cells using the count script (default parameters), with the deduplicated NCLDV database as a reference. For downstream analysis, 972 cells that highly expressed multiple NCLDV genes and were considered "highly infected" were selected. These "highly infected" cells were selected based on the following criteria: (a) the cell expresses ≥10 viral UMIs in total [22,24], (b) it expresses more than one viral gene, and (c) at least one viral gene has a UMI count greater than one. Cell selection was wrapped using an in-house script (choose_cells.py).

Identifying the taxonomy of individual cells by sequence homology to ribosomal RNA

Raw reads from each cell were pulled by the cell's unique barcode identifier using seqtk v. 1.2. Reads were then trimmed (command: trim_galore --phred33 -j 8 --length 36 -q 5 --stringency 1 --fastqc -e 0.1), and poly-A was removed (command: trim_galore --polyA -j 1 --length 36), using TrimGalore (v. 0.6.5), a Cutadapt wrapper [44]. Trimmed reads from each cell were assembled using rnaSPAdes 3.15 [45] with k-mer sizes 21 and 33. Raw read pulling, trimming, and assembly were wrapped using an in-house script (assemble_cells.sh). To identify the taxonomy of the cells, assembled contigs from each cell were matched against 18S rRNA sequences from the Protist Ribosomal Reference (PR2) [46] and metaPR2 [47]. To remove redundancy, the sequences in each database were clustered using CD-HIT v. 4.6.6 at 99% identity [43]. Contigs were filtered using SortMeRNA v. 4.3.6 [48] with default parameters against the PR2 database and then aligned to the PR2 and metaPR2 databases using BLASTN [49], at 99% identity, E-value ≤ 10⁻¹⁰, and an alignment length of at least 100 bp. Contigs were ranked by their bitscore, and only the best hit was kept for each contig. 
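The three criteria (a) to (c) used above to select "highly infected" cells can be illustrated with a minimal sketch. This is not the authors' choose_cells.py; the function name `is_highly_infected`, the input layout (a mapping from viral gene ID to UMI count for a single cell), and the gene names in the usage note are assumptions made for illustration only.

```python
from typing import Dict


def is_highly_infected(viral_umis: Dict[str, int]) -> bool:
    """Apply the three selection criteria to one cell.

    viral_umis is a hypothetical mapping of viral gene ID -> UMI count
    for a single cell barcode (assumed layout, not the authors' format).
    """
    # Only genes with at least one UMI count as expressed.
    expressed = {gene: n for gene, n in viral_umis.items() if n > 0}

    total_umis = sum(expressed.values())
    max_umis = max(expressed.values(), default=0)

    return (
        total_umis >= 10          # (a) at least 10 viral UMIs in total
        and len(expressed) > 1    # (b) more than one viral gene expressed
        and max_umis > 1          # (c) at least one gene with UMI count > 1
    )
```

For example, a cell with counts `{"polB": 8, "mcp": 2}` passes all three criteria, whereas a cell with 12 UMIs from a single gene fails criterion (b).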
Each contig was assigned to one of the following taxonomic groups that were prevalent in the sample: the classes Bacillariophyta, Prymnesiophyceae, Chrysophyceae, MAST-3, and Katablepharidaceae, and the divisions Pseudofungi, Lobosa (Amoebozoa), Ciliophora (Ciliates), Dinoflagellata, and Cercozoa. Contigs that matched other groups were assigned as "other eukaryotes". Contigs that matched more than one of these taxonomic groups were considered non-specific or chimeric and were therefore ignored. This downstream analysis of the BLAST results was wrapped using an in-house script (Sankey_wrapper_extended.ipynb). To avoid detection of doublets and predators, cells that transcribe 18S rRNA transcripts homologous to more than one taxonomic group were conservatively omitted. Of the 972 infected cells detected, 418 (43%) were omitted because we could not assemble specific 18S rRNA contigs from them or because their identity was ambiguous. None of the cells that were assigned "other eukaryotes" had contigs with conflicting annotations (contigs matching different classes).

Identifying the infecting virus using a homology search against a custom protein database

To identify transcripts derived from giant viruses, reads from the 972 detected infected cells were compared to a custom protein database using a translated alignment approach. To ensure that as many giant viruses as possible were represented, a database was constructed by combining RefSeq v. 207 [50] with all predicted proteins in the Giant Virus Database [9]. The proteins were then masked with tantan [51] (using the -p option), and the database was generated with the lastdb command (using parameters -c, -p). To identify the infecting virus, the raw sequencing reads in each of the 972 single-cell transcriptomes were compared to the constructed database using LASTAL v. 959 [52] (parameters -m 100, -F 15, -u 2) with best matches retained. The same procedure was applied to the assembled transcripts from each cell to identify viral transcripts. 
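Returning to the 18S rRNA-based host assignment described earlier, the rule for ignoring non-specific contigs and omitting ambiguous cells can be sketched as follows. This is an illustrative reading of the text, not the authors' Sankey_wrapper_extended.ipynb; the function name `assign_cell_taxon` and the input layout (one set of matched taxonomic groups per contig) are assumptions.

```python
from typing import Iterable, Optional, Set


def assign_cell_taxon(contig_groups: Iterable[Set[str]]) -> Optional[str]:
    """Return the single taxonomic group of a cell, or None if it is omitted.

    contig_groups holds, for each 18S rRNA contig of one cell, the set of
    taxonomic groups that the contig matched (assumed input layout).
    """
    cell_groups: Set[str] = set()
    for groups in contig_groups:
        # Contigs matching no group or several groups are treated as
        # non-specific or chimeric and ignored.
        if len(groups) != 1:
            continue
        cell_groups |= groups

    # Keep the cell only if all specific contigs agree on one group;
    # otherwise it is omitted (no specific contig, or ambiguous identity).
    if len(cell_groups) == 1:
        return next(iter(cell_groups))
    return None
```

Under this sketch, a cell whose specific contigs match both Prymnesiophyceae and Ciliophora returns `None`, mirroring the conservative omission of possible doublets and predators.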
The results were analyzed at different taxonomic levels, consistent with the Giant Virus Database (for giant viruses) or NCBI taxonomy [33] (everything else). A total of 754 cells whose best-matching virus was a coccolithovirus were omitted from the downstream analysis since EhV-infected cells were already reported to be abundant in the algal bloom [25], and our analysis aims to explore other host-virus