Facebook
TwitterDataset from NeurIPS2021 challenge similar to Kaggle 2022 competition: https://www.kaggle.com/competitions/open-problems-multimodal "Open Problems - Multimodal Single-Cell Integration Predict how DNA, RNA & protein measurements co-vary in single cells"
It is https://en.wikipedia.org/wiki/ATAC-seq#Single-cell_ATAC-seq single cell ATAC-seq data. And single cell RNA-seq data: https://en.wikipedia.org/wiki/Single-cell_transcriptomics#Single-cell_RNA-seq
Single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
(For companion dataset on CITE-seq = scRNA-seq + Proteomics, see: https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021)
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122
Expression profiling by high throughput sequencing Genome binding/occupancy profiling by high throughput sequencing Summary Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessability kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.
Overall design Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.
Contributor(s) Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM Citation https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Seurat object generated, following the "Multimodal Vignette" workflow (https://satijalab.org/seurat/multimodal_vignette.html). In addition to the workflow, the UMAP embedding has been calculated. Data from this objects are used to produce Sleepwalk examples throughout the paper.(data are taken from Stoeckius et al., Simultaneous epitope and transcriptome measurement in single cells, Nature Methods, 14:865, 2017)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the accompanying data set for the paper entitled ‘T cell receptor-centric approach to streamline multimodal single-cell data analysis.’, which is currently available as a preprint (https://www.biorxiv.org/content/10.1101/2023.09.27.559702v2). Details on the origin of the datasets, and processing steps can be found there.
The purpose of this atlas both the full dataset and down sampling version is to aid in improving the interpretability of other T cell based datasets. This can be done by adding in the down sampled object that contains up to 500 cells per annotation model or all 12 dataset to your new sample. This dataset aims to improve the capacity to identify TCR-specific signature by ensuring a well covered background, which will improve the robustness of the FindMarker Function in Seurat package.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This page includes the data and code necessary to reproduce the results of the following paper: Yang Liao, Dinesh Raghu, Bhupinder Pal, Lisa Mielke and Wei Shi. cellCounts: fast and accurate quantification of 10x Chromium single-cell RNA sequencing data. Under review. A Linux computer running an operating system of CentOS 7 (or later) or Ubuntu 20.04 (or later) is recommended for running this analysis. The computer should have >2 TB of disk space and >64 GB of RAM. The following software packages need to be installed before running the analysis. Software executables generated after installation should be included in the $PATH environment variable.
R (v4.0.0 or newer) https://www.r-project.org/ Rsubread (v2.12.2 or newer) http://bioconductor.org/packages/3.16/bioc/html/Rsubread.html CellRanger (v6.0.1) https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome STARsolo (v2.7.10a) https://github.com/alexdobin/STAR sra-tools (v2.10.0 or newer) https://github.com/ncbi/sra-tools Seurat (v3.0.0 or newer) https://satijalab.org/seurat/ edgeR (v3.30.0 or newer) https://bioconductor.org/packages/edgeR/ limma (v3.44.0 or newer) https://bioconductor.org/packages/limma/ mltools (v0.3.5 or newer) https://cran.r-project.org/web/packages/mltools/index.html
Reference packages generated by 10x Genomics are also required for this analysis and they can be downloaded from the following link (2020-A version for individual human and mouse reference packages should be selected): https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest After all these are done, you can simply run the shell script ‘test-all-new.bash’ to perform all the analyses carried out in the paper. This script will automatically download the mixture scRNA-seq data from the SRA database, and it will output a text file called ‘test-all.log’ that contains all the screen outputs and speed/accuracy results of CellRanger, STARsolo and cellCounts.
Facebook
TwitterDataset from NeurIPS2021 challenge similar to Kaggle 2022 competition: https://www.kaggle.com/competitions/open-problems-multimodal "Open Problems - Multimodal Single-Cell Integration Predict how DNA, RNA & protein measurements co-vary in single cells"
CITE-seq - joint single cell RNA sequencing + single cell measurements of CD** proteins. (https://en.wikipedia.org/wiki/CITE-Seq) (For companion dataset on scRNA-seq + scATAC-seq, see: https://www.kaggle.com/datasets/alexandervc/scrnaseq-scatacseq-challenge-at-neurips-2021 )
Single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122
Expression profiling by high throughput sequencing Genome binding/occupancy profiling by high throughput sequencing Summary Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessability kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.
Overall design Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.
Contributor(s) Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM Citation https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This project is a collection of files to allow users to reproduce the model development and benchmarking in "Dawnn: single-cell differential abundance with neural networks" (Hall and Castellano, under review). Dawnn is a tool for detecting differential abundance in single-cell RNAseq datasets. It is available as an R package here. Please contact us if you are unable to reproduce any of the analysis in our paper. The files in this collection correspond to the benchmarking dataset based on simulated linear trajectories.
FILES: Data processing code
adapted_traj_sim_milo_paper.R Lightly adapted code from Dann et al. to simulate single-cell RNAseq datasets that form linear trajectories . generate_test_data_linear_traj_sim_milo_paper.R R code to assign simulated labels to datatsets generated from adapted_traj_sim_milo_paper.R. Seurat objects saved as cells_sim_linear_traj_gex_seed_*.rds. Simulated labels saved as benchmark_dataset_sim_linear_traj.csv.
Resulting datasets
cells_sim_linear_traj_gex_seed_*.rds Seurat objects generated by generate_test_data_linear_traj_sim_milo_paper.R. benchmark_dataset_sim_linear_traj.csv Cell labels generated by generate_test_data_linear_traj_sim_milo_paper.R.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data necessary to reproduce the results from the SpatialMuxSeq vignette (https://rpubs.com/LiranM/SpatialMuxSeq), featured in our paper "Multiplexed Spatial Mapping of Chromatin Features, Transcriptome, and Proteins in Tissues." To ensure full reproducibility of the results, we have provided a Seurat object that includes all omics layers. For further details and access to all relevant code, please visit our GitHub repository: https://github.com/liranmao/Spatial_multi_omics.
Facebook
TwitterRemark 0: See https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev For the cell cycle analysis
Full dataset in https://www.kaggle.com/datasets/alexandervc/scrnaseq-crisprperturbseq-normanweissman But it is huge, and loading crashes memory, so here are cropped pieces to start with.
dataset used in: "GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations" Yusuf Roohani, Kexin Huang, Jure Leskovec https://www.biorxiv.org/content/10.1101/2022.07.12.499735v1 https://twitter.com/yusufroohani/status/1547965695744360448 Accepted in ICLR
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Particular data - Perturb-seq: Single-cell, pooled CRISPR screening experiment comparing the transcriptional effects of overexpressing genes alone or in combination
Paper: Norman TM, Horlbeck MA, Replogle JM, Ge AY et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 2019 Aug 23;365(6455):786-793. PMID: 31395745 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6746554/
Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE133344
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TLDRSeurat object of the 16 NPM1-mutated AML samples (n = 83,162 cells).AML samplesAll sixteen peripheral blood and bone marrow samples were obtained from patients with AML at diagnosis (n=15) or relapse after chemotherapy (n=1) with written informed consent according to the Declaration of Helsinki. Mononuclear cells were isolated by Ficoll-Isopaque density gradient centrifugation and cryopreserved in the Leiden University Medical Center (LUMC) Biobank for Hematological Diseases after approval by the LUMC Institutional Review Board (protocol no. B18.047).Upstream processing pipelineCellRanger v7.0.0 was run on all samples with the human reference genome hg38. For all QC Seurat v4 was used15. Our QC pipeline had three steps per sample: 1) soft filtering, 2) low quality cluster removal, and 3) doublet detection. In soft filtering, Seurat objects were created with cells expressing at least 200 genes and with the genes expressed at least in 3 cells. Then, standard Seurat command list with default parameters was run to detect low quality clusters. Clusters with >15% mitochondrial and 15% mitochondrial mRNA. We used standard Seurat commands to scale and normalize the data on integrated features. First 30 principal components were used to create UMAP plots. We used clustree to determine optimal cluster number, based on FindClusters with resolutions sweeping from 0 to 1.2. We chose res=0.5, as clusters became stable. Next, we merged two clusters (CC5 and CC12) into one GMP-like cluster as one of these clusters (CC12) had high expression of HSP-genes yet still retained its cell-type specific properties.Note: The file was processed with Seurat v4 but the object is updated for v5. Uploaded as .qs file format for faster reading. To read the file: qs:qread("path/to/data.qs")This data is available for research use only; and cannot be used for commercial purposes.For further queries please refer to our paper:
Facebook
TwitterTable of Contents
Main Description File Descriptions Linked Files Installation and Instructions
This is the Zenodo repository for the manuscript titled "A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity.". The code included in the file titled marengo_code_for_paper_jan_2023.R was used to generate the figures from the single-cell RNA sequencing data.
The following libraries are required for script execution:
Seurat scReportoire ggplot2 stringr dplyr ggridges ggrepel ComplexHeatmap
The code can be downloaded and opened in RStudios. The "marengo_code_for_paper_jan_2023.R" contains all the code needed to reproduce the figues in the paper The "Marengo_newID_March242023.rds" file is available at the following address: https://zenodo.org/badge/DOI/10.5281/zenodo.7566113.svg (Zenodo DOI: 10.5281/zenodo.7566113). The "all_res_deg_for_heat_updated_march2023.txt" file contains the unfiltered results from DGE anlaysis, also used to create the heatmap with DGE and volcano plots. The "genes_for_heatmap_fig5F.xlsx" contains the genes included in the heatmap in figure 5F.
This repository contains code for the analysis of single cell RNA-seq dataset. The dataset contains raw FASTQ files, as well as, the aligned files that were deposited in GEO. The "Rdata" or "Rds" file was deposited in Zenodo. Provided below are descriptions of the linked datasets:
Gene Expression Omnibus (GEO) ID: GSE223311(https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE223311)
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment. Description: This submission contains the "matrix.mtx", "barcodes.tsv", and "genes.tsv" files for each replicate and condition, corresponding to the aligned files for single cell sequencing data. Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Sequence read archive (SRA) repository ID: SRX19088718 and SRX19088719
Title: Gene expression profile at single cell level of CD4+ and CD8+ tumor infiltrating lymphocytes (TIL) originating from the EMT6 tumor model from mSTAR1302 treatment.
Description: This submission contains the raw sequencing or .fastq.gz files, which are tab delimited text files.
Submission type: Private. In order to gain access to the repository, you must use a reviewer token (https://www.ncbi.nlm.nih.gov/geo/info/reviewer.html).
Zenodo DOI: 10.5281/zenodo.7566113(https://zenodo.org/record/7566113#.ZCcmvC2cbrJ)
Title: A TCR β chain-directed antibody-fusion molecule that activates and expands subsets of T cells and promotes antitumor activity. Description: This submission contains the "Rdata" or ".Rds" file, which is an R object file. This is a necessary file to use the code. Submission type: Restricted Acess. In order to gain access to the repository, you must contact the author.
The code included in this submission requires several essential packages, as listed above. Please follow these instructions for installation:
Ensure you have R version 4.1.2 or higher for compatibility.
Although it is not essential, you can use R-Studios (Version 2022.12.0+353 (2022.12.0+353)) for accessing and executing the code.
marengo_code_for_paper_jan_2023.R Install_Packages.R Marengo_newID_March242023.rds genes_for_heatmap_fig5F.xlsx all_res_deg_for_heat_updated_march2023.txt
You can use the following code to set the working directory in R:
setwd(directory)
Facebook
Twitterdataset used in: "GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations" Yusuf Roohani, Kexin Huang, Jure Leskovec https://www.biorxiv.org/content/10.1101/2022.07.12.499735v1 https://twitter.com/yusufroohani/status/1547965695744360448 Accepted in ICLR
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Particular data - Perturb-seq: Single-cell, pooled CRISPR screening experiment comparing the transcriptional effects of overexpressing genes alone or in combination
Paper: Norman TM, Horlbeck MA, Replogle JM, Ge AY et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 2019 Aug 23;365(6455):786-793. PMID: 31395745 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6746554/
Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE133344
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The attached R Scripts supplement our protocol paper currently under editorial review at the Journal of Visualized Experiments.Scope of the article:This protocol describes the general processes and quality control checks necessary for preparing healthy adult single cells in preparation for droplet-based, high-throughput single cell RNA-Seq analysis using the 10X Genomics' Chromium System. We also describe sequencing parameters, alignment and downstream single-cell bioinformatic analysis.
Facebook
TwitterRemark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (csv file is vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Particular data:
Paper: Cell cycle gene regulation dynamics revealed by RNA velocity and deep-learning https://www.nature.com/articles/s41467-022-30545-8 https://www.biorxiv.org/content/10.1101/2021.03.17.435887v1.full Andrea Riba, Attila Oravecz, Matej Durik, Sara Jiménez, Violaine Alunni, Marie Cerciat, Matthieu Jung, Céline Keime, William M. Keyes, View ORCID ProfileNacho Molina doi: https://doi.org/10.1101/2021.03.17.435887
datasets of processed single-cell RNA-seq for mESC and human fibroblasts related to GEO dataset GSE167609 and the preprint https://doi.org/10.1101/2021.03.17.435887
Data: ZENODO: https://zenodo.org/record/4719436#.YlartShBy38
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Seurat object in .rds format with the raw matrix information (after filtering) , cell type annotation information and the UMAP coordinates. Users can use R readRDS function to load this .rds file. If you are using this dataset, please cite our paper: Qian, Peipei, Jiahui Kang, Dong Liu, and Gangcai Xie. "Single cell transcriptome sequencing of Zebrafish testis revealed novel spermatogenesis marker genes and stronger Leydig-germ cell paracrine interactions." Frontiers in genetics 13 (2022): 851719.
Facebook
TwitterSee paper: https://arxiv.org/abs/2208.05229 on cell cycle analysis discussed here. "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev Dataset is very high quality with FUCCI cell cycle labels, it is one of the rare examples where TWO subpopulations with "normal" and "fast" proliferation can be clearly seen within seemingly homegeneous cells - one cell line U2OS. For scRNA-seq that phenomena was first discovered in the paper above, with scripts given here.
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Particular data: Paper: https://www.nature.com/articles/s41586-021-03232-9 Spatiotemporal dissection of the cell cycle with single-cell proteogenomics Emma Lundberg et.al.
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
TwitterSee the same in ADATA format ( more easy to use) https://www.kaggle.com/datasets/alexandervc/scrnaseq-breast-cancer-421-761-cells-adata-fmt
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Particular data: Paper: Pal B, Chen Y, Vaillant F, Capaldo BD et al. A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J 2021 Jun 1;40(11):e107333. PMID: 33950524 https://www.ncbi.nlm.nih.gov/pubmed/33950524
Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161529 The data provides 69 scRNA-seq profiles using the 10X Genomics Chromium platform, comprising a total of 421,761 cells from 52 patients. This includes profiles of 4 TNBCs, 4 BRCA1 TNBCs, 6 HER+ tumors, 19 ER+ tumors and 6 lymph-node metastases of ER+ tumors. It also includes profiles of total mammary gland cells from 13 normal patients without breast cancer, of epithelial mammary gland cells from 11 normal patients, and of total mammary gland cells from 4 preneoplastic patients with BRCA1 mutations.
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
The human adult intestinal system is a complex organ that is approximately 9 meters long and performs a variety of complex functions including digestion, nutrient absorption, and immune surveillance. We performed snRNA-seq on 8 regions of of the human intestine (duodenum, proximal-jejunum, mid-jejunum, ileum, ascending colon, transverse colon, descending colon, and sigmoid colon) from 9 donors (B001, B004, B005, B006, B008, B009, B010, B011, and B012). In the corresponding paper, we find cell compositions differ dramatically across regions of the intestine and demonstrate the complexity of epithelial subtypes. We map gene regulatory differences in these cells suggestive of a regulatory differentiation cascade, and associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation, and organization in the human intestine, and serve as an important reference map for understanding human biology and disease. Methods For a detailed description of each of the steps to obtain this data see the detailed materials and methods in the associated manuscript. Briefly, intestine pieces from 8 different sites across the small intestine and colon were flash frozen. Nuclei were isolated from each sample and the resulting nuclei were processed with either 10x scRNA-seq using Chromium Next GEM Single Cell 3’ Reagent Kits v3.1 (10x Genomics, 1000121) or Chromium Next GEM Chip G Single Cell Kits (10x Genomics, 1000120) or 10x multiome sequencing using Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Kits (10x Genomics, 1000283). Initial processing of snRNA-seq data was done with the Cell Ranger Pipeline (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) by first running cellranger mkfastq to demultiplex the bcl files and then running cellranger count. Since nuclear RNA was sequenced, data were aligned to a pre-mRNA reference. Initial processing of the mutiome data, including alignment and generation of fragments files and expression matrices, was performed with the Cell Ranger ARC Pipeline. The raw expression matrices from these pipelines are included here. Downstream processing was performed in R, using the Seurat package.
Facebook
TwitterSee the same in https://www.kaggle.com/datasets/alexandervc/scrnaseq-breast-cancer-brca1-mutant-etc (huge files in MTX format - not convenient to work)
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Particular data: Paper: Pal B, Chen Y, Vaillant F, Capaldo BD et al. A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J 2021 Jun 1;40(11):e107333. PMID: 33950524 https://www.ncbi.nlm.nih.gov/pubmed/33950524
Data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161529 The data provides 69 scRNA-seq profiles using the 10X Genomics Chromium platform, comprising a total of 421,761 cells from 52 patients. This includes profiles of 4 TNBCs, 4 BRCA1 TNBCs, 6 HER+ tumors, 19 ER+ tumors and 6 lymph-node metastases of ER+ tumors. It also includes profiles of total mammary gland cells from 13 normal patients without breast cancer, of epithelial mammary gland cells from 11 normal patients, and of total mammary gland cells from 4 preneoplastic patients with BRCA1 mutations.
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip files are the gene expression matrix files for the single cell RNA sequencing study of mouse testes aging conducted at Gangcai Xie's Lab from Nantong University, which were formatted to be easily loaded by common single cell analyzing tools, such as Seurat and Scanpy. And the .rds format file is the Seurat object file generated in our study, which contains the cell annotation information presented in our paper. The young group (3 months old): TY1, TY2, TY3, and the old group (23 months old): TO1, TO2, TO3. The paper related to this dataset will be accessible at https://doi.org/10.1038/s41598-024-65710-0.
Facebook
TwitterRemark: for cell cycle analysis - see paper https://arxiv.org/abs/2208.05229 "Computational challenges of cell cycle analysis using single cell transcriptomics" Alexander Chervov, Andrei Zinovyev
Data - results of single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
Source: Nikolaus Rajewsky Lab 2020: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE142277
The present data for cell "HeLa" - the first immortal human cell line ever established: https://en.wikipedia.org/wiki/HeLa
Paper: Schwabe D, Formichetti S, Junker JP, Falcke M et al. The transcriptome dynamics of single cells during the cell cycle. Mol Syst Biol 2020 Nov;16(11):e9946. PMID: 33205894 https://pubmed.ncbi.nlm.nih.gov/33205894/
Two variants of data (see last letters in filenames) WT - wild type https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4224315 KO - knockout https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4224316 (Monoclonal cell line with both AGO2 alleles knock-out (AGO2dKO))
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833
Challenges in unsupervised clustering of single-cell RNA-seq data https://www.nature.com/articles/s41576-018-0088-9 Review Article 07 January 2019 Vladimir Yu Kiselev, Tallulah S. Andrews & Martin Hemberg Nature Reviews Genetics volume 20, pages273–282 (2019)
Challenges and emerging directions in single-cell analysis https://link.springer.com/article/10.1186/s13059-017-1218-y Published: 08 May 2017 Guo-Cheng Yuan, Long Cai, Michael Elowitz, Tariq Enver, Guoping Fan, Guoji Guo, Rafael Irizarry, Peter Kharchenko, Junhyong Kim, Stuart Orkin, John Quackenbush, Assieh Saadatpour, Timm Schroeder, Ramesh Shivdasani & Itay Tirosh Genome Biology volume 18, Article number: 84 (2017)
Single-Cell RNA Sequencing in Cancer: Lessons Learned and Emerging Challenges https://www.sciencedirect.com/science/article/pii/S1097276519303569 Molecular Cell Volume 75, Issue 1, 11 July 2019, Pages 7-12 Journal home page for Molecular Cell
Facebook
TwitterDataset from NeurIPS2021 challenge similar to Kaggle 2022 competition: https://www.kaggle.com/competitions/open-problems-multimodal "Open Problems - Multimodal Single-Cell Integration Predict how DNA, RNA & protein measurements co-vary in single cells"
It is https://en.wikipedia.org/wiki/ATAC-seq#Single-cell_ATAC-seq single cell ATAC-seq data. And single cell RNA-seq data: https://en.wikipedia.org/wiki/Single-cell_transcriptomics#Single-cell_RNA-seq
Single cell RNA sequencing, i.e. rows - correspond to cells, columns to genes (or vice versa). value of the matrix shows how strong is "expression" of the corresponding gene in the corresponding cell. https://en.wikipedia.org/wiki/Single-cell_transcriptomics
See tutorials: https://scanpy.readthedocs.io/en/stable/tutorials.html ("Scanpy" - main Python package to work with scRNA-seq data). Or https://satijalab.org/seurat/ "Seurat" - "R" package
(For companion dataset on CITE-seq = scRNA-seq + Proteomics, see: https://www.kaggle.com/datasets/alexandervc/citeseqscrnaseqproteins-challenge-neurips2021)
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194122
Expression profiling by high throughput sequencing Genome binding/occupancy profiling by high throughput sequencing Summary Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors. Half the samples were measured using the 10X Multiome Gene Expression and Chromatin Accessability kit and half were measured using the 10X 3' Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data was then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout such that some donor samples were measured at multiple sites with some donors measured at a single site. In the competition, participants were tasked with challenges including modality prediction, matching profiles from different modalities, and learning a joint embedding from multiple modalities.
Overall design Single-cell multiomics data collected from bone marrow mononuclear cells of 12 healthy human donors.
Contributor(s) Burkhardt DB, Lücken MD, Lance C, Cannoodt R, Pisco AO, Krishnaswamy S, Theis FJ, Bloom JM Citation https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/158f3069a435b314a80bdcb024f8e422-Abstract-round2.html
Other single cell RNA seq datasets can be found on kaggle: Look here: https://www.kaggle.com/alexandervc/datasets Or search kaggle for "scRNA-seq"
Single cell RNA sequencing is important technology in modern biology, see e.g. "Eleven grand challenges in single-cell data science" https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1926-6
Also see review : Nature. P. Kharchenko: "The triumphs and limitations of computational methods for scRNA-seq" https://www.nature.com/articles/s41592-021-01171-x
Search scholar.google "challenges in single cell rna sequencing" https://scholar.google.fr/scholar?q=challenges+in+single+cell+rna+sequencing&hl=en&as_sdt=0&as_vis=1&oi=scholart gives many interesting and highly cited articles
(Cited 968) Computational and analytical challenges in single-cell transcriptomics Oliver Stegle, Sarah A. Teichmann, John C. Marioni Nat. Rev. Genet., 16 (3) (2015), pp. 133-145 https://www.nature.com/articles/nrg3833