PLoS Biol. 2018 Oct 22;16(10):e2006687.
doi: 10.1371/journal.pbio.2006687. eCollection 2018 Oct.

Quantitative assessment of cell population diversity in single-cell landscapes

Qi Liu et al. PLoS Biol.

Abstract

Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for the systematic investigation of cellular diversity. As a number of computational tools have been developed to identify and visualize cell populations within a single scRNA-seq dataset, there is a need for methods to quantitatively and statistically define proportional shifts in cell population structures across datasets, such as expansion or shrinkage or emergence or disappearance of cell populations. Here we present sc-UniFrac, a framework to statistically quantify compositional diversity in cell populations between single-cell transcriptome landscapes. sc-UniFrac enables sensitive and robust quantification in simulated and experimental datasets in terms of both population identity and quantity. We have demonstrated the utility of sc-UniFrac in multiple applications, including assessment of biological and technical replicates, classification of tissue phenotypes and regional specification, identification and definition of altered cell infiltrates in tumorigenesis, and benchmarking batch-correction tools. sc-UniFrac provides a framework for quantifying diversity or alterations in cell populations across conditions and has broad utility for gaining insight into tissue-level perturbations at the single-cell resolution.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of the sc-UniFrac method.
(A) A hierarchical tree is built by clustering the combined single-cell transcriptome profiles from two samples and by calculating distances between cluster centroids. Each cell is then assigned to a branch according to its cluster membership. Branch lengths weighted by the relative abundance of each sample are used to calculate the sc-UniFrac distance. In the second step, the sample labels of all cells are swapped without altering the tree topology to generate a null distribution of sc-UniFrac distances, from which a p-value for the sc-UniFrac distance can be calculated. (B) Workflow overview of the sc-UniFrac package for characterizing dissimilarities between two samples.
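The two steps in panel A can be sketched roughly as follows. This is a minimal illustration, not the sc-UniFrac package itself: it assumes cells have already been assigned to k clusters, uses average-linkage clustering of the centroids as the tree, and uses a normalized weighted UniFrac-style formula (branch length times the difference in relative abundance). The function names `cluster_centroids`, `weighted_unifrac`, and `permutation_pvalue` are hypothetical, and the exact distance variant used by the authors may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def cluster_centroids(expr, cluster_ids, k):
    """Mean profile of each cluster; assumes every cluster 0..k-1 is non-empty."""
    return np.vstack([expr[cluster_ids == c].mean(axis=0) for c in range(k)])


def weighted_unifrac(centroids, cluster_ids, labels):
    """Weighted UniFrac-style distance between samples 0 and 1 (both non-empty)."""
    k = centroids.shape[0]
    # Relative abundance of each sample in each cluster (tree leaf).
    props = np.zeros((2, k))
    for s in (0, 1):
        counts = np.bincount(cluster_ids[labels == s], minlength=k)
        props[s] = counts / counts.sum()
    # Tree over cluster centroids (average linkage is an assumption).
    root = to_tree(linkage(centroids, method="average"))
    num = den = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        for child in (node.left, node.right):
            if child is None:
                continue
            length = node.dist - child.dist      # branch length parent -> child
            leaves = child.pre_order()           # cluster ids below this branch
            pa = props[0, leaves].sum()
            pb = props[1, leaves].sum()
            num += length * abs(pa - pb)
            den += length * (pa + pb)
            stack.append(child)
    return num / den if den > 0 else 0.0


def permutation_pvalue(centroids, cluster_ids, labels, n_perm=200, seed=0):
    """Shuffle sample labels on a fixed tree to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = weighted_unifrac(centroids, cluster_ids, labels)
    null = np.array([
        weighted_unifrac(centroids, cluster_ids, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

Because the labels are permuted while the cluster assignments and centroids stay fixed, the tree topology is identical for every null draw, which mirrors the description in the caption.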
Fig 2. Simulation data reveal sc-UniFrac to be sensitive and robust.
(A) Two groups (N1 and N2) of 1,000 cells were selected from CD8 and CD4 cells identified in the Wishbone dataset (S1 Data) [25]. N1 is always composed of 100% CD8 cells, while N2 is composed of CD8 cells and different proportions of CD4 cells (indicated on x-axis). Green and red arrows represent CD8/CD8 (completely similar) and CD8/CD4 (completely dissimilar) comparisons, respectively; y-axis is the sc-UniFrac distance calculated over n = 50 runs with k = 10. Boxes represent the first and third quartiles, and bars represent maximum and minimum values. (B) Sensitivity of sc-UniFrac evaluated by the fraction of incidences that a statistically significant sc-UniFrac distance was returned over n = 50 runs, as a function of increasing dissimilarity between N1 and N2 using the same simulation scheme as panel A. (C) Mean sc-UniFrac plotted as in panel A with varying k parameter. (D) Fraction significant sc-UniFrac detected plotted as in panel B with varying k parameter. (E) Mean sc-UniFrac plotted as in panel A with N1 = 1,000 but a varying N2 size to determine the effect of dataset size imbalance on sc-UniFrac. (F) Fraction significant sc-UniFrac detected plotted as in panel B with N1 = 1,000 and varying N2 size.
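The mixing scheme behind panels A and B can be illustrated with a short sketch. It assumes cells are referred to by integer indices into two pools (for example CD8 and CD4 cells), that sampling is without replacement, and that significance is called at a threshold of 0.05; none of these details are stated in the caption, and the helper names `mix_populations` and `fraction_significant` are hypothetical.

```python
import numpy as np


def mix_populations(n_cells, frac_b, pool_a, pool_b, rng):
    """Draw n_cells indices: a fraction frac_b from pool_b, the rest from pool_a."""
    n_b = int(round(n_cells * frac_b))
    n_a = n_cells - n_b
    pick_a = rng.choice(pool_a, size=n_a, replace=False)
    pick_b = rng.choice(pool_b, size=n_b, replace=False)
    return np.concatenate([pick_a, pick_b])


def fraction_significant(p_values, alpha=0.05):
    """Share of runs whose p-value falls below alpha (alpha is an assumption)."""
    return float(np.mean(np.asarray(p_values) < alpha))


rng = np.random.default_rng(0)
cd8_pool = np.arange(0, 5000)        # hypothetical indices of CD8 cells
cd4_pool = np.arange(5000, 10000)    # hypothetical indices of CD4 cells
n1 = mix_populations(1000, 0.0, cd8_pool, cd4_pool, rng)   # 100% CD8
n2 = mix_populations(1000, 0.2, cd8_pool, cd4_pool, rng)   # 20% CD4
```

Repeating the draw for each CD4 proportion and each of the n runs, then scoring every (N1, N2) pair, gives the per-proportion distributions and detection fractions that the panels summarize.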
Fig 3. sc-UniFrac statistically determines dissimilarities between single-cell data landscapes.
t-SNE plots of (A) technical and (B) biological replicates of scRNA-seq data generated from the adult murine colonic mucosa. Replicates were combined for t-SNE analyses and labeled with different colors. Outlined populations were identified with canonical markers. (C) t-SNE plot depicting E14.5 pancreatic islet and adult colonic mucosa scRNA-seq data in different mice, showing segregation by organ type. (D) Hierarchical clustering by sc-UniFrac of scRNA-seq landscapes generated from E14.5 pancreatic islet and adult colonic mucosa (indicated by tissue label), with technical and biological replicates (indicated by mouse label), as well as colonic tumor and adjacent normal isolated from an induced Lrig1CreERT2/+;Apcfl/+ mouse. Heat represents sc-UniFrac distance between two samples. (E) Hierarchical clustering by sc-UniFrac of single-cell landscapes of technical and biological replicates of the colonic mucosa while varying parameter k. (F) Discriminant analysis of sc-UniFrac on biological and technical replicates. Discriminative ability, as defined by the smallest distance between biological replicates minus the largest distance between technical replicates, plotted against k. Data from GSE102698, GSE114044, GSE117615, GSE117616. scRNA-seq, single-cell RNA-sequencing; t-SNE, t-distributed stochastic neighbor embedding.
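The discriminative ability defined for panel F is simple arithmetic on a pairwise distance matrix, and a small sketch may make the definition concrete. The function name, the pair lists, and the example distances below are hypothetical and are not taken from the paper.

```python
import numpy as np


def discriminative_ability(dist, bio_pairs, tech_pairs):
    """Smallest biological-replicate distance minus largest technical-replicate distance."""
    smallest_bio = min(dist[i, j] for i, j in bio_pairs)
    largest_tech = max(dist[i, j] for i, j in tech_pairs)
    return smallest_bio - largest_tech


# Hypothetical symmetric distance matrix for four samples:
# samples 0 and 1 come from one mouse, samples 2 and 3 from another.
example_dist = np.array([
    [0.0, 0.1, 0.5, 0.7],
    [0.1, 0.0, 0.8, 0.6],
    [0.5, 0.8, 0.0, 0.2],
    [0.7, 0.6, 0.2, 0.0],
])
tech_pairs = [(0, 1), (2, 3)]                  # same mouse
bio_pairs = [(0, 2), (0, 3), (1, 2), (1, 3)]   # different mice
score = discriminative_ability(example_dist, bio_pairs, tech_pairs)
```

A positive score means every biological pair is farther apart than every technical pair at that value of k, so the two kinds of replicate are cleanly separated.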
Fig 4. Alternative methods of landscape comparisons arrive at similar results compared with sc-UniFrac.
(A) Hierarchical clustering by cellAlign distance calculated using unbranched trajectories created from scRNA-seq data generated from E14.5 pancreatic islet and adult colonic mucosa (indicated by tissue label-greyscale bar), with technical and biological replicates (indicated by mouse label-red bar) (S4 Data). Heat represents cellAlign distance between two samples. Example dissimilarity matrices resulting from alignments of unbranched stem cell to colonocyte trajectories using the cellAlign algorithm according to [20] for (B) technical replicates and (C) biological replicates. Normalized alignment-based distances appear below each matrix. (D) Representative p-Creode trajectories depicting the colonic epithelial differentiation continuum of 2 technical and 2 biological replicates. Outlined lineages were identified with canonical markers. Muc2 expression overlay. (E) Hierarchical clustering by p-Creode scoring of trajectories generated from scRNA-seq data of technical (green) and biological (red, cyan) replicates. N = 100 resampled p-Creode runs for each dataset were performed and then analyzed together in a single clustering analysis. Heat represents the p-Creode score between two trajectories. Data from GSE102698, GSE114044, GSE117616; https://github.com/KenLauLab/pCreode_Comparison_Across_Datasets. scRNA-seq, single-cell RNA sequencing.
Fig 5. Cells that drive sc-UniFrac can be intuitively identified.
(A, B) Branching structure of two single-cell landscapes being scored by sc-UniFrac (k = 10), with black representing statistically shared branches and blue and red representing statistically unshared branches from each of the colored samples. Thickness of branch is proportional to effect size. Comparing between (A) technical replicates and (B) different tissues. (C) Individual cells (columns) from group 10 of panel B being matched to cell types (rows) referenced from the Mouse Cell Atlas. Heat represents the correlation of gene expression between the cell and the reference using all genes. Data from GSE102698, GSE114044, GSE117616.
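The cell-to-reference matching in panel C can be sketched as a correlation matrix between each cell and each reference cell-type profile across all genes. The caption does not say which correlation measure was used, so Pearson correlation is an assumption here, as are the function name `correlate_to_reference` and the toy values. The sketch also assumes no profile is constant across genes, since that would give a zero norm.

```python
import numpy as np


def correlate_to_reference(cells, reference):
    """Pearson correlation of each cell (rows of `cells`) with each reference
    profile (rows of `reference`); both arrays share the same gene columns.
    Returns an array of shape (n_reference_types, n_cells)."""
    c = cells - cells.mean(axis=1, keepdims=True)
    r = reference - reference.mean(axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    return r @ c.T


# Toy data: two reference cell types and two cells over four genes.
reference = np.array([
    [5.0, 1.0, 0.0, 2.0],
    [0.0, 4.0, 6.0, 1.0],
])
cells = np.array([
    [10.0, 2.0, 0.0, 4.0],   # scaled copy of reference type 0
    [0.0, 3.0, 7.0, 1.0],    # close to reference type 1
])
corr = correlate_to_reference(cells, reference)
best_match = corr.argmax(axis=0)   # index of the best reference type per cell
```

Rows of `corr` correspond to reference cell types and columns to cells, matching the row and column layout described for the heat map.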
Fig 6. sc-UniFrac identifies unique cellular infiltrates within colonic tumor compared with normal colon.
(A) t-SNE plot of multiple replicates of single-cell data from the pancreas, colonic tumor, adjacent normal colon, and normal colon analyzed together. Random sampling of 400 cells from each group. Populations delineated by marker genes. (B) Branching structure of tumor and adjacent normal landscapes scored by sc-UniFrac (k = 10). (C, D) Individual cells (columns) from subpopulations 1 (C) and 10 (D) of panel B being matched to cell types (rows) referenced from the Mouse Cell Atlas. Analysis similar to Fig 5. Data from GSE117615. t-SNE, t-distributed stochastic neighbor embedding.
Fig 7. sc-UniFrac groups oligodendrocytes by brain regions.
(A) Hierarchical clustering by sc-UniFrac of scRNA-seq data generated from different regions of the brain according to [29]. Heat represents sc-UniFrac distance between two regions. (B) Schematic of brain regions for generating scRNA-seq data. (C) t-SNE plot of data combined from all brain regions, with oligodendrocytes from each region highlighted. Data from GSE75330. scRNA-seq, single-cell RNA sequencing; SN-VTA, substantia nigra and ventral tegmental area; t-SNE, t-distributed stochastic neighbor embedding.
Fig 8. sc-UniFrac can benchmark batch effect removal approaches.
(A) sc-UniFrac distance calculated comparing uncorrected and batch-corrected scRNA-seq datasets of HEK293 cells fresh, frozen at −80 °C, or liquid nitrogen flash frozen performed in two different batches (GSE85534) [35]. ComBat, limma, and MNN were used for batch correction. (B) sc-UniFrac distance calculated similar to panel A for technical replicates of the mouse colonic epithelium scRNA-seq data (GSE102698). (C) Hierarchical clustering by sc-UniFrac of uncorrected or batch-corrected scRNA-seq data depicting murine gastrulation from two different studies [36,37]. A gradation of similarity, and hence clustering, was expected over developmental times from the earliest development stage (E5.5) to the latest stage (E7.5). Data from GSE100597; http://gastrulation.stemcells.cam.ac.uk/scialdone2016. E, embryonic day; Fre, fresh; Fro, frozen at −80 °C; HEK293, human embryonic kidney 293; Nitr, liquid nitrogen flash frozen; scRNA-seq, single-cell RNA sequencing.

References

1. Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, et al. Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells. Cell. 2015;161(5):1187–1201. doi: 10.1016/j.cell.2015.04.044
2. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015;161(5):1202–1214. doi: 10.1016/j.cell.2015.05.002
3. Han X, Wang R, Zhou Y, Fei L, Sun H, Lai S, et al. Mapping the Mouse Cell Atlas by Microwell-Seq. Cell. 2018;172(5):1091–1107. doi: 10.1016/j.cell.2018.02.001
4. Gierahn TM, Wadsworth MH, Hughes TK, Bryson BD, Butler A, Satija R, et al. Seq-Well: Portable, low-cost RNA sequencing of single cells at high throughput. Nat Methods. 2017;14(4):395–398. doi: 10.1038/nmeth.4179
5. Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. doi: 10.1038/ncomms14049
