Brief Bioinform. 2021 Jul 20;22(4):bbaa287.
doi: 10.1093/bib/bbaa287.

Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data

Chunman Zuo et al. Brief Bioinform.

Abstract

Simultaneously profiling transcriptomic and chromatin accessibility information in the same individual cells offers unprecedented resolution for understanding cell states. However, computationally effective methods for integrating these inherently sparse and heterogeneous data are lacking. Here, we present a single-cell multimodal variational autoencoder model, which combines three types of joint-learning strategies with a probabilistic Gaussian Mixture Model to learn joint latent features that accurately represent these multilayer profiles. Studies on both simulated and real datasets demonstrate that it performs better at (i) dissecting cellular heterogeneity in the joint-learning space, (ii) denoising and imputing data and (iii) constructing the association between multilayer omics data, which can be used for understanding transcriptional regulatory mechanisms.

Keywords: data integration; deep joint-learning model; multimodal variational autoencoder; single-cell multiple omics data.

Figures

Figure 1
Overview of the scMVAE model with three joint-learning strategies. (A) Overall framework of the scMVAE model. Given the scRNA-seq data ([formula] with [formula] variables) and scATAC-seq data ([formula] with [formula] variables) of the same cell [formula] as input, the scMVAE model learns a nonlinear joint embedding ([formula]) of the cells that can be used for multiple analysis tasks (i.e. cell clustering and visualization). The embedding is produced by a multimodal encoder with the three learning strategies described in (B), and a separate decoder for each omics layer then reconstructs the data at its original dimension. Note: the cells are in the same order in both omics datasets, so each cell corresponds to one point in the low-dimensional space. (B) Illustration of the three learning strategies: (i) the ‘PoE’ framework estimates the joint posterior as a product of the posteriors of each omics layer (detailed in Material S1); (ii) the ‘NN’ strategy learns the joint space by using a neural network to combine the features extracted by a sub-encoder network for each layer; and (iii) the ‘Direct’ strategy learns from the concatenation of the original features of the two layers as a single input. Here, the neural networks [formula], [formula], [formula], [formula] and [formula] were removed from the total network under this learning condition. (C) The distribution to which each variable of the scMVAE model belongs. Each omics layer was modeled with a zero-inflated negative binomial (ZINB) distribution. A detailed description of each variable is given in datasets and preprocessing.
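The ‘PoE’ strategy in (B) can be illustrated with the standard closed form for a product of diagonal Gaussian experts, where the joint precision is the sum of the expert precisions. This is only a minimal sketch: it assumes each modality's encoder outputs a mean and a log-variance per latent dimension, and the optional standard-normal prior expert is a common convention in multimodal VAEs rather than something this page confirms. The function name and the toy numbers are hypothetical; Material S1 of the paper holds the authors' actual derivation.

```python
import math

def product_of_gaussian_experts(means, logvars, include_prior=True):
    """Fuse diagonal Gaussian experts into a single diagonal Gaussian.

    means, logvars: one list per expert (e.g. RNA encoder, ATAC encoder),
    each of length d (the latent dimension). Hypothetical helper.
    """
    d = len(means[0])
    if include_prior:
        # Assumption: a standard-normal prior N(0, 1) acts as an extra expert.
        means = [[0.0] * d] + list(means)
        logvars = [[0.0] * d] + list(logvars)
    joint_mean, joint_var = [], []
    for j in range(d):
        # Precision of each expert in dimension j is 1 / variance.
        precisions = [math.exp(-lv[j]) for lv in logvars]
        total = sum(precisions)
        joint_var.append(1.0 / total)
        # Joint mean is the precision-weighted average of the expert means.
        joint_mean.append(sum(m[j] * p for m, p in zip(means, precisions)) / total)
    return joint_mean, joint_var

# Toy posteriors for one cell with a 2-dimensional latent space.
rna_mean, rna_logvar = [1.0, -2.0], [0.0, 0.0]
atac_mean, atac_logvar = [3.0, 0.0], [0.0, 0.0]

mu, var = product_of_gaussian_experts(
    [rna_mean, atac_mean], [rna_logvar, atac_logvar], include_prior=False
)
mu_p, var_p = product_of_gaussian_experts(
    [rna_mean, atac_mean], [rna_logvar, atac_logvar], include_prior=True
)
```

With two unit-variance experts and no prior, the fused mean is the plain average of the two means and the fused variance is halved. A more confident expert (smaller variance) pulls the joint mean toward itself.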
Figure 2
Visualization, clustering and run-time comparison on the simulated datasets. (A) Dot plot of the top two factors (PCs for Datasets 1 and 2; UMAPs for Dataset 3) extracted from each corrupted omics dataset of the three simulated datasets, of the latent features extracted by the single-omics methods scVI and Seurat for each omics layer (upper row for each dataset), and of the joint-learning latent features extracted by IntNMF, MOFA and the scMVAE model, respectively (lower row for each dataset). Cells are colored by their true cell types. For each dataset, the final subplot indicates the corruption rate of each omics layer. (B) Clustering accuracy was evaluated by ARI and NMI between the true cell labels and the cell clusters predicted by the single-omics methods (scVI and Seurat) and the multiomics methods (IntNMF, MOFA and the scMVAE model) for each of the three simulated datasets. (C) Run-time comparison for fitting four models on the 18 simulated datasets, which were generated by randomly selecting different numbers of cells and features from the AdBrainCortex datasets with 3000 features per omics layer. Algorithms were tested on a machine with one 40-core Intel(R) Xeon(R) Gold 5115 CPU addressing 132 GB RAM and two NVIDIA TITAN V GPUs addressing 24 GB.
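The two clustering metrics in (B), the adjusted Rand index (ARI) and normalized mutual information (NMI), can be computed from the contingency table of true labels against predicted clusters. The following is a minimal pure-Python sketch, not the authors' evaluation code. It assumes at least two cells, and for NMI it assumes normalization by the arithmetic mean of the two entropies, which is one of several conventions; the page does not say which one the paper used.

```python
import math
from collections import Counter

def _pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) / 2.0

def adjusted_rand_index(true_labels, pred_labels):
    n = len(true_labels)  # assumed >= 2
    cells = Counter(zip(true_labels, pred_labels))  # contingency table
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    sum_cells = sum(_pairs(c) for c in cells.values())
    sum_rows = sum(_pairs(c) for c in rows.values())
    sum_cols = sum(_pairs(c) for c in cols.values())
    expected = sum_rows * sum_cols / _pairs(n)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)

def _entropy(counts, n):
    return -sum((c / n) * math.log(c / n) for c in counts)

def normalized_mutual_info(true_labels, pred_labels):
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    mi = sum(
        (c / n) * math.log(c * n / (rows[a] * cols[b]))
        for (a, b), c in cells.items()
    )
    # Assumption: arithmetic-mean normalization of the two entropies.
    denom = 0.5 * (_entropy(rows.values(), n) + _entropy(cols.values(), n))
    return 1.0 if denom == 0 else mi / denom

truth = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]   # identical partition, labels swapped
unrelated = [0, 1, 0, 1]             # splits every true cluster in half

ari_same = adjusted_rand_index(truth, same_up_to_renaming)
nmi_same = normalized_mutual_info(truth, same_up_to_renaming)
ari_unrelated = adjusted_rand_index(truth, unrelated)
nmi_unrelated = normalized_mutual_info(truth, unrelated)
```

Both metrics are invariant to how clusters are named, which is why the relabeled partition scores as a perfect match. ARI can go below zero when agreement is worse than chance, whereas NMI is bounded below by zero.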
Figure 3
Feature embedding and clustering comparison on the original cell line mixture datasets. (A) UMAP visualization of the raw data and of the features extracted separately from scRNA-seq (upper row) and scATAC-seq (lower row) by Seurat and scVI, respectively. (B) UMAP visualization of the features extracted by the multiomics methods: CCA, IntNMF, MOFA and the scMVAE model. (C) Clustering accuracy was evaluated by the clustering score between the cell clusters predicted by nine computational methods (i.e. Seurat, scVI (scRNA-seq), IntNMF, MOFA, CCA, scVI (scATAC-seq) and scMVAE model) and the cell assignments based on whether each cell expresses one marker gene. Each subpie plot shows the clustering score of the nine methods for one cluster; ideally, the scores are concentrated on the diagonal. The x- and y-axes indicate marker genes and cell clusters, respectively. (D) Clustering accuracy was evaluated by the AGI score, based on the clustering assignments predicted by the computational methods (i.e. Seurat, scVI, MOFA and scMVAE model) and the expression levels of marker genes and housekeeping genes. Note: the higher the score, the better the clustering performance. (E) Clustering accuracy was assessed by ARI to compare different methods on the nine datasets with different sparsity levels of scRNA-seq and scATAC-seq data. (F) Clustering accuracy was assessed by NMI to compare different methods on the nine datasets with different sparsity levels of scRNA-seq and scATAC-seq data.
Figure 4
Consistency of clustering and features between the two-omics data on the cell line mixture datasets denoised by scMVAE. (A) Consistency was evaluated by the Kappa coefficient between the clustering assignments of the two-omics data denoised by MOFA, CCA, scVI and the scMVAE model, as well as of the raw data. (B) Feature similarity was assessed by Pearson and Spearman correlation between the two-omics data denoised by MOFA, CCA, scVI and scMVAE, as well as between the raw data. (C) Pearson correlation between known TF–TG pairs in the two-omics data denoised by MOFA, CCA, scVI and the scMVAE model, as well as in the raw data.
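The agreement measures named in (A) and (B) have simple textbook definitions, sketched below in pure Python. This assumes the Kappa coefficient is Cohen's kappa and that the cluster labels of the two omics layers have already been matched to a common naming (for example by an assignment step); the page does not describe how the authors aligned labels, so that part is an assumption. Spearman correlation, also mentioned in (B), is Pearson correlation applied to ranks and is omitted for brevity.

```python
import math
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label vectors over the same cells.

    Assumption: cluster names in labels_a and labels_b are already aligned.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement from the two marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both vectors use a single identical label
    return (observed - expected) / (1.0 - expected)

def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

# Toy cluster assignments of four cells from the two omics layers.
rna_clusters = [0, 0, 1, 1]
atac_clusters = [0, 0, 1, 0]
kappa = cohen_kappa(rna_clusters, atac_clusters)

# Toy feature vectors for the same cells in the two layers.
r_pos = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_neg = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In the toy example three of four cells agree (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5. Unlike ARI, kappa depends on label names, which is why the alignment assumption matters.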
Figure 5
The scMVAE model works well on AdBrainCortex (a large dataset). (A) UMAP visualization of the latent features extracted by one-omics methods (i.e. Seurat and scVI) for scRNA-seq and scATAC-seq data separately, and by two-omics methods (i.e. CCA, IntNMF, MOFA and scMVAE model) for the multilayer data. (B) Clustering accuracy was evaluated by the clustering score between the cluster assignments predicted by the computational methods and the cell assignments based on whether each cell expresses a marker gene. (C) UMAP visualization of the data denoised by MOFA and the scMVAE model. (D) Clustering and denoising quality were assessed by the AGI score, based on the cell clustering predicted by the computational methods (i.e. Seurat, scVI, MOFA and scMVAE model) and the expression levels of marker genes and housekeeping genes denoised by these methods. (E) The proportion of the 135 TF–TG pairs that show a Pearson coefficient larger than 0.3 within at least one cell cluster, computed from the two-omics data denoised by scVI, MOFA and scMVAE, as well as from the raw data. (F) Fold-change enrichment of the predicted regulations of five known marker genes, which are validated by the RegNetwork database.
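The statistic in (E) can be read as: for each TF–TG pair, compute the Pearson correlation within every cell cluster, and count the pair if any cluster exceeds 0.3. The sketch below implements that reading in pure Python. It is an interpretation of the caption, not the authors' code: the function and variable names are hypothetical, the toy values are invented for illustration, and which omics layer supplies the TF signal versus the TG signal is left to the caller because the page does not specify it.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation, or None when undefined (constant or < 2 cells)."""
    n = len(x)
    if n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return None
    return cov / (sd_x * sd_y)

def proportion_supported_pairs(tf_values, tg_values, clusters, pairs, threshold=0.3):
    """Fraction of (TF, TG) pairs with r > threshold in at least one cluster.

    tf_values, tg_values: dict mapping a gene name to per-cell values.
    clusters: per-cell cluster labels, in the same cell order.
    """
    cells_by_cluster = defaultdict(list)
    for idx, label in enumerate(clusters):
        cells_by_cluster[label].append(idx)
    supported = 0
    for tf, tg in pairs:
        for idxs in cells_by_cluster.values():
            r = pearson(
                [tf_values[tf][i] for i in idxs],
                [tg_values[tg][i] for i in idxs],
            )
            if r is not None and r > threshold:
                supported += 1
                break  # one supporting cluster is enough
    return supported / len(pairs)

# Toy data: six cells in two clusters, two hypothetical TF-TG pairs.
clusters = ["a", "a", "a", "b", "b", "b"]
tf_values = {"TF1": [1, 2, 3, 3, 2, 1], "TF2": [1, 2, 3, 1, 2, 3]}
tg_values = {"G1": [2, 4, 6, 1, 2, 3], "G2": [3, 2, 1, 3, 2, 1]}
pairs = [("TF1", "G1"), ("TF2", "G2")]

proportion = proportion_supported_pairs(tf_values, tg_values, clusters, pairs)
```

In the toy data the first pair is positively correlated in cluster "a" and so counts as supported, while the second pair is negatively correlated in both clusters, giving a proportion of 0.5.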
