Brief Bioinform. 2021 Jul 20;22(4):bbaa287.

doi: 10.1093/bib/bbaa287.

Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data

Chunman Zuo¹, Luonan Chen¹,²,³

Affiliations

¹ Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China.
² Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China.
³ Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223 China.

PMID: 33200787
PMCID: PMC8293818
DOI: 10.1093/bib/bbaa287

Chunman Zuo et al. Brief Bioinform. 2021.

Abstract

Simultaneously profiling transcriptomic and chromatin accessibility information in the same individual cells offers unprecedented resolution for understanding cell states. However, computationally effective methods for integrating these inherently sparse and heterogeneous data are lacking. Here, we present a single-cell multimodal variational autoencoder model (scMVAE), which combines three types of joint-learning strategies with a probabilistic Gaussian Mixture Model to learn joint latent features that accurately represent these multilayer profiles. Studies on both simulated and real datasets demonstrate that it is better able to (i) dissect cellular heterogeneity in the joint-learning space, (ii) denoise and impute data and (iii) construct the association between multilayer omics data, which can be used for understanding transcriptional regulatory mechanisms.

Keywords: data integration; deep joint-learning model; multimodal variational autoencoder; single-cell multiple omics data.

© The Author(s) 2020. Published by Oxford University Press.

Figures

Figure 1
Overview of the scMVAE model with three joint-learning strategies. (A) Overall framework of the scMVAE model. Given the scRNA-seq data and scATAC-seq data of the same cell as input, the scMVAE model learned a nonlinear joint embedding of the cells that can be used for multiple analysis tasks (e.g. cell clustering and visualization) through a multimodal encoder with the three learning strategies described in (B), and then reconstructed each omics layer back to its original dimension through a separate decoder. Note: the cells are in the same order in both omics datasets, so each cell corresponds to one point in the low-dimensional space. (B) Illustration of the three learning strategies: (i) the ‘PoE’ framework was used to estimate the joint posterior as a product of the posteriors of each omics layer (detailed in Material S1), (ii) the ‘NN’ strategy was used to learn the joint-learning space by using a neural network to combine the features extracted by a sub-encoder network for each omics layer and (iii) the ‘Direct’ strategy was used to learn from the concatenation of the original features of the two omics layers as input; under this last condition, several neural networks (named in the original figure) were removed from the total network. (C) The distribution to which each variable of the scMVAE model belongs. Each omics layer was modeled with a ZINB distribution. The detailed description of each variable is given in ‘Datasets and preprocessing’.
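The ‘PoE’ strategy in (B) can be illustrated for the common case of diagonal Gaussian posteriors, where the product of experts is again Gaussian, with a precision equal to the sum of the expert precisions and a precision-weighted mean. The following is a minimal sketch under that assumption, not the authors' implementation (which is detailed in Material S1); the function name `product_of_experts`, the optional standard-normal prior expert and the toy values are hypothetical.

```python
def product_of_experts(means, variances, include_prior=True):
    """Combine diagonal-Gaussian experts into a single Gaussian.

    means, variances: one list per expert (e.g. one per omics layer),
    each holding one value per latent dimension.
    Returns (joint_mean, joint_variance), each a list per latent dimension.
    """
    dim = len(means[0])
    if include_prior:
        # Assumption: a standard-normal prior N(0, 1) acts as an extra expert.
        means = means + [[0.0] * dim]
        variances = variances + [[1.0] * dim]

    joint_mean, joint_var = [], []
    for d in range(dim):
        precisions = [1.0 / v[d] for v in variances]
        total_precision = sum(precisions)
        weighted_sum = sum(m[d] * p for m, p in zip(means, precisions))
        joint_var.append(1.0 / total_precision)
        joint_mean.append(weighted_sum / total_precision)
    return joint_mean, joint_var


# Toy posteriors for one cell with a two-dimensional latent space.
rna_mu, rna_var = [1.0, -2.0], [0.5, 1.0]    # hypothetical scRNA-seq expert
atac_mu, atac_var = [3.0, 0.0], [0.5, 1.0]   # hypothetical scATAC-seq expert
mu, var = product_of_experts([rna_mu, atac_mu], [rna_var, atac_var])
print(mu, var)
```

Because each expert is weighted by its precision, an omics layer whose posterior is more confident for a given cell pulls the joint mean more strongly toward its own estimate.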

Figure 2
Visualization, clustering and run-time comparison on the simulated datasets. (A) Dot plot of the top two factors (PCs for Datasets 1 and 2; UMAPs for Dataset 3) extracted from each corrupted omics layer of the three simulated datasets, from the latent features extracted by the single-omics methods scVI and Seurat for each omics layer (upper layer for each dataset), and from the joint-learning latent features extracted by IntNMF, MOFA and the scMVAE model, respectively (lower layer for each dataset). Cells are colored by their true cell types. For each dataset, the final subplot indicates the corruption rate of each omics layer. (B) Clustering accuracy was evaluated by ARI and NMI between the true cell labels and the cell clusters predicted by the single-omics methods (scVI and Seurat) and the multiomics methods (IntNMF, MOFA and the scMVAE model) for each of the three simulated datasets. (C) Run-time comparison for fitting four models on the 18 simulated datasets, which were generated by randomly selecting different numbers of cells and features from the AdBrainCortex dataset with 3000 features per omics layer. Algorithms were tested on a machine with one 40-core Intel(R) Xeon(R) Gold 5115 CPU with 132 GB RAM and two NVIDIA TITAN V GPUs addressing 24 GB.
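The two clustering scores in (B) can be computed from the contingency table of true labels against predicted clusters. The sketch below uses only the standard library; the function names and toy labels are hypothetical, and the NMI uses arithmetic-mean normalization of the two entropies, which is an assumption because the legend does not state which normalization was applied.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index from pair counts in the contingency table."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    row, col = Counter(true_labels), Counter(pred_labels)
    mi = sum(c / n * log(c * n / (row[a] * col[b]))
             for (a, b), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in row.values())
    h_pred = -sum(c / n * log(c / n) for c in col.values())
    denom = (h_true + h_pred) / 2  # assumption: arithmetic-mean normalization
    return mi / denom if denom > 0 else 1.0


true_types = [0, 0, 0, 1, 1, 1]
permuted = [1, 1, 1, 0, 0, 0]      # same partition, different cluster names
oversplit = [0, 0, 1, 1, 2, 2]     # a partition that disagrees with the truth
print(adjusted_rand_index(true_types, permuted),
      normalized_mutual_info(true_types, permuted))
print(adjusted_rand_index(true_types, oversplit),
      normalized_mutual_info(true_types, oversplit))
```

Both scores are invariant to how clusters are named, which is why the permuted labeling still scores as a perfect match.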

Figure 3
Feature embedding and clustering comparison on the original cell line mixture datasets. (A) UMAP visualization of the raw data and features separately extracted from scRNA-seq (upper layer) and scATAC-seq (lower layer), by Seurat and scVI, respectively. (B) UMAP visualization of the extracted features from the multiomics method: CCA, IntNMF, MOFA and scMVAE model. (C) Clustering accuracy was evaluated by clustering score between cell cluster predicted by nine computational methods (i.e. Seurat, scVI (scRNA-seq), IntNMF, MOFA, CCA, scVI (scATAC-seq) and scMVAE model) and cell assignments based on whether each cell expresses one marker gene. Each subpie plot shows the clustering score of nine methods for each cluster, and ideally, it is distributed on the diagonal. X and Y axis indicate marker genes and cell clusters, respectively. (D) Clustering accuracy was evaluated by AGI score based on the clustering assignment predicted by computational methods (i.e. Seurat, scVI, MOFA and scMVAE model) and the expression level of marker gene and housekeeping genes. Note: the higher the score, the better the clustering performance. (E) Clustering accuracy was assessed by ARI to compare different methods under the nine datasets with different sparsity levels of scRNA-seq and scATAC-seq data. (F) Clustering accuracy was assessed by NMI to compare different methods under the nine datasets with different sparsity levels of scRNA-seq and scATAC-seq data.

Figure 4
Consistency of clustering and features between two-omics data on the denoised cell line mixture datasets by scMVAE. (A) The consistency was evaluated by the Kappa coefficient between the clustering assignment of two-omics data denoised by MOFA, CCA, scVI and scMVAE model, as well as raw data. (B) Features similarity was assessed by Pearson and Spearman correlation between two-omics data denoised by MOFA, CCA, scVI and scMVAE, as well as raw data. (C) Pearson correlation between known TF–TG pairs of two-omics data denoised by MOFA, CCA, scVI and scMVAE model, as well as raw data.
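The consistency measure in (A) can be sketched with Cohen's kappa, which corrects the observed agreement between two labelings for the agreement expected by chance. This is an illustrative sketch only: it assumes the cluster labels from the two omics layers have already been matched to a common naming, a step the legend does not describe, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two labelings of the same cells."""
    n = len(labels_a)
    # Fraction of cells that receive the same label in both labelings.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if the two labelings were independent.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings put every cell in the same single cluster
    return (observed - expected) / (1.0 - expected)


# Hypothetical cluster assignments of six cells from each denoised omics layer.
rna_clusters = ["A", "A", "B", "B", "C", "C"]
atac_clusters = ["A", "A", "B", "C", "C", "C"]
kappa = cohen_kappa(rna_clusters, atac_clusters)
print(kappa)
```

A kappa of 1 means the two omics layers yield identical cluster assignments, while a value near 0 means they agree no more often than chance.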

Figure 5
The scMVAE model works well on AdBrainCortex (a large dataset). (A) UMAP visualization of the latent features extracted by one-omics methods (i.e. Seurat and scVI) for scRNA-seq and scATAC-seq data, separately; and by two-omics methods (i.e. CCA, IntNMF, MOFA and the scMVAE model) for multilayer data. (B) Clustering accuracy was evaluated by the clustering score between the cluster assignments predicted by computational methods and the cell assignments based on whether each cell expresses a marker gene. (C) UMAP visualization of the denoised data from MOFA and the scMVAE model. (D) Clustering and denoising quality were assessed by the AGI score based on the cell clustering predicted by computational methods (i.e. Seurat, scVI, MOFA and the scMVAE model) and the expression levels of marker genes and housekeeping genes denoised by these methods. (E) The proportion of 135 TF–TG pairs inferred from two-omics data denoised by scVI, MOFA and scMVAE, as well as from raw data, with Pearson coefficients larger than 0.3 within at least one cell cluster. (F) Fold-change enrichment of the predicted regulations of five known marker genes, which are validated by the RegNetwork database.
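The proportion reported in (E) can be sketched as follows: for each TF–TG pair, compute the Pearson correlation within every cell cluster and count the pair if any cluster exceeds the 0.3 threshold. This is a sketch under stated assumptions; which omics layer supplies the TF signal and which supplies the TG signal is not specified in the legend, and the function names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return cov / (sd_x * sd_y)


def proportion_supported(pairs, clusters, threshold=0.3):
    """Fraction of pairs whose correlation exceeds threshold in any cluster.

    pairs: list of (tf_signal, tg_signal), each a per-cell list.
    clusters: per-cell cluster label, in the same cell order as the signals.
    """
    cluster_ids = set(clusters)
    supported = 0
    for tf_signal, tg_signal in pairs:
        for cid in cluster_ids:
            idx = [i for i, c in enumerate(clusters) if c == cid]
            r = pearson([tf_signal[i] for i in idx],
                        [tg_signal[i] for i in idx])
            if r > threshold:
                supported += 1
                break  # one supporting cluster is enough for this pair
    return supported / len(pairs)


cell_clusters = [0, 0, 0, 1, 1, 1]
toy_pairs = [
    ([1, 2, 3, 3, 2, 1], [2, 4, 6, 1, 1, 1]),  # correlated in cluster 0
    ([1, 2, 3, 1, 2, 3], [3, 2, 1, 3, 2, 1]),  # anti-correlated everywhere
]
proportion = proportion_supported(toy_pairs, cell_clusters)
print(proportion)
```

Testing within clusters rather than across all cells lets a regulation that is active in only one cell type still count as supported.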

