Nat Commun. 2023 Feb 21;14(1):964.
doi: 10.1038/s41467-023-36559-0.

Single-cell biological network inference using a heterogeneous graph transformer

Anjun Ma et al. Nat Commun.

Abstract

Single-cell multi-omics (scMulti-omics) allows the quantification of multiple modalities simultaneously to capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Existing tools cannot effectively infer the active biological networks in diverse cell types and the response of these networks to external stimuli. Here we present DeepMAPS for biological network inference from scMulti-omics. It models scMulti-omics in a heterogeneous graph and learns relations among cells and genes within both local and global contexts in a robust manner using a multi-head graph transformer. Benchmarking results indicate DeepMAPS performs better than existing tools in cell clustering and biological network construction. It also showcases competitive capability in deriving cell-type-specific biological networks in lung tumor leukocyte CITE-seq data and matched diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data. In addition, we deploy a DeepMAPS webserver equipped with multiple functionalities and visualizations to improve the usability and reproducibility of scMulti-omics data analysis.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The workflow of DeepMAPS and HGT illustration.
a The overall framework of DeepMAPS. Five main steps are carried out to perform cell clustering and biological gene network inference from the input scMulti-omics data. b The graph autoencoder incorporates an HGT model. The integrated cell-gene matrix is used to build a heterogeneous graph that includes all cells (green) and genes (purple) as nodes. The HGT model is trained on multiple subgraphs (50 subgraphs as an example) that cover as many nodes of the whole graph as possible. Each subgraph is used to train the model for 100 epochs; thus, the whole training process iterates 5,000 times. The trained model is then applied to the whole graph to learn and update the embeddings of each node. c An illustration of the embedding update process of the target node in a single HGT layer. The red circle in the upper panel indicates the target node and the black circles indicate the source nodes. Arrows represent the connections between a target node and source nodes. Colored rectangles represent the embeddings of different nodes. The zoomed-in view in the bottom panel details the message-passing process and attention mechanism. The final output of one HGT layer is an updated node embedding for all nodes. HGT heterogeneous graph transformer.
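The graph construction and attention step described in this caption can be illustrated with a minimal sketch. It is not the DeepMAPS implementation: the function names, the toy matrix, the toy embeddings, and the edge threshold are all hypothetical. A real HGT layer also applies node-type-specific and relation-specific projections and multiple heads, which are omitted here. The last lines only reproduce the caption's arithmetic of 50 subgraphs at 100 epochs each.

```python
import math

def build_hetero_graph(matrix, threshold=0.0):
    """Turn a cell-by-gene matrix into typed edges (cell node, gene node, weight).

    Assumption: an edge exists wherever the matrix entry exceeds `threshold`.
    """
    edges = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value > threshold:
                edges.append((("cell", i), ("gene", j), value))
    return edges

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_update(target_vec, source_vecs):
    """Single-head scaled dot-product attention of one target over its sources.

    Returns the attention weights and the weighted sum of source embeddings.
    """
    dim = len(target_vec)
    scores = [
        sum(t * s for t, s in zip(target_vec, src)) / math.sqrt(dim)
        for src in source_vecs
    ]
    weights = softmax(scores)
    updated = [
        sum(w * src[k] for w, src in zip(weights, source_vecs))
        for k in range(dim)
    ]
    return weights, updated

# Toy integrated matrix: 2 cells x 3 genes (hypothetical values).
matrix = [[1.0, 0.0, 2.0],
          [0.0, 3.0, 0.0]]
edges = build_hetero_graph(matrix)

# Toy embeddings: one target node attending over two source nodes.
weights, updated = attention_update([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# Training schedule from the caption: 50 subgraphs, 100 epochs each.
n_subgraphs, epochs_per_subgraph = 50, 100
total_iterations = n_subgraphs * epochs_per_subgraph
```

In this toy case the source whose embedding aligns with the target receives the larger attention weight, which is the behavior the bottom panel of the figure depicts.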
Fig. 2. Benchmarking of DeepMAPS in terms of cell clustering.
a Benchmark cell clustering results of ten datasets, reported as ARI for the three multiple scRNA-seq datasets and the three CITE-seq datasets with benchmark labels, and as ASW for the four scRNA-ATAC-seq datasets without benchmark labels. Each box showcases the minimum, first quartile, median, third quartile, and maximum ARI or ASW results of a tool using different parameter settings (DeepMAPS: n = 96, Seurat: n = 16 for RNA-RNA and CITE-seq and 36 for RNA-ATAC, Harmony: n = 36, MOFA+: n = 36, TotalVI: n = 48, and GLUE: n = 72). Dots represent outliers. b Comparison of results on five independent datasets. No repeated experiment was conducted. c Robustness test of DeepMAPS using the cell cluster leave-out method for the three independent test datasets with benchmarking cell labels. p-values were calculated using a two-tailed t-test. Each box showcases the minimum, first quartile, median, third quartile, and maximum ARI results of a tool performed on different data subsets (R-test: n = 5, C-test: n = 20, and A-test-1: n = 5). Dots represent outliers. d–f UMAP comparison of R-test, C-test, and A-test-1 datasets between DeepMAPS and other tools using the original cell labels. Source data are provided as a Source Data file. ASW average Silhouette width, ARI adjusted rand index.
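The two clustering metrics named in this caption can be sketched in plain Python using their standard textbook definitions. This is an illustration only, not the benchmarking code used for the figure. The function names and toy inputs are hypothetical, and the ASW sketch assumes Euclidean distance and at least two clusters.

```python
from collections import Counter
from math import comb, dist

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table of two labelings (needs >= 2 items)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

def average_silhouette_width(points, labels):
    """ASW with Euclidean distance; needs no ground-truth labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("ASW needs at least two clusters")
    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:  # singleton clusters get width 0 by convention
            widths.append(0.0)
            continue
        # Mean distance to the other members of the same cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)

# Toy data (hypothetical): two well-separated clusters of two cells each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_mixed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
asw = average_silhouette_width(points, [0, 0, 1, 1])
```

ARI compares a clustering with benchmark labels and is invariant to how the clusters are named, while ASW scores cluster compactness from the data alone, which is why the caption uses it for the datasets without benchmark labels.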
Fig. 3. Evaluation and comparison of gene association network inference of DeepMAPS.
a, b Closeness centrality (CC) and eigenvector centrality (EC) were used to indicate the compactness and importance of genes to the network. We compared our results with IRIS3 and a background network using all genes for the R-test dataset (a, n = 5) and the C-test dataset (b, n = 14). p-values were calculated using a two-tailed t-test. c Comparison of the number of unique TFs in GRNs that showed significantly enriched biological functions in three public databases. Each box contains the results of six scRNA-ATAC-seq datasets (n = 6). d Comparison of the number of cell-type-specific regulons in GRNs significantly enriched in only one biological function/pathway in the three public databases (n = 6). e The F1 score comparisons of regulons enriched to only one function/pathway using three databases (n = 6). The mean values of the precision and recall scores of the selected six scRNA-ATAC-seq datasets were max-min scaled and shown in the heatmap, with darker blue indicating high values and lighter blue indicating low values. Source data are provided as a Source Data file. Each box in Fig. 3 showcases the minimum, first quartile, median, third quartile, and maximum score of the corresponding criteria. CC closeness centrality, EC eigenvector centrality, CTSR cell-type-specific regulon.
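As a rough illustration of the quantities in this caption, the sketch below computes closeness centrality on a small unweighted, undirected, connected toy network, along with the F1 score and max-min scaling. The gene names, the graph, and the function names are hypothetical, and this is not the evaluation code behind the figure.

```python
from collections import deque

def closeness_centrality(adjacency, node):
    """(n - 1) / sum of shortest-path lengths from `node`.

    Assumption: the graph is unweighted, undirected, and connected.
    """
    distances = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest hop counts
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total > 0 else 0.0

def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def min_max_scale(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Toy gene network (hypothetical): a path g1 - g2 - g3.
adjacency = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"]}
cc_center = closeness_centrality(adjacency, "g2")
cc_end = closeness_centrality(adjacency, "g1")

f1 = f1_score(0.5, 1.0)
scaled = min_max_scale([2.0, 4.0, 6.0])
```

The middle gene of the path has the higher closeness centrality, matching the caption's reading of CC as a measure of how compactly a gene sits within its network.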
Fig. 4. DeepMAPS identification of heterogeneity in CITE-seq data of PBMC and lung tumor leukocytes.
a UMAPs for DeepMAPS cell clustering results from integrated RNA and protein data, protein data only, and RNA data only. Cell clusters were annotated based on curated marker proteins and genes. b Heatmap of curated marker proteins and genes that determine the cell clustering and annotation. c Heatmap of the Spearman correlation comparison of top differentially expressed genes and proteins in plasma cells and memory B cells. d UMAP colored by the 51st embedding, indicating distinct embedding representations in plasma cells and memory B cells. e Expression of the top differentially expressed genes and proteins in c as a function of the 51st embedding to observe the pattern relations between plasma cells and memory B cells. Each line represents a gene/protein, colored by cell type. For each gene, a line was drawn using a loess smoothing function based on the corresponding embedding and scaled gene expression in a cell. f–h Visualizations similar to c–e for the 56th embedding, comparing EM CD8+ T cells and TRM CD8+ T cells. i Two signaling pathways, NECTIN and ALCAM, are shown to indicate the predicted cell–cell communications between two cell clusters. A link between a filled circle (resource cluster with highly expressed ligand coding genes) and an unfilled circle (target cluster with highly expressed receptor coding genes) indicates the potential cell–cell communication of a signaling pathway. Circle colors represent different cell clusters, and the size represents the number of cells. The two monocyte groups were merged. TRM tissue-resident memory, CM central memory, TAM tumor-associated macrophage, HGT heterogeneous graph transformer.
Fig. 5. DeepMAPS identifies specific GRNs in DSLL subnetworks.
a Conceptual illustration of DeepMAPS analysis of scRNA-ATAC-seq data. Modalities are first integrated based on a velocity-weighted balance. The integrated GAS matrix is then used to build a heterogeneous graph as input to the HGT framework. The cell clusters and gene modules with high attention scores are then used for building TF-gene linkages and determining regulons in each cell cluster. b The UMAP shows the clustering results of DeepMAPS. Cell clusters were manually annotated based on curated marker genes. c The observed and extrapolated future states (arrows) based on the RNA velocity of the normal B cell and the two DSLL states are shown (top panel). Velocity-based trajectory analysis shows the pseudotime from the top to the bottom right (bottom panel). d The 20 TFs selected in each of the three clusters, representing the top 20 regulons with the highest centrality scores. Colors represent regulons uniquely identified in each cluster or shared between different clusters. e Regulons in DSLL state-1 showed a significant difference in regulon activity compared to the other clusters. Motif shape and number of regulated genes are also shown. f Violin plots of regulon activities of the four regulons compared between the three clusters. g The downstream-regulated genes of JUN (the most differentially active regulon in DSLL state-1) in the three clusters. h An illustration of the BAFF signaling pathway identified from GAS-based cell–cell communication prediction using CellChat. The BAFF signaling pathway was found to exist between macrophages and both DSLL states. It further activates the JUN regulon and enables the transcription of genes like CDK6. Figure created with BioRender.com. i The ATAC peak, RNA expression, and GAS level of TNFRSF13B (the coding gene of TACI, the receptor in the BAFF signaling pathway). Source data are provided as a Source Data file.
Fig. 6. The organization of the DeepMAPS web portal.
a Software-engineering diagram of DeepMAPS and an overview of the framework. b Pipeline illustration of the server, including major steps (left; colors indicate different steps), detailed analyses (middle), and featured figures and tables (right).
