Classification of breast cancer molecular subtypes based on contrast-enhanced ultrasound and superb microvascular imaging using machine learning approach
Highlight box
Key findings
• In this study, we developed and validated machine learning (ML)-based models for distinguishing the molecular subtypes of breast cancer (BC) using contrast-enhanced ultrasound (CEUS) and superb microvascular imaging (SMI) features. Combining CEUS and SMI yielded good predictive performance, highlighting the potential value of a vascular perspective on BC pathophysiology.
What is known and what is new?
• Angiogenesis plays a pivotal role in tumor growth, invasion, and metastasis, and angiogenesis patterns vary among molecular subtypes. Hence, it is reasonable to infer that imaging techniques assessing the vascular characteristics of lesions could provide valuable insights for predicting the molecular subtypes of BC.
• This study developed and validated ML models for distinguishing the molecular subtypes of BC using CEUS and SMI features. Combining CEUS and SMI yielded good predictive performance, highlighting the potential value of a vascular perspective on BC pathophysiology.
What is the implication, and what should change now?
• Incorporating CEUS and SMI features into an ML approach may enhance the ability to distinguish the molecular subtypes of BC, potentially assisting clinicians in treatment decision-making and prognostic evaluation.
Introduction
Breast cancer (BC) is the most common malignant tumor and the second leading cause of cancer-related deaths among women globally (1). In recognition of the heterogeneity of this disease, four molecular subtypes of BC have been defined based on gene expression profiling: luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-overexpressed, and triple-negative breast cancer (TNBC) (2). These subtypes differ significantly in molecular characteristics, biological behavior, clinical outcome, and prognosis (3). Luminal subtypes respond well to endocrine therapy and have a favorable prognosis. HER2 positivity usually indicates aggressive disease, and anti-HER2 targeted therapy can improve survival, whereas TNBC responds well to chemotherapy but has the worst prognosis. Therefore, it is crucial to determine the molecular subtype of each BC patient so that an appropriate therapeutic strategy can be planned before treatment.
Angiogenesis plays a pivotal role in tumor growth, invasion, and metastasis (4,5), and angiogenesis patterns vary among molecular subtypes (6,7). Hence, it is reasonable to infer that imaging techniques assessing the vascular characteristics of lesions could provide valuable insights for predicting the molecular subtypes of BC.
Contrast-enhanced ultrasound (CEUS) is a pure blood-pool imaging technique that exploits the nonlinear harmonic response of contrast agents to allow continuous, dynamic observation of tumor microcirculatory perfusion, including vessels <100 µm in diameter (6,8). Previous research has shown that CEUS is valuable in identifying the molecular subtypes of BC: the luminal subtype typically presents with low peak intensity (PI); the HER2-overexpressed subtype commonly demonstrates higher PI and slope, a shorter time to peak (TTP), and other features of high blood-flow perfusion; and the TNBC subtype is characterized by a clear margin after enhancement (6,9). However, previous findings have been inconsistent. One potential reason is the widespread use of sulfur hexafluoride contrast agents, whose microbubbles may produce insufficient signal under high-frequency transducer pulses, degrading image quality (10). Superb microvascular imaging (SMI) is an emerging Doppler ultrasound technique for detecting microvascular blood flow. By using adaptive algorithms to suppress clutter and reduce motion artifacts while retaining low-velocity flow signals, SMI excels at visualizing the morphology and distribution of microvessels in breast lesions (11-13). Combining CEUS and SMI may therefore allow a more comprehensive evaluation of BC and its molecular subtypes.
Machine learning (ML) algorithms have outperformed conventional models by effectively handling high-dimensional, complex data, and have become a central focus of medical imaging research (14,15). Previous studies have demonstrated that ML achieves outstanding performance in BC diagnosis, prediction of pathological subtypes, and assessment of lymph node metastasis, substantially enhancing diagnostic accuracy and sensitivity (16,17). However, the limited interpretability of ML models constrains their broad adoption in clinical practice. The SHapley Additive exPlanations (SHAP) algorithm addresses this limitation by quantifying the contribution of each feature to a model's prediction, thereby clarifying the rationale behind it (18). Incorporating interpretable ML models into the molecular subtyping of BC is expected to reduce pathologists' workload, improve diagnostic efficiency, and give clinicians insight into the model's decision-making.
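SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. To make that quantity concrete, the short Python sketch below computes exact Shapley values for a small model by enumerating every feature coalition. It is an illustration under stated assumptions, not the SHAP library's algorithm: features absent from a coalition are replaced by a single baseline value (SHAP instead averages over background data and uses model-specific approximations), and `exact_shapley` and the toy linear score are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for predict(x) versus predict(baseline).

    A feature outside a coalition is set to its baseline value (a simplifying
    assumption). Cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Coalition members keep their observed value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Toy linear score with hypothetical weights: here phi_i = w_i * (x_i - baseline_i).
weights, bias = [0.8, -0.5, 0.3], 0.1


def linear_score(z: Sequence[float]) -> float:
    return bias + sum(w * v for w, v in zip(weights, z))


phi = exact_shapley(linear_score, x=[1.0, 2.0, 0.0], baseline=[0.0, 1.0, 1.0])
print([round(v, 6) for v in phi])   # [0.8, -0.5, -0.3]
```

The attributions always sum to predict(x) minus predict(baseline), the local-accuracy property on which SHAP explanations are built.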
This study aimed to construct ML models for the differentiation of BC molecular subtypes using CEUS and SMI. SHAP was used to interpret the model. We present this article in accordance with the TRIPOD reporting checklist (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-220/rc).
Methods
Patients
This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This prospective study was approved by the ethics committee of Beijing Tongren Hospital (No. TRECKY2021-084). Written informed consent was obtained from each enrolled patient. From January 2022 to February 2024, we prospectively enrolled pathologically confirmed BC patients who underwent CEUS and SMI examinations at Beijing Tongren Hospital.
The inclusion criteria were as follows: (I) pathologically confirmed BC; (II) complete conventional ultrasound, CEUS, and SMI images; (III) no second malignancy; and (IV) no previous treatment. The exclusion criteria were as follows: (I) allergy to the contrast agent; (II) contraindications to the examinations; and (III) pregnancy or lactation. Finally, a total of 191 women (mean age, 56.57±11.69 years; range, 29–76 years) with 193 breast lesions (mean size, 1.95±0.80 cm; range, 0.6–5.7 cm) were included. All lesions had complete imaging data and definitive pathological results; therefore, there were no missing data, and no imputation or case-wise deletion was applied. The study flowchart is shown in Figure 1.

The collected lesions were divided into a training cohort (n=135) and a test cohort (n=58) at a 7:3 ratio. The training cohort was used to construct the models, and the test cohort to verify their performance.
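The text does not say how lesions were allocated to the two cohorts. Assuming simple random allocation, a 7:3 split of 193 lesions can be sketched in Python as follows; the function name, the seed, and the lesion IDs are hypothetical.

```python
import random
from typing import Iterable, List, Tuple


def split_cohort(case_ids: Iterable[int], test_fraction: float = 0.3,
                 seed: int = 2024) -> Tuple[List[int], List[int]]:
    """Simple random split into (training, test); the seed value is arbitrary."""
    ids = list(case_ids)
    rng = random.Random(seed)                  # seeded so the split is reproducible
    rng.shuffle(ids)
    n_test = round(test_fraction * len(ids))   # 0.3 * 193 = 57.9 -> 58
    return ids[n_test:], ids[:n_test]


train_ids, test_ids = split_cohort(range(1, 194))   # 193 lesions, hypothetical IDs
print(len(train_ids), len(test_ids))                # 135 58
```

Because some patients contributed more than one lesion (191 patients, 193 lesions), a lesion-level split like this one could place lesions from the same patient in both cohorts; shuffling patient IDs instead would avoid that.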
Image acquisition
Ultrasound examinations were performed using an Aplio i900 scanner (Canon Medical Systems, Tokyo, Japan) equipped with a PLI-1205BX (i18LX5) linear-array transducer operating at 5–18 MHz. All examinations were performed by two radiologists with over 15 years of experience in ultrasonography.
Patients first underwent a conventional breast ultrasound examination in the supine or slightly lateral oblique position, and B-mode and color Doppler flow imaging (CDFI) images of the lesion were stored. Vascularity within the breast lesions was then evaluated by SMI with a velocity scale of 1.0–2.5 cm/s, a dynamic range of 21 dB, and a frame rate of 26–60 frames/s. Lesions were observed in both monochromatic (mSMI) and color (cSMI) modes, and static and dynamic images were stored. The vascular index (VI) was obtained by defining the region of interest (ROI) at the margin of the lesion with the richest signal, and was expressed as the percentage of vascular-encoded pixels among all pixels in the ROI (11). VI was measured three times in different planes. Finally, CEUS was performed following an intravenous bolus injection of 0.015 mL/kg of the contrast agent Sonazoid (GE Healthcare, Oslo, Norway), followed by a 5–10 mL flush of 0.9% saline. The mechanical index was set to 0.08, and the depth was adjusted to 4–5 cm. Continuous observation was performed for 120 s, and the images were saved in DICOM format.
All image analyses were conducted by two experienced radiologists (W.R. and M.H., each with more than 15 years of experience in breast ultrasound imaging) who were blinded to all patient information. In case of disagreement, a third radiologist with over 20 years of experience joined the analysis until a consensus was reached.
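The VI is a simple pixel ratio. The Python sketch below computes it from a boolean mask of the ROI; the mask representation and the `vascular_index` helper are hypothetical, and because the text does not say how the three measurements were combined, averaging them here is an assumption.

```python
from statistics import mean
from typing import List


def vascular_index(roi_rows: List[List[bool]]) -> float:
    """VI (%) = flow-encoded pixels / all ROI pixels x 100.

    roi_rows is a hypothetical ROI representation: one list per image row,
    holding only the pixels inside the ROI (rows may differ in length),
    with True where the SMI image shows a flow signal.
    """
    total = sum(len(row) for row in roi_rows)
    flow = sum(sum(row) for row in roi_rows)   # each True counts as 1
    return 100.0 * flow / total


# Three toy measurements from different planes
planes = [
    [[True, False, False, False], [False, True, False, False]],   # 2/8 -> 25.0%
    [[True, True, False, False], [False, False, False, False]],   # 2/8 -> 25.0%
    [[False, False, False, False], [False, True, False, False]],  # 1/8 -> 12.5%
]
vi_values = [vascular_index(p) for p in planes]
print(vi_values, round(mean(vi_values), 2))   # [25.0, 25.0, 12.5] 20.83
```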
Ultrasound-derived features
The conventional ultrasound characteristics included size, margins, calcification, and blood flow. CDFI images of lesions were graded using the Adler classification (19): Adler 0: no blood flow signal; Adler I: minimal blood flow (1–2 dot-like or short-line-like blood flow signals); Adler II: moderate blood flow (3–4 dot-like signals from small vessels, or 1 main vessel longer than the lesion radius); Adler III: marked blood flow (3 or more blood vessels).
On SMI, the microvascular architecture of breast lesions was classified into six patterns, as reported in a previous study (11): type I: no signal; type II: penetrant signals or flow towards the lesion; type III: rim-like signals at the periphery of the lesion; type IV: dot-like/linear/stalked signals in a focal or regional area; type V: wheel-like signals inside the lesion; and type VI: irregular signals with disorganized or branching vessels of different diameters (Figure 2).

Both qualitative and quantitative CEUS parameters were assessed. The qualitative parameters were assessed using the scanner's arrival-time parametric imaging (At-PI) mode, in which contrast arrival times are color-coded from early (red) to late (purple), with a 12-s interval between successive colors (20). Representative cases in this study are illustrated in Figure 3.

The qualitative and quantitative parameters obtained are provided in Appendix 1.
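The exact parameter definitions used in this study are given in Appendix 1 and are not reproduced here. As a hedged illustration, the Python sketch below extracts several commonly reported quantitative parameters (PI, TTP, wash-in slope, and area under the curve) from a sampled time-intensity curve. The baseline, the arrival-time threshold, and the `tic_parameters` helper are assumptions, and MTT is omitted because its definition varies between software packages.

```python
from typing import Dict, Sequence


def tic_parameters(times: Sequence[float], intensities: Sequence[float],
                   arrival_fraction: float = 0.1) -> Dict[str, float]:
    """Generic time-intensity-curve descriptors (all definitions are assumptions).

    PI    = maximum intensity
    TTP   = time of the peak, measured from t = 0 (injection)
    slope = (PI - baseline) / (TTP - arrival time), a wash-in rate
    area  = trapezoidal area under the curve over the sampled window
    Arrival = first sample above baseline + arrival_fraction * (PI - baseline).
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two matching time/intensity samples")
    base = intensities[0]                         # first sample taken as baseline
    i_peak = max(range(len(intensities)), key=intensities.__getitem__)
    pi, t_peak = intensities[i_peak], times[i_peak]
    threshold = base + arrival_fraction * (pi - base)
    i_arr = next((k for k, v in enumerate(intensities) if v > threshold), i_peak)
    t_arr = times[i_arr]
    slope = (pi - base) / (t_peak - t_arr) if t_peak > t_arr else float("nan")
    area = sum((times[k + 1] - times[k]) * (intensities[k] + intensities[k + 1]) / 2
               for k in range(len(times) - 1))
    return {"PI": pi, "TTP": t_peak, "arrival": t_arr, "slope": slope, "area": area}


# Hypothetical curve sampled every 2 s (intensity in arbitrary units)
time_s = [0, 2, 4, 6, 8, 10, 12, 14, 16]
intensity_au = [1, 1, 2, 6, 12, 18, 20, 17, 14]
params = tic_parameters(time_s, intensity_au)
print(params)   # PI 20, TTP 12 s, arrival 6 s, slope 19/6 per s, area 167.0
```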
Histopathological analysis
All tumor specimens were analyzed by hematoxylin-eosin staining and immunohistochemistry. Histological tumor types were assessed by a pathologist with 20 years of experience according to the World Health Organization classification of breast tumors. The expression of estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki67 was recorded. ER or PR status was classified as positive if ≥1% of tumor cells stained positive. HER2 staining intensity was graded on a four-point scale (0, 1+, 2+, 3+); HER2 0 or 1+ was considered negative and HER2 3+ positive. For HER2 2+ tumors, fluorescence in situ hybridization (FISH) was used to detect HER2 gene amplification: FISH-positive cases were considered HER2 positive and the remainder HER2 negative. A Ki67 proliferation index ≥14% was classified as high expression (21). BC was classified into four molecular subtypes: (I) luminal A (ER and/or PR positive, HER2 negative, and Ki67 <14%); (II) luminal B (ER and/or PR positive, HER2 negative, and Ki67 ≥14%; or ER and/or PR positive and HER2 positive); (III) HER2-overexpressed (ER and PR negative, HER2 positive); and (IV) TNBC (ER, PR, and HER2 negative) (2).
In this study, three binary classification tasks were defined: (I) luminal versus non-luminal, (II) HER2-overexpressed versus non-HER2-overexpressed, and (III) TNBC versus non-TNBC.
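The receptor-based rules above can be encoded as a small decision function. The Python sketch below is a hypothetical encoding (the `Pathology` record and all function names are illustrative, not from the study); it assumes that luminal A and luminal B are pooled as "luminal" for the first binary task.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Pathology:
    er_pct: float                          # % of tumor cells staining for ER
    pr_pct: float                          # % of tumor cells staining for PR
    her2_ihc: int                          # IHC score 0, 1, 2 or 3 (0, 1+, 2+, 3+)
    fish_amplified: Optional[bool] = None  # required only when her2_ihc == 2
    ki67_pct: float = 0.0


def is_her2_positive(p: Pathology) -> bool:
    if p.her2_ihc == 3:
        return True
    if p.her2_ihc == 2:                    # equivocal: defer to FISH
        if p.fish_amplified is None:
            raise ValueError("HER2 2+ needs a FISH result")
        return p.fish_amplified
    return False                           # 0 or 1+


def molecular_subtype(p: Pathology) -> str:
    hr_positive = p.er_pct >= 1 or p.pr_pct >= 1   # ER and/or PR >= 1%
    her2 = is_her2_positive(p)
    if hr_positive:
        if her2:
            return "luminal B"             # hormone receptor-positive, HER2-positive
        return "luminal A" if p.ki67_pct < 14 else "luminal B"
    return "HER2-overexpressed" if her2 else "TNBC"


def one_vs_rest(subtype: str) -> Dict[str, int]:
    """Targets for the three binary tasks; luminal A and B are pooled as luminal."""
    return {
        "luminal": int(subtype.startswith("luminal")),
        "HER2-overexpressed": int(subtype == "HER2-overexpressed"),
        "TNBC": int(subtype == "TNBC"),
    }


print(molecular_subtype(Pathology(er_pct=80, pr_pct=40, her2_ihc=1, ki67_pct=10)))  # luminal A
```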
Feature selection and model construction
Uninformative and redundant features were excluded, namely features that did not differ significantly between the outcome-positive and outcome-negative groups and features that were highly correlated with another feature (absolute correlation coefficient >0.75). To avoid overfitting, recursive feature elimination (RFE) based on ML algorithms was then used to identify the optimal subset of the remaining features (22).
Three classical ML algorithms, namely random forest (RF), support vector machine (SVM), and logistic regression (LR), were selected for analysis. The models were constructed on the training cohort, with hyperparameters optimized by grid search using five repetitions of five-fold cross-validation, and were evaluated on the test cohort. Class-weighted cross-entropy and on-the-fly augmentation (rotation, flipping, mild elastic deformation) were applied to mitigate class imbalance and overfitting. The hyperparameters of each model are shown in Appendix 1.
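The two-stage reduction described above can be sketched in Python. The correlation filter keeps a feature unless its absolute Pearson correlation with a feature already kept exceeds 0.75; which member of a correlated pair the authors retained is not stated, so keeping the first one is an assumption, and the significance-test filter is omitted. The `rfe` function is a generic backward-elimination loop in the spirit of RFE, not the implementation in R's caret package: the hypothetical `fit_and_rank` callback refits a model and returns feature importances, and `cv_score` returns a resampled performance estimate.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) for a constant feature."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.75) -> List[str]:
    """Greedy filter: keep a feature unless |r| > threshold with one already kept."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def rfe(features: List[str],
        fit_and_rank: Callable[[List[str]], Dict[str, float]],
        cv_score: Callable[[List[str]], float]) -> List[str]:
    """Backward elimination: refit, drop the least important feature, repeat.

    Returns the feature subset with the best cross-validated score seen.
    """
    current = list(features)
    best_subset, best_score = list(current), cv_score(current)
    while len(current) > 1:
        importance = fit_and_rank(current)           # model is refit at every step
        weakest = min(current, key=importance.__getitem__)
        current = [f for f in current if f != weakest]
        score = cv_score(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset


columns = {
    "PI": [1, 2, 3, 4, 5],
    "slope": [2, 4, 6, 8, 10.5],   # nearly collinear with PI, so it is dropped
    "TTP": [3, 5, 1, 4, 2],
}
print(drop_correlated(columns))    # ['PI', 'TTP']

toy_importance = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}   # hypothetical


def toy_fit_and_rank(features: List[str]) -> Dict[str, float]:
    return {f: toy_importance[f] for f in features}


def toy_cv_score(features: List[str]) -> float:
    # Hypothetical score that rewards informative features and penalizes size.
    return sum(toy_importance[f] for f in features) - 0.25 * len(features)


print(rfe(["a", "b", "c", "d"], toy_fit_and_rank, toy_cv_score))   # ['a', 'c', 'd']
```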
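Five repetitions of five-fold cross-validation produce 25 train/validation splits of the training cohort, and grid search keeps the hyperparameter setting with the best mean validation score. A minimal Python sketch, assuming class-stratified folds (the text states only 5×5-fold cross-validation) and a hypothetical `evaluate` callback standing in for model fitting and scoring:

```python
import random
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def repeated_stratified_kfold(labels: Sequence[int], k: int = 5, repeats: int = 5,
                              seed: int = 0) -> Iterator[Split]:
    """Yield (train_idx, val_idx) pairs for repeats x k folds (stratification assumed)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)                 # new random partition each repeat
            for j, idx in enumerate(members):
                folds[j % k].append(idx)         # deal each class across the folds
        for f in range(k):
            val = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, val


def grid_search(labels: Sequence[int], grid: List[Dict[str, float]],
                evaluate: Callable[[Dict[str, float], List[int], List[int]], float]
                ) -> Dict[str, float]:
    """Return the parameter set with the best mean validation score."""
    splits = list(repeated_stratified_kfold(labels))

    def mean_score(params: Dict[str, float]) -> float:
        return sum(evaluate(params, tr, va) for tr, va in splits) / len(splits)

    return max(grid, key=mean_score)


# 70 luminal (1) vs 65 non-luminal (0) training lesions, as in Table 2
labels = [1] * 70 + [0] * 65
splits = list(repeated_stratified_kfold(labels))
print(len(splits), {len(va) for _, va in splits})   # 25 {27}


def toy_evaluate(params, train_idx, val_idx):
    # Hypothetical stand-in for "fit on train_idx, compute AUC on val_idx".
    return -abs(params["C"] - 1.0)


print(grid_search(labels, [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}], toy_evaluate))  # {'C': 1.0}
```

All splitting happens inside the training cohort, so the test cohort stays untouched until final evaluation.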
Model evaluation and interpretability
Model performance was assessed by the area under the curve (AUC), accuracy, sensitivity, specificity, F1-score, positive predictive value (PPV), and negative predictive value (NPV). The cutoff for the predicted probability was determined by maximizing the Youden index. To improve the interpretability of the black-box ML models, SHAP, a quantitative model interpretation method, was used to illustrate the contribution of each feature to the model's decisions. Model calibration was evaluated using calibration curves.
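For a single binary task, the threshold metrics follow from the confusion matrix at a probability cutoff, the Youden index J = sensitivity + specificity − 1 selects that cutoff, and the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A minimal Python sketch with hypothetical predictions (function names are illustrative):

```python
from typing import Dict, Optional, Sequence, Tuple


def _ratio(num: float, den: float) -> float:
    return num / den if den else float("nan")


def metrics_at_cutoff(y: Sequence[int], p: Sequence[float], cutoff: float) -> Dict[str, float]:
    """Threshold metrics; a case is called positive when p >= cutoff."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= cutoff)
    sens, spec = _ratio(tp, tp + fn), _ratio(tn, tn + fp)
    ppv, npv = _ratio(tp, tp + fp), _ratio(tn, tn + fn)
    return {"accuracy": (tp + tn) / len(y), "sensitivity": sens,
            "specificity": spec, "PPV": ppv, "NPV": npv,
            "F1": _ratio(2 * ppv * sens, ppv + sens)}


def youden_cutoff(y: Sequence[int], p: Sequence[float]) -> Tuple[Optional[float], float]:
    """Cutoff maximizing J = sensitivity + specificity - 1 over the observed scores."""
    best_cut, best_j = None, float("-inf")
    for c in sorted(set(p)):
        m = metrics_at_cutoff(y, p, c)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.55, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
cut, j = youden_cutoff(y, p)
print(round(auc(y, p), 3), cut, round(j, 3))   # 0.833 0.55 0.583
print(metrics_at_cutoff(y, p, cut))
```

Here the cutoff is derived from whatever data are passed in; choosing it on the training cohort and applying it unchanged to the test cohort keeps test-set estimates free of optimistic bias.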
Statistical analysis
Statistical analysis was performed with R software (version 4.1.0). The normality of continuous variables was tested by the Kolmogorov-Smirnov test. Continuous variables were compared using Student's t-test or the Mann-Whitney U test according to their distribution and are expressed as mean ± standard deviation or median (interquartile range). Categorical variables were compared using the Chi-squared test or Fisher's exact test and are reported as numbers (percentages). Models were compared using the DeLong test. RFE was performed with the "caret" package, and ML analyses with the "tidyverse" package. All statistical analyses were conducted in R (version 4.3.2). A two-sided P<0.05 was considered statistically significant.
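The DeLong test compares two AUCs estimated on the same cases while accounting for the correlation between them. The study performed it in R; the Python sketch below illustrates DeLong's nonparametric method with hypothetical scores, assuming labels coded 1 (positive) and 0 (negative). The helper names are illustrative.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def _psi(a: float, b: float) -> float:
    # Kernel: 1 if the positive case outranks the negative one, 0.5 for a tie.
    return 1.0 if a > b else 0.5 if a == b else 0.0


def _components(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    v10 = [sum(_psi(x, z) for z in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, z) for x in pos) / len(pos) for z in neg]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_test(y: Sequence[int], s1: Sequence[float], s2: Sequence[float]
                ) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two correlated AUCs; needs >= 2 cases per class."""
    def by_class(s: Sequence[float]) -> Tuple[List[float], List[float]]:
        # Case order is preserved, so component i is the same case in both models.
        return ([v for yi, v in zip(y, s) if yi == 1],
                [v for yi, v in zip(y, s) if yi == 0])

    v10a, v01a = _components(*by_class(s1))
    v10b, v01b = _components(*by_class(s2))
    auc1, auc2 = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    m, n = len(v10a), len(v01a)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:                                  # identical rankings
        return auc1, auc2, 1.0
    z = (auc1 - auc2) / sqrt(var)
    return auc1, auc2, erfc(abs(z) / sqrt(2))     # two-sided normal P value


y = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
print(delong_test(y, model_a, model_b))   # AUCs 1.0 and about 0.667, P about 0.22
```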
Results
Characteristics of the patient cohort
The clinical characteristics of all included patients are shown in Tables 1 and 2. No significant differences in characteristics were found between the training and test cohorts.
Table 1
Characteristic | Value |
---|---|
Age (years) | 56.57±11.69 |
Tumour size (cm) | 1.95±0.80 |
Histopathology | |
Invasive ductal cancer | 148 (76.68) |
Invasive lobular cancer | 24 (12.44) |
Other | 21 (10.88) |
Histologic grade | |
Grade 1 | 38 (19.69) |
Grade 2 | 88 (45.60) |
Grade 3 | 67 (34.72) |
Receptor status | |
ER (+) | 122 (63.21) |
ER (−) | 71 (36.79) |
PR (+) | 104 (53.89) |
PR (−) | 89 (46.11) |
HER2 (+) | 70 (36.27) |
HER2 (−) | 123 (63.73) |
Ki67 ≥14% | 112 (58.03) |
Ki67 <14% | 81 (41.97) |
Molecular subtypes | |
Luminal | 96 (49.74) |
HER2-overexpressed | 50 (25.91) |
TNBC | 47 (24.35) |
Results for continuous data are expressed as mean ± standard deviation and for categorical data as n (%). ER, estrogen receptor; HER2, human epidermal growth factor receptor 2; PR, progesterone receptor; TNBC, triple-negative breast cancer.
Table 2
Characteristic | Overall (n=193) | Training cohort (n=135) | Test cohort (n=58) | P value |
---|---|---|---|---|
Age (years) | 56.57±11.69 | 54.23±10.78 | 54.84±11.13 | 0.76 |
BMI (kg/m2) | 22.45 (21.30–26.03) | 22.86 (21.80–26.03) | 22.66 (21.30–24.61) | 0.32 |
Family history | | | | 0.09 |
No | 184 (95.34) | 131 (97.04) | 53 (91.38) | |
Yes | 9 (4.66) | 4 (2.96) | 5 (8.62) | |
Medical history | | | | 0.90 |
No | 190 (98.45) | 133 (98.52) | 57 (98.28) | |
Yes | 3 (1.55) | 2 (1.48) | 1 (1.72) | |
Menopause | | | | 0.20 |
No | 80 (41.45) | 60 (44.44) | 20 (34.48) | |
Yes | 113 (58.55) | 75 (55.56) | 38 (65.52) | |
Diameter (cm) | 1.95 (1.60–2.30) | 2.00 (1.70–2.45) | 1.90 (1.60–2.30) | 0.39 |
Margin | | | | 0.34 |
Blurred | 142 (73.58) | 102 (75.56) | 40 (68.97) | |
Clear | 51 (26.42) | 33 (24.44) | 18 (31.03) | |
Location | | | | 0.98 |
Left | 84 (43.52) | 60 (44.44) | 24 (41.38) | |
Right | 109 (56.48) | 75 (55.56) | 34 (58.62) | |
Calcification | | | | 0.33 |
No | 80 (41.45) | 59 (43.70) | 21 (36.21) | |
Yes | 113 (58.55) | 76 (56.30) | 37 (63.79) | |
Adler’s grade | | | | 0.42 |
0 | 0 (0.00) | 0 (0.00) | 0 (0.00) | |
I | 15 (7.77) | 10 (7.41) | 5 (8.62) | |
II | 87 (45.08) | 65 (48.15) | 22 (37.93) | |
III | 91 (47.15) | 60 (44.44) | 31 (53.45) | |
RI | 0.73 (0.68–0.80) | 0.73 (0.67–0.80) | 0.73 (0.69–0.80) | 0.75 |
Enhancement speed | | | | 0.15 |
Fast | 145 (75.13) | 97 (71.85) | 48 (82.76) | |
Synchronous | 40 (20.73) | 33 (24.44) | 7 (12.07) | |
Slow | 8 (4.15) | 5 (3.70) | 3 (5.17) | |
Enhancement degree | | | | 0.18 |
Hyperenhancement | 125 (64.77) | 84 (62.22) | 41 (70.69) | |
Iso-enhancement | 33 (17.10) | 22 (16.30) | 11 (18.97) | |
Hypo-enhancement | 35 (18.13) | 29 (21.48) | 6 (10.34) | |
Internal homogeneity | | | | 0.07 |
Homogeneous | 149 (77.20) | 109 (80.74) | 40 (68.97) | |
Heterogeneous | 44 (22.80) | 26 (19.26) | 18 (31.03) | |
Perfusion defect | | | | 0.61 |
Absent | 34 (17.62) | 25 (18.52) | 9 (15.52) | |
Present | 159 (82.38) | 110 (81.48) | 49 (84.48) | |
Enhancement order | | | | 0.89 |
Centrifugal | 19 (9.84) | 15 (11.11) | 4 (6.90) | |
Centripetal | 156 (80.83) | 107 (79.26) | 49 (84.48) | |
Diffuse | 18 (9.33) | 13 (9.63) | 5 (8.62) | |
Enhancement margin | | | | 0.19 |
Blurred | 142 (73.58) | 103 (76.30) | 39 (67.24) | |
Clear | 51 (26.42) | 32 (23.70) | 19 (32.76) | |
Perforator vessel | | | | 0.89 |
Absent | 68 (35.23) | 48 (35.56) | 20 (34.48) | |
Present | 125 (64.77) | 87 (64.44) | 38 (65.52) | |
Size after enhancement | | | | 0.84 |
Larger | 48 (24.87) | 33 (24.44) | 15 (25.86) | |
Unchanged | 145 (75.13) | 102 (75.56) | 43 (74.14) | |
PI (1.0E−5 AU) | 28.29 (23.78–34.61) | 29.00 (23.64–33.84) | 27.80 (24.50–34.80) | 0.96 |
TTP (s) | 12.61 (10.71–14.62) | 12.41 (10.60–14.66) | 12.71 (11.31–14.41) | 0.48 |
MTT (s) | 13.20 (11.51–18.11) | 12.71 (11.41–18.71) | 14.10 (11.70–17.30) | 0.39 |
Slope (1.0E−5 AU/s) | 2.21 (1.73–3.05) | 2.21 (1.72–3.08) | 2.25 (1.76–2.97) | 0.77 |
Area (1.0E−5 AU·s) | 17.89 (15.62–20.53) | 17.83 (15.62–20.48) | 17.89 (16.18–21.00) | 0.56 |
Vascularization patterns | | | | 0.51 |
a | 1 (0.52) | 1 (0.74) | 0 (0.00) | |
b | 49 (25.39) | 33 (24.44) | 16 (27.59) | |
c | 38 (19.69) | 26 (19.26) | 12 (20.69) | |
d | 26 (13.47) | 18 (13.33) | 8 (13.79) | |
e | 38 (19.69) | 29 (21.48) | 9 (15.52) | |
f | 41 (21.24) | 28 (20.74) | 13 (22.41) | |
VI (%) | 22.03±6.54 | 22.33±6.54 | 21.34±6.58 | 0.40 |
Molecular subtypes | | | | 0.98 |
Luminal | 96 (49.74) | 70 (51.85) | 26 (44.83) | |
HER2-overexpressed | 50 (25.91) | 34 (25.19) | 16 (27.59) | |
TNBC | 47 (24.35) | 31 (22.96) | 16 (27.59) | |
Results for continuous data are expressed as mean ± standard deviation or median (interquartile range), and for categorical data as n (%). AU, arbitrary unit; BMI, body mass index; HER2, human epidermal growth factor receptor 2; MTT, mean transit time; PI, peak intensity; RI, resistance index; TNBC, triple-negative breast cancer; TTP, time to peak; VI, vascular index.
Development and evaluation of the ML models
The RFE algorithm was applied to select the optimal feature set for each BC subtype from 26 candidate features (20 imaging features and 6 clinical features). Ultimately, 7, 5, and 4 features were retained for distinguishing the luminal, HER2-overexpressed, and TNBC subtypes, respectively: (I) luminal versus non-luminal: vascularization pattern, PI, TTP, VI, area, size after enhancement, and Adler's grade; (II) HER2-overexpressed versus non-HER2-overexpressed: TTP, slope, perfusion defect, VI, and PI; (III) TNBC versus non-TNBC: enhancement margin, perfusion defect, size after enhancement, and VI.
Table 3 and Figure 4 summarize the performance of the ML models. In the test cohort, the SVM model performed best in distinguishing the luminal subtype from the other subtypes, with an AUC, accuracy, sensitivity, specificity, F1 score, PPV, and NPV of 0.874 [95% confidence interval (CI): 0.769–0.979], 0.848, 0.880, 0.810, 0.863, 0.846, and 0.850, respectively. The SVM model significantly outperformed the LR model (P=0.04) but did not differ significantly from the RF model (P=0.34). The RF model achieved the highest AUC (0.872, 95% CI: 0.768–0.975) for discriminating the HER2-overexpressed subtype from the other subtypes, with an accuracy, sensitivity, specificity, F1 score, PPV, and NPV of 0.800, 0.921, 0.823, 0.581, 0.818, and 0.899, respectively. The RF model differed significantly from the LR model (P=0.04) but not from the SVM model (P=0.32). For identifying TNBC versus the other subtypes, the LR model showed the highest AUC (0.824, 95% CI: 0.704–0.943), with an accuracy, sensitivity, specificity, F1 score, PPV, and NPV of 0.756, 0.909, 0.706, 0.645, 0.500, and 0.960, respectively. The LR model differed significantly from the RF model (P=0.03) but not from the SVM model (P=0.44). In the test cohort, the calibration curves showed a mean squared error of only 5–15% between the predicted probabilities and the observed event probabilities, indicating good calibration and predictive reliability (Figure 5).
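The text does not define the reported error precisely. Two common readings are sketched below in Python with hypothetical predictions: a binned calibration table whose count-weighted squared gaps between mean predicted probability and observed event rate are averaged, and the Brier score, the mean squared error between each predicted probability and the 0/1 outcome. The bin count and function names are assumptions.

```python
from typing import List, Sequence, Tuple


def calibration_bins(y: Sequence[int], p: Sequence[float], n_bins: int = 5
                     ) -> List[Tuple[float, float, int]]:
    """(mean predicted probability, observed event rate, count) per equal-width bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        k = min(int(pi * n_bins), n_bins - 1)     # p = 1.0 falls in the top bin
        bins[k].append((yi, pi))
    table = []
    for members in bins:
        if members:                               # empty bins are skipped
            mean_p = sum(pi for _, pi in members) / len(members)
            rate = sum(yi for yi, _ in members) / len(members)
            table.append((mean_p, rate, len(members)))
    return table


def binned_calibration_mse(y: Sequence[int], p: Sequence[float], n_bins: int = 5) -> float:
    """Count-weighted mean squared gap between mean prediction and observed rate."""
    table = calibration_bins(y, p, n_bins)
    total = sum(c for _, _, c in table)
    return sum(c * (mp - rate) ** 2 for mp, rate, c in table) / total


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared error between each predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


y = [0, 0, 1, 0, 1, 1]                      # hypothetical outcomes
p = [0.1, 0.15, 0.5, 0.45, 0.9, 0.85]       # hypothetical predicted probabilities
print(calibration_bins(y, p))
print(round(binned_calibration_mse(y, p), 6), round(brier_score(y, p), 5))   # 0.010625 0.08625
```

The binned version measures calibration alone, whereas the Brier score also penalizes poor discrimination, so the two can differ considerably on the same predictions.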
Table 3
Groups | Models | Cohorts | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 score (95% CI) | PPV (95% CI) | NPV (95% CI) |
---|---|---|---|---|---|---|---|---|---|
Luminal vs. non-luminal | RF | Training cohort | 0.922 (0.864–0.980) | 0.882 (0.814–0.931) | 0.911 (0.805–0.969) | 0.848 (0.761–0.911) | 0.895 (0.828–0.941) | 0.879 (0.785–0.939) | 0.886 (0.814–0.936) |
| | | Test cohort | 0.870 (0.762–0.998) | 0.891 (0.781–0.957) | 0.880 (0.690–0.969) | 0.905 (0.748–0.979) | 0.898 (0.799–0.951) | 0.917 (0.731–0.989) | 0.864 (0.730–0.947) |
| | SVM | Training cohort | 0.955 (0.914–0.996) | 0.912 (0.849–0.956) | 0.929 (0.838–0.978) | 0.891 (0.809–0.945) | 0.920 (0.867–0.956) | 0.912 (0.838–0.961) | 0.911 (0.833–0.961) |
| | | Test cohort | 0.874 (0.769–0.979) | 0.848 (0.729–0.929) | 0.880 (0.690–0.969) | 0.810 (0.646–0.921) | 0.863 (0.755–0.933) | 0.846 (0.694–0.940) | 0.850 (0.700–0.943) |
| | LR | Training cohort | 0.924 (0.859–0.988) | 0.912 (0.849–0.956) | 0.929 (0.838–0.978) | 0.891 (0.809–0.945) | 0.920 (0.867–0.956) | 0.912 (0.838–0.961) | 0.911 (0.833–0.961) |
| | | Test cohort | 0.836 (0.699–0.973) | 0.804 (0.678–0.897) | 0.800 (0.592–0.932) | 0.810 (0.646–0.921) | 0.816 (0.685–0.903) | 0.833 (0.650–0.941) | 0.773 (0.600–0.893) |
HER2-overexpressed vs. non-HER2-overexpressed | RF | Training cohort | 0.944 (0.902–0.986) | 0.816 (0.738–0.879) | 0.945 (0.815–0.992) | 0.859 (0.781–0.916) | 0.716 (0.620–0.798) | 0.858 (0.752–0.930) | 0.916 (0.852–0.959) |
| | | Test cohort | 0.872 (0.768–0.975) | 0.800 (0.673–0.894) | 0.921 (0.739–0.989) | 0.823 (0.686–0.921) | 0.667 (0.520–0.791) | 0.818 (0.659–0.925) | 0.899 (0.762–0.968) |
| | SVM | Training cohort | 0.948 (0.897–0.999) | 0.913 (0.850–0.956) | 0.917 (0.826–0.970) | 0.911 (0.823–0.965) | 0.830 (0.752–0.890) | 0.759 (0.613–0.872) | 0.973 (0.924–0.995) |
| | | Test cohort | 0.845 (0.708–0.981) | 0.867 (0.745–0.939) | 0.818 (0.598–0.944) | 0.882 (0.738–0.969) | 0.750 (0.577–0.871) | 0.692 (0.493–0.851) | 0.938 (0.809–0.992) |
| | LR | Training cohort | 0.897 (0.836–0.958) | 0.786 (0.636–0.895) | 0.958 (0.835–0.976) | 0.734 (0.606–0.838) | 0.676 (0.602–0.956) | 0.523 (0.501–0.834) | 0.983 (0.832–0.996) |
| | | Test cohort | 0.832 (0.714–0.949) | 0.711 (0.610–0.849) | 0.909 (0.587–0.998) | 0.647 (0.502–0.775) | 0.606 (0.398–0.725) | 0.455 (0.246–0.601) | 0.957 (0.851–0.999) |
TNBC vs. non-TNBC | RF | Training cohort | 0.849 (0.769–0.930) | 0.728 (0.591–0.752) | 0.840 (0.638–0.956) | 0.692 (0.593–0.781) | 0.600 (0.523–0.765) | 0.467 (0.231–0.471) | 0.931 (0.865–0.987) |
| | | Test cohort | 0.763 (0.611–0.916) | 0.711 (0.633–0.923) | 0.818 (0.482–0.977) | 0.676 (0.382–0.678) | 0.581 (0.267–0.642) | 0.450 (0.288–0.667) | 0.920 (0.758–0.992) |
| | SVM | Training cohort | 0.865 (0.788–0.943) | 0.806 (0.726–0.862) | 0.840 (0.638–0.956) | 0.795 (0.704–0.863) | 0.677 (0.601–0.935) | 0.568 (0.328–0.628) | 0.939 (0.889–0.987) |
| | | Test cohort | 0.794 (0.649–0.939) | 0.733 (0.499–0.762) | 0.909 (0.587–0.998) | 0.676 (0.423–0.716) | 0.625 (0.303–0.673) | 0.476 (0.183–0.512) | 0.958 (0.815–0.999) |
| | LR | Training cohort | 0.846 (0.758–0.933) | 0.816 (0.806–0.926) | 0.800 (0.646–0.907) | 0.821 (0.712–0.906) | 0.678 (0.656–0.873) | 0.588 (0.528–0.890) | 0.928 (0.839–0.964) |
| | | Test cohort | 0.824 (0.704–0.943) | 0.756 (0.499–0.762) | 0.909 (0.587–0.998) | 0.706 (0.670–0.867) | 0.645 (0.523–0.842) | 0.500 (0.418–0.673) | 0.960 (0.815–0.999) |
AUC, area under the curve; BC, breast cancer; CI, confidence interval; HER2, human epidermal growth factor receptor 2; LR, logistic regression; ML, machine learning; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; SVM, support vector machine; TNBC, triple-negative breast cancer.


Interpretability and clinical benefit analysis
The contribution of each feature in the final ML models was visualized using SHAP (Figure 6). In the final luminal model, vascularization pattern emerged as the most important feature for predicting the subtype. Lower area, VI, and PI values and a longer TTP were associated with the luminal subtype, whereas a larger size after enhancement, a higher Adler's grade, and type V or VI vascularization patterns indicated a non-luminal outcome. In the final HER2-overexpressed model, a shorter TTP, higher VI, slope, and PI, and the presence of a perfusion defect were associated with the HER2-overexpressed subtype. For the TNBC model, SHAP showed that a larger size after enhancement, a clear enhancement margin, a higher VI, and the presence of a perfusion defect were positively associated with the TNBC subtype, with a larger size after enhancement being the most important of these features.

Discussion
This study developed and validated ML models for distinguishing the molecular subtypes of BC using CEUS and SMI features. The combination of CEUS and SMI showed good predictive performance, highlighting the potential value of the vascular perspective in BC pathophysiology.
Three ML methods were used to establish models, and the best-performing model was selected to predict each molecular subtype of BC. The SVM model showed the best capacity to distinguish the luminal subtype from the other subtypes. In this model, the microvascular architectural pattern assessed by SMI played a core role, ranking first among all features. In line with the study by Kurt et al. (11), our results showed that luminal tumors commonly presented with rim-like, penetrant, and regional microvascular patterns, whereas wheel-like and disorganized intralesional signals were observed mostly in the HER2-overexpressed and TNBC subtypes. We hypothesize that this pattern arises because luminal lesions are relatively less invasive: tumor cells tend to proliferate along the interstices of connective and adipose tissue, resulting in a rim-like vascular distribution (23). In contrast, HER2-overexpressed and TNBC subtypes are more aggressive, with exuberant yet disorganized angiogenesis that frequently manifests as chaotic intralesional vascular architecture (24). To date, however, no consensus has been reached regarding the SMI vascular distribution patterns of BC molecular subtypes, and large-scale studies are still required to clarify this issue (11). In addition, some quantitative CEUS parameters contributed substantially to the luminal prediction. Compared with other subtypes, luminal tumors had lower PI and area values and a longer TTP. The low microvessel density (MVD) of this subtype may explain its low-perfusion enhancement pattern (7). The RF model showed the best performance in distinguishing the HER2-overexpressed subtype from the other subtypes. CEUS characteristics played a dominant role in this model: HER2-overexpressed tumors exhibited higher PI and slope, and perfusion defects were often observed. Furthermore, the SMI-derived VI was relatively higher in HER2-overexpressed than in non-HER2-overexpressed tumors.
As shown above, the HER2-overexpressed subtype exhibited high vascular perfusion. HER2 can promote the expression of vascular endothelial growth factor (VEGF), which induces tumor angiogenesis by stimulating the proliferation of vascular endothelial cells. High VEGF expression in this subtype leads to more tumor neovascularization and substantially higher blood-flow perfusion than in other subtypes. Moreover, tumor vessels differ from normal vessels not only in abundance but also in structure: their thin, incomplete walls and arteriovenous fistulas allow rapid accumulation and clearance of contrast agent within the tumor (25,26). These features can be sensitively observed on CEUS, and quantitative parameters such as PI and slope can further quantify their extent and intensity. Tumors proliferate rapidly when nourished by blood vessels; when tumor metabolism outpaces the vascular supply, localized liquefactive necrosis may occur, resulting in the perfusion defects observed on CEUS (6). The LR model was ultimately used to identify the TNBC subtype because of its relatively stable AUC. In addition to rich blood-flow perfusion, TNBC often exhibits a clear margin after enhancement, which Wojcinski et al. (27) suggested may be due to the higher histological grade and relatively rapid progression of this subtype.
In recent years, a few studies have addressed the preoperative differentiation of BC molecular subtypes, but most merely described the imaging features of a particular subtype by comparing subtypes (28,29). Given that BC is a highly vascular-dependent tumor, exploring CEUS and SMI for differentiating BC molecular subtypes is informative. CEUS has the advantage of showing tumor blood perfusion, and in this study quantitative CEUS parameters such as PI, TTP, and slope showed excellent performance in identifying the various subtypes of BC. SMI, in turn, depicts microvascular blood flow. Classifying microvascular patterns according to Kurt et al. (11), we obtained results consistent with theirs, and microvascular patterns played an essential role in the luminal prediction model.
By employing CEUS and SMI simultaneously, we captured complementary information on tumor perfusion and microvascular distribution. The ML models that fused the two modalities yielded a more complete depiction of lesion vasculature and achieved excellent performance in predicting BC molecular subtypes, offering the prospect of reducing the need for immunohistochemical analysis, lowering costs, and improving diagnostic accuracy in clinical practice.
Currently, many studies use radiomics to identify molecular subtypes of BC, which can explore more imaging features. Our study employed ML methods; the RFE algorithm was used to screen factors and establish models for specific subtypes. This approach improves model effectiveness, and compared with radiomics models, ML models based on clinical and imaging data are easier for clinicians to understand and implement for clinicians in clinical practice. Ma et al. (30) have established radiomics models based on mammogram images to predict the luminal and HER2-overexpressed subtypes, with AUCs of 0.752 and 0.784, respectively. The AUCs reported are substantially lower than those in our study. However, we also observed that our TNBC model performed slightly worse than the other two subtypes. Liang et al. (6) also found that, compared with the luminal and HER2-overexpressed subtypes, TNBC currently has fewer specific imaging features. More studies are needed to explore the characteristics of TNBC.
This study has some limitations. First, it was conducted at a single center; future work should include multicenter datasets to mitigate institutional bias. Second, the sample size was modest, and larger cohorts are required to improve the precision and generalizability of the findings. Finally, although the evaluations were performed by experienced sonographers, some degree of subjectivity is unavoidable.
Conclusions
Incorporating CEUS and SMI features into an ML approach could enhance the diagnostic performance for identifying molecular subtypes of BC, potentially helping clinicians with decision-making and prognostic evaluation.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-220/rc
Data Sharing Statement: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-220/dss
Peer Review File: Available at https://gs.amegroups.com/article/view/10.21037/gs-2025-220/prf
Funding: None.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://gs.amegroups.com/article/view/10.21037/gs-2025-220/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the ethics committee of Beijing Tongren Hospital (No. TRECKY2021-084) and written informed consent was obtained from each enrolled patient.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Giaquinto AN, Sung H, Miller KD, et al. Breast Cancer Statistics, 2022. CA Cancer J Clin 2022;72:524-41. [Crossref] [PubMed]
- Goldhirsch A, Winer EP, Coates AS, et al. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann Oncol 2013;24:2206-23. [Crossref] [PubMed]
- Waks AG, Winer EP. Breast Cancer Treatment: A Review. JAMA 2019;321:288-300. [Crossref] [PubMed]
- Mori N, Mugikura S, Takahashi S, et al. Quantitative Analysis of Contrast-Enhanced Ultrasound Imaging in Invasive Breast Cancer: A Novel Technique to Obtain Histopathologic Information of Microvessel Density. Ultrasound Med Biol 2017;43:607-14. [Crossref] [PubMed]
- Folkman J. Tumor angiogenesis: therapeutic implications. N Engl J Med 1971;285:1182-6. [Crossref] [PubMed]
- Liang X, Li Z, Zhang L, et al. Application of Contrast-Enhanced Ultrasound in the Differential Diagnosis of Different Molecular Subtypes of Breast Cancer. Ultrason Imaging 2020;42:261-70. [Crossref] [PubMed]
- Kraby MR, Krüger K, Opdahl S, et al. Microvascular proliferation in luminal A and basal-like breast cancer subtypes. J Clin Pathol 2015;68:891-7. [Crossref] [PubMed]
- Guo L, Liu ZG, Han PH, et al. Perfusion curve f (t) analysis of breast cancer by contrast-enhanced ultrasonography. Acta Radiol 2012;53:981-6. [Crossref] [PubMed]
- Li X, Zhang J, Zhang G, et al. Contrast-Enhanced Ultrasound and Conventional Ultrasound Characteristics of Breast Cancer With Different Molecular Subtypes. Clin Breast Cancer 2024;24:204-14. [Crossref] [PubMed]
- Sun C, Sboros V, Butler MB, et al. In vitro acoustic characterization of three phospholipid ultrasound contrast agents from 12 to 43 MHz. Ultrasound Med Biol 2014;40:541-50. [Crossref] [PubMed]
- Kurt SA, Kayadibi Y, Saracoglu MS, et al. Prediction of Molecular Subtypes Using Superb Microvascular Imaging and Shear Wave Elastography in Invasive Breast Carcinomas. Acad Radiol 2023;30:14-21. [Crossref] [PubMed]
- Park AY, Seo BK, Woo OH, et al. The utility of ultrasound superb microvascular imaging for evaluation of breast tumour vascularity: comparison with colour and power Doppler imaging regarding diagnostic performance. Clin Radiol 2018;73:304-11. [Crossref] [PubMed]
- Park AY, Seo BK, Cha SH, et al. An Innovative Ultrasound Technique for Evaluation of Tumor Vascularity in Breast Cancers: Superb Micro-Vascular Imaging. J Breast Cancer 2016;19:210-3. [Crossref] [PubMed]
- Yun K, He T, Zhen S, et al. Development and validation of explainable machine-learning models for carotid atherosclerosis early screening. J Transl Med 2023;21:353. [Crossref] [PubMed]
- Deo RC. Machine Learning in Medicine. Circulation 2015;132:1920-30. [Crossref] [PubMed]
- Chia JLL, He GS, Ngiam KY, et al. Harnessing Artificial Intelligence to Enhance Global Breast Cancer Care: A Scoping Review of Applications, Outcomes, and Challenges. Cancers (Basel) 2025;17:197. [Crossref] [PubMed]
- Sun S, Mutasa S, Liu MZ, et al. Deep learning prediction of axillary lymph node status using ultrasound images. Comput Biol Med 2022;143:105250. [Crossref] [PubMed]
- Deng H, Eftekhari Z, Carlin C, et al. Development and Validation of an Explainable Machine Learning Model for Major Complications After Cytoreductive Surgery. JAMA Netw Open 2022;5:e2212930. [Crossref] [PubMed]
- Adler DD, Carson PL, Rubin JM, et al. Doppler ultrasound color flow imaging in the study of breast cancer: preliminary findings. Ultrasound Med Biol 1990;16:553-9. [Crossref] [PubMed]
- Niu Q, Zhao L, Wang R, et al. Predictive value of contrast-enhanced ultrasonography and ultrasound elastography for management of BI-RADS category 4 nonpalpable breast masses. Eur J Radiol 2024;173:111391. [Crossref] [PubMed]
- Wolff AC, Hammond ME, Hicks DG, et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. J Clin Oncol 2013;31:3997-4013. [Crossref] [PubMed]
- Chatterjee S, Dey D, Munshi S. Integration of morphological preprocessing and fractal based feature extraction with recursive feature elimination for skin lesion types classification. Comput Methods Programs Biomed 2019;178:201-18. [Crossref] [PubMed]
- Chen J, Li CX, Shao SH, et al. The association between conventional ultrasound and contrast-enhanced ultrasound appearances and pathological features in small breast cancer. Clin Hemorheol Microcirc 2022;80:413-22. [Crossref] [PubMed]
- Li CY, Gong HY, Ling LJ, et al. Diagnostic performance of contrast-enhanced ultrasound and enhanced magnetic resonance for breast nodules. J Biomed Res 2018;32:198-207. [Crossref] [PubMed]
- Hoyt K, Umphrey H, Lockhart M, et al. Ultrasound imaging of breast tumor perfusion and neovascular morphology. Ultrasound Med Biol 2015;41:2292-302. [Crossref] [PubMed]
- Eberhard A, Kahlert S, Goede V, et al. Heterogeneity of angiogenesis and blood vessel maturation in human tumors: implications for antiangiogenic tumor therapies. Cancer Res 2000;60:1388-93.
- Wojcinski S, Soliman AA, Schmidt J, et al. Sonographic features of triple-negative and non-triple-negative breast cancer. J Ultrasound Med 2012;31:1531-41. [Crossref] [PubMed]
- Zhu JY, He HL, Jiang XC, et al. Multimodal ultrasound features of breast cancers: correlation with molecular subtypes. BMC Med Imaging 2023;23:57. [Crossref] [PubMed]
- Wen B, Kong W, Zhang Y, et al. Association Between Contrast-Enhanced Ultrasound Characteristics and Molecular Subtypes of Breast Cancer. J Ultrasound Med 2022;41:2019-31. [Crossref] [PubMed]
- Ma W, Zhao Y, Ji Y, et al. Breast Cancer Molecular Subtype Prediction by Mammographic Radiomic Features. Acad Radiol 2019;26:196-201. [Crossref] [PubMed]