CT-based machine learning radiomics predicts CCR5 expression level and survival in ovarian cancer

Objective We aimed to evaluate the prognostic value of C-C motif chemokine receptor type 5 (CCR5) expression level for patients with ovarian cancer and to establish a radiomics model that can predict CCR5 expression level using The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA) database. Methods A total of 343 cases of ovarian cancer from the TCGA were used for the gene-based prognostic analysis. Fifty seven cases had preoperative computed tomography (CT) images stored in TCIA with genomic data in TCGA were used for radiomics feature extraction and model construction. 89 cases with both TCGA and TCIA clinical data were used for radiomics model evaluation. After feature extraction, a radiomics signature was constructed using the least absolute shrinkage and selection operator (LASSO) regression analysis. A prognostic scoring system incorporating radiomics signature based on CCR5 expression level and clinicopathologic risk factors was proposed for survival prediction. Results CCR5 was identified as a differentially expressed prognosis-related gene in tumor and normal sample, which were involved in the regulation of immune response and tumor invasion and metastasis. Four optimal radiomics features were selected to predict overall survival. The performance of the radiomics model for predicting the CCR5 expression level with 10-fold cross- validation achieved Area Under Curve (AUCs) of 0.770 and of 0.726, respectively, in the training and validation sets. A predictive nomogram was generated based on the total risk score of each patient, the AUCs of the time-dependent receiver operating characteristic (ROC) curve of the model was 0.8, 0.673 and 0.792 for 1-year, 3-year and 5-year, respectively. Along with clinical features, important imaging biomarkers could improve the overall survival accuracy of the prediction model. Conclusion The expression levels of CCR5 can affect the prognosis of patients with ovarian cancer. CT-based radiomics could serve as a new tool for prognosis prediction. Supplementary Information The online version contains supplementary material available at 10.1186/s13048-022-01089-8.


Introduction
Ovarian cancer is one of the most common gynecological malignant tumors worldwide, with more than 10,000 patients suffering from it every year [1]. The fatality rate of ovarian cancer ranks high among gynecological malignancies, and it carries a dismal prognosis. Despite remarkable achievements in treating ovarian cancer, such as debulking surgery, chemotherapy, immunotherapy, and targeted therapy, its 5-year survival rate is only approximately 30% [2]. Intratumor heterogeneity within patients and inter-tumor heterogeneity among patients present significant challenges for predicting overall survival and effectiveness of therapy. Although clinicopathologic features, serum markers including cancer antigen 125, and traditional imaging findings have served as prognostic indicators, limitations like low specificity [3] and requirement for surgical specimens and professional staff make them difficult to meet the clinical need. Thus, the search for additional reliable prognostic indicators continues to enable personalized precision medicine.
The tumor microenvironment, which involves multiple immune infiltrations, plays an essential role in the prognosis and clinical benefit of therapy. The chemokines and their specific receptors could mediate migration and impact the cellular process of both tumor and immune cells [4,5]. The C-C motif chemokine receptor type 5 (CCR5) is one of the G protein-coupled receptors on leukocyte surfaces and is involved in the process of host immune response, which serves as our main defense against pathogens. Emerging evidence indicates CCR5 contributes largely to the tumor invasion and metastasis [6]. High expression of CCR5 causes tumor cell migration and vascular invasion, which can result in distant metastasis of several types of tumors such as breast cancers [7][8][9], hepatocellular carcinoma [10] and prostate cancers [11][12][13]. Given its prominence in the regulation of immunity， several clinical trials targeting CCR5 are underway, involving cancers [7,13], COVID-19 [14], immune deficiency diseases like HIV [15][16][17][18], etc. It has been reported that in comparison to CD133non-cancer stem-like cells (CSLCs), chemokine CCL5 and its receptors, CCR1, CCR3, and CCR5, were significantly increased in CSLCs, and the blocking of CCL5 and its receptors effectively inhibits their invasive capacity [19]; CCR5 were consistently upregulated in Tregs in a patient who had been diagnosed with ovarian cancer， which indicated ovarian CSCs recruit Tregs via CCL5-CCR5 interactions [20].However, to the best of our knowledge, non-invasive and effective methods with a universal prognostic guidance to detect the marker for survival are still lacking.
Preoperative computed tomography (CT) images can tailor the surgery strategy since a wealth of cancer information can be obtained. However, intraoperative exploration and postoperative pathological diagnosis are more widely applied for treatment guidance and response assessment. It is developing an accurate and safe method to evaluate survival and guide treatment before operation remains a significant challenge. With the development of science and technology that enables the digital medical image to be converted into high-throughput data, radiomics has evolved rapidly [21,22]. Radiomics involves the application of advanced computational analyses of images, increasing visual assessment by extracting features not perceptible to the naked human eye [23]. Combined with machine learning techniques, radiomics is amenable to good performance in cancer prognosis prediction [24]. It provides highly accurate, dynamic and non-invasive methods for personalized medicine. Prior studies have demonstrated that CT imaging features are associated with angiogenesis, metabolism, hypoxia, and microenvironment of cancers [25][26][27]. Radiomics on CT imaging can be used in early diagnosis, risk stratification, evaluation of residual lesions, and tumor heterogeneity in a patient with ovarian cancer [28][29][30][31][32][33]. Despite recent advances in radiomics, to our best knowledge, no study involving survival analysis of patients with ovarian tumors has used radiomics features to predict the expression levels of CCR5 and evaluate its prognostic value.
Our current research aims to investigate the association between CCR5 expression levels and the prognosis of patients with ovarian cancer by using data from the Cancer Genome Atlas (TCGA) and matched patient data from the Cancer Imaging Archive (TCIA). Furthermore, based on CT images, we attempted to develop a radiomics model to predict the expression level of CCR5 which could be offered as a reference for clinical decisions.

Methods
Data retrieval and processing for this study 343 OC cases from the TCGA portal system (https:// portal. gdc. cancer. gov/) were used for the gene-based prognostic analysis. Fifty-seven cases had preoperative CT images stored in TCIA (https:// wiki. cance rimag ingar chive. net/ displ ay/ Public/ TCGA-LGG) with genomic data in TCGA were used for radiomics feature extraction and model construction. Eighty-nine cases with TCGA and TCIA clinical data were used for radiomics model evaluation. All the samples from TCIA and TCGA were anonymized and publicly available.
The exclusion criteria were: samples with incomplete clinical data without CCR5 expression data; samples without CT images; samples with poor image quality, and images with no corresponding clinical or gene expression information (The brief inclusion/exclusion criteria can be seen in supplemental table 1).

Identification of CCR5 as a differentially expressed gene
We calculate the cutoff value of CCR5 expression using the survMisc package of R. An online tool (https:// www. xiant ao. love/ produ cts) was employed to characterize the gene expression profiles of different tissues. RNAseq data in TPM format were uploaded and transferred. Kaplan-Meier (KM) curves and log-rank test were used to calculate survival probability and median survival time. We further performed conditional landmark analyses separately at 12, 36, and 60 months to compare two groups using the package jskm and survival.
Univariate and multivariate analysis of various factors that affect OS was performed by Cox regression. The time-dependent receiver operating characteristic (ROC) analysis was applied to validate the CCR5 expressionbased prognostic signature. The relationships between the CCR5 expression and clinical parameters were assessed using the Spearman correlation test.
CIBERSORTx (https:// ciber sortx. stanf ord. edu/) was employed to analyze the immune composition based on the gene expression profiles of complex tissues. RNA-seq data in TPM format were uploaded as the mixture file. Impute Cell Fractions, and LM22 (22 immune cell types) were selected for the signature matrix file. Gene ontology (GO) and KEGG enrichment analyses of differentially expressed genes were carried out with the clusterProfiler package in R.

Radiomics feature extraction and model establishment
The image processing was shown as a supplemental figure 1. Device information and scan parameters are as follows: Manufacturer: GE, SIEMENS X-ray tube current: median 259.0 mA X-ray tube voltage: median 120 kVp Slice thickness: median 5.0mm Exposure_Times: median 828ms Pixel spacing: median 0.742×0.742mm2 Contrast: Optiray 350, Isovue 370 Before feature selection, data were centered and scaled, implemented by the preProcess function from the R caret (Classification And Regression Training) package. As explained and reported in previous research [34], a Principal Component Analysis (PCA) was performed on the extracted features to plot data in a space of reduced dimensions (Supplemental Figure 8). The radiomics features of the two devices were similar. Radiomics features were extracted from 3D VOIs using Pyradiomics. Spatial Resampling: 1 × 1 × 1 mm3; Intensity Rescaling: 500; Intensity Discretization: binWidth 25 [35,36]. The region of interest (ROI) delineating each tumor was manually drawn under the supervision of two experienced radiologists. The robustness of the radiomics features was assessed using the intraclass correlation coefficient (ICC) by the R package irr. ICC ≥ 0.75 indicated high consistency, 0.51-0.74 middle, and < 0.5 low. Extracted radiomics features with an ICC of ≥0.75 met the criteria for further analysis, while the others were excluded from the final feature dataset. The least absolute shrinkage and selection operator (LASSO) regression, suitable for high-dimensional data, was performed to select the most useful predictive features using the "glmnet" package of R from the primary data set. Wilcoxon test analyses were used for exploring the relationship between CCR5 expression and radiomics parameters, and the results were visualized using the R package Ggpubr. The predictive radiomics characteristics screened from LASSO regression were incorporated into a multivariate Logistic regression to establish a radiomics model.

Model evaluation and clinical application
Restricted cubic spline (RCS) was performed using the "rms" and "survMisc" package of R. RCS shows the hazard ratio (HR) and 95% confidence interval (CI) describing the A The expression level of CCR5 in OV tissues was significantly higher than that in normal tissues; B The Kaplan-Meier curve shows no significant difference in patients' OS between two groups with different CCR5 expression levels; C Landmark analyses at 36 months showed a higher rate of OS for patients with high-expression of CCR5, whereas landmark analysis at 12 and 60 months showed no significant difference between the two groups association of Radiomics scores (RS) to allow nonlinear assessment. Clinical and image data of 89 cases of ovarian cancer were retrospectively analyzed in TCIA. The subjects were divided into the low-expression CCR5 and highexpression groups according to Radiomics Score (RS) when the cutoff was 0.388. All patients were divided into the training and validation groups according to a ratio of 7:3.
With the R package of rms, the clinical features and RS were incorporated into multivariate stepwise logistic regression with minimum AIC (Akaike Information Criterion) method to measure goodness of fit. Based on the factors in the final model, a predictive nomogram was generated to predict the 1-year, 3-year, and 5-year prognosis for patients with ovarian cancer. The time-dependent Receiver Operating Characteristic (ROC) curve was plotted. The calibration curve was used to describe and evaluate the nomogram performance visually. The perfect prediction should fall on the diagonal line of the figure.

Statistical analysis
We conducted all the statistical analysis using R version 3.5 (https:// www.r-proje ct. org). All p values < 0.05 was considered statistically significant. The performance of the radiomics model was assessed by using several indices with 10-fold cross-validation on the training and validation set, including accuracy (ACC), specificity (SPE), sensitivity (SEN), positive predictive value (PPV), and negative predictive value (NPV). A ROC curve was also employed to assess the overall performance of the radiomics models, and a Precision-Recall (PR) curve was displayed for comprehensive evaluations of the performance. The area under the curve (AUC) was calculated. We also calculated the Brier score to quantify the radiomics model's performances. Finally, the decision curve analysis (DCA) was conducted to reveal the clinical usefulness of the radiomics evaluation. Wilcoxon test analyses were used for exploring the relationship between RS and CCR5 expression, and the results were visualized using the R package Ggpubr.

Identification of CCR5 as a differentially expressed gene Clinical characteristics of the population with high and low CCR5 expression group
0.829 was calculated as the cutoff of the CCR5 expression. All the cases were divided into high-and lowexpression groups according to this value. Comparisons of the clinical characteristics are shown in Table 1. No statistical difference found in clinical factors between two groups, except for age.

A comparison of the CCR5 expression level between the normal tissues and ovarian cancer tissues and the comparison of survival data
The expression level of CCR5 in OV tissues was significantly higher than that in normal tissues(P<0.001), as shown in Fig. 1(A).  Landmark analysis shows a higher rate of OS for the high expression CCR5 group at a late stage (p=0.018) with a 36-month landmark, while no statistical difference was found at an early stage (p=0.66) and at the other landmark time (Fig. 1(C))

Associations between overall survival and clinicopathological characteristics using Cox regression: CCR5
Univariate Cox regression analysis revealed that high expression of CCR5 was a protective factor for OS although the result was not statistically significant (HR=0.872， 95%CI:0.72-1.055，p=0.159) ( Fig. 2A). With multivariate Cox regression analysis, however, the result demonstrated a significant correlation between CCR5 expression and OS (HR=0.803， 95%CI: 0.655-0.985，p=0.035) High expression of CCR5 remained to be a protective prognostic factor. (Fig. 2B)

ROC curves for predicting OS of ovarian cancer by CCR5 expression in the survival model
The time-dependent ROC curve was plotted, and the AUC was 0.442, 0.490, 0.562 for 12 months, 36 months, and 60 months, respectively (Fig. 3).

Relationship between CCR5 expression level and clinicopathological characteristics
Specifically, it was observed that the expression of CCR5 was negatively correlated with age, chemotherapy, FIGO stage, and radiotherapy. In contrast, its expression was positively correlated with lymphatic invasion, neoplasm histologic grade, tumor residual disease, and venous invasion, although only its correlation with lymphatic invasion was statistically significant. (p<0.05) (Fig. 4)

Relationship between CCR5 expression level and abundance of immune infiltrates
It was observed in the heat map of Spearman's rank correlation that the expression level of CCR5 correlated positively with the relative abundance of dendritic cells activated(p<0.05). By contrast, CCR5 was negatively related to B cells naive (p<0.05); CCR5 was also found not to be significantly correlated with B cells memory. (Supplemental figure 2)

GO and KEGG enrichment analysis of the CCR5
The most significantly enriched GO categories in the highexpression CCR5 group compared to the others were for neuromodulation and protease metabolism process (Supplemental figure 3). The results of KEGG analysis showed that pathways in the cell cycle, tumor necrosis factors, and m-TOR signaling pathway were mainly enriched.
Four optimal features, called original_glcm_Idn, origi-nal_gldm_GrayLevelNonUniformity, original_glrlm_ RunEntropy, and original_shape_MinorAxisLength (The overall importance was 1.271, 0.445, 0.763 and 0.426, respectively) were selected by the LASSO algorithm among a total of 95 features after ICC analysis.  Table 2) We construct the radiomics signature with a Radscore calculated using the following formulas: Features' importance was shown in Supplemental figure 6.

The performance of the radiomics model for predicting the CCR5 expression level with 10-fold cross-validation
In the training set, the ACC, SPE, SEN, PPV and NPV of the radiomics model was score 0.737, 0.875, 0.636, 0.875, 0.636 respectively, with a ROC-AUC of 0.770, and a brier score of 0.19.In the validation set, he ACC, SPE, SEN,PPV and NPV of the radiomics model was score 0.737, 0.833, 0.667, 0.846, 0.645 respectively, with a ROC-AUC of 0.726, and a brier score of 0.212. The PR-AUC of the model was 0.759 and DCA demonstrated its preferable clinical practicality (Fig. 5)

Difference between radiomics score when compared using Wilcoxon test in the training and validation set
The radiomics score (RS) was significantly higher in highexpression CCR5 group than in low-expression CCR5 group both in training and validation set (p<0.05). (Supplemental figure 7)

Clinical characteristics of the population with high and low RS group
Comparisons of baseline characteristics are shown in   (Fig. 7(B)) shows a higher rate of OS for high RS group at an early stage (p=0.049) with a 60-month landmark while no statistical difference was found at late stage (p=0.887) and at the other landmark time.

Nomogram and model evaluation
Besides RS, clinical characteristics including age, chemotherapy, tumor residual diseases were contained in the model using stepwise logistic regression with minimum AIC method. A predictive nomogram was generated based on the total score of each patient. For each person, every selected variable pointed to a score according to the above scale, and we could get a total score by summing up all scores (Fig. 8(A)). The AUC of the risk score of the time-dependent ROC was 0.8, 0.673 and 0.792 for 1-year, 3-year and 5-year, respectively ( Fig. 8(B)). Compared to every single predictor, the predictive nomogram achieved the best performance with the highest ROC-AUC (Supplemental table 3). It appeared that all three calibration curves of 1-year, 3-year and 5-year were closed to the standard curve. (Fig. 8(C))

Discussion
In this integrated radiomics-molecular analysis based on CT images, we aimed to explore the correlations between the pre-treatment radiomics profile of ovarian cancer and CCR5 expression levels and overall survival. Our results demonstrated that CT features are associated with differences in CCR5 expression levels and overall survival of ovarian cancer, indicating that additional molecular information can be obtained from radiomics analysis of CT images which might provide a new method for therapeutic decision-making. Due to the heterogeneity of ovarian cancers, the survival rate of patients varies widely. 15% of women with ovarian cancer die within 2 months of diagnosis, while almost half are expected to survive beyond five years [37]. Accurate prediction of prognosis is pivotal for the management of ovarian cancer. CCR5 as a chemokines' receptor, which may promote invasion and metastasis [6], has been actively investigated in various cancers [7][8][9][10][11][12][13]. Although limited in the field of ovarian cancer, several studies reported that CCR5 might be associated with prognosis [19,20]. Recently, the successful applications of radiomics in ovarian cancer, including cancer detection, differentiation diagnosis, prediction of treatment response, survival and lymph node metastasis [21,[28][29][30][31][32][33], suggested that radiomics may hold potential for decoding the CCR5 status in patients with ovarian cancer. Therefore, we conducted the present study. Landmark analyses at 60 months showed a higher rate of OS for patients with high radiomics scores, which demonstrated that higher expression levels of CCR5 are linked to better prognosis in patients with ovarian cancer (p < 0.05). We believe that noninvasive predicting the expression levels of CCR5 based on radiomics is helpful for nuanced clinical judgments.
Recently, a wealth of critical molecular markers, other prediction information, and the development of molecular biological and artificial intelligence techniques owed cancer researchers. Wingfield [5] applied radiomics features to predict the expression levels of CD44 and CD133 in lower-grade gliomas; In the training and validation sets, the model yielded AUCs of 0.912 and 0.805, respectively, in the CD44 model; 0.912 and 0.816, respectively, in the CD133 model. Xiao [38] applied a radiomics model to predict SYP gene expression; The prediction model yielded an accuracy of 0.93. In our study, after features screening based on repeat LASSO using the optimal lambda of -2.349, four optimal features: original_glcm_Idn, original_gldm_Gray-LevelNonUniformity, original_glrlm_RunEntropy, and original_shape_MinorAxisLength were selected for constructing the model. The predictive performance of models predicting CCR expression levels was good in the training and validation sets, and yielded AUCs of 0.770 and 0.726, respectively. In our research, restricted cubic spline models were used to fit the relationship between RS and OS and identify the optimal cut-off value of RS. It showed good predictive power with the time-dependent 0.679, 0.552, and 0.613 for 1-year, 3-year, and 5-year survival using single RS as predictor and AUROC of 0.8, 0.673, and 0.792 using RS combined with clinical features. The results demonstrated that combining clinical and radiomics models improved model performance. This finding was consistent with the research of Avesani [39], who built predictive radiomics models for early relapse and BRCA mutation based on a multicentric database of OC. Their models showed low performance in predicting both BRCA mutation and 1-year relapse with traditional radiomics (AUC: 0.46-0.59 for BRCA and 0.46-0.56 for relapse) and deep learning (AUC of 0.48 BRCA and 0.50 for relapse). The inclusion of clinical variables improved the performance of the radiomics models to predict BRCA mutation (AUC in the test set of 0.74).
Accurate survival prediction is important for decision making, especially for patients with cancers.
Decision-making, as one of the core tenets of medicine, relies upon integrating quantitative data. Many indicators were used to predict the survival of OC. Zhang et al. used indicator CA125 combined with D-dimer (ICD) in predicting OS in patients with OC. They showed that ICD (HR 2.651, 95% CI 1.273-5.520, p=0.009) was an independent predictor of OS in ovarian  (20) cancer, although the predictive power was limited [40]. Qi constructed a predictive nomogram based on risk factors for OS, including age, laterality, and American Joint Committee on Cancer (AJCC) stage. The C-index of the OS nomograms were 0.85 (95% CI: 0.81-0.89) and 0.80 (95% CI: 0.74-0.87), respectively in the training and validation cohort [41]. These predictors usually require the invasive operation or experienced staff, being laborintensive, time-consuming, and possibly leading to interobserver bias, while the predictive power was not ideal.
In the last few years, more and more new techniques have begun paving the way towards personalized and precision medicine based on the increasing knowledge of the tumoral microenvironment at a molecular level. Radiomics represents a recently introduced translational field of study, aiming to find relationships between multilayered information extracted from imaging examinations and clinical data to support evidence-based clinical judgement [42]. Imaging biomarkers could be used as predictors of OS. Several previous papers have reported a significant correlation between radiomics based on CT images and survival in ovarian cancer patients (summarized in supplemental Table 4) and have shown the good performance of the prediction model [43][44][45][46][47][48][49]. These results are consistent with those of our analysis. This consistency suggests that the radiomics offer good prediction power in ovarian cancer. In this work, we built prognostic models based on a biomarker, radiomics, and clinical features for survival analysis of patients with ovarian cancer. The model was proved to be a significant predictor of overall survival. We provided a noninvasive tool with relatively high accuracy to predict the OS of ovarian cancer using the CT-based machine learning radiomics predicting CCR5 expression level. Our findings suggested that combining the biomarker-based features into the standard radiomics might offer a useful approach for improving the prognostic prediction accuracy with a view to clinical use. Some limitation of our research needs to be considered. Firstly, all image information was acquired from public dataset TCIA, which inevitably contain variance in the quality of images that may have an impact on predictive analysis, Secondly, this study is a retrospective study with a relatively small sample size, so the generalizability still has to be studied. Thirdly, in our study, ROI delineating each tumor was manually drawn under the supervision of two experienced radiologists, while manual delineation is a critical task, especially because it carries a certain amount of subjectivity. As reported in a recent study [50], although with high accuracy, manual segmentation by radiologists is labor-intensive, time-consuming, and not always feasible for radiomics analysis requiring huge datasets. Additionally, manual segmentation is subject to inter-and intra-observer variability [51]. Hence, many semi-automatic delineation algorithms are applied in the clinical practice although less precise than manual segmentation [50]. On the other hand, in the case of manual delineations performed by different radiologists, one solution is the STAPLE tool [52] that can be used to overcome the limitation by producing a consolidated reference between the different operators. So, to reduce the operator interaction in the segmentation process and to improve the reproducibility of radiomics studies, automatic or semi-automatic approaches should be used in further study.

Conclusion
In conclusion, the expression levels of CCR5 can significantly influence the prognosis of patients with ovarian cancer. We developed a model from radiomics can effectively predict the expression levels of CCR5. Our model therefore has the potential to be widely used as a practical tool for the noninvasive characterization of tumor.