Development and validation of nomograms for predicting overall survival and cancer-specific survival in patients with ovarian clear cell carcinoma

Background Ovarian clear cell carcinoma (OCCC) is a rare histologic type of ovarian cancer, and an efficient prognostic tool for OCCC is lacking in clinical practice. This study aimed to construct and validate nomograms for predicting overall survival (OS) and cancer-specific survival (CSS) in patients with OCCC.
Methods Data on patients with primary OCCC diagnosed between 2010 and 2016 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. Prognostic factors were evaluated with LASSO Cox regression and multivariate Cox regression analysis and were then used to construct nomograms. The performance of the nomogram models was assessed by the concordance index (C-index), calibration plots, decision curve analysis (DCA) and risk subgroup classification. Kaplan-Meier curves were plotted to compare survival outcomes between subgroups.
Results A total of 1541 patients from SEER registries were randomly divided into a training cohort (n = 1079) and a validation cohort (n = 462). Age, laterality, stage, lymph node (LN) dissected, organ metastasis and chemotherapy were independently and significantly associated with OS, while laterality, stage, LN dissected, organ metastasis and chemotherapy were independent risk factors for CSS. Nomograms were developed for the prediction of 3- and 5-year OS and CSS. The C-indexes for OS and CSS were 0.802 [95% confidence interval (CI) 0.773–0.831] and 0.802 (0.769–0.835), respectively, in the training cohort, and 0.746 (0.691–0.801) and 0.770 (0.721–0.819), respectively, in the validation cohort. Calibration plots showed favorable consistency between nomogram-predicted and actual survival. The C-index and DCA curves also indicated better performance of the nomograms than the AJCC staging system. Significant differences were observed between the survival curves of the risk subgroups.
Conclusions We have constructed predictive nomograms and a risk classification system to evaluate the OS and CSS of OCCC patients. They were validated to be of satisfactory predictive value, and could aid in future clinical practice.


Introduction
Ovarian cancer (OC) is one of the most aggressive gynecological cancers and comprises a group of heterogeneous tumors. As a subtype of epithelial ovarian cancer (EOC), ovarian clear cell carcinoma (OCCC) presents a distinct biological profile from other histological types [1]. With a higher incidence in East Asia (~30%), OCCC is reported to be diagnosed at a younger age compared with serous carcinoma [2].
Patients with early-stage OCCC generally have a favorable prognosis, while those with advanced-stage disease have worse survival outcomes than patients in the high-grade serous group [3]. In addition to stage, other factors have been proposed to influence the prognosis of OCCC, such as the presence of endometriosis, surgical methods and venous thromboembolism [4,5]. Demographic characteristics, tumor size, lymph node status and treatment strategies are also important when evaluating survival outcomes.
Standard treatment guidelines for OCCC have not yet been developed, given its rarity in the large clinical trials of EOC. A comprehensive prognostic judgment system would be useful to guide the selection of a treatment protocol. The nomogram, a statistical predictive tool that integrates pivotal predictive factors, has been widely used to quantify risks and evaluate the prognosis of many cancer types [6][7][8]. However, to the best of our knowledge, no nomograms for patients with OCCC have been developed. In this study, we aimed to construct nomograms using data extracted from the Surveillance, Epidemiology, and End Results (SEER) database to predict the prognosis of patients with OCCC.

Data source
Medical records of patients with OCCC were obtained from the SEER database, which contains data on cancer patients from 18 regional registries, covering approximately 34.6% of the total population in the United States [9]. Relevant information was extracted using SEER*Stat software version 8.3.6 (https://seer.cancer.gov/seerstat/).

Data extraction
Patients diagnosed with OCCC from 2010 to 2016 were selected from among OC patients using the International Classification of Diseases for Oncology, 3rd edition (ICD-O-3) morphology codes "8310/3; 8313/3; 8443/3; 8444/3". Variables for this study included age at diagnosis, race, laterality, grade, stage (American Joint Committee on Cancer [AJCC] 7th edition), tumor size, organ metastasis, radiotherapy, chemotherapy, number of examined lymph nodes (LNs), LN status, vital status and survival time. Organ metastasis sites were limited to liver, lung, bone, and brain, according to the data available in the SEER database.
Patients with OCCC were excluded in the following scenarios: (1) not the primary tumor; (2) without histologic confirmation; (3) survival time shorter than 1 month; (4) no surgery; and (5) unknown information about LN, race, tumor size, stage and organ metastasis.

Statistical methods
Patients enrolled in our study were randomly assigned to the training cohort and the validation cohort at a ratio of 7:3. The primary end points were overall survival (OS) and cancer-specific survival (CSS). Categorical variables are shown as frequencies and proportions. Clinicopathological characteristics of the training and validation cohorts were compared using a chi-squared test.
The least absolute shrinkage and selection operator (LASSO) method was used for preliminary selection of predictive features and to limit over-fitting. Significant prognostic factors were then identified in multivariate analysis using the Cox proportional hazards model. Finally, nomograms for OS and CSS were constructed from the resulting risk factors.
The performance of the nomograms was validated internally in the training cohort and externally in the validation cohort. Harrell's concordance index (C-index), ranging from 0.5 (no discrimination) to 1.0 (perfect discrimination), was used to evaluate the discriminative ability of the nomograms. Calibration curves (1000 bootstrap resamples) were generated to test the consistency between the predicted and actual 3- and 5-year OS and CSS. Decision curve analysis (DCA), a more recently introduced method, was applied to evaluate the potential clinical value of the nomograms [10]. Moreover, the whole cohort was divided into low- and high-risk groups at the median risk score generated from the nomogram. Kaplan-Meier analysis and the log-rank test were used to explore the survival difference between risk subgroups.
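The cohort assignment and the baseline comparison described above can be illustrated with a short Python sketch. It is not the authors' code (the analyses were run in SPSS and R); the seed, the function names and the 2 x 2 example table are hypothetical, and only the Pearson statistic is computed, not its P value.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly partition patient IDs into training and validation cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# 1541 patients split 7:3, as in the study
train_ids, valid_ids = split_cohort(range(1541))
print(len(train_ids), len(valid_ids))  # 1079 462

# Hypothetical counts of one categorical variable (rows: cohort, columns: level)
stat, dof = chi_squared_statistic([[10, 20], [30, 40]])
print(stat, dof)
```

Rounding 0.7 x 1541 to the nearest integer reproduces the reported cohort sizes; a small chi-squared statistic for each baseline variable would indicate that the random split left the two cohorts comparable.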
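For readers less familiar with LASSO Cox regression, the estimator is commonly written as the maximizer of an L1-penalized log partial likelihood. The notation below is introduced here for illustration and is not taken from the study: x_i is the covariate vector of patient i, delta_i the event indicator, R(t_i) the set of patients still at risk at time t_i, p the number of candidate variables and lambda the penalty weight. Tied event times are ignored, and implementations may scale the likelihood term by a constant.

```latex
\hat{\beta} = \arg\max_{\beta}\left\{
  \sum_{i:\,\delta_i = 1}\left[ x_i^{\top}\beta
  - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right) \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

Larger values of lambda shrink more coefficients exactly to zero, which is how the method selects variables. The text does not state how lambda was chosen; cross-validation is a common choice.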
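As an illustration of how Harrell's C-index measures discrimination, the following Python sketch counts concordant pairs directly. It is a simplified version written for this explanation, not the routine used in the study: pairs with tied survival times are skipped, tied risk scores count as half-concordant, and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the patient who died earlier
    was assigned the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), event indicators (1 = death) and risk scores
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.7, 0.2]
c_index = harrell_c_index(times, events, risks)
print(c_index)  # 0.8
```

In this toy example five pairs are comparable and four are ordered correctly, giving 0.8. A value of 0.5 corresponds to random ordering.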
All statistical analyses were performed with SPSS (version 25.0, SPSS, Chicago, IL, USA) and R software (version 3.6.0; http://www.r-project.org/). A P value of < 0.05 was considered statistically significant.

Patient characteristics and survival outcomes
A total of 1541 patients diagnosed with primary OCCC were identified from the SEER database. Most of the patients were at an early stage (78.3%), and white patients made up the largest racial group (74.9%). A total of 1223 (79.4%) patients underwent lymph node dissection and 1265 (82.1%) received chemotherapy. Characteristics of patients in the training cohort (n = 1079) and the validation cohort (n = 462) are listed in Table 1.
The 3- and 5-year OS rates were 77.8% and 70.6% for all patients, respectively, with a mean follow-up time of 65.3 months. The 3- and 5-year CSS rates were 77.8% and 70.6% for all patients, respectively, with a mean follow-up time of 66.8 months. The 3- and 5-year OS and CSS rates of all patients by clinical feature are shown in Table 2.

Construction of the prognostic nomograms for OS and CSS
In total, 11 variables were included in the analysis. According to the results of LASSO Cox regression analysis, age, laterality, stage, LN dissected, LN status, organ metastasis, radiotherapy and chemotherapy were identified as candidate risk factors for OS and CSS (Fig. 1).
In the multivariate analysis of these 8 factors, age, laterality, stage, LN dissected, organ metastasis and chemotherapy were independently and significantly associated with OS, while laterality, stage, LN dissected, organ metastasis and chemotherapy were independently and significantly associated with CSS (Table 3). Based on these results, nomograms were constructed by incorporating the prognostic factors to predict 3- and 5-year OS and CSS (Fig. 2).

Nomogram validation
The calibration curves indicated excellent agreement between the nomogram-predicted and actual survival outcomes in the training and validation cohorts (Fig. 3). DCA curves indicated that the nomogram models made favorable predictions and outperformed the AJCC staging system (Fig. 4).
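For reference, the quantity plotted on a decision curve is the net benefit at a given threshold probability. The Python sketch below assumes a binary outcome (for example, death within 5 years) with no censoring, which is a simplification of the survival setting analyzed here; the function names and toy data are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(outcomes, threshold):
    """Reference strategy: classify every patient as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted 5-year death risks and observed outcomes (1 = died)
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
died = [1, 1, 0, 1, 0]
nb_model = net_benefit(risks, died, threshold=0.5)
nb_all = treat_all_net_benefit(died, threshold=0.5)
print(nb_model, nb_all)
```

A model is judged useful over the range of thresholds where its net benefit exceeds both the treat-all reference and the treat-none reference (net benefit 0); comparing two models, such as a nomogram and a staging system, amounts to comparing their curves over that range.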

Risk stratification of OCCC patients
The risk score of each variable was generated from the nomogram and the total scores were calculated for all the patients. The median risk score was 6 (range: 3-21) for OS and 4 (range: 2-19) for CSS. The whole cohort was divided into low- and high-risk subgroups based on the median risk score. According to the survival curves in Fig. 5, significant differences were observed between the low- and high-risk groups for both OS (P < 0.001) and CSS (P < 0.001), indicating that the nomograms stratified risk well.
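The stratification step can be sketched in Python as a median split of total nomogram points followed by a Kaplan-Meier estimate in each subgroup. The point totals and follow-up data below are hypothetical, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not say how such ties were handled.

```python
from statistics import median


def stratify_by_median(total_points):
    """Label each patient 'high' if the total score exceeds the cohort median."""
    cutoff = median(total_points)
    groups = ["high" if p > cutoff else "low" for p in total_points]
    return groups, cutoff


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical total points, follow-up (months) and event indicators (1 = death)
points = [3, 5, 6, 8, 10, 21]
months = [60, 48, 55, 20, 12, 6]
died = [0, 1, 0, 1, 1, 1]

groups, cutoff = stratify_by_median(points)
for label in ("low", "high"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([months[i] for i in idx], [died[i] for i in idx])
    print(label, curve)
```

The two resulting curves are what a log-rank test would then compare; that test is omitted here for brevity.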

Discussion
In the current study, data on patients with OCCC who had undergone surgery were drawn from the SEER database to analyze risk factors. Nomograms were constructed to assess the 3- and 5-year CSS and OS based on the identified prognostic factors. Favorable discrimination and calibration were observed from the C-index, calibration curves and DCA curves in both training and validation sets, indicating excellent performance of the nomograms. Moreover, risk scores generated from the nomograms were used to build a risk stratification system.
Our study identified six independent prognostic factors for OS: age, tumor laterality, organ metastasis, LN dissected, stage and chemotherapy. All of these except age also significantly affected CSS. Generally, older patients are more likely to have worse survival outcomes due to a lower immune response [11]. However, we observed that patients younger than 50 years tended to have a poorer prognosis. One possible explanation is the relatively conservative surgical approach chosen for patients who wished to preserve fertility. In addition to age, the relationship between other demographic characteristics (such as race) and prognosis was also explored.
Systematic lymphadenectomy has been regarded as an important part of treatment guidelines for patients with EOC, considering the prognostic value of LN status [12]. We observed that removal of more than 10 lymph nodes was associated with better prognosis. A 10-lymph node cutoff was defined as adequate lymphadenectomy according to the Gynecologic Oncology Group criteria. Several retrospective studies have demonstrated favorable survival outcomes after systematic lymphadenectomy in patients with early-stage OCCC [13][14][15]. In a combined exploratory analysis of three prospectively randomized phase III multicenter trials, Magazzino et al. reported that lymphadenectomy offered benefit to patients with advanced OC who received complete intraperitoneal debulking [16]. However, another study did not observe significant improvement in the survival of advanced OCCC patients with systematic retroperitoneal lymphadenectomy [17]. It is worth noting that LN status was not a prognostic factor in our study. Therefore, the role of lymphadenectomy in the whole cohort of OCCC patients requires further investigation.
Chemotherapy is important in the management of EOC, especially for high-grade cases, while the role of chemotherapy for patients with OCCC remains controversial [18]. The response rate of OCCC to conventional platinum has been reported to be much lower than that of the serous type in the first-line setting [19]. Published studies have mainly focused on the performance of chemotherapy in early-stage cases [20][21][22]. In our study, chemotherapy was significantly associated with OS and CSS, implying its value in improving survival outcomes. Toru and his group carried out a randomized phase III trial (JGOG3017/GCIG Trial) to compare two chemotherapy regimens for OCCC. They demonstrated that irinotecan plus cisplatin and paclitaxel plus carboplatin were both well tolerated, with no significant difference in survival benefit [23].
Several studies have reported better performance of the nomogram model than conventional staging systems and proposed it as a promising tool for prognosis evaluation [24][25][26]. Diao [25]. Similarly, a prediction model was constructed for patients with non-small cell lung cancer, with a C-index of 0.674 (0.652-0.696) [26]. All of the above nomograms presented better discriminatory capacity than did the staging systems. Lack of some clinical information was the common limitation for these studies.
The nomograms developed in our study also presented better prediction capacity than the AJCC 7th edition staging system. The nomogram model enables risk stratification of patients, thus facilitating personalized treatment plans and follow-up schedules. Considering the chemo-resistant feature of OCCC, efforts have been made to explore precision medicine based on molecular profiles, such as drugs targeting ARID1A-deficient OCCC [27]. It may be feasible to use a prediction model to select candidates for clinical trials.
It should be noted that there are several limitations in our study. First, detailed information about chemotherapy, radiotherapy and surgical procedures was unavailable. Data about recurrence and reoperation were also unavailable in the SEER database. Second, selection bias was inevitable due to the study's retrospective nature. Third, the nomogram model only received internal validation. External validation in cohorts from other countries and prospective randomized clinical trials are necessary to confirm its performance.

Conclusion
Nomograms with favorable capacity for assessing 3- and 5-year OS and CSS in patients with initially diagnosed OCCC were constructed using data from a large-scale dataset. A risk stratification system was built based on risk scores generated from the nomograms. These nomograms may be useful to provide prognostic information in clinical work.