Development and validation of circulating CA125 prediction models in postmenopausal women

Sasamoto, Naoko; Babic, Ana; Rosner, Bernard A.; Fortner, Renée T.; Vitonis, Allison F.; Yamamoto, Hidemi; Fichorova, Raina N.; Titus, Linda J.; Tjønneland, Anne; Hansen, Louise; Kvaskoff, Marina; Fournier, Agnès; Mancini, Francesca Romana; Boeing, Heiner; Trichopoulou, Antonia; Peppa, Eleni; Karakatsani, Anna; Palli, Domenico; Grioni, Sara; Mattiello, Amalia; Tumino, Rosario; Fiano, Valentina; Onland-Moret, N. Charlotte; Weiderpass, Elisabete; Gram, Inger T.; Quirós, J. Ramón; Lujan-Barroso, Leila; Sánchez, Maria-Jose; Colorado-Yohar, Sandra; Barricarte, Aurelio; Amiano, Pilar; Idahl, Annika; Lundin, Eva; Sartor, Hanna; Khaw, Kay-Tee; Key, Timothy J.; Muller, David; Riboli, Elio; Gunter, Marc; Dossus, Laure; Trabert, Britton; Wentzensen, Nicolas; Kaaks, Rudolf; Cramer, Daniel W.; Tworoger, Shelley S.; Terry, Kathryn L.

doi:10.1186/s13048-019-0591-4

Research
Open access
Published: 26 November 2019

Naoko Sasamoto ORCID: orcid.org/0000-0002-4526-2181¹,
Ana Babic²,
Bernard A. Rosner³,
Renée T. Fortner⁴,
Allison F. Vitonis¹,
Hidemi Yamamoto⁵,
Raina N. Fichorova⁵,
Linda J. Titus⁶,
Anne Tjønneland^7,8,
Louise Hansen⁷,
Marina Kvaskoff^9,10,
Agnès Fournier^9,10,
Francesca Romana Mancini^9,10,
Heiner Boeing¹¹,
Antonia Trichopoulou^12,13,
Eleni Peppa¹²,
Anna Karakatsani^12,14,
Domenico Palli¹⁵,
Sara Grioni¹⁶,
Amalia Mattiello¹⁷,
Rosario Tumino¹⁸,
Valentina Fiano¹⁹,
N. Charlotte Onland-Moret²⁰,
Elisabete Weiderpass²¹,
Inger T. Gram²²,
J. Ramón Quirós²³,
Leila Lujan-Barroso²⁴,
Maria-Jose Sánchez^25,26,27,
Sandra Colorado-Yohar^27,28,29,
Aurelio Barricarte^27,30,
Pilar Amiano^27,31,
Annika Idahl³²,
Eva Lundin³³,
Hanna Sartor^34,35,
Kay-Tee Khaw³⁶,
Timothy J. Key³⁷,
David Muller³⁸,
Elio Riboli³⁸,
Marc Gunter²¹,
Laure Dossus²¹,
Britton Trabert³⁹,
Nicolas Wentzensen³⁹,
Rudolf Kaaks⁴,
Daniel W. Cramer¹,
Shelley S. Tworoger^40,41 &
Kathryn L. Terry^1,41

Journal of Ovarian Research volume 12, Article number: 116 (2019)

Abstract

Background

Cancer Antigen 125 (CA125) is currently the best available ovarian cancer screening biomarker. However, CA125 has been limited by low sensitivity and specificity, in part due to normal variation between individuals. Personal characteristics that influence CA125 could be used to improve its performance as a screening biomarker.

Methods

We developed and validated linear and dichotomous (≥35 U/mL) circulating CA125 prediction models in postmenopausal women without ovarian cancer who participated in one of five large population-based studies: Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO, n = 26,981), European Prospective Investigation into Cancer and Nutrition (EPIC, n = 861), the Nurses’ Health Studies (NHS/NHSII, n = 81), and the New England Case Control Study (NEC, n = 923). The prediction models were developed using stepwise regression in PLCO and validated in EPIC, NHS/NHSII and NEC.

Results

The linear CA125 prediction model, which included age, race, body mass index (BMI), smoking status and duration, parity, hysterectomy, age at menopause, and duration of hormone therapy (HT), explained 5% of the total variance of CA125. The correlation between measured and predicted CA125 was comparable in the PLCO testing dataset (r = 0.18) and the external validation datasets (r = 0.14). The dichotomous CA125 prediction model included age, race, BMI, smoking status and duration, hysterectomy, time since menopause, and duration of HT, with an AUC of 0.64 in PLCO and 0.80 in the validation dataset.

Conclusions

The linear prediction model explained a small portion of the total variability of CA125, suggesting the need to identify novel predictors of CA125. The dichotomous prediction model showed moderate discriminatory performance, which validated well in an independent dataset. Our dichotomous model could be valuable in identifying healthy women who may have elevated CA125 levels, which may contribute to reducing false positive tests when CA125 is used as a screening biomarker.

Background

Cancer antigen 125 (CA125) is a high molecular-weight glycoprotein (MUC16) normally expressed on tissues derived from the coelomic and Müllerian epithelial cells and aberrantly expressed on a variety of cancers, including breast, lung, leukemia, gastric, and ovarian cancer [1,2,3]. CA125 levels are elevated in more than 80% of ovarian cancer cases and have proven utility in assessing response to therapy and prognosis [4].

While CA125 remains the most promising biomarker for ovarian cancer screening, results from two large randomized trials comparing combined CA125 and transvaginal ultrasound (TVUS) to usual care did not show significant improvement in overall survival in the screened group [5, 6]. In the United Kingdom Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), stage of ovarian cancer diagnosis was earlier in the screened group, but there was no clinically significant reduction in overall mortality [6]. The Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) showed no difference in ovarian cancer mortality between women screened with CA125 and TVUS and women receiving usual clinical care [5].

CA125 has been limited as an ovarian cancer screening biomarker by low sensitivity and specificity in part due to variation associated with differences in personal characteristics, such as age, hormone use, and menopausal status [6,7,8,9,10]. Identifying factors that influence CA125 levels in healthy individuals could be used to create personalized thresholds for CA125, thereby improving its performance as an ovarian cancer screening biomarker. Here we developed and validated two prediction models (linear and dichotomous) of circulating CA125 levels among postmenopausal women without ovarian cancer who had participated in one of five large population-based studies.

Methods

Study population

PLCO

The Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) Screening Trial was designed to determine the efficacy of screening in reducing mortality from the four named cancers [11]. Briefly, from 1993 to 2001, 155,000 healthy subjects, including 78,214 women ages 55–74, were recruited from 10 study sites across the U.S. and randomized to screening (the intervention arm) or usual care (the control arm). The screening intervention consisted of CA125 measurements and transvaginal ultrasound at baseline and at each of six annual screenings. For the purpose of this analysis, we used only the baseline CA125 measurements. Data on demographic and lifestyle factors were collected by questionnaires administered at baseline. Among a total of 78,214 participants, we excluded women from the control arm (n = 34,304), as well as those with no ovaries at baseline (n = 9658), a prior diagnosis of ovarian, fallopian or peritoneal cancer (n = 1), missing CA125 measurements at baseline (n = 5624), missing baseline questionnaire data (n = 51), a diagnosis of ovarian cancer or loss to follow-up within 3 years from baseline (n = 535), and those missing information on candidate predictor variables of CA125 (n = 1060). After these exclusions, data from 26,981 PLCO participants were available for this analysis.

EPIC

The European Prospective Investigation into Cancer and Nutrition (EPIC) study is a prospective cohort established between 1992 and 2000 [12]. Briefly, 519,978 participants, including 366,521 women, recruited from 23 research centers in 10 European countries, completed questionnaires on lifestyle, medical and dietary factors. Most participants (74%) provided a blood sample at baseline. Within this cohort, a nested case-control study of ovarian cancer was designed by matching each ovarian cancer case (n = 810) with up to four controls using incidence density sampling [13]. Among 1939 available controls, we defined postmenopausal women as those who met one of the following criteria at the time of blood draw: not on hormones and had not menstruated in the year prior to blood draw; on hormones and age 50 or greater; age at last menstruation was missing and age 50 or greater; had a hysterectomy and age 50 or greater at the time of blood draw [7]. We excluded premenopausal women (n = 485), women whose menopausal status could not be determined using the algorithm above (n = 26), women without an available CA125 measurement (n = 12), and those missing information on candidate predictor variables of CA125 (total n = 555), leading to a total study population of 861 EPIC participants for this analysis.

NHS/NHSII

The Nurses’ Health Study (NHS) is a prospective cohort established in 1976 when 121,700 registered nurses residing in 11 U.S. states were enrolled to investigate the long-term health outcomes of various contraceptive methods in women [14]. Nurses’ Health Study II (NHSII) is a prospective cohort established in 1989 when 116,429 nurses residing in 14 states were enrolled to study the association between oral contraceptives, diet, and lifestyle factors and long-term outcomes [15]. Participants answered baseline and biennial follow-up questionnaires about a variety of lifestyle, reproductive and medical characteristics. Blood samples were collected at two time points in both NHS (1989–1990, 2000–2002) and NHSII (1996–1999, 2010–2011). Among women with available blood samples, CA125 was measured in 152 NHS participants and 50 NHSII participants with no evidence of ovarian cancer, for a total of 202 women. We restricted the analysis to postmenopausal women, defined as those who had not had a menstrual period within the 12 months before blood draw. We excluded premenopausal women (n = 47), those with unknown menopausal status (n = 14), and those missing information on candidate predictor variables of CA125 (n = 60), resulting in a final dataset of 81 NHS/NHSII participants for this analysis.

NEC

The New England Case Control Study (NEC) is a population-based ovarian cancer case-control study that enrolled participants from New Hampshire and Eastern Massachusetts over three study phases (1992–1997, 1998–2002, 2003–2008) [16]. Briefly, a total of 2075 epithelial ovarian cancer cases and 2100 controls, frequency matched on age and state of residence, participated. All participants were interviewed in person about lifestyle factors and medical and reproductive history. Over 95% of the study participants provided blood specimens at enrollment. Of the 2100 controls, we restricted the analysis to postmenopausal women, defined as women who were not on hormones and self-reported that their menstruation had stopped, or who were regularly bleeding because of menopausal hormone use or not menstruating because of hysterectomy or a medical condition/treatment and were age 50 or greater at blood draw. We excluded premenopausal women (n = 885), women without CA125 values (n = 95), and those missing information on candidate predictor variables of CA125 (n = 197), resulting in a total population of 923 healthy women for this analysis.
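The exclusion counts reported for the four study populations above can be tallied with a short script. This is only an arithmetic check and assumes the exclusion groups are mutually exclusive and applied in sequence, which the text implies but does not state; the names `cohorts` and `remaining` are illustrative.

```python
# Starting counts and exclusion counts as reported in the Study population
# subsections. Assumption: each exclusion group is disjoint from the others.
cohorts = {
    "PLCO": (78_214, [34_304, 9_658, 1, 5_624, 51, 535, 1_060]),
    "EPIC": (1_939, [485, 26, 12, 555]),
    "NHS/NHSII": (202, [47, 14, 60]),
    "NEC": (2_100, [885, 95, 197]),
}


def remaining(start, exclusions):
    """Participants left after removing every exclusion group."""
    return start - sum(exclusions)


final_n = {name: remaining(start, excl) for name, (start, excl) in cohorts.items()}
print(final_n)
```

Under that assumption the tallies agree with the analytic sample sizes stated in the text (26,981, 861, 81 and 923).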

CA125 predictor variables

Candidate predictors of CA125 were selected for this analysis based on previously published reports [6,7,8,9,10]. These included age at blood draw, race, body mass index (BMI, calculated as kg/m²), smoking status and pack-years (calculated as the number of packs of cigarettes per day multiplied by the number of smoking years), age at menarche, use of oral contraceptives (OC), parity, ovarian cysts, self-reported endometriosis, hysterectomy, age at menopause, time since menopause, hormone therapy (HT) use and duration, family history of ovarian cancer in first-degree relatives, and previous history of cancer.

We first developed the prediction models in PLCO using the candidate predictors above and then harmonized the selected final predictors across all studies so that the categorization of the variables matched that in PLCO. Information on the predictor variables listed above was collected by questionnaire in all five studies. For PLCO, EPIC, and NEC, predictor variables and blood samples were obtained at baseline. For NHS/NHSII, age and weight were obtained from the questionnaire administered at the time of blood draw and other predictor variables were obtained from the most recent biennial questionnaire prior to the blood collection. Smoking duration was defined by pack-years, calculated separately for current smokers and former smokers, in all studies. Age at menopause was defined as the self-reported age at the last menstrual period in all studies. For women who had a hysterectomy and were missing age at menopause, age at menopause was excluded. Time since menopause was calculated by subtracting age at menopause from age at blood draw.
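The derived predictors described above follow directly from their stated definitions. A minimal sketch is given below; the function names are illustrative, and returning `None` when age at menopause is unavailable is an assumption about how missing values might be carried, not a description of the study's data handling.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def pack_years(packs_per_day, years_smoked):
    """Packs of cigarettes per day multiplied by the number of smoking years."""
    return packs_per_day * years_smoked


def time_since_menopause(age_at_blood_draw, age_at_menopause):
    """Age at blood draw minus age at menopause.

    Assumption: a missing age at menopause is passed as None and propagated.
    """
    if age_at_menopause is None:
        return None
    return age_at_blood_draw - age_at_menopause


print(round(bmi(70.0, 1.75), 1), pack_years(1.5, 20), time_since_menopause(62, 50))
```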

CA125 measurements

In PLCO, CA125 was measured using the CA-125II radioimmunoassay (Centocor) with an upper limit of normal (ULN) of 35 U/mL, described in detail elsewhere [8]. The coefficients of variation (CV) were 4.1% at a CA125 level of 52.7 U/mL, and 3.8% at a CA125 level of 106.5 U/mL [5]. In NEC and NHS/NHSII, CA125 was measured using the CA-125II radioimmunoassay (Centocor) at the CERLab at Boston Children’s Hospital. The reproducibility of the assay was evaluated by including five blinded aliquots of a uniform quality control pool in each of the 46 test batches (CV = 1%). In EPIC, CA125 was measured using a volume-effective, highly sensitive multiplex platform (Meso Scale Discovery, MSD) in the Genital Tract Biology Laboratory at Brigham and Women’s Hospital, with a ULN of 55 U/mL. The CV for unblinded quality control samples on each assay plate was 8.4% [13].

Statistical analysis

CA125 levels were log-transformed to achieve normality in all of the analyses.

Recalibration of CA125

To account for the differences between CA125 values measured with the CA125II and MSD assays, we used data from 534 NEC participants, including 353 postmenopausal women, with CA125 measured using both assays to build the recalibration model [17]. First, we built a regression model to obtain the intercept and beta coefficient (i.e. log-transformed CA125II assay value = intercept + beta × log-transformed MSD assay value). Then, we applied the intercept and beta coefficient from this model to calculate the predicted log-transformed CA125II assay values for all the EPIC participants based on their MSD assay values. We used the predicted CA125 values based on this model for all EPIC participants in our analyses.
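The recalibration step above amounts to a simple linear regression on the log scale followed by prediction. The sketch below illustrates the idea with ordinary least squares written out by hand; the paired values are hypothetical and are not study data, and the helper names `fit_line` and `recalibrate` are illustrative. The analysis itself was run in SAS, so this is not the authors' code.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Hypothetical paired measurements (U/mL) on the same samples; NOT study data.
msd = [5.0, 8.0, 12.0, 20.0, 33.0, 50.0]
ca125ii = [7.1, 10.4, 14.9, 22.8, 35.0, 49.7]

# Fit log(CA125II) = intercept + beta * log(MSD) on the paired samples.
intercept, beta = fit_line([math.log(v) for v in msd],
                           [math.log(v) for v in ca125ii])


def recalibrate(msd_value):
    """Predicted CA125II-scale value (U/mL) for a single MSD measurement."""
    return math.exp(intercept + beta * math.log(msd_value))


print(round(recalibrate(15.0), 1))
```

Fitting on the log scale keeps the recalibrated values positive after back-transformation, which matches the log-transformed analysis used throughout.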

We calculated geometric means of CA125 values across levels of predictor variables and assessed the changes in CA125 values using percent change, calculated as [exp(beta) - 1] × 100 for a 1-unit change in the predictor. To develop the prediction model for continuous CA125 values, we randomly divided the PLCO dataset into a training (n = 17,987) and a testing (n = 8994) dataset. Using the PLCO training dataset, we first examined the most appropriate way to model each variable (continuous or categorical). For continuous variables, including age, BMI, parity, and pack-years of smoking, we tested for linearity of the association using restricted cubic splines [18]. For categorical variables, we used the likelihood ratio test to compare nested models, and the Vuong test and Akaike information criterion for non-nested models [19]. Based on these evaluations, we modeled the candidate predictors as follows: age, BMI, and pack-years of smoking were modeled as continuous variables; race (white, non-white), smoking status (categorical, never, current, former), age at menarche (categorical, < 10 years, 10–11 years, 12–13 years, 14–15 years, ≥ 16 years), OC use (never, ever), parity (categorical, 0, 1, 2, 3, 4, ≥5), history of ovarian cysts (no, yes), history of endometriosis (no, yes), history of hysterectomy (no, yes), age at menopause (categorical, < 40 years, 40–44 years, 45–49 years, 50–54 years, ≥55 years), HT use (never, ever), time since menopause (categorical, < 5 years, 5–9 years, 10–14 years, 15–19 years, ≥20 years), duration of HT use (categorical, ≤ 1 year, 2–3 years, 4–5 years, 6–9 years, ≥10 years), family history of ovarian cancer (no, yes), family history of breast cancer (no, yes), and previous history of cancer (no, yes).
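Because CA125 was modeled on the log scale, both the geometric mean and the percent-change formula at the start of the preceding paragraph are back-transformations. The two helpers below are an illustrative sketch of those calculations; the function names are not from the source.

```python
import math


def geometric_mean(values):
    """Exponential of the mean of the log-transformed values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_change(beta):
    """Percent change in CA125 for a 1-unit change in a predictor,
    given its coefficient from a model of log-transformed CA125."""
    return (math.exp(beta) - 1.0) * 100.0


# A coefficient of log(1.10) on the log scale is a 10% increase per unit.
print(round(percent_change(math.log(1.10)), 6))
print(round(geometric_mean([4.0, 9.0]), 6))
```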

Prediction modeling

We developed and validated CA125 prediction models (linear and dichotomous) in postmenopausal women using five large population-based datasets (Fig. 1). We developed the prediction models in PLCO and validated them in EPIC and in the combined NHS/NHSII/NEC dataset.

Linear model

The associations between individual predictors and CA125 levels were examined in age-adjusted models using linear regression in the entire PLCO dataset. Linear trend was tested using the continuous value of the variables (i.e. age, BMI, pack-years in current/former smokers, parity) or using the midpoint of the categories (i.e. age at menarche, duration of OC use, age at menopause, time since menopause, duration of HT use). To develop a linear CA125 prediction model, we used variables associated with CA125 at p-value < 0.05 in univariate analysis and performed a stepwise selection using p-values of 0.15 as model entry and retention criteria in the PLCO training dataset. Next, we tested the performance of the linear prediction model in the PLCO testing dataset and in the EPIC and NHS/NHSII/NEC datasets. Briefly, predicted CA125 values in those three datasets were calculated using effect estimates from the linear prediction model developed in the PLCO training dataset and plotted against the measured CA125 values. The Pearson correlation coefficient (r) was used to evaluate the linear correlation between measured and predicted CA125 values.

Dichotomous model

The association between individual predictors and CA125 levels ≥35 U/mL was examined in age-adjusted models using logistic regression in the entire PLCO dataset since there were only 435 participants with CA125 levels > 35 U/mL. Then, we developed a multivariate prediction model for CA125 ≥ 35 U/mL. To develop the final dichotomous CA125 prediction model, we used variables associated with CA125 levels at p-value ≤0.05 in the univariate analysis and performed a stepwise selection using p-values of 0.15 as model entry and retention criteria. Using the variables selected in the stepwise selection, we evaluated the area under the curve (AUC) of receiver-operating-characteristic (ROC) curves in the PLCO and NHS/NHSII/NEC datasets. EPIC was not included since only a single participant with data on all predictors had a recalibrated CA125 value ≥35 U/mL.
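The AUC used to evaluate the dichotomous model can be read as the probability that a randomly chosen woman with CA125 ≥ 35 U/mL receives a higher predicted score than a randomly chosen woman below the threshold. A minimal sketch of that rank-based calculation follows; the measured values and predicted scores are hypothetical, and `auc` and `is_elevated` are illustrative names rather than anything from the study's SAS code.

```python
def is_elevated(ca125_u_per_ml, threshold=35.0):
    """1 if CA125 is at or above the clinical cutoff, else 0."""
    return int(ca125_u_per_ml >= threshold)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    has the higher score; tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical measured CA125 (U/mL) and model-predicted scores; NOT study data.
measured = [12.0, 40.0, 9.0, 36.0, 20.0]
predicted = [0.01, 0.05, 0.02, 0.01, 0.03]

labels = [is_elevated(v) for v in measured]
print(round(auc(predicted, labels), 3))
```

The pairwise form is quadratic in sample size, which is fine for illustration; a sort-based rank formula would be the usual choice for a dataset the size of PLCO.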

All statistical analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC).

Results

Baseline characteristics were mostly similar across study populations, with CA125 averaging between 10 and 14 U/mL (Additional file 1: Table S1). Briefly, women were in their early 60s on average, with average BMI around 26 kg/m², and mostly white race (> 90%). Approximately half of the participants reported ever smoking and most participants were parous (90%).

Recalibration of CA125

We recalibrated the CA125 values in the EPIC participants using the model based on 534 NEC controls with CA125 measurements on both the CA125II and MSD assays. The measured CA125II assay values and the recalibrated values calculated from the recalibration model in NEC were highly correlated, with a Pearson correlation coefficient of 0.90 (95%CI: 0.88–0.91).

Linear model

First, we evaluated the association between candidate predictors and continuous CA125 levels in 26,981 postmenopausal women in PLCO. Older age at blood draw, white race, lower BMI, former smoking status, shorter duration of smoking among former smokers, older age at first menstrual period, higher parity, a history of benign ovarian cysts, no history of hysterectomy, older age at last menstrual period, ever use and longer duration of hormone therapy, and shorter time since menopause were associated with higher levels of CA125 (Table 1).

Table 1 Age-adjusted association between selected characteristics and CA125 levels in Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO)

We used stepwise regression analysis in the PLCO training dataset to develop the linear prediction model using variables associated with CA125 levels at p-value < 0.05 in univariate models (i.e. age, race, BMI, smoking status, pack-years among current smokers, pack-years among former smokers, age at first menstrual period, parity, hysterectomy, age at last menstrual period, time since menopause, and ever use and duration of HT use).

The linear prediction model included age, race, BMI, smoking status, pack-years among current and former smokers, parity, hysterectomy, age at last menstrual period, and HT use and duration, which together explained 5% of the variability of log-transformed CA125 (Table 2). When all significant predictors were instead included in the model without variable selection (that is, the variables above plus age at first menstrual period and time since menopause), the r-squared was 0.05, the same as that of the linear model developed using stepwise regression with fewer predictors. The associations between the selected predictors and CA125 levels in the multivariate model were similar to those observed in the univariate model. Next, we calculated the predicted log-transformed CA125 levels in the validation datasets based on the regression coefficients from the PLCO training dataset. In the PLCO testing dataset, the Pearson correlation coefficient of the measured and the predicted log-transformed CA125 was 0.18 (95%CI: 0.16–0.20) (Fig. 2a). In the NHS/NHSII/NEC dataset, the Pearson correlation coefficient of the measured and the predicted log-transformed CA125 was 0.14 (95%CI: 0.08–0.20) (Fig. 2b), and in the EPIC dataset it was 0.14 (95%CI: 0.07–0.20) (Fig. 2c), both similar to that in the PLCO testing dataset.
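The confidence intervals reported above for the Pearson correlations are consistent with the standard Fisher z-transformation. The sketch below assumes that method and the sample sizes given in the Methods (8994 for the PLCO testing dataset, 81 + 923 for NHS/NHSII/NEC, 861 for EPIC); the source does not state which method the authors used, and `pearson_ci` is an illustrative name.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transform.

    Assumption: bivariate normality, so atanh(r) has standard error
    1 / sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


datasets = [
    ("PLCO testing", 0.18, 8_994),
    ("NHS/NHSII/NEC", 0.14, 81 + 923),
    ("EPIC", 0.14, 861),
]

for label, r, n in datasets:
    low, high = pearson_ci(r, n)
    print(f"{label}: r = {r:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Using the rounded correlations as inputs, the intervals round to the values quoted in the text, which suggests the reported intervals reflect sampling precision at these sample sizes rather than any additional adjustment.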

Table 2 Linear CA125 prediction model in Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) training dataset

Dichotomous model

We evaluated the association between candidate predictors and having CA125 ≥ 35 U/mL in PLCO (Additional file 1: Table S2). Older age at blood draw, white race, lower BMI, greater pack-years among former smokers, nulliparity, no history of hysterectomy, older age at last menstrual period, longer duration of HT use and shorter time since menopause were associated with having CA125 levels ≥35 U/mL.

We used stepwise regression analysis with all of the candidate predictors to develop the dichotomous prediction model, which included age, race, BMI, smoking status, pack-years among current and former smokers, hysterectomy, time since menopause, and duration of HT use, with an AUC of 0.64 (95%CI: 0.61–0.66) in PLCO (Table 3, Fig. 3). When we applied the regression coefficients from PLCO to the validation dataset, the AUC was 0.80 (95%CI: 0.73–0.87) in NHS/NHSII/NEC (Fig. 3).

Table 3 Dichotomous CA125 prediction model in Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO)

We observed that ever HT use and longer duration of use were positively associated with CA125 levels both in the linear and dichotomous model. Since women with a history of hysterectomy are more likely to have taken estrogen-only HT and type of HT may be differentially associated with CA125 levels, we conducted a stratified analysis by history of hysterectomy. However, we did not observe statistically significant effect modification by history of hysterectomy (p-interaction = 0.58; data not shown).

Discussion

We confirmed factors contributing to variations in CA125 levels among postmenopausal women, including age, BMI, race, smoking status and duration, age at first menstrual period, parity, having benign ovarian cyst, hysterectomy, age at last menstrual period, HT use and duration, and time since menopause. Based on these factors, we developed and validated two prediction models in postmenopausal women without ovarian cancer using data from five large population-based studies. The final linear CA125 prediction model explained little of the total variation of CA125 values but showed similar performance in the testing and external validation datasets. The final dichotomous CA125 prediction model showed moderate discriminatory performance and validated well in the external validation dataset. Interestingly, age, BMI, race, hysterectomy, and duration of HT use were selected in both linear and dichotomous models, suggesting that these factors are robust predictors of CA125 levels in postmenopausal women.

Studies have examined personal factors that influence CA125 levels in healthy women in order to improve the clinical utility in interpreting the biomarker levels [7,8,9,10, 20]. The significant predictors selected in our linear prediction model were consistent with three prior studies that evaluated predictors of CA125 in postmenopausal women without ovarian cancer [7,8,9]. Older age at blood draw, non-white race, current smoking status, younger age at menopause, and history of hysterectomy were significant predictors that were consistently associated with lower CA125 levels across all of the studies that had information on these variables. Increased parity was also consistently associated with higher CA125 levels in two of the studies that assessed parity [7, 9]. HT use and longer duration were associated with higher CA125 levels in our linear prediction model, but the results were mixed in the prior two studies that examined HT use [7, 8]. This could be due to the possible differences in association by type of HT (e.g. estrogen only, estrogen and progesterone combined). If many of the hormone therapies were cyclical hormone therapies using estrogen and progesterone combined, these would result in proliferation of the endometrium and withdrawal bleeding which may possibly lead to increase in CA125 levels compared to women who are not on hormonal therapy and have no withdrawal bleeding, given that CA125 is expressed in the endometrial tissue. Although we did not observe significant effect modification of the HT associations by history of hysterectomy, given that women with history of hysterectomy are more likely to be on estrogen only HT, lack of effect modification is difficult to conclude since we were not able to evaluate the association by type of HT use due to limited information on type of HT. 
We also investigated former and current HT use separately; because the effect estimates were similar in these two subgroups, we combined them into an “ever” use category in the final model.

In addition to examining individual predictors, we evaluated and validated the performance of the multivariate linear CA125 prediction model. Although several variables were significant predictors of CA125 in postmenopausal women and our linear prediction model was validated in independent datasets, the total variance explained by the linear prediction model was only 5%, suggesting that the known predictors may not be sufficient in explaining the CA125 variation. This is further supported by the observed lack of significant improvement in the model performance even when including all significant predictors in the model.

We also developed and validated a dichotomous prediction model using the CA125 ≥ 35 U/mL threshold. Only one prior study examined predictors of CA125 ≥ 35 U/mL in postmenopausal women, with age, BMI and hysterectomy being the only significant factors in the multivariate model [8], which were consistent with our findings. Our final dichotomous model additionally included race, smoking status and duration, time since menopause, and duration of HT use as significant predictors. Furthermore, our final dichotomous model showed moderate discriminatory performance with nine predictors which validated well in the independent dataset, suggesting the robustness of the model.

The major strength of our study was the use of data from five large population-based studies to develop and conduct internal and external validation of circulating CA125 prediction models in postmenopausal women without ovarian cancer, resulting in robust prediction models. However, there are two major limitations to the study. Since we restricted the candidate predictors to those that have been described previously in the literature, we may be lacking significant predictors which have not been investigated to date. Given that the total variance explained by our final linear model was 5%, there may be other predictors of CA125, such as genetic variants, common medications, or dietary factors, which may explain more of the variability of CA125 in postmenopausal women. Misclassification of CA125 levels in the EPIC cohort is also a concern since CA125 was measured using a different assay in this study. However, the recalibrated CA125 values based on the MSD assay values were highly correlated with the measured CA125 values using the CA125II assays in NEC (r = 0.90). In addition, the performance of the final linear model in NHS/NHSII/NEC was similar to that in EPIC, suggesting the high accuracy of the recalibration model.

Conclusion

In summary, we developed and validated models predicting circulating CA125 in healthy postmenopausal women. The dichotomous prediction model showed moderate discriminatory performance, which validated well in an independent dataset. However, the linear prediction model explained a small portion of the total variability of CA125, suggesting the need to identify novel predictors of CA125. While CA125 has shown value in distinguishing malignant from benign pelvic masses [21, 22], its value as a screening biomarker in the general population has been limited by elevated levels in roughly 10% of women without cancer, which could lead to unnecessary interventions and psychological harms [23]. Our dichotomous model could be used to identify healthy women who may have CA125 levels greater than the current clinical cutoff, which may contribute to reducing false positive tests when CA125 is used as a screening biomarker.

Availability of data and materials

The datasets that support the findings of this study are available on reasonable request. The data are not publicly available due to privacy and ethical restrictions. For information on how to submit an application for gaining access to EPIC data and/or biospecimens, please follow the instructions at http://epic.iarc.fr/access/index.php.

Abbreviations

AUC: Area under the curve
BMI: Body mass index
CA125: Cancer antigen 125
CIs: Confidence intervals
CV: Coefficients of variation
EPIC: European Prospective Investigation into Cancer and Nutrition
HT: Hormone therapy
MSD: Meso Scale Discovery
NEC: New England Case-Control Study
NHS: Nurses’ Health Studies
OC: Oral contraceptive
PLCO: Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial
ROC: Receiver-operating-characteristic
TVUS: Transvaginal ultrasound
UKCTOCS: United Kingdom Collaborative Trial of Ovarian Cancer Screening
ULN: Upper limit of normal

References

1. Haridas D, Ponnusamy MP, Chugh S, Lakshmanan I, Seshacharyulu P, Batra SK. MUC16: molecular analysis and its functional implications in benign and malignant conditions. FASEB J. 2014;28:4183–99.
2. Anderson GL, McIntosh M, Wu L, Barnett M, Goodman G, Thorpe JD, et al. Assessing lead time of selected ovarian cancer biomarkers: a nested case-control study. J Natl Cancer Inst. 2010;102:26–38.
3. Schorge JO, Modesitt SC, Coleman RL, Cohn DE, Kauff ND, Duska LR, et al. SGO white paper on ovarian cancer: etiology, screening and surveillance. Gynecol Oncol. 2010;119:7–17.
4. Bast RC Jr, Klug TL, St John E, Jenison E, Niloff JM, Lazarus H, et al. A radioimmunoassay using a monoclonal antibody to monitor the course of epithelial ovarian cancer. N Engl J Med. 1983;309:883–7.
5. Buys SS, Partridge E, Greene MH, Prorok PC, Reding D, Riley TL, et al. Ovarian cancer screening in the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial: findings from the initial screen of a randomized trial. Am J Obstet Gynecol. 2005;193:1630–9.
6. Jacobs IJ, Menon U, Ryan A, Gentry-Maharaj A, Burnell M, Kalsi JK, et al. Ovarian cancer screening and mortality in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): a randomised controlled trial. Lancet. 2016;387:945–56.
7. Fortner RT, Vitonis AF, Schock H, Husing A, Johnson T, Fichorova RN, et al. Correlates of circulating ovarian cancer early detection markers and their contribution to discrimination of early detection models: results from the EPIC cohort. J Ovarian Res. 2017;10:20.
8. Johnson CC, Kessel B, Riley TL, Ragard LR, Williams CR, Xu JL, et al. The epidemiology of CA-125 in women without evidence of ovarian cancer in the Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) Screening Trial. Gynecol Oncol. 2008;110:383–9.
9. Pauler DK, Menon U, McIntosh M, Symecko HL, Skates SJ, Jacobs IJ. Factors influencing serum CA125II levels in healthy postmenopausal women. Cancer Epidemiol Biomarkers Prev. 2001;10:489–93.
10. Westhoff C, Gollub E, Patel J, Rivera H, Bast R Jr. CA 125 levels in menopausal women. Obstet Gynecol. 1990;76:428–31.
11. Zhu CS, Pinsky PF, Kramer BS, Prorok PC, Purdue MP, Berg CD, et al. The Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial and its associated research resource. J Natl Cancer Inst. 2013;105:1684–93.
12. Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, Fahey M, et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5:1113–24.
13. Terry KL, Schock H, Fortner RT, Husing A, Fichorova RN, Yamamoto HS, et al. A prospective evaluation of early detection biomarkers for ovarian cancer in the European EPIC cohort. Clin Cancer Res. 2016;22:4664–75.
14. Colditz GA, Hankinson SE. The Nurses' Health Study: lifestyle and health among women. Nat Rev Cancer. 2005;5:388–96.
15. Rockhill B, Willett WC, Hunter DJ, Manson JE, Hankinson SE, Spiegelman D, et al. Physical activity and breast cancer risk in a cohort of young women. J Natl Cancer Inst. 1998;90:1155–60.
16. Terry KL, De Vivo I, Titus-Ernstoff L, Sluss PM, Cramer DW. Genetic variation in the progesterone receptor gene and ovarian cancer risk. Am J Epidemiol. 2005;161:442–51.
17. Eliassen AH, Hendrickson SJ, Brinton LA, Buring JE, Campos H, Dai Q, et al. Circulating carotenoids and risk of breast cancer: pooled analysis of eight prospective studies. J Natl Cancer Inst. 2012;104:1905–16.
18. Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med. 1989;8:551–61.
19. Clarke KA, Signorino CS. Discriminating methods: tests for nonnested discrete choice models. Pol Stud. 2010;58:368–88.
20. Lowe KA, Shah C, Wallace E, Anderson G, Paley P, McIntosh M, et al. Effects of personal characteristics on serum CA125, mesothelin, and HE4 levels in healthy postmenopausal women at high-risk for ovarian cancer. Cancer Epidemiol Biomarkers Prev. 2008;17:2480–7.
21. Bast RC Jr, Skates S, Lokshin A, Moore RG. Differential diagnosis of a pelvic mass: improved algorithms and novel biomarkers. Int J Gynecol Cancer. 2012;22(Suppl 1):S5–8.
22. Terlikowska KM, Dobrzycka B, Witkowska AM, Mackowiak-Matejczyk B, Sledziewski TK, Kinalski M, et al. Preoperative HE4, CA125 and ROMA in the differential diagnosis of benign and malignant adnexal masses. J Ovarian Res. 2016;9:43.
23. US Preventive Services Task Force, Grossman DC, Curry SJ, Owens DK, Barry MJ, Davidson KW, et al. Screening for ovarian cancer: US Preventive Services Task Force recommendation statement. JAMA. 2018;319:588–94.

Acknowledgements

The authors would like to thank all the participants and staff of the European Prospective Investigation into Cancer and Nutrition Study, the New England Case-Control Study, the Nurses’ Health Study, and the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial for their valuable contributions. The authors alone are responsible for the views expressed in this article and they do not necessarily represent the views, decisions or policies of the institutions with which they are affiliated. Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization.

Funding

Research reported in this publication was supported by U.S. National Institutes of Health under the following award numbers: R01 CA193965 (to K.L. Terry), R01 CA158119 and R35 CA197605 (to D.W. Cramer), P01 CA087969 (to S.S. Tworoger), UM1 CA186107, R01 CA49449, UM1 CA176726, R01 CA67262, and supported in part by the intramural research program of the U.S. National Cancer Institute, National Institutes of Health, Department of Health and Human Services. The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), Federal Ministry of Education and Research (BMBF), Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation (Greece); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); ERC-2009-AdG 232997 and Nordforsk, Nordic Centre of Excellence programme on Food, Nutrition and Health (Norway); Health Research Fund (FIS) (PI13/00061 to Granada; PI13/01162 to EPIC-Murcia), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, ISCIII RETIC (RD06/0020) (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten, The Cancer Research Foundation of Northern Sweden (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and C8221/A19170 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (United Kingdom).

Author information

Authors and Affiliations

Obstetrics and Gynecology Epidemiology Center, Brigham and Women’s Hospital and Harvard Medical School, 221 Longwood Avenue, Boston, MA, 02115, USA
Naoko Sasamoto, Allison F. Vitonis, Daniel W. Cramer & Kathryn L. Terry
Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
Ana Babic
Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Bernard A. Rosner
Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
Renée T. Fortner & Rudolf Kaaks
Laboratory of Genital Tract Biology, Department of Obstetrics, Gynecology and Reproductive Biology, Brigham and Women’s Hospital, Boston, MA, USA
Hidemi Yamamoto & Raina N. Fichorova
Departments of Epidemiology and Pediatrics, Geisel School of Medicine at Dartmouth and Norris Cotton Cancer Center, Hanover, NH, USA
Linda J. Titus
Diet, Genes and Environment, Danish Cancer Society Research Center, Copenhagen, Denmark
Anne Tjønneland & Louise Hansen
Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Anne Tjønneland
CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, Villejuif, France
Marina Kvaskoff, Agnès Fournier & Francesca Romana Mancini
Gustave Roussy, Villejuif, France
Marina Kvaskoff, Agnès Fournier & Francesca Romana Mancini
Department of Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany
Heiner Boeing
Hellenic Health Foundation, Athens, Greece
Antonia Trichopoulou, Eleni Peppa & Anna Karakatsani
WHO Collaborating Center for Nutrition and Health, Unit of Nutritional Epidemiology and Nutrition in Public Health, Dept. of Hygiene, Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
Antonia Trichopoulou
2nd Pulmonary Medicine Department, School of Medicine, “ATTIKON” University Hospital, National and Kapodistrian University of Athens, Haidari, Greece
Anna Karakatsani
Cancer Risk Factors and Life-Style Epidemiology Unit, Institute for Cancer Research, Prevention and Clinical Network - ISPRO, Florence, Italy
Domenico Palli
Epidemiology and Prevention Unit, Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milano, Italy
Sara Grioni
Dipartimento Di Medicina Clinica E Chirurgia, Federico II University, Naples, Italy
Amalia Mattiello
Cancer Registry and Histopathology Department, “Civic - M.P. Arezzo” Hospital, ASP, Ragusa, Italy
Rosario Tumino
Unit of Cancer Epidemiology – CeRMS, Department of Medical Sciences, University of Turin, Turin, Italy
Valentina Fiano
Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
N. Charlotte Onland-Moret
International Agency for Research on Cancer, Lyon, France
Elisabete Weiderpass, Marc Gunter & Laure Dossus
Department of Community Medicine, University of Tromsø, The Arctic University of Norway, Tromsø, Norway
Inger T. Gram
Public Health Directorate, Asturias, Spain
J. Ramón Quirós
Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology (ICO-IDIBELL), L’Hospitalet de Llobregat, Barcelona, Spain
Leila Lujan-Barroso
Andalusian School of Public Health (EASP), Granada, Spain
Maria-Jose Sánchez
Instituto de Investigación Biosanitaria de Granada (ibs.GRANADA), Universidad de Granada, Granada, Spain
Maria-Jose Sánchez
CIBER of Epidemiology and Public Health (CIBERESP), Madrid, Spain
Maria-Jose Sánchez, Sandra Colorado-Yohar, Aurelio Barricarte & Pilar Amiano
Department of Epidemiology, Murcia Regional Health Council, IMIB-Arrixaca, Murcia, Spain
Sandra Colorado-Yohar
Research Group on Demography and Health, National Faculty of Public Health, University of Antioquia, Medellín, Colombia
Sandra Colorado-Yohar
Navarra Public Health Institute, Navarra Institute for Health Research (IdiSNA), Pamplona, Spain
Aurelio Barricarte
Public Health Division of Gipuzkoa, BioDonostia Research Institute, San Sebastian, Spain
Pilar Amiano
Department of Clinical Sciences, Obstetrics and Gynecology, Umeå University, Umeå, Sweden
Annika Idahl
Department of Medical Biosciences, Pathology, Umeå University, Umeå, Sweden
Eva Lundin
Department of Medical Imaging and Physiology, Skåne University Hospital, Lund, Sweden
Hanna Sartor
Department of Translational Medicine, Lund University, Lund, Sweden
Hanna Sartor
Cancer Epidemiology Unit, University of Cambridge, Cambridge, UK
Kay-Tee Khaw
Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
Timothy J. Key
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
David Muller & Elio Riboli
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Washington, D.C., USA
Britton Trabert & Nicolas Wentzensen
Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, Florida, USA
Shelley S. Tworoger
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Shelley S. Tworoger & Kathryn L. Terry

Contributions

NS and KLT had full access to the data in the study and final responsibility for the decision to submit for publication. Study concept and design: NS, AB, BAR, RTF, RK, DWC, SST, KLT. Acquisition, analysis, or interpretation of data: all authors. Drafting of manuscript: NS, KLT. Critical revision and approving the final version of manuscript: all authors.

Corresponding author

Correspondence to Naoko Sasamoto.

Ethics declarations

Ethics approval and consent to participate

This study protocol was approved by the Institutional Review Board of the Brigham and Women’s Hospital, Boston, Massachusetts (2017P000012).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Table S1.

Baseline characteristics across Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), European Prospective Investigation into Cancer and Nutrition (EPIC), Nurses’ Health Studies (NHS/NHSII), and New England Case-Control Study (NEC). Table S2. Age-adjusted association between predictors and CA125 levels above 35 U/mL in Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Sasamoto, N., Babic, A., Rosner, B.A. et al. Development and validation of circulating CA125 prediction models in postmenopausal women. J Ovarian Res 12, 116 (2019). https://doi.org/10.1186/s13048-019-0591-4

Received: 24 July 2019
Accepted: 04 November 2019
Published: 26 November 2019
DOI: https://doi.org/10.1186/s13048-019-0591-4
