Correlates of circulating ovarian cancer early detection markers and their contribution to discrimination of early detection models: results from the EPIC cohort

Background Ovarian cancer early detection markers CA125, CA15.3, HE4, and CA72.4 vary between healthy women, limiting their utility for screening. Methods We evaluated cross-sectional relationships between lifestyle and reproductive factors and these markers among controls (n = 1910) from a nested case-control study in the European Prospective Investigation into Cancer and Nutrition (EPIC). Improvements in discrimination of prediction models adjusting for correlates of the markers were evaluated among postmenopausal women in the nested case-control study (n = 590 cases). Generalized linear models were used to calculate geometric means of CA125, CA15.3, and HE4. CA72.4 above vs. below limit of detection was evaluated using logistic regression. Early detection prediction was modeled using conditional logistic regression. Results CA125 concentrations were lower, and CA15.3 higher, in post- vs. premenopausal women (p ≤ 0.02). Among postmenopausal women, CA125 was higher among women with higher parity and older age at menopause (p trend ≤ 0.02), but lower among women reporting oophorectomy, hysterectomy, ever use of estrogen-only hormone therapy, or current smoking (p < 0.01). CA15.3 concentrations were higher among heavier women and in former smokers (p ≤ 0.03). HE4 was higher with older age at blood collection and in current smokers, and inversely associated with OC use duration, parity, and older age at menopause (p ≤ 0.02). No associations were observed with CA72.4. Adjusting for correlates of the markers in prediction models did not improve the discrimination. Conclusions This study provides insights into sources of variation in ovarian cancer early detection markers in healthy women and informs about the utility of individualizing marker cutpoints based on epidemiologic factors. Electronic supplementary material The online version of this article (doi:10.1186/s13048-017-0315-6) contains supplementary material, which is available to authorized users.


Background
Mucins CA125 (MUC16) and CA15.3 (MUC1) are membrane-bound, high molecular weight glycoproteins expressed in certain epithelial tissues, as well as some epithelial cancers [1,2]. CA125 is expressed in >80% of ovarian cancers, while CA15.3 is commonly expressed in breast cancer [2]. Human epididymis protein 4 (HE4) is a member of the whey acidic protein family and is widely expressed in ovarian cancers [3]. CA72.4 is a mucin-like glycoprotein expressed in gastric, breast, and ovarian cancers [4]. Circulating concentrations of CA125, CA15.3, HE4, and CA72.4 have been investigated for ovarian cancer early detection. However, these markers have limited predictive utility for ovarian cancer screening given low sensitivity and specificity for early stage disease, as described in an earlier investigation by our group [5]. Variable circulating concentrations of these markers are found in healthy women, limiting their utility for screening.
Given these observations in healthy women, understanding correlates of early detection markers could help improve the utility of these markers in early detection prediction models. We therefore (i) describe associations between lifestyle and reproductive factors and CA125, CA15.3, HE4, and CA72.4; and, (ii) evaluate whether adjusting for these factors in early detection prediction models including early detection marker data improves the discriminatory capacity of these markers in a large, prospective investigation in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort.

Methods
The EPIC cohort was established between 1992 and 2000 in 23 centers in 10 European countries: Denmark, France, Germany, Greece, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom. Approximately 500,000 participants were recruited at study baseline. Study participants completed questionnaires describing diet, reproductive history, menstrual factors, exogenous hormone use, as well as disease history, smoking, and alcohol use. A total of 385,747 (74%) participants provided a blood sample at or near baseline. The study was approved by the ethics committee of the International Agency for Research on Cancer (IARC) and the ethical committees at the participating centers.
Details of study design and follow-up have been published previously [18]. Briefly, follow-up is conducted via linkages with cancer and population registries with the exception of centers in Germany, Greece, and Naples, Italy; these centers utilize a combination of active follow-up, next-of-kin, and population registries.

Study population
Selection of the cases and controls for this nested case-control study has been described in detail previously [5]. Briefly, incident ovarian (n = 752), fallopian tube (n = 33), and primary peritoneal (n = 25) cancers were matched to up to four controls (n = 1938) on study recruitment center, age at blood donation (±6 months), time of the day of blood collection (±1 h), fasting status (<3 h, 3-6 h, >6 h), menopausal status at blood collection (premenopausal, perimenopausal, postmenopausal), and current use of exogenous hormones (OC, menopausal hormone therapy (HT)) at the time of blood draw, as well as menstrual cycle phase for premenopausal women (3-5 categories, depending on available data) using incidence density sampling.
The primary cross-sectional analyses included pre- and postmenopausal controls from the nested case-control study (n = 1910). Given established differences in circulating CA125 by menopausal status [8], cross-sectional analyses were restricted to women pre- or postmenopausal at time of blood collection. Women were considered premenopausal if they met one of the following criteria at blood collection: menstruated at least once in the prior year while not on hormones; were on hormones but were less than 50 years old; had a hysterectomy before last period and were less than 50 years old; or, age at last menstruation was missing and age was less than 50. Postmenopausal status was assigned to women who met one of the following criteria at blood collection: were not on hormones and had not menstruated in the past year; were on hormones and age was 50 or greater; had a hysterectomy and age was 50 or greater; or, age at last menstruation was missing and age was 50 or greater. Controls who were perimenopausal or had unknown menopausal status (n = 28) were excluded. In a secondary analysis, we evaluated cross-sectional associations among cases (n = 791). Cases who were perimenopausal or had unknown menopausal status (n = 19) were excluded from these analyses.
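The classification criteria above can be sketched as a small decision function. This is a minimal illustration only: the function name, argument names, and the order in which the rules are checked are assumptions, not the study's actual code, and a missing menstruation record is represented here by `None`.

```python
from typing import Optional


def classify_menopausal_status(
    age: float,
    on_hormones: bool,
    hysterectomy: bool,
    menstruated_past_year: Optional[bool],
) -> str:
    """Assign pre- or postmenopausal status at blood collection.

    Hypothetical helper: the age cutoff of 50 comes from the text, but
    the rule ordering and the use of None for missing menstruation
    data are assumptions made for this sketch.
    """
    # Hormone use, hysterectomy, or missing menstruation data make the
    # menstrual history uninformative, so the age cutoff of 50 is used.
    if on_hormones or hysterectomy or menstruated_past_year is None:
        return "premenopausal" if age < 50 else "postmenopausal"
    # Otherwise, menstruation in the prior year decides the status.
    return "premenopausal" if menstruated_past_year else "postmenopausal"
```

For example, `classify_menopausal_status(55, True, False, True)` returns `"postmenopausal"` because hormone use at age 50 or greater takes precedence over reported menstruation in this sketch. The perimenopausal category used for matching is not defined in the text and is therefore not modeled here.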

Exposure data
Data on lifestyle and reproductive exposures, as well as anthropometric measures, were collected at baseline and included: age at menarche, age at blood draw, OC use and duration, HT use and duration, type of postmenopausal HT, parity, estimated number of ovulatory cycles (defined as the time between age at menopause and age at menarche not taking OCs or pregnant), phase of menstrual cycle at blood collection (premenopausal women), tubal ligation, hysterectomy, oophorectomy, BMI, smoking, and family history of breast cancer. Participants missing the exposure of interest were excluded from analyses for that exposure. Among controls, the following variables had missing observations: age at menarche (n = 78), OC use (n = 57), duration of OC use (n = 66), parity (n = 136), number of children (n = 43), tubal ligation (n = 1661), IUD use at recruitment (n = 522), hysterectomy (n = 347), ovulatory cycles (n = 385), age at menopause (n = 244), HT duration (n = 166), HT at blood collection (n = 359), BMI (n = 94), smoking (n = 28), pack-years among smokers (n = 10), and family history of breast cancer (n = 1250).

Laboratory methods
All assays were performed in the Brigham and Women's Hospital Obstetrics and Gynecology Laboratory of Genital Tract Biology using a volume-efficient highly sensitive multiplex platform (Meso Scale Discovery (MSD), Gaithersburg, MD, USA) based on electrochemiluminescence (ECL) detection. Single ECL assays for antigen detection of human CA125 (catalog number K151WC) and Human Prototype CA15.3 (catalog number N45ZA-1) and all reagents related to these two assays were provided by MSD. The linearity range for CA125 was 0.6-10,000 U/ml, and for CA15.3 was 0.19-12,500 mU/ml. HE4 and CA72.4 were analyzed using a custom-designed duplex assay. The following reagents were provided by Fujirebio Diagnostics, Inc. (Malvern, PA): HE4 protein (IgHE4 antigen), which we used to generate a calibration curve with a linear range of 0.0137-3600 pM; anti-HE4 capture IgG1 (2H5 mouse hybridoma); anti-HE4 detection IgG1 (mouse hybridoma 3D8); TAG72 Defined Antigen, which we used to generate a calibrator curve with a linear range of 0.146-2400 U/ml; anti-CA72.4 capture IgG1 (mouse hybridoma CC49, Fujirebio catalog number 110-005); anti-CA72.4 detection IgG1 (mouse hybridoma B72.3). The samples were split into batches such that matched case-control sets and samples from the same study center were kept together in the same batches. The samples were tested undiluted in the CA125 singleplex and the HE4/CA72.4 duplex, and they were tested at a 50-fold dilution in the CA15.3 assay. Blinded quality control (QC) samples were included on each assay plate. In blinded QCs with values within the linearity range of each assay we observed the following interplate CVs and min-max (mean) intraplate CVs: 19% and 3-20% (9%) for CA125, 22% and 3-5% (4%) for CA15.3, 9% and 4-10% (6%) for HE4, and 16% and 1-16% (6%) for CA72.4. CA72.4 concentrations were below the lower limit of detection in the blinded QC samples; therefore, CVs are based on the remaining 13 aliquots (concentration range: 1.15 to 1.87 U/mL).
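The interplate and intraplate CVs reported above can be illustrated with a short calculation. This is a sketch under stated assumptions: the text does not give the exact formulas used, so here the intraplate CV is taken as the CV of replicate QC aliquots within a plate and the interplate CV as the CV of the plate means. The function names and the input layout are hypothetical.

```python
import statistics
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Return the CV in percent: sample standard deviation over mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def summarize_qc(plates: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize QC replicates keyed by plate identifier.

    Assumption: interplate CV is computed from the plate means, and
    intraplate CVs from the replicates on each plate.
    """
    intraplate = [coefficient_of_variation(v) for v in plates.values()]
    plate_means = [statistics.mean(v) for v in plates.values()]
    return {
        "interplate_cv": coefficient_of_variation(plate_means),
        "intraplate_cv_min": min(intraplate),
        "intraplate_cv_max": max(intraplate),
        "intraplate_cv_mean": statistics.mean(intraplate),
    }
```

Each plate needs at least two replicates, and at least two plates are needed, because the sample standard deviation is undefined for a single value.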

Statistical analyses
Biomarker concentrations were log-transformed to obtain a more normal data distribution. We assessed each biomarker for outliers using the generalized extreme studentized deviate many-outlier procedure [19]. Eight outliers were identified for HE4; the influence of these values was assessed in sensitivity analyses. No outliers were identified for CA125 or CA15.3. We used generalized linear models to estimate the mean CA125, CA15.3, and HE4 values across categories of each characteristic and exponentiated results to obtain geometric mean values in the original scale. Since the majority of the CA72.4 values (82%) were below the lower detection limit (1.119 U/mL), we used a logistic regression analysis with a dichotomous CA72.4 variable (≥1.119 vs. <1.119 U/mL) as the outcome and results are presented only in a supplemental table. Wald tests of continuous variables were used to assess trend. All analyses were adjusted for matching factors from the parent nested case-control study: study center (grouped by country), age at blood draw, fasting status, date of blood draw, menstrual cycle phase for women premenopausal at blood collection, OC/HT use at blood collection, and length of follow-up. We adjusted for oophorectomy, number of ovulatory cycles, and smoking status in sensitivity analyses among premenopausal women, and these factors plus age at menopause, hysterectomy, and type of HT among postmenopausal women. Missing indicators were used to account for missing data for covariates. CA125 and CA15.3 have been reported to vary across the menstrual cycle [20,21]. Therefore, we evaluated these markers both adjusting for menstrual cycle phase and standardized using phase-specific residuals. Results were similar with both approaches; we present the models adjusted for menstrual cycle phase.
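The log-transform-then-exponentiate step and the dichotomization of CA72.4 can be illustrated as follows. This is a minimal, unadjusted sketch: the geometric means in the paper come from generalized linear models adjusted for the matching factors, which this example does not reproduce. The function names are hypothetical; the 1.119 U/mL threshold is taken from the text.

```python
import math
from typing import List

CA724_LOD = 1.119  # lower detection limit in U/mL, as stated in the text


def geometric_mean(concentrations: List[float]) -> float:
    """Average on the log scale, then back-transform to the original scale."""
    logs = [math.log(c) for c in concentrations]
    return math.exp(sum(logs) / len(logs))


def is_detectable(ca724: float, lod: float = CA724_LOD) -> bool:
    """Dichotomize CA72.4 as at or above vs. below the detection limit."""
    return ca724 >= lod
```

Because the mean is taken on the log scale, a single very high concentration pulls the geometric mean up far less than it would the arithmetic mean, which is the usual motivation for this summary with right-skewed marker data.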
To assess whether the adjustment for correlates of these early detection markers improved discrimination between controls and individuals who subsequently became cases, we evaluated the area under the receiver operating characteristic curve (AUC) and compared AUCs from models including the marker alone to those including the marker standardized for its correlates. These analyses were limited to cases who were postmenopausal at time of blood collection (n = 590; and their matched controls), given significant predictors of the markers were only identified among women postmenopausal at blood collection. AUCs were calculated using conditional logistic regression models to account for the matched study design. We calculated absolute risk estimates for ovarian cancer using a model derived in the EPIC cohort [22] and calibrated the conditional logistic regression model towards the absolute risk estimates as an offset variable. We used regression residuals to standardize the marker concentrations based on significant correlates of the marker. Briefly, we calculated the deviation (residual) from the mean predicted concentration given each study participant's profile of correlates. Correlates included for each marker were: CA125: parity, hysterectomy, unilateral oophorectomy, age at menopause, estrogen-alone HT use, ovulatory cycles, current smoking; CA15.3: BMI, former smoking; HE4: age at blood draw, OC use, parity, age at menopause, current smoking; CA72.4: no correlates identified.
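The residual-based standardization described above can be sketched for the simplest case of one continuous correlate. This is an illustration under stated assumptions, not the study's implementation: the paper standardized each marker on several correlates at once, whereas this example fits an ordinary least squares line of the log-transformed marker on a single correlate (for instance BMI for CA15.3). All function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def marker_residuals(marker: List[float], correlate: List[float]) -> List[float]:
    """Deviation of each log-marker value from its predicted mean.

    Assumption: a single continuous correlate; the study used several.
    """
    log_marker = [math.log(m) for m in marker]
    intercept, slope = fit_simple_ols(correlate, log_marker)
    return [
        lm - (intercept + slope * c) for lm, c in zip(log_marker, correlate)
    ]
```

The residuals, rather than the raw concentrations, would then be entered into the prediction model, so that a woman's marker value is judged relative to what is expected given her correlate profile.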
Analyses were conducted using SAS version 9.4 (Cary, NC) and R 3.3.0. All statistical tests were two-tailed and considered significant at p < 0.05.

Results
Study participants in the primary cross-sectional analyses restricted to controls had a mean age of 56 years at blood collection, and 74% were postmenopausal (n = 1421; premenopausal, n = 489). The majority of participants were parous (89%), half were ever users of OCs at the time of blood collection, and 33% of postmenopausal women reported HT use at the time of blood collection. Average BMI was 25.8 kg/m², and 19% reported smoking at the time of blood draw (Table 1). Characteristics of the full nested case-control study population have been presented previously [5]. Briefly, cases were median age 63 years at diagnosis (range: 31-86 years), with median 6 years between blood collection and diagnosis (range: 0-16 years). The majority of cases were diagnosed with tumors of serous histology (n = 443; 55%). CA125 concentrations differed significantly by menopausal status at blood collection, with lower concentrations observed among postmenopausal women (premenopausal: 26.1 U/mL; postmenopausal: 18.4 U/mL; p < 0.01; Table 2). Concentrations of CA15.3 were significantly higher among postmenopausal (617.5 U/mL) compared to premenopausal (552.9 U/mL, p = 0.02) women. HE4 concentrations did not differ by menopausal status at blood collection (p = 0.92). Among premenopausal women, biomarker concentrations did not differ significantly by menstrual cycle phase (Fig. 1).
Significant associations between epidemiologic factors and the investigated markers were predominantly observed among women who were postmenopausal at blood collection. Specifically, parity (p = 0.04), higher number of full-term pregnancies among parous women (p trend = 0.02), older age at menopause (p trend < 0.01), and greater estimated lifetime number of ovulatory cycles (p trend < 0.01) were all associated with higher CA125 concentrations, whereas hysterectomy, unilateral oophorectomy, estrogen-only hormone therapy (vs. never use), and current smoking (vs. never smoking) were associated with lower concentrations (all associations p < 0.01). For CA125, no associations were observed among premenopausal women, with the exception of an inverse association between OC use at blood collection and CA125 (users: 19 U/mL; non-users: 30 U/mL; p < 0.01). For CA15.3, higher BMI (p trend < 0.01) and former smoking versus never smoking (p = 0.03) were associated with higher concentrations among postmenopausal women, while younger age at blood collection (p = 0.03) was associated with higher CA15.3 among premenopausal women. None of the remaining exposures were associated with circulating CA15.3. Older age at blood collection was associated with higher HE4 concentrations in postmenopausal women (p trend < 0.01), whereas longer duration of OC use (p trend < 0.01), higher parity (p trend = 0.02), and older age at menopause were associated with lower concentrations. Current smoking, relative to never smoking, was associated with higher HE4 concentrations in both pre- and postmenopausal women (p < 0.01).
CA72.4 was evaluated as a dichotomous outcome (i.e., detectable vs. non-detectable concentrations), given that 82% of values were below the detection limit. We observed no associations between any of the examined epidemiologic risk factors and detectable vs. non-detectable CA72.4 concentrations, except suggestively higher CA72.4 with a higher BMI (≥25 vs. <25, p = 0.05; Additional file 1: Table S1).
Tubal ligation (yes/no), age at menarche (<12, 12, 13, 14, 14+ years), IUD use (yes/no), and family history of breast cancer (yes/no) were not associated with any of the examined markers. The associations of oophorectomy, hysterectomy, and ovulatory cycles with CA125, as well as the association between age at blood draw and CA15.3 in premenopausal women, were attenuated and no longer statistically significant after adjustment for the other investigated factors (i.e., adjusted for matching factors plus all significant correlates of the markers presented in the tables; Table 3 and Additional file 1: Table S2). The remaining associations were similar after adjustment. Finally, results were essentially unchanged in sensitivity analyses excluding eight outlying HE4 values.
We observed few significant associations between the evaluated lifestyle and reproductive factors and the examined markers among ovarian cancer cases in the nested case-control study (Additional file 1: Table S3). There were no significant associations with CA125 among cases. However, among premenopausal women diagnosed with high-grade serous ovarian cancer over follow-up, current smoking was associated with lower CA125 (data not shown). Longer duration of OC use was associated with lower CA15.3 levels in postmenopausal women. Interestingly, higher parity and fewer ovulatory cycles were associated with lower premenopausal CA15.3 levels while the same exposures were associated with higher premenopausal HE4 levels. Among women diagnosed with high-grade serous disease, the association between OC use and HE4 levels persisted but the other associations did not (data not shown).
Finally, we investigated the discrimination of these markers before and after adjusting the markers (using biomarker residuals) for the epidemiologic factors identified as significant correlates in the cross-sectional analyses. These analyses were conducted in strata of time between blood collection and diagnosis (<1 year, 1 to <2 years, 2 to <3 years, ≥3 years). AUCs for the markers (individually and combined) were essentially unchanged when the marker values were adjusted for the epidemiologic correlates (e.g., AUC for <1 year, postmenopausal women, markers unadjusted: 0.87 (95% confidence interval (CI): 0.81-0.93); marker residuals: 0.89 (95% CI: 0.84-0.95); Table 4).
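As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen case has a higher score than a randomly chosen control, counting ties as one half. This is a plain, unconditional AUC given as an assumption-laden illustration: the AUCs in this study were derived from conditional logistic regression models that account for the matched design, which this example does not do. The function name is hypothetical.

```python
from typing import List


def auc(case_scores: List[float], control_scores: List[float]) -> float:
    """Fraction of case-control pairs in which the case scores higher.

    Ties contribute 0.5. This ignores matching, unlike the study's
    conditional logistic regression approach.
    """
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))
```

Comparing `auc` computed on raw marker values with `auc` computed on correlate-adjusted residuals mirrors, in simplified form, the before-and-after comparison reported in Table 4.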

Discussion
We present results from a large, cross-sectional study evaluating lifestyle and reproductive factors and ovarian cancer early detection markers. Adjustment for the identified correlates of these markers in early detection prediction models did not improve discrimination.
We confirmed previously reported observations [8,13-16] of lower CA125 levels in post- vs. premenopausal women. CA15.3 levels were significantly higher among postmenopausal women. HE4 concentrations did not vary by menopausal status. We examined the effect of age at blood collection within strata of menopausal status and did not observe significant associations for CA125 and CA15.3, with the exception of a significant inverse association between age and CA15.3 among women who were premenopausal at blood collection. Large prior studies have reported a modest inverse association between age and CA125 [7,8], evident in both pre- [8] and postmenopausal women [7,8], whereas prior studies on CA15.3 observed a modest positive [16,17] or no [23] association. However, neither of the studies observing a positive association between age and CA15.3 accounted for menopausal status at blood collection. We observed higher HE4 levels with older age only among women postmenopausal at blood collection. A positive association between age and HE4 has been previously reported (reviewed in [10,24]). Older age at menopause was positively associated with CA125 concentrations, as has been observed previously [6-8], and inversely associated with HE4. We observed no association between age at menopause and CA15.3.
Among postmenopausal women, there was a modest inverse association between longer duration of OC use and HE4 concentrations. OC use has not previously been associated with circulating HE4 [11,25]. However, data on OC duration are sparse. HE4 is expressed through the female reproductive tract [26], with the exception of the ovary. OC use inhibits cyclic proliferation leading to endometrial atrophy and predecidual changes in the stroma [27], though this is somewhat dependent on formulation. Therefore, OC use may impact HE4 concentrations via the effect on the endometrium. OC use may also impact mucin expression through upregulation of proinflammatory pathways recently shown to affect immunity in the distal reproductive tract [28]. Circulating concentrations of CA125 did not differ by duration of past OC use in our study, consistent with prior investigations [7,23]. Use of estrogen-alone HT was associated with lower CA125 concentrations; these associations persisted in multivariable models but were only evident among women with hysterectomy in stratified models. Administered transdermal 17β-estradiol has previously been associated with an increase in circulating CA125 in women without hysterectomy [29] and HT use (overall; formulation not specified) was associated with higher CA125 in the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) [6]. However, a positive association between HT and CA125 has not been universally observed [23,30]. Cengiz et al. observed lower CA15.3 among women using estrogen-alone HT [31] in an analysis limited to women with hysterectomy and bilateral salpingo-oophorectomy. Thus, comparability to our study population is limited. One additional investigation observed no association between HT use (overall) and circulating CA15.3 [23]. HT use was not associated with HE4, consistent with others [12], or CA72.4 in the current investigation.
Parity was associated with higher CA125 concentrations and lower HE4 concentrations among women who were postmenopausal at blood collection. The endometrium is a major source of CA125 in healthy women, and it is plausible that the extensive pregnancy-induced changes in the endometrium contribute to long-term changes in circulating CA125 and HE4. Further, data suggest CA125 increases during early pregnancy and close to delivery [32-36]; it is plausible that the higher concentrations observed in pregnancy persist post-pregnancy. One study reported higher CA125 concentrations among women reporting parity of two or higher [37]; however, an association between parity and CA125 has not consistently been observed among healthy women [7]. Prior data suggest lower CA125 concentrations in uterine flushing from women with recurrent miscarriages [38]. In turn, recurrent miscarriage is associated with lower parity. Additional studies are needed to clarify the impact of parity on subsequent circulating CA125 concentrations. The inverse association between parity and HE4 concentrations is consistent with one prior investigation [39]. However, other investigations have observed no association between parity and HE4 [11,12]. We observed no association between parity and CA72.4  [23].
Higher BMI was associated with higher CA15.3 concentrations among postmenopausal women in our study. This is consistent with a prior study [17] in men and women, suggesting that the effect is not explained solely by higher estrogen levels in obese postmenopausal women. An additional small study (n < 50) reported no significant association between BMI and circulating CA15.3 [40]. However, the analysis was limited to     [6]. Only 15% of the participants in the current investigation were obese (n = 271), as compared to almost 24% of PLCO study participants (n = 6063), and the current study was not statistically powered to detect such a small relative difference. Previous investigations on BMI and HE4 are mixed, reporting no [9,10,39], inverse [41], and positive [12] associations. Current smoking was inversely associated with CA125 concentrations. This is consistent with three prior investigations in recent large, well-characterized populations [6-8], though significant associations were not observed in smaller prior studies [39,42,43]. Smoking may reduce CA125 concentrations via its effect on endogenous estrogens. Smoking is inversely associated with endogenous estrogens [44], whereas administered estradiol is associated with higher circulating CA125 [29]. Further, smoking is associated with earlier age at menopause [45], which, in turn, is associated with lower CA125. Finally, CA125 is expressed in the respiratory tract [46], and smoking may reduce circulating CA125 concentrations via damage to the respiratory tract epithelia or via more general immunosuppressive effects. We observed higher concentrations of HE4 among women reporting current smoking, compared to never smokers, consistent with others (reviewed in [10]). As with CA125, HE4 is expressed in the oral cavity and respiratory tract, and it has been hypothesized that higher HE4 in smokers may be due to smoking-induced inflammation [41].
However, the mechanisms underlying the associations between CA125 and HE4 and smoking remain to be fully characterized. Former smoking, but not current smoking, was associated with higher CA15.3 concentrations in this investigation. One prior investigation observed no association between smoking and CA15.3 concentrations [17].
As expected, and consistent with prior studies [7,47], we observed lower CA125 concentrations among women reporting hysterectomy, though this association did not persist in the fully adjusted model. A similar pattern was observed for oophorectomy. While ovarian cells do not express CA125, this marker is expressed in the fallopian tube epithelium. Prior studies have not observed an association between oophorectomy and CA125 [6,7,48] or have concluded that the decline in CA125 after bilateral salpingectomy-oophorectomy is not due to ovarian CA125 [47]. In our investigation, unilateral oophorectomy was associated with CA125 only before adjustment for hysterectomy and HT use, supporting prior observations that oophorectomy is not independently associated with CA125 concentrations. CA15.3, HE4, and CA72.4 were similar in women with and without reported hysterectomy or bilateral oophorectomy.
Significant associations observed in this study were predominantly observed among women who were postmenopausal at blood collection. CA125 and CA15.3 have been reported to vary across the menstrual cycle [20,21]; substantial variation was not observed among premenopausal women in this investigation. However, we (i) adjusted for menstrual cycle phase and (ii) used phase-specific residuals to evaluate potential variability. Results were similar with both approaches. We were unable to assess cross-sectional associations during individual phases of the menstrual cycle. Our cross-sectional analysis between epidemiologic factors and early detection markers yielded few significant associations in women who subsequently developed ovarian cancer. Among high-grade serous cases, significant findings were limited to current smoking and lower CA125 in premenopausal women and oral contraceptive use with lower CA15.3 in postmenopausal women; these findings may be due to chance, given that these associations were not observed in controls.
We hypothesized that inclusion of significant correlates of the evaluated early detection markers would improve discrimination of the early detection prediction models including these markers, as this would, in part, account for sources of variation in these markers due to factors other than ovarian malignancy. However, we observed no improvement of the AUC in models adjusting for the epidemiologic correlates identified in the cross-sectional analyses. Results from this study therefore do not support the approach of adjusting CA125, CA15.3, HE4, or CA72.4 concentrations for their correlates to improve ovarian cancer early detection models. An alternative approach would be to develop personalized cutpoints for the markers, based on a woman's individual characteristics, as has been proposed for CA125 by menopausal status [8]. The current study was not designed to define or assess the utility of individualized cutpoints for ovarian cancer early detection; however, this should be explored in future studies designed for this purpose.

Conclusions
This investigation adds to the limited data on correlates of CA125, CA15.3, HE4, and CA72.4 in healthy women, and provides the first data by menopausal status at blood collection. While we did not observe improvements in discrimination of early detection prediction models after accounting for these correlates, these data may inform future research on the development of individualized early detection marker cutpoints based on epidemiologic factors.

Additional file
Additional file 1: Table S1. Association between epidemiologic characteristics and high CA72.4 by menopausal status at blood collection in healthy women: EPIC. Table S2. Multivariate adjusted association between epidemiologic characteristics and CA125, CA15.3 and HE4 by menopausal status at blood collection in healthy women: EPIC. Table S3. Association between epidemiologic characteristics and CA125, CA15.