Integrative analysis of the common genetic characteristics in ovarian cancer stem cells sorted by multiple approaches

Background Ovarian cancer is the second fatal malignancy of the female reproductive system. Based on the cancer stem cell (CSC) theory, its poor prognosis of ovarian cancer attributed to tumor recurrence caused by CSCs. A variety of cell surface-specific markers have been employed to identify ovarian cancer stem cells (OCSCs). In this study, we attempted to explore the common feature in ovarian cancer stem cells sorted by multiple approaches. Methods We collected the gene expression profiles of OCSCs were from 5 public cohorts and employed R software and Bioconductor packages to establish differently expressed genes (DEGs) between OCSCs and parental cells. We extracted the integrated DEGs by protein-protein interaction (PPI) network construction and explored potential treatment by the Cellminer database. Results We identified and integrated the DEGs of OCSCs sorted by multiple isolation approaches. Besides, we identified OCSCs share characteristics in the lipid metabolism and extracellular matrix changes. Moreover, we obtained 16 co-expressed core genes, such as FOXQ1, MMP7, AQP5, RBM47, ETV4, NPW, SUSD2, SFRP2, IDO1, ANPEP, CXCR4, SCNN1A, SPP1 and IFI27 (upregulated) and SERPINE1, DUSP1, CD40, and IL6 (downregulated). Through correlation analysis, we screened out ten potential drugs to target the core genes. Conclusion Based on the comprehensive analysis of the genomic datasets with different sorting methods of OCSCs, we figured out the common driving genes to regulating OCSC and obtained ten new potential therapies for eliminating ovarian cancer stem cells. Hence, the findings of our study might have potential clinical significance.


Background
Ovarian cancer is the second most lethal gynecologic malignancy in women around the world [1]. Debulking surgery and platinum-based chemotherapy results in complete response in 70% of patients, most will relapse or even succumb to chemoresistance [2]. Significant progress in maintenance therapy has been seen by combination with poly (ADP-ribose) polymerase inhibitors, which have been approved in disease recurrence and a first-line setting among women with BRCA1/BRCA2 mutations [1]. Tumor recurrence has been attributed to suboptimal resection and the presence of residual chemo-resistant OCSCs [2,3].
Over the years, multiple biomarkers have been identified exclusively or co-expressed in OCSCs and have been explored for their unique functions in tumorigenesis [4]. Several studies have contributed to the isolation and identification of OCSCs. Spheroids' formation in cancer stem cell culture has been recognized as the first commonly used approach [5,6]. With the Hoechst 33342 dye, Side population (SP), have overexpressed several members of ABC transporters and exhibited some characteristics of CSCs, are collected [7,8]. Based on cell surface markers, CD44, CD117, and CD133 etc., OCSCs have been successfully identified and isolated [9][10][11]. The activity of ALDH1 has been widely used in the identification of stem/progenitor cells or CSCs. Cells expressing high levels of ALDH1 can be identified by ALDEFLUOR assay and isolated by the ALDH1 antibody [10,12,13].
CSCs have generally been attributed to the heterogeneity of tumors. Stem cell-associated heterogeneity resulted from intrinsic tumor plasticity can be shaped by the microenvironment [14]. Many abnormal signaling pathways of CSC play a vital role in its maintenance, survival and metastasis, including Hedgehog, Notch and Wnt/β-catenin pathways, carcinogenic cascades such as PI3K/AKT, TGF-β, EGFR, JAK/STAT or NF-κB as well as transcriptional regulators such as OCT4, Nanog, YAP/TAZ and Myc [3]. OCSCs identified by the different approach has shown different mechanisms for maintaining cancer stem-like properties. They may share the same biomarkers as well as biological characteristics. This has led to an increasing interest in elucidating the underlying mechanism of OCSCs identified by different methods.
In this study, we downloaded five original microarray datasets, namely, GSE82305 [13], GSE28799 [5], GSE53759 [15], GSE94358 [16], and GSE33874 [17], from the GEO database, and these datasets contain a total of 45 samples, including 21 OCSCs samples and 24 parental cancer cell samples. We used R language software to standardize all the datasets and to get DEGs.
The 'RobustRankAggreg' package was subsequently used to integrate the results and obtain integrated differentially expressed genes (DEGs). By function and pathway analysis, we identified OCSCs share characteristics in the lipid metabolism and extracellular matrix changes. Combined with WGCNA and PPI network, we identified the hub genes of OCSC and obtained 16 coexpressed core genes, such as FOXQ1, MMP7, AQP5, RBM47, ETV4, NPW, SUSD2, SFRP2, IDO1, ANPEP, CXCR4, SCNN1A, SPP1 and IFI27 (upregulated) and SERPINE1, DUSP1, CD40, and IL6 (downregulated). Based on the comprehensive analysis of the genomic datasets with different markers of OCSCs, we figured out the common driving signal pathways to regulating OCSCs. Finally, we obtained ten new potential therapies for the elimination of ovarian cancer stem cells.

Integration of microarray data
The R package 'limma' was used to test DEGs in each dataset. Genes with an adjusted P-value < 0.05 and |log fold change (FC)| > 1 were considered DEGs. The DEGs in the four datasets were integrated using the R package 'RobustRankAggreg' (http://www.bioconductor.org/). The integrated upregulated and downregulated DEG lists were saved for subsequent analysis.

Weighted gene co-expression network analysis
The variant genes in the GSE33874 dataset were constructed to an approximate scalefree fundamental gene co-expression network using the R package 'WGCNA' [19]. Genes with a high correlation were clustered and the network modules were generated using the topological overlap measure (TOM). The color bars correspond to the clusters of genes can be seen as the gene module. The threshold of the coexpression module was set as p < 0.05.

Function and pathway analysis
The gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the integrated DEGs were performed using the DAVID 6.8 database (https://david. ncifcrf.gov/). GO terms were classified in three categories: biological process (BP), cellular component (CC), and molecular function (MF). The term with highest −log10qValue was determined the most significant enrichment. Q-values below 0.05 (q < 0.05) were considered significant. The visualization of the GO and KEGG pathway enrichment analyses was performed using R 3.6.3 software.

Protein-protein interaction (PPI) network construction and hub gene selection
The PPI network of the integrated DEGs was analyzed with the STRING database (http://string-db. org/) and visualized using Cytoscape 3.8.0 software. The plug-in molecular complex detection of Cytoscape was subsequently applied to construct the subnetwork for further analysis. The top 3 cluster, with the default parameters "false Degree Cutoff = 2", "Node Score Cutoff = 0.2", "K-Core = 2" and "Max.-Depth from Seed = 100", was saved and listed in Table 3. Genes in a significant module of WGCNA were analyzed, and the top 3 subnets were listed in Table 2.

Gene expression in immune subtypes
A new immune classification of solid tumors has identified six immune subtypes (C1-C6) [20]. Our study population included all OV patients from TCGA with available information on Immune immune subtypes (N = 234). Gene expression was calculated as log2(x + 1) transformed RSEM normalized counts.
The prediction of potential drug based on drug-gene correlation DTP NIC-60 z scores and corresponding RNA-seq composite expression were downloaded from the Cellminer database (https://discover.nci.nih.gov/cellminer/ loadDownload.do) [21,22]. Drug z-score correlated with gene expression and statistically significant (P < 0.05) were saved and listed in Table 4. The details of the predicted drug were listed in Table 4. Drug information was derived from the Drugbank database (https://www.drugbank.ca/).

Flow chart for the study design
In this study, we conducted a comprehensive analysis of common essential genes in OCSCs isolated by deferent methods and their critical roles in OV by several computational methods. The study design was illustrated in Fig. 1.

Construction of co-expression networks and identification of key modules
We employed the WGCNA to analyze the differentially expressed genes between side population (SP) and main population (MP), which were isolated from fresh ascites obtained from 10 women with highgrade serous ovarian adenocarcinoma. The 21,655 genes in 20 samples of the GSE33874 dataset were used to construct the co-expression module. The cluster analysis on these samples and the results were depicted (Fig. S4A). Then, we screened out the softthresholding power (Fig. 3 a). When the power value was equal to 16, the independence degree was up to 0.9. Therefore, the power value was used to construct the co-expression module, and the results showed that 18 distinct gene co-expression modules were identified (Fig. 3 b). We analyzed the correlation between module eigengene and group traits and found only one co-expression module significantly correlated with SP and MP (Fig. 3 c). 1154 genes in the blue module correlated negative with SP. We performed PPI network analysis with genes in the blue module. The whole network and the top 3 subnets were depicted ( Fig. S4B-C), and its details were listed in Table 2. Subsequently, these genes in the blue module were subjected to GO and KEGG analysis. The top GO term and KEGG pathway in the three subnets were listed in Table S1.

The integrated DEGs and their function and pathway analysis
The DEGs of GSE82305, GSE28799, GSE53759, and GSE94358 datasets were screened using the R package 'limma' and sorted according to log FC. The integrated DEGs were obtained using the R package 'RobustRan-kAggreg' (P < 0.05, |log FC| > 1). The 343 integrated DEGs, consisting of 111 upregulated genes and 232 downregulated genes, were identified (Table S2). Heat map showing the top 20 upregulated and 20 downregulated genes in the integrated DEGs (Fig. 4). Moreover, these upregulated genes and downregulated genes were subjected to cluster Profiler for GO and KEGG analysis, respectively. GO term annotation showed that these upregulated genes correlated with the regulation of lipid metabolic process, response to steroid hormone, cellular ketone metabolic process, cellular response to fatty acid, regulation of fatty acid oxidation, positive regulation of fatty acid oxidation (BP) (Fig. 5 a). GO analysis also showed that these down-regulated genes related to extracellular structure organization and extracellular matrix organization (Fig. 5b). We figured out that the upregulated genes were mainly enriched in lipid Fig. 4 The heatmap of the representative integrated DEGs. The red square represents upregulated DEGs, the green square represents downregulated DEGs, and the value in the square represents the log FC value metabolism and the downregulated genes were mainly enriched in extracellular stroma. This indicated that the extracellular matrix regulated cancer stem cell behavior and character to some extent. The upregulated integrated DEGs were mainly enriched in the intestinal immune network for IgA production (Fig. 5c), and downregulated genes in focal adhesion pathways (Fig.  5d). We performed PPI analysis with integrated DEGs. The top 3 subnets were depicted (Fig. 6a), and its details were listed in Table 3. We tested the expression of upregulated integrated DEGs in TCGA and GTEx. Among them, 26 genes were differentially expressed (Fig. 6b). We selected the 14 genes, including FOXQ1, MMP7, AQP5, RBM47, ETV4, NPW, SUSD2, SFRP2, IDO1, ANPEP, CXCR4, SCNN1A, SPP1, and IFI27, which were both overexpressed in OV and OCSCs. Further, we used the Venn diagram to select the shared genes in upregulated integrated DEGs and the genes in the blue module. We obtained four common genes, including SERPINE1, DUSP1, CD40, and IL6 (Fig.  6c), whose expression was similarly low in OV than healthy ovarian tissue (Fig. 6d). In brief, we derived specific expression profile (SEP) of OCSCs, which were composed of FOXQ1, MMP7, AQP5, RBM47, ETV4, NPW, SUSD2, SFRP2, IDO1, ANPEP, CXCR4, SCNN1A, SPP1 and IFI27 (upregulated) and SERPINE1, DUSP1, CD40, and IL6 (downregulated).

SEP expression in immune subtypes
OCSCs can survive from treatment and can be exempt from immunosurveillance. We explored whether the OCSCs signature associated with characteristics contributes to immune escape. A research group had used characteristic immune-oncologic gene signatures to cluster TCGA tumor types into six groups (C1-C6) [20]. The density of specific immune cells and overall prognosis show wide variations between the different immune subtypes. Only 4 immune subtypes were identified in OV, predominantly IFN-gamma Dominant (Immune C2, N = 138) and Lymphocyte Depleted (Immune C4, N = 53). C2 had the highest M1/M2 macrophage polarization, higher densities of CD8 T cells, a high proliferation rate, and the highest intratumoral heterogeneity. Therefore, we tested the expression of SEP of OCSCs in immune subtypes and found a few upregulated genes had a higher expression in the C2 and C4 group, which were characterized as lymphocyte depleted, may predict the distinct gene profile of cancer stem cells contributes to immune evasion in ovarian Cancer Patients (Fig. 7).

Drug susceptibility prediction based on SEP of OCSCs
To explore potential molecular-targeted drugs for OCSCs, we download NCI-60 drug z scores and corresponding NCI-60 cell lines RNA-seq/composite expression from the Cellminer database. The higher cell lines z scores have, the more sensitive to the corresponding drug they are. For better applications in the clinic, we employed the US FDA-approved drugs and drugs in clinical trials. We analyzed the correlation between drug z score and SEP. We listed four representative Pearson's correlation dot plot (Fig. 8 a). All targeted genes and relevant predicted drugs (P < 0.05) were shown in the Sankey diagram (Fig. 8 b). We selected drugs that were correlated with at least four target genes as a potential therapy regimen. As a result, we identified ten drugs related to SEP. Detailed information were shown in Table 4. Thus, we inferred that these drugs might be repurposed to the OCSCs with the SEP as drug targets. We extracted drugindication from DrugBank (https://www.drugbank.ca/) and listed applications of the ten medications in Table 5. For example, Ixabepilone, which was associated with CD40, CXCR4, IL6, and SERPINE1, were used in locally advanced breast cancer and metastatic breast cancer, can potentially be repurposed to treat OCSCs.

Discussion
Most ovarian cancer patients respond to initial chemotherapy, but more than 70% of patients will develop tumor recurrence and eventually develop resistance to treatment [2]. Ovarian cancer stem cells are thought to promote the recurrence of ovarian cancer and lead to the development of treatment resistance [23]. There are several methods to isolate ovarian tumor stem cells. However, the mechanisms of ovarian tumor stem cells obtained by different routes to promote tumor development are not the same. The core signaling pathways that regulate ovarian cancer stem cells remain unclear. There is still lacking in effective drugs and drug combinations to eliminate them to improve cancer survival. In this study, the GSE82305, GSE28799, GSE53759, and GSE94358 datasets were analyzed, and 343 integrated DEGs were found. As for study GSE33874, we applied WGCNA analysis and got 18 significant enriched modules. The blue module was significantly correlated with SP and MP. The 343 integrated DEGs were then subjected to gene enrichment analysis. The upregulated genes were mainly enriched in lipid metabolism, and the downregulated genes were enriched primarily in extracellular structure organization. The KEGG pathway analysis revealed that these downregulated integrated DEGs are mostly enriched in focal adhesion, which is essential in the extracellular matrix formation. Like the metabolic characteristics of tumor cells, rapidly proliferating stem cells mainly rely on aerobic glycolysis to provide energy [14]. In a study, chemical imaging of a single living cell was performed. They identified and described lipid unsaturation in ovarian cancer stem cells for the first time and suggested to effectively eliminate CSCs by inhibition of lipid desaturases [24]. OXPHOS pathway and lipid metabolism in cancer stem cells are recognized as targets for the development of novel anticancer therapies [25]. In our study, we found upregulated genes were enriched in lipid metabolism, which suggested cellular event accumulation of lipids and secondary metabolites.
To meet the increasing energetic requirements of CSCs, the lipid metabolic pathway can flexibly turn to the other metabolic pathways [26]. CSCs are incredibly reliant on the activity of enzymes involved in lipid metabolism, which engaged in CSCs fate decisions, such as Hippo and Wnt signal pathway [25]. Emerging evidence suggests that alterations in lipid-and fatty acidassociated pathways are essential for the maintenance of CSCs [27,28]. Some recent evidence has demonstrated that cancer stem cell maintenance and differentiation is regulated by extracellular matrix mechanics [29,30]. Interactions of cells with the extracellular matrix are crucial for the establishment and maintenance of stem cell [31,32]. Our results also confirm that the most significant GO term in subnet3 of the blue module is response to mechanical stimulus (Table S1). In our study, we uncovered the changes in lipid metabolism and extracellular matrix are universal, independent of cell types and sorting methods. This phenomenon addresses the interactions of OCSCs with environment which result in the modulation of lipid metabolism, and thereby of OCSCs phenotype. Studies have shown that tumor stem cells can reduce and evade from NK cells by downregulating the active ligand of NK cells, such as major histocompatibility I polypeptide related sequences A (MICA) and histocompatibility I polypeptide related sequences B (MICB), so as to escape from immune surveillance [33]. The cancer stem cells have been identified to survived within a specialized cellular niche through the crosstalk with the surrounding microenvironment [34]. Ovarian carcer was classified into four immune subtypes (C1-C4) based on the study of Thorsson et al. C2 and C4 showed poor prognosis, despite C2 had a substantial immune component. Our results showed the expression of MMP7, RBM47, and SCNN1A were significantly higher in C2 and C4 than other immune subtypes. In this regard, the complex interrelations between cancer stem cells and tumor immune microenvironment might play an vital role in MPM tumorigenesis. To identify potential drugs for OCSCs based on the SEP, we compared drug sensitivity of US FDA-approved anticancer drugs, which can be conducive to treatment. We chose ten possible drugs for OCSCs. These drugs have been used in other kinds of diseases and are believed to be further explored in tumor stem cell treatment.
In summary, the purpose of this study was to increase our understanding of the core signal pathway in OCSCs through an integrated bioinformatics analysis that aimed to identify integrated DEGs and the related pathways   Table 5 Detailed information of drugs listed in Table 4 Targeted genes
Additional file 1 Figure S1. Data processing in the GSE28799 dataset.
A. The volcano plot showed differentially expressed genes (DEGs) between the two groups of samples in GSE28799. Based on an adjusted P < 0.05 and |log fold change| > 1, the red spots represent the upregulated genes and the blue spots represent the downregulated genes; the grey spots represent genes with no significant difference. B. The heatmap of the top 100 DEGs in GSE28799. Orange indicates relative upregulated genes; Blue indicates the relative downregulated gene; yellow suggests no significant change in gene expression; Additional file 2 Figure S2. Data processing in the GSE53759 dataset.
A. The volcano plot showed differentially expressed genes (DEGs) between the two groups of samples in GSE53759. Based on an adjusted P < 0.05 and |log fold change| > 1, the red spots represent the upregulated genes and the blue spots represent the downregulated genes; the grey spots represent genes with no significant difference. B. The heatmap of the top 100 DEGs in GSE53759. Orange indicates relative upregulated genes; Blue indicates the relative downregulated gene; yellow suggests no significant change in gene expression.
Additional file 3 Figure S3. Data processing in the GSE94358 dataset.
A. The volcano plot showed differentially expressed genes (DEGs) between the two groups of samples in GSE94358. Based on an adjusted P < 0.05 and |log fold change| > 1, the red spots represent the upregulated genes and the blue spots represent the downregulated genes; the grey spots represent genes with no significant difference. B. The heatmap of the top 100 DEGs in GSE94358. Orange indicates relative upregulated genes; Blue indicates the relative downregulated gene; yellow suggests no significant change in gene expression.
Additional file 4 Figure S4. PPI networks analysis of the blue module.
A. Cluster dendrogram of 20 samples in GSE33874. B. PPI networks of genes in the blue module. C. The top 3 subnets of network in (B).