Background: The accumulation of large amounts of omics data has made it possible to identify cancer driver pathways computationally, which is expected to provide critical information for downstream research such as elucidating cancer pathogenesis and developing anti-cancer drugs. Integrating multiple types of omics data to identify cancer driver pathways remains a challenging problem.
Results: In this paper, we propose SMCMN, a parameter-free identification model that incorporates both pathway features and gene associations in the protein-protein interaction (PPI) network. A novel measure of mutual exclusivity is devised to exclude gene sets with an "inclusion" relationship. By introducing gene-clustering-based operators, a partheno-genetic algorithm, CPGA, is put forward to solve the SMCMN model. Experiments were conducted on three real cancer datasets to compare the identification performance of both models and methods. The model comparisons demonstrate that SMCMN eliminates the "inclusion" relationship and, in most cases, produces gene sets with better enrichment performance than the classical model MWSM.
Conclusions: Extensive comparative experiments between CPGA-SMCMN and six state-of-the-art methods demonstrate that the gene sets identified by the proposed CPGA-SMCMN method contain more genes involved in known cancer-related pathways and have stronger connectivity in the PPI network.
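For readers unfamiliar with the baseline named in the Results, the following is a minimal sketch of the weight function commonly associated with the maximum weight submatrix formulation, assuming that this is what MWSM denotes here. It scores a gene set by rewarding coverage (samples mutated in at least one gene) and penalizing overlap (samples mutated in more than one gene). It does not implement SMCMN or CPGA; the function name `mwsm_weight` and the toy data are hypothetical and serve illustration only.

```python
from typing import Dict, Iterable, Set


def mwsm_weight(mutations: Dict[str, Set[str]], gene_set: Iterable[str]) -> int:
    """Weight W(M) = 2*|covered samples| - sum of per-gene mutated sample counts.

    `mutations` maps each gene to the set of samples in which it is mutated
    (an assumed input layout, not one taken from the paper).
    """
    genes = list(gene_set)
    # Coverage: samples mutated in at least one gene of the set.
    covered: Set[str] = set().union(*(mutations[g] for g in genes))
    # Total mutation count; exceeds coverage whenever samples overlap.
    total = sum(len(mutations[g]) for g in genes)
    return 2 * len(covered) - total


# Toy data (hypothetical): genes A and B are mutually exclusive,
# while gene C is mutated in every sample that A or B is mutated in.
toy = {
    "A": {"s1", "s2"},
    "B": {"s3"},
    "C": {"s1", "s2", "s3"},
}
exclusive_score = mwsm_weight(toy, ["A", "B"])
overlapping_score = mwsm_weight(toy, ["A", "C"])
```

In this toy case both gene sets cover the same three samples, but the overlapping pair scores lower because its shared samples are counted as a penalty.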