Elucidating crucial driver genes is paramount for understanding cancer origins and mechanisms of progression, as well as for selecting targets for molecular therapy. Cancer genes are usually ranked by mutation frequency, which, however, does not necessarily reflect their driver strength. Here we hypothesize that driver strength is higher for genes that are preferentially mutated in patients with few driver mutations overall, because these few mutations should be strong enough to initiate cancer. We propose a formula to calculate the corresponding Driver Strength Index (DSI), as well as the Normalized Driver Strength Index (NDSI), the latter being completely independent of the overall gene mutation frequency. We validate these indices using the largest database of human cancer mutations, TCGA PanCanAtlas, multiple established algorithms for cancer driver prediction (2020plus, CHASMplus, CompositeDriver, dNdScv, DriverNet, HotMAPS, IntOGen Plus, OncodriveCLUSTL, OncodriveFML) and four custom computational pipelines that integrate driver contributions from SNA, CNA and aneuploidy at patient-level resolution. We demonstrate that NDSI provides substantially different gene rankings compared with DSI and the frequency-based approach.
For example, NDSI highlighted the importance of the guanine nucleotide-binding protein subunits GNAQ, GNA11, GNAI1, GNAZ and GNB3, the general transcription factor II family members GTF2I and GTF2F2, and the fibroblast growth factor receptors FGFR2 and FGFR3. Intriguingly, NDSI prioritized CIC, FUBP1, IDH1 and IDH2 mutations, as well as 19q and 1p chromosome arm losses, which are characteristic molecular alterations of gliomas. KEGG analysis shows that the top NDSI-ranked genes comprise the PDGFRA-GRB2-SOS2-HRAS/NRAS-BRAF pathway, the GNAQ/GNA11-HRAS/NRAS-BRAF pathway, the GNB3-AKT1-IKBKG/GSK3B/CDKN1B pathway and the TCEB1-VHL pathway. NDSI does not seem to correlate with the number of protein-protein interactions. We share our software to enable calculation of DSI and NDSI from the outputs of any third-party driver prediction algorithms or their combinations.