With expanding applications of next-generation sequencing in medical genetics, increasing computational methods are being developed to predict the pathogenicity of missense variants. Selecting optimal methods can accelerate the identification of candidate genes. However, the performances of different computational methods under various conditions have not been completely evaluated. Here, we compared 12 performance measures of 23 methods based on three independent benchmark datasets: (i) clinical variants from the ClinVar database related to genetic diseases, (ii) somatic variants from the IARC TP53 and ICGC databases related to human cancers and (iii) experimentally evaluated PPARG variants. Some methods showed different performances under different conditions, suggesting that they were not always applicable for different conditions. Furthermore, the specificities were lower than the sensitivities for most methods (especially, for the experimentally evaluated benchmark datasets), suggesting that more rigorous cutoff values are necessary to distinguish pathogenic variants. Furthermore, REVEL, VEST3 and the combination of both methods (i.e. ReVe) showed the best overall performances with all the benchmark data. Finally, we evaluated the performances of these methods with de novo mutations, finding that ReVe consistently showed the best performance. We have summarized the performances of different methods under various conditions, providing tentative guidance for optimal tool selection.