Variant scoring methods (VSMs) aid in the interpretation of coding mutations and their potential impact on health, but their evaluation in the context of human genetics applications remains inconsistent. Here, we describe GeneticsGym, a systematic approach to evaluating the real-world impact of VSMs on human genetic analysis across selection regimes. We show that the relative performance of VSMs varies across the spectrum of variant impact and by gene function, and that both variant-to-gene and gene-to-disease components contribute to overall performance.