Purpose: Pathogenicity predictors are an integral part of genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically relevant dataset has not been undertaken.

Methods: We derive two validation datasets: an "open" dataset containing variants extracted from publicly available databases, similar to those commonly applied in previous benchmarking exercises, and a "clinically-representative" dataset containing variants identified through research/diagnostic

[...]

consistency to variant classification and have been followed by a number of regional and disorder-specific publications [2-4]. Common to all guidelines is the recommendation that in silico prediction tools be used to aid the classification of missense variants. In silico prediction tools are algorithms designed to predict the functional impact of variation, usually missense changes caused by single nucleotide variants (SNVs). Though originally designed for the prioritisation of research variants [5], the tools are used routinely in clinical diagnostics during variant classification. The tools integrate a number of features in order to assess the impact of a variant on protein function [6]. Initially, inter-species conservation formed the bulk of the predictions, supplemented by some additional functional information, such as substitution matrices of the physicochemical distances between amino acids (such as Grantham [7] or PAM [8]) and data derived from the limited number of available X-ray crystallographic structures [9]. Since the development of the first in silico prediction tools over a decade ago [5,9], large-scale experiments such as the ENCODE project [10] have generated huge amounts of functional data, and we now also have access to large-scale databases of clinical and neutral variation [11-13]. These additional sources of data have led to an explosion of new in silico prediction algorithms [14-16] that purport to increase accuracy.

However, the large increase in the number of predictors integrated into classification algorithms has raised concerns about overfitting [17,18]. Overfitting occurs when a prediction algorithm is trained on superfluous data or features that are irrelevant to the prediction outcome [18]. While it may appear that an increasingly large feature list leads to improvements in prediction, random variability within the training dataset may actually result in decreased accuracy when the model is applied to a novel dataset. Overfitting can be mitigated through the use of increasingly large training datasets, and online variant databases, such as the Genome Aggregation Database (gnomAD) [19] and ClinVar [12], now provide sufficiently large training sets. Additionally, reliance on further information, such as protein functional data and allele frequency data from gnomAD [19], may be contrary to the standard assumptions of variant classification methodology, namely that each dataset is independent.
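The overfitting effect described above is straightforward to demonstrate. The sketch below is a hypothetical illustration on synthetic data, assuming numpy and scikit-learn are available; it does not use any of the predictors discussed in this paper. A random forest is trained on one genuinely informative feature plus an increasingly large block of irrelevant noise features, and its accuracy is measured on both the training set and a held-out set.

```python
# A minimal sketch of overfitting from superfluous features, assuming
# numpy and scikit-learn are available. All data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_variants = 400

# One genuinely informative feature (standing in for, e.g., a conservation
# score) that determines the binary label up to some noise.
informative = rng.normal(size=(n_variants, 1))
labels = (informative[:, 0] + rng.normal(scale=0.5, size=n_variants) > 0).astype(int)

for n_noise in (0, 50, 500):
    # Append a block of superfluous features unrelated to the label.
    noise = rng.normal(size=(n_variants, n_noise))
    X = np.hstack([informative, noise])
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.5, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    print(f"{n_noise:4d} noise features | "
          f"train accuracy {model.score(X_train, y_train):.2f} | "
          f"held-out accuracy {model.score(X_test, y_test):.2f}")
```

One would expect training accuracy to remain near-perfect throughout while held-out accuracy degrades as the noise block grows, which is precisely the behaviour the overfitting concern anticipates for predictors trained on large, partly irrelevant feature sets.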