To help characterize the diversity in biological function of proteins emerging from the analysis of whole genomes, we present an operational definition of biological function that provides an explicit link between the functional classification of proteins and the effects of genetic variation or mutation on protein function. Using phylogenetic information, we establish definite criteria for functional relatedness among proteins and a companion procedure for predicting deleterious alleles or mutations. Applied to the functional classification of sequences similar to 13 human tumor suppressor proteins, our methods predict that three of them, BRCA1, BRCA2, and WT1, have functional properties unique to mammals. We examine protein variants caused by nonsynonymous single-nucleotide polymorphisms in a set of clinically important genes and estimate the magnitude of a disproportionate propensity for disruption of function among the nonsynonymous single-nucleotide polymorphisms that are maintained at low frequency in the human population.

Although the idea that structural similarity between proteins can be anticipated from their sequences alone is well established, the notion that a signature of functional similarity exists in the comparison of sequences is much less well developed. In fact, the very definition of functional similarity is more elusive than that of structural similarity, which can be quantified (1-3), and it pertains to relatively subtle aspects of proteins and their sequences. In proteins inferred to share a remote common ancestor, amino acids determined to be homologous from accurately aligned sequences may not share strictly analogous roles in function and stability, even though their relationship to an overall structural fold may be the same.
This observation suggests an operational criterion for what it means for a set of proteins to be functionally similar: corresponding amino acids at each residue position in functionally related proteins should serve analogous roles and should likely be interchangeable. From this perspective, the separate problems of functional classification of proteins and the prediction of functional consequences of amino acid substitutions are very closely related.

This study demonstrates how information in a multiple sequence alignment can provide an explicit link between protein functional classification and the tolerance of protein function to amino acid substitutions. In our analysis, we note that most multiple sequence alignments of a query and its homologues will contain too few sequences for the observed profile of amino acids at each residue position to reflect thorough sampling of all 20 amino acids by evolution. To overcome this paucity of empirical amino acid sampling, a key element of our analysis is the use of preexisting mixtures of Dirichlet prior distributions of amino acid frequencies (4) to infer which additional amino acids might be functionally consistent with the observed profiles. Using the Bayesian formalism associated with these distributions, we present a fram...