Phosphatase and tensin homolog (PTEN) is a tumor suppressor frequently mutated in diverse cancers. Germline PTEN mutations are also associated with a range of clinical outcomes, including PTEN hamartoma tumor syndrome (PHTS) and autism spectrum disorder (ASD). To enable new insights into PTEN function and clinically relevant genotype-phenotype relationships, we systematically evaluated the effect of PTEN mutations on lipid phosphatase activity in vivo. Using a massively parallel approach that leverages an artificial humanized yeast model, we derived high-confidence estimates of functional impact for 7,244 single amino acid PTEN variants (86% of possible). These data uncovered novel insights into PTEN protein structure, biochemistry, and mutation tolerance. Variant functional scores can reliably discriminate likely pathogenic from benign alleles. Further, 32% of ClinVar unclassified missense variants are phosphatase-deficient in our assay, supporting their reclassification. ASD-associated mutations generally had less severe fitness scores relative to PHTS-associated mutations (p = 7.16 × 10⁻⁵) and a higher fraction of hypomorphic mutations, arguing for continued genotype-phenotype studies in larger clinical datasets that can further leverage these rich functional data.
MAIN TEXT
Recent large-scale exome sequencing studies have highlighted the abundance of protein-coding variation in the human population 1. It remains challenging to predict variant pathogenicity and clinical outcomes, especially for genes with pleiotropic effects. Because most rare variants are private to a single family or individual, traditional approaches to establishing pathogenicity, such as variant segregation within a pedigree or identification in independent patients, are infeasible. Even for well-studied genes, hundreds of variants are currently defined as variants of uncertain significance (VUS). Moreover, purely computational approaches still suffer from high false-positive rates 2 and subjective interpretations that limit the clinical utility of these predictions. To address these challenges for genes of clinical importance, one proposed approach is to prospectively measure the functional effects of all possible mutations, allowing these empirical data to be integrated into the clinical assessment of novel rare variants 3,4. Historically, these types of functional assays have been conducted serially, which limits scalability, and often only within a portion of the protein of interest. While there are some notable examples of whole-gene brute-force saturation mutagenesis, e.g., TP53 5, new, more scalable experimental paradigms are being developed that allow the functional dissection of the effects of thousands of genetic mutations in parallel 6. These approaches leverage recent advances in DNA synthesis and sequencing technologies and have proven particularly valuable in understanding the effects of mutations in cancer-associated genes 7,8. With these issues in mind, we have developed a saturation mutagenesis approach to...