Representational biases that are common in biological data can inflate prediction performance and confound our understanding of how and what machine learning (ML) models learn from large, complicated datasets. However, auditing for these biases is not a common practice in ML in the life sciences. Here, we devise a systematic auditing framework and harness it to audit three different ML applications of significant therapeutic interest: prediction frameworks for protein-protein interactions, drug-target bioactivity, and MHC-peptide binding. Through this, we identify unrecognized biases that hinder the ML process and result in low model generalizability. Ultimately, we show that, when there is insufficient signal in the training data, ML models are likely to learn primarily from representational biases.

Datasets in the life sciences have grown increasingly large and complicated. With the advent of single-cell studies and biobanks, scientists are turning to machine learning (ML) to derive meaningful interpretations of these massive genomic, transcriptomic, proteomic, phenotypic, and clinical datasets. One major obstacle to the development of reliable and generalizable ML models is that auditing for biases has not been established as a common practice in life sciences ML, despite a large body of work in non-biological ML that addresses the identification and removal of algorithmic biases (Zou and Schiebinger 2018). Yet it is well known that biological datasets often suffer from representational biases stemming from evolutionary, inherent, and experimental artifacts. When these biases are not identified and eliminated, the ML process can be misled such that the model learns predominantly from biases unique to the training dataset and, hence, does not generalize across datasets. In this scenario, prediction performance is inflated on the test set but drops drastically for external predictions.

Here, we demonstrate that several prominent protein-protein interaction (PPI) predictors are overwhelmingly reliant on a bias in their training dataset, to the extent that the PPI predictions become randomized (i.e., the model has no predictive power) once the bias is removed. For this reason, when applying ML to biological datasets, it is crucial to systematically audit for biases inherent in the data. Doing so helps us understand how and what the model is learning, ensuring that its predictions are based on true biological insights from the data.

We devised a systematic auditing framework for paired-input biological ML applications (Fig. 1a), which are widely harnessed to predict the biological relationships between two entities, e.g., physical interactions between proteins, bioactivity of drugs and their targets, or binding of MHC molecules to their antigens. We used this framework to identify biases that have confounded the ML process over the past two decades in three applications of great interest to the life sciences and biotech communities: protein-protein interactions, drug-target bioactivity, and MHC-peptide binding.
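To make the audit concrete, below is a minimal sketch of one such check for paired-input data: score each pair using only how often its two entities appear among the positive training pairs, with no biological features at all. The dataset here is synthetic, with a planted "hub" bias standing in for the node-degree skew common in curated PPI corpora, and all names and values are illustrative rather than taken from the framework described in this paper. The point of the pattern: if a featureless frequency baseline scores well above chance, a real model trained on the same pairs can silently exploit the same representational bias.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic paired-input dataset with a planted representational bias:
# a few "hub" entities appear mostly in positive pairs, loosely mimicking
# the node-degree skew of curated PPI datasets. Values are arbitrary.
n_entities, n_pairs = 200, 5000
hubs = set(range(20))
pairs, labels = [], []
for _ in range(n_pairs):
    a, b = rng.integers(n_entities, size=2)
    biased = (a in hubs) or (b in hubs)
    labels.append(int(rng.random() < (0.9 if biased else 0.1)))
    pairs.append((int(a), int(b)))
labels = np.array(labels)

train_idx, test_idx = train_test_split(
    np.arange(n_pairs), test_size=0.3, random_state=0)

# "Null" features: how often each entity occurs in *positive* training
# pairs. No sequences, no structures, no biology at all.
pos_counts = Counter()
for i in train_idx:
    if labels[i]:
        pos_counts[pairs[i][0]] += 1
        pos_counts[pairs[i][1]] += 1

def freq_features(idx):
    return np.array([[pos_counts[pairs[i][0]], pos_counts[pairs[i][1]]]
                     for i in idx])

clf = LogisticRegression(max_iter=1000)
clf.fit(freq_features(train_idx), labels[train_idx])
auc = roc_auc_score(labels[test_idx],
                    clf.predict_proba(freq_features(test_idx))[:, 1])
# An AUC well above 0.5 from this featureless baseline is a red flag
# that the pair labels are predictable from representation alone.
print(f"frequency-only baseline AUC: {auc:.2f}")
```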
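The flip side of the audit is a bias control. One simple option, sketched below under the same synthetic setup, is an entity-disjoint split (akin to the "both proteins unseen" evaluation class used in the PPI literature): whole entities, not pairs, are held out, so a model cannot score test pairs from memorized entity frequencies. The function name and defaults are illustrative, not part of the authors' framework.

```python
import numpy as np

def entity_disjoint_split(pairs, test_frac=0.3, seed=0):
    """Hold out whole entities rather than pairs: a pair is kept for
    testing only if BOTH of its entities are held out, and for training
    only if neither is. Mixed pairs are discarded."""
    rng = np.random.default_rng(seed)
    entities = rng.permutation(sorted({e for p in pairs for e in p}))
    held_out = set(entities[: int(test_frac * len(entities))])
    train_idx = [i for i, (a, b) in enumerate(pairs)
                 if a not in held_out and b not in held_out]
    test_idx = [i for i, (a, b) in enumerate(pairs)
                if a in held_out and b in held_out]
    return np.array(train_idx), np.array(test_idx)
```

Substituting this splitter for the random pair split in the sketch above drives the frequency-only baseline back to chance, since every test entity has a positive-pair count of zero, mirroring the behavior described here for prominent PPI predictors once the training-set bias is removed.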