Rapid determination of whether a candidate compound will bind to a particular target receptor remains a stumbling block in drug discovery. We use an approach inspired by random matrix theory to decompose the known ligand set of a target in terms of orthogonal "signals" of salient chemical features, and distinguish these from the much larger set of ligand chemical features that are not relevant for binding to that particular target receptor. After removing the noise caused by finite sampling, we show that the similarity of an unknown ligand to the remaining, cleaned chemical features is a robust predictor of ligand-target affinity, performing as well as or better than any algorithm in the published literature. We interpret our algorithm as deriving a model for the binding energy between a target receptor and the set of known ligands, where the underlying binding energy model is related to the classic Ising model in statistical physics.
drug discovery | random matrix theory | protein-ligand affinity | computational pharmacology | statistical physics
Finding new ligands that bind to a given target is both a crucial step and a major stumbling block in modern drug discovery. Numerous attempts have been made to develop computational algorithms to predict the binding affinity of a ligand to a given receptor, which would allow potential compounds to be screened in silico, reducing costs and saving time. In particular, in response to the wealth of experimental data that exists both within pharmaceutical companies and in freely accessible online databases such as ChEMBL (1), approaches that attempt to "learn" from these data are increasingly gaining attention (2).
An intuitive data-driven approach builds on the hypothesis that chemical commonalities among the known ligand set reveal salient features of the binding site. A corollary is that ligands with similar chemical functionality are expected to share similar binding affinity toward a particular receptor (3, 4).
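The abstract describes the signal/noise split only in outline. As a rough illustration, the Python sketch below shows one common way such a random-matrix cleaning step can be done. Its assumptions are not taken from this paper. Ligands are encoded as binary fingerprint vectors, and each feature bit is z-scored. An eigenvector of the feature correlation matrix is kept as "signal" when its eigenvalue exceeds the Marchenko-Pastur upper edge (1 + sqrt(N/M))^2, where N is the number of feature bits and M the number of known ligands. A candidate is then scored by the squared length of its projection onto the kept eigenvectors. The names `signal_modes` and `affinity_score` are hypothetical, and the demo data are synthetic.

```python
import numpy as np

# Illustrative sketch (assumption): NOT the authors' published algorithm.


def signal_modes(fingerprints):
    """Keep eigenvectors of the feature correlation matrix whose eigenvalues
    exceed the Marchenko-Pastur upper edge; treat the rest as sampling noise."""
    X = np.asarray(fingerprints, dtype=float)      # (M ligands, N feature bits)
    m = X.shape[0]
    mu, sd = X.mean(axis=0), X.std(axis=0)
    keep = sd > 0                                   # constant bits carry no correlation
    Z = (X[:, keep] - mu[keep]) / sd[keep]          # z-score each feature bit
    C = Z.T @ Z / m                                 # empirical correlation matrix
    evals, evecs = np.linalg.eigh(C)                # eigenvalues in ascending order
    q = C.shape[0] / m                              # features per known ligand
    lam_plus = (1.0 + np.sqrt(q)) ** 2              # upper edge of the pure-noise bulk
    is_signal = evals > lam_plus
    return keep, mu[keep], sd[keep], evecs[:, is_signal], evals[is_signal]


def affinity_score(candidate, keep, mu, sd, modes):
    """Hypothetical score: squared projection of the candidate's z-scored
    fingerprint onto the cleaned signal subspace."""
    z = (np.asarray(candidate, dtype=float)[keep] - mu) / sd
    return float(np.sum((modes.T @ z) ** 2))


# Synthetic demo (made-up data): 200 "ligands" with 50 bits. Bits 0-9 share a
# hidden on/off pattern with 10% random flips; bits 10-49 are independent noise.
rng = np.random.default_rng(0)
M, N, B = 200, 50, 10
latent = rng.integers(0, 2, size=(M, 1))
flips = rng.random((M, B)) < 0.1
block = np.where(flips, 1 - latent, latent)
noise = rng.integers(0, 2, size=(M, N - B))
ligands = np.hstack([block, noise])

keep, mu, sd, modes, lams = signal_modes(ligands)

background = rng.integers(0, 2, size=N - B)
coherent = np.concatenate([np.ones(B, dtype=int), background])  # whole block on
mixed = np.concatenate([np.tile([0, 1], B // 2), background])    # half the block on
print(len(lams), affinity_score(coherent, keep, mu, sd, modes),
      affinity_score(mixed, keep, mu, sd, modes))
```

On this made-up data, the planted block should produce one eigenvalue well above the noise edge. A candidate that switches the whole block on should therefore outscore one that switches on only half of it. The squared projection is sign-blind, so a candidate with the block entirely off would score about as high. A practical binding score would likely need more structure than this sketch provides.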
This similarity principle suggests that the known ligand set of a given target can be used to learn criteria that predict whether a novel ligand will bind to the target. Unlike more atomistic methods such as docking or molecular dynamics, this ligand-based approach is a powerful paradigm that does not require structural information about the receptor, which can be arduous to obtain.
Any ligand-based method requires a way to quantify the chemical functionalities of a ligand, and various chemical descriptors have been proposed. Examples include a vector of measured or predicted physical properties (5-8), a vector enumerating the presence or absence of known functional groups on the ligand (9, 10), a vectorial representation of connectivities in the molecular graph, also known as a molecular fingerprint (11, 12), and simply the 3D shape of the ligand (13-16). Existing approaches then take the descriptor associated with each ligand and compare ligands with each other, for example through the Tanimoto coefficient (17, 18).
Nonetheless, regardless of how ligand chemical func...