Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.

protein structure prediction | coarse-grained models | inverse problems | Yvon-Born-Green theory

Accurate potentials are essential for quantitative models of protein structure, dynamics, and function.
Although atomistic force fields provide an accurate description of protein structure and fluctuations on nanosecond time scales (1), atomically detailed models remain prohibitively expensive for investigating processes that evolve on microsecond time scales or longer. In contrast, low-resolution coarse-grained (CG) models provide a highly efficient alternative for characterizing protein dynamics on time scales that are inaccessible to atomistic models (2). Indeed, since the seminal work of Levitt and Warshel (3), CG protein models have provided a powerful tool for protein structure prediction (4), for studying protein self-assembly (5), for characterizing folding dynamics (6-8), and for investigating functional fluctuations (9). Consequently, the development of transferable CG potentials that accurately model protein structure would represent a significant advance for many areas of computational protein science.

Following the pioneering work of Tanaka and Scheraga (10), many investigators (11-15) have employed the structural correlations observed within the Protein Data Bank (PDB) (16) to determine effective "knowledge-based" statistical potentials (KBPs). In particular, quasi-chemical (17) and Boltzmann-...