In the theoretical modeling of a physical system, a crucial step is
the identification of those degrees of freedom that enable a synthetic
yet informative representation of it. While in some cases this selection
can be carried out on the basis of intuition and experience, straightforward
discrimination of the important features from the negligible ones
is difficult for many complex systems, most notably heteropolymers
and large biomolecules. Here we present a thermodynamics-based theoretical
framework to gauge the effectiveness of a given simplified representation
by measuring its information content. We employ this method to identify
those reduced descriptions of proteins, in terms of a subset of their
atoms, that retain the largest amount of information from the original
model; we show that these highly informative representations share
common features that are intrinsically related to the biological properties
of the proteins under examination, thereby establishing a bridge between
protein structure, energetics, and function.