The Nudix homology clan encompasses over 80,000 protein domains from all three
domains of life, grouped by their homology to one another. Proteins with a domain from this clan
fall into four general functional classes: pyrophosphohydrolases, isopentenyl diphosphate
isomerases (IDIs), adenine/guanine mismatch-specific adenine glycosylases (A/G-specific
adenine glycosylases), and proteins with non-enzymatic activities such as protein-protein interaction
and transcriptional regulation. The largest group, pyrophosphohydrolases, encompasses more
than 100 distinct hydrolase specificities. To understand the evolution of this vast number
of activities, we assembled and analyzed experimental and structural data for 205 Nudix
proteins collected from the literature. For this family, we corrected erroneous functions or
provided more appropriate descriptions for 53 annotations in the Gene Ontology Annotation
database, and we propose 275 new experimentally based annotations. We manually
constructed a structure-guided sequence alignment of 78 Nudix proteins. Using the
structural alignment as a seed, we then made an alignment of 347 “select”
Nudix homology domains, curated from structurally determined, functionally characterized,
or phylogenetically important Nudix domains. Based on our review of Nudix
pyrophosphohydrolase structures and specificities, we further analyzed a loop region
downstream of the Nudix hydrolase motif that was previously shown to contact the substrate
molecule and to contain known functional motifs. This loop region provides a potential structural
basis for the functional radiation and evolution of substrate specificity within the
hydrolase family. Finally, phylogenetic analyses of the 347 select protein domains and of
the complete Nudix homology clan revealed general monophyly with regard to function and a
few instances of probable homoplasy.
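
As an illustration of how a region downstream of the Nudix hydrolase motif might be located in a sequence, the following minimal sketch searches for the widely cited Nudix box consensus GX5EX7REUXEEXGU (X is any residue, U a bulky hydrophobic residue) and returns a fixed-length window that follows it. The function names, the restriction of U to I, L, or V, and the 20-residue window are assumptions made for illustration only; they are not the boundaries or the method used in this study, and many characterized Nudix proteins deviate from the strict consensus.

```python
import re

# Nudix box consensus GX5EX7REUXEEXGU.
# Assumption: U (bulky hydrophobic) is approximated as I, L, or V.
NUDIX_BOX = re.compile(r"G.{5}E.{7}RE[ILV].EE.G[ILV]")


def find_nudix_box(seq):
    """Return (start, end) of the first strict-consensus match, or None."""
    match = NUDIX_BOX.search(seq.upper())
    return (match.start(), match.end()) if match else None


def downstream_window(seq, length=20):
    """Return up to `length` residues immediately after the motif.

    The default window length is a hypothetical choice, not a value
    taken from the study. Returns "" when no motif is found.
    """
    span = find_nudix_box(seq)
    if span is None:
        return ""
    end = span[1]
    return seq[end:end + length]
```

A strict regular expression like this one will miss divergent motifs, so in practice a profile-based search over the curated alignment would likely be more sensitive.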