Abbreviations: CBD, chitin-binding domain; BI, branched intermediate; BIL, Bacterial Intein-Like; HINT, Hedgehog/INTein; Hh-C, the C-terminal domain of the Hedgehog protein or hog protein; IMAC, immobilized metal affinity chromatography; IPTG, isopropyl-β-D-thiogalactoside; MchDnaB1 intein, DnaB1 intein from Mycobacterium chimaera; PDB, Protein Data Bank; r.m.s.d., root-mean-square deviation; PEG, polyethylene glycol; PMSF, phenylmethane sulfonyl fluoride; DTT, dithiothreitol
Abstract
The widely used molecular evolutionary clock assumes the divergent evolution of proteins. Convergent evolution has been proposed only for small protein elements but not for an entire protein fold. We investigated the structural basis of the protein splicing mechanism of class 3 inteins, which is distinct from that of class 1 and 2 inteins. We gathered structural and mechanistic evidence supporting the notion that the Hedgehog/INTein (HINT) superfamily fold, commonly found in protein splicing and related phenomena, could be an example of convergent evolution of an entire protein fold. We propose that the HINT fold is a structural and biochemical solution for trans-peptidyl and trans-esterification reactions.
Introduction
Proteins fold into various defined three-dimensional structures to exert their unique biochemical functions. Proteins with similar structures and functions across different organisms share common ancestors and have evolved through divergent evolution 1. However, proteins that evolved from different ancestors could also converge on a similar structure and function analogously. This convergent evolution is best exemplified by the catalytic Ser-His-Asp triad commonly found in hydrolases, suggesting the importance of structural and functional constraints required for catalysis 2,3,4. Even though convergent evolution is a commonly observed phenomenon across the diversity of living organisms, the convergent evolution of protein structures has been documented for only small structural elements of proteins 5. Structural convergence of an entire protein fold has not been reported 6.

Protein splicing catalyzed by intervening protein sequences termed inteins was discovered in the 1990s. The splicing reaction involves the self-removal of the intein and concomitant joining of the two flanking sequences (exteins) 7,8. Protein splicing is analogous to RNA splicing but occurs at the protein level. The biological function of protein splicing is still enigmatic despite several proposals for eventual regulatory functions 9. Inteins are often considered merely as selfish gene elements because they can be removed without affecting the fitness of their host organisms. Inteins commonly insert in conserved sequences close to the active sites of essential proteins. Any mutation within an intein that is detrimental to protein splicing could be lethal or strongly affect the fitness of the host, which is the mechanism ensuring intein persistence and protection from degeneration.

The most common protein splicing mechanism has been generall...