The amino acid sequence of β-galactosidase (β-D-galactoside galactohydrolase, EC 3.2.1.23) has been compared to itself and to other proteins. Two segments, each of about 380 amino acids and together comprising the first three-fourths of the polypeptide chain, were found to be very similar to each other. It is concluded that they are homologous. The carboxyl-terminal fourth has a high percentage of amino acid identities with dihydrofolate reductase of Escherichia coli, suggesting that these sequences are also homologous. A model for the origin of β-galactosidase is presented. The overall similarity of β-galactosidase to lac repressor does not appear to be significant.

There exists a considerable body of literature concerned with the evolution of proteins (1-3). Amino acid sequences within many individual proteins have been examined for internal similarity to gain insight into the probable mechanisms and rates of evolutionary processes. Proteins that carry out a specific function have also been isolated from a wide variety of species and compared with each other. On the other hand, little information is available regarding sequence homology among proteins that differ in function but belong to a single metabolic system or pathway. A bacterial operon is of interest from this point of view.

The amino acid sequence of β-galactosidase (β-D-galactoside galactohydrolase, EC 3.2.1.23) of Escherichia coli has recently been determined (4). Each of its four identical polypeptide chains contains 1021 amino acid residues. Examination for sequence homology might yield clues to the origin of this large protein. It is specified by the lacZ gene, the first of the three structural genes of the lac operon of E. coli (5). Transcription of the structural genes is controlled in part by lac repressor, a protein whose primary structure has also been determined (6). Comparison of the amino acid sequences of β-galactosidase and lac repressor might therefore yield information on the evolution of proteins within an operon.
This report presents the results of such examinations.
METHODS

Comparisons of the sequences of the proteins were done by computer. The program of Gibbs and McIntyre (7) was used to find the number of duplications of each given length and the expected number of such duplications.

A second program was developed that examined all possible
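The counting of duplications by length, and of the number expected by chance, can be illustrated with a diagonal ("dot-matrix") scan. This is a minimal sketch, not the Gibbs and McIntyre program itself. It assumes that a "duplication of length k" is a maximal run of k consecutive identical residues along one diagonal of the sequence-versus-sequence comparison matrix, and that the expected number comes from a null model in which residues occur independently with each sequence's observed composition. The function names `diagonal_runs`, `match_probability`, and `expected_runs` are hypothetical.

```python
from collections import Counter


def diagonal_runs(a: str, b: str, self_compare: bool = False) -> Counter:
    """Count maximal runs of consecutive identical residues along every
    diagonal of the a-versus-b comparison matrix.

    Returns a Counter mapping run length -> number of runs of exactly that length.
    Assumption: a "duplication" is such a run (hypothetical definition).
    """
    runs = Counter()
    # Offset d pairs residue a[i] with residue b[i + d].
    for d in range(-(len(a) - 1), len(b)):
        if self_compare and d <= 0:
            continue  # skip the main diagonal and the mirror-image half of a self-comparison
        run = 0
        for i in range(max(0, -d), min(len(a), len(b) - d)):
            if a[i] == b[i + d]:
                run += 1
            else:
                if run:
                    runs[run] += 1
                run = 0
        if run:  # close a run that reaches the end of the diagonal
            runs[run] += 1
    return runs


def match_probability(a: str, b: str) -> float:
    """Chance that a random residue of a equals a random residue of b,
    computed from the amino acid composition of each sequence."""
    fa, fb = Counter(a), Counter(b)
    return sum(fa[x] * fb[x] for x in fa) / (len(a) * len(b))


def expected_runs(a: str, b: str, k: int, self_compare: bool = False) -> float:
    """Expected number of maximal runs of exactly length k, assuming each
    position on a diagonal matches independently with probability p
    (an approximation, especially for a self-comparison)."""
    p = match_probability(a, b)
    total = 0.0
    for d in range(-(len(a) - 1), len(b)):
        if self_compare and d <= 0:
            continue
        n = min(len(a), len(b) - d) - max(0, -d)  # length of this diagonal
        if k > n:
            continue
        if k == n:
            total += p ** n  # the whole diagonal is one run
        else:
            # Two boundary starts need one flanking mismatch; the n - k - 1
            # interior starts need a mismatch on both sides.
            total += p ** k * (2 * (1 - p) + (n - k - 1) * (1 - p) ** 2)
    return total


if __name__ == "__main__":
    seq = "MKVLAAGIVGMKALGAGVVA"  # toy sequence, not β-galactosidase
    observed = diagonal_runs(seq, seq, self_compare=True)
    for k in range(1, max(observed, default=0) + 1):
        print(k, observed[k], round(expected_runs(seq, seq, k, self_compare=True), 3))
```

Comparing the observed and expected counts for each length shows whether long identical runs are more frequent than composition alone would predict. The independence model is a deliberate simplification chosen for clarity; a real analysis could use a different null model.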