A cDNA clone for a physiologically regulated Tetrahymena cysteine protease gene was sequenced. The nucleotide sequence predicts that the clone encodes a 336-amino acid protein composed of a 19-residue N-terminal signal sequence followed by a 107-residue propeptide and a 210-residue mature protein. Comparison of the deduced amino acid sequence of the protein with those of other cysteine proteases revealed a highly conserved interspersed amino acid motif in the propeptide region of the protein, the ERFNIN motif. The motif was present in all of the cysteine proteases in the data base with the exception of the cathepsin B-like proteins, which have shorter propeptides. Differences in the propeptides and in conserved amino acids of the mature proteins suggest that the ERFNIN proteases and the cathepsin B-like proteases constitute two distinct subfamilies within the cysteine proteases.
The cysteine proteases are a family of enzymes that play an important role in intracellular protein degradation. These proteases and their cDNA clones have been isolated from phylogenetically diverse organisms ranging from slime mold to mammals. The tertiary structures of two plant cysteine proteases, papain and actinidin, have been solved (1, 2). The enzymes have two protein domains that come together to form the active site. Amino acid sequence homologies suggest that this double-domain structure is conserved in the animal thiol proteases cathepsins B, H, and L (3).
The phylogenetic range of organisms for which the sequences of cysteine protease genes are known was extended by determination of the sequence of a cDNA clone for a gene from a ciliated protozoan, Tetrahymena thermophila. Comparison of the deduced amino acid sequence to those of known cysteine proteases revealed the presence of an amino acid motif in the propeptide region consisting of highly conserved amino acids interspersed with variable residues.
The motif was present in 15 of 20 cysteine proteases in the EMBL/GenBank data base (August 1992). The five proteases that lacked the motif were all cathepsin B-like enzymes. Recognition of the differences in the propeptide region prompted comparison of the mature proteins. Alignment of the amino acid sequences of the proteases as two separate groups allowed identification of amino acids that are highly conserved among the proteases with the propeptide motif or among the cathepsin B-like proteases but strikingly different between the two groups. We suggest that the proteins with the interspersed motif and the cathepsin B-like proteases represent two distinct classes of cysteine proteases that can be distinguished by both propeptide and mature protein structure.
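As an illustration of what an interspersed motif means in practice, the following minimal Python sketch scans a propeptide sequence for conserved residues separated by fixed runs of variable positions. The spacing used (E-x3-R-x3-F-x2-N-x3-I-x3-N) is an assumption taken from the way the ERFNIN motif is commonly written in later literature; it is not stated in the text above. The function names and the toy sequence are hypothetical, and the sketch is not the search procedure used in this study.

```python
import re

# ASSUMPTION: spacing of the conserved residues (E, R, F, N, I, N) is
# E-x3-R-x3-F-x2-N-x3-I-x3-N; "." stands for any variable residue.
ERFNIN = re.compile(r"E.{3}R.{3}F.{2}N.{3}I.{3}N")


def find_interspersed_motif(seq, pattern=ERFNIN):
    """Return (start_index, matched_segment) for the first match, or None."""
    match = pattern.search(seq.upper())
    if match is None:
        return None
    return match.start(), match.group()


def classify_propeptide(seq):
    """Label a propeptide by presence or absence of the assumed motif."""
    return "motif present" if find_interspersed_motif(seq) else "motif absent"


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protease propeptide.
    toy = "MKEAAARAAAFAANAAAIAAANGT"
    print(find_interspersed_motif(toy))
    print(classify_propeptide("MKAAAA"))
```

A strict pattern of this kind would miss conservative substitutions at the conserved positions, so a real comparison would more likely rely on a sequence alignment than on a fixed regular expression.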
MATERIALS AND METHODS
Tetrahymena clone pCyP (formerly BC11) is a cDNA clone of an RNA that is expressed in starved, but not growing, cells (4, 5). The clone was isolated from a cDNA library of RNA from starved cells cloned into the Pst I site of pUC9 (4). DNA fragments were subcloned into pBluescript for sequencing. The sequence was scanned for open reading frames by usi...