The primary structure of protein S1, the largest protein component of the Escherichia coli ribosome, has been elucidated by determining the amino acid sequence of the protein (from E. coli MRE600) and the nucleotide sequence of the S1 gene (rpsA, of a K-12 strain). The two methods gave results in perfect agreement except at two positions where possible strain-specific differences were found. Protein S1 (MRE600) is composed of 557 amino acid residues (no modified amino acids were detected) and has Mr 61,159. The DNA sequence for protein S1 (K-12) suggests 556 amino acid residues. A computer survey of the sequence revealed three regions in S1 with a high degree of internal homology. The ribosome binding domain of S1 (NH2 terminus) does not show any preponderance of basic amino acids. The two cysteine residues and the majority of the tryptophan residues of S1, as well as two of the three homologous regions, are located in its middle region, which contains the nucleic acid binding domain. The pattern of degenerate codon usage in the S1 gene is nonrandom and similar to that reported for other ribosomal protein genes.

Protein S1 of the Escherichia coli ribosome is the largest protein component of this organelle (1). There were earlier questions about its stoichiometry in ribosomes, but it is now established that S1 is present in one copy per ribosomal particle (2, 3). In addition to its occurrence in ribosomes, protein S1 is also a component of the multimeric enzyme Qβ replicase (4) found in E. coli cells infected with bacteriophage Qβ.

Studies from several laboratories (e.g., refs. 5 and 6) have established an essential role for S1 in the initiation of protein synthesis by E. coli ribosomes, but the exact nature of this role has not been elucidated. Protein S1 is a very elongated protein of about the same length as the ribosome (~250 Å) according to various physical measurements (7-10).
It is a strong RNA binding protein and is capable of unwinding double-stranded regions in RNA structure (11-14). According to recent studies (15-17), protein S1 is organized into two distinct functional domains: one for binding to the ribosome and the other for binding to RNA.

The structural gene for S1 (rpsA) has been mapped at 20 min on the E. coli K-12 chromosome (18) and a transducing λ phage carrying rpsA (λSerC) has been isolated (19). It therefore is feasible to determine the primary structure of protein S1 by both protein and DNA sequence determination procedures. The two methods, when combined, shorten the time required for establishing the primary structure of a protein of the size of S1. Knowledge of the primary structure of S1 would help to understand the various functions associated with this protein.

In this paper we report the primary structure of S1 as determined by both protein and DNA sequences and some essential features of this structure.

MATERIALS AND METHODS

Protein Sequence. Protein S1 was isolated from E. coli MRE600. The protein was cleaved with CNBr to yield six fragments which were isolated in pure form (...