Self-assembling proteins and protein fragments encoded by the Escherichia coli genome were identified in E. coli K-12 strain MG1655. Libraries of random DNA fragments cloned into a series of repressor fusion vectors were subjected to selection for immunity to infection by phage λ. Survivors were identified by sequencing the ends of the inserts, and the sequence of each fused protein was inferred from the known genomic sequence. Four hundred sixty-three nonredundant open reading frame-encoded interacting sequence tags (ISTs) were recovered by sequencing 2,089 candidates. These ISTs, which range from 16 to 794 amino acids in length, were clustered into families of overlapping fragments, identifying potential homotypic interactions encoded by 232 E. coli genes. Repressor fusions identified ISTs from genes in every protein-based functional category, but membrane proteins were underrepresented. The IST-containing genes were enriched for regulatory proteins and for proteins that form higher-order oligomers. Forty-eight (20.7%) of the homotypic proteins identified by ISTs are predicted to contain coiled coils. Although most of the IST-containing genes are identifiably related to proteins in other bacterial genomes, more than half of the ISTs have no identifiable homologs in the Protein Data Bank, suggesting that they may include many novel structures. The data are available online at http://oligomers.tamu.edu/.
For many proteins, quaternary structure is intimately coupled to function and stability. This coupling allows many cellular processes to be regulated through the specific assembly or disassembly of protein complexes, as well as through conformational changes that alter how subunits contact one another. Proteins use a wide variety of quaternary structures to assemble multisubunit complexes.
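The abstract states that the ISTs were clustered into families of overlapping fragments. As a rough illustration of what such clustering involves, the Python sketch below groups fragments gene by gene, assuming that two fragments belong to the same family when they share at least one residue, either directly or through a chain of overlapping fragments. The overlap criterion, the `IST` record and its fields, the function name `cluster_ists`, and the example genes are all hypothetical; this is not presented as the authors' actual procedure.

```python
from collections import defaultdict
from typing import List, NamedTuple


class IST(NamedTuple):
    """Hypothetical record for an interacting sequence tag within one gene."""
    gene: str   # gene the fragment comes from
    start: int  # first residue of the fragment (1-based, inclusive)
    end: int    # last residue of the fragment (inclusive)


def cluster_ists(ists: List[IST]) -> List[List[IST]]:
    """Group ISTs into families of overlapping fragments, gene by gene.

    Assumption: fragments are in the same family if they share at least one
    residue, directly or through a chain of overlapping fragments.
    """
    by_gene = defaultdict(list)
    for ist in ists:
        by_gene[ist.gene].append(ist)

    families: List[List[IST]] = []
    for gene in sorted(by_gene):
        # Sorting by start position makes every family a contiguous run.
        fragments = sorted(by_gene[gene], key=lambda f: (f.start, f.end))
        current = [fragments[0]]
        current_end = fragments[0].end
        for frag in fragments[1:]:
            if frag.start <= current_end:
                # Shares at least one residue with a fragment already in the family.
                current.append(frag)
                current_end = max(current_end, frag.end)
            else:
                # No shared residue: close the family and start a new one.
                families.append(current)
                current = [frag]
                current_end = frag.end
        families.append(current)
    return families


if __name__ == "__main__":
    # Hypothetical fragments from two made-up genes.
    example = [
        IST("geneA", 10, 120),
        IST("geneA", 100, 250),
        IST("geneA", 400, 480),
        IST("geneB", 1, 60),
    ]
    for family in cluster_ists(example):
        print(family[0].gene, [(f.start, f.end) for f in family])
```

With these made-up inputs, the two overlapping geneA fragments form one family, while the non-overlapping geneA fragment and the geneB fragment each form their own. Under this criterion a single gene can contribute more than one family, so the number of families need not equal the number of genes.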
Genome-wide identification of protein interactions by genetic (21,36,46,57,58) or biochemical (15,19,41) screens has provided a wealth of insight into the diversity of structures used for self-assembly. In the annotation of predicted open reading frames (ORFs), assembly interactions are an important feature that provides insight into structure and function. In addition, the involvement of a gene product in a multimeric complex suggests strategies for generating assembly-based inhibitors for functional studies (18). The possibility that protein interactions represent a large and largely underexploited target for drug discovery has also been discussed (6,55).
The study of the protein interactome has focused on heterotypic interactions, as these can link proteins of unknown function to proteins of known function. However, homotypic interactions, which are found both in homomultimeric proteins and in subcomplexes of heteromultimeric proteins, may be the most common way to form protein complexes in nature (31). Although by definition self-interaction does not link a protein's function to that of another protein, homotypic interactions are important in the study of protein structure, function, and evol...