Escherichia coli is a clonal species. The best-understood components of its clonal variation are the flagellar (H) and polysaccharide (O) antigens, both well documented since the mid-1930s because of their use in serotyping. Flagellin is the protein subunit of the flagellum that carries H-antigen specificity. We show that 43 of the 53 H-antigen specificities of E. coli map to the flagellin gene at fliC; we sequenced all 43 forms and confirmed the specificity of each by cloning and expression. This is, to our knowledge, the first time that all known forms of such a highly polymorphic gene have been fully sequenced and characterized for any species. The established distinction between a highly variable central region and more conserved flanking regions is upheld. The sequences fall into two groups, one of which may be derived from the fliC gene of the E. coli/Salmonella enterica common ancestor, the other perhaps obtained by lateral transfer since species divergence. Comparison of sequences revealed that both horizontal DNA transfer and fixation of mutations under diversifying selection pressure contributed to polymorphism in this locus.

The O polysaccharide and flagellin are the two major antigens of gram-negative bacteria, also known respectively as the O and H antigens. Both are highly polymorphic, and Escherichia coli, if one includes the Shigella strains, has 187 O and 53 H forms defined by serology (4,6,15,21). In this study, we show that 43 of the 53 H forms map to the fliC locus, and we have sequenced all 43 alleles. In some strains the H-antigen phenotype maps to alternative loci, so we cloned, sequenced, and expressed the fliC gene from type strains to relate H-antigen specificity definitively to sequence. These data supplement the genome sequence data of E.
coli K-12 and O157:H7 to give more comprehensive genetic information on the species and, in conjunction with the recently published structure (31) of one flagellin form, will allow analysis of the structural basis of the antigenic variation and development of a molecular typing scheme for the H antigen.

The bacterial flagellum projects well beyond the surface of the cell and is rotated to provide motive power. The flagellar filament is composed of a single protein, flagellin. The flagellin proteins of E. coli and several other species are conserved in their terminal regions, while the central region is variable and carries H-serotype-specific epitopes (9,17,22,39,40). The structure of the Salmonella enterica LT2 flagellum is known from electron microscopy, X-ray fiber diffraction, and X-ray crystallography. Three domains are recognized (Fig. 1). The conserved terminal segments form the D1 domain located in the center of the flagellum, while the central region of the protein forms two domains (D2 and D3) exposed on the surface (31). The boundaries between D1 and D2 correspond quite well to the boundaries between the central and terminal regions of the protein as determined by alignment of sequences of different forms. However, because we are dealing mostly with...