Enterotoxigenic Escherichia coli (ETEC) strains produce a type IV pilus named Longus. We identified a 16-gene cluster involved in the biosynthesis of Longus whose products share 57 to 95% identity at the protein level with those of CFA/III, another type IV pilus of ETEC. Alleles of the Longus structural subunit gene lngA demonstrate a diversity of 12 to 19% at the protein level with strong positive selection for point replacements and horizontal transfer.

Enterotoxigenic Escherichia coli (ETEC) is an important cause of infant diarrhea in developing countries (5, 18), a leading cause of traveler's diarrhea (1, 14), and a re-emergent diarrheal pathogen in the United States (2, 35). One of the putative colonization factors of ETEC is Longus, a type IV pilus (T4P) composed of a 22-kDa major structural subunit designated LngA (11, 13), which is estimated to be encoded or expressed by 10 to 35% of ETEC strains (12, 15, 20, 21). Antibodies reacting with LngA were found in stool from patients with ETEC infections (22). LngA shares homology with the major subunits of various T4Ps (9, 13), including the toxin-coregulated pilin of Vibrio cholerae (29) and the bundle-forming pilin of enteropathogenic E. coli (10). LngA shares the highest homology (79%) with CofA, the major subunit of CFA/III, another T4P of ETEC (13, 16, 27). Here we identify the genes involved in the assembly and regulation of Longus and describe their genetic and evolutionary variability.

Three plasmid libraries were constructed by partial restriction endonuclease digestion of the Longus-encoding virulence plasmid from ETEC strain E9034A. Plasmids were screened by PCR, first with lngA-derived primers and then with other lng-specific primers (the plasmids used in this study are listed in Table 1). The Longus gene cluster obtained from ETEC strain E9034A is 14 kb in length and contains 16 open reading frames.
Fourteen genes share considerable homology, as well as cluster topology, with CFA/III genes and are thus designated, by analogy with the cof cluster, lngR, lngS, lngT, lngA, lngB, lngC, lngD, lngE, lngF, lngG, lngH, lngI, lngJ, and lngP (Fig.