The formation of mature mRNAs in vertebrates involves the cleavage and polyadenylation of the pre-mRNA, 10-30 nt downstream of an AAUAAA or AUUAAA signal sequence. The extensive cDNA data now available shows that these hexamers are not strictly conserved. In order to identify variant polyadenylation signals on a large scale, we compared over 8700 human 3′ untranslated sequences to 157,775 polyadenylated expressed sequence tags (ESTs), used as markers of actual mRNA 3′ ends. About 5600 EST-supported putative mRNA 3′ ends were collected and analyzed for significant hexameric sequences. Known polyadenylation signals were found in only 73% of the 3′ fragments. Ten single-base variants of the AAUAAA sequence were identified with a highly significant occurrence rate, potentially representing 14.9% of the actual polyadenylation signals. Of the mRNAs, 28.6% displayed two or more polyadenylation sites. In these mRNAs, the poly(A) sites proximal to the coding sequence tend to use variant signals more often, while the 3′-most site tends to use a canonical signal. The average number of ESTs associated with each signal type suggests that variant signals (including the common AUUAAA) are processed less efficiently than the canonical signal and could therefore be selected for regulatory purposes. However, the position of the site in the untranslated region may also play a role in polyadenylation rate.
The 3′ untranslated regions (UTRs) of eukaryotic mRNAs contain regulatory elements affecting mRNA translation, stability, and transport. Mature 3′ UTRs are formed by polyadenylation of the pre-mRNA, a coupled reaction involving endonucleolytic cleavage followed by poly(A) synthesis. A significant fraction of mRNAs display multiple polyadenylation sites (Gautheret et al. 1998). The choice of poly(A) sites may influence the stability, translation efficiency, or localization of an mRNA in a tissue- or disease-specific manner (Edwalds-Gilbert et al. 1997).
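The positional criterion stated in the abstract (a signal hexamer 10-30 nt upstream of the cleavage site) and the notion of single-base variants of AAUAAA can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the function names `single_base_variants` and `find_signals` are hypothetical, and the convention of measuring distance from the last base of the hexamer to the cleavage site is an assumption made only for illustration.

```python
# Illustrative sketch only; names and the distance convention are assumptions.
KNOWN_SIGNALS = {"AAUAAA", "AUUAAA"}


def single_base_variants(hexamer, alphabet="ACGU"):
    """Return every sequence differing from `hexamer` at exactly one position."""
    variants = set()
    for i, base in enumerate(hexamer):
        for alt in alphabet:
            if alt != base:
                variants.add(hexamer[:i] + alt + hexamer[i + 1:])
    return variants


def find_signals(utr, cleavage_pos, candidates, min_dist=10, max_dist=30):
    """List candidate hexamers lying min_dist..max_dist nt upstream of the cleavage site.

    utr          -- 3' UTR sequence (DNA or RNA alphabet; T is read as U)
    cleavage_pos -- number of nucleotides retained upstream of the cleavage site
    candidates   -- set of hexamers to search for
    Distance is counted (by assumption) as the number of nucleotides between
    the last base of the hexamer and the cleavage site.
    Returns (start_index, hexamer, distance) tuples in 5'-to-3' order.
    """
    rna = utr.upper().replace("T", "U")
    hits = []
    for start in range(0, cleavage_pos - 5):
        hexamer = rna[start:start + 6]
        distance = cleavage_pos - (start + 6)
        if hexamer in candidates and min_dist <= distance <= max_dist:
            hits.append((start, hexamer, distance))
    return hits
```

For example, `find_signals("G" * 20 + "AATAAA" + "C" * 15, 41, KNOWN_SIGNALS)` reports one AAUAAA hexamer starting at index 20, 15 nt upstream of the 3′ end. Passing `single_base_variants("AAUAAA")` as `candidates` instead searches for the 18 possible single-base variants, which include AUUAAA.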
In the mammalian system, effective polyadenylation requires two main sequence components: a highly conserved AAUAAA signal located 10-30 nucleotides 5′ to the cleavage site and a more variable GU-rich element, 20-40 bases 3′ of the site (see Proudfoot 1991; Colgan and Manley 1997 for reviews). Although the AAUAAA signal is often considered to be present in 90% of the mRNAs and replaced by an AUUAAA variant in the other 10% (Wahle and Keller 1996; Colgan and Manley 1997), alternate signals are certainly present in a significant fraction of the 3′ ends (Claverie 1997; Gautheret et al. 1998; Tabaska and Zhang 1999; Graber et al. 1999).
The expressed sequence tag (EST) database, dbEST (Boguski et al. 1993), which contains highly redundant partial cDNAs, especially from the 3′ UTRs, is a rich source of information on mRNA 3′ ends. Analyzing clustered EST sequences, we previously identified multiple cases of alternate polyadenylation in mRNA (Gautheret et al. 1998). Based on a public EST collection now containing over 1.4 million human sequences, the present work focuses on the region immediatel...