Unlike most genes transcribed by RNA polymerase II, the simian virus 40 late transcription unit does not have a TATA box. To determine what sequences are required for initiation at the major late mRNA cap site of simian virus 40, clustered point mutations were constructed and tested for transcriptional activity in vitro and in vivo. Three promoter elements were defined. The first is centered 31 base pairs upstream of the cap site in a position normally reserved for a TATA box. The second is at the cap site. The third occupies a novel position centered 28 base pairs downstream of the cap site within a protein-coding sequence. The ability of RNA polymerase II to recognize this promoter suggests that there is greater variation in promoter architecture than had been believed previously.

Synthesis of eucaryotic mRNAs requires specific DNA sequences that are located close to the transcriptional initiation site. These sequences, termed promoters, bind protein factors that position the transcriptional initiation reaction at the mRNA cap site (for review, see references 17 and 46). The binding of protein factors to promoters has also been implicated in biological regulation, for example, during the induction of transcription by heat shock (69, 74) and phorbol esters (1, 40), and in the tissue-specific expression of several genes (42). For many genes, the level of transcription is also regulated by enhancer sequences that are distinct from promoter sequences. These elements interact with promoters in an orientation- and distance-independent manner.

The sequences that make up eucaryotic promoters have been studied in detail. For protein-coding genes, which are transcribed by RNA polymerase II, the first common sequence to be identified was the sequence TATAA.
These so-called TATA boxes are usually located 30 base pairs upstream of the cap site, and significant changes in transcription have been demonstrated when these sequences are mutated (12, 45, 72, 76 ...

Previous studies have indicated that sequences controlling SV40 late gene expression extend over a 350-base-pair region, beginning at the origin of viral replication and extending downstream to, and possibly beyond, the major late cap site at nucleotide 325. One of the major mechanisms controlling the level of late gene expression is trans-activation by an early gene product, large T antigen (5, 37). The sequences required for this trans-activation lie considerably upstream of the major late start and include sequences within the origin of replication and the 72-base-pair repeats (6, 31, 38). The locations of the upstream control sequences are flexible and, in this respect, the sequences are enhancerlike (38, 52, 64).

In contrast to the upstream regulatory elements, the sequences that position the late transcriptional initiation sites are not well understood. There is no TATA box upstream of the major late initiation site, suggesting that the SV40 late promoter may be fundamentally different from other promoters. However, it seemed likely that sequences near the transcriptional sta...