SUMMARY

We have determined the DNA sequence of the long repeat region (RL) in the genome of herpes simplex virus type 1 (HSV-1) strain 17: it is 9215 bp long, with a composition of 71.6% G + C. We also determined the sequences of the parts of the long unique region (UL) adjacent to the terminal (TRL) and internal (IRL) copies of RL (2611 and 3836 bp, respectively). Gene organization in these regions of UL was deduced from the sequences and other available data. We propose that the sequenced region of UL adjacent to TRL contains three complete genes, none of which has previously been characterized in any detail, and that the region of UL adjacent to IRL also contains three genes, one of which encodes the immediate early protein IE63. The RL sequence contains one well characterized gene, encoding the protein IE110, whose organization we have described previously. Between the downstream end of the IE110 gene and UL lies a 3500 bp segment of RL in which we found no convincing protein-coding sequences, so its function remains obscure. Upstream of the IE110 gene is a region that others have proposed contains a gene. However, our sequence data are not compatible with their interpretation. We consider it possible that this region is protein-coding, but regard its gene organization as still unresolved.
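The base composition quoted above (71.6% G + C) is simply the fraction of G and C residues in the sequence. The following is a minimal sketch of that calculation, assuming a plain A/C/G/T text sequence; the function name `gc_content` and the choice to exclude ambiguity codes such as N from the denominator are illustrative assumptions, not the authors' stated procedure.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C residues in a DNA sequence.

    Illustrative sketch: case is ignored, and any character other than
    A/C/G/T (e.g. the ambiguity code N) is left out of both the numerator
    and the denominator. This is an assumed convention.
    """
    # Keep only unambiguous bases, normalized to uppercase.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T residues")
    # Count G and C among the unambiguous bases.
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy example (not HSV-1 sequence): 6 of the 8 residues are G or C.
print(round(gc_content("GGCCATGC"), 1))  # 75.0
```

Read in the other direction, a composition of 71.6% G + C over 9215 bp corresponds to roughly 6598 G or C residues, with the exact count depending on how the published percentage was rounded.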
INTRODUCTION

The genome of herpes simplex virus type 1 (HSV-1) consists of a linear, double-stranded DNA molecule of about 152000 bp (see McGeoch et al., 1988a). The DNA is regarded as consisting of two covalently linked segments, termed the long (L) and short (S) regions. Each of these contains a unique sequence (UL and US) flanked by a pair of oppositely oriented repeat sequences (RL and RS). The terminal and internal copies of RL are termed TRL and IRL; the copies of RS are named similarly. This structure is shown in Fig. 1. The genome also possesses a terminal redundancy of 400 bp, termed the a sequence, and at the internal 'joint' between the L and S segments there are one or more further copies of the a sequence, oriented opposite to the terminal copies. In a productively infected cell, an inversion process operates at the internal a sequence, so that the progeny virion DNA population consists of a mixture of four sequence-orientation isomers, which differ in the relative orientations of their L and S segments. One isomer is designated as the prototype for mapping purposes (Roizman, 1979).

We have been engaged in determining the complete genome sequence of HSV-1 strain 17, and have published sequences for US (McGeoch et al., 1985), RS (McGeoch et al., 1986) and UL (McGeoch et al., 1988a). In this paper we report the complete sequence of RL, of 9215 bp, together with adjacent parts of UL. This work has allowed precise definition of the boundaries between UL and the flanking copies of RL, and of the organization of genes in UL adjacent to those junctions. RL contains one well characterized gene, encoding the immediate early (IE) transcriptional activator IE110, whose sequence we have previously described (Perry et al.,