The entire nucleotide sequence of an infectious clone of human T-cell leukemia virus type II provirus was determined. This provirus consists of 8952 nucleotides. In addition to long terminal repeats and gag, pol, env, and X, a protease gene that is responsible for processing the gag precursor protein was found. The protease gene is encoded in a different frame from gag and pol and is located between the gag and pol open reading frames.

Human T-cell leukemia virus type I (HTLV-I) and human T-cell leukemia virus type II (HTLV-II) are typical exogenous human retroviruses and have some characteristics in common with other retroviruses (1-3). These two viruses are related. The gag proteins of these virions show immunological cross-reactivity (4). The nucleotide sequences of the env regions of the two viruses show about 65% homology (5), although their envelope proteins show low cross-reactivity (6). In addition, HTLV-II and HTLV-I have a sequence of about 1.6 kilobase pairs (kbp), called X or pX, between env and the 3' long terminal repeat (LTR) (3, 7, 8). From comparison of this sequence in the two viruses, we and others previously predicted that this sequence might be translated (7, 8), and, in fact, proteins of 41 and 38 kDa were found to be encoded from this region in HTLV-I- and HTLV-II-infected cells, respectively (9-12).

To elucidate the functions of other regions of the HTLV-II genome, we have determined the entire nucleotide sequence of the provirus. The provirus examined was molecularly cloned from a patient (Mo) with hairy cell leukemia and was found to be replication competent (8, 13). Analysis of the nucleotide sequence indicated that the HTLV-II provirus has the structure LTR-gag-protease-pol-env-X-LTR, in this order from the 5' end of the genome (Fig. 1).

MATERIALS AND METHODS

DNAs and Sequencing. An infectious clone of HTLV-II provirus, λH6.0, was subcloned in pBR322 at the BamHI site (13).
The corresponding subclones, pH6-B5.0 and pH6-B3.5, which covered the 5' and 3' halves of the original provirus, respectively, were used for sequencing. Sequencing was done mainly by the method of Maxam and Gilbert (14); the M13 phage method (15) was used for part of the pol gene region.
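
An open reading frame, as the term is used in this paper, is a stretch of one reading frame that is not interrupted by a stop codon. As an illustration only, and not the procedure used in this study, a minimal Python sketch of such a scan over the three forward reading frames might look as follows. The function name open_reading_frames, the min_codons threshold, and the toy sequence are hypothetical and are not taken from the HTLV-II sequence.

```python
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_codons=3):
    """Return (frame, start, end) for each stop-free stretch on the forward strand.

    frame is 0, 1, or 2. start and end are 0-based nucleotide positions,
    end-exclusive; the terminating stop codon is not included.
    Only stretches of at least min_codons codons are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame
        # Walk the frame codon by codon.
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos))
                start = pos + 3  # the next stretch begins after the stop codon
        # Trailing stretch that runs to the end of the sequence without a stop.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            orfs.append((frame, start, end))
    return orfs


if __name__ == "__main__":
    toy = "ATGAAACCCTAAGGG"  # made-up sequence, for illustration only
    for frame, start, end in open_reading_frames(toy):
        print(frame, start, end, toy[start:end])
```

Because each frame is scanned independently, two stretches that overlap in nucleotide position but lie in different frames are reported separately, which is the situation described here for the protease gene relative to gag and pol.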
RESULTS AND DISCUSSION

The nucleotide sequence of HTLV-II provirus consists of 8952 bases, as shown in Fig. 2. In addition to three major open reading frames, corresponding to gag, pol, and env, there are four large open reading frames. Three are located in the X region, as reported previously (8), and the other lies between the 3' end of gag and the 5' end of pol. This open reading frame was identified as the gene that codes for a protease that processes the precursor Gag protein to mature forms. The provirus has a genome structure (as shown in Fig. 4) different from that of other retroviruses but similar to that of HTLV-I (3) and bovine leukemia virus (BLV) (16).

LTR and 5' Noncoding Region. As we reported previously (17), the LTR has 763 bases, in which several functional domains, such as a promoter,...