Retroviral integration site targeting is not random and plays a critical role in expression and long-term survival of the integrated provirus. To better understand the genomic environment surrounding retroviral integration sites, we performed an extensive comparative analysis of new and previously published integration site data from evolutionarily diverse retroviruses from seven genera, including different HIV-1 subtypes. We showed that evolutionarily divergent retroviruses exhibited distinct integration site profiles with strong preferences for non-canonical B-form DNA (non-B DNA). Whereas all lentiviruses and most retroviruses integrate within or near genes and non-B DNA, MMTV and ERV integration sites were highly enriched in heterochromatin and transcription-silencing non-B DNA features (e.g. G4, triplex and Z-DNA). Compared to in vitro-derived HIV-1 integration sites, in vivo-derived sites are significantly more enriched in transcriptionally silent regions of the genome and transcription-silencing non-B DNA features. Integration sites from individuals infected with HIV-1 subtype A, C or D viruses exhibited different preferences for non-B DNA and were more enriched in transcriptionally active regions of the genome compared to subtype B virus. In addition, we identified several integration site hotspots shared between different HIV-1 subtypes, with specific non-B DNA sequence motifs present at these hotspots. Together, these data highlight important similarities and differences in retroviral integration site targeting and provide new insight into how retroviruses integrate into genomes for long-term survival.

... Africa.
Several models, not mutually exclusive, have been proposed to explain integration site selection. In the chromatin accessibility model, the structure of chromatin influences accessibility of target DNA sequences to PICs. In vivo target DNA is not expected to be naked but rather wrapped in nucleosomes. Wrapping target DNA in nucleosomes does not reduce integration, but instead creates hotspots for integration at sites of probable DNA distortion (14, 15). Distortion of DNA in several other protein-DNA complexes has also been shown to favour integration in the major grooves facing outwards from the nucleosome core (16, 17). Although chromatin structure can facilitate integration, chromatin accessibility cannot solely explain the differences observed in integration site preferences.
The protein tethering model suggests that a cellular protein, potentially specific to each retroviral genus, acts as a tethering factor between chromatin and the PIC. The best-characterized tethering factor identified to date is lens epithelium-derived growth factor and co-factor p75 (LEDGF/p75) (also known as PSIP1/p75) (18, 19, 20). LEDGF/p75 interacts with HIV integrase and tethers the PIC to genomic DNA in transcriptionally active genes marked by specific histone modifications such as H3K20me1, H3K27me1 and H3K36me3. In sim...