CTCF is a key insulator-binding protein, and mammalian genomes contain numerous CTCF-binding sites (CBSs), many of which are organized in tandem arrays. Here we provide direct evidence that CBSs, if located between enhancers and promoters in the Pcdh and β-globin clusters, function as enhancer-blocking insulators by forming distinct directional chromatin loops, regardless of whether the enhancers contain a CBS or not. Moreover, computational simulation and experimental capture revealed that large arrays of tandem variable CBSs give rise to balanced promoter usage in cell populations and stochastic monoallelic expression in single cells. Finally, on a genome-wide scale, gene expression levels are negatively correlated with CBS insulators located between enhancers and promoters. Thus, single CBS insulators ensure proper enhancer insulation and promoter activation, while tandem-arrayed CBS insulators determine balanced promoter usage. This finding has interesting implications for the role of topological insulators in 3D genome folding and developmental gene regulation.

Insertion, mutation, deletion, inversion, or duplication of CBS elements alters chromatin topology and gene expression [12, 14-16, 18, 22-24]. Emerging evidence suggests that spatial control of genome topology by CTCF/Cohesin regulates gene expression; however, how the numerous CBS elements in mammalian genomes function as insulators to control proper promoter activation and balanced promoter usage remains obscure.
RESULTS
Exogenous CTCF Sites as Protocadherin Insulators
Similar to the enormous diversity of DSCAM1 proteins in Drosophila, combinatorial cis- and trans-interactions between mammalian clustered cell-surface protocadherin (Pcdh) proteins, encoded by three closely linked gene clusters (α, β, and γ), endow individual neurons with a unique identity code and a specific self-recognition module, which are required for neuronal migration, dendrite self-avoidance, and axon tiling in the brain [25-31]. The human Pcdhα cluster contains 13 highly similar, tandem-arrayed, unusually large "alternate" variable exons (α1-α13) and 2 divergent "ubiquitous" variable exons (αc1-αc2), followed by 3 small downstream constant exons (Figure 1A), reminiscent of the variable and constant genome organization of the immunoglobulin (Ig) and T-cell receptor (Tcr) clusters [25,32]. Each of the 13 "alternate" variable exons (α1-α13) carries its own promoter, which is flanked by two forward-oriented CBS elements (CSE and eCBS) (Figure 1A). However, the αc1 "ubiquitous" promoter carries only one forward-oriented CBS, and the αc2 promoter has no CBS element (Figure 1A). Two distal Pcdh enhancers, HS7 and HS5-1, are located downstream; one of them, HS5-1, is flanked by two reverse-oriented CBS elements (HS5-1a and HS5-1b) [33,34]. Multiple long-distance chromatin interactions between these enhancers and Pcdh target promoters form a transcription hub and determine promoter choice [34,35]. We performed single-cell RNA-seq of mouse cortical neurons and found members of the Pcdh clu...