Background
- Atrial fibrillation (AF) often arises from structural abnormalities in the left atrium (LA). Annotation of the non-coding genome in human LA is limited, as is knowledge of its effects on gene expression and chromatin architecture. Because many AF-associated genetic variants reside in non-coding regions, this knowledge gap impairs efforts to understand the molecular mechanisms of AF and cardiac conduction phenotypes.
Methods
- We generated a model of the LA non-coding genome by profiling 7 histone post-translational modifications (active: H3K4me3, H3K4me2, H3K4me1, H3K27ac, H3K36me3; repressive: H3K27me3, H3K9me3), CTCF binding, and gene expression in samples from 5 individuals without structural heart disease or AF. We used MACS2 to identify peak regions (P < 0.01), applied a Markov model to classify regulatory elements, and annotated this model with matched gene expression data. We intersected chromatin states with eQTL, DNA methylation, and Hi-C chromatin interaction data from LA and left ventricle. Finally, we integrated genome-wide association data for AF and electrocardiographic traits to link disease-related variants to genes.
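The classification step can be illustrated with a minimal sketch. It assumes a hidden Markov model over genomic bins in which each bin records which marks have a peak, and it decodes the most likely state path with the Viterbi algorithm. The three state names, the emission probabilities, the `STAY` probability, and the reduced set of marks are all hypothetical values chosen for illustration; the abstract does not specify the model's parameters or software.

```python
import math

# Subset of the profiled marks, kept small for readability.
MARKS = ("H3K4me3", "H3K27ac", "H3K27me3")

# Hypothetical emission probabilities: P(mark has a peak in a bin | state).
EMIT = {
    "promoter":  {"H3K4me3": 0.90, "H3K27ac": 0.70, "H3K27me3": 0.05},
    "enhancer":  {"H3K4me3": 0.10, "H3K27ac": 0.80, "H3K27me3": 0.05},
    "repressed": {"H3K4me3": 0.05, "H3K27ac": 0.05, "H3K27me3": 0.80},
}
STATES = tuple(EMIT)
STAY = 0.8  # hypothetical probability that adjacent bins share a state


def log_emission(state, observed):
    """Log-probability of the observed set of marks under one state."""
    total = 0.0
    for mark in MARKS:
        p = EMIT[state][mark]
        total += math.log(p if mark in observed else 1.0 - p)
    return total


def log_transition(prev, cur):
    """Log-probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(bins):
    """Return the most likely state for each bin (list of sets of marks)."""
    if not bins:
        return []
    # Uniform prior over states for the first bin.
    scores = {s: -math.log(len(STATES)) + log_emission(s, bins[0]) for s in STATES}
    backpointers = []
    for observed in bins[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + log_transition(p, cur))
            new_scores[cur] = (scores[best_prev] + log_transition(best_prev, cur)
                               + log_emission(cur, observed))
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy input: six consecutive bins with the marks that have peaks in each.
example_bins = [
    {"H3K4me3", "H3K27ac"}, {"H3K4me3", "H3K27ac"},
    {"H3K27ac"}, {"H3K27ac"},
    {"H3K27me3"}, {"H3K27me3"},
]
print(viterbi(example_bins))
```

In a real analysis the emission and transition parameters would be learned from the binarized peak calls rather than set by hand, and the number of states (21 in the Results) would be chosen during model fitting.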
Results
- Our model identified 21 epigenetic states, encompassing regulatory motifs such as promoters, enhancers, and repressed regions. Genes were regulated by proximal chromatin states; repressive states were associated with a significant reduction in gene expression (P < 2 × 10⁻¹⁶). Chromatin states were differentially methylated; promoters were less methylated than repressed regions (P < 2 × 10⁻¹⁶). We identified over 15,000 LA-specific enhancers, defined by homeobox family motifs, and annotated several cardiovascular disease (CVD) susceptibility loci. Intersecting AF and PR interval GWAS loci with long-range chromatin conformation data identified a gene interaction network dominated by NKX2-5, TBX3, ZFHX3, and SYNPO2L.
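One way to picture the intersection of GWAS loci with chromatin conformation data is the sketch below. It assumes that a variant is linked to a gene when the variant lies in one anchor of a chromatin interaction and the gene's promoter overlaps the other anchor. The function names, the linking rule, and all coordinates, variant IDs, and gene names are hypothetical; the abstract does not describe the exact procedure.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def link_variants_to_genes(variants, interactions, promoters):
    """Map each gene to the set of variants that contact its promoter.

    variants:     {variant_id: (chrom, position)}
    interactions: [((chrom, start, end), (chrom, start, end)), ...]
    promoters:    {gene: (chrom, start, end)}
    """
    links = defaultdict(set)
    for var_id, (v_chrom, v_pos) in variants.items():
        for anchor_a, anchor_b in interactions:
            # A variant may sit in either anchor, so test both orientations.
            for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
                chrom, start, end = near
                if chrom != v_chrom or not (start <= v_pos < end):
                    continue
                for gene, (g_chrom, g_start, g_end) in promoters.items():
                    if g_chrom == far[0] and overlaps(g_start, g_end, far[1], far[2]):
                        links[gene].add(var_id)
    return dict(links)


# Hypothetical toy data (coordinates and names are made up).
toy_variants = {"var1": ("chr1", 1050), "var2": ("chr1", 5200), "var3": ("chr2", 300)}
toy_interactions = [
    (("chr1", 1000, 2000), ("chr1", 9000, 10000)),
    (("chr1", 5000, 6000), ("chr1", 9500, 10500)),
    (("chr2", 0, 1000), ("chr2", 4000, 5000)),
]
toy_promoters = {"GENE_A": ("chr1", 9400, 9600), "GENE_B": ("chr2", 7000, 7200)}

toy_links = link_variants_to_genes(toy_variants, toy_interactions, toy_promoters)
print(toy_links)
```

Genes contacted by many variants, or by variants from several traits, would appear as highly connected nodes in the resulting network. The nested loops are adequate for a toy example; genome-scale data would call for an interval index.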
Conclusions
- Profiling the non-coding genome provides new insights into gene expression and chromatin regulation in human LA tissue. These findings enabled identification of a gene network underlying AF; our experimental and analytic approach can be extended to identify molecular mechanisms for other cardiac diseases and traits.
(bioRxiv preprint, not peer-reviewed; first posted online Oct. 19, 2017; doi: http://dx.doi.org/10.1101/206094; made available under a CC-BY-NC-ND 4.0 International license.)
The invention of chromatin conformation capture (3C) technology 1 and derived methods 2 has greatly advanced our knowledge of the principles and regulatory potential of three-dimensional (3D) genome folding in vivo. Insights obtained using these technologies include the discovery of regulatory chromatin loops that bring distal enhancers into close physical proximity to target gene promoters in order to increase their transcriptional output. These methods have also led to the identification of architectural loops, often anchored by bound CTCF proteins, which form the structural chromosomal domains that spatially insulate transcription regulatory circuits 3,4. Detailed topological studies and genetic evidence have further revealed that individual enhancers can contact and control the expression of multiple genes. Conversely, single genes are often influenced by multiple enhancers 5,6. Similarly, in population-based assays, individual CTCF sites can be seen contacting multiple other CTCF sites. Based on such observations it has been hypothesized that DNA may fold into spatial chromatin hubs 7,8. However, current population-based pair-wise contact matrices cannot distinguish clustered interactions from mutually exclusive interactions that independently occur in different cells. To investigate the existence and nature of these hubs, high-throughput strategies are needed for robust detection, analysis and interpretation of multi-way DNA contacts. Recently, several 3C procedures have been modified for the study of multi-way contacts, but most are limited in throughput and contact complexity 9-12.
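The limitation of pairwise contact matrices can be made concrete with a small simulation. The sketch below builds two hypothetical populations of alleles, one in which loci X, Y and Z cluster in a hub on some alleles and one in which the three pairwise contacts occur on different alleles, and shows that their pairwise counts are identical while their three-way counts differ. The locus names, population sizes, and the representation of each allele as a set of co-contacting loci are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def pairwise_counts(alleles):
    """Count how many alleles contain each pair of loci (a pairwise contact matrix)."""
    counts = Counter()
    for contacts in alleles:
        for pair in combinations(sorted(contacts), 2):
            counts[pair] += 1
    return counts


def multiway_count(alleles, loci):
    """Count alleles in which all the given loci are found together."""
    wanted = set(loci)
    return sum(1 for contacts in alleles if wanted <= set(contacts))


# Hub scenario: X, Y and Z cluster on 100 alleles; 200 alleles show no contact.
hub = [{"X", "Y", "Z"}] * 100 + [set()] * 200

# Mutually exclusive scenario: each pair occurs on a different set of 100 alleles.
exclusive = [{"X", "Y"}] * 100 + [{"X", "Z"}] * 100 + [{"Y", "Z"}] * 100

# The pairwise view cannot tell the two populations apart.
print(pairwise_counts(hub) == pairwise_counts(exclusive))

# The multi-way view separates them.
print(multiway_count(hub, "XYZ"), multiway_count(exclusive, "XYZ"))
```

Both populations contain 300 alleles and 100 observations of every pair, so only a method that reads several contacts from the same allele can reveal which scenario is present.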
A recently developed genome-wide approach for multi-contact analysis, called C-Walks (chromosomal walks) 11, gave an interesting glimpse of the spatial aggregation of genomic loci, indicating that cooperative hubs may be rare but present, for example at polycomb bodies. C-Walks, but also genome architecture mapping (GAM), another new method that analyzes genomic co-occurrence frequencies in thin slices of fixed nuclei 13, are difficult to scale up and currently do not offer the resolution necessary to study higher-order topologies inside structural domains. To enable the comprehensive study of spatial clustering of regulatory elements and genes, and dissect their interplay at the level of single alleles, we developed Multi-Contact 4C-seq (MC-4C). [...] Cas9-mediated in vitro digestion of, respectively, the viewpoint fragment (in between the inverse PCR primers) and its two neighbor fragments is performed prior to PCR. After PCR, the product is size-selected (>1.5 kb) and sequenced on the MinION sequencing platform (Fig. 1a). An integral component of MC-4C is its elaborate computational analysis stra...
Highlights
- Enhancer annotation in marmoset tissues for disease model suitability analysis
- Newly evolved enhancers are highly variable between individuals
- New enhancers are poorly integrated in the transcriptional machinery
- Disease-associated enhancers are more often conserved in marmoset