ABSTRACT
Genome-wide association studies (GWAS) have identified over 150,000 links between common genetic variants and human traits or complex diseases. Over 80% of these associations map to polymorphisms in non-coding DNA. The challenge, therefore, is to identify the disease-causing variants, the genes they affect, and the cells in which these effects occur. We have developed a platform that combines ATAC-seq, DNaseI footprints, NG Capture-C and machine learning to address this challenge. Applying this approach to red blood cell traits identifies a significant proportion of known causative variants and their effector genes, which we show can be validated by direct in vivo modelling.

Identifying the genomic variation that determines the risk of common chronic and infectious diseases informs us about their primary causes, leading to preventative or therapeutic approaches and insights. Whilst genome-wide association studies (GWASs) have identified thousands of associated chromosome regions [1], identifying the causal genes, variants and cell types remains a major bottleneck. This is due to three major features of the genome and its complex association with disease susceptibility. First, trait-associated variants are often tightly associated, through linkage disequilibrium (LD), with tens or hundreds of other variants, mostly single-nucleotide polymorphisms (SNPs), any one or more of which could be causal; moreover, the majority (>85%) of the variants identified in GWAS lie within the non-coding genome [2]. Although non-coding regions are increasingly well annotated, many variants do not correspond to known regulatory elements, and even when they do, it is rarely known which genes these elements control, and in which cell types. Second, new technical approaches to link variants to the genes they control are rapidly improving but are often limited by their sensitivity and resolution [3][4][5][6]; because so few causal variants have been unequivocally linked to the genes they affect, the mechanisms by which non-coding variants alter gene expression remain unknown in all but a few cases. Third, the complexity of gene regulation and cell-cell interactions means that it is usually impossible to predict when in development, in which cell type, in which activation state, and within which pathway(s) a causal variant exerts its effect. Although significant progress is being made, none of these problems has yet been adequately solved.

Here, we have developed an integrated platform of experimental and computational methods to prioritise likely causal variants, link them to the genes they regulate, and determine the mechanism by which they alter gene function. To illustrate the approach, we have initially focussed on a single haematopoietic lineage: the development of mature red blood cells (RBC), for which all stages of lineage specification and differentiation from a haematopoietic stem cell to an RBC are known, and can be r...
The promoters of mammalian genes are commonly regulated by multiple distal enhancers, which physically interact within discrete chromatin domains. How such domains form, and how the regulatory elements within them interact within single cells, are not understood. To address this, we developed Tri-C, a new Chromosome Conformation Capture (3C) approach to identify concurrent chromatin interactions at individual alleles within single cells. The heterogeneity of interactions observed between such cells shows that CTCF-mediated formation of chromatin domains and the interactions within them are dynamic processes. Importantly, our analyses reveal higher-order structures involving simultaneous interactions between multiple enhancers and promoters within individual cells. This provides a structural basis for understanding how multiple cis-elements act together to establish robust regulation of gene expression.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.