DNA comprises molecular information stored in genetic and epigenetic bases, both of which are vital to our understanding of biology. Most DNA sequencing approaches address either genetics or epigenetics and thus capture incomplete information. Methods widely used to detect epigenetic DNA bases fail to capture common C-to-T mutations or distinguish 5-methylcytosine from 5-hydroxymethylcytosine. We present a single base-resolution sequencing methodology that sequences complete genetics and the two most common cytosine modifications in a single workflow. DNA is copied and bases are enzymatically converted. Coupled decoding of bases across the original and copy strand provides a phased digital readout. Methods are demonstrated on human genomic DNA and cell-free DNA from a blood sample of a patient with cancer. The approach is accurate, requires low DNA input and has a simple workflow and analysis pipeline. Simultaneous, phased reading of genetic and epigenetic bases provides a more complete picture of the information stored in genomes and has applications throughout biomedicine.
Genome sequencing has revealed over 300 million genetic variations in human populations. Over 90% of variants are single nucleotide polymorphisms (SNPs), the remainder include short deletions or insertions, and small numbers of structural variants. Hundreds of thousands of these variants have been associated with specific phenotypic traits and diseases through genome wide association studies which link significant differences in variant frequencies with specific phenotypes among large groups of individuals. Only 5% of disease-associated SNPs are located in gene coding sequences, with the potential to disrupt gene expression or alter of the function of encoded proteins. The remaining 95% of disease-associated SNPs are located in non-coding DNA sequences which make up 98% of the genome. The role of non-coding, disease-associated SNPs, many of which are located at considerable distances from any gene, was at first a mystery until the discovery that gene promoters regularly interact with distal regulatory elements to control gene expression. Disease-associated SNPs are enriched at the millions of gene regulatory elements that are dispersed throughout the non-coding sequences of the genome, suggesting they function as gene regulation variants. Assigning specific regulatory elements to the genes they control is not straightforward since they can be millions of base pairs apart. In this review we describe how understanding 3D genome organization can identify specific interactions between gene promoters and distal regulatory elements and how 3D genomics can link disease-associated SNPs to their target genes. Understanding which gene or genes contribute to a specific disease is the first step in designing rational therapeutic interventions.
Early detection of cancer will improve survival rates. The blood biomarker 5-hydroxymethylcytosine has been shown to discriminate cancer. In a large covariate-controlled study of over two thousand individual blood samples, we created, tested and explored the properties of a 5-hydroxymethylcytosine-based classifier to detect colorectal cancer (CRC). In an independent validation sample set, the classifier discriminated CRC samples from controls with an area under the receiver operating characteristic curve (AUC) of 90% (95% CI [87, 93]). Sensitivity was 55% at 95% specificity. Performance was similar for early stage 1 (AUC 89%; 95% CI [83, 94]) and late stage 4 CRC (AUC 94%; 95% CI [89, 98]). The classifier could detect CRC even when the proportion of tumor DNA in blood was undetectable by other methods. Expanding the classifier to include information about cell-free DNA fragment size and abundance across the genome led to gains in sensitivity (63% at 95% specificity), with similar overall performance (AUC 91%; 95% CI [89, 94]). We confirm that 5-hydroxymethylcytosine can be used to detect CRC, even in early-stage disease. Therefore, the inclusion of 5-hydroxymethylcytosine in multianalyte testing could improve sensitivity for the detection of early-stage cancer.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.