HepG2 is one of the most widely used human cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 have been extensively studied, its genome sequence has never been comprehensively analyzed, and higher-order structural features of its genome beyond its karyotype were only cursorily known. The high degree of aneuploidy in HepG2 makes traditional genome variant analysis methods challenging and partially ineffective.
Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. We performed deep whole-genome sequencing, mate-pair sequencing and linked-read sequencing to identify a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments, SNVs and indels (both corrected for copy number), phased haplotype blocks, structural variants (SVs) including complex genomic rearrangements, and novel mobile element insertions. A large number of SVs were phased, sequence-assembled and experimentally validated. Several chromosomes show striking loss of heterozygosity. We re-analyzed HepG2 RNA-Seq and whole-genome bisulfite sequencing data for allele-specific expression and phased DNA methylation. We show examples in which taking the genomic structural context into account yields deeper insight into regulatory complexity. Furthermore, we used the haplotype information to produce an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that use HepG2.