Motivation
Long-read phasing has been used for reconstructing diploid genomes, improving variant calling, and resolving microbial strains in metagenomics. However, the phasing blocks of existing methods are broken by large Structural Variations (SVs), and the efficiency is unsatisfactory for population-scale phasing.
Results
This paper presents a novel algorithm, LongPhase, which can simultaneously phase single nucleotide polymorphisms (SNPs) and SVs of a human genome in 10-20 minutes, 10x faster than the state-of-the-art WhatsHap, HapCUT2 and Margin. In particular, co-phasing SNPs and SVs produces much larger haplotype blocks (N50 = 25Mbp) than those of existing methods (N50 = 10-15Mbp). We show that LongPhase combined with Nanopore ultra-long reads is a cost-effective and highly contiguous solution, which can produce between one and 26 blocks per chromosome arm without the need for additional trios, chromosome-conformation, and strand-seq data.
Availability
LongPhase is freely available at https://github.com/twolinin/LongPhase/.
Supplementary information
Supplementary figures and tables are available online.