Deep learning-based variant callers are becoming the standard and have achieved superior SNP calling performance using long reads. In this paper, we present Clair3, which makes the best of two major method categories: pile-up calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 ran faster than any of the other state-of-the-art variant callers and performed the best, especially at lower coverage.
Main Text

The first preprint of DeepVariant¹ was released in late 2016, marking the beginning of the use of deep learning-based methods (DL methods) instead of traditional statistical methods for variant calling. Over the years, several DL methods have been developed. We are now witnessing a complete takeover, led by DeepVariant for short-read variant calling. Long-read variant calling using Oxford Nanopore (ONT) data, on the other hand, has been dominated by DL methods since the beginning, primarily owing to the difficulty caused by the generally higher base error rate of ONT data. Although the DL methods for short reads and long reads have a lot in common, long-read variant calling is considered the more difficult problem. This led to two major designs, which use either pileup or full alignment as the input to the decision-making neural network and differ significantly in both performance and speed. Long-read variant callers including Clairvoyante², Clair³, and Nanocaller⁴ are pileup-based: the read alignments are summarized into features and counts before being fed into a variant calling network. PEPPER-Margin-DeepVariant⁵ (PEPPER) is full alignment-based. The input to its DeepVariant variant calling network retains the spatial information in the read alignments and is tens of times larger than the pileup input. Medaka⁶ is consensus-based; it uses pileup input to generate a diploid consensus in the first iteration and two haploid consensuses in the second. The differences between the reference and the consensuses are identified and combined into variants. These are all state-of-the-art algorithms; the pileup-based algorithms are usually superior in terms of time efficiency, and the full-alignment algorithms provide the best precision and recall.
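To make the contrast between the two input designs concrete, the following minimal Python sketch builds both kinds of input for a single candidate position. It is an illustration only and not the feature encoding of Clair3 or of any caller named above: the toy reads, the gap-free alignment model, and the function names `pileup_counts` and `full_alignment_window` are all assumptions introduced here.

```python
from collections import Counter

# Toy alignments: (0-based start on the reference, read sequence, strand).
# Hypothetical and gap-free; real alignments carry CIGAR strings and qualities.
READS = [
    (0, "ACGTAC", "+"),
    (1, "CGTTC", "-"),
    (2, "GTAC", "+"),
    (0, "ACGTT", "-"),
]


def pileup_counts(reads, pos):
    """Pileup-style input: collapse covering reads into base/strand counts."""
    counts = Counter()
    for start, seq, strand in reads:
        offset = pos - start
        if 0 <= offset < len(seq):
            counts[(seq[offset], strand)] += 1
    return counts


def full_alignment_window(reads, pos, flank):
    """Full-alignment-style input: one row per covering read, order preserved."""
    rows = []
    for start, seq, strand in reads:
        if not (0 <= pos - start < len(seq)):
            continue  # this read does not cover the candidate position
        row = []
        for ref_pos in range(pos - flank, pos + flank + 1):
            offset = ref_pos - start
            # "-" marks window positions that the read does not reach
            row.append(seq[offset] if 0 <= offset < len(seq) else "-")
        rows.append(("".join(row), strand))
    return rows


if __name__ == "__main__":
    print(pileup_counts(READS, 4))
    print(full_alignment_window(READS, 4, flank=1))
```

In this toy setting the pileup summary has a fixed number of possible entries regardless of depth, whereas the full-alignment window grows by one row per covering read and keeps which bases co-occur on the same read. That per-read linkage is the spatial information that the counts discard.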
However, while the two designs are not mutually exclusive, there have not been any studies combining pileup calling and full-alignment calling. To fill the gap, we developed Clair3, the successor to Clair, which makes the best of both designs. It runs as fast as the pileup-based callers and performs as well as the full alignment-based callers. Supplementary Figure 1 shows the workflow for Clair3. The philosophy behind Clair3 is to trust the full-alignment model unless the pileup model can make a quick but reliable decision. First, the pileup calling network goes through all the variant candidates that passed a coverage threshold and an alternative allele frequency threshold. Next, the high-quality pileup calls are used to phase the alignments and as part of the final output. Then, ...