Tel/Fax: +86-020-85222787

1. Abstract

Motivation: As whole genome sequencing (WGS) becomes progressively more cost-effective, it is increasingly applied in medical and scientific fields. Although the traditional variant-calling pipeline (BWA+GATK) has very high accuracy, false positives (FPs) remain an unavoidable problem that might lead to unfavorable outcomes, especially in clinical applications. As a result, filtering out FPs is recommended after variant calling. However, FP-filtering methods such as GATK hard filtering (GATK-HF) inevitably lose some true positives (TPs). Therefore, to minimize the loss of TPs and maximize the filtration of FPs, a comprehensive understanding of the features of TPs and FPs and an improved classification model are necessary. To obtain information about TPs and FPs, we used Platinum Genome (PT) as the mutation reference and its 300× deep-sequenced dataset NA12878 as the simulation template. Random sampling across depth gradients from NA12878 was then performed to study the effect of depth.

Results: FPs among heterozygous mutations were found to have a pattern distinct from that of FPs among homozygous mutations. FPfilter uses a model based on this distinction to filter out FPs specifically. We evaluated FPfilter on a training dataset with depth gradients from NA12878 and on a test dataset from NA12877 and NA24385. Compared with GATK-HF, FPfilter showed a significantly higher FP/TP filtration ratio and F-measure score. Our results indicate that FPfilter provides an improved model for distinguishing FPs from TPs and filters FPs with high specificity.

Availability: FPfilter is freely available for download on GitHub (https://github.com/yuxiangtan/FPfilter). Users can install it through Anaconda.
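The two evaluation metrics named in the Results, the F-measure and the FP/TP filtration ratio, can be illustrated with a minimal Python sketch. It is not taken from FPfilter. The F-measure is the standard harmonic mean of precision and recall. The abstract does not define the FP/TP filtration ratio, so the sketch assumes it means the number of FPs removed by a filter divided by the number of TPs removed. The `CallCounts` class, both function names, and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    """Variant counts relative to a truth set (all values hypothetical)."""
    tp: int  # called variants that are present in the truth set
    fp: int  # called variants that are absent from the truth set
    fn: int  # truth-set variants that were not called


def f_measure(c: CallCounts) -> float:
    """Harmonic mean of precision and recall (F1)."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return 2 * precision * recall / (precision + recall)


def filtration_ratio(before: CallCounts, after: CallCounts) -> float:
    """FPs removed per TP removed (assumed definition, not from the paper)."""
    fp_removed = before.fp - after.fp
    tp_removed = before.tp - after.tp
    if tp_removed == 0:
        return float("inf")  # the filter removed no TPs at all
    return fp_removed / tp_removed


# Hypothetical counts before and after applying a filter.
# TPs removed by the filter become false negatives.
before = CallCounts(tp=1000, fp=100, fn=50)
after = CallCounts(tp=990, fp=20, fn=60)

print(f"F-measure before filtering: {f_measure(before):.3f}")
print(f"F-measure after filtering:  {f_measure(after):.3f}")
print(f"FP/TP filtration ratio:     {filtration_ratio(before, after):.1f}")
```

Under these assumed definitions, a filter is better when it raises the F-measure while removing many FPs for each TP it sacrifices.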