Purpose Clinical genome sequencing (cGS) followed by orthogonal confirmatory testing is standard practice. While orthogonal testing significantly improves specificity, it also results in increased turnaround time and cost of testing. The purpose of this study is to evaluate machine learning models trained to identify false positive variants in cGS data to reduce the need for orthogonal testing. Methods We sequenced five reference human genome samples characterized by the Genome in a Bottle Consortium (GIAB) and compared the results with an established set of variants for each genome referred to as a truth set. We then trained machine learning models to identify variants that were labeled as false positives. Results After training, the models identified 99.5% of the false positive heterozygous single-nucleotide variants (SNVs) and heterozygous insertions/deletions variants (indels) while reducing confirmatory testing of nonactionable, nonprimary SNVs by 85% and indels by 75%. Employing the algorithm in clinical practice reduced overall orthogonal testing using dideoxynucleotide (Sanger) sequencing by 71%. Conclusion Our results indicate that a low false positive call rate can be maintained while significantly reducing the need for confirmatory testing. The framework that generated our models and results is publicly available at https://github.com/HudsonAlpha/STEVE.
Purpose : Clinical genome sequencing (cGS) followed by orthogonal confirmatory testing is standard practice. While orthogonal testing significantly improves specificity it also results in increased turn-around-time and cost of testing. The purpose of this study is to evaluate machine learning models trained to identify false positive variants in cGS data to reduce the need for orthogonal testing.Methods : We sequenced five reference human genome samples characterized by the Genome in a Bottle Consortium (GIAB) and compared the results to an established set of variants for each genome referred to as a 'truth-set'. We then trained machine learning models to identify variants that were labeled as false positives.Results : After training, the models identified 99.5% of the false positive heterozygous single nucleotide variants (SNVs) and heterozygous insertions/deletions variants (indels) while reducing confirmatory testing of true positive SNVs to 1.67% and indels to 20.29%. Employing the algorithm in clinical practice reduced orthogonal testing using dideoxynucleotide (Sanger) sequencing by 78.22%. Conclusion :Our results indicate that a low false positive call rate can be maintained while significantly reducing the need for confirmatory testing. The framework that generated our models and results is publicly available at https://github.com/HudsonAlpha/STEVE .
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.