In recent years, many software packages for identifying structural variants (SVs) using whole-genome sequencing data have been released. When published, a new method is commonly compared with those already available, but this tends to be selective and incomplete. The lack of comprehensive benchmarking of methods presents challenges for users in selecting methods and for developers in understanding algorithm behaviours and limitations. Here we report the comprehensive evaluation of 10 SV callers, selected following a rigorous process and spanning the breadth of detection approaches, using high-quality reference cell lines, as well as simulations. Due to the nature of available truth sets, our focus is on general-purpose rather than somatic callers. We characterise the impact on performance of event size and type, sequencing characteristics, and genomic context, and analyse the efficacy of ensemble calling and calibration of variant quality scores. Finally, we provide recommendations for both users and methods developers.
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
The identification of genomic rearrangements, particularly in cancers, with high sensitivity and specificity using massively parallel sequencing remains a major challenge. Here, we describe the Genome Rearrangement IDentification Software Suite (GRIDSS), a high-speed structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph assembler. By combining assembly, split read and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line and patient tumour data, recently winning SV sub-challenge #5 of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods. GRIDSS identifies non-template sequence insertions, micro-homologies and large imperfect homologies, and supports multi-sample analysis. GRIDSS is freely available at https://github.com/PapenfussLab/gridss.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.