Background
With the rapid development of long-read sequencing technologies, it is possible to reveal the full spectrum of genetic structural variation (SV). However, the expensive cost, finite read length and high sequencing error for long-read data greatly limit the widespread adoption of SV calling. Therefore, it is urgent to establish guidance concerning sequencing coverage, read length, and error rate to maintain high SV yields and to achieve the lowest cost simultaneously.
Results
In this study, we generated a full range of simulated error-prone long-read datasets containing various sequencing settings and comprehensively evaluated the performance of SV calling with state-of-the-art long-read SV detection methods. The benchmark results demonstrate that almost all SV callers perform better when the long-read data reach 20× coverage, 20 kbp average read length, and approximately 10–7.5% or below 1% error rates. Furthermore, high sequencing coverage is the most influential factor in promoting SV calling, while it also directly determines the expensive costs.
Conclusions
Based on the comprehensive evaluation results, we provide important guidelines for selecting long-read sequencing settings for efficient SV calling. We believe these recommended settings of long-read sequencing will have extraordinary guiding significance in cutting-edge genomic studies and clinical practices.
Long-read sequencing technologies have great potential for the comprehensive discovery of structural variation (SV). However, the accurate genotype assignment for SV is still a challenge since the unavoidable factors like specific sequencing errors or limited coverages. Herein, we propose cuteSV2, a fast and accurate long-read-based re-genotyping approach that is able to force-calling genotypes for the given records. cuteSV2 is an upgraded version of cuteSV and applies an improved strategy of the refinement and purification of the heuristic extracted signatures through spatial and allele similarity estimation. The benchmarking results on several baseline evaluations demonstrate that cuteSV2 outperforms the-state-of-art methods and has stable and robust capability in practice.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.