BackgroundRecently copy number variation (CNV) has gained considerable interest as a type of genomic/genetic variation that plays an important role in disease susceptibility. Advances in sequencing technology have created an opportunity for detecting CNVs more accurately. Recently whole exome sequencing (WES) has become primary strategy for sequencing patient samples and study their genomics aberrations. However, compared to whole genome sequencing, WES introduces more biases and noise that make CNV detection very challenging. Additionally, tumors’ complexity makes the detection of cancer specific CNVs even more difficult. Although many CNV detection tools have been developed since introducing NGS data, there are few tools for somatic CNV detection for WES data in cancer.ResultsIn this study, we evaluated the performance of the most recent and commonly used CNV detection tools for WES data in cancer to address their limitations and provide guidelines for developing new ones. We focused on the tools that have been designed or have the ability to detect cancer somatic aberrations. We compared the performance of the tools in terms of sensitivity and false discovery rate (FDR) using real data and simulated data. Comparative analysis of the results of the tools showed that there is a low consensus among the tools in calling CNVs. Using real data, tools show moderate sensitivity (~50% - ~80%), fair specificity (~70% - ~94%) and poor FDRs (~27% - ~60%). Also, using simulated data we observed that increasing the coverage more than 10× in exonic regions does not improve the detection power of the tools significantly.ConclusionsThe limited performance of the current CNV detection tools for WES data in cancer indicates the need for developing more efficient and precise CNV detection methods. Due to the complexity of tumors and high level of noise and biases in WES data, employing advanced novel segmentation, normalization and de-noising techniques that are designed specifically for cancer data is necessary. Also, CNV detection development suffers from the lack of a gold standard for performance evaluation. Finally, developing tools with user-friendly user interfaces and visualization features can enhance CNV studies for a broader range of users.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-017-1705-x) contains supplementary material, which is available to authorized users.
BackgroundDue to recent advances in sequencing technologies, sequence-based analysis has been widely applied to detecting copy number variations (CNVs). There are several techniques for identifying CNVs using next generation sequencing (NGS) data, however methods employing depth of coverage or read depth (RD) have recently become a main technique to identify CNVs. The main assumption of the RD-based CNV detection methods is that the readcount value at a specific genomic location is correlated with the copy number at that location. However, readcount data’s noise and biases distort the association between the readcounts and copy numbers. For more accurate CNV identification, these biases and noise need to be mitigated. In this work, to detect CNVs more precisely and efficiently we propose a novel denoising method based on the total variation approach and the Taut String algorithm.ResultsTo investigate the performance of the proposed denoising method, we computed sensitivities, false discovery rates and specificities of CNV detection when employing denoising, using both simulated and real data. We also compared the performance of the proposed denoising method, Taut String, with that of the commonly used approaches such as moving average (MA) and discrete wavelet transforms (DWT) in terms of sensitivity of detecting true CNVs and time complexity. The results show that Taut String works better than DWT and MA and has a better power to identify very narrow CNVs. The ability of Taut String denoising in preserving CNV segments’ breakpoints and narrow CNVs increases the detection accuracy of segmentation algorithms, resulting in higher sensitivities and lower false discovery rates.ConclusionsIn this study, we proposed a new denoising method for sequence-based CNV detection based on a signal processing technique. Existing CNV detection algorithms identify many false CNV segments and fail in detecting short CNV segments due to noise and biases. Employing an effective and efficient denoising method can significantly enhance the detection accuracy of the CNV segmentation algorithms. Advanced denoising methods from the signal processing field can be employed to implement such algorithms. We showed that non-linear denoising methods that consider sparsity and piecewise constant characteristics of CNV data result in better performance in CNV detection.
Copy number variation (CNV) is a type of genomic/genetic variation that plays an important role in phenotypic diversity, evolution, and disease susceptibility. Next generation sequencing (NGS) technologies have created an opportunity for more accurate detection of CNVs with higher resolution. However, efficient and precise detection of CNVs remains challenging due to high levels of noise and biases, data heterogeneity and the "big data" nature of NGS data. Sequence coverage (readcount) data are mostly used for detecting CNVs, specially for whole exome sequencing data. Readcount data are contaminated with several types of biases and noise that hinder accurate detection of CNVs. In this work, we introduce a novel preprocessing pipeline for reducing noise and biases to improve the detection accuracy of CNVs in heterogeneous NGS data, such as cancer whole exome sequencing data. We have employed several normalization methods to reduce readcount's biases that are due to GC content of reads, read alignment problems, and sample impurity. We have also developed a novel efficient and effective smoothing approach based on Taut String to reduce noise and increase CNV detection power. Using simulated and real data we showed that employing the proposed preprocessing pipeline significantly improves the accuracy of CNV detection.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.