Background Long interspersed element 1 (LINE-1 or L1) retrotransposons are mobile elements that constitute 17–20% of the human genome. Strong correlations between abnormal L1 expression and several human diseases have been reported. This has motivated increasing interest in accurate quantification of the number of L1 copies present in any given biologic specimen. A main obstacle toward this aim is that L1s are relatively long DNA segments with regions of high variability, or largely present in the human genome as truncated fragments. These particularities render traditional alignment strategies, such as seed-and-extend inefficient, as the number of segments that are similar to L1s explodes exponentially. This study uses the pattern matching methodology for more accurate identification of L1s. We validate experimentally the superiority of pattern matching for L1 detection over alternative methods and discuss some of its potential applications. Results Pattern matching detected full-length L1 copies with high precision, reasonable computational time, and no prior input information. It also detected truncated and significantly altered copies of L1 with relatively high precision. The method was effectively used to annotate L1s in a target genome and to calculate copy number variation with respect to a reference genome. Crucial to the success of implementation was the selection of a small set of k-mer probes from a set of sequences presenting a stable pattern of distribution in the genome. As in seed-and-extend methods, the pattern matching algorithm sowed these k-mer probes, but instead of using heuristic extensions around the seeds, the analysis was based on distribution patterns within the genome. The desired level of precision could be adjusted, with some loss of recall. Conclusion Pattern matching is more efficient than seed-and-extend methods for the detection of L1 segments whose characterization depends on a finite set of sequences with common areas of low variability. We propose that pattern matching may help establish correlations between L1 copy number and disease states associated with L1 mobilization and evolution.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.