Motivation: There are very few methods for de novo genome assembly based on the overlap graph approach. This approach is considered to give more exact results than the so-called de Bruijn graph approach, but at the cost of much greater running time and memory usage. Assembly methods built on the overlap graph model are often unable to process larger data sets, mainly because of the memory limits of a computer. For this reason, the assembly methods developed in recent decades have mainly been de Bruijn-based, which are fast and fairly accurate. However, these methods can fail for longer or more repetitive genomes, as they decompose reads into shorter fragments and lose part of the information. An efficient assembler that processes big data sets and uses the overlap graph model is still sought.

Results: We propose a new genome-scale de novo assembler based on the overlap graph approach, designed for short-read sequencing data. The method, ALGA, incorporates several new ideas that result in more exact contigs produced in a short time. These ideas include the creation of a sparse but quite informative graph, a reduction of the graph that includes a procedure related to the minimum spanning tree problem on a local subgraph, and a graph traversal combined with simultaneous analysis of the contigs stored so far. Unusually for genome assembly, the algorithm is almost parameter-free, with only one optional parameter to be set by the user. ALGA was compared with nine state-of-the-art assemblers in tests on genome-scale sequencing data obtained from real experiments on six organisms differing in size, coverage, GC content, and repetition rate. ALGA produced the best results in terms of the overall quality of genome reconstruction, understood as a good balance between genome coverage, accuracy, and length of the resulting sequences. The algorithm is one of the tools used to process data in the ongoing national project Genomic Map of Poland.

Availability: ALGA is available at http://alga.put.poznan.pl.

Supplementary information: Supplementary material is available at Bioinformatics online.
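The contrast drawn in this abstract between the two graph models can be illustrated with a minimal Python sketch. It is not ALGA's algorithm: the function names (`suffix_prefix_overlap`, `overlap_graph`, `de_bruijn_edges`), the toy reads, and the thresholds are all assumptions made for illustration. The overlap graph keeps whole reads as vertices, while the de Bruijn construction splits each read into k-mers and so no longer records which k-mers came from the same read.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` equal to a prefix of `b`.

    Only overlaps of at least `min_len` characters count; otherwise return 0.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len):
    """Naive overlap graph: vertices are whole reads, arcs carry overlap lengths.

    Every ordered pair of reads is compared, which is quadratic in the number
    of reads and only suitable for a toy example.
    """
    graph = {}
    for a, b in permutations(reads, 2):
        length = suffix_prefix_overlap(a, b, min_len)
        if length:
            graph[(a, b)] = length
    return graph


def de_bruijn_edges(reads, k):
    """De Bruijn graph arcs: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges.add((kmer[:-1], kmer[1:]))
    return edges


# Hypothetical toy reads, not taken from any real data set.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
og = overlap_graph(reads, min_len=3)
dbg = de_bruijn_edges(reads, k=3)
```

In this toy case the overlap graph contains two arcs, each with an overlap of 4 characters, chaining the three reads in order. The de Bruijn arcs describe the same sequence only through 2-character vertices, which is where information about longer repeats can be lost.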
The pseudoknot is a specific motif of RNA structure that strongly influences the overall shape and stability of a molecule. It occurs when nucleotides of two disjoint single-stranded fragments of the same chain, separated by a helical fragment, interact with each other and form base pairs. Pseudoknots are characterized by great topological diversity, and their systematic description is still a challenge. In our previous work, we introduced the pseudoknot order, a new coefficient representing the topological complexity of a pseudoknotted RNA structure. It is defined as the minimum number of base pair set decompositions needed to obtain an unknotted RNA structure. We suggested how it can be useful in interpreting and understanding the hierarchy of RNA folding. However, it is not trivial to unambiguously identify pseudoknots and determine their orders in an RNA structure. Therefore, since the introduction of this coefficient, we have worked on a method to reliably assign pseudoknot orders in accordance with the mechanisms that control the biological process leading to their formation in the molecule. Here, we introduce a novel graph coloring-based model for the problem of pseudoknot order assignment. We present a specialized heuristic operating on the proposed model and an alternative integer programming algorithm. The performance of both approaches is compared with that of the state-of-the-art algorithms that have so far been the most efficient in solving this problem. We summarize the results of computational experiments that evaluate our new methods in terms of classification quality on a representative data set originating from the non-redundant RNA 3D structure repository.
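The idea of assigning orders through graph coloring can be sketched as follows. This is a simplified illustration under stated assumptions, not the heuristic or the integer program from the paper: base pairs are given as hypothetical `(i, j)` position tuples with `i < j`, two pairs conflict when they cross, and a plain greedy pass gives each pair the lowest order not used by a crossing pair. A greedy pass does not guarantee the minimum number of orders, and the names `crosses` and `greedy_orders` are invented here.

```python
def crosses(p, q):
    """Return True if base pairs p and q cross, i.e. i < k < j < l after sorting."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def greedy_orders(pairs):
    """Assign each base pair the smallest order unused by any crossing pair.

    Pairs are visited in sorted order; pairs sharing an order never cross,
    so each order on its own is an unknotted (nested) set of base pairs.
    """
    orders = {}
    for p in sorted(pairs):
        used = {orders[q] for q in orders if crosses(p, q)}
        order = 0
        while order in used:
            order += 1
        orders[p] = order
    return orders


# Hypothetical toy structure: two nested helices that cross each other.
pairs = [(1, 10), (2, 9), (5, 15), (6, 14)]
orders = greedy_orders(pairs)
```

For this toy input the first helix, `(1, 10)` and `(2, 9)`, receives order 0 and the second, `(5, 15)` and `(6, 14)`, receives order 1, so the highest order assigned is 1.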