In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time and resource consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide software implementation that computes in linear time using one core as a minimum and a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Last, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.
Abstract. Metagenomics is an inherently complex field in which one of the primary goals is to determine the compositional organisms present in an environmental sample. Thereby, diverse tools have been developed that are based on the similarity search results obtained from comparing a set of sequences against a database. However, to achieve this goal there still are affairs to solve such as dealing with genomic variants and detecting repeated sequences that could belong to different species in a mixture of uneven and unknown representation of organisms in a sample. Hence, the question of whether analyzing a sample with reads provides further understanding of the metagenome than with contigs arises. The assembly yields larger genomic fragments but bears the risk of producing chimeric contigs. On the other hand, reads are shorter and therefore their statistical significance is harder to asses, but there is a larger number of them. Consequently, we have developed a workflow to assess and compare the quality of each of these alternatives. Synthetic read datasets beloging to previously identified organisms are generated in order to validate the results. Afterwards, we assemble these into a set of contigs and perform a taxonomic analysis on both datasets. The tools we have developed demonstrate that analyzing with reads provide a more trustworthy representation of the species in a sample than contigs especially in cases that present a high genomic variability.
BackgroundTechnical advances in mobile devices such as smartphones and tablets have produced an extraordinary increase in their use around the world and have become part of our daily lives. The possibility of carrying these devices in a pocket, particularly mobile phones, has enabled ubiquitous access to Internet resources. Furthermore, in the life sciences world there has been a vast proliferation of data types and services that finish as Web Services. This suggests the need for research into mobile clients to deal with life sciences applications for effective usage and exploitation.ResultsAnalysing the current features in existing bioinformatics applications managing Web Services, we have devised, implemented, and deployed an easy-to-use web-based lightweight mobile client. This client is able to browse, select, compose parameters, invoke, and monitor the execution of Web Services stored in catalogues or central repositories. The client is also able to deal with huge amounts of data between external storage mounts. In addition, we also present a validation use case, which illustrates the usage of the application while executing, monitoring, and exploring the results of a registered workflow. The software its available in the Apple Store and Android Market and the source code is publicly available in Github.ConclusionsMobile devices are becoming increasingly important in the scientific world due to their strong potential impact on scientific applications.Bioinformatics should not fall behind this trend. We present an original software client that deals with the intrinsic limitations of such devices and propose different guidelines to provide location-independent access to computational resources in bioinformatics and biomedicine. Its modular design makes it easily expandable with the inclusion of new repositories, tools, types of visualization, etc.Electronic supplementary materialThe online version of this article (10.1186/s12864-018-4439-x) contains supplementary material, which is available to authorized users.
Due to major breakthroughs in sequencing technologies throughout the last decades, the time and cost per sequencing experiment have reduced drastically, overcoming the data generation barrier during the early genomic era. Such a shift has encouraged the scientific community to develop new computational methods that are able to compare large genomic sequences, thus enabling large-scale studies of genome evolution. The field of comparative genomics has proven itself invaluable for studying the evolutionary mechanisms and the forces driving genome evolution. In this line, a full genome comparison study between 2 species requires a quadratic number of comparisons in terms of the number of sequences (around 400 chromosome comparisons in the case of mammalian genomes); however, when studying conserved syntenies or evolutionary rearrangements, many sequence comparisons can be skipped for not all will contain significant signals. Subsequently, the scientific community has developed fast heuristics to perform multiple pairwise comparisons between large sequences to determine whether significant sets of conserved similarities exist. The data generation problem is no longer an issue, yet the limitations have shifted toward the analysis of such massive data. Therefore, we present XCout, a Web-based visual analytics application for multiple genome comparisons designed to improve the analysis of large-scale evolutionary studies using novel techniques in Web visualization. XCout enables to work on hundreds of comparisons at once, thus reducing the time of the analysis by identifying significant signals between chromosomes across multiple species. Among others, XCout introduces several techniques to aid in the analysis of large-scale genome rearrangements, particularly (1) an interactive heatmap interface to display comparisons using automatic color scales based on similarity thresholds to ease detection at first sight, (2) an overlay system to detect individual signal contributions between chromosomes, (3) a tracking tool to trace conserved blocks across different species to perform evolutionary studies, and (4) a search engine to search annotations throughout different species.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.