The sequencing market has grown steadily over recent years, with different approaches to reading DNA information, each prone to different types of errors. Multiple studies have demonstrated the impact of sequencing errors on different applications of Next Generation Sequencing (NGS), making error correction a fundamental initial step. The methods in the literature use different approaches and fit different types of problems. We analysed 50 methods divided into five main approaches (k-spectrum, suffix arrays, multiple sequence alignment, read clustering and probabilistic models). They are stand-alone (not published as part of a suite) and target raw, unprocessed data without an existing reference genome (de Novo). These correctors handle one or more sequencing technologies using the same or different approaches. They face general challenges (sometimes with specific traits for specific technologies) such as repetitive regions, uncalled bases and ploidy. Even assessing their performance is a challenge in itself because of the approach taken by various authors, the unknown factor (de Novo) and the behaviour of the third-party tools employed in the benchmarks. This work aims to help the researcher in the field advance the state of the art, the educator have a brief but comprehensive companion and the bioinformatician choose the right tool for the right job.

Correspondence to: asalic@posgrado.upv.es

Next Generation Sequencing (NGS) appeared in 2005, and since then its market has grown steadily, with various technologies being developed. NGS has evolved faster than Moore's law in Computer Science, allowing us to sequence and assemble large genomes like the Loblolly Pine with 22 Gb [1] or the Norway Spruce with 20 Gb [2] for a reasonable cost in time and resources. However, there are many other species (e.g. Amoeba dubia, with an estimated genome size of 670 Gb [3], about 200 times the size of the human genome) that are still challenging to assemble.
The errors introduced by the sequencing process are one of the main reasons NGS data has to be corrected before any further use. Multiple studies have demonstrated the impact of sequencing errors on different applications of NGS, making error correction a fundamental initial step [4][5][6][7]. There are many error correction tools in the literature that cope with different technologies and error types. However, to our knowledge, there is no complete, objective review of the modern methods that could help researchers, educators and users at the same time. There are benchmarks summarizing a number of methods, but none focuses extensively on the implementation, features and the overall domain (including challenges). Our work synthesises 50 de Novo stand-alone error-correction software tools. The Supplementary Material includes the description of the approach used to search the literature along with the inclusion criteria. The article continues with the motivation (also containing a brief description of the sequencing technologies and various err...
In this paper we propose a distributed architecture to provide machine learning practitioners with a set of tools and cloud services that cover the whole machine learning development cycle, ranging from model creation, training, validation and testing to serving models as a service, sharing and publication. In this respect, the DEEP-Hybrid-DataCloud framework allows transparent access to existing e-Infrastructures, effectively exploiting distributed resources for the most compute-intensive tasks of the machine learning development cycle. Moreover, it provides scientists with a set of Cloud-oriented services to make their models publicly available by adopting a serverless architecture and a DevOps approach, making it easy to share, publish and deploy the developed models.

INDEX TERMS Cloud computing, computers and information processing, deep learning, distributed computing, machine learning, serverless architectures.
Data analysis of public transportation data in large cities is a challenging problem. Managing data ingestion, data storage, data quality enhancement, modelling and analysis requires intensive computing and a non-trivial amount of resources. In EUBra-BIGSEA (Europe-Brazil Collaboration of Big Data Scientific Research Through Cloud-Centric Applications), we address such problems in a comprehensive and integrated way. EUBra-BIGSEA provides a platform for building up data analytic workflows on top of elastic cloud services without requiring skills related to either programming or cloud services. The approach combines cloud orchestration, Quality of Service and automatic parallelisation on a platform that includes a toolbox for implementing privacy guarantees and data quality enhancement as well as advanced services for sentiment analysis, traffic jam estimation and trip recommendation based on estimated crowdedness. All developments are available under Open Source licenses (