Background: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing aggravates this problem and necessitates customisable pre-processing algorithms.

Results: SeqTrim has been implemented both as a Web application and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.

Conclusions: SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
Abstract. Bioinformatics tools are required to produce reliable, high quality data devoid of unwanted sequences in the preprocessing stage of current sequencing and EST projects. In this paper we describe SeqTrim, an algorithm designed to extract the insert sequence from any sequence read, free of any foreign, contaminant or unwanted sequence, whatever the experimental process was. SeqTrim is easy to install and able to identify the sequence insert by removing low quality sequences, cloning vector, poly A or T tails, adaptors, and sequences that can be considered contaminants. It is easy to use and can be run as a stand-alone application or as a web page. The default parameters of the algorithm suit most cases, but a configuration file can be provided along with the input sequences. SeqTrim admits several input and output formats (with and without quality values), which enables its inclusion in existing or newly defined sequence processing workflows. SeqTrim is under continuous refinement thanks to collaboration between biologists and computer scientists, which has succeeded in dealing correctly with most sequence cases and opens the possibility of including new capabilities to manage new kinds of bad sequences.
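As a rough illustration of two of the trimming steps named above (low quality removal and poly A or T tail removal), the following is a minimal sketch and not SeqTrim's actual implementation. The function names, the quality threshold `min_q`, and the minimum tail length `min_tail` are hypothetical choices made for this example; vector, adaptor and contaminant removal are not covered.

```python
import re


def longest_good_region(quals, min_q=20):
    """Return (start, end) of the longest run of bases with quality >= min_q.

    Hypothetical helper: a simple stand-in for a quality-trimming step.
    """
    best = (0, 0)
    start = None
    # A sentinel value below any threshold closes a run that reaches the end.
    for i, q in enumerate(list(quals) + [-1]):
        if q >= min_q:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def trim_poly_tails(seq, min_tail=10):
    """Remove a trailing poly-A run and a leading poly-T run of min_tail+ bases."""
    seq = re.sub(r"A{%d,}$" % min_tail, "", seq)
    seq = re.sub(r"^T{%d,}" % min_tail, "", seq)
    return seq


def extract_insert(seq, quals, min_q=20, min_tail=10):
    """Keep the longest good-quality region, then strip poly A/T tails."""
    start, end = longest_good_region(quals, min_q)
    return trim_poly_tails(seq[start:end], min_tail)


if __name__ == "__main__":
    read = "NN" + "ACGTACGT" + "A" * 12 + "NN"
    quality = [5, 5] + [30] * 20 + [5, 5]
    print(extract_insert(read, quality))
```

Applying the quality step before the tail step is an assumption of this sketch; the order of stages in a real pipeline may differ.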
Current genomic analyses often require the management and comparison of big data using desktop bioinformatic software that was not developed with multicore distribution in mind. The task-farm SCBI MapReduce is intended to simplify the trivial parallelisation and distribution of new and legacy software and scripts for biologists who are interested in using computers but are not skilled programmers. In the case of legacy applications, there is no need to modify or rewrite the source code. It can be used on anything from multicore workstations to heterogeneous grids. Tests have demonstrated that speed-up scales almost linearly and that distribution in small chunks increases it. It is also shown that SCBI MapReduce takes advantage of shared storage when necessary, is fault-tolerant, allows for resuming aborted jobs, does not need special hardware or virtual machine support, and provides the same results as parallelised legacy software. The same is true for interrupted and relaunched jobs. As proof-of-concept, distribution of a compiled version of Blast+ in the SCBI Distributed Blast gem is given, indicating that other blast binaries can be used while maintaining the same SCBI Distributed Blast code. Therefore, SCBI MapReduce suits most parallelisation and distribution needs in, for example, gene and genome studies.
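The task-farm idea described above, splitting the input into small chunks, handing each chunk to a worker, and collecting results in input order, can be sketched as follows. This is an illustrative Python sketch using the standard library, not the SCBI MapReduce API; `chunked`, `process_chunk` and `task_farm` are hypothetical names, and `process_chunk` is a placeholder for the legacy tool that would be run on each chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def process_chunk(chunk):
    """Placeholder for the legacy tool: report the length of each sequence.

    In a real setting this is where an unmodified external program would be
    invoked on the chunk (an assumption of this sketch).
    """
    return [(name, len(seq)) for name, seq in chunk]


def task_farm(records, chunk_size=2, workers=4):
    """Distribute chunks to a pool of workers and flatten results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        chunk_results = pool.map(process_chunk, chunked(records, chunk_size))
        return [item for result in chunk_results for item in result]


if __name__ == "__main__":
    reads = [("a", "ACGT"), ("b", "AC"), ("c", "A"), ("d", "ACGTAC")]
    print(task_farm(reads))
```

Threads are used here only to keep the sketch short and portable; fault tolerance, job resumption and distribution across machines, which the abstract attributes to SCBI MapReduce, are outside its scope.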
One of the main research areas of human-computer interaction is the study of the different ways in which users communicate or interact with the computer [1, 2]. Each interaction style offers its own way of organising system functionality, managing user inputs, and displaying information. Two main approaches to interacting with modern devices can be considered: the conversational world and the model world. The former is sequential and based on text. The latter uses graphics and metaphors [3], such as "Windows, Icons, Menus and Pointers" (WIMP), to give the user asynchronous and free management of objects on the screen. Users can see and predict the behaviour of familiar objects through metaphors. They then follow their natural intuition to manipulate them, receiving immediate feedback. The success of this Direct