Abstract: This work presents the design and construction of a software package that gathers the tools required for bioinformatics research on RNA-Seq data. Starting from this problem, a workflow is proposed that covers the main stages of processing transcriptomic and next-generation sequencing data: preprocessing, mapping, assembly, annotation, and differential expression. The result is an integrated, user-friendly platform that facilitates the work of researchers in transcriptomics. Availability: http://tinyurl.com/Download-RNA-Seq-UD Keywords: bioinformatics; transcriptomics; RNA-Seq; next-generation sequencing.
K-mer processing techniques that partition the data set on disk using minimizer-type seeds have significantly reduced memory requirements; however, they add steps (searching for and distributing super k-mers) that can be computationally intensive given the large volume of data. This paper presents a massively parallel processing model that enables the efficient use of heterogeneous computing to accelerate the search for super k-mers based on seeds (minimizers or signatures). The model includes three main contributions: a new data structure called CISK for representing super k-mers and their minimizers in an indexed and compact way, and two massive parallelization patterns, one for obtaining the canonical m-mers of a set of reads and another for searching for super k-mers based on minimizers. The model was implemented as two OpenCL kernels. The evaluation of the kernels shows favorable execution times and memory requirements, which supports using the model to build heterogeneous solutions with simultaneous execution (workload distribution), in which the CPU runs the current super k-mer search methods while the GPU runs the methods presented here. The model implementation code is available in the repository: https://github.com/BioinfUD/K-mersCL.
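To make the terms in this abstract concrete, the following is a minimal sequential sketch of canonical m-mers, minimizers, and super k-mers. It is not the paper's OpenCL kernels or its CISK structure. It assumes plain lexicographic ordering of m-mers and compares minimizers by value rather than by position; the function names are illustrative only.

```python
# Illustrative sketch only: sequential CPU code, not the paper's OpenCL kernels.
# Assumption: the minimizer is the lexicographically smallest canonical m-mer.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(mer):
    """Return the smaller of a mer and its reverse complement."""
    rc = reverse_complement(mer)
    return mer if mer <= rc else rc


def minimizer(kmer, m):
    """Return the smallest canonical m-mer contained in a k-mer."""
    return min(canonical(kmer[i:i + m]) for i in range(len(kmer) - m + 1))


def super_kmers(read, k, m):
    """Split a read into maximal runs of consecutive k-mers that share
    the same minimizer. Returns a list of (minimizer, super_kmer) pairs."""
    result = []
    if len(read) < k:
        return result
    start = 0
    current = minimizer(read[:k], m)
    for i in range(1, len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if mz != current:
            # The run covers k-mers start..i-1, so it ends at index i-1+k.
            result.append((current, read[start:i - 1 + k]))
            start = i
            current = mz
    result.append((current, read[start:]))
    return result


if __name__ == "__main__":
    for mz, sk in super_kmers("AAAACCCC", k=4, m=2):
        print(mz, sk)
```

Each super k-mer can then be written to the disk partition selected by its minimizer, which is the distribution step the abstract refers to. Recomputing the minimizer for every k-mer, as done here, is the simplest approach and not the most efficient one.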
Abstract: This paper assesses several de novo genome assemblers based on de Bruijn graphs in order to measure the impact of disk partitioning techniques on computational requirements. It also aims to give bioinformatics researchers a framework for identifying the advantages, disadvantages, bottlenecks, and challenges of the assemblers that use those techniques. The assessed assemblers that use disk partitioning were Minia and EPGA; those that do not were ABySS and SOAPdenovo2. The parameters measured were RAM usage, processing time, parallelization, and disk read and write access. The dataset consisted of 36,504,800 short reads from human chromosome 14. The assessment was run for two k-mer sizes: 31 and 55. The results were as follows: the tools based on disk partitioning used the least RAM, had the highest I/O transfer intensity, and achieved the most parallelization.