Abstract - This work presents the design and construction of a software package that gathers the tools required for bioinformatics research on RNA-Seq data. Starting from the problem, a workflow is proposed that covers the main stages in the treatment of transcriptomic and next-generation sequencing data: preprocessing, mapping, assembly, annotation, and differential expression. The result is an integrated, user-friendly working platform that eases the work of researchers in the field of transcriptomics. Availability: http://tinyurl.com/Download-RNA-Seq-UD Keywords - bioinformatics; transcriptomics; RNA-Seq; next-generation sequencing.
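As a sketch of how such a workflow might be orchestrated, the following Python skeleton runs the five stages named in the abstract in order. The stage commands are placeholders (the abstract does not name the underlying tools), so each echo would be replaced by the actual preprocessing, mapping, assembly, annotation, and differential-expression invocations.

    import subprocess

    # Hypothetical stage commands -- the abstract names the stages but not
    # the tools behind them, so these entries are placeholders.
    PIPELINE = [
        ("preprocessing", ["echo", "quality-trim and filter reads"]),
        ("mapping", ["echo", "align reads to a reference"]),
        ("assembly", ["echo", "assemble transcripts"]),
        ("annotation", ["echo", "annotate transcripts"]),
        ("differential_expression", ["echo", "test for expression changes"]),
    ]

    def run_pipeline(stages):
        """Run each stage in order, stopping on the first failure."""
        for name, cmd in stages:
            print(f"[{name}] {' '.join(cmd)}")
            subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        run_pipeline(PIPELINE)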
The Internet of Things (IoT) and artificial intelligence provide a growing number of solutions for capturing data effectively and taking it through processing and analysis stages to extract valuable information. Technological tools are currently applied to counteract incidents in motorcycle driving, whether they originate in the vehicle itself or in its surroundings. Incidents in motorcycle driving are increasing along with the demand for these vehicles, which makes it important to work toward reducing the risk of road accidents through the analysis of dynamic behavior while driving. This research began with the detection and storage of data on the dynamic acceleration of a motorcycle while driving, captured with a 3-axis accelerometer to generate a dataset. The dataset was processed and analyzed, then fed to three predictive classification models based on machine learning: decision trees, K-nearest neighbors, and random forests. Each model's performance was evaluated on the task of classifying the level of accident risk associated with a driving style defined by certain acceleration levels. The random forest model performed slightly better than the other two, with 97.24% accuracy and recall, 97.16% precision, and a 97.17% F1 score.
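A minimal scikit-learn sketch of this three-model comparison is shown below. The synthetic 3-axis data, the three risk levels, and the hyperparameters are assumptions, since the abstract does not describe the dataset layout or the model settings.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import (accuracy_score, f1_score,
                                 precision_score, recall_score)
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data: rows of (ax, ay, az) accelerometer samples with a
    # risk-level label; the real dataset layout is not given in the abstract.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))      # 3-axis acceleration features
    y = rng.integers(0, 3, size=1000)   # e.g. low / medium / high risk

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)

    models = {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "random_forest": RandomForestClassifier(n_estimators=100,
                                                random_state=0),
    }

    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        print(name,
              f"acc={accuracy_score(y_te, pred):.4f}",
              f"prec={precision_score(y_te, pred, average='macro'):.4f}",
              f"rec={recall_score(y_te, pred, average='macro'):.4f}",
              f"f1={f1_score(y_te, pred, average='macro'):.4f}")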
This article compares the performance of a multi-class least-squares support vector machine (multi-class Least Square Support Vector Machine, mc-LSSVM) against a multi-class logistic regression classifier on the problem of recognizing handwritten numeric digits (0-9). The comparison used a dataset of 5,000 images of handwritten digits (500 images for each digit from 0 to 9), each image 20 x 20 pixels. The input to each evaluated system was the 400-dimensional vector corresponding to each image (no feature extraction was performed). Both classifiers use the one-vs-all (OneVsAll) strategy to enable multi-class classification and random cross-validation for minimizing the cost function. The comparison metrics were accuracy and training time under the same computational conditions. Both techniques achieved accuracy above 95%, with LS-SVM slightly more accurate. However, a notable difference was found in computational cost: LS-SVM requires 16.42% less training time than the logistic regression model under the same computational conditions.
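A minimal NumPy sketch of a binary LS-SVM in its standard dual form, wrapped in the one-vs-all loop the article describes, is shown below. The linear kernel and the regularization value gamma are assumptions, since the abstract does not specify them.

    import numpy as np

    def lssvm_train(X, y, gamma=1.0):
        """Solve the LS-SVM KKT linear system for labels y in {-1, +1}
        with a linear kernel; returns (alpha, b)."""
        n = X.shape[0]
        K = X @ X.T                                  # linear kernel Gram matrix
        Omega = (y[:, None] * y[None, :]) * K
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = y
        A[1:, 0] = y
        A[1:, 1:] = Omega + np.eye(n) / gamma
        rhs = np.concatenate(([0.0], np.ones(n)))
        sol = np.linalg.solve(A, rhs)
        return sol[1:], sol[0]

    def lssvm_decision(X_train, y, alpha, b, X_test):
        """Raw decision values f(x) = sum_i alpha_i y_i K(x, x_i) + b."""
        return (X_test @ X_train.T) @ (alpha * y) + b

    def one_vs_all_predict(X_train, labels, X_test, gamma=1.0, n_classes=10):
        """Train one binary LS-SVM per digit and pick the highest score."""
        scores = np.empty((X_test.shape[0], n_classes))
        for c in range(n_classes):
            y = np.where(labels == c, 1.0, -1.0)
            alpha, b = lssvm_train(X_train, y, gamma)
            scores[:, c] = lssvm_decision(X_train, y, alpha, b, X_test)
        return scores.argmax(axis=1)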
K-mer processing techniques based on partitioning the dataset on disk using minimizer-type seeds have led to a significant reduction in memory requirements; however, they add processes (the search for and distribution of super k-mers) that can be intensive given the large volume of data. This paper presents a massively parallel processing model to enable the efficient use of heterogeneous computation to accelerate the search for super k-mers based on seeds (minimizers or signatures). The model includes three main contributions: a new data structure called CISK, which represents the super k-mers and their minimizers in an indexed and compact way, and two massive parallelization patterns, one for obtaining the canonical m-mers of a set of reads and another for searching for super k-mers based on minimizers. The model was implemented as two OpenCL kernels. The evaluation of the kernels shows favorable results in execution time and memory requirements, supporting the use of the model for building heterogeneous solutions with simultaneous execution (workload distribution) that co-process using current super k-mer search methods on the CPU and the methods presented here on the GPU. The model implementation code is available in the repository: https://github.com/BioinfUD/K-mersCL.
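For reference, a plain sequential sketch of the minimizer-based super k-mer search that the paper parallelizes on the GPU is shown below (the CISK layout itself is not reproduced here). In this sketch a super k-mer is the maximal run of consecutive k-mers that share the same canonical minimizer.

    _COMP = str.maketrans("ACGT", "TGCA")

    def canonical(mer):
        """Lexicographic minimum of an m-mer and its reverse complement."""
        return min(mer, mer[::-1].translate(_COMP))

    def minimizer(kmer, m):
        """Smallest canonical m-mer contained in a k-mer."""
        return min(canonical(kmer[i:i + m]) for i in range(len(kmer) - m + 1))

    def super_kmers(read, k, m):
        """Split a read into (minimizer, super k-mer) pairs, where a super
        k-mer is the maximal substring whose consecutive k-mers all share
        the same minimizer."""
        out, start, cur = [], 0, None
        for i in range(len(read) - k + 1):
            mz = minimizer(read[i:i + k], m)
            if mz != cur:
                if cur is not None:
                    out.append((cur, read[start:i + k - 1]))
                start, cur = i, mz
        if cur is not None:
            out.append((cur, read[start:]))
        return out

    # Example: super_kmers("ACGTACGTGCA", k=5, m=3)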
Abstract - This paper assesses several de novo genome assemblers based on de Bruijn graphs, with the purpose of measuring the impact of disk partitioning techniques on computational requirements and of giving bioinformatics researchers a framework to identify the advantages, disadvantages, bottlenecks, and challenges of assemblers that use those techniques. The assemblers using disk partitioning were Minia and EPGA; the assemblers that do not use disk partitioning were ABySS and SOAPdenovo2. The parameters measured were RAM usage, processing time, parallelization, and disk read/write access. The dataset contained 36,504,800 short reads from human chromosome 14, and the assessment was run for two k-mer sizes: 31 and 55. The results were as follows: the tools based on disk partitioning used the least RAM, showed the highest I/O transfer intensity, and achieved the highest degree of parallelization.
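The abstract does not say how these measurements were taken; one plausible way to collect wall time and peak memory per assembler run on a Unix-like system is sketched below (ru_maxrss is reported in kilobytes on Linux).

    import resource
    import subprocess
    import time

    def run_and_measure(cmd):
        """Run one assembler invocation and return (wall seconds, peak child
        RSS in KB). getrusage aggregates over all waited-for children, so run
        each assembler in a fresh process for clean numbers."""
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True)
        wall = time.perf_counter() - t0
        peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
        return wall, peak_kb

    # Hypothetical invocation -- actual flags depend on the assembler version:
    # wall, peak_kb = run_and_measure(["minia", "-in", "reads.fastq",
    #                                  "-kmer-size", "31"])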