Several decades have passed since Frederick Sanger and his colleagues developed their revolutionary DNA sequencing method. After the Human Genome Project, the interval between successive sequencing technologies began to shrink, while the volume of scientific knowledge continued to grow exponentially. Following Sanger sequencing, considered the first generation, new generations of DNA sequencing were steadily introduced into practice. Advances in next generation sequencing (NGS) technologies have contributed significantly to this trend by reducing costs and generating massive amounts of sequencing data. To date, there are three generations of sequencing technologies. Second generation sequencing, currently the most commonly used NGS technology, consists of library preparation, amplification and sequencing steps, while in third generation sequencing individual nucleic acids are sequenced directly to avoid bias and achieve higher throughput. The development of new generations of sequencing has made it possible to overcome the limitations of traditional DNA sequencing methods, and these technologies have found application in a wide range of projects in molecular biology. On the other hand, the development of next generation technologies raises many technical problems that need to be analyzed in depth and solved. Each generation and sequencing platform, owing to its methodological approach, has specific advantages and disadvantages that determine its suitability for certain applications. Assessing these characteristics, limitations and potential applications therefore helps to shape the directions for further research on sequencing technologies.
Counting the occurrences of different k-mers is a recurring task in genome assembly. Analysis of the frequency distribution of k-mers makes it possible to find assembly errors in already formed contigs. With the current development of instrumentation for genetic analysis, there is an urgent need for methods of assessing the quality of genome assembly. Such techniques will make it possible to assess the reliability of genetic analysis in existing and newly developed devices. In this work, based on an analysis of various software tools, programs were selected to assess the quality of genome assembly for parallel sequencers. Using the selected programs, data obtained on the domestic parallel sequencer Nanofor SPS were processed. Based on the results of processing these data, the quality of the genome assembly was assessed by k-mer analysis, and recommendations were given for improving the hardware and software of the Nanofor SPS device.
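As an illustration of the k-mer frequency analysis mentioned above, the following minimal sketch counts k-mers in a few toy reads and builds their frequency distribution (the number of distinct k-mers observed at each multiplicity). It is not the software used in the work; the function names `count_kmers` and `kmer_spectrum`, the toy reads and the choice k = 4 are assumptions made for illustration, and reverse-complement (canonical) k-mers are deliberately not merged.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring (k-mer) across the given sequences.

    Hypothetical helper for illustration: k-mers containing the ambiguous
    base 'N' are skipped, and strands are not merged into canonical form.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Toy reads (assumed data, not taken from any real sequencer).
reads = ["ACGTACGT", "CGTACGTA", "ACGTACGA"]
counts = count_kmers(reads, k=4)
spectrum = kmer_spectrum(counts)

for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

In real data, k-mers at very low multiplicity are frequently attributed to sequencing errors, so the low end of this distribution is one place where such an analysis might look first.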
Genomic sequencing cannot succeed without the development of information technologies and mathematical methods of data processing that establish features of the analyzed objects (nucleic acids) and trends in how they change. The volume of experimental data in genome research has grown significantly, and new methods and algorithms are required to process it.
The primary stage of data processing in devices for parallel genomic sequencing is the evaluation of the parameters of images obtained from video cameras in the form of electrical signals. The next stage is the construction of a nucleotide sequence by algorithms that depend on the operating principle of the sequencing device. At this stage, algorithms that evaluate quality indicators for every individual read are important. One way to assess quality is to use algorithms based on k-mer analysis. Counting the occurrences of k-mers during an experiment on a parallel sequencing system makes it possible to assess the reliability of the analysis. This article considers algorithms for processing genetic analyzer data.
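One way a k-mer based quality indicator for individual reads could look is sketched below: each read is scored by the fraction of its k-mers that occur at least `min_count` times in the whole data set. This is only an assumed example of the general idea, not the algorithm from the article; the helper names `kmers` and `solid_fraction`, the threshold `min_count=2`, the value k = 4 and the toy reads are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the list of all length-k substrings of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def solid_fraction(read, counts, k, min_count=2):
    """Fraction of a read's k-mers seen at least min_count times overall.

    Hypothetical per-read indicator: values near 1.0 mean the read is well
    supported by the other reads, values near 0.0 mean it is mostly made of
    rare k-mers. Reads shorter than k score 0.0.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    solid = sum(1 for km in read_kmers if counts[km] >= min_count)
    return solid / len(read_kmers)


# Toy data (assumed): the third read differs from the others by one base.
reads = ["ACGTACGT", "ACGTACGT", "ACGTTCGT"]
k = 4
counts = Counter(km for r in reads for km in kmers(r, k))
scores = [solid_fraction(r, counts, k) for r in reads]

for read, score in zip(reads, scores):
    print(read, round(score, 2))
```

A single substituted base changes up to k of a read's k-mers, which is why the third toy read scores much lower than the two identical ones.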