With advances in sequencing and assembly technologies, draft sequences are available for more and more genomes. However, these draft sequences commonly contain gaps, which affect various downstream analyses in biological studies. Gap filling methods can shorten the gaps and improve the completeness of draft genome sequences. Although some gap filling tools have been developed, their effectiveness and accuracy need to be improved. In this study, we develop a novel tool, called GapReduce, which fills gaps using paired reads. For a gap, GapReduce selects the reads whose mate reads are aligned to the left or the right flanking region, and partitions these reads into two sets. GapReduce then adopts different k values and frequency thresholds to iteratively construct De Bruijn graphs, which are used to find the correct path to fill the gap. To overcome the branching problems caused by repetitive regions and sequencing errors during path selection, GapReduce uses a novel approach that simultaneously considers the frequency and distribution of paired reads based on the partitioned read sets. We compare the performance of GapReduce with that of current popular gap filling tools. The experimental results demonstrate that GapReduce can produce satisfactory gap filling results, especially for long insert size datasets. GapReduce is publicly available for download at https://github.com/bioinfomaticsCSU/GapReduce.
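The iterative graph construction described above can be illustrated with a toy sketch. This is not GapReduce's implementation: the function names (`build_de_bruijn`, `find_path`, `fill_gap`), the order in which k values and thresholds are tried, and the plain depth-first search are all assumptions made for illustration, and the sketch ignores the paired-read partitioning and the frequency/distribution scoring used for branch resolution.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k, min_freq):
    """Build a De Bruijn graph: (k-1)-mer -> set of successor (k-1)-mers.

    Only k-mers observed at least `min_freq` times contribute an edge.
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_freq:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_path(graph, start, end, max_len):
    """Depth-first search for a sequence that begins at `start` and ends at `end`.

    `max_len` bounds the sequence length so cycles cannot loop forever.
    """
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end and len(seq) > len(start):
            return seq
        if len(seq) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            stack.append((nxt, seq + nxt[-1]))
    return None


def fill_gap(reads, left_flank, right_flank, k_values, freq_thresholds, max_len):
    """Try each (k, threshold) pair in turn (hypothetical ordering) until a
    path links the end of the left flank to the start of the right flank.

    Returns the sequence between the flanks, or None if no path is found.
    """
    for k in k_values:
        if k - 1 > min(len(left_flank), len(right_flank)):
            continue  # flanks too short to seed a (k-1)-mer
        start = left_flank[-(k - 1):]
        end = right_flank[:k - 1]
        for min_freq in freq_thresholds:
            graph = build_de_bruijn(reads, k, min_freq)
            path = find_path(graph, start, end, max_len)
            if path is not None:
                # Drop the flank-derived (k-1)-mers at both ends.
                return path[k - 1:-(k - 1)]
    return None
```

In this sketch, a stricter threshold that prunes a needed edge simply yields no path, and the loop falls through to the next threshold or k value. A real tool would also need a principled way to choose among competing branches, which is the role the abstract assigns to the paired-read frequency and distribution information.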