For the best results in quantitative polymerase chain reaction (qPCR) experiments, it is essential to design high-quality primers that satisfy a multitude of constraints and suit the purpose of the experiment. The constraints include many filtering constraints, homology tests against a huge number of off-target sequences, identical constraints across a batch of primers, exon spanning, and avoidance of single nucleotide polymorphism (SNP) sites. The target sequences are either taken from a database or given as FASTA sequences, and the experiment amplifies either each target sequence with its own primer pair, all designed under the same constraints, or all target sequences with a single primer pair. Many websites have been proposed, but none of them, including our previous MRPrimerW, fulfils all of the above features. Here, we describe MRPrimerW2, the updated version of MRPrimerW, which fulfils all of these features while maintaining the advantages of MRPrimerW in terms of the kinds and sizes of databases of valid primers and the number of search modes. To achieve this, we exploited GPU computation and a disk-based key-value store on a PCIe SSD. The complete set of 3 509 244 680 valid primers in MRPrimerW2 covers 99% of nine important organisms in an exhaustive manner. Free access: http://MRPrimerW2.com
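As a rough illustration of what a "filtering constraint" on a single primer can look like, the Python sketch below checks length, GC content, and an approximate melting temperature. It is not taken from MRPrimerW2: the function names, the threshold values, and the use of the simple Wallace rule for the melting temperature are all assumptions chosen for brevity.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a non-empty primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees Celsius (Wallace rule).

    2 degrees per A/T and 4 degrees per G/C; a crude estimate that is
    only meaningful for short oligonucleotides.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_filters(
    primer: str,
    min_len: int = 19,     # hypothetical thresholds, not the tool's defaults
    max_len: int = 23,
    min_gc: float = 0.40,
    max_gc: float = 0.60,
    min_tm: int = 58,
    max_tm: int = 62,
) -> bool:
    """Return True if the primer satisfies every single-primer constraint."""
    if not (min_len <= len(primer) <= max_len):
        return False
    if not (min_gc <= gc_fraction(primer) <= max_gc):
        return False
    return min_tm <= wallace_tm(primer) <= max_tm
```

Applying the same thresholds to every target in a batch is one way to read the "same constraints for batch design" requirement; real tools typically add further checks, such as self-complementarity, that are omitted here.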
Background
The design of valid, high-quality primers is essential for qPCR experiments. MRPrimer is a powerful MapReduce-based pipeline that combines primer design for target sequences with homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Because primers designed by MRPrimer are effective in qPCR analysis, it has been widely used for developing online design tools and building primer databases. However, MRPrimer is too slow to keep up with the exponentially growing sizes of sequence DBs, and its speed must therefore be improved.
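The idea of combining candidate generation with a homology test over the whole DB can be sketched in a few lines of Python. This is a deliberately simplified illustration, not MRPrimer's algorithm: it treats every fixed-length substring as a candidate, and it rejects a candidate only on an exact match (on either strand) in another sequence, whereas real homology tests typically also tolerate mismatches. All function names are hypothetical.

```python
# Translation table for complementing A, C, G, T; other symbols (e.g. N)
# are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_primers(seq: str, k: int) -> set[str]:
    """All length-k substrings of a sequence, as a set."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_primers(target_id: str, sequences: dict[str, str], k: int) -> set[str]:
    """Candidates from the target that occur in no other sequence.

    Simplifying assumption: only exact matches, on either strand,
    count as off-target hits.
    """
    candidates = candidate_primers(sequences[target_id], k)
    for seq_id, seq in sequences.items():
        if seq_id == target_id:
            continue
        off_target = candidate_primers(seq, k)
        off_target |= {reverse_complement(p) for p in off_target}
        candidates -= off_target
    return candidates
```

For example, with `sequences = {"t1": "ACGTTGCA", "t2": "TTGCAAAA"}` and `k = 4`, the candidates `TTGC` and `TGCA` from `t1` are discarded because they also occur in `t2`. Running this for every sequence in a large DB is quadratic in the naive form shown here, which hints at why the homology test dominates the running time.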
Results
We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of seven MapReduce steps, two of which are very time-consuming. GPrimer significantly improves the speed of those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access on the GPU and for workload balancing among GPU threads, and it copies these data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer running on a single machine with 4 GPUs achieves a 57-fold speedup for the entire pipeline and a 557-fold speedup for the most time-consuming step, compared with MRPrimer running on a cluster of six machines.
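The two ideas of workload balancing and streaming transfer can be illustrated on the host side with plain Python. The sketch below is not GPrimer's scheme and contains no GPU code: it uses a generic greedy heuristic (largest item first, assigned to the least-loaded worker) as a stand-in for balancing, and a simple generator as a stand-in for moving data in fixed-size batches. The function names and the notion of a per-item "workload" are assumptions.

```python
import heapq
from typing import Iterator


def balance(workloads: list[int], n_workers: int) -> list[list[int]]:
    """Assign item indices to workers so that total loads are roughly equal.

    Greedy heuristic: process items from largest to smallest and give
    each one to the worker with the smallest load so far.
    """
    heap = [(0, worker) for worker in range(n_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(n_workers)]
    order = sorted(range(len(workloads)), key=lambda i: workloads[i], reverse=True)
    for idx in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(idx)
        heapq.heappush(heap, (load + workloads[idx], worker))
    return assignment


def stream_batches(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive fixed-size batches, so that one batch can be
    transferred while another is being processed."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With `workloads = [7, 5, 4, 3, 1]` and two workers, the heuristic assigns items 0 and 3 to one worker and items 1, 2, and 4 to the other, for a total load of 10 each. On a GPU the analogous goal is to keep threads from idling while a few threads finish disproportionately large pieces of work.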
Conclusions
We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once, without an additional step that uses BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.