BACKGROUND: Long-read sequencing technology is becoming increasingly popular for Precision Medicine applications like variant calling from Whole Genome Sequencing (WGS) and for metagenomics applications like microbial abundance estimation. Minimap2 is the state-of-the-art aligner and mapper used by the leading long-read sequencing technologies today. However, Minimap2 is very slow for long noisy reads: 60-70% of its run time on a CPU comes from the highly sequential chaining step. Most Point-of-Care computational workflows in long-read sequencing use Graphics Processing Units (GPUs). We present minimap2-accelerated (mm2-ax), a heterogeneous design for sequence mapping and alignment in which the compute-intensive chaining step of Minimap2 is sped up on the GPU, and we demonstrate its time and cost benefits. RESULTS: We extract better intra-read parallelism from chaining, without losing mapping accuracy, by forward-transforming Minimap2's chaining algorithm. Further, we use the high memory available on modern cloud instances to improve GPU performance: we convert the sparse vector that defines the chaining workload into a dense one, which raises arithmetic intensity (more operations per byte of data fetched from high-latency global memory). We also optimize for better workload balancing, data locality and minimal branch divergence on the GPU. We show that mm2-ax on an NVIDIA A100 GPU improves the chaining step with 12.6-5X Speedup and 9.44-3.77X Speedup:Costup over the fastest version of Minimap2, mm2-fast, benchmarked on a single Google Cloud Platform instance of 30 SIMD cores (Intel Cascade Lake with AVX-512).
Background: Long-read sequencing technology is becoming increasingly popular for Precision Medicine applications like variant calling from Whole Genome Sequencing (WGS) and for metagenomics applications like microbial abundance estimation. Minimap2 is the state-of-the-art aligner and mapper used by the leading long-read sequencing technologies today. However, Minimap2 is very slow for long noisy reads: ∼60-70% of its run time on a CPU comes from the highly sequential chaining step. On the other hand, most Point-of-Care computational workflows in long-read sequencing use Graphics Processing Units (GPUs). We present minimap2-accelerated (mm2-ax), a heterogeneous design for sequence mapping and alignment in which the compute-intensive chaining step of Minimap2 is sped up on the GPU, and we demonstrate its time and cost benefits.
Results: We extract better intra-read parallelism from chaining, without losing mapping accuracy, by forward-transforming Minimap2’s chaining algorithm. Further, we use the high memory available on modern cloud instances to improve GPU performance: we convert the sparse vector that defines the chaining workload into a dense one, which raises arithmetic intensity (more operations per byte of data fetched from high-latency global memory). We also optimize for better workload balancing, data locality and minimal branch divergence on the GPU. We show that mm2-ax on an NVIDIA A100 GPU improves the chaining step with 12.6-5X speedup and 9.44-3.77X speedup:costup over the fastest version of Minimap2, mm2-fast, benchmarked on a single Google Cloud Platform instance of 30 SIMD cores.
Conclusions: mm2-ax is Minimap2 sped up on the GPU without losing mapping accuracy. The mm2-ax executable is available at: https://doi.org/10.5281/zenodo.6374533.
Long-read sequencing technology is becoming increasingly popular for Precision Medicine applications like Whole Genome Sequencing (WGS) and microbial abundance estimation. Minimap2 is the state-of-the-art aligner and mapper used by the leading long-read sequencing technologies today. However, Minimap2 is very slow for long noisy reads: ∼60-70% of its run time on a CPU comes from the highly sequential chaining step. On the other hand, most Point-of-Care computational workflows in long-read sequencing use Graphics Processing Units (GPUs). We present minimap2-accelerated (mm2-ax), a heterogeneous design for sequence mapping and alignment in which Minimap2’s compute-intensive chaining step is sped up on the GPU, and we demonstrate its time and cost benefits. We extract better intra-read parallelism from chaining, without losing mapping accuracy, by forward-transforming Minimap2’s chaining algorithm. Further, we make better use of the high memory available on modern cloud instances, in addition to improving workload balancing and data locality and minimizing branch divergence on the GPU. We show that mm2-ax on an NVIDIA A100 GPU improves the chaining step with 12.6-5X speedup and 9.44-3.77X speedup:costup over the fastest version of Minimap2, mm2-fast, benchmarked on a single Google Cloud Platform instance of 30 SIMD cores.
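The idea of forward-transforming a chaining recurrence can be illustrated with a minimal Python sketch. This is not the mm2-ax implementation or Minimap2's actual scoring: the `gain` function, the window sizes `n_pred`/`n_succ`, and the anchor layout are all simplified assumptions chosen for illustration. In the backward form, each anchor scans its predecessors to pull the best score; in the forward form, each anchor pushes its already-final score to its successors, so the inner-loop updates are independent of one another and could in principle be assigned to parallel GPU threads.

```python
from typing import List, Optional, Tuple

Anchor = Tuple[int, int]  # (reference position, query position), sorted by reference


def gain(a: Anchor, b: Anchor, w: int = 15, max_gap: int = 5000) -> Optional[int]:
    """Hypothetical, simplified score for extending a chain from a to b.

    Returns None when b cannot follow a. Not Minimap2's real scoring function.
    """
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    if dx <= 0 or dy <= 0 or max(dx, dy) > max_gap:
        return None
    # Reward newly covered bases (capped at w), penalize the gap difference.
    return min(dx, dy, w) - abs(dx - dy) // 10


def chain_backward(anchors: List[Anchor], n_pred: int = 50, w: int = 15) -> List[int]:
    """Backward form: anchor i pulls the best score from its predecessors j < i."""
    score = [w] * len(anchors)
    for i in range(len(anchors)):
        for j in range(max(0, i - n_pred), i):
            g = gain(anchors[j], anchors[i], w)
            if g is not None and score[j] + g > score[i]:
                score[i] = score[j] + g
    return score


def chain_forward(anchors: List[Anchor], n_succ: int = 50, w: int = 15) -> List[int]:
    """Forward form: anchor j pushes its score to its successors i > j.

    When the outer loop reaches j, every predecessor has already pushed its
    update, so score[j] is final. Each inner iteration writes a different
    score[i], which is what makes the inner loop a candidate for parallelism.
    """
    score = [w] * len(anchors)
    for j in range(len(anchors)):
        for i in range(j + 1, min(len(anchors), j + n_succ + 1)):
            g = gain(anchors[j], anchors[i], w)
            if g is not None and score[j] + g > score[i]:
                score[i] = score[j] + g
    return score


# Illustrative anchors (made-up positions), sorted by reference coordinate.
anchors = [(10, 10), (30, 32), (50, 50), (70, 200), (90, 92), (95, 97)]
```

Under these assumptions both loops visit the same set of (predecessor, successor) pairs with the same window size, so `chain_backward(anchors)` and `chain_forward(anchors)` return identical scores; only the direction of the data dependence changes.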