In this paper, we present HPMA, a graphics processing unit (GPU) accelerated metagenome sequence alignment algorithm for a collection of DNA sequences. The algorithm supports all-to-all pairwise local alignment on NVIDIA GPUs. HPMA builds on a GPU alignment algorithm that we developed earlier and adds a filter module, a new kernel function that we designed and developed based on the suffix array data structure. The filter module improves performance by identifying the subset of sequences that meet a user-defined similarity threshold and should therefore be considered for alignment. HPMA can also balance the workload between the CPU and the GPU. HPMA allows us to preprocess massively large metagenomes in a reasonable amount of time, in response to the increasing speed of NGS sequencers. The performance of HPMA has been evaluated on a cluster of Kepler-based Tesla K20 GPUs using a variety of short DNA sequence datasets.

We evaluate HPMA thoroughly with four test sets. The first two test sets comprise 10 simulated datasets in which read length varies from 72 to 750 base pairs. The third test set is designed to allow a comparison with published results for GSWABE, a competing GPU alignment tool. The fourth test set is an actual metagenome of over 2 million sequences with an average length of 270 bp. We used a cluster of NVIDIA K20 GPUs in the Stampede supercomputer at the Texas Advanced Computing Center (Austin, TX, USA). When running on a cluster of 10 NVIDIA K20 GPUs, HPMA is able to align 2 million simulated metagenome sequences of length 300 bp in 160 seconds. In the case of real metagenomic data, HPMA is able to align 2,038,516 sequences with an average length of 270 bp in 60 seconds.
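
The filter-then-align idea described above can be illustrated with a minimal CPU-only Python sketch. It is not HPMA's GPU kernel, and it rests on several assumptions: the similarity threshold is interpreted as a minimum shared exact-match length (`min_match`), the suffix array is built naively by sorting, and the function names and scoring parameters (`match`, `mismatch`, `gap`) are hypothetical choices made for illustration. Only read pairs that pass the suffix-array filter are scored with Smith-Waterman local alignment.

```python
from itertools import combinations


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def candidate_pairs(reads, min_match):
    """Pairs (i, j), i < j, of reads sharing an exact substring of
    length >= min_match (assumed meaning of the similarity threshold)."""
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    text, owner, remaining = "", [], []
    for idx, read in enumerate(reads):
        for off in range(len(read)):
            owner.append(idx)                  # read covering this position
            remaining.append(len(read) - off)  # characters left in that read
        owner.append(-1)                       # separator position
        remaining.append(0)
        text += read + "$"                     # '$' never occurs in DNA
    # Keep suffixes with at least min_match characters inside their own read.
    seeds = [p for p in build_suffix_array(text) if remaining[p] >= min_match]
    # Suffixes with the same min_match-character prefix are adjacent in the
    # suffix array, so one linear scan groups the reads that share a seed.
    pairs, group, group_key = set(), set(), None
    for p in seeds:
        key = text[p:p + min_match]
        if key != group_key:
            pairs.update(combinations(sorted(group), 2))
            group, group_key = set(), key
        group.add(owner[p])
    pairs.update(combinations(sorted(group), 2))
    return pairs


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    prev, best = [0] * (len(b) + 1), 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


def filtered_all_to_all(reads, min_match):
    """Score only the read pairs that pass the suffix-array filter."""
    return {(i, j): smith_waterman_score(reads[i], reads[j])
            for i, j in sorted(candidate_pairs(reads, min_match))}


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "TTACGTACGG", "GGGGGCCCCC"]
    print(filtered_all_to_all(reads, min_match=6))
```

In this toy input only the first two reads share a 6-base exact match, so only that pair is aligned and the third read is skipped entirely. This is the kind of saving a filter is meant to provide in an all-to-all setting.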