In the Quadratic Assignment Problem (QAP), n units (usually departments, machines, or electronic components) must be assigned to n locations, given the distances between the locations and the flows between the units. The goal is to find the assignment that minimizes the sum of the products of distance traveled and flow between units. The QAP is a combinatorial problem that is difficult to solve to optimality even when n is relatively small (e.g., n = 30). In this paper, we develop a parallel tabu search algorithm that solves the QAP by leveraging the compute capabilities of current GPUs. The single-instruction, multiple-data (SIMD) algorithm is implemented on the Stampede cluster hosted by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. We enhance our implementation by exploiting the dynamic parallelism made available in the NVIDIA Kepler high-performance computing architecture. In a series of experiments on the well-known QAPLIB data sets, our algorithm produces solutions that are as good as the best known ones posted in QAPLIB; in the worst case, the gap to the best known solution was 0.83%. Given the wide applicability of the QAP, our algorithm has good potential to accelerate scholarly research in engineering, particularly in operations research and the design of electronic devices. To the best of our knowledge, this work is the first to successfully parallelize the recency-based tabu search metaheuristic for the QAP, which was implemented serially in [10]. Our work is also the first to exploit GPU dynamic parallelism in a tabu search metaheuristic for the QAP.
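To make the objective and the recency-based tabu rule concrete, the following is a minimal serial sketch in Python. It is not the GPU implementation described in this paper. The function names, the parameters `iterations`, `tenure`, and `seed`, and the 4-unit instance are all illustrative assumptions. For simplicity it recomputes the full cost of every candidate swap rather than using an incremental evaluation.

```python
import random

def qap_cost(flow, dist, perm):
    """Sum of flow[i][j] * dist[perm[i]][perm[j]], where perm[i] is the
    location assigned to unit i."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

def tabu_search(flow, dist, iterations=200, tenure=3, seed=0):
    """Illustrative serial tabu search over pairwise swaps (assumed design).

    A swap (i, j) that was just applied is forbidden for `tenure`
    iterations (recency-based memory), unless it would beat the best
    cost found so far (aspiration).
    """
    rng = random.Random(seed)
    n = len(flow)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = qap_cost(flow, dist, perm)
    best_perm, best_cost = perm[:], cost
    tabu_until = {}  # (i, j) -> first iteration at which the swap is allowed again

    for it in range(iterations):
        move = None
        for i in range(n - 1):
            for j in range(i + 1, n):
                # Try the swap, score it, then undo it.
                perm[i], perm[j] = perm[j], perm[i]
                c = qap_cost(flow, dist, perm)
                perm[i], perm[j] = perm[j], perm[i]
                is_tabu = tabu_until.get((i, j), 0) > it
                if is_tabu and c >= best_cost:
                    continue  # tabu and does not satisfy aspiration
                if move is None or c < move[0]:
                    move = (c, i, j)
        if move is None:
            continue  # every swap is tabu; wait for a tenure to expire
        cost, i, j = move
        perm[i], perm[j] = perm[j], perm[i]  # accept even if it worsens the cost
        tabu_until[(i, j)] = it + 1 + tenure
        if cost < best_cost:
            best_perm, best_cost = perm[:], cost
    return best_perm, best_cost

# Hypothetical 4-unit instance (not taken from QAPLIB).
flow = [[0, 3, 0, 2], [3, 0, 0, 1], [0, 0, 0, 4], [2, 1, 4, 0]]
dist = [[0, 22, 53, 53], [22, 0, 40, 62], [53, 40, 0, 55], [53, 62, 55, 0]]
best_perm, best_cost = tabu_search(flow, dist)
print(best_perm, best_cost)
```

The two inner loops over `(i, j)` score each swap independently of the others, so they are the natural part of such a search to evaluate in parallel.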