With increasing concern over personal privacy on mobile devices, biometric algorithms play an important role in enhancing security. As one of the most popular approaches, fingerprint verification is widely recognized and adopted as a personal identification interface by many commercial devices. However, its inherent computational complexity makes it difficult for fingerprint verification to achieve high performance on mobile platforms, which are battery-powered, size-limited, and production-cost-constrained devices. In addition to performance, energy efficiency is a significant consideration for such a fingerprint verification system. In this paper, we present an energy-efficient, OpenCL-based heterogeneous implementation of a fingerprint verification system on a commercial mobile platform, taking advantage of mobile CPUs and GPUs. We carefully analyze the workloads through system profiling to identify the parallelism and then partition the algorithm between the CPU and GPU. The experimental results show that our GPU implementation of DFT analysis achieves a 1.4X speedup and a 36.87% energy reduction compared to the CPU-only implementation on the mobile platform. The heterogeneous implementation of the entire fingerprint verification system achieves a 1.32X speedup and a 16.70% energy saving over the CPU-only solution. To the best of the authors' knowledge, this work is the first published implementation of an OpenCL-based fingerprint verification system accelerated by mobile GPUs on a heterogeneous mobile device. We believe our mapping methodology for this fingerprint verification system can be generalized to map similar applications onto heterogeneous mobile devices.