Mobile GPU applications usually constrain by the real-time requirement. However, FLOPS of mobile GPU is limited by the size and power supply of the SoC systems. Same to desktop GPUs, the mobile GPU consists of an on-chip memory hierarchy, and proper usage of memory hierarchy accelerates mobile GPU applications such as Discrete Wavelet Transform (DWT) to satisfy the real-time requirement. In this paper, by taking advantage of GPU shared memory in Tegra K1, a mobile GPU from Nvidia, we develop Bank Conflict Free Shared Memory Parallel DWT for mobile GPU applications. Computational results show that, with the display resolution of 640 × 350 (EGA), Bank Conflict Free Shared Memory Parallel DWT is significantly faster than SoC CPU-based DWT. Computational results also show that, with the display resolution of 320 × 200 (CGA), 640 × 480 (VGA), 800 × 600 (SVGA) and 1024 × 768 (XGA), Bank Conflict Free Shared Memory Parallel DWT can generally satisfy the real-time requirement.