Abstract: In this paper, we present an extended and optimized version of a smart Direct Memory Access (sDMA) controller that supports different on-the-fly protocol stack acceleration concepts for Long Term Evolution (LTE) mobile terminals. In addition to downlink processing, we analyse different on-the-fly hardware acceleration modes for uplink protocol stack processing in layer 2 (L2). Moreover, the system performance is further improved by adopting parallelization methods. The efficiency of on-the-fly hardware acceleration is demonstrated by comparing the transport block processing times with those achieved by a conventional hardware accelerator. To this end, a cycle-approximate virtual prototype of a state-of-the-art mobile phone platform based on an ARM1176 processor is simulated at LTE-Advanced data rates of up to 1 Gbit/s. In the uplink direction, we reduce the complexity of the sDMA controller while simultaneously improving the processing performance of the mobile platform. This is achieved through intelligent hardware/software partitioning and an optimized descriptor format. Furthermore, adopting parallelized on-the-fly hardware acceleration modes improves the system performance of a mobile device by up to 13 %. We show that the sDMA controller clearly outperforms the traditional approach, reaching speedups of up to 35 % and 66 % in the transport block processing times in the uplink and downlink directions, respectively.