The data rates of wireless communication systems have increased exponentially over the last decade, and considerable effort has gone into meeting the resulting requirements. Providing hardware that supports high data rates while minimizing power consumption is therefore a key design challenge for Long Term Evolution (LTE) mobile terminals. In this paper, we introduce an optimized parallel software architecture for LTE mobile terminals that uses energy-aware scheduling and load balancing. We show that the proposed software architecture, running on single-, dual-, triple-, and quad-core hardware platforms, leads to energy savings of up to 39%. In addition, we investigate different hardware design options in order to minimize the average power consumption. A homogeneous multi-core with four cores running the optimized LTE software saves about one quarter of the energy compared to a single core running at a higher clock frequency and achieving the same data rate. When statistics on mobile user behavior are taken into account, the homogeneous four-core architecture provides the minimum average power consumption among the hardware architectures considered in this work.

Index Terms: Energy-aware scheduling, power saving, LTE protocol stack, multi-core mobile terminal, parallel embedded software, load balancing.