In this paper, we propose a joint algorithm-level and code-level optimization scheme that makes real-time software H.264/AVC video decoding feasible on ARM-based platforms for mobile multimedia applications. At the algorithm level, we propose a fast interpolation scheme, a zero-skipping technique for texture decoding, a fast boundary-strength decision for in-loop filtering, and a pattern-matching algorithm for CAVLD. At the code level, we propose design techniques that minimize memory accesses and branches. Experimental results show that the complexity of the H.264 video decoder is reduced by up to 93% compared with the reference software JM9.7. The optimized decoder achieves QCIF@30Hz video decoding on an ARM9 processor running at a 120MHz clock.
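To make the zero-skipping idea for texture decoding concrete, the following is a minimal sketch in C, not the implementation evaluated in this paper. It assumes that the entropy decoder reports the number of nonzero coefficients in each 4x4 block, and it uses the standard H.264 4x4 inverse integer transform on already-dequantized coefficients. The names `reconstruct_block`, `idct4x4_add`, and `nonzero_count` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a reconstructed sample to the 8-bit pixel range. */
static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Full H.264 4x4 inverse integer transform; the residual is added to the
 * prediction already stored in dst. Coefficients are assumed dequantized.
 * Right shifts of negative values are assumed to be arithmetic. */
void idct4x4_add(uint8_t *dst, int stride, const int16_t coef[16]) {
    int tmp[16];
    for (int i = 0; i < 4; i++) {               /* horizontal pass */
        const int16_t *d = coef + 4 * i;
        int e0 = d[0] + d[2], e1 = d[0] - d[2];
        int e2 = (d[1] >> 1) - d[3], e3 = d[1] + (d[3] >> 1);
        tmp[4 * i + 0] = e0 + e3; tmp[4 * i + 1] = e1 + e2;
        tmp[4 * i + 2] = e1 - e2; tmp[4 * i + 3] = e0 - e3;
    }
    for (int j = 0; j < 4; j++) {               /* vertical pass */
        int e0 = tmp[j] + tmp[8 + j], e1 = tmp[j] - tmp[8 + j];
        int e2 = (tmp[4 + j] >> 1) - tmp[12 + j];
        int e3 = tmp[4 + j] + (tmp[12 + j] >> 1);
        int f[4] = { e0 + e3, e1 + e2, e1 - e2, e0 - e3 };
        for (int i = 0; i < 4; i++) {
            uint8_t *p = dst + i * stride + j;
            *p = clip_u8(*p + ((f[i] + 32) >> 6));
        }
    }
}

/* Zero-skipping wrapper (hypothetical). Returns the path taken:
 * 0 = all-zero block, transform skipped; 1 = DC-only shortcut; 2 = full transform. */
int reconstruct_block(uint8_t *dst, int stride, const int16_t coef[16], int nonzero_count) {
    assert(stride >= 4);
    if (nonzero_count == 0)
        return 0;                               /* prediction is already the output */
    if (nonzero_count == 1 && coef[0] != 0) {
        int dc = (coef[0] + 32) >> 6;           /* DC-only residual is constant */
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                dst[i * stride + j] = clip_u8(dst[i * stride + j] + dc);
        return 1;
    }
    idct4x4_add(dst, stride, coef);
    return 2;
}
```

In this sketch the all-zero path costs a single comparison, and the DC-only path replaces both transform passes with one shift and sixteen additions, which is where the saving comes from at low bit rates when many blocks carry little or no residual.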