The system-on-chip market relies on implementing multimedia products as embedded software modules on reusable architecture platforms. Efficiently implementing the JPEG 2000 encoder engine still challenges hardware and software developers because of its highly complex computational kernel. While several hardwired JPEG 2000 encoding modules exist, efficiently programming JPEG 2000 on reusable, embedded high-performance cores remains an open issue. We performed an exhaustive analysis of the execution speedup attainable when specialized software runs on different architectures built on a multimedia-oriented VLIW processor core. The results show that the compression effort can be reduced by more than 50% if a SIMD-extended architecture is adopted, and by 80% when the code is optimized for a multi-core architecture.
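The following relation converts the reported reductions into speedup factors. It assumes that "compression effort" means execution time (or cycle count) measured against the unoptimized baseline on the plain VLIW core. If a fraction $r$ of the baseline time $T_0$ is removed, the remaining time is $T = (1 - r)\,T_0$, so:

$$
S = \frac{T_0}{T} = \frac{1}{1 - r}, \qquad r > 0.5 \;\Rightarrow\; S > 2, \qquad r = 0.8 \;\Rightarrow\; S = 5 .
$$

Under this assumption, the SIMD-extended architecture runs the encoder more than twice as fast as the baseline, and the multi-core version runs it about five times as fast.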