Elliptic curve cryptography (ECC) is an excellent candidate for secure embedded multimedia applications because it provides a high level of security with small key sizes. In this research, we profile the performance of an ECC implementation, including execution time and data cache stalls, on the TriMedia TM1300 and the Intel Pentium 4. Based on this study, we identify the main bottlenecks of the ECC implementation and propose micro-architectural features that favor this application. Moreover, we present several integer multiplication schemes that improve performance on the TM1300 processor. In particular, the FIR-based multiplication is built with the special FIR instruction provided by the TM1300. The performance improvements of the proposed schemes are reported and discussed. Overall, we aim to provide a good understanding of the system architecture of secure embedded multimedia applications and of hardware and software cryptography implementations, using ECC as an example.
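Why an FIR instruction can help with big-integer multiplication: in product-scanning (column-wise) multiplication, column k of the product is the sum of a[i]·b[k−i]. That sum is a convolution, which is exactly what an FIR filter computes. The Python sketch below is illustrative only and makes several assumptions. It uses 16-bit limbs. It models a two-tap dot-product operation with a hypothetical helper `fir16_pair` (a0·b0 + a1·b1 on unsigned 16-bit inputs), which is not the actual TM1300 instruction name or semantics. It is not necessarily the exact scheme evaluated in this work.

```python
from typing import List

LIMB_BITS = 16                      # assumed limb width for the sketch
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int) -> List[int]:
    """Split a non-negative integer into n little-endian 16-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs: List[int]) -> int:
    """Recombine little-endian 16-bit limbs into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def fir16_pair(a0: int, a1: int, b0: int, b1: int) -> int:
    """Hypothetical model of a two-tap FIR / dot-product operation:
    returns a0*b0 + a1*b1 for unsigned 16-bit operands."""
    return a0 * b0 + a1 * b1


def mul_fir(a: List[int], b: List[int]) -> List[int]:
    """Product-scanning multiplication of two n-limb integers.

    Column k is the convolution sum over i of a[i] * b[k - i], i.e. the
    output of an FIR filter with taps a[] applied to b[]. Two taps are
    consumed per fir16_pair call; an odd leftover tap is done separately.
    """
    n = len(a)
    assert len(b) == n
    r = [0] * (2 * n)
    carry = 0
    for k in range(2 * n - 1):
        lo = max(0, k - n + 1)      # first valid index i in column k
        hi = min(k, n - 1)          # last valid index i in column k
        acc = carry
        i = lo
        while i + 1 <= hi:          # pairs of taps: (i, k-i) and (i+1, k-i-1)
            acc += fir16_pair(a[i], a[i + 1], b[k - i], b[k - i - 1])
            i += 2
        if i == hi:                 # odd number of taps in this column
            acc += a[i] * b[k - i]
        r[k] = acc & LIMB_MASK      # keep the low 16 bits as result limb k
        carry = acc >> LIMB_BITS    # propagate the rest to column k + 1
    r[2 * n - 1] = carry            # top limb; fits in 16 bits for an n x n product
    return r


if __name__ == "__main__":
    x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    print(from_limbs(mul_fir(to_limbs(x, 4), to_limbs(y, 4))) == x * y)  # True
```

The column-wise ordering is a deliberate choice for this sketch. Each column's inner loop is a plain multiply-accumulate over contiguous taps, which maps naturally onto a dot-product instruction. Carries are resolved only once per column rather than after every partial product.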