H.264 is an attractive video coding standard for embedded instruments because it achieves a higher compression ratio than MPEG-2, but its decoder demands considerably more run-time computational resources. Multi-core architectures, with their power efficiency and support for multi-thread parallelism, are the likely future of embedded processor design and fit these requirements well. To simulate and evaluate the performance of such application-specific multi-core systems efficiently, this paper presents a method that combines transaction-level modeling (TLM) in SystemC with the OpenMP shared-memory parallel programming model. Experiments show that the method simulates the system in a short time and, more importantly, helps analyze the efficiency of each task-parallelization strategy. After optimization, slice decoding achieves an average speedup of about 3.06 on a 4-core system.
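The C++ sketch below illustrates how OpenMP expresses task-level parallelism in a decoder. It assumes one possible strategy, slice-level parallelism, which is not necessarily the partitioning evaluated in this paper. All names (`Slice`, `decode_slice`, `decode_frame`, `parallel_efficiency`) are hypothetical, and real slice decoding is replaced by a checksum so the example stays self-contained. It also computes parallel efficiency, $E = S/p$; for the reported speedup, $3.06/4 \approx 0.77$.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one H.264 slice: a buffer of coded bytes.
struct Slice {
    std::vector<std::uint8_t> bits;
};

// Hypothetical placeholder for slice decoding. Entropy decoding, inverse
// transform, and prediction are replaced by an FNV-1a checksum.
std::uint32_t decode_slice(const Slice& s) {
    std::uint32_t h = 2166136261u;   // FNV-1a offset basis
    for (std::uint8_t b : s.bits) {
        h ^= b;
        h *= 16777619u;              // FNV-1a prime
    }
    return h;
}

// Slice-level parallelism: slices of a picture are decoded concurrently.
// (In-loop deblocking may cross slice boundaries; it is omitted here.)
// Each iteration writes only its own output slot, so no locking is needed.
// Compiled without -fopenmp, the pragma is ignored and the loop runs
// serially with identical results.
std::vector<std::uint32_t> decode_frame(const std::vector<Slice>& slices) {
    std::vector<std::uint32_t> out(slices.size());
    const long n = static_cast<long>(slices.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] =
            decode_slice(slices[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Parallel efficiency E = S / p for speedup S on p cores.
double parallel_efficiency(double speedup, int cores) {
    return speedup / cores;
}
```

`schedule(dynamic)` is chosen because slices can vary widely in decoding cost, so statically assigning equal iteration counts to each core could leave some cores idle.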