The performance gain from parallel execution depends on two factors: the parallelism inherent in the application, and how well the application is mapped onto the target architecture, which is usually highly target-specific. As a case study, we analyze the expected performance of a parallel implementation of x264, an H.264 encoder, on the Cell processor. Taking the communication architecture of the Cell processor into account, we parallelize the algorithm at the macro-block level. Through performance analysis, we identify the overhead factors of parallel execution and estimate the expected performance. Comparison with simulation results confirms the accuracy and usefulness of the proposed analysis method.
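
To give a rough feel for what macro-block-level parallelism can offer before any overhead is counted, the following is a minimal sketch. It is not the analysis method of this work. It assumes the commonly described wavefront dependency pattern, in which a macro-block depends on its left, top-left, top, and top-right neighbours, a unit cost per macro-block, zero communication cost, and a simple schedule with a barrier after every wavefront step. The function names `wavefront_schedule` and `ideal_speedup` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def wavefront_schedule(mb_cols, mb_rows):
    """Map each macro-block (x, y) to its earliest time step.

    Assumption: a macro-block depends on its left, top-left, top and
    top-right neighbours, so block (x, y) is ready at step x + 2 * y.
    """
    return {(x, y): x + 2 * y
            for y in range(mb_rows)
            for x in range(mb_cols)}


def ideal_speedup(mb_cols, mb_rows, workers):
    """Overhead-free speedup bound under a step-synchronous schedule.

    Assumptions: every macro-block costs one time unit, communication
    is free, and all workers synchronise after each wavefront step.
    """
    ready_per_step = Counter(wavefront_schedule(mb_cols, mb_rows).values())
    # A step with k ready macro-blocks needs ceil(k / workers) rounds.
    parallel_time = sum(-(-k // workers) for k in ready_per_step.values())
    sequential_time = mb_cols * mb_rows
    return sequential_time / parallel_time


if __name__ == "__main__":
    # A 1920x1088 frame holds 120 x 68 macro-blocks of 16x16 pixels.
    for workers in (1, 2, 4, 8):
        print(workers, round(ideal_speedup(120, 68, workers), 2))
```

Under these assumptions the result is an upper bound for this particular schedule. The overhead factors discussed in the analysis, such as communication and synchronisation, can only lower the achieved speedup.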