Traditional design-and optimization techniques for embedded devices apply local transformations of source-code to maximize the performance and minimize the power consumption. Unfortunately, such transformations cannot adequately deal with the highly dynamic nature of the today's multimedia applications as they do not exploit application specific knowledge. We decided to go one step back in the design process. Starting from the original UML (Unified Modeling Language) model of the source code, we transform the UML model first before refining it into executable code. In this paper we present (i) the transformation of various UML models, (ii) a fast technique for the estimation of the high-level cost parameters that steer our transformations, and (iii) experiments based on three case-studies (a Snake game, a Tetris game and a 3D rendering engine) that show that our transformations can result in factors improvement in memory footprint and/or execution time with respect to the original model.