While there has been considerable work in the last couple of years on architecting embedded chip multiprocessors, the programming and compiler support they require has received relatively little attention. Our goal in this paper is to show that the conventional compiler-directed code parallelization used in high-performance computing is not well suited to embedded chip multiprocessors, where minimizing memory space requirements is an important issue. We propose and evaluate a novel memory-conscious loop parallelization strategy with the objective of minimizing the data memory requirements of processors. The proposed approach, which is formulated as a branch-and-bound problem, accomplishes its objective by carefully selecting the loops to parallelize in a given loop nest.