In the early 2000s, the superscalar CPU paradigm reached the point of diminishing returns, mainly because of power requirements and overheating concerns. Faced with a constant demand for performance, hardware developers needed new ways to efficiently use the ever-increasing transistor count predicted by Moore's law. Chip MultiProcessors (CMPs) emerged as a natural solution to this power wall: several simpler and significantly less power-hungry cores integrated on a single chip. Today, CMPs are the architecture of choice in almost all ICT segments, from High Performance Computing (HPC) to embedded devices. With this wide adoption, software developers must use parallel programming to fully exploit the architecture. Although parallelization can maximize the performance and energy efficiency of applications running on CMPs, it also brings its own set of challenges. Among these are inherent management overheads, which can cause sub-linear speedups and increase the energy consumed during execution. Because of rising concerns about energy cost and battery life, much research and development today focuses on reducing power requirements and saving energy.

In this thesis, we investigate how parallel programming can be used to improve the energy efficiency of applications running on CMP systems. We focus on a programming paradigm called Task Based Programming (TBP). The base concept of the TBP model is that the programmer identifies and annotates pieces of code (tasks) that can be executed concurrently with other tasks. An important result of our work is an improved understanding of how computation, parallelization and energy consumption relate when executing on CMP systems.

Working in this direction, we use a simulation framework that offers increased flexibility for design space exploration and allows noninvasive measurements.
Unfortunately, the performance overhead of simulation is significant: simulating a parallel application can be 10000x slower than executing it on real hardware. In the first part of our research, we set out to address this issue. We investigate the challenges of employing a sampling-based technique that takes advantage of the periodic behavior of TBP parallel applications. Our proposal is a simple 3-phase methodology that identifies a small number of representative execution samples to simulate, thereby reducing the overall simulation time.

In the second part of our work, we look at parallelization as a means of saving energy on CMP platforms. We test and compare two TBP libraries, Wool and Intel TBB, focusing on the behavior of basic TBP parallelization operations such as task spawning, task synchronization and task stealing. We investigate the energy footprint of these parallelization overheads and their effect on the energy efficiency of the executing system. We have identified that failed task steals account for the largest overhead. To reduce their impact and improve our system's energy efficiency, we devised a new occupa...