Software-Defined Radio (SDR) provides the flexibility needed to build cost-effective multi-mode terminals. However, the growing complexity of new communication standards, which must be executed within the reduced energy budget of battery-powered devices, still challenges architects. Although Coarse-Grained Array (CGA)-based processors extended with domain-specific instructions are considered strong candidates for combining high performance with low power, the lack of efficient methodologies for deriving optimal instances of this architecture paradigm remains a major limitation. In this paper, an extensive energy-performance exploration of a CGA-based SDR processor is presented. The approach targets sufficient relative accuracy on the optimization metrics, which ensures meaningful comparisons between different instances, while absolute accuracy is relaxed and traded off against simulation time. The balance between the different sources of architectural parallelism, such as data-level parallelism (DLP) and instruction-level parallelism (ILP), is crucial to achieving the required performance at minimum energy cost. Accordingly, the proposed method is used to select the optimal DLP-ILP combination required to run the symbol-based baseband processing of a 100 Mbps+ WLAN (Wireless Local Area Network) receiver on a CGA-based processor. As a result, a 4 × 4 array with four-way SIMD (Single Instruction, Multiple Data) extensions is shown to be the optimal instance, providing minimum energy consumption and real-time processing guarantees.
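
The selection criterion stated above (minimum energy among the instances that still meet the real-time deadline) can be illustrated with a minimal sketch. This is not the paper's exploration flow: the `Instance` record, the `select_instance` helper, the 400 MHz clock, and every cycle and energy figure below are hypothetical placeholders chosen only to make the example run. The 4 µs symbol period is likewise an assumption, taken as a typical OFDM WLAN symbol duration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Instance:
    """One candidate architecture instance (all figures are hypothetical)."""
    rows: int                     # array height (ILP dimension)
    cols: int                     # array width (ILP dimension)
    simd_ways: int                # SIMD width (DLP dimension)
    cycles_per_symbol: float      # placeholder simulated cycle count
    energy_per_symbol_nj: float   # placeholder simulated energy


def select_instance(candidates: Iterable[Instance],
                    clock_hz: float,
                    symbol_period_s: float) -> Optional[Instance]:
    """Return the lowest-energy candidate that meets the real-time budget."""
    # Cycles available to process one symbol before the next one arrives.
    cycle_budget = clock_hz * symbol_period_s
    feasible = [c for c in candidates if c.cycles_per_symbol <= cycle_budget]
    # Only relative energy ordering matters here, not absolute accuracy.
    return min(feasible, key=lambda c: c.energy_per_symbol_nj, default=None)


# Made-up placeholder values, NOT results reported in the paper.
CANDIDATES = [
    Instance(2, 2, 4, cycles_per_symbol=2400, energy_per_symbol_nj=90),
    Instance(4, 4, 4, cycles_per_symbol=1200, energy_per_symbol_nj=110),
    Instance(4, 4, 8, cycles_per_symbol=900, energy_per_symbol_nj=150),
    Instance(8, 8, 2, cycles_per_symbol=1000, energy_per_symbol_nj=170),
]

if __name__ == "__main__":
    # Assumed 400 MHz clock and 4 us symbol period (both illustrative).
    best = select_instance(CANDIDATES, clock_hz=400e6, symbol_period_s=4e-6)
    print(best)
```

In this sketch the cheapest candidate is rejected because it misses the cycle budget, so the choice falls to the least energy-hungry instance among those that remain feasible. Because the ranking depends only on the ordering of the energy figures, a consistent bias in the estimates would not change the outcome, which is the reason relative accuracy suffices for such a comparison.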