The Internet-of-Things (IoT) revolution has shaped a new application domain in which low-power RISC architectures constitute the standard computational backbone. The current de-facto design practice for such architectures is to extend the ISA and the corresponding microarchitecture with custom instructions to efficiently manage the complex tasks imposed by IoT applications, e.g., augmented reality, artificial intelligence, and autonomous driving, within narrow energy and area budgets. However, the new IoT application domain also offers a unique opportunity to revisit and optimize the RISC microarchitectural design flow from a more communication- and memory-centric viewpoint. This manuscript critically explores and optimizes the design of a RISC CPU front-end for IoT with a two-fold objective: (i) to provide an optimized CPU microarchitecture; and (ii) to present a set of three design guidelines to steer the implementation of IoT CPUs. The exploration builds on a newly proposed System-on-Chip (SoC) and RISC CPU implementing the RISC-V/IMF ISA, and accounts for area, timing, and performance design metrics. The SoC offers a reference design against which the pros and cons of different microarchitectural solutions can be evaluated. A wide set of microarchitectures, combining different branch prediction schemes, cache architectures, and on-chip bus solutions, has been evaluated. The entire exploration focuses on FPGA-based implementation, owing to the renewed interest in this technology demonstrated by both the research community and industry; for example, ARM launched the DesignStart FPGA program to make Cortex-M microcontrollers available on Xilinx FPGAs in the form of IP blocks.