Autonomous vehicles, smart manufacturing, heterogeneous systems, and new high-performance embedded computing (HPEC) applications can benefit from reusing code from the high-performance computing (HPC) world. However, unlike in HPC, energy efficiency is critical in embedded systems, especially when they run on battery power. HPC code bases mostly rely on the Message Passing Interface (MPI) runtime to deal with distributed systems. MPI was designed primarily for performance, not for energy efficiency. One drawback is the way messages are received: in an energy-consuming busy-wait fashion. In this work, we study a simple approach in which receiving processes are put to sleep instead of constantly polling. We implement this strategy at the user level so that it is transparent to both the MPI runtime and the application. Experiments are conducted with OpenMPI, MPICH, and MPC, using a video processing application and a software-distributed shared memory system deployed over two heterogeneous platforms, including the Christmann RECS|Box Antares Microserver. Results show significant energy savings. In some particular cases involving process colocation, we also observe better performance with our strategy, which can be explained by better sharing of the computing resource.
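
As a rough illustration of sleeping instead of busy-waiting, the sketch below contrasts the two receive loops in Python. It is not the implementation studied in this work and does not use MPI: a `queue.Queue` stands in for the message channel, and the function names, the 1 ms sleep interval, and the 50 ms delivery delay are all hypothetical choices made for this example.

```python
import queue
import threading
import time


def receive_busy_wait(channel):
    """Poll the channel in a tight loop until a message arrives."""
    polls = 0
    while True:
        polls += 1
        try:
            return channel.get_nowait(), polls
        except queue.Empty:
            continue  # spin: the core stays busy while nothing has arrived


def receive_with_sleep(channel, sleep_s=0.001):
    """Probe the channel, sleeping between probes (sleep_s is an assumed value)."""
    polls = 0
    while True:
        polls += 1
        try:
            return channel.get_nowait(), polls
        except queue.Empty:
            time.sleep(sleep_s)  # give the core back to the OS until the next probe


def send_later(channel, message, delay_s):
    """Deliver `message` from a helper thread after `delay_s` seconds."""
    timer = threading.Timer(delay_s, channel.put, args=(message,))
    timer.start()
    return timer


def demo(delay_s=0.05):
    """Run both receive loops against the same delayed message."""
    results = {}
    for name, receive in (("busy", receive_busy_wait),
                          ("sleep", receive_with_sleep)):
        channel = queue.Queue()
        timer = send_later(channel, "frame-0", delay_s)
        message, polls = receive(channel)
        timer.join()
        results[name] = (message, polls)
    return results


if __name__ == "__main__":
    for name, (message, polls) in demo().items():
        print(f"{name}: received {message!r} after {polls} probes")
```

Both loops receive the same message, but the sleeping loop is expected to probe far fewer times, which is the property an energy-aware receive relies on. The price is added latency of up to one sleep interval, so the interval is a tuning parameter rather than a fixed value.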