Shared memory and message passing are two opposing models for developing parallel computations. The shared-memory model, adopted by existing frameworks such as OpenMP, represents a de facto standard on multi-/many-core architectures. However, message passing deserves to be studied for its inherent portability and flexibility, as well as for its greater ease of debugging. Achieving good performance from the use of messages on shared-memory architectures requires an efficient implementation of the run-time support. This paper investigates the definition of a delegation mechanism on multithreaded architectures able to: (i) overlap communication with computation phases; (ii) parallelize distribution and collective operations. Our ideas are exemplified using two parallel benchmarks on the Intel Phi, showing that in these applications our message-passing support outperforms MPI and reaches performance similar to that of standard OpenMP implementations.