In order to achieve exascale performance, all aspects of applications and system software need to be analysed and potentially improved. The EU FP7 project "Collaborative Research into Exascale Systemware, Tools & Applications" (CRESTA) uses co-design of advanced simulation applications and system software as well as related development tools as a key element in its approach towards exascale. In this paper we present first results of a co-design activity using the highly scalable application NEK5000. We have analysed the communication structure of NEK5000 and propose new, optimised collective communication operations that will allow to improve the performance of NEK5000 and to prepare it for the use on several millions of cores available in future HPC systems. The latency-optimised communication operations can also be beneficial in other contexts, for instance we expect them to become an important building block for a runtime-system providing dynamic load balancing, also under development within CRESTA.