The use of parallel computing systems is frequently the only scalable way to solve High Performance Computing (HPC) problems in reasonable execution times. The current trend in high-performance computing platforms is to include several parallel devices of different types and architectures in the same machine, and to interconnect these machines to form highly parallel, heterogeneous distributed systems. Programming efficient and portable parallel applications that can truly exploit these systems poses specific and complex challenges to programmers. To create hybrid programs that exploit all the capabilities of the machine, a programmer must be proficient in distributed-memory communication tools or layers, in shared-memory programming models, and in the specific programming models of the available co-processors. Moreover, she also has to deal with the proper distribution of the workload among the different nodes and devices, assigning to each one an amount of work proportional to its computational power and features. Nowadays, all these issues must be solved by the programmer, making the programming of heterogeneous platforms a real challenge. This Ph.D. thesis addresses several key problems related to parallel programming for highly heterogeneous and distributed systems. It first tackles the problems of developing efficient coordination codes that are portable across different kinds of devices, accelerators, and architectures. It then targets problems related to data communication and partitioning that arise when using devices in distributed-memory systems. In this dissertation we introduce abstractions, mechanisms, and methods to solve many of these problems. We also discuss their practical application in the development of research prototypes and actual programming tools.
Experimental work conducted with these tools validates the applicability of the proposed techniques and the portability, efficiency, and versatility of the programs that can be obtained.