Parallel computing is becoming increasingly accessible through advances in microprocessor and networking technologies. However, clusters have been found to fall short of their promised performance, even though they are built from the most advanced components. Much past effort has been devoted to reducing software overhead, which is widely regarded as the major hindrance to achieving high performance. This thesis shows that a low-latency communication system does not by itself guarantee high performance, because low latency leaves other communication issues unaddressed, such as contention, communication patterns, and the scheduling of communication events. Developing an efficient parallel application depends on realistically predicting its behavior and on being able to explain its performance characteristics on a parallel system, which requires an in-depth understanding of both the application and the architecture.

This dissertation proposes using a realistic communication model to guide performance analysis and algorithm design, both of which are key to achieving high performance. The model comprises a set of performance parameters that correspond to essential features of the communication architecture, together with a set of benchmark methodologies that quantify these features. Programmers can use the model as a framework for conducting performance studies of various communication issues. We apply this framework to examine the performance characteristics of a lightweight messaging system, and show that the model is effective both as an evaluation tool and as an emulation tool.

With the support of the communication model, this thesis also explores the congestion problem.
Through modeling studies and experimental evaluations, we examine how different buffering mechanisms interact with a reliable Go-Back-N protocol, and show that the buffering mechanism dominates the congestion behavior of the communication system. Our performance studies further show that the timeout and flow-control settings are the principal factors that interact with the buffering mechanism to determine how congestion evolves. Because our model is derived from a resource-centric view of how data move through the communication system, we use it to show that carefully designed communication schedules can both achieve efficient communication and prevent congestion. We have developed a complete exchange (all-to-all personalized communication) algorithm, the Synchronous Shuffle Exchange, which is optimal on a non-blocking network. To avoid congestion loss caused by non-deterministic delays in communication events, we introduce a global congestion control scheme. The scheme uses a global window to coordinate all participating nodes in monitoring and regulating the traffic load; it effectively avoids congestion loss while sustaining enough throughput to maximize performance.

In summary, this thesis shows that the ...