To make it simpler to experiment with the impact that different configurations can have on the performance of a parallel cluster application, we developed the PATHS system. PATHS uses "wrappers" to provide a level of indirection to the actual run-time location of data, making the data available from wherever threads or processes are located. A wrapper specifies where data is located, how to get there, and which protocols to use. Wrappers are also used to add or modify the methods that access data, and they are specified dynamically. A "path" consists of one or more wrappers, and sections of a path can be shared among two or more paths.

By reconfiguring the LAM-MPI Allreduce operation we achieved performance gains of 1.52, 1.79, and 1.98 on two-, four-, and eight-way clusters, respectively. We also measured the performance of the unmodified Allreduce operation when using two clusters interconnected by a WAN link with 30-50 ms round-trip latency. Configurations that resulted in multiple messages being sent across the WAN did not add any significant performance penalty to the unmodified Allreduce operation for packet sizes up to 4 KB. For larger packet sizes, the performance of the Allreduce operation deteriorated rapidly.

To log and visualize the performance data we developed EventSpace, a configurable data collection, management, and observation system for monitoring the low-level synchronization and communication behavior of parallel applications on clusters and multi-clusters. Event collectors detect events, create virtual events by recording timestamped data about the events, and then store the virtual events in a virtual event space. Event scopes provide different views of the application by combining and pre-processing the extracted virtual events. Online monitors are implemented as consumers that use one or more event scopes.
Event collectors, event scopes, and the virtual event space can be configured and mapped to the available resources to improve monitoring performance or reduce perturbation. Experiments demonstrate that a wind-tunnel application instrumented with event collectors suffers insignificant slowdown from data collection, and that monitors can reconfigure event scopes to trade off monitoring performance against perturbation. The visualizations we generated allowed us to detect anomalous communication behavior and load-balance problems.
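
The relationship between event collectors, the virtual event space, event scopes, and monitors can be illustrated with a minimal in-process sketch. It is not the EventSpace implementation: all class names, function names, and record fields below are hypothetical, and the sketch ignores distribution across cluster nodes entirely.

```python
import time
from collections import defaultdict


class VirtualEventSpace:
    """Hypothetical store for virtual events, keyed by collector name."""

    def __init__(self):
        self._events = defaultdict(list)

    def put(self, collector_name, record):
        self._events[collector_name].append(record)

    def read(self, collector_name):
        # Return a copy so consumers cannot modify stored events.
        return list(self._events[collector_name])


class EventCollector:
    """Records a timestamped virtual event each time an event is detected."""

    def __init__(self, name, space, clock=time.time):
        self.name = name
        self.space = space
        self.clock = clock  # injectable so the sketch can be tested

    def record(self, operation, nbytes):
        self.space.put(self.name, {
            "t": self.clock(),
            "source": self.name,
            "op": operation,
            "bytes": nbytes,
        })


class EventScope:
    """Combines the events of selected collectors into one time-ordered view."""

    def __init__(self, space, collector_names):
        self.space = space
        self.collector_names = list(collector_names)

    def view(self):
        events = []
        for name in self.collector_names:
            events.extend(self.space.read(name))
        return sorted(events, key=lambda e: e["t"])


def bytes_per_collector(scope):
    """A toy monitor: consumes a scope and sums bytes per collector."""
    totals = defaultdict(int)
    for event in scope.view():
        totals[event["source"]] += event["bytes"]
    return dict(totals)
```

In this sketch, reconfiguring a scope amounts to changing which collectors it reads from; the trade-off between monitoring performance and perturbation described above would depend on where the collectors, scopes, and event space are placed, which is not modeled here.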