In this paper, we report on experiences with P3T+, a performance estimator for distributed and parallel programs that is used to examine, at compile time, the performance impact of changes in code, problem size, machine size, and target architecture. P3T+ computes a variety of performance parameters, including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. It is unique in that it models programs, code transformations, and parallel and distributed architectures, and derives a performance prediction based on all three of these elements. P3T+ is the successor of P3T, which computed a similar set of performance parameters, but for parallel programs only. P3T+ has been redesigned and reimplemented from scratch and goes beyond P3T by extending the class of programs that can be handled and by employing several novel estimation methods (symbolic analysis, simulation, pre-measured kernel codes, etc.). The core part of this paper reports on the evaluation of P3T+ to demonstrate both the accuracy and the usefulness of this tool for realistic kernel codes taken from real-world applications (pricing of financial derivatives and quantum mechanical calculations of solids).