The identification of performance issues and the diagnosis of their root causes are time-consuming and complex tasks, especially in clustered environments. To simplify these tasks, researchers have been developing tools with built-in expertise for practitioners. However, these tools have various limitations that prevent their efficient use in the performance testing of clusters (e.g. the need to manually analyse huge volumes of distributed results). In previous work, we introduced a policy-based adaptive framework (PHOEBE) that automates the usage of diagnosis tools in the performance testing of clustered systems, in order to improve a tester's productivity by decreasing the effort and expertise needed to use such tools effectively. This paper extends that work by broadening the set of policies available in PHOEBE, as well as by performing a comprehensive assessment of PHOEBE in terms of its benefits, costs and generality (with respect to the diagnosis tool used). The evaluation involved a set of experiments that assessed the different trade-offs commonly experienced by a tester when using a performance diagnosis tool, as well as the time savings that PHOEBE can bring to the performance testing and analysis processes. Our results show that PHOEBE can drastically reduce the effort required by a tester to perform performance testing and analysis in a cluster. PHOEBE also exhibited consistent behaviour (i.e. similar time savings and resource utilisations) when applied to a set of commonly used diagnosis tools, demonstrating its generality. Finally, PHOEBE proved capable of simplifying the configuration of a diagnosis tool, by addressing the identified trade-offs without the need for manual intervention from the tester.

PHOEBE is implemented with the multi-agent architecture depicted in Figure 1, which shows that PHOEBE is composed of three types of agents. The control agent is responsible for interacting with the load testing tool to know when the test starts and ends; it is also responsible for evaluating the policies and propagating the decisions to the other nodes. Meanwhile, the application node agent is responsible for performing the required tasks in each application node (e.g. sample collection or sending the collected samples to the diagnosis tool). Finally, the diagnosis tool agent [...]

Here, the objective was to evaluate the potential trade-off between the number of samples concurrently processed by a diagnosis tool and the amount of resources it requires to process them. The following sections describe this experiment and its results.

7.1. Experiment #4: proposed policies assessment

The objective of this experiment was to evaluate the behaviour of PHOEBE, as well as the set of proposed policies, in order to assess how well they fulfil their purpose of addressing the identified trade-offs without the need for manual intervention from the tester. The following sections describe this experiment and its results.

7.1.1. Experimental set-up
....
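For illustration, the following minimal sketch shows the kind of condition/action policy and control-agent evaluation loop described above for Figure 1. All names, types and the overall structure are assumptions made for exposition only; they do not reflect PHOEBE's actual implementation or API.

```python
# Illustrative sketch only: names and structure are assumptions, not PHOEBE's actual API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Policy:
    """A condition/action pair evaluated by the control agent."""
    name: str
    condition: Callable[[dict], bool]   # evaluated against the latest collected metrics
    action: str                         # decision propagated to the node agents


class ControlAgent:
    """Evaluates policies during a load test and propagates decisions to node agents."""

    def __init__(self, policies: List[Policy], node_agents: list):
        self.policies = policies
        self.node_agents = node_agents

    def on_metrics(self, metrics: dict) -> None:
        # Evaluate every policy against the latest metrics; any policy whose
        # condition holds triggers a decision that is sent to all node agents.
        for policy in self.policies:
            if policy.condition(metrics):
                for agent in self.node_agents:
                    agent.apply(policy.action)


# Example (hypothetical): throttle sample collection when the diagnosis tool
# has too many samples pending, addressing the samples-vs-resources trade-off.
backpressure = Policy(
    name="throttle-sampling",
    condition=lambda m: m.get("pending_samples", 0) > 1000,
    action="reduce_sampling_rate",
)
```

Under this kind of structure, the tester only declares policies; the control agent evaluates them automatically during the test and propagates the resulting decisions to the node agents, which is precisely the manual intervention that the proposed policies aim to remove.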