SUMMARY

For surrogate construction, a good experimental design (ED) is essential to simultaneously reduce the effects of noise and bias errors. However, most EDs cater to a single criterion and may yield small gains in that criterion at the expense of large deteriorations in other criteria. We use multiple criteria to assess the performance of different popular EDs. We demonstrate that these EDs offer different trade-offs, and that use of a single criterion is indeed risky. In addition, we show that popular EDs, such as Latin hypercube sampling (LHS) and D-optimal designs, often leave large regions of the design space unsampled even for moderate dimensions. We discuss a few possible strategies to combine multiple criteria and illustrate them with examples. We show that complementary criteria (e.g. a bias-handling criterion for variance-based designs, and vice versa) can be combined to improve the performance of EDs. We demonstrate improvements in the trade-off between noise and bias error by combining a model-based criterion, such as the D-optimality criterion, with a geometry-based criterion, such as LHS. Next, we demonstrate that selecting an ED from three candidate EDs using a suitable error-based criterion helped eliminate potentially poor designs. Finally, we show benefits from combining the multiple-criteria strategies, that is, generating multiple EDs using the D-optimality and LHS criteria and selecting one design using a pointwise bias error criterion. The encouraging results from the examples indicate that it may be worthwhile to study these strategies more rigorously and in more detail.
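The following is a minimal sketch, not the authors' code, of the multi-criteria selection strategy summarized above: several candidate Latin hypercube designs are generated and the one scoring best on a model-based criterion is retained. Here a D-criterion for a quadratic response-surface model stands in for the pointwise bias error criterion used in the paper; all function and variable names are illustrative assumptions.

```python
# Sketch only: generate candidate LHS designs, keep the best under a
# model-based criterion. The D-criterion below is a stand-in for the
# paper's pointwise bias error criterion.
import numpy as np
from scipy.stats import qmc
from itertools import combinations_with_replacement


def quadratic_model_matrix(X):
    """Design matrix of a full quadratic polynomial in the columns of X."""
    n, d = X.shape
    cols = [np.ones(n)]                        # intercept
    cols += [X[:, i] for i in range(d)]        # linear terms
    cols += [X[:, i] * X[:, j]                 # squares and interactions
             for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)


def d_criterion(X):
    """log-determinant of the moment matrix F'F / n (larger is better)."""
    F = quadratic_model_matrix(X)
    sign, logdet = np.linalg.slogdet(F.T @ F / F.shape[0])
    return logdet if sign > 0 else -np.inf


def best_of_candidate_lhs(n_points=20, dim=3, n_candidates=3, seed=0):
    """Generate candidate LHS designs; return the one with the best score."""
    rng = np.random.default_rng(seed)
    best_X, best_score = None, -np.inf
    for _ in range(n_candidates):
        sampler = qmc.LatinHypercube(d=dim, seed=rng)
        X = sampler.random(n=n_points)         # points in the unit hypercube
        score = d_criterion(X)
        if score > best_score:
            best_X, best_score = X, score
    return best_X, best_score


if __name__ == "__main__":
    X, score = best_of_candidate_lhs()
    print(f"selected design shape: {X.shape}, log-det of moment matrix: {score:.3f}")
```

In this sketch the geometry-based criterion (LHS) governs generation and the model-based criterion governs selection, mirroring the combination of strategies described in the summary.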