With rapid advances in computing power, numerical tools and scientific theory in fields such as structural engineering, simple experiments can now be carried out in silico. However, simulating many real-life phenomena in analytical fields remains largely intractable or requires huge computational resources. Many researchers have therefore developed metamodels to reduce the computational time needed to solve complex structural problems. Among these, the response surface method has become popular because of its versatility and its ability to reduce even hard-to-model engineering problems to a simple polynomial form. The number and type of sampling points used to build the response surface approximation are selected by design of experiments techniques such as the Box-Behnken design, central composite design and D-optimal design. No single design suits every problem: one design may be appropriate for a particular problem, while a different design performs better on others. To assess the performance of such metamodels, researchers have commonly used statistical measures such as R² and error-based metrics such as the root-mean-squared error (RMSE). In this paper, extensive numerical experiments are performed to gauge the performance of Box-Behnken, central composite and D-optimal designs as metamodeling tools in structural engineering. The insufficiency of classical statistical accuracy measures such as R², R²_adj and R²_pred is demonstrated, and the need for measures based on external test data, such as Q²_F1, Q²_F2 or Q²_F3, is stressed. Finally, recommendations are made on building metamodels for structural engineering problems.
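The contrast drawn above between internal measures (R², R²_adj, R²_pred) and external measures (Q²_F1, Q²_F2, Q²_F3, RMSE) can be illustrated with a minimal sketch, assuming NumPy and a full quadratic response surface in two coded factors fitted by least squares to a face-centred central composite design. The test function `response`, the 200 random test points and all helper names are hypothetical illustrations, not the benchmark problems or code used in the paper. The Q² forms follow their commonly cited definitions: Q²_F1 normalises by deviations of the test responses from the training mean, Q²_F2 by deviations from the test mean, and Q²_F3 compares the mean squared test error with the training-set variance.

```python
import numpy as np


def quadratic_basis(X):
    """Full second-order polynomial basis: 1, x_i, x_i^2 and x_i*x_j for i < j."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)


def fit_response_surface(X, y):
    """Least-squares coefficients of the quadratic response surface."""
    beta, *_ = np.linalg.lstsq(quadratic_basis(X), y, rcond=None)
    return beta


def internal_metrics(X, y, beta):
    """R^2, adjusted R^2 and predicted R^2 (PRESS-based), all on the training data."""
    A = quadratic_basis(X)
    n, p = A.shape  # p counts every model term, intercept included
    resid = y - A @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    # Leverages h_ii = diag(A (A^T A)^-1 A^T); PRESS uses leave-one-out residuals e_i / (1 - h_ii).
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    press = np.sum((resid / (1.0 - h)) ** 2)
    r2_pred = 1.0 - press / ss_tot
    return r2, r2_adj, r2_pred


def external_metrics(y_train, y_test, y_test_pred):
    """Q^2_F1, Q^2_F2, Q^2_F3 and RMSE computed on held-out test data."""
    sse = np.sum((y_test - y_test_pred) ** 2)
    q2_f1 = 1.0 - sse / np.sum((y_test - y_train.mean()) ** 2)
    q2_f2 = 1.0 - sse / np.sum((y_test - y_test.mean()) ** 2)
    train_var = np.sum((y_train - y_train.mean()) ** 2) / len(y_train)
    q2_f3 = 1.0 - (sse / len(y_test)) / train_var
    rmse = np.sqrt(sse / len(y_test))
    return q2_f1, q2_f2, q2_f3, rmse


# Face-centred central composite design in two coded factors (equivalent to a 3 x 3 grid).
levels = np.array([-1.0, 0.0, 1.0])
X_train = np.array([[a, b] for a in levels for b in levels])


def response(X):
    """Hypothetical non-polynomial response standing in for an expensive structural model."""
    return np.exp(0.8 * X[:, 0]) + np.sin(2.0 * X[:, 1]) + 0.3 * X[:, 0] * X[:, 1]


# External test points drawn independently of the design (hypothetical choice).
rng = np.random.default_rng(0)
X_test = rng.uniform(-1.0, 1.0, size=(200, 2))

y_train, y_test = response(X_train), response(X_test)
beta = fit_response_surface(X_train, y_train)
r2, r2_adj, r2_pred = internal_metrics(X_train, y_train, beta)
q2_f1, q2_f2, q2_f3, rmse = external_metrics(y_train, y_test, quadratic_basis(X_test) @ beta)
print(f"R2={r2:.3f}  R2adj={r2_adj:.3f}  R2pred={r2_pred:.3f}")
print(f"Q2F1={q2_f1:.3f}  Q2F2={q2_f2:.3f}  Q2F3={q2_f3:.3f}  RMSE={rmse:.3f}")
```

The internal measures only see the nine design points, whereas the Q² measures and RMSE are evaluated where the metamodel is actually used, between those points; comparing the two printed lines for a given response is one way to see why external test data matter.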