Cluster expansions of first-principles density-functional databases in multicomponent systems are now a routine tool for predicting zero- and finite-temperature physical properties. The ability to produce large databases of varying accuracy, i.e., high-throughput calculations, makes an analysis of error propagation during the inversion process pertinent. This is a very demanding task, as both data and numerical noise have to be treated on an equal footing. We have addressed this problem with an analysis that combines the variational and evolutionary approaches to cluster expansions. Simulated databases were constructed ex professo to sample the configurational space in two different and complementary ways. These databases were in turn treated with different levels of both systematic and random numerical noise. The effects of the cross-validation level, the size of the database, and the type of numerical imprecision on the forecasting power of the expansions were extensively analyzed. We found that the size of the database is the most important parameter. Based on this analysis, we have determined criteria for selecting the optimal expansions, i.e., transferable expansions with constant forecasting power in the configurational space (a structure-property map). As a by-product, our study provides a detailed comparison between the variational cluster expansion and the genetic-algorithm approaches.
I. RATIONAL DESIGN, STRUCTURE-PROPERTY MAPS, AND TARGETING PHYSICAL PROPERTIES

Rational design of molecular systems and solid-state materials relies on knowledge of the effective potentials or interactions in order to tailor motifs with favorable properties. In a combinatorial high-throughput approach the task is, in principle, simple: solve the Schrödinger equation for all viable conformations and combinations of a list of candidate components. In practice, however, this is unfeasible due to the astronomical size of the chemical space (i.e., the set of all spatial and chemical conformations available to the system) that must be scanned to optimize a target property.¹

Constructing maps that relate structure to physical properties is at the core of rational design strategies. This approach looks for correlations between a set of measurements (experimental observations and/or quantum-mechanical calculations) of an observable F and a potential-energy surface V. In practice, the functional map replaces the true V dependence of F[V] with a function f(v) through a suitable transformation from V to a set of key variables v = (v_1, ..., v_N). Members of the potential-energy surface are thus characterized by different v's. The choice of the particular form and nature of v depends, of course, on the problem at hand. Data-centered methods²,³ and basis-function expansions⁴,⁵ are among the most popular choices in materials science, although neural-network approaches for inverting intermolecular potentials have also been reported in the literature.⁶ Cluster expansion (CE) (Refs. 7-9) is the method of choice to...