Several important scientific and engineering applications require analysis of spatially distributed data from expensive experiments or complex simulations, which can demand days, weeks, or even years on petaflops-class computing systems. Consider the conceptual design of a high-speed civil transport, which involves the disciplines of aerodynamics, structures, mission-related controls, and propulsion (see Figure 1).1 Frequently, the engineer will change some aspect of a nominal design point and run a simulation to see how the change affects the objective function (for example, takeoff gross weight, or TOGW). Alternatively, the design process is made configurable, so the engineer can concentrate on accurately modeling one aspect while replacing the remainder of the design with fixed boundary conditions surrounding the focal area. However, both of these approaches are inadequate for exploring large, high-dimensional design spaces, even at low fidelity. Ideally, the design engineer would like a high-level mining system to identify the pockets that contain good designs and merit further consideration. The engineer can then apply traditional tools from optimization and approximation theory to fine-tune preliminary analyses.

Data mining is a key solution approach for such applications, supporting analysis, visualization, and design tasks.2 It serves a primary role in many domains and a complementary role in others by augmenting traditional techniques from numerical analysis, statistics, and machine learning.

Three important characteristics distinguish the applications studied in this article. First, they are characterized not by an abundance of data but by a scarcity of it (owing to the cost and time involved in conducting simulations). Second, the computational scientist has complete control over data acquisition (for example, choosing the regions of the design space in which to collect data), especially via computer simulations.
Finally, significant domain knowledge exists in the form of physical properties such as continuity, correspondence, and locality. Using such information to focus data collection for data mining is thus natural.

This combination of data scarcity plus control over data collection plus the ability to exploit domain knowledge characterizes many important computational science applications. In