Optimizing the execution of intelligent codes on high-performance computers (HPCs) has become more challenging as the number of processors increases. Single processors in many HPCs have been replaced with dual processors, and more recently with multiprocessors. This, combined with the inherent complexities of multi-core processors, has made the processing of intelligent codes even more complex on the latest HPCs. The coming availability of thousands of processors in more affordable, medium-sized HPCs offers the potential for improved performance for codes that can scale sufficiently to take advantage of hundreds of teraflops. Additionally, techniques for harnessing the performance potential of multi-core processors require the appropriate placement of data in shared memories, or even shared level-2 caches, and can yield additional orders-of-magnitude performance increases. The key to designing code that uses the available teraflops wisely is an understanding of the application's behavior. For intelligent systems, whose behavior may depend on heuristics evaluated at runtime, measurements and profiling runs provide the basis for system design decisions regarding the distribution of data and processing. This paper focuses on the metrics needed to optimize intelligent codes and describes how a specific image-processing code was instrumented to produce the required metrics.
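To illustrate the kind of instrumentation the abstract refers to, the following is a minimal sketch, not the paper's actual method: it wraps a hypothetical per-rank compute phase with MPI wall-clock timers and reduces the results to expose load imbalance, one of the basic metrics that profiling-driven data-distribution decisions rely on. The phase name `process_tile` and its workload are placeholders assumed for illustration.

```c
/* Sketch: timing a compute phase per MPI rank and reporting the slowest rank. */
#include <mpi.h>
#include <stdio.h>

/* Placeholder for an image-processing kernel operating on this rank's data. */
static double process_tile(int rank) {
    double acc = 0.0;
    for (long i = 0; i < 10000000L; ++i)
        acc += (double)((i + rank) % 7);
    return acc;
}

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    double result = process_tile(rank);
    double local_elapsed = MPI_Wtime() - t0;

    /* Reduce to the slowest rank: load imbalance shows up as the gap
     * between this maximum and each rank's own elapsed time. */
    double max_elapsed;
    MPI_Reduce(&local_elapsed, &max_elapsed, 1, MPI_DOUBLE,
               MPI_MAX, 0, MPI_COMM_WORLD);

    printf("rank %d: phase time %.3f s (result %.1f)\n",
           rank, local_elapsed, result);
    if (rank == 0)
        printf("slowest rank: %.3f s\n", max_elapsed);

    MPI_Finalize();
    return 0;
}
```

In practice, such timers would be placed around each distinct processing and communication phase, so the resulting profiles can guide where data should live (shared memory, shared caches, or distributed across nodes).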