I n this paper we present a model of coarse grain dataflow execution. We present one top down and two bottom up methods for generation of multithreaded code, and evaluate their effectiveness. The bottom up techniques start form a fine-grain dataflow graph and coalesce this into coarse-grain clusters. The top down technique generates clusters directly from the intermediate data dependence graph used for compiler opfimizations. We discuss the relevant phases in the compilation process. We compare the effectiveness of the strategies by measuring the total number of clusters executed, the total number of instructions executed, cluster size, and number of matches per cluster. It turns out that the top down method generates more efficient code, and larger clusters. However the number of matches per cluster is larger for the top down method, which could incur higher cluster synchronization costs.
Set-associative caches are widely used in CPU memory hierarchies, I/O subsystems, and file systems to reduce average access times. This article proposes an efficient simulation technique for simulating a group of set-associative caches in a single pass through the address trace, where all caches have the same line size but varying associativities and varying number of sets. The article also introduces a generalization of the ordinary binomial tree and presents a representation of caches in this class using the Generalized Binomial Tree (gbt). The tree representation permits efficient search and update of the caches. Theoretically, the new algorithm, GBF_LS, based on the gbt structure, always takes fewer comparisons than the two earlier algorithms for the same class of caches: all-associativity and generalized forest simulation. Experimentally, the new algorithm shows performance gains in the range of 1.2 to 3.8 over the earlier algorithms on address traces of the SPEC benchmarks. A related algorithm for simulating multiple alternative direct-mapped caches with fixed cache size, but varying line size, is also presented.
Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient treebased fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in t we-way set-associative caches.
Vector instruction sets are receiving renewed interest because of their applicability to multimedia. Current multimedia instruction sets use short vectors with SIMD implementations, but long vector, pipelined implementations have a number of advantages and are a logical next step in multimedia ISA development.Support for conditional operations (as occur in loops containing IF statements) is an important aspect of a vector ISA. Seven ISA alternatives for implementing conditional operations are systematically explored. Performance considerations are discussed through evaluation of a typical IF loop over a range of vector lengths and true conditional values. An approach using masked operations is shown to be one of the better methods, especially if its implementation is able to skip over blocks of false mask bits. Additional analyses of complex IF loops and parallel pipeline implementations support the masked operation approach. The paper concludes with a practical implementation of masked operations that skips over power-of-2-length blocks of false values. This implementation is simpler than skipping arbitrary-length blocks and provides similar performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.