Abstract. While custom (and reconfigurable) computing can provide orders-of-magnitude improvements in energy efficiency and performance for many numeric, data-parallel applications, its performance on non-numeric, sequential code is often worse than what conventional superscalar processors achieve. This work attempts to address the problem of improving sequential performance in custom hardware by (a) switching from a statically scheduled to a dynamically scheduled (dataflow) execution model, and (b) developing a new compiler IR for high-level synthesis that enables aggressive exposure of ILP even in the presence of complex control flow. Our prototype high-level synthesis tool-chain implements this new IR directly in hardware as a static dataflow graph, and the result shows an average speedup of 1.13× over equivalent hardware generated by LegUp, an existing HLS tool. In addition, our new IR allows us to further trade area and energy for performance through loop unrolling, increasing the average speedup to 1.55×, with a peak speedup of 4.05×. Our custom hardware approaches the sequential cycle counts of an Intel Nehalem Core i7 superscalar processor, while consuming on average only 0.25× the energy of an in-order Altera Nios IIf processor.
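The abstract contrasts a statically scheduled datapath with a dynamically scheduled (dataflow) one. As a rough illustration only, the sketch below simulates the classic static dataflow firing rule: a node fires as soon as every input arc holds a token and its output arcs are empty, so independent operations run in the same cycle without a central schedule. The `Node` class, the `simulate` function, and the arc names are hypothetical; this is not the authors' IR or tool-chain, and it omits the control-flow handling (branches, loops) that their IR is designed to expose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One operator in a (hypothetical) static dataflow graph."""
    name: str
    op: Callable[..., int]
    inputs: List[str]    # arcs this node consumes tokens from
    outputs: List[str]   # arcs this node writes its result to (fan-out)


def simulate(nodes: List[Node], initial: Dict[str, int],
             max_cycles: int = 1000
             ) -> Tuple[Dict[str, Optional[int]], Dict[str, List[int]]]:
    """Cycle-by-cycle simulation of static dataflow firing rules.

    A node is enabled when every input arc holds a token and every output
    arc is empty (static dataflow: at most one token per arc). All enabled
    nodes fire in the same cycle, which is how independent operations
    execute in parallel without a precomputed schedule.
    """
    arcs: Dict[str, Optional[int]] = {}
    for n in nodes:
        for a in n.inputs + n.outputs:
            arcs.setdefault(a, None)
    arcs.update(initial)
    fired: Dict[str, List[int]] = {n.name: [] for n in nodes}

    for cycle in range(max_cycles):
        enabled = [n for n in nodes
                   if all(arcs[a] is not None for a in n.inputs)
                   and all(arcs[a] is None for a in n.outputs)]
        if not enabled:
            break  # no node can fire: the graph has quiesced
        # Phase 1: read operands of every enabled node and compute results.
        results = [(n, n.op(*[arcs[a] for a in n.inputs])) for n in enabled]
        # Phase 2: consume input tokens, then place result tokens.
        for n, _ in results:
            for a in n.inputs:
                arcs[a] = None
        for n, value in results:
            for a in n.outputs:
                arcs[a] = value
            fired[n.name].append(cycle)
    return arcs, fired


# Example: (a + b) * (c - d); "add" and "sub" are independent.
graph = [
    Node("add", lambda x, y: x + y, ["a", "b"], ["t1"]),
    Node("sub", lambda x, y: x - y, ["c", "d"], ["t2"]),
    Node("mul", lambda x, y: x * y, ["t1", "t2"], ["out"]),
]
arcs, fired = simulate(graph, {"a": 1, "b": 2, "c": 7, "d": 3})
print(arcs["out"])  # 12
print(fired)        # {'add': [0], 'sub': [0], 'mul': [1]}
```

Here `add` and `sub` both fire in cycle 0 because their operands are ready, and `mul` fires in cycle 1 once both of its input tokens arrive; the ordering emerges from token availability rather than from a compile-time schedule.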
Motivation. With the impending Dark Silicon problem threatening multicore performance scaling, there is an ever-increasing need for processor architectures with much better energy efficiency. To address this, designers are increasingly utilising custom (and reconfigurable) computing to provide orders-of-magnitude improvements in energy efficiency over equivalent software implementations of the same code [5]. Unfortunately, one of the key issues with custom computing is that it is often unable to match the performance of conventional processors when implementing irregular code with complex control flow. Out-of-order superscalar processors use aggressive branch prediction to speculate across multiple branches with very high accuracy, dynamically exposing ILP. Custom hardware, however, lacks an efficient and safe control-flow speculation mechanism (particularly for loops), as it is difficult to implement misprediction roll-back and recovery in hardware without introducing a centralized synchronization bottleneck.