Multi-input addition is an important operation for many DSP and video processing applications. On FPGAs, multi-input addition has traditionally been implemented using trees of carry-propagate adders. This approach has been used because the traditional lookup table (LUT) structure of FPGAs is not amenable to compressor trees, which are used to implement multi-input addition and parallel multiplication in ASIC technology. In prior work, we developed a greedy heuristic method to map compressor trees onto the general logic of an FPGA using a component called a generalized parallel counter (GPC). When synthesized onto Altera Stratix-II FPGAs, this technique reduced the combinational delay of our circuits by 27% on average, but it increased the area by 11% on average. To further reduce the delay and limit the increase in area, we have developed a new solution to the mapping problem based on integer linear programming. Compared to an adder tree, this new approach reduced the delay of the compressor tree by 32% on average and reduced the area by 3%.
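As an illustration of the component named above, the following is a minimal behavioural sketch of a generalized parallel counter: it takes input bits of several weights and returns their weighted sum as a binary number. The function name `gpc` and the column-list representation are assumptions made for this example. It models only the arithmetic function of a GPC, not the greedy mapping heuristic or the integer linear programming formulation described in the abstract.

```python
def gpc(columns):
    """Behavioural model of a generalized parallel counter (hypothetical helper).

    columns[i] is a list of input bits that all carry weight 2**i.
    Returns the output bits, least significant bit first.
    """
    # Weighted sum of all input bits.
    total = sum(bit << i for i, col in enumerate(columns) for bit in col)
    # The output width is fixed by the largest value the inputs can reach.
    max_total = sum(len(col) << i for i, col in enumerate(columns))
    width = max(1, max_total.bit_length())
    return [(total >> k) & 1 for k in range(width)]


# Three bits of weight 1 and three bits of weight 2 need four output bits,
# because the largest possible sum is 3*1 + 3*2 = 9.
all_ones = gpc([[1, 1, 1], [1, 1, 1]])
mixed = gpc([[1, 0, 1], [0, 1, 0]])
```

With three inputs in each of two columns, the sketch behaves like a six-input, four-output counter; a (3 : 2) counter is the special case of a single column with three bits.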
Despite the progress of the last decades in electronic design automation, arithmetic circuits have always received far less attention than other classes of digital circuits. Logic synthesisers, which play a fundamental role in design today, play only a minor role in most arithmetic circuits, performing some local optimisations but hardly improving the overall structure of arithmetic components. Architectural optimisations have often been studied manually, and only for very common building blocks, such as fast adders and multi-input adders, have ad hoc techniques been developed. A notable case is multi-input addition, which is the core of many circuits such as multipliers. The most common technique to implement multi-input addition is to use compressor trees, which are often composed of carry-save adders (based on (3 : 2) counters, i.e., full adders). A large body of literature exists on implementing compressor trees using large counters. However, all of these large counters were built by using full and half adders recursively. In this paper we give some definite answers to issues related to the use of large counters. We present a general technique to implement large counters whose performance is much better than that of counters composed of full and half adders. We also show that it is not always useful to use larger optimised counters, and that sometimes a combination of counters of various sizes gives the best performance. Our results show a 15% improvement in the critical path delay. In some cases even the hardware area is reduced by using our counters.
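The carry-save scheme mentioned above can be sketched in software as follows. This is a minimal model, assuming non-negative integer operands and using hypothetical function names (`carry_save_3to2`, `compress`, `multi_input_add`). It shows only the baseline built from (3 : 2) counters, not the large-counter technique the paper proposes.

```python
def carry_save_3to2(x, y, z):
    """Apply a (3 : 2) counter (full adder) to every bit position at once.

    Returns a sum word and a carry word with x + y + z == s + c.
    """
    s = x ^ y ^ z                              # per-bit sum output
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, moved up one weight
    return s, c


def compress(operands):
    """Reduce a list of operands to at most two without propagating carries."""
    rows = list(operands)
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3       # rows that form complete groups of three
        for i in range(0, full, 3):
            s, c = carry_save_3to2(rows[i], rows[i + 1], rows[i + 2])
            next_rows += [s, c]
        next_rows += rows[full:]               # leftover rows pass to the next level
        rows = next_rows
    return rows


def multi_input_add(operands):
    """Compress to two rows, then do one final carry-propagate addition."""
    return sum(compress(operands))


total = multi_input_add([3, 5, 7, 9, 11])
```

Each pass of the loop corresponds to one level of the tree; only the final addition of the two remaining rows needs a carry-propagate adder.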
Custom instruction set extensions (ISEs) are added to an extensible base processor to provide application-specific functionality at a low cost. As only one ISE executes at a time, resources can be shared. This paper presents a new high-level synthesis flow targeting ISEs. We emphasize a new technique for resource allocation, binding, and port assignment during synthesis. Our method is derived from prior work on datapath merging, and increases area reduction by accounting for the cost of the multiplexers that must be inserted into the resulting datapath to achieve multi-operational functionality.