In reconfigurable systems, fast reconfiguration and compact configuration contexts are essential to enhance processing performance and reduce implementation overhead. This paper proposes HCC, a hierarchical representation of contexts for CGRA, to satisfy both requirements. In HCC, the contexts are constructed hierarchically to eliminate their repetitive portions, which both reduces the overall context storage size and alleviates the context transfer overhead. HCC also includes a fast context-indexing mechanism that achieves high configuration speed, since the hierarchically organized contexts can be located and accessed conveniently. HCC has been verified in a reconfigurable processor called REMUS HP. When implementing H.264 decoding on REMUS HP, HCC reduces the overall contexts by 76.67% compared with the traditional non-hierarchical representation, and the configuration speed is on average 23× higher than that of the latest reported optimized configuration mechanism on the Virtex-4 FX60. REMUS HP is implemented on 48.9 mm² of silicon in TSMC 65 nm technology. Simulation shows that 1920 × 1088@30 fps can be achieved for H.264 high-profile decoding at a 200 MHz working frequency. Compared with the high-performance version of XPP, performance is boosted by 181%.
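The storage effect of removing repeated context portions can be illustrated with a small sketch. It is not the HCC format itself, which the abstract does not specify. It assumes, purely for illustration, that a context is a list of per-row configuration word tuples, that each distinct row is stored once in a shared table, and that an index reference costs one word. All names (`build_hierarchy`, `expand`, `row_table`) are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, ...]   # configuration words for one (hypothetical) array row
Context = List[Row]     # a flat context: one entry per row


def build_hierarchy(contexts: List[Context]) -> Tuple[List[Row], List[List[int]]]:
    """Store each distinct row once; each context becomes a list of row indices."""
    row_table: List[Row] = []
    row_index: Dict[Row, int] = {}
    indexed: List[List[int]] = []
    for ctx in contexts:
        refs = []
        for row in ctx:
            if row not in row_index:
                row_index[row] = len(row_table)
                row_table.append(row)
            refs.append(row_index[row])
        indexed.append(refs)
    return row_table, indexed


def expand(row_table: List[Row], refs: List[int]) -> Context:
    """Rebuild a flat context by following its indices into the shared table."""
    return [row_table[i] for i in refs]


def flat_words(contexts: List[Context]) -> int:
    """Storage cost when every context keeps its own copy of every row."""
    return sum(len(row) for ctx in contexts for row in ctx)


def hierarchical_words(row_table: List[Row], indexed: List[List[int]]) -> int:
    """Storage cost of the shared rows plus one word per index (an assumption)."""
    return sum(len(r) for r in row_table) + sum(len(refs) for refs in indexed)


# Three made-up contexts that reuse the same three rows.
a, b, c = (1, 2, 3, 4), (5, 6, 7, 8), (9, 9, 9, 9)
contexts = [[a, b, a, b], [a, a, c, b], [b, b, a, c]]
row_table, indexed = build_hierarchy(contexts)
print(flat_words(contexts), hierarchical_words(row_table, indexed))
```

In this toy case the flat form needs 48 words and the indexed form 24. The figures depend entirely on the made-up data and say nothing about the 76.67% reduction reported for REMUS HP.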
By using a coarser operand grain and simplified interconnection patterns, CGRAs (coarse-grained reconfigurable architectures) have been proven to be energy efficient in several specific domains. The speed at which contexts are applied to a PEA (processing element array) directly determines the performance of a CGRA. In this paper, the CGRA design space is further developed from the perspective of configuration granularity by introducing a middle-grained configuration granularity, the row-based configuration mechanism (RCM). The most prominent feature of the RCM is that a large DFG (data flow graph) can be mapped onto a small array in a single reconfiguration, which is carried out on a row-by-row basis. Compared with an ordinary DFG-partitioning solution, both the reconfiguration time and the data transfer time are reduced. Furthermore, the proposed RCM stores the contexts much more efficiently. Compared with the DFG-partitioning solution, performance is boosted by 2.6% to 57.8%, while the area penalty is only 4.79% and the power penalty is only 7.22%. The RCM has been used in a reconfigurable processor called REMUS HPA (reconfigurable multi-media system, high performance version advanced). REMUS HPA has been implemented on 50.5 mm² of silicon in TSMC 65 nm technology. Simulation shows that 1920 × 1088@37 fps can be achieved for H.264 high-profile decoding at a 200 MHz working frequency. Compared with the high-performance version of XPP (a commercial reconfigurable processor), performance is boosted by 247%.
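To put the reported throughput in perspective, the short calculation below derives the average cycle budget per macroblock implied by 1920 × 1088@37 fps at 200 MHz. It relies only on the figures quoted above and on the fact that an H.264 macroblock covers 16 × 16 luma samples. The function name `cycles_per_macroblock` is hypothetical, and the result is an average that ignores any overlap or stalls inside the real decoder.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def cycles_per_macroblock(width: int, height: int, fps: int, clock_hz: int) -> float:
    """Average number of clock cycles available to decode one macroblock."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return clock_hz / (mbs_per_frame * fps)


budget = cycles_per_macroblock(1920, 1088, 37, 200_000_000)
print(f"{budget:.1f} cycles per macroblock")  # prints "662.4 cycles per macroblock"
```

A 1920 × 1088 frame holds 120 × 68 = 8160 macroblocks, so the array has roughly 662 cycles on average for each one.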