2013 International Conference on Field-Programmable Technology (FPT)
DOI: 10.1109/fpt.2013.6718396

Adaptive compression for instruction code of Coarse Grained Reconfigurable Architectures

Abstract: Coarse-Grained Reconfigurable Architectures (CGRAs) achieve high performance by exploiting instruction-level parallelism through software pipelining. Their large instruction memory, however, is a critical problem, consuming substantial silicon area and power. Code compression is a promising technique for reducing memory area, bandwidth requirements, and power consumption. We present an adaptive code compression scheme for CGRA instructions based on dictionary-based compression, where compression mode an…
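As a rough illustration of the kind of dictionary-based scheme the abstract describes, the sketch below encodes each configuration word either as an index into a dictionary of frequent words or as a raw word, with a tag standing in for the hardware mode bit. Dictionary size, word width, and the exact mode encoding are illustrative assumptions, not details from the paper.

    # Hedged sketch: dictionary compression with a per-word mode flag.
    # Dictionary size and word values are assumptions for illustration.
    from collections import Counter

    def build_dictionary(words, size=256):
        # Keep the most frequently occurring configuration words.
        return [w for w, _ in Counter(words).most_common(size)]

    def encode(words, dictionary):
        # ('D', i): dictionary hit, emit short index i.
        # ('R', w): miss, emit the raw word w. The tag models the mode bit.
        index = {w: i for i, w in enumerate(dictionary)}
        return [('D', index[w]) if w in index else ('R', w) for w in words]

    def decode(stream, dictionary):
        return [dictionary[v] if tag == 'D' else v for tag, v in stream]

    words = [0x1A2B, 0x1A2B, 0x0000, 0x1A2B, 0xFFFF, 0x0000]
    d = build_dictionary(words, size=2)
    assert decode(encode(words, d), d) == words

In hardware the tag would be a single bit prepended to each code, so the scheme pays one bit per word for the ability to fall back to uncompressed words.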

Cited by 6 publications (5 citation statements). References 7 publications.

“…Aslam et al. [31] use state-of-the-art dictionary methods and reorganize the PEs to improve compression in the dictionary. The approach taken by Chung et al. [32,33] exploits the spatial and temporal redundancy in the configuration stream and saves the most frequently occurring values in a dictionary. Their decompressor has a latency of two cycles and can be pipelined, but its overhead is not shown in their papers.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
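The temporal-redundancy part of the Chung et al. approach can be pictured with a minimal sketch: if a PE's configuration word is unchanged from the previous cycle, a short repeat token is emitted instead of the full word. The stream layout (one word per PE per cycle, interleaved) and the token format are assumptions for illustration, not taken from [32,33].

    # Hedged sketch: temporal redundancy in a configuration stream.
    # A word identical to the previous cycle's word for the same PE is
    # replaced by a 'REPEAT' token (a 1-bit flag in hardware).
    def compress_temporal(stream, num_pes):
        last = [None] * num_pes
        out = []
        for i, word in enumerate(stream):
            pe = i % num_pes            # assumed interleaved layout
            if word == last[pe]:
                out.append('REPEAT')    # short token instead of full word
            else:
                out.append(word)        # full configuration word
                last[pe] = word
        return out

    stream = [7, 7, 7, 9, 7, 9]         # 3 PEs, 2 cycles
    print(compress_temporal(stream, num_pes=3))
    # -> [7, 7, 7, 9, 'REPEAT', 9]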
“…For example, in Figure 2, HReA [18] uses a GCM 1024 bits wide and 128 lines deep to feed contexts to the PEs, and it takes 42% of the entire chip area and 38% of the chip's power consumption. To reduce context-fetching overhead and context-memory footprint, most existing context-reduction frameworks [31][32][33][34][35][36] rely on statistical analysis of the context bitstream in the pre-silicon phase, and their compressed context is encoded after the original context-generation phase. These approaches can be classified as post-context-generation methods [37].…”
Section: Introduction
Citation type: mentioning; confidence: 99%
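A quick back-of-the-envelope check of the GCM footprint quoted above (width and depth taken from the quotation):

    # GCM footprint from the figures in the quotation above.
    width_bits, depth = 1024, 128
    print(width_bits * depth // 8, "bytes")  # 16384 bytes = 16 KB of context memory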
“…In [27], the authors proposed a new method to reduce the energy of the context-switching process by decreasing the number of active lines of the context memory and reducing the number of transition bits on the CSN. In [28], the differential-loading technique was used to reduce the number of transition bits on the CSN. Kim and Mahapatra [29,30], in contrast, neither stored nor transmitted the unnecessary bits of the configuration words, reducing the energy of the CGRA.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
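The energy argument in the quotation turns on transition bits: with differential loading, only bits that differ from the previous configuration word toggle on the CSN, so switching energy scales with the Hamming distance between consecutive words rather than with the word width. A minimal sketch (word width and values are illustrative):

    # Hedged sketch: count toggled bits between consecutive words.
    def transition_bits(prev_word, next_word):
        # Hamming distance = popcount of the XOR of the two words.
        return bin(prev_word ^ next_word).count('1')

    def total_transitions(words):
        return sum(transition_bits(a, b) for a, b in zip(words, words[1:]))

    words = [0b1010_1100, 0b1010_1101, 0b0010_1101]
    print(total_transitions(words))  # 1 + 1 = 2 toggled bits in total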
“…By fragmenting a BB into multiple contexts, one branching instruction (per context) is added to Ni. On the other hand, depending on the efficiency of the CGRA compiler, some PEs in the PEN cannot be mapped and remain idle [28]. Thus, we have added a new parameter, called α, for that effect.…”
Section: Figure 2, Input High-Level Program
Citation type: mentioning; confidence: 99%
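A hedged reconstruction of the accounting described in the quotation: fragmenting a basic block of n instructions into contexts of c slots adds one branch per context, and the utilization factor α discounts PE-network slots the compiler cannot map. The exact formula and the value of α below are assumptions, not the cited paper's model.

    # Hedged reconstruction, not the paper's exact model.
    import math

    def instruction_count(n, slots_per_context, alpha=0.8):
        usable = alpha * slots_per_context   # idle (unmapped) PEs discounted
        contexts = math.ceil(n / usable)     # contexts needed for the BB
        return n + contexts                  # one branch added per context

    print(instruction_count(100, 16))  # 100 + ceil(100 / 12.8) = 108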
“…Due to the complexity of the compression problem, a genetic algorithm and an integer linear program (ILP) are used. A dictionary-based compression scheme is proposed in [10]; this method maintains two separate dictionaries based on the locality of the mapped application.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
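The two-dictionary idea attributed to [10] can be sketched as a global dictionary of application-wide frequent words plus a small per-region local dictionary; local indices are shorter, so words that are hot within one region cost fewer bits. Dictionary sizes and the region split below are illustrative assumptions.

    # Hedged sketch: global + per-region local dictionaries.
    from collections import Counter

    def two_level_encode(regions, global_size=128, local_size=16):
        all_words = [w for region in regions for w in region]
        gdict = {w: i for i, (w, _) in
                 enumerate(Counter(all_words).most_common(global_size))}
        encoded = []
        for region in regions:
            # Local dictionary rebuilt per mapped region (e.g. per loop).
            ldict = {w: i for i, (w, _) in
                     enumerate(Counter(region).most_common(local_size))}
            for w in region:
                if w in ldict:
                    encoded.append(('L', ldict[w]))   # short local index
                elif w in gdict:
                    encoded.append(('G', gdict[w]))   # longer global index
                else:
                    encoded.append(('R', w))          # raw fallback
        return encoded

    regions = [[1, 1, 2, 3], [4, 4, 4, 1]]
    print(two_level_encode(regions, global_size=4, local_size=1))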