Automatic memory partitioning and scheduling for throughput and power optimization

Cong, Jason; Jiang, Wei; Liu, Bin; Zou, Yi

doi:10.1145/1929943.1929947

Cited by 68 publications

(35 citation statements)

References 16 publications


“…Another salient feature of custom hardware accelerators is that they typically use a specialized, partitioned memory architecture [15,47]. Partitioned memory is synthesized at compile time together with the rest of the hardware accelerator.…”

Section: Application-specific Hardware Accelerators and The Role Of A

mentioning

confidence: 99%

“…Their approach operates in conjunction with loop nest transformations that are commonly applied to array based computations and provide higher performance by increasing memory level parallelism. Baradaran and Diniz [6], Cong et al [15] present efforts that combine scheduling techniques with memory bank-interleaved array layout to improve performance.…”

Section: Compiler-guided Memory Partitioning

mentioning

confidence: 99%
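As a rough illustration of the bank-interleaved array layout mentioned in the statement above, the sketch below uses plain cyclic interleaving: element i goes to bank i mod N at offset i div N. It is a generic sketch, not the algorithm of Cong et al. or of Baradaran and Diniz; the function names and the `max_banks` search limit are assumptions made for this example.

```python
def bank_of(index, num_banks):
    """Bank that holds array element `index` under cyclic interleaving."""
    return index % num_banks


def offset_in_bank(index, num_banks):
    """Position of array element `index` inside its bank."""
    return index // num_banks


def conflict_free(indices, num_banks):
    """True if all accesses issued in one cycle land in distinct banks."""
    banks = [bank_of(i, num_banks) for i in indices]
    return len(set(banks)) == len(banks)


def min_banks(offsets, max_banks=64):
    """Smallest bank count for which a[i + d], d in offsets, never conflict.

    Under cyclic interleaving the banks of a[i + d] are distinct for every i
    exactly when the offsets d are distinct modulo the bank count, so it is
    enough to test the offsets themselves. `max_banks` is an arbitrary cap
    chosen for this sketch.
    """
    for n in range(1, max_banks + 1):
        if conflict_free(offsets, n):
            return n
    return None
```

For a loop body that reads a[i], a[i + 1], and a[i + 2] in the same cycle, `min_banks([0, 1, 2])` gives the bank count that lets a scheduler issue all three loads together, assuming one port per bank.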

“…First, two requests can use the same port to a memory unless they are executed simultaneously in the same cycle. Statically assigning each load/store to one of the interleaved banks of a memory [15] is not applicable within our method, which already generates minimal, indivisible memories: if the loads/stores accessing a memory could be partitioned into N groups, where each group is guaranteed at compile time to be accessing its own disjoint bank of this memory, these loads/stores would not constitute a minimal group of memory operations, contradicting the definition of a memory (see Sect. 4.2).…”

Section: Implementation Details and Hardware Implications

mentioning

confidence: 99%
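The port-sharing rule in the statement above (two requests may share a port unless they execute in the same cycle) implies that a memory needs as many ports as the largest number of requests scheduled in any single cycle. The following is a minimal sketch of that count; the schedule representation as (cycle, request) pairs and the name `ports_needed` are assumptions for illustration, not taken from the cited paper.

```python
from collections import Counter


def ports_needed(schedule):
    """Number of ports a memory needs for a given static schedule.

    `schedule` is an iterable of (cycle, request_name) pairs. Requests in
    different cycles can share a port, so only the busiest cycle matters.
    """
    per_cycle = Counter(cycle for cycle, _ in schedule)
    return max(per_cycle.values(), default=0)
```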

“…If the user has indicated that the program follows ANSI aliasing rules, we analyze the data types of the address expressions. We use the type information existing in the symbol table 15 to check whether the two instructions access variables of different types. If so, then per ANSI aliasing rules, we assume that there is no dependence.…”

Section: Static Analysis

mentioning

confidence: 99%
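The type-based test described in the statement above can be sketched as a small predicate. This is a deliberately simplified model, not the cited paper's implementation: types are plain strings, `may_depend` and `normalize` are hypothetical names, and two details are added from the C standard's aliasing rules rather than from the quoted text (character types may alias any object, and signed and unsigned variants of a type may alias each other).

```python
# Character types may alias any object under the C aliasing rules.
CHAR_TYPES = {"char", "signed char", "unsigned char"}


def normalize(c_type):
    """Drop signedness so that 'unsigned int' and 'int' compare equal."""
    words = [w for w in c_type.split() if w not in ("signed", "unsigned")]
    return " ".join(words) or "int"  # bare 'unsigned' means 'unsigned int'


def may_depend(type_a, type_b, strict_aliasing):
    """Conservative answer to: can accesses of these two types overlap?

    Returns False only when the user has promised ANSI aliasing rules and
    the two access types are incompatible under this simplified model.
    """
    if not strict_aliasing:
        return True  # no promise from the user, so assume a dependence
    if type_a in CHAR_TYPES or type_b in CHAR_TYPES:
        return True
    return normalize(type_a) == normalize(type_b)
```

A real analysis would also have to handle structs, unions, pointers, and qualifiers; the sketch only shows where the "different types, so no dependence" decision is made.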

“…We also solve loop exit conditions to find the symbolic loop trip counts, which correspond to the maximum values for loop index variables (i.e., I_max values where 0 ≤ I < I_max) 15. In our implementation, we use the symbol information embedded into the assembly files by gcc when the gstabs+ option is used.…”

mentioning

confidence: 99%
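To make the trip-count idea in the statement above concrete, the sketch below handles only the simplest case: a loop of the form `for (i = start; i < bound; i += step)` with known integer values. The cited work solves exit conditions symbolically; this numeric version and the names `trip_count` and `original_index` are assumptions used for illustration.

```python
def trip_count(start, bound, step):
    """Iterations of `for (i = start; i < bound; i += step)`.

    The result is the bound I_max on a normalized index I that runs over
    0 <= I < I_max. Only positive steps are supported in this sketch.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceiling division of (bound - start) by step, clamped at zero.
    return max(0, -(-(bound - start) // step))


def original_index(start, step, normalized):
    """Map the normalized index I back to the loop's own index variable."""
    return start + normalized * step
```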


Memory Partitioning in the Limit

Kültürsay, Ebcioğlu, Küçük, et al. 2015

Int J Parallel Prog (2016) 44:337-380

The key difficulties in designing memory hierarchies for future computing systems with extreme scale parallelism include (1) overcoming the design complexity of system-wide memory coherence, (2) achieving low power, and (3) achieving fast access times within such a memory hierarchy. Towards addressing these difficulties, in this paper we propose an automatic memory partitioning method to generate a customized, application-specific, energy-efficient, low latency memory hierarchy, tailored to particular application programs. Given a software program to accelerate, our method automatically partitions the memory of the original program, creates a new customized application-specific multi-level memory hierarchy for the program, and modifies the original program to use the new memory hierarchy. This new memory hierarchy and modified program are then used as the basis to create a customized, application-specific, highly parallel hardware accelerator, which is functionally equivalent to the original, unmodified program. Using dependence analysis and fine grain valid/dirty bits, the memories in the generated hierarchy can operate in parallel without the need for maintaining coherence and can be independently initialized/flushed from/to their parent memories in the hierarchy, enabling a scalable memory design. The generated memories are fully compatible with the memory addressing in the original software program; this compatibility feature enables the translation of general software applications to application-specific accelerators. We also provide a compiler analysis method to perform accurate dependence analysis for memory partitioning based on symbolic execution, and a profiler-based futuristic limit study to identify the maximum gains that can be achieved by memory partitioning.
