High-level address optimization and synthesis techniques for data-transfer-intensive applications

Miranda, M.; Catthoor, Francky; Janssen, M.; Man, H.J. De

doi:10.1109/92.736141

Cited by 57 publications

(27 citation statements)

References 21 publications

“…Miranda et al in [26][27][28][29] present the address optimization (ADOPT) environment. The framework gives a formalized methodology and an automated technique to support address arithmetic optimizations in flowgraph expressions for distributed memory architectures.…”

Section: Address Optimizations (mentioning)

confidence: 99%

Address Generation Optimization for Embedded High-Performance Processors: A Survey

Talavera, Jayapala, Carrabina, et al. 2008

J Sign Process Syst Sign Image Video Technol

Embedded systems are growing at an impressive rate and provide increasingly sophisticated applications characterized by complex array index manipulation and a large number of data accesses. Those applications require high-performance specific computation that general-purpose processors cannot deliver at a reasonable energy consumption. Very long instruction word architectures seem a good solution, providing enough computational performance at low power with the programmability required to shorten the time to market. Those architectures rely on compiler effort to exploit the available instruction and data parallelism to keep the data path busy all the time. With the density of transistors doubling every 18 months, more and more sophisticated architectures with a high number of computational resources running in parallel are emerging. With this increasing parallel computation, access to data is becoming the main bottleneck that limits the available parallelism. To alleviate this problem, in current embedded architectures a special unit works in parallel with the main computing elements to ensure efficient feeding and storage of the data: the address generator unit, which comes in many flavors. Future architectures will have to deal with enormous memory bandwidth in distributed memories, and the development of address generator units will be crucial for an effective next generation of embedded processors, where global trade-offs between reaction time, bandwidth, energy and area must be achieved. This paper provides a survey of methods and techniques that optimize the address generation process for embedded systems, explaining current research trends and future needs.

“…In [5] a technique based on software transformations is described by reducing the processing power for calculating the addresses of accessed data resulting in speeding up the target programmable cores. In [6] the addressing is separated from processing and assigned for execution to custom and counter based address generators. This work is extended in the proposed architecture by using the Access processor and the DMAs which offer flexibility in address generation and can exploit the various application characteristics obtained after compile time analysis.…”

Section: Related Work (mentioning)

confidence: 99%

Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy

Milidonis, Alachiotis, Porpodas, et al. 2009

J Sign Process Syst Sign Image Video Technol

We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. This architecture exploits the more efficient pre-fetching of decoupled processors, which make use of the parallelism between address computation and application data processing that mainly exists in streaming applications. This benefit, combined with the ability of scratch-pad memories to store data with no conflict misses and low energy per access, contributes significantly to increasing the system's performance. The application code is split into two parallel programs. The first runs on the Access processor and computes the addresses of the data in the memory hierarchy. The second processes the application data and runs on the Execute processor, a processor with a limited address space (just the register file addresses). Each transfer of any block in the memory hierarchy up to the Execute processor's register file is controlled by the Access processor and the DMA units. This strongly differentiates this architecture from traditional uniprocessors and existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with uniprocessor architectures with (a) scratch-pad and (b) cache memory hierarchies and (c) the existing decoupled architectures, showing its higher normalized performance. The reason for this gain is the efficiency of data transfer that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to eliminate memory latency by using memory management techniques for transferring data instead of fixed prefetching methods. Experimental results show that performance increases by up to almost 2 times compared to uniprocessor architectures with scratch-pad and by up to 3.7 times compared to the ones with cache. The proposed architecture achieves this performance without penalties in energy-delay product cost.

“…Sheldon et al [16] present techniques to eliminate division and modulo operations, by inserting conditionals and using algebraic axioms and loop transformations. Most techniques optimize the evaluation of a given set of address expressions, possibly sharing logic among different address expressions [17]. Only a few remap data to simplify the address expressions [18,8].…”

Section: Address Expressions (mentioning)

confidence: 99%

Constructing Application-Specific Memory Hierarchies on FPGAs

Devos, Campenhout, Verbauwhede, et al. 2011

Transactions on High-Performance Embedded Architectures and Compilers III

The high performance potential of an FPGA is not fully exploited if a design suffers from a memory bottleneck. Therefore, a memory hierarchy is needed to reuse data in on-chip buffer memories and minimize the number of accesses to off-chip memory. Buffer memories not only hide the external memory latency, but can also be used to remap data and augment the on-chip bandwidth through parallel access to multiple buffers. This paper discusses the differences and similarities of memory hierarchies on processor-based and FPGA-based systems and presents a step-by-step methodology to construct a memory hierarchy on an FPGA.
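
The citation statement for this entry mentions eliminating modulo operations from address expressions by inserting conditionals. As a rough illustration only, not taken from any of the cited papers, the sketch below generates the addresses of a circular buffer in two ways: once with a modulo per access, and once with an incremented counter that a conditional wraps around. The function names and the circular-buffer scenario are assumptions made for this example.

```python
def addresses_with_modulo(n_accesses, buffer_size):
    """Reference version: one modulo operation per generated address."""
    return [i % buffer_size for i in range(n_accesses)]


def addresses_with_conditional(n_accesses, buffer_size):
    """Hypothetical rewrite: the modulo is replaced by an increment
    plus a conditional wrap-around, which is cheaper in hardware."""
    addresses = []
    addr = 0
    for _ in range(n_accesses):
        addresses.append(addr)
        addr += 1
        if addr == buffer_size:  # wrap instead of computing i % buffer_size
            addr = 0
    return addresses
```

Both functions produce the same address sequence for a positive `buffer_size`. The second form only needs an adder and a comparator, which is the kind of saving such transformations aim for.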
