Systematic and Automated Multiprocessor System Design, Programming, and Implementation

Nikolov, Hristo N.; Stefanov, Todor; Deprettere, E.F.

doi:10.1109/tcad.2007.911337

Cited by 107 publications

(67 citation statements)

References 23 publications


“…However, these approaches restrict to one-dimensional streams of data whereas image processing applications communicate multi-dimensional image arrays. In particular for buffer analysis this information can be advantageously exploited [11], [12], [13], [14]. However, none of these approaches considers synthesis of hardware accelerators and the special properties of multi-rate systems.…”

Section: Problem Formulation and Related Work (mentioning)

confidence: 99%

Model-based synthesis and optimization of static multi-rate image processing algorithms

Keinert, Dutta, Hannig, et al. 2009

2009 Design, Automation & Test in Europe Conference & Exhibition


Abstract: High computational effort in modern image processing applications like medical imaging or high-resolution video processing often demands for massively parallel special purpose architectures in form of FPGAs or ASICs. However, their efficient implementation is still a challenge, as the design complexity causes exploding development times and costs. This paper presents a new design flow which permits to specify, analyze, and synthesize complex image processing algorithms. A novel buffer requirement analysis allows exploiting possible tradeoffs between required communication memory and computational logic for multi-rate applications. The derived schedule and buffer results are taken into account for resource optimized synthesis of the required hardware accelerators. Application to a multi-resolution filter shows that buffer analysis is possible in less than one second and that scheduling alternatives influence the required communication memory by up to 24% and the computational resources by up to 16%.

I. INTRODUCTION

As design complexity is becoming a major barrier for technical progress because of expensive and error-prone development, new design methodologies raising the level of abstraction are becoming increasingly popular. Simulink [1] or SystemC based high-level synthesis [2] tools for instance permit to compose complex systems by communicating blocks. However, these approaches do not allow for system-level analysis like determination of required communication buffer sizes, as the blocks can contain arbitrarily complex operations. Alternative approaches like [3], [4] are restricted to a subset of sequential languages like C. However, extraction of the contained parallelism is challenging, especially as analysis on individual statements can get computationally expensive [5].

In order to address these aspects, this paper presents a novel design flow for high-level synthesis of complex multi-rate image processing applications containing up- and downsamplers. It extends existing previous work by usage of lattice-based buffer analysis which considers different scheduling alternatives for multi-rate systems. As the obtained results are directly taken into account during hardware synthesis, we are able to exploit tradeoffs between required communication memory and computational logic. Furthermore, in contrast to many other approaches, analysis of the overall system does not rely on solving Integer Linear Programs (ILPs) in case of acyclic problems. Instead ILPs are only required for local analysis like actor synthesis or dependency calculation in order to assure good scaling properties of our design flow.



“…The first bar in Figure 4 corresponds to the performance result for the unmodified application and its derived KPN in Figure 1 A) mapped on the ESPAM platform [7,8]. The application is executed We observe that by introducing modulo statements, the communication (the control part) becomes more costly as the modulo expressions will appear in the definitions of the input/output ports.…”

Section: Motivating Examples (mentioning)

confidence: 99%

On compile-time evaluation of process partitioning transformations for Kahn process networks

Meijer, Nikolov, Stefanov 2009

Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis


Kahn Process Networks is an appealing model of computation for programming and mapping applications onto multi-processor platforms. Autonomous processes communicate through unbounded FIFO channels in absence of a global scheduler. We derive Kahn process networks from sequential applications using the pn compiler, but the derived networks do not necessarily meet the performance requirements. Process partitioning transformations can achieve a more balanced network improving the performance results significantly. There are a number of process partitioning transformations that can be used, but no hints are given to the designer which transformation should be applied to minimize, for example, the execution time. Therefore, we investigate a compile-time approach for selecting the best transformation candidate and show results on a Xilinx Virtex 2 FPGA and the Cell BE processor.
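The communication model described in this abstract (autonomous processes, unbounded FIFO channels, no global scheduler) can be illustrated with a minimal sketch. It assumes Python threads stand in for processes and `queue.Queue` for channels; the names `producer`, `square`, `consumer`, and `run_network` are hypothetical and are not taken from the pn compiler or ESPAM.

```python
import queue
import threading


def producer(out_ch, n):
    """Write n tokens, then an end-of-stream marker (None)."""
    for i in range(n):
        out_ch.put(i)  # never blocks: the channel is unbounded
    out_ch.put(None)


def square(in_ch, out_ch):
    """Read each token with a blocking get and forward its square."""
    while True:
        token = in_ch.get()  # blocking read is the only synchronization
        if token is None:
            out_ch.put(None)
            return
        out_ch.put(token * token)


def consumer(in_ch, results):
    """Collect tokens until the end-of-stream marker arrives."""
    while True:
        token = in_ch.get()
        if token is None:
            return
        results.append(token)


def run_network(n=5):
    """Build a three-process pipeline and run it to completion."""
    a, b = queue.Queue(), queue.Queue()  # default maxsize=0 means unbounded
    results = []
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

Because each process only performs blocking reads on its input channel, the result is the same regardless of how the threads are interleaved; for example, `run_network(5)` returns `[0, 1, 4, 9, 16]`.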


“…Sesame allows for quickly evaluating the performance of different application to architecture mappings, HW/SW partitionings, and target platform architectures. Such exploration should result in a number of promising candidate system designs, of which their specifications (system-level platform description, application-architecture mapping description, and application description) act as input to the ESPAM tool [11,12]. This tool uses these system-level input specifications, together with RTL versions of the components from the IP library, to automatically generate synthesizable VHDL that implements the candidate MP-SoC platform architecture.…”

Section: The Daedalus Design Flow (mentioning)

confidence: 99%

Tool Integration and Interoperability Challenges of a System-Level Design Flow: A Case Study

Pimentel, Stefanov, Nikolov, et al.

Lecture Notes in Computer Science


Abstract. Daedalus is a system-level design flow for the design of multiprocessor system-on-chip (MP-SoC) based embedded multimedia systems. It offers a fully integrated tool-flow in which design exploration, system-level synthesis, application mapping, and system prototyping of MP-SoC architectures are highly automated. In this paper, we describe Daedalus from a software perspective, explaining its supporting software infrastructure and the way the various tools interoperate. Moreover, we discuss the lack of support for achieving tool interoperability that we have encountered during the development of Daedalus, and present several ideas of future research directions to address this issue. More specifically, we argue that a so-called Common Design Flow Infrastructure (CDFI) for system-level design flows is needed to improve and stimulate research and development in the area of system-level design methodology.

