Implementing image applications on FPGAs

Draper, Bruce A.; Beveridge, J. Ross; Böhm, A. P. Wim; Ross, Charles A.; Chawathe, M.

doi:10.1109/icpr.2002.1047845

Cited by 15 publications

(9 citation statements)

References 8 publications

“…Binary morphologic filters, because of their usefulness and relative simplicity, were some of the first image processing operations to be implemented on FPGAs [2,3]. Their regular structure makes a streamed pipeline implementation attractive, and most FPGA based filter implementations use this structure.…”

Section: B Prior Work (mentioning)

confidence: 99%

Efficient implementation of greyscale morphological filters

Bailey

2010

2010 International Conference on Field-Programmable Technology

Morphological filters are often implemented using a series decomposition. This paper presents a parallel decomposition that is able to exploit separability. By maximising the reuse of hardware between the parallel filters, a novel computationally efficient filter structure may be derived. Results show that such filters may be implemented on a Virtex-5 FPGA with pixel clock rates approaching 1 GHz.

“…The parameter w represents the width of the window, which is basically the number of k operators working together at the same time, while S_i and S_o represent the set of input and output elements, respectively. This approach is widely used in literature [31], [34], [36], [37], especially on algorithms characterized by simple dependencies. However, this approach cannot be considered a viable solution when dealing with algorithms characterized by complex dependencies, since it does not take into account the relations between successive frames, and therefore it is generally suboptimal when multiple iterations are performed at once.…”

Section: State-of-the-art Implementations (mentioning)

confidence: 99%

“…While existing implementations of ISLs on CPUs [5] [6] and GPGPUs [7] [8] have ultimately struggled achieving high performance, groundbreaking works on FPGAs (such as [47]), have demonstrated high potential. In fact, CPUs and GPGPUs have rigid architectures in terms of memory organization, which may not map Seq (e.g., [32], [33] [34], [35], [31], [36], [37]…”

Section: B Evaluation and Comparison Of Existing Implementations (mentioning)

confidence: 99%

Efficient Hardware Design of Iterative Stencil Loops

Rana, Beretta, Bruschi, et al.

2016

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Abstract: A large number of algorithms for multidimensional signals processing and scientific computation come in the form of iterative stencil loops (ISLs), whose data dependencies span across multiple iterations. Because of their complex inner structure, automatic hardware acceleration of such algorithms is traditionally considered as a difficult task. In this paper, we introduce an automatic design flow that identifies, in a wide family of bidimensional data processing algorithms, sub-portions that exhibit a kind of parallelism close to that of ISLs; these are mapped onto a space of highly optimized ad-hoc architectures, which is efficiently explored to identify the best implementations with respect to both area and throughput. Experimental results show that the proposed methodology generates circuits whose performance is comparable to that of manually-optimized solutions, and orders of magnitude higher than those generated by commercial HLS tools.

“…Also, many special purpose architectures (e.g., ASICs [7,8], FPGAs [9,10], DSPs [3,11]), and enhanced general purpose CPUs (see, e.g., [12][13][14]), have been developed to deliver even higher performance for specific imaging tasks [15].…”

Section: Hardware Architectures (mentioning)

confidence: 99%

User transparency: a fully sequential programming model for efficient data parallel image processing

Seinstra, Koelma

2004

Concurrency and Computation

Summary: Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high-performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high-performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD-style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well-known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi-baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas.
