Parallel computer vision on a reconfigurable multiprocessor network

Bhandarkar, S.M.; Arabnia, Hamid R.

doi:10.1109/71.584095

Cited by 31 publications

(7 citation statements)

References 33 publications

“…The related work mainly comes from the areas of pre-compiler transformation systems, parallel compilers, and algorithm design. The pre-compiler presented in this paper would be naturally portable to different types of network topology in high performance systems [2,5].…”

Section: Related Work (mentioning)

confidence: 99%

Auto-CFD-NOW: A pre-compiler for effectively parallelizing CFD applications on networks of workstations

et al. 2006

Computational Fluid Dynamics (CFD) applications are highly demanding for parallel computing. Many such applications have been shifted from expensive MPP boxes to cost-effective Networks of Workstations (NOW). Auto-CFD-NOW is a pre-compiler that transforms Fortran CFD sequential programs to efficient message-passing parallel programs running on NOW. Our work makes the following three unique contributions. First, this precompiler is highly automatic, requiring a minimum number of user directives for parallelization. Second, we have applied a dependency analysis technique for the CFD applications, called analysis after partitioning. We propose a mirror-image decomposition technique to parallelize self-dependent field loops that are hard to parallelize by existing methods. Finally, traditional optimizations of communication focus on eliminating redundant synchronizations. We have developed an optimization scheme which combines all the non-redundant synchronizations in CFD programs to further reduce the communication overhead. The Auto-CFD-NOW has been implemented on networks of workstations and has been successfully used for automatically parallelizing structured CFD application programs. Our experiments show its effectiveness and scalability for parallelizing large CFD applications.

“…However, wormhole routing has attracted lots of interest for its low latency and less requirement of buffer storage. On the other hand, for purposes of scalability, a reconfigurable multiring network (RMRN) was proposed and developed [3][4][5]. This scheme is based on the ring topology and was shown to be far more scalable.…”

Section: Introduction (mentioning)

confidence: 99%

An Efficient Tree-Based Multicasting Algorithm on Wormhole-Routed Star Graph Interconnection Networks Embedded with Hamiltonian Path

Wang, Chu 2005, J Supercomput

Multicasting is an important issue for numerous applications in parallel and distributed computing. In multicasting, the same message is delivered from a source node to an arbitrary number of destination nodes. The star graph interconnection network has been recognized as an attractive alternative to the popular hypercube network. In this paper, we propose an efficient and deadlock-free tree-based multi-cast routing scheme for wormhole-routed star graph networks with hamiltonian path. In our proposed routing scheme, the router is with the input-buffer-based asynchronous replication mechanism that requires extra hardware cost. Meanwhile, the router simultaneously sends incoming flits on more than one outgoing channel. We perform simulation experiments with the network latency and the network traffic. Experimental results show that the proposed scheme reduces multicast latency more efficiently than other schemes.

“…Speedup figures and frame rates are usually presented. As realized by much research, it is crucial that both the spatial- and temporal-domain decomposition are fully exploited and the eventual parallel algorithm matches the parallel architecture [21]-[23]. Therefore, it is the purpose of this research to investigate the effect of parallelizing computations and communications in the spatial, temporal, and both spatial-temporal domains through the study of frame rate, speedup, and implementation efficiency.…”

Section: Introduction (mentioning)

confidence: 99%

Spatial and temporal data parallelization of the H.261 video coding algorithm

Yung, Leung 2001, IEEE Trans. Circuits Syst. Video Technol.

In this paper, the parallelization of the H.261 video coding algorithm on the IBM SP2® multiprocessor system is described. The effect of parallelizing computations and communications in the spatial, temporal, and both spatial-temporal domains are considered through the study of frame rate, speedup, and implementation efficiency, which are modeled and measured with respect to the number of nodes () and parallel methods used. Four parallel algorithms were developed, of which the first two exploited the spatial parallelism in each frame, and the last two exploited both the temporal and spatial parallelism over a sequence of frames. The two spatial algorithms differ in that one utilizes a single communication master, while the other attempts to distribute communications across three masters. On the other hand, the spatial-temporal algorithms use a pipeline structure for exploiting the temporal parallelism together with either a single master or multiple masters. The best median speedup (frame rate) achieved was close to 15 [15 frames per second (fps)] for 352 × 240 video on 24 nodes, and 13 (37 fps) for QCIF video, by the spatial algorithm with distributed communications. For 10, the single-master spatial algorithm performs better with efficiency up to 90%, while the multiple-master spatial algorithm is superior for 10, with efficiency up to 70%. The spatial-temporal algorithms achieved average speedup performance, but are most scalable for large .
