A parallel algorithm for graph matching and its MasPar implementation

Allen, R.; Cinque, Luigi; Tanimoto, Steven L.; Shapiro, Linda G.; Yasuda, D.

doi:10.1109/71.598276

Cited by 20 publications

(8 citation statements)

References 40 publications


“…Let suppose that the code of Example 1 is executed by 32 threads, pool [thread_idx].begin is equal to 0 for the first thread, and pool [thread_idx].begin is not equal to 0 for the other 31 threads. When the first thread executes the statement "time = TimeArrival [1];", all the other 31 threads remain idle. Therefore, the GPU cores on which these 31 threads are executed remain idle and cannot be used during the execution of the statement "time = TimeArrival [1];".…”

Section: Divergence Related To the Location Of Nodes (mentioning)

confidence: 99%
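
The idle-lane effect described in this statement can be illustrated with a small simulation. This is a minimal sketch, not code from the cited paper: `simt_branch_cost`, `then_cost`, and `else_cost` are hypothetical names, and the model assumes a warp executes each side of a branch in lockstep, one side after the other, with inactive lanes doing no useful work.

```python
WARP_SIZE = 32  # threads per warp, as in the statement above


def simt_branch_cost(predicates, then_cost, else_cost):
    """Return (cycles, lane utilization) for one warp running an if/else.

    Assumed model: the warp issues the 'then' side if at least one lane
    takes it, then the 'else' side if at least one lane takes it. Lanes
    that did not take the side being issued sit idle.
    """
    taken = sum(1 for p in predicates if p)
    not_taken = len(predicates) - taken
    cycles = 0
    useful = 0  # lane-cycles that do real work
    if taken:
        cycles += then_cost
        useful += taken * then_cost
    if not_taken:
        cycles += else_cost
        useful += not_taken * else_cost
    utilization = useful / (cycles * len(predicates)) if cycles else 1.0
    return cycles, utilization


# Scenario from the statement: begin == 0 only for the first thread,
# so only that thread executes the guarded assignment.
begin = [0] + [1] * (WARP_SIZE - 1)
predicates = [b == 0 for b in begin]
cycles, utilization = simt_branch_cost(predicates, then_cost=1, else_cost=0)
print(cycles, utilization)
```

Under these assumptions, one lane out of 32 does useful work while the guarded statement is issued, which is the situation the statement describes.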


“…For this reason, over the last decades, parallel computing has been identified as an attractive way to deal with larger instances of COPs. However, although many contributions have been proposed for parallel B&B methods using massively parallel processors [1], networks or clusters of workstations [2], and SMP machines … Dynamic warp formation (DWF) [9] is a hardware mechanism proposed in order to improve the efficiency of SIMD branch execution. Every cycle the thread scheduler recomposes warps from the active threads by grouping those that are executing the same path into the same warp.…”

Section: Introduction (mentioning)

confidence: 99%
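
The regrouping idea behind dynamic warp formation can be sketched as a counting exercise. This is an illustrative model only: the function names and the path labels "A" and "B" are hypothetical, and the sketch ignores any hardware constraints a real scheduler would face.

```python
from collections import Counter
from math import ceil


def static_issue_slots(paths, warp_size):
    """Issue slots when warps are fixed: each warp needs one slot per
    distinct path taken by its own threads."""
    slots = 0
    for start in range(0, len(paths), warp_size):
        slots += len(set(paths[start:start + warp_size]))
    return slots


def dynamic_issue_slots(paths, warp_size):
    """Issue slots when threads on the same path are regrouped into
    full warps, as in the description of DWF above."""
    counts = Counter(paths)
    return sum(ceil(n / warp_size) for n in counts.values())


# 64 threads; even-numbered threads take path "A", odd-numbered take "B".
paths = ["A" if t % 2 == 0 else "B" for t in range(64)]
static_slots = static_issue_slots(paths, warp_size=32)
dynamic_slots = dynamic_issue_slots(paths, warp_size=32)
print(static_slots, dynamic_slots)
```

In this toy case both fixed warps contain both paths, so each is issued twice, whereas regrouping yields one full warp per path.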

“…For this reason, over the last decades, parallel computing has been identified as an attractive way to deal with larger instances of COPs. However, although many contributions have been proposed for parallel B&B methods using massively parallel processors [1], networks or clusters of workstations [2], and SMP machines [3], to the best of our knowledge, no contribution has been proposed for designing B&B algorithms on graphical processing units (GPUs).…”

Section: Introduction (mentioning)

confidence: 99%


Reducing thread divergence in a GPU‐accelerated branch‐and‐bound algorithm

Chakroun, Mezmaz, Melab, et al. 2012. Concurrency and Computation.


SUMMARY: In this paper, we address the design and implementation of graphical processing unit (GPU)-accelerated branch-and-bound (B&B) algorithms for solving flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multithreaded accelerators that use the single instruction multiple data model at execution. A major issue that arises when executing a B&B applied to FSP on a GPU is thread or branch divergence. Such divergence is caused by the lower bound function of FSP, which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP to deal with thread divergence. Extensive experiments with the proposed approach have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card (http://www.nvidia.com/docs/IO/43395/NV_DS_Tesla_C2050_C2070_jul10_lores.pdf). Compared with a CPU-based execution, accelerations of up to 77.46× are achieved for large problem instances.



“…The design and implementation of parallel B&B is strongly influenced by the computing platform [1]. Many contributions have been proposed for the design and implementation of parallel B&B methods using massively parallel processors [2], networks or clusters of workstations [3,4] and Shared Memory or SMP machines [5]. The proposed approaches are based on three parallel models presented in [6]: parallel application of the operators on the generated subproblems (Type 1), parallel building and exploration of a B&B tree (Type 2), and parallel (cooperative or independent) building and exploration of several B&B trees (Type 3).…”

Section: Introduction (mentioning)

confidence: 99%

Graphics processing unit‐accelerated bounding for branch‐and‐bound applied to a permutation problem using data access optimization

Melab, Chakroun, Bendjoudi. 2013. Concurrency and Computation.


Branch-and-bound (B&B) algorithms are attractive methods for solving combinatorial optimization problems to optimality using an implicit enumeration of a dynamically built tree-based search space. Nevertheless, they are time-consuming when dealing with large problem instances. Therefore, pruning tree nodes (subproblems) is traditionally used as a powerful mechanism to reduce the size of the explored search space. Pruning requires performing the bounding operation, which consists of applying a lower bound function to the subproblems generated during the exploration process. Preliminary experiments performed on the Flow-Shop scheduling problem (FSP) have shown that the bounding operation consumes over 98% of the execution time of the B&B algorithm. In this paper, we investigate the use of graphics processing unit (GPU) computing as a major complementary way to speed up the search. We revisit the design and implementation of the parallel bounding model on GPU accelerators. The proposed approach enables data access optimization. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based single-core execution using an Intel Core i7-970 processor without GPU, speedups higher than 100× are achieved for large problem instances. At an equivalent peak performance, GPU-accelerated B&B is twice as fast as its multi-core counterpart.
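
The role of the bounding operation in pruning can be illustrated with a small sequential sketch. This is not the authors' lower bound or their GPU implementation: `lower_bound` here is a deliberately simple one-machine bound chosen for illustration, and the processing times in `proc` are made-up data.

```python
from itertools import permutations


def completion_times(order, proc, machines):
    """Completion time on each machine after scheduling `order`
    in a permutation flow shop; proc[job][m] is a processing time."""
    done = [0] * machines
    for job in order:
        for m in range(machines):
            start = max(done[m], done[m - 1] if m else 0)
            done[m] = start + proc[job][m]
    return done


def lower_bound(prefix, remaining, proc, machines):
    """Illustrative bound: the last machine must still process every
    unscheduled job after it finishes the scheduled prefix."""
    done = completion_times(prefix, proc, machines)
    return done[-1] + sum(proc[j][-1] for j in remaining)


def branch_and_bound(proc):
    machines = len(proc[0])
    best = {"makespan": float("inf"), "order": None, "pruned": 0}

    def explore(prefix, remaining):
        if not remaining:
            makespan = completion_times(prefix, proc, machines)[-1]
            if makespan < best["makespan"]:
                best["makespan"], best["order"] = makespan, tuple(prefix)
            return
        for job in sorted(remaining):
            child, rest = prefix + [job], remaining - {job}
            # Bounding: discard a subproblem that cannot beat the incumbent.
            if lower_bound(child, rest, proc, machines) >= best["makespan"]:
                best["pruned"] += 1
                continue
            explore(child, rest)

    explore([], frozenset(range(len(proc))))
    return best


proc = [[3, 2, 4], [1, 5, 2], [4, 1, 3], [2, 3, 1]]  # hypothetical data
result = branch_and_bound(proc)
# Exhaustive enumeration, used only to cross-check the optimum.
brute = min(completion_times(p, proc, 3)[-1] for p in permutations(range(4)))
print(result["makespan"], brute, result["pruned"])
```

Every call to `lower_bound` is independent of the others, which is what makes the bounding step a natural candidate for evaluating many subproblems in parallel.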


A Robust Neural Network Based Object Recognition System and Its SIMD Implementation

Petrosino, Salvi. 1999. Euro-Par’99 Parallel Processing.


Recognition of objects is a particularly demanding problem, if one considers that each image must be interpreted in milliseconds (usually 30 or 40 frames/second). The problem becomes more difficult if the objects are distorted and/or partially occluded. In this case a sequence of local features is to be extracted, combined in a global shape description and classified as belonging to pre-defined sets of known shapes (reference shapes). In this paper we propose a massively parallel object recognition system, which makes use of the multi polygonal approximation scheme for the extraction of rotation and translation invariant shape features, in connection with artificial neural networks for the parallel classification of the extracted features. The system has been successfully applied for recognizing aircraft shapes in different sizes, orientations, with the addition of noise distortion and occlusion. Timings on the Connection Machine 200 are also reported.
