Proceedings of the 2007 ACM/IEEE Conference on Supercomputing 2007
DOI: 10.1145/1362622.1362652
Scaling performance of interior-point method on large-scale chip multiprocessor system

Abstract: In this paper we describe parallelization of the interior-point method (IPM) aimed at achieving high scalability on large-scale chip multiprocessors (CMPs). IPM is an important computational technique used to solve optimization problems in many areas of science, engineering and finance. IPM spends most of its computation time in a few sparse linear algebra kernels. While each of these kernels contains a large amount of parallelism, sparse irregular datasets seen in many optimization problems make parallelism diffic…
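The abstract's core technique can be illustrated with a toy sketch. The paper's contribution is parallelizing the sparse linear algebra inside IPM; the snippet below only shows the serial skeleton of a log-barrier interior-point iteration on a one-variable problem (minimize x subject to x >= 1), with all names and parameters being illustrative assumptions, not the paper's solver:

```python
# Toy log-barrier interior-point sketch (illustrative only; the paper's solver
# targets large sparse systems, not this one-variable problem).
# Barrier subproblem: minimize t*x - log(x - 1), whose exact minimizer is
# x = 1 + 1/t, approaching the true optimum x = 1 as t grows.

def barrier_step(t, x, iters=50):
    """Damped Newton iterations on the barrier objective t*x - log(x - 1)."""
    for _ in range(iters):
        grad = t - 1.0 / (x - 1.0)       # first derivative
        hess = 1.0 / (x - 1.0) ** 2      # second derivative (always positive)
        step = grad / hess
        while x - step <= 1.0:           # backtrack to stay strictly feasible
            step *= 0.5
        x -= step
    return x

def interior_point(x=2.0, t=1.0, mu=10.0, outer=6):
    """Path following: solve the barrier subproblem, then increase t."""
    for _ in range(outer):
        x = barrier_step(t, x)
        t *= mu
    return x

print(interior_point())  # approaches the optimum x = 1
```

In a real IPM each Newton step solves a large sparse linear system (the role played here by the scalar `grad / hess`), which is exactly the kernel whose parallelization the paper studies.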

Cited by 7 publications (3 citation statements) · References 21 publications
“…Eleyat and Natvig [16] propose an optimized parallel solver leveraging the special capabilities of a Cell processor. Smelyanskiy et al. [17] present an optimization of the interior-point methods for large-scale chip multiprocessor systems. Budiu et al. [18] describe an alternative implementation of a distributed branch-and-bound solver.…”
Section: Related Work
confidence: 99%
“…On the other hand, Algorithm 3.1 typically spends only 10% of the time in the PGS phase and 90% in the subspace minimization. Hence it effectively offloads the parallelization task to a direct linear solver that is known to parallelize well [28]. Specifically, our adaptation of PARDISO, the Cholesky solver used for the experiments in this paper, yields more than 65% utilization on a 64-core chip multiprocessor simulator (see also Schenk [25] for a scalability study for a moderate number of processors on a coarsely coupled shared memory system).…”
Section: Final Remarks
confidence: 99%
“…Smelyanskiy et al. (2007) used supernode-based blocking without use of amalgamation. Instead, they show, using a cycle-accurate simulator, that the hardware support for low-overhead task queues proposed by Kumar et al. (2007) can be used to accelerate the scheduling of small tasks.…”
Section: Introduction
confidence: 99%