2012
DOI: 10.1016/j.procs.2012.04.012

Multi-GPU Implementation of LU Factorization

Cited by 12 publications (6 citation statements)
References 18 publications
“…Algorithm 2 shows how the decision about compressing each message sent is calculated. This algorithm must be included in the application to be executed (see 4 in Figure 1). Initially, values stored in Compression Heuristics File are loaded in internal tables.…”
Section: Basic Architecture of the Proposed Framework (mentioning)
confidence: 99%
“…The current trend is to use multicore clusters in order to increase the computation capability, thus allowing an increase in the number of processes per application. Examples of these applications can be found in many fields of computational science like MRI scan data [1], molecular dynamics [2], simulations [3] and mathematics [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…Mapping the LU algorithm over the graphics processor core is not an easy task to do since this process depends on massive memory references which will not fit in the GPU's core memory due to its relatively small size introducing unnecessary delays in the operation. Another hybrid organization connecting 48 AMD CPUs and 4 Fermi GPUs was used in [5] and another by E. Agullo in [6] where Nvidia tesla GPUs and Fermi based GPUs were used to test their algorithm. [7].…”
Section: B. GPU-Based Solution (mentioning)
confidence: 99%
“…In addition to scalability issues, general purpose architectures such as multicores and many cores have inefficiencies which deviate the algorithm performances largely from the peak performances of the hardware. Massively parallel GPU's [4], [5] have this same problem in a much greater amount because of their architecture. Application Specific Instruction set Processors (ASIP) are used to implement an optimized architecture able to serve a group of applications from the same domain (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…We focus on the LU factorization because of the constraints related to the synchronization of the processes that are involved during the panel factorization. For this case, the load balancing problem has been well studied by several authors [20,19,15,12,18,14]. For most implementations, the main idea is to determine empirically the amount of work to assign to the different computational units, or perform some necessary adjustments depending on the problem size in order to keep CPUs busy.…”
Section: Introduction (mentioning)
confidence: 99%