1998
DOI: 10.1109/12.675711

Per-node multithreading and remote latency

Abstract: This paper evaluates the use of per-node multi-threading to hide remote memory and synchronization latencies in software DSMs. As with hardware systems, multi-threading in software systems can be used to reduce the costs of remote requests by running other threads when the current thread blocks. We added multi-threading to the CVM software DSM and evaluated its impact on the performance of a suite of common shared memory programs. Multi-threading resulted in speed improvements of at least 20% in two of the app…
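As a rough illustration of the latency-hiding idea from the abstract, the sketch below is not code from the paper or from CVM; the thread count, the timings, and the usleep() stand-in for a remote page or synchronization round trip are all illustrative assumptions. It simply runs several application threads per node so that when one thread blocks on a remote request, the operating system can schedule another.

```c
/* Minimal sketch: per-node multithreading to hide remote latency.
 * A real DSM runtime such as CVM is not shown; usleep() stands in
 * for the round trip of a remote page or synchronization request,
 * and the busy loop stands in for local computation.  All constants
 * are illustrative assumptions. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define THREADS_PER_NODE 4
#define ITERATIONS       8

static void *worker(void *arg)
{
    long id = (long)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Local computation phase. */
        volatile double x = 0.0;
        for (int k = 0; k < 1000000; k++)
            x += k * 0.5;

        /* "Remote" access: blocks only this thread, so the OS can
         * schedule one of the other per-node threads in the meantime. */
        usleep(10000);          /* ~10 ms simulated remote latency */
    }
    printf("thread %ld finished\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[THREADS_PER_NODE];
    for (long i = 0; i < THREADS_PER_NODE; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < THREADS_PER_NODE; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```

With enough ready threads per node, the wall-clock cost of a blocking remote request is largely overlapped with other threads' computation, which is the effect the paper measures for CVM.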

Cited by 16 publications (8 citation statements)
References 17 publications
“…In order to achieve high CPU resource utilization, multithreading [6,17,22] is generally supported in most modern software DSM systems. It allows their users to create multiple threads on each processor to overlap the CPU computation time and network communication time.…”
Section: Performance Model Analysis
confidence: 99%
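The computation/communication overlap mentioned in this citing work can be pictured with a hypothetical double-buffering sketch. This is not code from CVM or the cited systems; fetch_remote() is an invented stand-in for the DSM's remote read, and the block sizes and latency are assumptions. A per-node communication thread fetches the next block of shared data while the compute thread works on the current block.

```c
/* Sketch: overlapping computation with communication using one
 * communication thread per node (double buffering). */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define BLOCKS 4
#define BLOCK_WORDS 1024

static double buf[2][BLOCK_WORDS];
static int ready[2];                       /* has block been fetched into buf[i]? */
static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static void fetch_remote(double *dst, int block)   /* fake remote read */
{
    usleep(10000);                          /* simulated network round trip */
    for (int i = 0; i < BLOCK_WORDS; i++)
        dst[i] = block + i;
}

static void *comm_thread(void *arg)
{
    (void)arg;
    for (int b = 0; b < BLOCKS; b++) {
        int slot = b % 2;
        pthread_mutex_lock(&m);
        while (ready[slot])                 /* wait until compute thread drained it */
            pthread_cond_wait(&cv, &m);
        pthread_mutex_unlock(&m);

        fetch_remote(buf[slot], b);         /* overlaps with the compute loop below */

        pthread_mutex_lock(&m);
        ready[slot] = 1;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

int main(void)
{
    pthread_t comm;
    pthread_create(&comm, NULL, comm_thread, NULL);

    double sum = 0.0;
    for (int b = 0; b < BLOCKS; b++) {
        int slot = b % 2;
        pthread_mutex_lock(&m);
        while (!ready[slot])                /* wait for block b to arrive */
            pthread_cond_wait(&cv, &m);
        pthread_mutex_unlock(&m);

        for (int i = 0; i < BLOCK_WORDS; i++)   /* compute on block b while    */
            sum += buf[slot][i];                /* block b+1 is being fetched  */

        pthread_mutex_lock(&m);
        ready[slot] = 0;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&m);
    }
    pthread_join(comm, NULL);
    printf("sum = %f\n", sum);
    return 0;
}
```

With full overlap, the per-block time approaches the larger of the compute time and the fetch time rather than their sum.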
“…However, network speed and the amount of network traffic have great effect on the performance of programs in a DSM system. Consequently, most of the researches in the past focused on the design of performance-enhancing technologies such as relaxed consistency protocols [2,7,8,16,23], multithreading [6,17,22], load balance [24][25][26], communication minimization [19], etc.…”
Section: Introduction
confidence: 99%
“…Since the amount of computation for the application program is fixed, it is important to minimize the delay due to page faults, lock waiting, and barrier waiting to obtain proper performance from the DSM system. An approach alleviating this problem has been proposed in [18] and [19]. Their research is based on using multi-threading so that another thread can execute while a thread is blocked.…”
Section: Introduction
confidence: 98%
“…Figure 2 shows application performance as the number of threads per processor is increased from 1 to 8 threads per node on conf-hom. The slight increases in performance at small numbers of threads are due to latency hiding [9,10]. Note that this increase would be much larger if we took startup costs into account, as in [10], but we consider only steady-state execution here.…”
Section: Load Balancing
confidence: 99%
“…Note that this increase would be much larger if we took startup costs into account, as in [10], but we consider only steady-state execution here. Additionally, we are not currently restructuring applications to avoid the problems discussed by Thitikamol [9], such as per-thread reductions, and duplication of thread state. These numbers impose a limit on the number of threads that we can use before performance degrades.…”
Section: Load Balancing
confidence: 99%