1996
DOI: 10.1145/232974.232988

Application and architectural bottlenecks in large scale distributed shared memory machines

Abstract: Many of the programming challenges encountered in small to moderate-scale hardware cache-coherent shared memory machines have been extensively studied. While work remains to be done, the basic techniques needed to efficiently program such machines have been well explored. Recently, a number of researchers have presented architectural techniques for scaling a cache coherent shared address space to much larger processor counts. In this paper, we examine the extent to which applications can achieve reasonable per…

Cited by 9 publications (8 citation statements)
References 12 publications
“…Although one could argue that the overhead of managing parallelism is a problem of the shared memory programming paradigm in general, it has been shown that programs parallelized for shared memory architectures can achieve satisfactory scaling up to a few hundreds of processors [3,4]. This is possible with reasonable scaling of the problem size to increase the granularity of threads and reduce the frequency of synchronization.…”
Section: OpenMP and Data Distribution
confidence: 99%
“…The runtime system records the memory reference trace of the parallel program after the execution of the first iteration. This trace indicates accurately which processor accesses each page more frequently, while the structure of the program ensures that the same reference trace will be repeated throughout the execution of the program, unless the operating system intervenes and preempts or migrates threads³. The trace of the first iteration can be used to migrate each page to the node that will minimize the maximum latency due to remote memory accesses to this page, by applying a competitive page migration criterion after the execution of the first iteration.…”
Section: Emulating Data Distribution
confidence: 99%
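The trace-driven placement described in this statement can be sketched roughly as follows. This is a hypothetical illustration, not the cited paper's implementation: the function names, the access-count representation, and the migration-cost threshold are all assumed for the example. It records per-node access counts for each page during the first iteration, then moves a page to its most frequent accessor only when the saving in remote accesses would amortize the cost of moving it (a simple competitive criterion).

```python
from collections import defaultdict

# Cost of migrating one page, expressed in remote-access units.
# The value is illustrative, not taken from the paper.
MIGRATION_COST = 100

def record_trace(accesses):
    """Build per-page, per-node access counts from (page, node) pairs
    observed during the first iteration."""
    counts = defaultdict(lambda: defaultdict(int))
    for page, node in accesses:
        counts[page][node] += 1
    return counts

def place_pages(counts, current_home):
    """Decide a new home node for each page under a competitive
    criterion: migrate only if the reduction in remote accesses per
    iteration exceeds the one-time migration cost."""
    placement = {}
    for page, per_node in counts.items():
        home = current_home.get(page)
        # Candidate home: the node that accessed this page most often.
        best = max(per_node, key=per_node.get)
        remote_now = sum(c for n, c in per_node.items() if n != home)
        remote_best = sum(c for n, c in per_node.items() if n != best)
        if best != home and remote_now - remote_best > MIGRATION_COST:
            placement[page] = best   # saving amortizes the move
        else:
            placement[page] = home   # not worth migrating
    return placement
```

A page accessed 200 times remotely and 10 times locally would migrate under this threshold, while a page accessed only by its home node would stay put.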
“…Both sockets and RMI are highly complex, entail considerable additional implementation work, and make programs harder to understand and maintain. • Although the source code of MPI programs, with their statically fixed communication instructions and thread distribution, can often be used flexibly across different architectures, porting frequently causes substantial performance losses [5,8]. Moreover, it is difficult to adapt programs with explicit communication to a changing number of nodes, different architectures, or different network topologies.…”
Section: JavaParty - Parallel and Distributed Programming in Java
“…Congestion can be a source of performance degradation in multicomputer networks [3,9]. Congestion arises when the nodes collectively generate traffic which exceeds the network total capacity.…”
Section: Introduction
confidence: 99%
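The congestion condition quoted above can be stated as a simple check: the network is congested when the traffic the nodes collectively inject exceeds the network's total capacity. The function and parameter names below are assumptions made for this sketch, not anything defined in the cited work.

```python
def is_congested(offered_mb_s, link_capacity_mb_s, num_links):
    """Return True when aggregate offered traffic exceeds aggregate
    network capacity.

    offered_mb_s      -- per-node injection rates (MB/s)
    link_capacity_mb_s -- capacity of a single link (MB/s)
    num_links         -- number of links in the network
    """
    total_capacity = link_capacity_mb_s * num_links
    return sum(offered_mb_s) > total_capacity
```

For example, three nodes each injecting 300 MB/s into a network with eight 100 MB/s links (800 MB/s total) satisfy the congestion condition.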