Latency, occupancy, and bandwidth in DSM multiprocessors: a performance evaluation

Chaudhuri, Mainak; Heinrich, Mark; Holt, Chris; Singh, Jaswinder Pal; Rothberg, Edward; Hennessy, John L.

doi:10.1109/tc.2003.1214336

Cited by 16 publications

(11 citation statements)

References 34 publications


“…The immediate successor of shared memory from multiprocessor world would have a page as the unit for data transfer. Granularity describes the size of the minimum unit of shared memory [13,14,15]. In the said DSM framework it is the page-size.…”

Section: Granularity (mentioning)

confidence: 99%

Memory Management Technique for Paging on Distributed Shared Memory Framework

Gopal, Beg, Kumar

2010

IJCSIT


Distributed Shared Memory (DSM) systems have become a popular paradigm in distributed systems. Because a DSM system moves data from one node to another over a typical network, performance is an important design criterion. A DSM system can be designed as page-based, shared-variable-based, or object-based. In a page-based DSM system, the unit of data sharing is the memory page. In this paper we describe a memory management technique for paging in a DSM framework. Implementing a DSM framework with a paging scheme leads to false sharing and to a high cost associated with virtual memory operations. The paper discusses the effect of granularity and proposes a solution to false sharing. The paper also analyzes the different overheads of the DSM framework with respect to page size and virtual memory operations.



“…If we continue to assume that network latency is the primary performance determinant, the time complexity of the release stage is O(1), because the N invalidation messages and subsequent N reload requests can be pipelined. However, researchers have reported that memory controller (MMC) occupancy has a greater impact on barrier performance than network latency for medium-sized DSM multiprocessors [6]. In other words, the assumption that coherence messages can be sent from or processed by a particular memory controller in negligible time does not hold.…”

Section: Time Complexity Analysis (mentioning)

confidence: 99%

Fast synchronization on shared-memory multiprocessors: An architectural approach

Fang, Zhang, Carter, et al.

2005

Journal of Parallel and Distributed Computing


“…We accurately model the latency and cache effects of TLB misses. On two different occasions our processor model has been validated against real hardware [2], [8].…”

Section: Simulation Environment (mentioning)

confidence: 99%

Exploring virtual network selection algorithms in DSM cache coherence protocols

Chaudhuri, Heinrich

2004

IEEE Trans. Parallel Distrib. Syst.

Self Cite


Abstract: Distributed shared memory (DSM) multiprocessors typically require disjoint networks for deadlock-free execution of cache coherence protocols. This is normally achieved by implementing virtual networks with the help of virtual channels or virtual lanes multiplexed on a single physical network. To keep the coherence protocol simple, messages are usually assigned to virtual lanes in a predefined static manner based on a cycle-free lane assignment dependence graph. However, this static split of virtual networks (such as request and reply networks) may lead to underutilization of certain virtual networks while saturating the other networks. In this paper, we explore different static and dynamic schemes to select the virtual lanes for outgoing messages and mix the load among them without restricting any particular type of message to be carried only by a particular virtual network. We achieve this by exposing the selection algorithms to the coherence protocol itself, so that it can inject messages into selected virtual lanes based on some local information, and still enjoy deadlock-freedom. Our execution-driven simulation on five applications from the SPLASH-2 suite shows that as the system scales, the virtual network selection algorithms play an important role. For 128-node systems, our dynamic selection algorithm speeds up parallel execution by as much as 22 percent over an optimized baseline system running a modified SGI Origin 2000 protocol. We also explore how network latency, the number of message buffers per virtual lane, and the depth of network interface output queues affect the relative performance of various virtual lane selection algorithms.

