Symmetric multiprocessing in Solaris 2.0

Kleiman, Steve; Voll, Jim; Eykholt, Joe; Shivalingiah, A.; Williams, D.; Smith, Mark; Barton, S.; Skinner, G.

doi:10.1109/cmpcon.1992.186706

Cited by 11 publications

(6 citation statements)

References 1 publication

“…Each of our three platforms runs Solaris 2.5.1, a modern, multi-threaded operating system [11]. Though we are presenting a study of architectural characteristics, operating system behavior often dictates usage patterns of the underlying hardware, as shown in [6,16,19].…”

Section: Single Workstation (mentioning)

confidence: 99%

The architectural costs of streaming I/O: A comparison of workstations, clusters, and SMPs

Arpaci-Dusseau, Arpaci-Dusseau, Culler, et al.

Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture

We investigate resource usage while performing streaming I/O by contrasting three architectures, a single workstation, a cluster, and an SMP, under various I/O benchmarks. We derive analytical and empirically based models of resource usage during data transfer, examining the I/O bus, memory bus, network, and processor of each system. By investigating each resource in detail, we assess what comprises a well-balanced system for these workloads. We find that the architectures we study are not well balanced for streaming I/O applications. Across the platforms, the main limitation to attaining peak performance is the CPU, due to lack of data locality. Increasing processor performance (especially with improved block operation performance) will be of great aid for these workloads in the future. For a cluster workstation, the I/O bus is a major system bottleneck, because of the increased load placed on it from network communication. A well-balanced cluster workstation should have copious I/O bus bandwidth, perhaps via multiple I/O busses. The SMP suffers from poor memory-system performance; even when there is true parallelism in the benchmark, contention in the shared-memory system leads to reduced performance. As a result, the clustered workstations provide higher absolute performance for streaming I/O workloads.

“…STREAMS has been extended to facilitate the development of components on a symmetric multiprocessor platform [Garg90,Kleiman92,Saxena93]. These extensions, which are proprietary, define several levels of parallelism (Figure 3), according to the span of the mutual exclusion section (e.g.…”

Section: Parallelization Of the Demultiplexed Streams Stack (mentioning)

confidence: 99%

“…• Use locks within STREAMS [Kleiman92,Saxena93]: A thread that wants to perform some work on a given queue must first acquire the associated lock.…”

Section: Parallelization Of the Demultiplexed Streams Stack (mentioning)

confidence: 99%
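
The per-queue locking rule quoted above (a thread must acquire the queue's lock before working on that queue) can be illustrated with a minimal user-space sketch. This is not the STREAMS API or the Solaris implementation: `StreamQueue`, `put`, `service`, and `run_demo` are hypothetical names chosen for illustration, and Python threads stand in for kernel threads.

```python
import threading


class StreamQueue:
    """Hypothetical stand-in for a STREAMS queue (not the kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # the lock associated with this queue
        self.messages = []
        self.processed = 0

    def put(self, msg):
        # Any thread that touches the queue takes the queue's lock first.
        with self.lock:
            self.messages.append(msg)

    def service(self):
        # Drain pending messages while holding the lock, so two threads
        # never work on the same queue at the same time.
        with self.lock:
            count = len(self.messages)
            self.messages.clear()
            self.processed += count
            return count


def run_demo(n_threads=4, per_thread=1000):
    """Several threads enqueue and service one shared queue concurrently."""
    q = StreamQueue("demo")

    def worker():
        for i in range(per_thread):
            q.put(i)
            q.service()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    q.service()  # drain anything left over
    return q.processed
```

Because every access goes through the queue's own lock, the processed count always equals the number of messages enqueued. The cost is that threads working on the same queue are serialized, which is the trade-off the span of the mutual exclusion section controls.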

Demultiplexed architectures: a solution for efficient STREAMS-based communication stacks

Roca, Braun, Diot

1997

IEEE Network

This paper analyzes the efficiency of various high performance implementation techniques for the communication system of UNIX workstations. Using an Open System implies that a certain compatibility level is required from the protocol, user interface, and implementation framework. These constraints limit the opportunities to design a high performance communication system. We have designed an experimental platform around the TCP/IP protocol suite, using the STREAMS environment. A BSD TCP/IP stack and a classic STREAMS based TCP/IP stack serve as reference implementations for performance comparisons. We explain why the efficiency of some high performance implementation techniques we applied to this platform is limited. The impacts of the hardware architecture, of the operating system, and of the communication stack architecture on performances are analyzed. It is shown that the efficiency of data transmission would benefit from more simplicity and more synchronism in the communication environment, from direct data paths between the applications and the device drivers, and from a limited ILP integration.

“…Both reasons justify an effort to develop multithreaded software and have enough impact to force programmers to rewrite many existing applications [1][2][3].…”

Section: Introduction (mentioning)

confidence: 98%