Vijay S. Pai scite author profile

We consider cluster-based network servers in which a front-end directs incoming requests to one of a number of back-ends. Specifically, we consider content-based request distribution: the front-end uses the content requested, in addition to information about the load on the back-end nodes, to choose which back-end will handle this request. Content-based request distribution can improve locality in the back-ends' main memory caches, increase secondary storage scalability by partitioning the server's database, and provide the ability to employ back-end nodes that are specialized for certain types of requests.

As a specific policy for content-based request distribution, we introduce a simple, practical strategy for locality-aware request distribution (LARD). With LARD, the front-end distributes incoming requests in a manner that achieves high locality in the back-ends' main memory caches as well as load balancing. Locality is increased by dynamically subdividing the server's working set over the back-ends. Trace-based simulation results and measurements on a prototype implementation demonstrate substantial performance improvements over state-of-the-art approaches that use only load information to distribute requests. On workloads with working sets that do not fit in a single server node's main memory cache, the achieved throughput exceeds that of the state-of-the-art approach by a factor of two to four.

With content-based distribution, incoming requests must be handed off to a back-end in a manner transparent to the client, after the front-end has inspected the content of the request. To this end, we introduce an efficient TCP handoff protocol that can hand off an established TCP connection in a client-transparent manner.

To appear in the Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, Oct 1998.
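The abstract describes the idea of locality-aware distribution but not its exact rules, so the following is only a minimal sketch of how a front-end might combine content and load when choosing a back-end. The class `FrontEnd`, its methods, and the thresholds `T_LOW` and `T_HIGH` are hypothetical names and values chosen for illustration; the reassignment rule is an assumption, not a statement of the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical load thresholds (active requests per back-end); illustrative only.
T_LOW = 25
T_HIGH = 65


@dataclass
class FrontEnd:
    """Toy front-end that maps each requested target to one back-end."""
    num_backends: int
    load: list = field(default_factory=list)        # active requests per back-end
    assignment: dict = field(default_factory=dict)  # target -> back-end index

    def __post_init__(self):
        self.load = [0] * self.num_backends

    def least_loaded(self) -> int:
        # Ties are broken in favor of the lowest index.
        return min(range(self.num_backends), key=lambda i: self.load[i])

    def dispatch(self, target: str) -> int:
        """Pick a back-end for `target`, preferring the one that served it before."""
        node = self.assignment.get(target)
        if node is None:
            # First request for this target: place it on the least-loaded node.
            node = self.least_loaded()
        else:
            # Assumed rule: move the target only if its node is overloaded
            # while some other node is lightly loaded, or if it is severely overloaded.
            lightly_loaded_exists = min(self.load) < T_LOW
            overloaded = self.load[node] > T_HIGH and lightly_loaded_exists
            if overloaded or self.load[node] >= 2 * T_HIGH:
                node = self.least_loaded()
        self.assignment[target] = node
        self.load[node] += 1
        return node

    def complete(self, node: int) -> None:
        """Record that a request on `node` has finished."""
        self.load[node] -= 1


fe = FrontEnd(num_backends=2)
first = fe.dispatch("/index.html")
other = fe.dispatch("/logo.png")
again = fe.dispatch("/index.html")  # goes back to the node that served it first
```

Because repeated requests for the same target keep going to the same back-end, that node's main memory cache is more likely to hold the content, while the load check keeps any one node from being swamped.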
Introduction

Network servers based on clusters of commodity workstations or PCs connected by high-speed LANs combine cutting-edge performance and low cost. A cluster-based network server consists of a front-end, responsible for request distribution, and a number of back-end nodes, responsible for request processing. The use of a front-end makes the distributed nature of the server transparent to the clients. In most current cluster servers the front-end distributes requests to back-end nodes without regard to the type of service or the content requested. That is, all back-end nodes are considered equally capable of serving a given request and the only factor guiding the request distribution is the current load of the back-end nodes.

With content-based request distribution, the front-end takes into account both the service/content requested and the current load on the back-end nodes when deciding which back-end node should serve a given request. The potential advantages of content-based request distribution are: (1) increased performance due to improved hit rates in the back-end's main memory caches, (2) increased secon...


Rsim: simulating shared-memory multiprocessors with ILP processors

Hughes

Pai

Ranganathan

et al. 2002

Computer

147


Given the complexity and associated cost of building modern computer systems, simulation is often the only practical way to test architectural ideas and assess system performance. Simulators provide the flexibility to modify and analyze the impact of various architectural parameters and components as well as enable more detailed statistics collection than real hardware. These benefits make simulation useful even for projects that will eventually implement hardware.

Prior to 1994, most academic shared-memory multiprocessor studies largely ignored the processor model, focusing instead on the memory system as the most important performance bottleneck. These studies assumed a simplistic processor model based on in-order issue, blocking reads, and no speculation. However, the early 1990s saw several announcements of commercial shared-memory systems using processors that aggressively exploited instruction-level parallelism (ILP) such as the MIPS R10000, Hewlett-Packard PA8000, and Intel Pentium Pro. These processors had the potential to reduce memory read stalls by overlapping read latency with other operations, possibly changing the nature of performance bottlenecks in the system.

Because no shared-memory ILP systems or simulators were available at that time, we designed Rsim (originally an acronym for Rice simulator for ILP multiprocessors) to study such systems. Two major questions guided our efforts:

• Does processor microarchitecture influence shared-memory performance and design to the extent that it justifies its detailed modeling and associated performance costs in a shared-memory simulator?
• With simple processor-based simulators already taking a long time to run, could we build such a detailed simulator efficiently enough to perform substantive architecture studies in reasonable time?

Our experience with Rsim demonstrates that modeling ILP features is important even in shared-memory multiprocessor systems. In particular, current simple processor-based approximations cannot model significant performance effects for applications exhibiting parallel read misses. Further, recent shared-memory designs (for example, aggressive implementations of sequential consistency [1]) directly use the aggressive ILP-enhancing features of modern processors that simple processor-based simulators do not model.

We have also demonstrated that significant multiprocessor studies can be performed with the current speed of ILP simulators. However, improving their speed is crucial for future workloads.

Rsim is a publicly available architecture simulator for shared-memory systems built from processors that aggressively exploit instruction-level parallelism. Modeling ILP features in a multiprocessor is particularly important for applications that exhibit parallelism among read misses.
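To make the point about parallel read misses concrete, here is a back-of-the-envelope sketch, not Rsim itself and not a model taken from the article. It assumes all misses are independent, that up to `max_overlap` of them can be outstanding at once, and that each group of overlapped misses costs a single `miss_latency`. The function name and the numbers are hypothetical.

```python
def read_stall_cycles(num_misses: int, miss_latency: int, max_overlap: int) -> int:
    """Estimate read-miss stall cycles under a toy overlap model.

    max_overlap=1 corresponds to blocking reads (one miss at a time);
    larger values stand in for a processor that overlaps independent misses.
    """
    if num_misses < 0 or miss_latency < 0 or max_overlap < 1:
        raise ValueError("invalid parameters")
    # Number of sequential "rounds" of misses, rounded up.
    rounds = (num_misses + max_overlap - 1) // max_overlap
    return rounds * miss_latency


# Illustrative numbers only: 8 independent misses, 100 cycles each.
blocking = read_stall_cycles(8, 100, max_overlap=1)
overlapped = read_stall_cycles(8, 100, max_overlap=4)
print(blocking, overlapped)
```

Under these assumptions the blocking-read estimate is four times the overlapped one, which is the kind of gap a simple in-order, blocking-read processor model would fail to capture.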


Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

1997


Accelerating multicore reuse distance analysis with sampling and parallelization

Schuff

Kulkarni

Pai

2010



Locality-aware request distribution in cluster-based network servers
