Designing highly dependable systems requires a good understanding of failure characteristics. Unfortunately, little raw data on failures in large IT installations is publicly available. This paper analyzes failure data recently made publicly available by one of the largest high-performance computing sites. The data has been collected over the past 9 years at Los Alamos National Laboratory and includes 23,000 failures recorded on more than 20 different systems, mostly large clusters of SMP and NUMA nodes. We study the statistics of the data, including the root cause of failures, the mean time between failures, and the mean time to repair. We find, for example, that average failure rates vary widely across systems, ranging from 20 to 1000 failures per year, and that time between failures is modeled well by a Weibull distribution with decreasing hazard rate. From one system to another, mean repair time varies from less than an hour to more than a day, and repair times are well modeled by a lognormal distribution.
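As a minimal sketch of the kind of distribution fitting the abstract describes (not the paper's actual analysis code, and using synthetic placeholder data rather than the Los Alamos dataset), one can fit a Weibull distribution to inter-failure times and inspect the shape parameter; a shape parameter below 1 corresponds to the decreasing hazard rate the paper reports:

```python
# Illustrative sketch: fit a Weibull distribution to hypothetical
# time-between-failure samples. Shape < 1 implies a decreasing hazard rate.
import numpy as np
from scipy import stats

# Hypothetical inter-failure times in hours (synthetic placeholder data).
rng = np.random.default_rng(0)
tbf = stats.weibull_min.rvs(c=0.7, scale=400.0, size=1000, random_state=rng)

# Fit a two-parameter Weibull (location fixed at zero).
shape, loc, scale = stats.weibull_min.fit(tbf, floc=0)
print(f"shape={shape:.2f}, scale={scale:.1f}")
if shape < 1:
    print("shape < 1: decreasing hazard rate")
```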
Is it possible to reduce the expected response time of every request at a web server, simply by changing the order in which we schedule the requests? That is the question we ask in this paper. This paper proposes a method for improving the performance of web servers servicing static HTTP requests. The idea is to give preference to requests for small files or requests with short remaining file size, in accordance with the SRPT (Shortest Remaining Processing Time) scheduling policy. The implementation is at the kernel level and involves controlling the order in which socket buffers are drained into the network. Experiments are executed in both a LAN and a WAN environment, using the Linux operating system and the Apache and Flash web servers. Results indicate that SRPT-based scheduling of connections yields significant reductions in delay at the web server, producing a substantial reduction in mean response time and mean slowdown in both the LAN and WAN environments. Significantly, and counter to intuition, requests for large files are penalized only negligibly, or not at all, as a result of SRPT-based scheduling.
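To illustrate the scheduling idea in user-space terms (the paper's actual implementation lives in the kernel, controlling socket-buffer draining), here is a hedged sketch of an SRPT queue: a heap keyed on remaining bytes always serves the request with the shortest remaining processing time. The class and method names are hypothetical, introduced only for illustration:

```python
# Illustrative SRPT sketch, not the paper's kernel-level implementation.
import heapq

class SRPTScheduler:
    def __init__(self):
        self._heap = []   # entries: (remaining_bytes, seq, request_id)
        self._seq = 0     # tie-breaker for equal remaining sizes

    def add(self, request_id, size_bytes):
        heapq.heappush(self._heap, (size_bytes, self._seq, request_id))
        self._seq += 1

    def send_quantum(self, quantum_bytes):
        """Send up to quantum_bytes of the shortest remaining request."""
        if not self._heap:
            return None
        remaining, seq, rid = heapq.heappop(self._heap)
        remaining -= quantum_bytes
        if remaining > 0:
            heapq.heappush(self._heap, (remaining, seq, rid))
        return rid

sched = SRPTScheduler()
sched.add("big", 1_000_000)
sched.add("small", 10_000)
print(sched.send_quantum(10_000))  # "small" completes before "big" progresses
```

The key design point, mirroring the abstract's intuition, is that a short request finishes after a few quanta, while a long request loses little: it merely waits for work it would have had to share bandwidth with anyway.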
Most well-managed web servers perform well most of the time. Occasionally, however, every popular web server experiences transient overload. An overloaded web server typically displays signs of its affliction within a few seconds: work enters the web server at a greater rate than the web server can complete it, causing the number of connections at the server to build up, which in turn implies large delays for clients accessing the server. First, this paper provides a systematic performance study of exactly what happens when a web server is run under transient overload, both from the perspective of the server and from the perspective of the client. Second, it proposes and evaluates a particular kernel-level solution for improving the performance of web servers under overload. The solution is based on SRPT connection scheduling. We show that SRPT-based scheduling improves overload performance across a variety of client- and server-oriented metrics.
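A back-of-the-envelope sketch of the buildup the abstract describes, using hypothetical arrival and service rates: when work arrives faster than the server can complete it, the backlog of connections grows roughly linearly with the duration of the overload.

```python
# Hypothetical rates, chosen only to illustrate queue buildup under overload.
arrival_rate = 120.0   # requests/sec entering the server
service_rate = 100.0   # requests/sec the server can complete

queue = 0.0
for second in range(1, 6):
    queue += arrival_rate - service_rate   # net buildup per second
    print(f"t={second}s: ~{queue:.0f} connections queued")
```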