Although Microsoft Windows is being deployed in mission-critical applications, little quantitative data has been published about its robustness. We present the results of executing over two million Ballista-generated exception handling tests across 237 functions and system calls involving six Windows variants, as well as similar tests conducted on the Linux operating system. Windows 95, Windows 98, and Windows CE were found to be vulnerable to complete system crashes caused by very simple C programs for several different functions. No system crashes were observed on Windows NT, Windows 2000, and Linux. Linux was significantly more graceful at handling exceptions from system calls in a program-recoverable manner than Windows NT and Windows 2000, but those Windows variants were more robust than Linux (with glibc) at handling C library exceptions. While the choice of operating systems cannot be made solely on the basis of one set of tests, it is hoped that such results will form a starting point for comparing dependability across heterogeneous platforms.
When the Ballista project started in 1996 as a 3-year DARPA-funded research project, the original goal was to create a Web-based testing service that would identify robustness faults in software running on client computers via the Internet. Previous experience suggested that such tests would find interesting problems, but it was unclear how to make robustness testing scalable to large interfaces. A major challenge was finding a way to test something as large and complex as an operating system (OS) application programming interface (API) without resorting to labor-intensive manual test construction for each API function to be tested. In the end, a scalable approach was found and was successfully applied not only to operating system APIs but also to several non-operating-system APIs.
The Ballista robustness testing methodology is based on combinational tests of valid and invalid parameter values for system calls and functions. In each test case, a single software module under test (MuT) is called once. A MuT can be a stand-alone program, function, system call, method, or any other software that can be invoked with a procedure call. (The term MuT is similar in meaning to the more recent term dependability benchmark target [DBench 2004].) In most cases, MuTs are calling points into an API. Each invocation of a test case determines whether a particular MuT provides robust exception handling when called with a particular set of parameter values. These parameter values, or test values, are drawn from a pool of normal and exceptional values based on the data type of each argument passed to the MuT.
Each test value has an associated test object, which holds code to create and clean up the related system state for a test. For example, a file handle test object has code to create a file, return a file handle test value, and subsequently delete the file after the test case has been executed.
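
The scheme described above can be illustrated with a minimal sketch. This is not the Ballista implementation: the class names, the pool contents, the value labels such as `FD_OPEN_READ`, and the outcome strings are all assumptions made for illustration. Python's `os.read` stands in for the MuT. Each argument's data type has a pool of test objects, every combination of test values is passed to the MuT exactly once, and each test object creates and later cleans up its own system state. A real harness would typically also isolate each call in its own process so that crashes and hangs can be observed, which this sketch does not attempt.

```python
import itertools
import os
import tempfile


class ConstantObject:
    """Test object for a value that needs no system state (hypothetical)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def create(self):
        return self.value

    def cleanup(self):
        pass


class OpenFileObject:
    """Test object that creates a file and hands out an open descriptor."""

    name = "FD_OPEN_READ"

    def create(self):
        self.fd, self.path = tempfile.mkstemp()
        os.write(self.fd, b"ballista test data")
        os.lseek(self.fd, 0, os.SEEK_SET)  # rewind so a read sees the data
        return self.fd

    def cleanup(self):
        os.close(self.fd)
        os.remove(self.path)


# One pool of test-object factories per argument data type (illustrative only).
POOLS = {
    "fd": [
        OpenFileObject,
        lambda: ConstantObject("FD_NEGATIVE", -1),
    ],
    "size": [
        lambda: ConstantObject("SIZE_ZERO", 0),
        lambda: ConstantObject("SIZE_16", 16),
        lambda: ConstantObject("SIZE_NEGATIVE", -1),
    ],
}


def run_test_case(mut, test_objects):
    """Call the MuT once with one tuple of test values and report the outcome."""
    created = []
    values = []
    try:
        for obj in test_objects:
            values.append(obj.create())
            created.append(obj)
        try:
            mut(*values)
            return "returned"
        except Exception as exc:
            return "raised " + type(exc).__name__
    finally:
        # Undo system state in reverse order of creation.
        for obj in reversed(created):
            obj.cleanup()


def run_all(mut, param_types):
    """Run every combination of test values for the given parameter types."""
    results = {}
    pools = [POOLS[t] for t in param_types]
    for factories in itertools.product(*pools):
        objs = [make() for make in factories]
        results[tuple(o.name for o in objs)] = run_test_case(mut, objs)
    return results


results = run_all(os.read, ["fd", "size"])
for names, outcome in results.items():
    print(names, "->", outcome)
```

Because the pools are keyed by data type rather than by function, testing another MuT that takes a file descriptor and a size would need only a new `run_all` call and no new test values, which is the property that lets this style of testing scale to a large API.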
A test case, therefore, consists of the name of the MuT and a tuple of test values that are passed as parameters
Dependability Benchmarking for Computer Systems. Edited by Karama Kanoun and Lisa Spainhower