In this paper, we investigate the availability requirement for the fault management server in high-availability communication systems. Our study finds that the fault management server does not itself need 99.999% availability to guarantee 99.999% system availability, provided that the fail-safe ratio (the probability that a failure of the fault management server does not bring down the system) and the fault coverage ratio (the probability that a failure in the system is detected and recovered by the fault management server) are sufficiently high. Tradeoffs can be made among the availability of the fault management server, the fail-safe ratio and the fault coverage ratio to optimize system availability. We also propose a cost-effective design for the fault management server.

1. INTRODUCTION

Fault management plays an indispensable role in today's high-availability communication systems. It involves techniques for rapidly detecting and isolating faults and recovering the system from them, either automatically by the fault management software or manually by operators [1]. By functional coverage, fault management in a communication system operates at two levels: the equipment level and the network level. At the equipment level, fault management resides on the operational equipment and detects, isolates and recovers from failures in that equipment, e.g., it brings up a redundant power supply when the primary power supply fails. At the network level, fault management may adopt a server/client architecture, with the server entity residing on specific equipment and the clients residing on the managed functional units. The fault management server detects, isolates and recovers from failures in the system, e.g., it redirects traffic to redundant equipment when the primary equipment fails.
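To give a sense of scale for the availability figures used in this paper, the following minimal sketch converts a steady-state availability into an expected yearly downtime budget. It is an illustration only and not part of the model developed here; the function name and the 365-day year are assumptions made for the example.

```python
# Illustrative only: converts a steady-state availability figure into
# an expected downtime budget. Assumes a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Print the yearly downtime budget for three common availability targets.
for label, a in [("3-nine", 0.999), ("4-nine", 0.9999), ("5-nine", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.2f} min/year")
```

Under these assumptions, a 99.999% target leaves a budget of roughly 5.26 minutes of downtime per year, which is why the availability demanded of each component matters so much to cost.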
Common intuition suggests that the server providing network-level fault management must itself be highly available to achieve high system availability; for example, the fault management server should provide at least 5-nine (0.99999) availability in order to achieve 5