The softwarization of mobile networks enables efficient use of resources by dynamically scaling and re-assigning them as demand varies. Because activating additional servers is not immediate, scale-up decisions must anticipate traffic demand to prevent service disruption. At the same time, activating more servers than strictly necessary wastes resources and should be avoided. Given the stringent reliability requirements of 5G applications (up to six nines) and the fallible nature of servers, finding the right trade-off between efficiency and service disruption is particularly critical. In this paper, we analyze a generic auto-scaling mechanism for communication services that (de)activates servers in a cluster based on occupation thresholds. We model the impact of the activation delay and the finite lifetime of the servers on performance, in terms of power consumption and failure probability. Based on this model, we derive an algorithm to optimally configure the thresholds. Simulation results confirm the accuracy of the model under both synthetic and realistic traffic patterns, as well as the effectiveness of the configuration algorithm. We also provide some insights on the best strategy to support an energy-efficient, highly reliable service: deploying a few powerful and reliable machines versus deploying many less powerful and less reliable machines.