Abstract: Scalability has been studied in several areas of Computer Science, and scalability testing and evaluation of contemporary software systems is an active research topic. However, most of the time, these activities are still performed in a predominantly ad hoc fashion. A few tools automate this process, but they impose several restrictions on which systems can be tested and on how scalability is evaluated. In this paper, we introduce a flexible and extensible framework for automated scalability testing of software offered as a service, and we propose evaluating scalability using hypothesis tests. Additionally, we argue that, instead of stating whether a system is scalable or not, we should find out how it could scale better.