Data-driven and continuous development and deployment of modern web applications depend critically on registering changes as fast as possible, paving the way for short innovation cycles. A/B testing is a popular tool for comparing the performance of different variants. Despite the widespread use of A/B tests, there is little research on how to assert the validity of such tests. Even small changes in the application's user base, hard-or software stack not related to the variants under test can transform on possibly hidden paths into significant disturbances of the overall evaluation criterion (OEC) of an A/B test and, hence, invalidate such a test. Therefore, the highly dynamic server and client run-time environments of modern web applications make it difficult to assert correctly the validity of an A/B test. We propose the concept of test context to capture data relevant for the chosen OEC. We use pre-test data for dynamic base-lining of the target space of the system under test and to increase the statistical power. During an A/B experiment, the contexts of each variant are compared to the pre-test context to ensure the validity of the test. We have implemented this method using a generic parameterfree statistical test based on the bootstrap method focussing on frontend performance metrics.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.