Safely and Quickly Deploying New Features with a Staged Rollout Framework Using Sequential Test and Adaptive Experimental Design

Zhao, Zhenyu; Liu, Mandie; Deb, Anirban

doi:10.1109/iccia.2018.00019

Cited by 10 publications (7 citation statements)

References 18 publications (20 reference statements)


“…when a regression is detected. Towards this end, sequential tests for difference-in-means of Gaussian random variables have already been widely used for online A/B experiments [11][12][13][27]. However, we argue that performing inference about means is too limited for canary tests, for not all bugs or performance regressions can be captured by differences in the mean alone, as the following example demonstrates.…”

Section: Regression-driven Experiments (mentioning)

confidence: 99%

“…Johari et al [11,13] proposed an "always-valid" sequential inference framework for differences in the means of Gaussian random variables using the mSPRT to provide confidence sequences and sequential 𝑝-values. In addition to being used in commercial A/B testing software, [27] use this framework for managing the automated rollout of new software features, formulating performance regressions as differences in the mean.…”

Section: Related Work (mentioning)

confidence: 99%
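
The mixture sequential probability ratio test (mSPRT) mentioned in the statement above can be illustrated with a minimal sketch. It assumes paired treatment-minus-control differences that are i.i.d. Gaussian with a known variance, and a N(0, tau2) mixing prior over the true mean shift. The function name `msprt_p_values` and its parameters are hypothetical and are not taken from the cited papers.

```python
import math
from typing import List, Sequence


def msprt_p_values(diffs: Sequence[float], variance: float, tau2: float) -> List[float]:
    """Running always-valid p-values for H0: the mean of `diffs` is zero.

    diffs    : paired differences (treatment minus control), assumed i.i.d. Gaussian
    variance : variance of one difference, assumed known (not estimated here)
    tau2     : variance of the N(0, tau2) mixing prior over the mean shift (assumption)
    """
    if variance <= 0 or tau2 <= 0:
        raise ValueError("variance and tau2 must be positive")

    p_values: List[float] = []
    p = 1.0
    total = 0.0
    for n, z in enumerate(diffs, start=1):
        total += z
        denom = variance + n * tau2
        # Log of the Gaussian mixture likelihood ratio after n observations.
        log_lr = 0.5 * math.log(variance / denom) + (total ** 2) * tau2 / (2.0 * variance * denom)
        # The p-value is the running minimum of 1 / likelihood ratio, capped at 1.
        p = min(p, 1.0, math.exp(-log_lr))
        p_values.append(p)
    return p_values
```

Because the p-value sequence never increases, a rollout monitor could check it after every observation and stop as soon as it falls below the chosen significance level, which is the continuous-monitoring property the statement refers to.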

“…Many of the earlier works formulate performance regressions as differences in the mean between arms A and B [27], with a performance regression occurring if the mean shifts in the undesirable direction of that metric. We argue from a decision-theoretic perspective that comparing means alone is insufficient to define a performance regression.…”

Section: Beyond Inference On the Mean (mentioning)

confidence: 99%


Rapid Regression Detection in Software Deployments through Sequential Testing

Lindon, Sanden, Shirikian (2022, preprint)

The practice of continuous deployment has enabled companies to reduce time-to-market by increasing the rate at which software can be deployed. However, deploying more frequently bears the risk that occasionally defective changes are released. For Internet companies, this has the potential to degrade the user experience and increase user abandonment. Therefore, quality control gates are an important component of the software delivery process. These are used to build confidence in the reliability of a release or change. Towards this end, a common approach is to perform a canary test to evaluate new software under production workloads. Detecting defects as early as possible is necessary to reduce exposure and to provide immediate feedback to the developer. We present a statistical framework for rapidly detecting regressions in software deployments. Our approach is based on sequential tests of stochastic order and of equality in distribution. This enables canary tests to be continuously monitored, permitting regressions to be rapidly detected while strictly controlling the false detection probability throughout. The utility of this approach is demonstrated based on two case studies at Netflix.



“…Besides, testing immature features could cause outages or critical harmness on key business metrics. The monitoring on key metrics [20,22,24] can help alarm experimenter at the very early hours and ensure a safe data collection, which can be another interesting application to further explore.…”

Section: Summary and Future Work (mentioning)

confidence: 99%

“…Both are targeted to minimize false positives in their processes while attaining good recall of the signals. The methodologies involve a novel Population Stability Index (PSI [21]) based test and a sequential probability ratio test (SPRT [4,5,14,18,19,24]). To our knowledge, we are the first to automate the methods (in early 2019) in a large scale experimentation platform to continuously monitor experiment quality.…”

Section: Introduction (mentioning)

confidence: 99%

Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection

Nie, Zhang, et al. (2022, preprint)

eBay's experimentation platform runs hundreds of A/B tests on any given day. The platform integrates with the tracking infrastructure and customer experience servers, provides the sampling service for experiments, and has the responsibility to monitor the progress of each A/B test. There are many challenges especially when it is required to ensure experiment quality at the large scale. We discuss two automated test quality monitoring processes and methodologies, namely randomization validation using population stability index (PSI) and sample ratio mismatch (a.k.a. sample delta) detection using sequential analysis. The automated processes assist the experimentation platform to run high quality and trustworthy tests not only effectively on a large scale, but also efficiently by minimizing false positive monitoring alarms to experimenters. CCS Concepts: • Mathematics of computing → Hypothesis testing and confidence interval computation; • General and reference → Experimentation.
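
As a rough illustration of the population stability index named in this abstract, the sketch below computes the textbook PSI between two binned samples. It is not the paper's randomization-validation test. The function name `population_stability_index`, the count-based inputs, and the `eps` floor for empty bins are assumptions made for the example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected_counts: Sequence[float],
    actual_counts: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Textbook PSI: sum over bins of (a - e) * ln(a / e), with a and e as bin shares.

    expected_counts : per-bin counts for the reference group (e.g. control)
    actual_counts   : per-bin counts for the comparison group (e.g. treatment)
    eps             : floor applied to bin shares so empty bins do not hit log(0) (assumption)
    """
    if len(expected_counts) != len(actual_counts) or len(expected_counts) == 0:
        raise ValueError("both inputs must have the same, non-zero number of bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("each group needs a positive total count")

    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / expected_total, eps)
        a_share = max(a / actual_total, eps)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi
```

Identical bin shares give a PSI of zero, and the value grows as the two groups diverge, so a platform could compare it against a threshold of its own choosing to flag a suspicious randomization.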
