Controlled experiments on the web: survey and practical guide

Kohavi, Ron; Longbotham, Roger; Sommerfield, Dan; Henne, Randal M.

doi:10.1007/s10618-008-0114-1

Cited by 575 publications

(427 citation statements)

References 8 publications

Mentioning: 422

“…We have previously made the case that in the online world, agility and continuous availability of users makes MVTs less appealing [18]. Researchers at Google made similar observations [3].…”

Section: Multivariate Tests (mentioning)

confidence: 81%

“…Multiple papers and books have been written on how to run an online controlled experiment [18; 7; 19; 20] and we will not address that here; we follow the terminology of Controlled experiments on the web: survey and practical guide [18]. We build upon that work and share how to scale experimentation, i.e., how to run many experiments to accelerate innovation in product development.…”

Section: Related Work and Contributions (mentioning)

confidence: 99%

“…Sessions per user, or repeat visits, is a much better factor in the OEC, and one that we use at Bing. Thinking of the drivers of lifetime value can lead to a strategically powerful OEC [18]. We cannot overemphasize the importance of coming up with a good OEC that the organization can align behind, but for this paper we will assume this has been done.…”

Section: Tenet (mentioning)

confidence: 99%

“…As a request is received from a browser, Bing's frontend servers assign each request to multiple flights running on a set of number lines. To ensure the assignment is consistent, a pseudo random hash of an anonymous user id is used [18]. The assignment happens as soon as the request is received and the frontend then passes each request's flight assignments as part of the requests sent to lower layers of the system.…”

Section: Architecture (mentioning)

confidence: 99%
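The consistent assignment described in the Architecture statement above can be illustrated with a minimal sketch. It assumes a single experiment with equally weighted variants; the function name `assign_variant`, the experiment identifier, and the choice of MD5 as the hash are illustrative assumptions, not details taken from the Bing system.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Map an anonymous user id to a variant deterministically.

    Hypothetical helper: the same (experiment_id, user_id) pair always
    hashes to the same bucket, so a user sees a consistent experience
    across requests without any stored assignment state.
    """
    # Salting with the experiment id keeps assignments in different
    # experiments from being trivially correlated.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Interpret the first 8 bytes as an integer and pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # Repeated calls for the same user return the same variant.
    print(assign_variant("user-123", "exp-A"))
    print(assign_variant("user-123", "exp-A"))
```

Because the assignment is a pure function of the ids, any frontend server can compute it independently and arrive at the same answer for the same user.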


Online controlled experiments at large scale

Kohavi, Deng, Frasca, et al. 2013. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (Self cite)

Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga use online controlled experiments to guide product development and accelerate innovation. At Microsoft's Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are billions of live variants of the site, so alerts are used to identify issues rather than relying on heavy upfront testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals more than $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a-priori. 
The system has also identified many negative features that we avoided deploying, despite key stakeholders' early excitement, saving us similar large amounts.


“…These factors have significantly contributed to the rapid adoption of online evaluation techniques in these settings. In industry, online evaluation approaches such as AB tests (c.f., Section 2.4) and interleaved comparisons (Section 2.6) are now the state of the art for evaluating system effectiveness [Kohavi et al, 2009, Radlinski and Craswell, 2010, Bendersky et al, 2014].…”

Section: Motivation and Uses (mentioning)

confidence: 99%

Online Evaluation for Information Retrieval

Hofmann, Radlinski. 2016. FNT in Information Retrieval.

USPSTF Colorectal Cancer Screening Recommendation and Uptake for Individuals Aged 45 to 49 Years

Siddique, Wang, Yasin, et al. 2024. JAMA Netw Open.

Importance: In May 2021, the US Preventive Services Task Force (USPSTF) issued a grade B recommendation encouraging colorectal cancer (CRC) screening among average-risk individuals aged 45 to 49 years. The patterns of screening uptake and possible socioeconomic disparities in screening in this age group remain unknown.

Objective: To evaluate changes in CRC screening uptake among average-risk individuals aged 45 to 49 years after the USPSTF recommendation was issued in 2021.

Design, Setting, and Participants: This retrospective cohort study used deidentified claims data from commercially insured Blue Cross Blue Shield beneficiaries aged 45 to 49 years across the US between January 1, 2017, and December 31, 2022.

Exposure: Publication of the May 2021 USPSTF CRC screening recommendation for adults aged 45 to 49 years.

Main Outcomes and Measures: Absolute and relative changes in screening uptake were compared between a 20-month period preceding (May 1, 2018, to December 31, 2019) and a 20-month period following (May 1, 2021, to December 31, 2022) the USPSTF recommendation. Interrupted time-series analysis and autoregressive integrated moving average models were used to evaluate changes in screening rates, adjusting for temporal autocorrelation and seasonality.

Results: In this cohort study of 10 221 114 distinct beneficiaries aged 45 to 49 years (mean [SD] age, 47.04 [1.41] years; 51.04% female), bimonthly mean (SD) numbers of average-risk beneficiaries were 3 213 935 (31 508) and 2 923 327 (105 716) in the prerecommendation and postrecommendation periods, respectively. Mean (SD) screening uptake increased from 0.50% (0.02%) to 1.51% (0.59%) between the 2 periods (P < .001), representing an absolute change of 1.01 percentage points (95% CI, 0.62-1.40 percentage points) but no significant relative change (202.51%; 95% CI, −30.59% to 436.87%). Compared with average-risk beneficiaries residing in areas with the lowest socioeconomic status (SES), those residing in areas with the highest SES experienced the largest absolute change in screening (1.25 [95% CI, 0.77-1.74] percentage points vs 0.75 [95% CI, 0.47-1.02] percentage points), but relative changes were not significant (214.01% [95% CI, −30.91% to 461.15%] vs 167.73% [95% CI, −16.30% to 352.62%]). After the recommendation was issued, the screening uptake rate also increased fastest among average-risk beneficiaries residing in the areas with highest SES (0.24 [95% CI, 0.23-0.25] percentage points every 2 months) and metropolitan areas (0.20 [95% CI, 0.19-0.21] percentage points every 2 months).

Conclusions and Relevance: This study found that among privately insured beneficiaries aged 45 to 49 years, CRC screening uptake increased after the USPSTF recommendation, with potential disparities based on SES and locality.
