Engineering for a science-centric experimentation platform

Diamantopoulos, Nikos; Wong, Janet S.S.; Mattos, David Issa; Gerostathopoulos, Ilias; Wardrop, Matthew; Mao, Tobias; McFarland, Colin

doi:10.1145/3377813.3381349

Cited by 6 publications

(5 citation statements)

References 36 publications


“…It allows us to understand whether changing a ranking algorithm improves search results, sending a notification increases app visits, or a bug fix that should do nothing has any surprising results. When possible, A/B testing is the gold standard for understanding treatment effects, and therefore companies are increasingly using A/B tests to make decisions (Yin and Hong, 2019; Diamantopoulos et al, 2020; Karrer et al, 2021). LinkedIn currently runs hundreds of A/B tests per day, analyzing the impact of both major and minor adjustments, and always attempting to make data-driven decisions built on insights from experiments (Xu et al, 2015).…”

Section: Introduction (mentioning)

confidence: 99%

Representation-Aware Experimentation: Group Inequality Analysis for A/B Testing and Alerting

Friedberg, Ambler, Saint-Jacques

2022

Preprint


As companies adopt increasingly experimentation-driven cultures, it is crucial to develop methods for understanding any potential unintended consequences of those experiments. We might have specific questions about those consequences (did a change increase or decrease gender representation equality among content creators?); we might also wonder whether we have not yet considered the right question (that is, we don't know what we don't know). Hence we address the problem of unintended consequences in experimentation from two perspectives: pre-specified versus data-driven selection of dimensions of interest. For a specified dimension, we introduce a statistic to measure deviation from equal representation (DER statistic), give its asymptotic distribution, and evaluate finite-sample performance. We explain how to use this statistic to search across large-scale experimentation systems to alert us to any extreme unintended consequences on group representation. We complement this methodology by discussing a search for heterogeneous treatment effects along a set of dimensions with causal trees, modified slightly for practicalities in our ecosystem, and used here as a way to dive deeper into experiments flagged by the DER statistic alerts. We introduce a method for simulating data that closely mimics observed data at LinkedIn, and evaluate the performance of DER statistics in simulations. Last, we give a case study from LinkedIn, and show how these methodologies empowered us to discover surprising and important insights about group representation. Code for replication is available in an appendix.


“…Controlled experiments, also known as "A/B testing," continue to serve as the cornerstone for making strategic decisions in business, including new product launches, marketing campaigns, and algorithm updates (Bakshy et al 2014, Kohavi et al 2020, Diamantopoulos et al 2020, Bojinov and Gupta 2022, Koning et al 2022). Through the random assignment of treatment or control groups, A/B testing facilitates the evaluation of causal, rather than merely correlational, impacts of a product intervention on business outcomes.…”

Section: Introduction (mentioning)

confidence: 99%

Characterizing Interference Heterogeneity and Improving Estimation for Experiments in Networks

Yuan, Altenburger

2022

SSRN Journal


“…First, it must be able to scale both to large sample sizes, which can be as large as hundreds of millions of observations, and to many features, sometimes in the thousands. Second, it should be reproducible and extensible such that software engineers and researchers can interact with, iterate on, and subsequently contribute to it (Diamantopoulos et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

“…Addressing the second challenge, Netflix described an inclusive XP that makes use of single-machine computation for modeling, allowing it to be more interactive and consistent with the way researchers iterate (Diamantopoulos et al, 2020). As a result, researchers can reproduce analyses from the XP, iterate, follow up, and debug using Python and R, and then contribute improvements to statistical methodology back to the engineering systems.…”

Section: Introduction (mentioning)

confidence: 99%

You Only Compress Once: Optimal Data Compression for Estimating Linear Models

Wong, Forsell, Lewis et al.

2021

Preprint

Self Cite


Linear models are used in online decision making, such as in machine learning, policy algorithms, and experimentation platforms. Many engineering systems that use linear models achieve computational efficiency through distributed systems and expert configuration. While there are strengths to this approach, it is still difficult to have an environment that enables researchers to interactively iterate and explore data and models, as well as leverage analytics solutions from the open source community. Consequently, innovation can be blocked. Conditionally sufficient statistics is a unified data compression and estimation strategy that is useful for the model development process, as well as the engineering deployment process. The strategy estimates linear models from compressed data without loss on the estimated parameters and their covariances, even when errors are autocorrelated within clusters of observations. Additionally, the compression preserves almost all interactions with the original data, unlocking better productivity for both researchers and engineering systems.
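The idea of estimating a linear model from compressed data without loss can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single discrete feature, independent errors (the clustered, autocorrelated case mentioned in the abstract is not covered), and hypothetical helper names `ols`, `compress`, and `ols_compressed`. Rows that share the same feature value are collapsed into a count, a sum of outcomes, and a sum of squared outcomes, and the fit on those records is compared with the fit on the raw rows.

```python
from collections import defaultdict


def ols(xs, ys):
    """Fit y = a + b*x on raw rows; return (a, b, residual sum of squares)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def compress(xs, ys):
    """Collapse rows sharing the same x into [count, sum_y, sum_y_squared]."""
    groups = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in zip(xs, ys):
        g = groups[x]
        g[0] += 1
        g[1] += y
        g[2] += y * y
    return dict(groups)


def ols_compressed(groups):
    """Fit the same model using only the per-group sufficient statistics."""
    n = sum(g[0] for g in groups.values())
    mx = sum(x * g[0] for x, g in groups.items()) / n
    my = sum(g[1] for g in groups.values()) / n
    sxx = sum(g[0] * (x - mx) ** 2 for x, g in groups.items())
    sxy = sum((x - mx) * (g[1] - g[0] * my) for x, g in groups.items())
    b = sxy / sxx
    a = my - b * mx
    # Per group: sum (y - f)^2 = sum_y2 - 2*f*sum_y + count*f^2, with f = a + b*x.
    rss = sum(
        g[2] - 2 * (a + b * x) * g[1] + g[0] * (a + b * x) ** 2
        for x, g in groups.items()
    )
    return a, b, rss


# Illustrative data (made up): nine rows, three distinct feature values.
xs = [0, 0, 0, 1, 1, 1, 1, 2, 2]
ys = [1.0, 1.2, 0.8, 2.1, 1.9, 2.0, 2.2, 3.1, 2.9]

raw_fit = ols(xs, ys)
groups = compress(xs, ys)
compressed_fit = ols_compressed(groups)
```

Here nine rows shrink to three records, and the intercept, slope, and residual sum of squares agree with the raw-data fit up to floating-point error. Keeping the sum of squared outcomes is what allows the residual variance, and hence the coefficient covariance, to be recovered from the compressed records.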
