2022
DOI: 10.48550/arxiv.2203.09118
Preprint

Time and the Value of Data

Abstract: Managers often believe that collecting more data will continually improve the accuracy of their machine learning models. However, we argue in this paper that when data lose relevance over time, it may be optimal to collect a limited amount of recent data instead of retaining an infinite supply of older (less relevant) data. In addition, we argue that increasing the stock of data by including older datasets may, in fact, damage the model's accuracy. As expected, the model's accuracy improves by increasing t…
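The trade-off in the abstract (older data add volume but point to a stale target) can be made concrete with a toy model. This is an illustration under stated assumptions, not the paper's model: we assume the quantity being predicted is a mean that drifts linearly, `mu_t = drift * t`; that each period contributes `n_per_period` Gaussian observations with standard deviation `sigma`; and that the "model" is simply the pooled sample mean over the last `k` periods. All function names (`expected_mse`, `best_window`, `simulated_mse`) are hypothetical. Under these assumptions the squared error is `(drift * (k - 1) / 2)**2 + sigma**2 / (n_per_period * k)`: a staleness bias that grows with `k` plus a noise variance that shrinks with `k`, so an intermediate window can beat pooling all history.

```python
import random
import statistics


def expected_mse(k: int, drift: float, sigma: float, n_per_period: int) -> float:
    """Closed-form MSE of estimating the current mean from the last k periods.

    Hypothetical toy model (not the paper's): mu_t = drift * t, and each
    period yields n_per_period draws of mu_t + N(0, sigma^2).
    """
    # Staleness bias: the window average lags mu_T by drift * (k - 1) / 2.
    bias = drift * (k - 1) / 2
    # Sampling variance: k * n_per_period independent draws are averaged.
    variance = sigma ** 2 / (n_per_period * k)
    return bias ** 2 + variance


def best_window(max_k: int, drift: float, sigma: float, n_per_period: int) -> int:
    """Window length in 1..max_k with the smallest closed-form MSE."""
    return min(range(1, max_k + 1),
               key=lambda k: expected_mse(k, drift, sigma, n_per_period))


def simulated_mse(k: int, drift: float, sigma: float, n_per_period: int,
                  horizon: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of expected_mse under the same toy model."""
    if not 1 <= k <= horizon:
        raise ValueError("window k must satisfy 1 <= k <= horizon")
    rng = random.Random(seed)
    target = drift * horizon                      # current mean mu_T
    window = range(horizon - k + 1, horizon + 1)  # the k most recent periods
    squared_errors = []
    for _ in range(trials):
        sample = [drift * t + rng.gauss(0.0, sigma)
                  for t in window for _rep in range(n_per_period)]
        estimate = statistics.fmean(sample)       # the "model": a pooled mean
        squared_errors.append((estimate - target) ** 2)
    return statistics.fmean(squared_errors)


if __name__ == "__main__":
    drift, sigma, n = 0.1, 1.0, 10
    for k in (1, 2, 3, 5, 10, 50):
        exact = expected_mse(k, drift, sigma, n)
        approx = simulated_mse(k, drift, sigma, n, horizon=50, trials=500)
        print(f"k={k:2d}  exact={exact:.4f}  simulated={approx:.4f}")
    print("best window:", best_window(50, drift, sigma, n))
```

With `drift=0.1`, `sigma=1.0`, and `n_per_period=10`, the closed form gives an error of 0.1 for `k=1`, about 0.043 at the minimum `k=3`, and about 6.0 when all 50 periods are pooled. Setting `drift=0` makes the error strictly decrease in `k`, so keeping every period becomes optimal, which matches the intuition that older data only hurt when they lose relevance.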

Cited by 2 publications (2 citation statements); references 26 publications.
“…Moreover, our large-scale field experiment provides causal estimates by going beyond offline evaluations, which often form the basis for a majority of applied technical research (e.g., the studies reviewed in Jannach and Jugovac 2019 and Wu et al 2022). Such offline evaluations often suffer from endogeneity concerns because of the inability of historical data to account for fast-paced dynamics and customer feedback loops (Valavi et al 2022). More recently, this literature has run into severe methodological issues and problems of replication (Kapoor and Narayanan 2023), highlighting the need for credible field experiments.…”
Section: Related Literature
“…It is not uncommon to consider a S-shaped relationship between data and the value it creates (Hagiu & Wright, 2020a; Parker et al, 2022; Posner & Weyl, 2018; Valavi, Hestness, Ardalani, & Iansiti, 2022). Consider, for example, the data-network value relationship depicted in Figure 3.…”