2021
DOI: 10.14778/3467861.3467872

Data acquisition for improving machine learning models

Abstract: The vast advances in Machine Learning (ML) over the last ten years have been powered by the availability of suitably prepared data for training purposes. The future of ML-enabled enterprise hinges on data. As such, there is already a vibrant market offering data annotation services to tailor sophisticated ML models. In this paper, inspired by the recent vision of online data markets and associated market designs, we present research on the practical problem of obtaining data in order to …

Cited by 25 publications (4 citation statements)
References 34 publications
“…There is a heuristic called Thompson sampling [9], [10]. It was used in revenue management [11], web site optimization [12], in online advertising [13], in selective data acquisition for improving machine learning models [14], and more interestingly in designing multi-armed bandits (MABs) [15]. The MABs were shown to be useful in a wide range of applications, which require continuous improvement, ranging from online service economy [16], to portfolio selection in finance [17], [18], and to real time bid prediction in online advertising [19].…”
Section: Literature Review (mentioning)
confidence: 99%
“…Tuple discovery [11,14,18,27,34,38,58] selects tuples from a pool of available datasets that are the most useful for a pre-defined target (model training [11], causal inference in question answering [18]), and the usefulness of tuples is checked for a single target. In contrast, our work focuses on acquiring entire datasets rather than tuples.…”
Section: Related Work (mentioning)
confidence: 99%
“…Training data acquisition has been a challenge for AI model training, because of the general lack of training data, the increasing demand for training data with the wider application of AI, and the high cost to acquire data (Li et al, 2021; Roh et al, 2021). While this is the case for many types of AI modeling, one important example relevant to digital rock physics is high-resolution imaging of porous media, such as X-ray Micro-computed tomography (Micro-CT, see review in Wildenschild & Sheppard, 2013), which is time-consuming especially for high-resolution scans and results in limited data acquisition with scanning one sample at a time.…”
Section: Introduction (mentioning)
confidence: 99%