Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/3318464.3380571
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models

Abstract: The ubiquitous use of machine learning algorithms brings new challenges to traditional database problems such as incremental view update. Much effort is being put into better understanding and debugging machine learning models, as well as into identifying and repairing errors in training datasets. Our focus is on how to assist these activities when they have to retrain the machine learning model after removing problematic training samples in cleaning or selecting different subsets of training data for interpretabi…

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
16
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
5
2
1

Relationship

0
8

Authors

Journals

Cited by 28 publications (16 citation statements)
References 40 publications
“…3) Gradient unlearning method. The third group focuses on approximating the SGD steps as if full retraining were performed [11,12,18,19]. To maintain a history of accurate computations and produce an effective approximation, the unlearning method periodically recomputes the exact gradient after some iterations.…”
Section: Machine Unlearning
confidence: 99%
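The gradient-replay scheme this excerpt describes can be made concrete. Below is a minimal Python sketch, assuming a hypothetical cache of per-iteration gradients saved during the original training run (the names unlearn_sgd, grads_history, and exact_every are invented for illustration; the cited methods differ in detail):

```python
import numpy as np

def unlearn_sgd(w0, grads_history, X_keep, y_keep, lr=0.1, exact_every=10):
    """Replay SGD on the retained data, reusing cached gradients as a cheap
    approximation and periodically recomputing the exact gradient so the
    replay stays anchored to accurate computations."""
    w = w0.copy()
    for t, g_cached in enumerate(grads_history):
        if t % exact_every == 0:
            # Exact full-batch gradient on the retained data
            # (logistic loss chosen purely for illustration).
            p = 1.0 / (1.0 + np.exp(-X_keep @ w))
            g = X_keep.T @ (p - y_keep) / len(y_keep)
        else:
            g = g_cached  # cached gradient from the original training run
        w -= lr * g
    return w
```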
“…The second group updates the trained ML model by performing a corrective Newton step using the remaining data D - D_i; it follows [8,9,10]. The third group updates the trained ML model by correcting the SGD steps that led to the trained model; it follows the method defined in [11,12]. Note that the premise for the above three groups to compute an accurate approximation is access to accurate D and D_i, yet neither can be accessed globally in the FL setting.…”
Section: Introduction
confidence: 99%
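The corrective Newton step mentioned for the second group has a simple closed form: take one Newton update of the loss evaluated only on the remaining data D - D_i. A minimal sketch, assuming L2-regularized logistic regression as an illustrative loss (newton_unlearn_step is an invented name, not an API from the cited works):

```python
import numpy as np

def newton_unlearn_step(w, X_keep, y_keep, l2=1e-3):
    """One corrective Newton step on the remaining data D - D_i."""
    n = len(y_keep)
    p = 1.0 / (1.0 + np.exp(-X_keep @ w))          # sigmoid predictions
    grad = X_keep.T @ (p - y_keep) / n + l2 * w    # gradient on retained data
    S = p * (1.0 - p)                              # per-sample Hessian weights
    H = (X_keep.T * S) @ X_keep / n + l2 * np.eye(len(w))
    return w - np.linalg.solve(H, grad)            # Newton correction
```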
“…Based on first-order influence functions, [95] introduced an approach for identifying training data points that are responsible for user constraints specified by an SQL query. There are methods other than influence functions, however: e.g., [98] develops an approach for identifying and ranking training data points based on their influence on the predictions of neural networks, and [96] develops an approach for incrementally computing the influence of removing a subset of training data points. Furthermore, other recent work argues for the use of data Shapley values to quantify the contribution of individual data instances [33,34,54]; these approaches are computationally expensive because each data instance requires the model to be retrained. Unlike prior methods, our method generates: (1) explanations for the fairness of an ML model, (2) interpretable explanations based on first-order predicates that pinpoint a subset of training data responsible for model bias, and (3) update-based explanations that reveal data errors in certain attributes of a training data subset.…”
Section: Related Work
confidence: 99%
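As a concrete reference point for the first-order influence functions this excerpt relies on, the standard estimate of the parameter change from removing training point i is theta_{-i} ≈ theta + (1/n) H^{-1} grad(l_i). A hedged sketch, again assuming logistic regression for illustration:

```python
import numpy as np

def removal_influence(w, X, y, i, l2=1e-3):
    """First-order influence-function estimate of how the parameters move
    when training point i is removed from the training set."""
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    S = p * (1.0 - p)
    H = (X.T * S) @ X / n + l2 * np.eye(len(w))  # empirical Hessian at w
    g_i = (p[i] - y[i]) * X[i]                   # gradient of the removed point
    return np.linalg.solve(H, g_i) / n           # approx. parameter change
```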
“…Again, however, these are not geared for deep data introspection. PrIU [47] helps users understand changes, particularly deletions, to the data used in regression models. Unfortunately, this work only tracks deletions, not additions or updates to the data.…”
Section: Related Work
confidence: 99%
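To make the deletion-only setting concrete: for ordinary least squares one can maintain the sufficient statistics A = XᵀX and b = Xᵀy and downdate them when a training row is deleted, which is the flavor of incremental model update PrIU targets. The sketch below uses the Sherman-Morrison identity and is an illustrative stand-in, not PrIU's actual provenance-based algorithm:

```python
import numpy as np

def delete_row_ols(A_inv, b, x_del, y_del):
    """Update an OLS model after deleting one training row (x_del, y_del).
    A_inv = (X^T X)^{-1} and b = X^T y are maintained incrementally."""
    Ax = A_inv @ x_del
    # Sherman-Morrison downdate: (A - x x^T)^{-1}
    A_inv = A_inv + np.outer(Ax, Ax) / (1.0 - x_del @ Ax)
    b = b - y_del * x_del              # remove the row's contribution to X^T y
    return A_inv, b, A_inv @ b         # new statistics and refit weights
```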
“…Then, with each of those operators we associate a provenance pattern that describes the effect of the operator on the data at the appropriate level of detail, i.e., on individual dataframe elements, columns, rows, or collections of those. Effectively, the provenance patterns defined in this work for well-defined data science operators play a role similar to that of provenance polynomials [13], i.e., annotations associated with relational algebra operators to describe the fine-grained provenance of the results of relational as well as linear algebra operators [47,48]. We then associate a provenance function pf_o() with each operator o, which generates a provenance document pf_o() when a dataset is processed using o.…”
Section: Introduction
confidence: 99%
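The pf_o() idea can be sketched as a wrapper that pairs an operator's output with a provenance document. The name with_provenance and the document schema below are invented for illustration; the cited work's provenance patterns are considerably richer:

```python
def with_provenance(op, op_name):
    """Wrap a dataframe operator o so it also emits a provenance document
    pf_o(...) recording which input rows survive into the output."""
    def wrapped(df):
        out = op(df)
        doc = {
            "operator": op_name,
            "input_rows": list(df.index),
            "output_rows": list(out.index),  # e.g. rows kept by a filter
        }
        return out, doc
    return wrapped

# Usage sketch (assumes a pandas DataFrame df with an "age" column):
# adults, doc = with_provenance(lambda d: d[d["age"] >= 18], "filter")(df)
```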