Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/3318464.3380571
PrIU: A Provenance-Based Approach for Incrementally Updating Regression Models

Abstract: The ubiquitous use of machine learning algorithms brings new challenges to traditional database problems such as incremental view update. Much effort is being put into better understanding and debugging machine learning models, as well as into identifying and repairing errors in training datasets. Our focus is on how to assist these activities when they have to retrain the machine learning model after removing problematic training samples in cleaning or selecting different subsets of training data for interpretabi…

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
16
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
5
2
1

Relationship

0
8

Authors

Journals

Cited by 28 publications (16 citation statements)
References 40 publications
“…3) Gradient unlearning method. The third group focuses on approximating the SGD steps as if full retraining were performed [11,12,18,19]. To maintain a history of accurate computations and produce an effective approximation, the unlearning method periodically recomputes the exact gradient after some iterations.…”
Section: Machine Unlearning
confidence: 99%
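The gradient-replay scheme this excerpt describes can be made concrete. Below is a minimal Python sketch, assuming a hypothetical cache of per-iteration gradients saved during the original training run (the names unlearn_sgd, grads_history, and exact_every are invented for illustration; the cited methods differ in detail):

```python
import numpy as np

def unlearn_sgd(w0, grads_history, X_keep, y_keep, lr=0.1, exact_every=10):
    """Replay SGD on the retained data, reusing cached gradients as a cheap
    approximation and periodically recomputing the exact gradient so the
    replay stays anchored to accurate computations."""
    w = w0.copy()
    for t, g_cached in enumerate(grads_history):
        if t % exact_every == 0:
            # Exact full-batch gradient on the retained data
            # (logistic loss chosen purely for illustration).
            p = 1.0 / (1.0 + np.exp(-X_keep @ w))
            g = X_keep.T @ (p - y_keep) / len(y_keep)
        else:
            g = g_cached  # cached gradient from the original training run
        w -= lr * g
    return w
```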
“…The second group updates the trained ML model by performing a corrective Newton step using the remaining data D - D_i; it follows [8,9,10]. The third group updates the trained ML model by correcting the SGD steps that led to the trained model; it follows the method defined in [11,12]. Note that the premise for the above three groups to compute an accurate approximation is access to accurate D and D_i, yet neither can be accessed globally in the FL setting.…”
Section: Introduction
confidence: 99%
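The corrective Newton step mentioned for the second group has a simple closed form: take one Newton update of the loss evaluated only on the remaining data D - D_i. A minimal sketch, assuming L2-regularized logistic regression as an illustrative loss (newton_unlearn_step is an invented name, not an API from the cited works):

```python
import numpy as np

def newton_unlearn_step(w, X_keep, y_keep, l2=1e-3):
    """One corrective Newton step on the remaining data D - D_i."""
    n = len(y_keep)
    p = 1.0 / (1.0 + np.exp(-X_keep @ w))          # sigmoid predictions
    grad = X_keep.T @ (p - y_keep) / n + l2 * w    # gradient on retained data
    S = p * (1.0 - p)                              # per-sample Hessian weights
    H = (X_keep.T * S) @ X_keep / n + l2 * np.eye(len(w))
    return w - np.linalg.solve(H, grad)            # Newton correction
```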
“…Based on first-order influence functions, [95] introduced an approach for identifying training data points that are responsible for user constraints specified by an SQL query. There are methods other than influence functions, however: e.g., [98] develops an approach for identifying and ranking training data points based on their influence on the predictions of neural networks, and [96] develops an approach for incrementally computing the influence of removing a subset of training data points. Furthermore, other recent work argues for the use of data Shapley values to quantify the contribution of individual data instances [33,34,54]; these approaches are computationally expensive because each data instance requires the model to be retrained. Unlike prior methods, our method generates: (1) explanations for the fairness of an ML model, (2) interpretable explanations based on first-order predicates that pinpoint a subset of training data responsible for model bias, and (3) update-based explanations that reveal data errors in certain attributes of a training data subset.…”
Section: Related Work
confidence: 99%
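As a concrete reference point for the first-order influence functions this excerpt relies on, the standard estimate of the parameter change from removing training point i is theta_{-i} ≈ theta + (1/n) H^{-1} grad(l_i). A hedged sketch, again assuming logistic regression for illustration:

```python
import numpy as np

def removal_influence(w, X, y, i, l2=1e-3):
    """First-order influence-function estimate of how the parameters move
    when training point i is removed from the training set."""
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    S = p * (1.0 - p)
    H = (X.T * S) @ X / n + l2 * np.eye(len(w))  # empirical Hessian at w
    g_i = (p[i] - y[i]) * X[i]                   # gradient of the removed point
    return np.linalg.solve(H, g_i) / n           # approx. parameter change
```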
“…Again, however, these are not geared for deep data introspection. PrIU [47] helps users understand changes, particularly deletions, to the data used in regression models. Unfortunately, this work only tracks deletions, not additions or updates to the data.…”
Section: Related Work
confidence: 99%
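To make the deletion-only setting concrete: for ordinary least squares one can maintain the sufficient statistics A = XᵀX and b = Xᵀy and downdate them when a training row is deleted, which is the flavor of incremental model update PrIU targets. The sketch below uses the Sherman-Morrison identity and is an illustrative stand-in, not PrIU's actual provenance-based algorithm:

```python
import numpy as np

def delete_row_ols(A_inv, b, x_del, y_del):
    """Update an OLS model after deleting one training row (x_del, y_del).
    A_inv = (X^T X)^{-1} and b = X^T y are maintained incrementally."""
    Ax = A_inv @ x_del
    # Sherman-Morrison downdate: (A - x x^T)^{-1}
    A_inv = A_inv + np.outer(Ax, Ax) / (1.0 - x_del @ Ax)
    b = b - y_del * x_del              # remove the row's contribution to X^T y
    return A_inv, b, A_inv @ b         # new statistics and refit weights
```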
“…Then, with each of those operators we associate a provenance pattern that describes the effect of the operator on the data at the appropriate level of detail, i.e., on individual dataframe elements, columns, rows, or collections of those. Effectively, the provenance patterns defined in this work for well-defined data science operators play a role similar to that of provenance polynomials [13], i.e., annotations associated with relational algebra operators to describe the fine-grained provenance of the results of relational as well as linear algebra operators [47,48]. We then associate a provenance function pf_o() with each operator o, which generates a provenance document pf_o() when a dataset is processed using o.…”
Section: Introduction
confidence: 99%
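The pf_o() idea can be sketched as a wrapper that pairs an operator's output with a provenance document. The name with_provenance and the document schema below are invented for illustration; the cited work's provenance patterns are considerably richer:

```python
def with_provenance(op, op_name):
    """Wrap a dataframe operator o so it also emits a provenance document
    pf_o(...) recording which input rows survive into the output."""
    def wrapped(df):
        out = op(df)
        doc = {
            "operator": op_name,
            "input_rows": list(df.index),
            "output_rows": list(out.index),  # e.g. rows kept by a filter
        }
        return out, doc
    return wrapped

# Usage sketch (assumes a pandas DataFrame df with an "age" column):
# adults, doc = with_provenance(lambda d: d[d["age"] >= 18], "filter")(df)
```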