Unlearning the data observed during the training of a machine learning (ML) model is an important task that can play a pivotal role in fortifying the privacy and security of ML-based applications. This paper raises the following questions: (i) can we unlearn a class (or classes) of data from an ML model without looking at the full training data even once? (ii) can we make the process of unlearning fast and scalable to large datasets, and generalize it to different deep networks? We introduce a novel machine unlearning framework with error-maximizing noise generation and impair-repair based weight manipulation that offers an efficient solution to the above questions. An error-maximizing noise matrix is learned for the class to be unlearned using the original model. The noise matrix is then used to manipulate the model weights to unlearn the targeted class of data. We introduce impair and repair steps for a controlled manipulation of the network weights. In the impair step, the noise matrix along with a very high learning rate is used to induce sharp unlearning in the model. Thereafter, the repair step is used to regain the overall performance. With very few update steps, we show excellent unlearning while substantially retaining the overall model accuracy. Unlearning multiple classes requires a similar number of update steps as unlearning a single class, making our approach scalable to large problems. Our method is highly efficient compared to existing methods, works for multi-class unlearning, does not impose any constraints on the original optimization mechanism or network design, and works well on both small- and large-scale vision tasks. This work is an important step towards fast and easy implementation of unlearning in deep networks. We will make the source code publicly available.
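A minimal PyTorch-style sketch of the error-maximizing-noise and impair-repair idea described in this abstract is given below. The function names, the small retain-data loader, and all hyperparameters are illustrative assumptions for exposition, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def learn_error_maximizing_noise(model, forget_class, shape, steps=100, lr=0.1):
    # Optimize a batch of noise so that the frozen original model incurs
    # maximum loss when the noise is labeled with the class to be forgotten.
    noise = torch.randn(shape, requires_grad=True)
    labels = torch.full((shape[0],), forget_class, dtype=torch.long)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -F.cross_entropy(model(noise), labels)  # maximize the error
        loss.backward()
        opt.step()
    return noise.detach()

def impair_then_repair(model, noise, forget_class, retain_loader,
                       impair_lr=0.02, repair_lr=0.001):
    labels = torch.full((noise.size(0),), forget_class, dtype=torch.long)
    # Impair: a few updates on the error-maximizing noise with a very high
    # learning rate sharply degrade the model on the targeted class.
    opt = torch.optim.SGD(model.parameters(), lr=impair_lr)
    opt.zero_grad()
    F.cross_entropy(model(noise), labels).backward()
    opt.step()
    # Repair: brief fine-tuning on (a small amount of) retained data restores
    # overall accuracy without reintroducing the forgotten class.
    opt = torch.optim.SGD(model.parameters(), lr=repair_lr)
    for x, y in retain_loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model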
With the introduction of new privacy regulations, machine unlearning is becoming an emerging research problem due to an increasing need for regulatory compliance required for machine learning (ML) applications. Modern privacy regulations grant citizens the right to be forgotten by products, services, and companies. This necessitates deletion of data not only from storage archives but also from ML models. Right-to-be-forgotten requests come in the form of removal of a certain set or class of data from an already trained ML model. Practical considerations preclude retraining the model from scratch without the deleted data. The few existing studies use the whole training data, a subset of the training data, or some metadata stored during training to update the model weights for unlearning. However, strict regulatory compliance requires time-bound deletion of data. Thus, in many cases, no data related to the training process or training samples may be accessible even for unlearning purposes. We therefore ask the question: is it possible to achieve unlearning with zero training samples? In this paper, we introduce the novel problem of zero-shot machine unlearning, which caters to the extreme but practical scenario where zero original data samples are available for use. We then propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer. We also introduce a new evaluation metric, the Anamnesis Index (AIN), to effectively measure the quality of the unlearning method. The experiments show promising results for unlearning in deep learning models on benchmark vision datasets. The source code will be made publicly available.
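A minimal PyTorch-style sketch of the gated knowledge transfer idea in (b): pseudo-samples from a generator are used to distill the original model (teacher) into a fresh student, while a gate discards any pseudo-sample that the teacher attributes to the class being forgotten, so no real data is touched. The generator, gate threshold, and batch settings are illustrative assumptions rather than the paper's exact setup.

import torch
import torch.nn.functional as F

def gated_distillation_step(generator, teacher, student, student_opt,
                            forget_class, batch_size=64, z_dim=100, gate=0.01):
    z = torch.randn(batch_size, z_dim)
    pseudo = generator(z)                      # synthetic inputs, no real data
    with torch.no_grad():
        t_prob = F.softmax(teacher(pseudo), dim=1)
    # Gate: keep only pseudo-samples the teacher does NOT associate with the
    # forget class, so no knowledge about that class reaches the student.
    keep = t_prob[:, forget_class] < gate
    if keep.sum() == 0:
        return 0.0
    s_logp = F.log_softmax(student(pseudo[keep]), dim=1)
    loss = F.kl_div(s_logp, t_prob[keep], reduction="batchmean")
    student_opt.zero_grad()
    loss.backward()
    student_opt.step()
    return loss.item()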
Machine unlearning has become an important field of research due to an increasing focus on incorporating evolving data privacy rules and regulations into machine learning (ML) applications. It facilitates requests for the removal of a certain set or class of data from an already trained ML model without retraining from scratch. Recently, several efforts have been made to perform unlearning in an effective and efficient manner. We propose a novel machine unlearning method that explores the utility of competent and incompetent teachers in a student-teacher framework to induce forgetfulness. The knowledge from the competent and incompetent teachers is selectively transferred to the student to obtain a model that does not contain any information about the forget data. We experimentally show that this method generalizes well and is fast and effective. Furthermore, we introduce a zero retrain forgetting (ZRF) metric to evaluate the unlearning method. Unlike existing unlearning metrics, the ZRF score does not depend on the availability of an expensive retrained model. This makes it useful for analysis of the unlearned model after deployment as well. The experiments are conducted for random-subset forgetting and class forgetting on various deep networks and across different application domains. A use case of forgetting information about patients' medical records is also presented. The source code will be made publicly available.
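A minimal PyTorch-style sketch of the competent/incompetent-teacher idea: on forget samples the student is pulled toward a randomly initialized (incompetent) teacher, and on retain samples toward the original (competent) teacher. The single combined update, the temperature, and the ZRF score written as one minus the mean Jensen-Shannon divergence against the incompetent teacher are plausible readings used here for illustration, not the authors' exact recipe.

import torch
import torch.nn.functional as F

def teacher_student_unlearn_step(student, good_teacher, bad_teacher,
                                 retain_batch, forget_batch, opt, T=1.0):
    xr, xf = retain_batch, forget_batch
    with torch.no_grad():
        good = F.softmax(good_teacher(xr) / T, dim=1)   # competent teacher
        bad = F.softmax(bad_teacher(xf) / T, dim=1)     # incompetent teacher
    loss_retain = F.kl_div(F.log_softmax(student(xr) / T, dim=1),
                           good, reduction="batchmean")
    loss_forget = F.kl_div(F.log_softmax(student(xf) / T, dim=1),
                           bad, reduction="batchmean")
    loss = loss_retain + loss_forget
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def zrf_score(model, bad_teacher, forget_loader):
    # ZRF sketch: average of (1 - JS divergence) between the unlearned model
    # and a randomly initialized teacher on forget data; values near 1 mean
    # the model behaves like an untrained network on the forgotten samples.
    total, n = 0.0, 0
    with torch.no_grad():
        for x, _ in forget_loader:
            p = F.softmax(model(x), dim=1)
            q = F.softmax(bad_teacher(x), dim=1)
            m = 0.5 * (p + q)
            js = 0.5 * (F.kl_div(m.log(), p, reduction="none").sum(1)
                        + F.kl_div(m.log(), q, reduction="none").sum(1))
            total += (1 - js).sum().item()
            n += x.size(0)
    return total / n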
With the introduction of data protection and privacy regulations, it has become crucial to remove the lineage of data on demand in a machine learning system. In the past few years, there has been notable development in machine unlearning to remove the information of certain training data points efficiently and effectively from the model. In this work, we explore unlearning in a regression problem, particularly in deep learning models. Unlearning in classification and simple linear regression has been investigated considerably. However, unlearning in deep regression models has largely remained an untouched problem until now. In this work, we introduce deep regression unlearning methods that generalize well and are robust to privacy attacks. We propose the Blindspot unlearning method, which uses a novel weight optimization process. A randomly initialized model partially exposed to the retain samples and a copy of the original model are used together to selectively imprint knowledge about the data that we wish to keep and scrub the information of the data we wish to forget. We also propose a Gaussian-distribution-based fine-tuning method for regression unlearning. The existing evaluation metrics for unlearning in a classification task are not directly applicable to regression unlearning; therefore, we adapt these metrics for the regression task. We devise a membership inference attack to check for privacy leaks in the unlearned regression model. We conduct experiments on regression tasks for computer vision, natural language processing, and forecasting applications. Our deep regression unlearning methods show excellent performance across all of these datasets and metrics.
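A minimal PyTorch-style sketch of the Blindspot idea for a deep regression model: a copy of the original model keeps fitting the retain targets while being pulled toward a partially trained "blindspot" network on the forget samples, scrubbing what it knows about them. The single combined step, the MSE losses, and the weighting factor are illustrative assumptions rather than the authors' exact optimization process.

import torch
import torch.nn.functional as F

def blindspot_unlearn_step(unlearn_model, blindspot_model, opt,
                           retain_batch, forget_batch, lam=1.0):
    (xr, yr), (xf, _) = retain_batch, forget_batch
    with torch.no_grad():
        blind_pred = blindspot_model(xf)     # knows little about forget data
    # Keep: ordinary regression loss on the data we wish to retain.
    loss_retain = F.mse_loss(unlearn_model(xr), yr)
    # Scrub: mimic the blindspot model's (uninformed) outputs on forget data.
    loss_forget = F.mse_loss(unlearn_model(xf), blind_pred)
    loss = loss_retain + lam * loss_forget
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()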