Generative models learn the distribution of data from a sample dataset and can then generate new data instances. Recent advances in deep learning have brought forth improvements in generative model architectures, and some state-of-the-art models can (in some cases) produce outputs realistic enough to fool humans.

We survey recent research at the intersection of security and privacy and generative models. In particular, we discuss the use of generative models in adversarial machine learning, in helping automate or enhance existing attacks, and as building blocks for defenses in contexts such as intrusion detection, biometrics spoofing, and malware obfuscation. We also describe the use of generative models in diverse applications such as fairness in machine learning, privacy-preserving data synthesis, and steganography. Finally, we discuss new threats posed by generative models: the creation of synthetic media, such as deepfakes, that can be used for disinformation.
INTRODUCTION

Generative models learn to characterize the distribution of data using only samples from it and can then generate new data instances from this distribution. Although generative models are not new, the deep learning revolution has reinvigorated research into generative model architectures, and deep generative models using state-of-the-art architectures can now produce output that is sometimes indistinguishable from real-world data. Along with this comes a host of new issues and opportunities relating to security and privacy.

This paper provides a comprehensive survey of research at the intersection of generative models and security and privacy. In particular, we describe recent uses of generative models in adversarial machine learning. We also discuss applications such as producing adversarial examples without perturbations, steganography, and privacy-preserving data synthesis. We show how a number of attacks and defenses for various cybersecurity problems, such as password generation, intrusion detection, and malware detection, can benefit from generative models because of their ability to learn the distribution of the training data. By characterizing the data distribution, generative models help practitioners better understand the phenomena they are studying and supplement existing attack and defense techniques. Finally, we discuss the extent to which deep generative models present new threats when used to produce synthetic media.

The increased intensity of work at the intersection of generative models and security/privacy is evidenced by a growing body of literature. This is illustrated in Fig. 1, which shows the number of papers (published and preprints) on this topic from 2000 to 2020.

This survey is structured as follows. Section 2 provides a brief overview of generative model architectures, as well as a discussion of metrics used to evaluate generative models.
In Section 3, we survey the use of generative models for intrusion detection, malware detection, and biometric spoofing. In Section 4, we describe how generative models can be used to create synthetic dat...