Knowledge distillation (KD) has been extensively studied in natural language processing (NLP) to obtain lightweight yet efficient and effective language models. A growing number of KD methods have been proposed for a wide range of NLP tasks (Liu et al., 2019b; Gordon and Duh, 2019; Haidar and Rezagholizadeh, 2019; Yang et al., 2020b; Tang et al., 2019; Hu et al., 2018; Nakashole and Flauger, 2017; Jiao et al., 2019; Wang et al., 2018c; Zhou et al., 2019a; Sanh et al., 2019; Turc et al., 2019; Arora et al., 2019; Clark et al., 2019; Kim and Rush, 2016; Mou et al., 2016; Liu et al., 2019e; Hahn and Choi, 2019; Kuncoro et al., 2016; Cui et al., 2017; Wei et al., 2019; Freitag et al., 2017; Shakeri et al., 2019; Aguilar et al., 2020). NLP tasks in which KD has been applied include neural machine translation (NMT) (Hahn and Choi, 2019; Zhou et al., 2019a; Kim and Rush, 2016; Wei et al., 2019; Freitag et al., 2017; Gordon and Duh, 2019), question answering (Wang et al., 2018c; Arora et al., 2019; Yang et al., 2020b; Hu et al., 2018), document retrieval (Shakeri et al., 2019), event detection (Liu et al., 2019b), text generation (Haidar and Rezagholizadeh, 2019), …
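Most of the methods surveyed above build on the classic soft-target objective: the student is trained to match the teacher's temperature-softened output distribution. The sketch below, assuming the standard temperature-scaled formulation with the usual T² scaling (function names here are illustrative, not from any cited work):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from the teacher's soft targets to the student's
    softened predictions, scaled by T**2 so gradients keep a comparable
    magnitude as T varies (illustrative sketch, not a cited method)."""
    p = softmax(teacher_logits, T)   # teacher's soft targets
    q = softmax(student_logits, T)   # student's softened predictions
    kl = float(np.sum(p * (np.log(p) - np.log(q))))
    return (T ** 2) * kl
```

In practice this term is combined with the ordinary cross-entropy on the ground-truth labels; the loss is zero when student and teacher logits agree and positive otherwise.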