2019
DOI: 10.1609/aaai.v33i01.33014065

Guided Dropout

Abstract: Dropout is often used in deep neural networks to prevent overfitting. Conventionally, dropout training randomly drops nodes from the hidden layers of a neural network. Our hypothesis is that a guided selection of nodes for intelligent dropout can lead to better generalization than traditional dropout. In this research, we propose "guided dropout" for training deep neural networks, which drops nodes by measuring the strength of each node. We also demonstrate that conventional dropout is a…
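
From the abstract and the citing papers quoted below, the mechanism can be sketched roughly as follows. This is a minimal illustration, not the authors' released implementation: the per-node `strength` vector, the `drop_fraction` value, and the choice to drop the strongest nodes are assumptions drawn from the summaries in this report.

```python
import numpy as np

def guided_dropout_mask(strength, drop_fraction=0.2):
    """Mask that drops the highest-strength nodes of one hidden layer.

    `strength` is assumed to be a per-node parameter learned alongside
    the weights (as the citing papers below describe); dropping the
    strongest nodes forces the weaker ones to learn.
    """
    n = strength.shape[0]
    k = int(np.ceil(drop_fraction * n))       # how many nodes to drop
    mask = np.ones(n)
    if k > 0:
        drop_idx = np.argsort(strength)[-k:]  # indices of strongest nodes
        mask[drop_idx] = 0.0
    return mask

# Hypothetical usage on a batch of hidden activations:
rng = np.random.default_rng(0)
strength = rng.random(8)                       # stand-in for a learned parameter
h = rng.standard_normal((4, 8))                # batch of 4, 8 hidden nodes
h_dropped = h * guided_dropout_mask(strength)  # strongest nodes zeroed out
```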

Cited by 32 publications (38 citation statements) · References 8 publications
“…To verify the effectiveness of our proposed algorithms, we compare our approaches with recent dropout techniques, including Automatic dropout [17], Controlled dropout [22], DropMI dropout [24], Guided dropout [15], Concrete dropout [23], and Targeted dropout [18], as well as Standard dropout [10]. All the experiments are carried out using GPU-based TensorFlow [44] on Python 3.…”
Section: Results
Mentioning, confidence: 99%
“…Standard dropout removes each computational latent unit with a fixed removal probability p, independent of the rest of the latent units. In recent studies, a variety of methods such as Standout [14], Guided dropout [15], Adversarial dropout [16], Automatic dropout [17], and Targeted dropout [18] have been proposed to achieve a more semantic dropout mechanism.…”
Section: Related Work
Mentioning, confidence: 99%
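
As a point of reference for the fixed-probability mechanism described in the quote above, a minimal inverted-dropout sketch might look like this; the rescaling by 1/(1-p) is the usual convention, not something the quote specifies:

```python
import numpy as np

def standard_dropout(h, p=0.5, training=True, rng=np.random.default_rng()):
    """Inverted dropout: each unit is removed independently with a fixed
    probability p; survivors are rescaled by 1/(1-p) so the expected
    activation matches test time."""
    if not training or p <= 0.0:
        return h
    keep = rng.random(h.shape) >= p  # independent Bernoulli draw per unit
    return h * keep / (1.0 - p)
```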
“…However, the side-model based approaches introduce significant computation and memory overhead. (Keshari, Singh, and Vatsa 2019) proposed guided dropout, which drops network nodes with high strength to encourage low-strength nodes. (Wang, Zhou, and Bilmes 2019) proposed Jumpout, which samples the dropout probability from a monotone decreasing distribution (e.g., the right half of a Gaussian) so that each linear piece of the network learns better for data points from nearby regions than from more distant ones, improving the generalization of DNNs with ReLU activations.…”
Section: Related Work
Mentioning, confidence: 99%
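
The Jumpout sampling step described in the quote above can be illustrated with a short sketch; `sigma` and the `max_p` clip are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def sample_jumpout_rate(sigma=0.2, max_p=0.9, rng=np.random.default_rng()):
    """Draw a per-batch dropout probability from the right half of a
    zero-mean Gaussian, a monotone decreasing density on [0, inf)."""
    p = abs(rng.normal(0.0, sigma))  # fold the Gaussian onto [0, inf)
    return min(p, max_p)             # keep the rate in a usable range
```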
“…In particular, it was shown that the generalization ability can be improved by dropping nodes selectively based on some prior knowledge of the network. For instance, (Keshari, Singh, and Vatsa 2019) learns a strength parameter for each node by stochastic gradient descent (SGD) to guide its dropout regularization. (Wang, Zhou, and Bilmes 2019) adapts the dropout probability by normalizing it at each layer and in every training batch, such that the effective drop rate on the activated units is kept the same throughout training.…”
Section: Introduction
Mentioning, confidence: 99%
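
A rough sketch of the per-layer, per-batch rate normalization described in this last quote, under the assumption that the applied rate is scaled by the fraction of activated (post-ReLU) units; the exact rescaling rule is a reconstruction for illustration:

```python
import numpy as np

def normalized_dropout(h, target_rate=0.1, rng=np.random.default_rng()):
    """Per-layer, per-batch rate normalization: dropping a zero
    (inactive) ReLU unit changes nothing, so the applied rate is scaled
    by the fraction of active units to keep the effective rate on
    activated units constant across layers and batches."""
    frac_active = float((h > 0).mean())
    if frac_active == 0.0:
        return h                              # nothing active to drop
    p = min(target_rate / frac_active, 0.95)  # applied rate, kept < 1
    keep = rng.random(h.shape) >= p
    return h * keep / (1.0 - p)               # inverted-dropout rescaling
```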