“…The Kullback‐Leibler divergence has been used extensively over the last three decades to impose constraints in the framework of learning with statistical models. Many developments and applications have been published, including, for instance, papers devoted to the training of neural network classifiers, topological learning, ensemble learning related to multilayer networks, constraints on parameters during learning, recognition of human activities, text categorization, exploratory data analysis, classification problems with constrained data, statistical learning algorithms with linguistic constraints, reinforcement learning in finite Markov decision processes, visual recognition, optimal sequential allocation, identifiability of parameter learning machines, speech enhancement, the training of generative neural samplers, the pessimistic uncertainty quantification problem, inference given summary statistics, optimization frameworks for distance metric learning, unsupervised machine learning techniques for learning latent parameters, graph‐based semisupervised learning, speech enhancement, and, finally, broad learning systems …”
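As an illustration of the common thread behind these applications, the sketch below shows the discrete Kullback‐Leibler divergence and its use as a soft constraint (penalty term) added to a learning objective. This is a minimal sketch under generic assumptions; the function names, the penalty weight, and the example distributions are hypothetical and are not taken from any of the works surveyed.

```python
# Minimal illustrative sketch (not from the quoted survey): the discrete
# Kullback-Leibler divergence and its typical role as a soft constraint
# (regularizer) added to a task loss. All names here are hypothetical.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # eps guards against log(0); assumes p and q are (approximately) normalized.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def constrained_loss(data_loss, model_dist, reference_dist, lam=0.1):
    """Task loss plus a KL penalty keeping the model close to a reference distribution."""
    return data_loss + lam * kl_divergence(model_dist, reference_dist)

if __name__ == "__main__":
    model = np.array([0.7, 0.2, 0.1])       # hypothetical model distribution
    prior = np.array([1 / 3, 1 / 3, 1 / 3])  # hypothetical reference (uniform prior)
    print(kl_divergence(model, prior))          # divergence from the reference
    print(constrained_loss(0.42, model, prior)) # penalized objective for a dummy task loss
```

In this reading, the KL term acts as the "constraint": the weight `lam` trades off fitting the data against staying close to the reference distribution, which is the role the divergence plays, in one form or another, across the applications listed above.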