“…This formulation applies to a discriminative variant of the RBM called the Discriminative RBM. Such conditional energy-based models have also been exploited in a series of probabilistic language models based on neural networks (Bengio et al., 2001; Schwenk & Gauvain, 2002; Bengio, Ducharme, Vincent, & Jauvin, 2003; Xu, Emami, & Jelinek, 2003; Schwenk, 2004; Schwenk & Gauvain, 2005; Mnih & Hinton, 2009). That formulation (or, more generally, any setting in which it is easy to sum or maximize over the set of values indexed by the terms of the partition function) has been explored at length (LeCun & Huang, 2005; LeCun et al., 2006).…”
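What makes this conditional formulation tractable is that the partition function sums only over the values of the predicted variable y, so P(y | x) = exp(−F(x, y)) / Σ_y′ exp(−F(x, y′)) can be computed exactly when y ranges over a small set such as class labels. The sketch below illustrates this under stated assumptions. It uses what is commonly given as the Discriminative RBM free energy, F(x, y) = −d_y − Σ_j softplus(c_j + U_jy + Σ_i W_ji x_i). The parameter names `W`, `U`, `c`, `d` and the toy values are hypothetical and are not taken from the text.

```python
import math

def softplus(a: float) -> float:
    # Numerically stable log(1 + exp(a)).
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def free_energy(x, y, W, U, c, d):
    # Assumed DRBM-style free energy (hypothetical parameter names):
    # F(x, y) = -d[y] - sum_j softplus(c[j] + U[j][y] + sum_i W[j][i] * x[i])
    total = -d[y]
    for j in range(len(c)):
        act = c[j] + U[j][y] + sum(W[j][i] * x[i] for i in range(len(x)))
        total -= softplus(act)
    return total

def conditional(x, W, U, c, d):
    # p(y | x) = exp(-F(x, y)) / sum_{y'} exp(-F(x, y')).
    # The normalizer is exact because it only sums over the len(d) labels.
    neg_f = [-free_energy(x, y, W, U, c, d) for y in range(len(d))]
    m = max(neg_f)  # subtract the max before exponentiating for stability
    exps = [math.exp(v - m) for v in neg_f]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: 2 inputs, 3 hidden units, 2 classes (all values made up).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]  # hidden x input
U = [[1.0, -1.0], [0.0, 0.5], [0.2, 0.2]]   # hidden x class
c = [0.0, 0.1, -0.1]                        # hidden biases
d = [0.0, 0.0]                              # class biases
x = [1.0, 0.0]
p = conditional(x, W, U, c, d)
print(p)
```

The same normalization would be intractable if the sum ran over all joint configurations of x and y; restricting it to y is what lets the conditional model be trained and evaluated exactly.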