Many state-of-the-art approaches to spoken language understanding are probabilistic and rely on machine learning algorithms to train their models from large amounts of data. The main difficulty remains the cost of collecting and annotating such data. Another concern is the time needed to adapt an existing model to a new domain. Recent work has shown that a zero-shot learning method can bootstrap a model with good initial performance. To do so, the method exploits both a small ontological description of the target domain and a generic word-embedding semantic space for generalization. This framework was subsequently extended to exploit user feedback to refine the parameters of the zero-shot semantic parser and improve its performance online. In this paper, we propose to drive this online adaptation process with a policy learnt using the adversarial bandit algorithm Exp3. We show, on the second Dialog State Tracking Challenge (DSTC2) datasets, that this proposal can optimally balance the cost of gathering valuable user feedback and the overall performance of the spoken language understanding module.

Index Terms: spoken language understanding, zero-shot learning, bandit problem, out-of-domain training data, online adaptation.