Reference | Challenge addressed | Approach
— | … | …Vector representation of frame
[Ray et al 2018] | Rare, OOV words | Paraphrasing input utterances
[Liu et al 2019b] | Unidirectional information flow | Memory network
[Shen et al 2019a] | Poor generalisation in deployment | Sparse word embedding (prune useless words)
[Ray et al 2019] | Slots which take many values perform poorly | Delexicalisation
— | Language knowledge base, history context | Attention over external knowledge base, multi-turn history
— | Implicit knowledge sharing between tasks | BiLSTM, multi-task (DA)
[Gupta et al 2019a] | Speed | Non-recurrent and label-recurrent networks
[Gupta et al 2019b] | Multi-turn dialogue, using context | Token attention, previous history
— | Capturing intent-slot correlation | Multi-head self-attention, masked intent
— | Poor generalisation | BERT
[Bhasin et al 2019] | Learning joint distribution | CNN, BiLSTM, cross-fusion, masking
[Thi Do and Gaspers 2019] | Lack of annotated data, flexibility | Language transfer, multitasking, modularisation
— | Key verb-slot correlation | Key verb in features, BiLSTM, attention
[Zhang and Wang 2019] | Learning joint distribution | Transformer architecture
[Daha and Hewavitharana 2019] | Efficient modelling of temporal dependency | Character embedding and RNN
[Dadas et al 2019] | Lack of annotated data, small data sets | Augmented data set
— | Learning joint distribution | Word embedding attention
[E et al 2019] | Learning joint distribution | Bidirectional architecture, feedback
— | Poor generalisation | BERT encoding, multi-head self-attention
[Qin et al 2019] | Weak influence of intent on slot | Use intent prediction instead of summarised intent information in slot tagging
[Gangadharaiah and Narayanaswamy 2019] | Multi-intent samples | Multi-label classification methods
[Firdaus et al 2019] | Multi-turn dialogue history, learning joint distribution | RNN, CRF
[Pentyala et al 2019] | Optimal architecture | BiLSTM, different architectures
— | Non-recurrent model, transfer learning | BERT, language transfer
[Schuster et al 2019] | Low-resource languages | Transfer methods with SLU test case
[Okur et al 2019] | Natural language | Locate intent keywords, non-other slots
[Xu et al 2020] | Only good performance in one sub-task | Joint intent/slot tagging, length-variable attention
[Bhasin et al 2020] | Learning joint distribution | Multimodal Low-rank Bilinear Attention Network
[Firdaus et al 2020] | Learning joint distribution | Stacked BiLSTM
[Zhang et al 2020b] | Limitations of sequential analysis | Graph representation of text
— | Non-convex optimisation | Convex combination of ensemble of models
— | BERT issues with logical d… | …
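A recurring challenge in the table is learning the joint distribution of intents and slots, typically addressed with a shared encoder (often a BiLSTM) feeding separate intent and slot heads. As a rough illustration of that shared-encoder pattern only (not a reconstruction of any cited model; the class name, dimensions, and hyperparameters below are illustrative assumptions), a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class JointBiLSTM(nn.Module):
    """Illustrative joint model: one BiLSTM encoder shared by both tasks.

    Slot filling is token-level tagging over the per-token BiLSTM outputs;
    intent detection is utterance-level classification over the final
    forward/backward hidden states. All sizes are placeholder assumptions.
    """

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_slots, num_intents):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)         # (batch, seq, embed)
        outputs, (h_n, _) = self.encoder(embedded)   # outputs: (batch, seq, 2*hidden)
        slot_logits = self.slot_head(outputs)        # per-token slot scores
        # Concatenate the last forward and backward hidden states
        # as the utterance representation for intent classification.
        utterance = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2*hidden)
        intent_logits = self.intent_head(utterance)
        return intent_logits, slot_logits

# Toy usage with random token ids (hypothetical vocabulary of 1000 words).
model = JointBiLSTM(vocab_size=1000, embed_dim=64, hidden_dim=128,
                    num_slots=20, num_intents=5)
tokens = torch.randint(0, 1000, (2, 10))   # batch of 2 utterances, 10 tokens each
intent_logits, slot_logits = model(tokens)
```

Joint training would then minimise the sum of the intent and slot cross-entropy losses, so gradients from both tasks shape the shared encoder; the cited works differ mainly in how they let the two tasks inform each other beyond this shared encoding.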