“…Most solutions in HRI either rely on touch screens or tele-operated robots (e.g., Kanda et al., 2010; Lee et al., 2012; Leite et al., 2017; Glas et al., 2017) to bypass these issues, or use rule-based methods in structured, transaction-oriented interactions by matching user responses to predefined templates (e.g., Kanda et al., 2007; Churamani et al., 2017; Zheng et al., 2019). However, rule-based approaches are inflexible to variations in user responses and are often experienced as time-consuming and frustrating (Williams et al., 2018; Bartneck et al., 2019; Irfan et al., 2020a). Moreover, automatic speech recognition errors may arise from various accents, quietly speaking users, and pronunciation errors of non-native speakers, which could decrease the robustness of rule-based approaches (Irfan et al., 2020a).…”