Human–robot interaction is becoming increasingly common to perform useful tasks in everyday life. From the human–machine communication perspective, achieving effective interaction in natural language is one challenge. To address it, natural language processing strategies have recently been used, commonly following a supervised machine learning framework. In this context, most approaches rely on the use of linguistic resources (e.g., taggers or embeddings), including training corpora. Unfortunately, such resources are scarce for some languages in specific domains, increasing the complexity of solution approaches. Motivated by these challenges, this paper explores deep learning methods for understanding natural language commands emitted to service robots that guide their movements in low-resource scenarios, defined by the use of Spanish and Nahuatl languages, for which linguistic resources are scarcely unavailable for this specific task. Particularly, we applied natural language understanding (NLU) techniques using deep neural networks and transformers-based models. As part of the research methodology, we introduced a labeled dataset of movement commands in the mentioned languages. The results show that models based on transformers work well to recognize commands (intent classification task) and their parameters (e.g., quantities and movement units) in Spanish, achieving a performance of 98.70% (accuracy) and 96.96% (F1) for the intent classification and slot-filling tasks, respectively). In Nahuatl, the best performance obtained was 93.5% (accuracy) and 88.57% (F1) in these tasks, respectively. In general, this study shows that robot movements can be guided in natural language through machine learning models using neural models and cross-lingual transfer strategies, even in low-resource scenarios.