Although natural language conversation with machines is one of the central objectives of AI, and despite a massive increase in research and development efforts in conversational AI, task-oriented dialogue (TOD), i.e., conversation with an artificial agent aimed at completing a concrete task, is currently limited to a few narrow domains (e.g., food ordering, ticket booking) and a handful of major languages (e.g., English, Chinese). In this work, we provide an extensive overview of existing efforts in multilingual TOD and analyse the factors preventing the development of truly multilingual TOD systems. We identify two main challenges that, in combination, hinder faster progress in multilingual TOD: (1) current state-of-the-art TOD models based on large pretrained neural language models are data-hungry, while (2) data acquisition for TOD use cases is expensive and tedious. Most existing approaches to multilingual TOD therefore rely on zero- or few-shot cross-lingual transfer from resource-rich languages (in TOD, this is essentially only English), by means of either (i) machine translation or (ii) multilingual representation spaces. However, such approaches are currently not a viable solution for a large number of low-resource languages that lack parallel data and/or have only limited monolingual corpora. Finally, we discuss critical challenges and potential solutions by drawing parallels between TOD and other cross-lingual and multilingual NLP research.