The development of large language models has led to the widespread adoption of spoken English conversation systems across industries, including tourism, where the application of machine learning has advanced considerably. This paper explores the use of machine learning methods in spoken dialog systems to construct a dialog model suited to the tourism industry. Speech, map image, and tourism information text data are first collected and then processed, stored, and applied to form the database of the dialog model. Text is represented with a bag-of-words model and word vectors, while speech activity detection and an English conversation system are used to construct the speech and intention recognition models. On the test set, the best speech recognition rate obtained with double-threshold endpoint detection reaches 98.73%, and the corresponding sentence recognition rate reaches 93.67%. The proposed model achieves the highest intention recognition accuracy, scoring 0.943 and 0.955 on the CCKS2018 and SMP2018 datasets, respectively. The resulting machine-learning conversation model provides technical support for conversational AI development in the tourism industry.
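
The abstract names double-threshold endpoint detection without describing the procedure. As a rough illustration only, the following sketch shows one common energy-based form of the idea: a high threshold marks frames that are confidently speech, and a low threshold extends each detected segment outward to capture its weaker onset and offset. The function names, frame sizes, and threshold values are assumptions made for this example and are not taken from the paper; the zero-crossing-rate stage used in many classical variants is omitted.

```python
def frame_energies(samples, frame_len=160, hop=80):
    """Short-time energy (sum of squared samples) of each full frame.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    energies = []
    for start in range(0, max(len(samples) - frame_len + 1, 0), hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def double_threshold_endpoints(energies, high, low):
    """Return (start_frame, end_frame) pairs, both inclusive.

    A segment is seeded by a frame whose energy exceeds `high`, then
    extended in both directions while neighbouring frames exceed `low`.
    Assumes high >= low.
    """
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            # Extend backwards to include the weaker onset.
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            # Extend forwards to include the weaker offset.
            end = i
            while end + 1 < n and energies[end + 1] > low:
                end += 1
            segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments


# Hand-made frame energies: two bursts separated by near-silence.
example = [0, 1, 6, 20, 25, 7, 1, 0, 18, 2]
print(double_threshold_endpoints(example, high=15, low=5))  # [(2, 5), (8, 8)]
```

In this toy input, frames 2 and 5 fall below the high threshold but above the low one, so they are absorbed into the first segment. In practice the two thresholds would typically be tuned against the background noise level of the recordings.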