In recent years, gesture recognition and speech recognition, as important input methods in Human–Computer Interaction (HCI), have been widely used in the field of virtual reality. In particular, with the rapid development of deep learning, artificial intelligence, and other computer technologies, gesture recognition and speech recognition have achieved breakthrough research progress. The search platform used in this work is mainly the Google Academic and literature database Web of Science. According to the keywords related to HCI and deep learning, such as “intelligent HCI”, “speech recognition”, “gesture recognition”, and “natural language processing”, nearly 1000 studies were selected. Then, nearly 500 studies of research methods were selected and 100 studies were finally selected as the research content of this work after five years (2019–2022) of year screening. First, the current situation of the HCI intelligent system is analyzed, the realization of gesture interaction and voice interaction in HCI is summarized, and the advantages brought by deep learning are selected for research. Then, the core concepts of gesture interaction are introduced and the progress of gesture recognition and speech recognition interaction is analyzed. Furthermore, the representative applications of gesture recognition and speech recognition interaction are described. Finally, the current HCI in the direction of natural language processing is investigated. The results show that the combination of intelligent HCI and deep learning is deeply applied in gesture recognition, speech recognition, emotion recognition, and intelligent robot direction. A wide variety of recognition methods were proposed in related research fields and verified by experiments. Compared with interactive methods without deep learning, high recognition accuracy was achieved. In Human–Machine Interfaces (HMIs) with voice support, context plays an important role in improving user interfaces. Whether it is voice search, mobile communication, or children’s speech recognition, HCI combined with deep learning can maintain better robustness. The combination of convolutional neural networks and long short-term memory networks can greatly improve the accuracy and precision of action recognition. Therefore, in the future, the application field of HCI will involve more industries and greater prospects are expected.