Multimodal human–computer interaction has been pursued as a way to provide not only more compelling interactive experiences, but also more accessible interfaces to mobile devices. With advances in mobile technology and affordable sensors, research that combines multiple interaction modalities (such as speech, touch, vision, and gesture) has become increasingly prominent. This article provides a framework covering the key aspects of mid-air gesture and speech-based interaction for older adults. It reviews the literature on multimodal interaction and on older adults as technology users, and summarises the main findings for this user group. Building on these findings, it describes a number of crucial factors to consider when designing multimodal mobile technology for older adults. The aim of this work is to promote the usefulness and potential of multimodal technologies based on mid-air gestures and voice input for making older adults' interaction with mobile devices more accessible and inclusive.