Educational testing and learning have evolved from standard true/false, fill-in-the-blank, and multiple-choice questions on paper to visually enriched formats using interactive multimedia content on digital displays. However, traditional educational application interfaces are primarily mouse-driven, which prevents multiple users from working simultaneously. Although touch-based displays have emerged and inspired new developments, they are mainly used for simple tasks. In this paper, we show how multi-touch technology can be extended to collaborative learning and testing at a larger scale, using an existing educational implementation for illustration. We propose a Human-Intention-Machine-Interpretation (HIMI) model, which applies a graph-based approach to recognize hand gestures and interpret user intentions. Our focus is not to build a new multi-touch system but to make use of existing multi-touch technology to enhance learning performance. The HIMI model not only facilitates natural interaction through hand movements on simple tasks but also supports complex collaborative operations. Our contribution lies in embedding multi-touch technology in multimedia education, providing a multi-user learning and testing environment that would not be possible with traditional input devices. We formalize a conceptual model that uniquely interprets user intentions via touch states, state transitions, and transition associations. We also propose a set of hand gestures for working with multimedia educational items. User evaluations demonstrate the feasibility of the proposed hand gestures.
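The graph-based idea behind the HIMI model can be sketched as a small state machine: touch states form the nodes, touch events drive state transitions, and an association from a completed transition path to a user intention yields the recognized gesture. The following Python sketch is illustrative only; all state, event, and gesture names are hypothetical and do not come from the paper.

```python
# Hypothetical sketch of graph-based intention interpretation:
# nodes = touch states, edges = transitions triggered by touch events,
# and transition associations map a visited-state path to an intention.
# All names below are illustrative assumptions, not the paper's definitions.

# Transition table: (current_state, event) -> next_state
TRANSITIONS = {
    ("idle", "touch_down"): "one_finger",
    ("one_finger", "touch_down"): "two_fingers",
    ("one_finger", "move"): "dragging",
    ("two_fingers", "move"): "pinching",
    ("dragging", "touch_up"): "idle",
    ("pinching", "touch_up"): "one_finger",
    ("one_finger", "touch_up"): "idle",
}

# Transition associations: a path of visited states maps to an intention.
INTENTIONS = {
    ("idle", "one_finger", "dragging", "idle"): "move item",
    ("idle", "one_finger", "two_fingers", "pinching", "one_finger"): "resize item",
}

def interpret(events):
    """Replay touch events through the state graph; return the matched intention."""
    state, path = "idle", ["idle"]
    for event in events:
        # Unknown (state, event) pairs leave the state unchanged in this sketch.
        state = TRANSITIONS.get((state, event), state)
        path.append(state)
    return INTENTIONS.get(tuple(path), "unknown")

print(interpret(["touch_down", "move", "touch_up"]))  # move item
```

Because each intention corresponds to a distinct path through the state graph, the same event stream is interpreted uniquely, which is the property the abstract attributes to touch states, state transitions, and transition associations.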