Abstract-We discuss issues related to the design of a multitouch gesture sensing environment that allows the user to execute both independent and coordinated gestures. We compare different approaches, contrasting frontal vs. back projection devices and gesture tracking vs. shape recognition. To compare the approaches we introduce a simple gesture language for drawing diagrams. A test multitouch device built around FTIR technology is illustrated; a vision system, driven by a visual dataflow programming environment, interprets the user's gestures and classifies them into a set of predefined patterns corresponding to language commands.

I. INTRODUCTION

Gesture-based interaction has received great attention in recent years, due to progress in research on interaction devices and to new applications, mainly in the Web 2.0 area. Ubiquitous computing and ambient intelligence compel us to rethink our relationship with information devices, trying to move beyond the traditional "window, icon, menu, pointing device" (WIMP) style of interaction toward multimodal and distributed interfaces. The WIMP paradigm is, however, so well rooted that many efforts to find more natural interaction styles through free gesture interpretation are still based on the model of a pointer moving in a limited area, commanding actions by "clicking" on objects that represent programs and documents. Indeed, the 2D desktop layout does not leave much room for freely proposing "real world" gestures, which are possible only in an immersive environment and under constraints.

In this paper we discuss the choice of a gesture sensing device and the design of a gesture language for it. We examine the pros and cons of different interaction styles and their impact on gesture structure and interpretation. We define a drawing-style gesture language suited for a multitouch device based on the Frustrated Total Internal Reflection (FTIR) technology proposed by Han [4], allowing users to interact with truly multitouch, independent gestures. The gesture language is targeted at drawing simple graphs such as the ones used in conceptual and mental maps. The operations permitted are: create an object (a node), move an object, connect two or more objects, write text labels, and delete objects.

We have built a prototype system to experiment with our proposal. Gestures are captured and interpreted by a vision system built around a dataflow programming environment; gestures are classified into a set of predefined patterns corresponding to the language commands. Having multiple pointers allows the user to execute both parallel independent gestures (e.g., to select many objects at the same time) and coordinated gestures (e.g., to connect two objects).
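The prototype described above performs this classification inside a visual dataflow environment; purely as an illustration of the idea, the following Python sketch shows how tracked touch strokes might be mapped onto the drawing-language commands (create a node, move an object, connect two objects). The `Stroke` and `Command` types, the `tap_threshold` parameter, and the heuristics are assumptions for this sketch, not part of the system described in the paper.

```python
# Illustrative sketch only: mapping tracked touch strokes to the
# drawing-language commands. Names and thresholds are assumptions.

from dataclasses import dataclass
from enum import Enum, auto
from math import hypot
from typing import List, Tuple


class Command(Enum):
    CREATE_NODE = auto()   # short tap on empty canvas
    MOVE_OBJECT = auto()   # drag starting on an existing node
    CONNECT = auto()       # coordinated strokes on two different nodes
    UNKNOWN = auto()


@dataclass
class Stroke:
    points: List[Tuple[float, float]]  # (x, y) positions of one touch over time
    on_object: bool                    # True if the stroke started on a node


def path_length(points: List[Tuple[float, float]]) -> float:
    """Total distance covered by the stroke."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def classify(stroke: Stroke, tap_threshold: float = 10.0) -> Command:
    """Classify a single independent stroke (illustrative heuristics)."""
    if path_length(stroke.points) < tap_threshold:
        # Nearly stationary touch: create a node at the touch position.
        return Command.CREATE_NODE
    if stroke.on_object:
        # A drag that begins on a node moves that node.
        return Command.MOVE_OBJECT
    return Command.UNKNOWN


def classify_pair(a: Stroke, b: Stroke) -> Command:
    """Two simultaneous strokes that both start on nodes connect them."""
    if a.on_object and b.on_object:
        return Command.CONNECT
    return Command.UNKNOWN
```

In this sketch, a single short touch creates a node and a drag starting on a node moves it, while two simultaneous touches on distinct nodes are treated as the coordinated "connect" gesture mentioned in the text; the actual system resolves these cases through its vision pipeline and predefined gesture patterns.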