This paper addresses the challenging problem of disorientation in elderly people living at home. To detect confusion, we monitor the behaviour of the elderly in a sensorized and video-controlled smart environment and identify actions that appear alarming. Our previous research focused on identifying situations, activities and interactions between various actors based on user-understandable models. This work presents the development of a simulation tool capable of synthesizing sensor data and low- and medium-level scene events. The tool is valuable for the design and configuration of an elderly disorientation recognition system because it reduces the need for laborious and expensive experimentation with real devices. We integrate this proposal into a comprehensive framework that distinguishes between a recognition line and a simulation line, which together can form a continuous, closed cycle. The recognition line goes from a multisensory monitored scene to its semantic interpretation, which can be completed using only a narration of the facts. In the opposite direction, the simulation line goes from the narration of a scene to its synthesis, with the same semantic content, as a 3D simulation together with the corresponding sensor signals and low- and medium-level events at specific location points.