SUMMARY
This paper proposes a system architecture for event recognition that dynamically integrates information from multiple sources (e.g., multimodal data from visual and auditory sensors). The proposed system consists of multiple event classifiers called Continuous State Machines (CSMs). Each CSM has a state transition rule defined in a continuous state space and classifies time-varying patterns from a single, distinct source. Because this rule is defined as an extension of Kalman filters (i.e., the next state is determined by trading off the input data against the model's prediction), CSMs support dynamic time warping and are robust against noise. We then introduce a method of interaction among CSMs for classifying events from multiple sources. Since the state space is continuous (i.e., a vector space), this interaction can be designed as the minimization of an energy function. The interaction enables the system to dynamically suppress unreliable classifiers, which improves system reliability and classification accuracy in dynamically changing situations (e.g., when the object is temporarily occluded from one of several cameras in a gesture recognition task). Experimental results on gesture recognition with two cameras show the effectiveness of the proposed system.
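To illustrate the trade-off between input data and model prediction that the CSM rule extends, here is a minimal sketch of a standard scalar Kalman predict/correct step. It is not the CSM transition rule itself, which this summary does not specify; the class name `ScalarKalman` and the parameters `q`, `r`, and `a` are hypothetical, and the sketch assumes a linear one-dimensional model with Gaussian noise.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Hypothetical 1-D predict/correct filter illustrating the trade-off
    between a model prediction and an observation (not the paper's CSM rule)."""
    x: float        # current state estimate
    p: float        # variance of the current estimate
    q: float        # process noise variance (uncertainty of the model)
    r: float        # observation noise variance (uncertainty of the input)
    a: float = 1.0  # assumed linear state-transition coefficient

    def step(self, z: float) -> float:
        # Predict: propagate the state and its variance through the model.
        x_pred = self.a * self.x
        p_pred = self.a * self.a * self.p + self.q
        # Gain in [0, 1]: near 1 trusts the input, near 0 trusts the prediction.
        k = p_pred / (p_pred + self.r)
        # Correct: move the prediction toward the observation by the gain.
        self.x = x_pred + k * (z - x_pred)
        self.p = (1.0 - k) * p_pred
        return self.x


if __name__ == "__main__":
    f = ScalarKalman(x=0.0, p=1.0, q=0.01, r=1.0)
    print(f.step(1.0))  # roughly halfway between prediction 0.0 and input 1.0
```

A large observation noise `r` shrinks the gain so the estimate follows the model's prediction, which is how this kind of update resists noisy input; a small `r` makes the estimate track the input closely.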