We address the problem of uncooperative person recognition through continuous monitoring. Multiple modalities, such as face, height, clothing color, and voice, can be used to recognize a person. In general, not all modalities are available in a given frame; furthermore, only some of the available modalities are useful, because some frames in a video sequence are of too low a quality for recognition. We propose a method that uses stochastic information updates of temporal modalities together with environment estimators to improve person recognition performance. The environment estimators indicate whether a given modality is reliable enough to be used in a particular instance; these indicators let us identify and discard meaningless data, thus increasing the overall efficiency of the method. The proposed method was tested on movie clips acquired in an unconstrained environment that included wide variations in scale and rotation, illumination changes, uncontrolled camera-to-user distances (varying from 0.5 m to 5 m), and natural views of the human body with various types of noise. In this real and challenging scenario, the proposed method achieved outstanding performance.
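
The abstract does not state the update rule itself, so the following is only an illustrative sketch of how reliability-gated temporal updating could look, not the method as implemented. It assumes a recursive Bayesian update over a fixed set of enrolled identities, per-modality likelihood vectors, and a boolean reliability flag per modality standing in for the environment estimators. The function name `update_posterior` and all numeric values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def update_posterior(
    prior: Sequence[float],
    likelihoods: Dict[str, Optional[Sequence[float]]],
    reliable: Dict[str, bool],
) -> List[float]:
    """One temporal update step over the enrolled identities (assumed rule).

    prior       -- current belief over identities, sums to 1
    likelihoods -- per-modality likelihood of the current frame under each
                   identity; None means the modality is absent in this frame
    reliable    -- hypothetical environment-estimator output per modality
    """
    posterior = list(prior)
    for name, likelihood in likelihoods.items():
        # Skip modalities that are missing or flagged as unreliable.
        if likelihood is None or not reliable.get(name, False):
            continue
        posterior = [p * l for p, l in zip(posterior, likelihood)]

    total = sum(posterior)
    if total == 0.0:
        # No usable evidence survived; keep the previous belief unchanged.
        return list(prior)
    return [p / total for p in posterior]


# Two enrolled identities, uniform initial belief (illustrative values only).
belief = [0.5, 0.5]
frames = [
    # Face is usable; voice is present but flagged unreliable (e.g. noisy audio).
    ({"face": [0.9, 0.1], "voice": [0.1, 0.9]}, {"face": True, "voice": False}),
    # Face is not detected in this frame; height is usable.
    ({"face": None, "height": [0.6, 0.4]}, {"face": False, "height": True}),
]
for frame_likelihoods, frame_reliable in frames:
    belief = update_posterior(belief, frame_likelihoods, frame_reliable)
print(belief)
```

In this sketch, a modality that is absent or flagged unreliable leaves the belief untouched, so low-quality frames neither help nor harm the accumulated evidence; a soft reliability weight would be an equally plausible alternative to the hard gate assumed here.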