A method for monitoring elderly people in an indoor environment using conventional cameras is presented. The method can be used to identify people's activities and initiate suitable actions as needed. The originality of our approach lies in combining spatial and temporal context with the position and orientation of the detected person. A preliminary evaluation, based only on the first two features (spatial and temporal context), achieved an accuracy of over 60% in a realistic residential environment. Even with only two of the four proposed input features, these results show a promising improvement over using a single feature in isolation.