What good are surveillance videos without knowing what objects are in them? Object classification has been actively researched for images and, more recently, for videos, but not over the long term. Videos that span a long period of time pose arduous challenges for such a task. This paper aims to bridge that gap by exploring object classification in long-term surveillance videos. In this work, we introduce a complete framework for processing long-term surveillance videos with the aim of classifying moving objects into five distinct classes commonly found in these scenes. After moving objects are extracted and tracks are created, object features are encoded in a bag-of-words model before classification. Extensive experiments were conducted on a selected portion of the recent LOST dataset. With state-of-the-art PHOW features, we achieve the highest accuracy of around 92% using a track-based classification scheme that is robust against potential frame-level misclassifications.
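The two stages named above, bag-of-words encoding and track-based classification, can be illustrated with a minimal sketch. It assumes a codebook has already been learned (for example by k-means over local descriptors such as PHOW), uses hard nearest-codeword assignment, and takes a simple majority vote over per-frame labels as the track-level rule. The function names, the toy 2-D descriptors, and the class labels are hypothetical, and the voting rule is an assumption rather than the exact scheme used in this work.

```python
from collections import Counter


def encode_bow(descriptors, codebook):
    """Quantize each local descriptor to its nearest codeword and return an
    L1-normalized histogram of codeword counts (the bag-of-words vector)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        # Index of the closest codeword by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1.0
    total = sum(hist)
    # Normalize so objects with different descriptor counts are comparable.
    return [h / total for h in hist] if total else hist


def classify_track(frame_labels):
    """Assign one label to a whole track by majority vote over its per-frame
    labels (assumed rule), so isolated frame-level errors are outvoted."""
    return Counter(frame_labels).most_common(1)[0][0]


# Toy example with hypothetical 2-D descriptors and labels.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
print(encode_bow(descriptors, codebook))  # [0.25, 0.5, 0.25]
print(classify_track(["car", "car", "person", "car", "bicycle"]))  # car
```

In this sketch, a track whose frames are mostly labeled `car` keeps that label even though two frames were misclassified, which is the kind of robustness a track-level decision is meant to provide.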