Video surveillance plays an important role today. It helps reduce crime, and it can also be used to monitor the status of facilities or animals. Its contribution is especially significant for public and home safety.

However, the performance of video surveillance systems is limited by human factors such as fatigue, time efficiency, and available human resources. Fully automatic video surveillance systems would therefore be beneficial. With this aim, many commercial products and research prototypes have been developed to automate video surveillance. Although progress has been made, the performance and capabilities of emerging automatic surveillance systems remain unsatisfactory because of two major unsolved problems: 1) the ability to work in multiple domains; 2) the ability to understand and describe events as a human does.

On the one hand, special domains such as thermal cameras or hidden cameras in the wild are seldom considered today, although these novel domains have great potential. For example, a thermal face recognition surveillance system could catch a thief even when the lights are turned off at night. Surveillance can also be studied in wildlife settings without the privacy concerns that arise with human subjects, and surveillance in the wildlife domain is a great help to zoologists and biologists. In short, the ability to work under different circumstances is an important sign of an advanced surveillance system and brings numerous benefits.

On the other hand, existing surveillance systems lack the ability to understand and describe events. For example, current commercialised smart surveillance systems can only detect motion changes or human faces. They are far from the ideal system that can understand the meaning of motion from the perspective of events.
Even research prototypes capable of event recognition and anomaly detection cannot deal with unseen events or understand the relationships between roles in an event. For instance, when the detector of a smart surveillance system detects the keywords "person", "baby" and "feed", we do not want the system to report "a person feeding a baby" when the actual scene shows the baby trying to share the food.

This thesis aims to explore the frontier of smart video surveillance systems by investigating the two problems above. Several methods are proposed to study video surveillance in various domains, such as the crowded scene domain and, more importantly, novel domains that have barely been investigated, such as thermal imaging and wildlife. Novel methods of event analysis and video description are also proposed to enhance the efficiency of human-machine interaction.

All the methods are connected within a smart video surveillance framework. Along with these novel directions, three new datasets are built in this thesis. Enhancing the smart video surveillance system wit...