Surveillance cameras are inexpensive and ubiquitous, but the manpower required to monitor and analyze their footage is expensive. Consequently, the video from these cameras is usually monitored sparingly or not at all; it often serves merely as an archive, to be consulted once an incident is known to have taken place. Surveillance cameras would be a far more useful tool if, instead of passively recording footage, they could detect events requiring attention as they happen and trigger action in real time. This is the goal of automated visual surveillance: to obtain a description of what is happening in a monitored area, and then to take appropriate action based on that interpretation.

Visual surveillance of humans is one of the most active research topics in computer vision, with a wide spectrum of promising homeland security applications, and video management and interpretation systems have become considerably more capable in recent years. This paper examines how hardware and software can be combined to solve surveillance problems in an age of heightened concern with public safety and security. In general, the framework of a video surveillance system comprises the following stages: modeling of the environment, detection of motion, classification of moving objects, tracking, behavior understanding and description, and fusion of information from multiple cameras.

Despite recent progress in computer vision and other related areas, major technical challenges must still be overcome before reliable automated video surveillance can be realized. This paper reviews developments and general strategies for the stages involved in video surveillance, and analyzes the feasibility of, and challenges in, combining motion analysis, behavior analysis, and standoff biometrics for identification of known suspects, anomaly detection, and behavior understanding.
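To make the pipeline stages above concrete, the following is a minimal sketch of the earliest stages (environment modeling, motion detection, and extraction of candidate moving objects) using OpenCV's MOG2 background subtractor as one possible realization; the function name, thresholds, and input file are illustrative assumptions, not part of any specific system reviewed in this paper, and the later stages (classification, tracking, behavior analysis) are indicated only as comments.

import cv2

def surveillance_pipeline(video_path: str) -> None:
    # Environment modeling + motion detection via background subtraction (MOG2).
    cap = cv2.VideoCapture(video_path)
    bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

    while True:
        ok, frame = cap.read()
        if not ok:
            break

        # Foreground mask marks pixels that deviate from the learned background model.
        fg_mask = bg_subtractor.apply(frame)
        fg_mask = cv2.morphologyEx(
            fg_mask, cv2.MORPH_OPEN,
            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))

        # Connected foreground regions become candidate moving objects.
        contours, _ = cv2.findContours(
            fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            if cv2.contourArea(contour) < 500:  # ignore small noise blobs (assumed threshold)
                continue
            x, y, w, h = cv2.boundingRect(contour)
            # Downstream stages (object classification, tracking, behavior
            # understanding, multi-camera fusion) would consume these detections here.
            print(f"moving object at x={x}, y={y}, w={w}, h={h}")

    cap.release()

if __name__ == "__main__":
    surveillance_pipeline("camera_feed.mp4")  # hypothetical input file

Even this simple sketch illustrates why the later stages are the hard part: background subtraction yields only foreground blobs, and turning those blobs into identities, trajectories, and behaviors is where the open research challenges discussed in this paper lie.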