Video surveillance has become ubiquitous because of growing security requirements in every sphere of life. The next-generation video surveillance system (VSS) poses great challenges in applications such as intelligent urban surveillance and smart cities. In these applications, the fast-growing number of surveillance nodes introduces several constraints, e.g., high latency, high bandwidth demand, high energy consumption, and heavy CPU and memory usage. The Internet of Video Things (IoVT), which is considered a part of the Internet of Things (IoT), can address these issues. The IoVT is composed of visual sensors (i.e., cameras) connected to the Internet. Unlike conventional systems, a VSS under an IoVT framework provides multiple layers (i.e., edge, fog, and cloud) of communication and decision making by capturing and analyzing rich contextual and behavioral information. Because an appropriate application layer protocol (ALP) can help alleviate the challenges of future VSSs, the selection of ALPs is important for IoVT-based systems. Therefore, this paper presents a generic architecture of an IoVT-based VSS and a comparative analysis of several ALPs, including MQTT, AMQP, HTTP, XMPP, CoAP, and DDS, with real-time experimentation. This analysis will help users choose appropriate ALPs for various surveillance applications and determine their suitability at different nodes of the IoVT framework.

INDEX TERMS Application layer protocols, Internet of Video Things (IoVT), video analytics, video surveillance.