“…Alternatively, multi-modal approaches that combine different streams of the media content have also been proposed. Examples include the AXES-LITE video search engine [11], which integrates algorithms for text-based, visual-concept-based and visual-similarity-based retrieval of videos, and the interactive system of [12], which represents the visual content of a video collection with the help of over 2500 high-quality pre-trained semantic concept detectors and applies text analysis to ASR and OCR data, allowing users to perform multi-modal text-to-video and video-to-video search in large video collections. Many more interactive video search engines have been presented, e.g., [23], [13], [15] and [20].…”