Modern distributed computing infrastructures need to process vast quantities of data streams generated by a growing number of participants, with information produced in multiple formats. With the Internet of Multimedia Things (IoMT) becoming a reality, new approaches are needed to process real-time multimodal event data streams. Existing approaches to event processing give limited consideration to the challenges of multimodal events, including the need for complex content extraction and increased computational and memory costs. This paper explores event processing as a basis for processing real-time IoMT data. It introduces the Multimodal Event Processing (MEP) paradigm, which provides a formal basis for combining native approaches to neural multimodal content analysis (i.e., computer vision, linguistics, and audition) with symbolic event processing rules, supporting real-time queries over multimodal data streams. The Multimodal Event Processing Language expresses single, primitive multimodal, and complex multimodal event patterns, and the content of multimodal streams is represented using Multimodal Event Knowledge Graphs, which capture their semantic, spatial, and temporal content. The approach is implemented and evaluated within an MEP Engine using single and multimodal queries, achieving near-real-time performance with a throughput of ~30 fps and sub-second latency of 0.075–0.30 seconds for video streams with a 30 fps input rate. Higher input stream rates (45 fps) are supported through content-aware load shedding, yielding a ~127× latency improvement with only a minor decrease in accuracy.
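The abstract's content-aware load shedding can be illustrated with a minimal sketch. This is not the paper's implementation; the function name `shed_frames` and the relevance-scoring callback are assumptions for illustration. The idea is that when the input rate exceeds processing capacity, the engine drops the frames least relevant to the active query (e.g., those with the lowest detection confidence for queried object classes) rather than dropping frames blindly:

```python
def shed_frames(frames, relevance, capacity_fps, input_fps):
    """Content-aware load shedding sketch: keep only the fraction of
    frames the engine can process, chosen by a content-relevance score
    rather than at random, and preserve temporal order."""
    keep_ratio = min(1.0, capacity_fps / input_fps)
    n_keep = max(1, int(len(frames) * keep_ratio))
    # Rank frame indices by relevance (highest first), keep the top n,
    # then restore the original temporal ordering of the survivors.
    ranked = sorted(range(len(frames)),
                    key=lambda i: relevance(frames[i]), reverse=True)
    kept = sorted(ranked[:n_keep])
    return [frames[i] for i in kept]
```

For a 45 fps input against a 30 fps capacity, roughly one third of each window's frames would be shed, which matches the accuracy/latency trade-off the abstract reports: low-relevance frames rarely contribute to query matches, so accuracy degrades only slightly.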
Advances in Deep Neural Network (DNN) techniques have revolutionized video analytics and unlocked the potential for querying and mining video event patterns. This paper details GNOSIS, an event processing platform that performs near-real-time video event detection in a distributed setting. GNOSIS follows a serverless approach in which its components act as independent microservices and can be deployed across multiple nodes. GNOSIS uses a declarative, query-driven approach where users can write customized queries for spatiotemporal video event reasoning. The system converts incoming video streams into a continuously evolving graph stream using a pipeline of machine learning (ML) and DNN models, and applies graph matching for video event pattern detection. GNOSIS performs both stateful and stateless video event matching. To improve Quality of Service (QoS), recent work on GNOSIS incorporates optimization techniques such as adaptive scheduling, energy efficiency, and content-driven windows. This paper demonstrates Occupational Health and Safety query use cases to show GNOSIS's efficacy.
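The graph-matching step described above can be sketched as follows. This is an assumed simplification, not GNOSIS's actual matcher: each frame's DNN output is modeled as a scene graph of (subject, predicate, object) triples, and a stateless query pattern (with `"?"` as a wildcard) is matched against it, in the spirit of the Occupational Health and Safety use case:

```python
def match_pattern(scene_graph, pattern):
    """Stateless graph-pattern match sketch: return the triples in a
    frame's scene graph that satisfy a (subject, predicate, object)
    pattern, where '?' matches any term."""
    def fits(triple):
        return all(p == "?" or p == t for p, t in zip(pattern, triple))
    return [t for t in scene_graph if fits(t)]

# Hypothetical scene graph for one frame of a safety-monitoring stream.
frame_graph = [
    ("worker", "not_wearing", "helmet"),
    ("worker", "near", "forklift"),
]
violations = match_pattern(frame_graph, ("worker", "not_wearing", "?"))
```

Stateful matching would extend this by accumulating matches across a window of frames (e.g., a content-driven window) before emitting a complex event.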