The high-volume, low-latency world of network traffic presents significant obstacles for complex analysis techniques. The challenge of adapting powerful but high-latency models to real-time network streams is the basis of our cyber security project. In this paper we discuss our use of NoSQL databases in a framework that enables the application of computationally expensive models to a real-time network data stream. We describe how this approach transforms the highly constrained (and sometimes arcane) world of real-time network analysis into a more developer-friendly model that relaxes many of the traditional constraints associated with streaming data. Our primary use of the system is streaming text analysis and classification on a network link receiving approximately 200,000 emails per day.