After Action Reports (AARs) provide incisive analysis of cyber-incidents. Extracting cyberknowledge from these sources would provide security analysts with credible information, which they can use to detect or find patterns indicative of a cyber-attack. In this paper, we describe a system to extract information from AARs, aggregate the extracted information by fusing similar entities together, and represent that extracted information in a Cybersecurity Knowledge Graph (CKG). We extract entities by building a customized named entity recognizer called 'Malware Entity Extractor' (MEE). We then build a neural network to predict how pairs of 'malware entities' are related to each other. When we have predicted entity pairs and the relationship between them, we assert the 'entity-relationship set' in a CKG. Our next step in the process is to fuse similar entities, to improve our CKG. This fusion helps represent intelligence extracted from multiple documents and reports. The fused CKG has knowledge from multiple AARs, with relationships between entities extracted from separate reports. As a result of this fusion, a security analyst can execute queries and retrieve better answers on the fused CKG, than a knowledge graph with no fusion. We also showcase various reasoning capabilities that can be leveraged by a security analyst using our fused CKG.
As machine-learning (ML) based systems for malware detection become more prevalent, it becomes necessary to quantify the benefits compared to the more traditional anti-virus (AV) systems widely used today. It is not practical to build an agreed upon test set to benchmark malware detection systems on pure classification performance. Instead we tackle the problem by creating a new testing methodology, where we evaluate the change in performance on a set of known benign & malicious files as adversarial modifications are performed. The change in performance combined with the evasion techniques then quantifies a system's robustness against that approach. Through these experiments we are able to show in a quantifiable way how purely ML based systems can be more robust than AV products at detecting malware that attempts evasion through modification, but may be slower to adapt in the face of significantly novel attacks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.