This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations' information systems. Our system combines structural and semantic information from a real corporate database of monitored activity on their users' computers to detect independently developed red team inserts of malicious insider activities. We have developed and applied multiple algorithms for anomaly detection based on suspected scenarios of malicious insider behavior, indicators of unusual activities, high-dimensional statistical patterns, temporal sequences, and normal graph evolution. Algorithms and representations for dynamic graph processing provide the ability to scale as needed for enterpriselevel deployments on real-time data streams. We have also developed a visual language for specifying combinations of features, baselines, peer groups, time periods, and algorithms to detect anomalies suggestive of instances of insider threat behavior. We defined over 100 data features in seven categories based on approximately 5.5 million actions per day from approximately 5,500 users. We have achieved area under the ROC curve values of up to 0.979 and lift values of 65 on the top 50 user-days identified on two months of real data.
This paper reports the first set of results from a comprehensive set of experiments to detect realistic insider threat instances in a real corporate database of computer usage activity. It focuses on the application of domain knowledge to provide starting points for further analysis. Domain knowledge is applied (1) to select appropriate features for use by structural anomaly detection algorithms, (2) to identify features indicative of activity known to be associated with insider threat, and (3) to model known or suspected instances of insider threat scenarios. We also introduce a visual language for specifying anomalies across different types of data, entities, baseline populations, and temporal ranges. Preliminary results of our experiments on two months of live data suggest that these methods are promising, with several experiments providing area under the curve scores close to 1.0 and lifts ranging from x20 to x30 over random.
Abstract-This paper reports results from a set of experiments that evaluate an insider threat detection prototype on its ability to detect scenarios that have not previously been seen or contemplated by the developers of the system. We show the ability to detect a large variety of insider threat scenario instances imbedded in real data with no prior knowledge of what scenarios are present or when they occur. We report results of an ensemble-based, unsupervised technique for detecting potential insider threat instances over eight months of real monitored computer usage activity augmented with independently developed, unknown but realistic, insider threat scenarios that robustly achieves results within 5% of the best individual detectors identified after the fact. We explore factors that contribute to the success of the ensemble method, such as the number and variety of unsupervised detectors and the use of prior knowledge encoded in scenario-based detectors designed for known activity patterns. We report results over the entire period of the ensemble approach and of ablation experiments that remove the scenario-based detectors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.