Many different software metrics have been proposed and used for defect prediction in the literature. Rather than dealing with so many metrics, it would be practical to determine the set of metrics that matter most and focus on them when predicting defectiveness. We use Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. In addition to the metrics used in the Promise data repository, we define two more metrics: NOD, the number of developers, and LOCQ, the source code quality. We extract these metrics by inspecting the source code repositories of the selected Promise data sets. At the end of our modeling, we learn the marginal defect proneness probability of the whole software system, the set of most effective metrics, and the influential relationships among metrics and defectiveness. Our experiments on nine open source Promise data sets show that response for class (RFC), lines of code (LOC), and lack of coding quality (LOCQ) are the most effective metrics, whereas coupling between objects (CBO), weighted methods per class (WMC), and lack of cohesion of methods (LCOM) are less effective on defect proneness. Furthermore, number of children (NOC) and depth of inheritance tree (DIT) have very limited effect and are untrustworthy. Based on the experiments on the Poi, Tomcat, and Xalan data sets, we also observe a positive correlation between the number of developers (NOD) and the level of defectiveness. However, further investigation involving a larger number of projects is needed to confirm our findings.
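To make the modeling step concrete, the sketch below shows how a Bayesian network over class-level metrics and a defectiveness label could be learned and queried. It is a minimal illustration, not the authors' implementation: it assumes the metrics have been discretized into a pandas DataFrame with a binary "defective" column, and it uses the open-source pgmpy library with hill-climbing structure search as a stand-in for whatever learner the study actually employed.

```python
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore, MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination

# Hypothetical input: one row per class, discretized metric values plus a
# binary 'defective' label (file name and preprocessing are assumptions).
data = pd.read_csv("metrics.csv")  # columns: WMC, RFC, LOC, LOCQ, CBO, LCOM, NOC, DIT, defective

# Score-based structure search learns the influential relationships among metrics.
structure = HillClimbSearch(data).estimate(scoring_method=BicScore(data))

# Fit conditional probability tables on the learned structure.
model = BayesianNetwork(structure.edges())
model.fit(data, estimator=MaximumLikelihoodEstimator)

# Marginal defect proneness of the system, and the effect of one metric on it.
infer = VariableElimination(model)
print(infer.query(variables=["defective"]))
print(infer.query(variables=["defective"], evidence={"RFC": "high"}))
```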
The sophistication of cyberattacks penetrating enterprise networks has called for predictive defense beyond intrusion detection, where different attack strategies can be analyzed and used to anticipate next malicious actions, especially unusual ones. Unfortunately, traditional predictive analytics and machine learning techniques that require training data of known attack strategies are not practical, given the scarcity of representative data and the evolving nature of cyberattacks. This paper describes the design and evaluation of a novel automated system, ASSERT, which continuously synthesizes and separates cyberattack behavior models to enable better prediction of future actions. It takes streaming malicious event evidence as input, abstracts it into edge-based behavior aggregates, and associates the edges with attack models, each of which represents a unique and collective attack behavior. It follows a dynamic Bayesian model-generation approach to determine when a new attack behavior is present, and creates new attack models by maximizing a cluster validity index. ASSERT generates empirical attack models by separating evidence and uses the generated models to predict unseen future incidents. It continuously evaluates the quality of the model separation and triggers a re-clustering process when needed. Using data from the 2017 National Collegiate Penetration Testing Competition, this work demonstrates the effectiveness of ASSERT in terms of the quality of the generated empirical models and the predictability of future actions using those models.
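The decision of when a new attack model should be created can be illustrated with a small sketch. This is not ASSERT's dynamic Bayesian model-generation step; it only shows the general idea of growing the number of models when doing so improves a cluster validity index, using scikit-learn's k-means and silhouette score as assumed stand-ins, with `aggregates` as a hypothetical feature matrix of edge-based behavior aggregates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_model_count(aggregates: np.ndarray, k_current: int) -> int:
    """Decide whether an additional attack model is warranted.

    `aggregates` is a hypothetical (n_edges, n_features) array of edge-based
    behavior aggregates. The real system's features and Bayesian model
    generation are not reproduced here; this only illustrates picking the
    number of models by maximizing a validity index (silhouette as a stand-in).
    """
    best_k, best_score = k_current, -1.0
    for k in (k_current, k_current + 1):
        if k < 2 or k >= len(aggregates):
            continue  # silhouette needs at least 2 clusters and fewer clusters than points
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(aggregates)
        score = silhouette_score(aggregates, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

A re-clustering pass would then reassign the behavior aggregates to `best_k` models whenever the chosen count differs from the current one.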