The new paradigm of service-oriented computing facilitates easy construction of dynamic, complex distributed systems. Recent research has shown that machine learning methods can be a promising way to autonomously and accurately derive models that assist autonomic management software or humans in understanding system behaviors and making informed decisions. However, the efficacy of different machine learning techniques in describing various system behaviors and meeting distinct application needs has not been systematically understood. Such an understanding can prove crucial in management infrastructure design and implementation for service-oriented systems. This paper is an initial step to bridge the gap and specifically contrasts the applications of Bayesian networks (BNs) and neural networks (NNs) in modeling the response time of service-oriented systems. Relatively simple BN and NN models are designed and implemented as the basis for the comparison study. As far as model performance is concerned, a wide range of simulations show that BNs offer better accuracy and are less sensitive to small data set size, and are therefore better suited to environments that change rapidly and need frequent response time model reconstructions, whereas NNs can achieve faster model evaluation time and support management routines that demand intensive response time predictions. From a non-performance perspective, it is analytically concluded that BNs can be more easily understood by humans and support multi-directional evaluation, while NNs provide more flexible response time representation.
In distributed, service-oriented environments, performance problem localization is required to provide self-healing capabilities and deliver the desired quality of service (QoS). This paper presents an automated approach to identifying the system elements that cause performance problems. Applying probabilistic inference to collected response time and elapsed time data, the approach 1) infers elapsed time for services where data is missing, 2) estimates the response time degradation caused by different services using the duration, abnormality and response time correlation of their elapsed times, and 3) identifies the services that are the most important causes of slow response time and yield the most benefit if recovered. The approach has been used to localize a performance problem on the test bed of a real-world service-oriented Grid. Evaluation using simulations shows that the approach consistently achieves better accuracy than traditional techniques in various service-oriented settings.
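To make the three signals named in step 2 concrete, the following is a minimal Python sketch of how duration, abnormality and response time correlation might be combined into a ranking. It is not the paper's method: the abstract does not give the probabilistic model, so a simple multiplicative heuristic is swapped in here, and step 1 (inferring missing elapsed times) is left out. The names `pearson` and `rank_services`, the input layout, and the service names are all hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_services(baseline, current, response_times):
    """Rank services by a heuristic blame score, highest first.

    baseline: service name -> elapsed times observed under normal operation
    current: service name -> elapsed times observed in the problem window
    response_times: end-to-end response times aligned with the current samples

    ASSUMPTION: score = duration * abnormality * correlation. This is an
    illustrative stand-in, not the probabilistic inference used in the paper.
    """
    scores = {}
    for name, window in current.items():
        mu, sigma = mean(baseline[name]), pstdev(baseline[name])
        duration = mean(window)
        # Abnormality: how many baseline standard deviations above normal.
        abnormality = max(0.0, (duration - mu) / sigma) if sigma > 0 else 0.0
        # Only a positive correlation with response time counts as blame.
        correlation = max(0.0, pearson(window, response_times))
        scores[name] = duration * abnormality * correlation
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: only "db" slows down in the problem window.
baseline = {"auth": [10, 11, 9, 10], "db": [50, 52, 48, 50], "render": [20, 21, 19, 20]}
current = {"auth": [10, 11, 9, 10], "db": [90, 120, 100, 150], "render": [20, 21, 19, 20]}
response_times = [120, 152, 128, 180]

ranking = rank_services(baseline, current, response_times)
print(ranking[0])  # db
```

Multiplying the three factors means a service is blamed only when it is slow, unusually slow relative to its own history, and slow at the same moments the end-to-end response time is slow. A service that is always slow but stable scores zero on abnormality and drops out of the ranking.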