Abstract-The poor locality of operation descriptions expressed in the Web Service Description Language (WSDL) makes them difficult to analyze and compare in web service discovery tasks. This problem led us to develop a new method for comparing service operations that contextualizes each operation description by inlining related type information from other sections of the service description. In this paper, we show that this contextualization of web service descriptions enables topic models (statistical techniques for identifying latent relationships among documents) to produce semantically meaningful results that can be used to reverse engineer service-oriented web systems and to automatically identify related web service operations. Specifically, we model contextualized WSDL service operations using Latent Dirichlet Allocation, and show how this approach can be used to find similar web service operations more accurately.

Keywords-web services, reverse engineering, topic models

I. INTRODUCTION

Web services are software components that communicate over a network. They are often described using domain-specific languages that outline the available operations, the types of messages that can be sent, and other information about the provider. The structure of service descriptions written in the Web Service Description Language (WSDL), one such domain-specific language, makes them difficult to read and understand. This difficulty is compounded when discovering relationships between service operations across a large repository of web services. Latent topic models can be used to find these relationships, but without adaptation to the specifics of WSDL they tend to produce irrelevant noise.
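The contextualization step described in the abstract, inlining type information that an operation references from other sections of the service description, can be illustrated with a short sketch. It assumes a WSDL 1.1 document in document/literal style, where each message part points to a top-level schema element, and it flattens each operation into a bag of word tokens suitable as input to a topic model such as LDA. The sample document and the names `SAMPLE_WSDL`, `split_identifier`, and `contextualize_operations` are hypothetical; this is not the authors' implementation, only one plausible way to perform the inlining.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"
NS = {"wsdl": WSDL_NS, "xsd": XSD_NS}

# Hypothetical toy WSDL 1.1 document (not taken from any real repository).
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:example:weather" targetNamespace="urn:example:weather">
  <types>
    <xsd:schema targetNamespace="urn:example:weather">
      <xsd:element name="GetForecastRequest">
        <xsd:complexType><xsd:sequence>
          <xsd:element name="cityName" type="xsd:string"/>
          <xsd:element name="forecastDays" type="xsd:int"/>
        </xsd:sequence></xsd:complexType>
      </xsd:element>
      <xsd:element name="GetForecastResponse">
        <xsd:complexType><xsd:sequence>
          <xsd:element name="temperatureCelsius" type="xsd:float"/>
          <xsd:element name="precipitationChance" type="xsd:float"/>
        </xsd:sequence></xsd:complexType>
      </xsd:element>
    </xsd:schema>
  </types>
  <message name="GetForecastIn">
    <part name="parameters" element="tns:GetForecastRequest"/>
  </message>
  <message name="GetForecastOut">
    <part name="parameters" element="tns:GetForecastResponse"/>
  </message>
  <portType name="WeatherPort">
    <operation name="GetForecast">
      <input message="tns:GetForecastIn"/>
      <output message="tns:GetForecastOut"/>
    </operation>
  </portType>
</definitions>"""


def local(qname):
    """Strip a namespace prefix such as 'tns:' from a QName attribute value."""
    return qname.split(":", 1)[-1]


def split_identifier(name):
    """Split camelCase / PascalCase identifiers into lowercase word tokens."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def contextualize_operations(wsdl_text):
    """Map each portType operation to tokens from itself plus its inlined types."""
    root = ET.fromstring(wsdl_text)
    # Index top-level schema elements and messages by name for reference resolution.
    elements = {e.get("name"): e
                for e in root.findall("wsdl:types/xsd:schema/xsd:element", NS)}
    messages = {m.get("name"): m for m in root.findall("wsdl:message", NS)}
    result = {}
    for op in root.findall("wsdl:portType/wsdl:operation", NS):
        tokens = split_identifier(op.get("name"))
        for direction in ("input", "output"):
            ref = op.find(f"wsdl:{direction}", NS)
            msg = messages.get(local(ref.get("message"))) if ref is not None else None
            if msg is None:
                continue
            tokens += split_identifier(msg.get("name"))
            for part in msg.findall("wsdl:part", NS):
                tokens += split_identifier(part.get("name"))
                elem = elements.get(local(part.get("element", "")))
                if elem is None:
                    continue
                # iter() yields the referenced element itself and every nested element.
                for child in elem.iter(f"{{{XSD_NS}}}element"):
                    if child.get("name"):
                        tokens += split_identifier(child.get("name"))
        result[op.get("name")] = tokens
    return result


if __name__ == "__main__":
    for name, tokens in contextualize_operations(SAMPLE_WSDL).items():
        print(name, len(tokens), tokens)
```

In this toy example the operation element alone contributes only two tokens (`get`, `forecast`), while the contextualized form also carries terms such as `city`, `temperature`, and `precipitation` drawn from the referenced schema types, which illustrates why the inlining gives a topic model more to work with.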
Because WSDL operation descriptions are syntactically sparse, an operation element on its own contains too few tokens to support meaningful semantic conclusions.

In this paper, we apply a strategy for restructuring WSDL documents into a set of contextualized operations. In an empirical analysis of a repository of web services, we show that Latent Dirichlet Allocation finds more meaningful relationships when it is applied to these contextualized operations. We then apply a similarity metric to the derived model to identify related web service operations, and show by example that contextualization yields significant improvements. The main contributions of this paper are: