Abstract: This paper describes a new algorithm for finding software clones. It is conceptually independent of the source language of the analyzed programs, working at the level of abstract syntax trees. The algorithm considers two sequences of statements to form a clone if one of them can be obtained from the other by replacing some subtrees. To our knowledge, this notion has not previously been employed in the literature. It makes it possible to take into account all information on the syntactic structure of a program. We have implemented this algorithm in the tool Clone Digger, which currently supports the Python and Java languages. Clone Digger is free and is distributed under the GPL license.

I. INTRODUCTION

Different researchers report that the amount of duplicate code in software systems varies from 6.4%-7.5% to 13%-20% [1]. Duplicate code can occur as a result of approaches to development and maintenance, due to language or programmer limitations, or simply by accident [1]. Code duplication can be a significant drawback, leading to bad design and an increased probability of bug occurrence and propagation. As a result, it can significantly increase maintenance cost (for instance, any bug in the original has to be fixed in all duplicates) and form a barrier to software evolution. Consequently, duplicate code detectors are a useful class of software analysis tools. Such tools can aid in measuring the quality of software systems and in the process of refactoring.

Techniques for detecting duplicate code can be classified according to several criteria. Code can be viewed as similar based on syntactic criteria or at a semantic level (from the point of view of execution effects). In this paper we consider only syntactic similarity. Within this category, duplicate code detection can be performed at different levels of granularity: strings, tokens, abstract syntax trees, and feature vectors [1].
The first two are quite rigid and low-level; therefore, we use an approach based on abstract syntax trees.

Two sequences of statements form duplicate code if they are similar enough according to a selected measure of similarity. Such measures can be defined using a set of allowed editing operations and their costs. According to [1], there are three different types of syntactic changes: adding or removing whitespace and comments, changing the names of variables, and more complex modifications. We aim to detect a wide range of clones, including the third type, e.g., expressions with similar structure.

In essence, we wish to characterize the structural similarity of two code fragments in order to determine whether they should be classified as code duplicates. We can formalize this by using the concept of anti-unifier, which denotes the