SUMMARY

Aspect mining improves the modularity of legacy software systems by identifying their underlying crosscutting concerns (CCs). However, a realistic CC is a composite one that consists of CC seeds and related program elements, which makes identifying it a great challenge. In this paper, inspired by state-of-the-art information retrieval techniques, we model this problem as a semi-supervised learning problem. First, a link analysis technique is adopted to generate CC seeds. Second, we construct a coupling graph, which captures the relationships between CC seeds. Then, we apply a community detection technique to group the CC seeds; these groups serve as constraints that guide the semi-supervised clustering process. Furthermore, we propose a semi-supervised graph clustering approach, named constrained authority-shift clustering, to identify composite CCs. Two measurements, namely similarity and connectivity, are defined, and a seeded graph is generated for clustering program elements. We evaluate constrained authority-shift clustering on numerous software systems, including a large-scale distributed software system. The experimental results demonstrate that our semi-supervised learning approach is more effective in detecting composite CCs. Copyright © 2013 John Wiley & Sons, Ltd.
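The first stages of the pipeline summarized above (seed generation by link analysis, then grouping of coupled seeds) can be illustrated with a minimal sketch. It rests on several assumptions that the summary does not state: the link analysis is approximated here by a PageRank-style power iteration over a call graph, and connected components of the coupling graph are used as a plain stand-in for community detection. The function names (`rank_elements`, `pick_seeds`, `group_seeds`), the damping value, and the toy call graph are all hypothetical, and constrained authority-shift clustering itself is not reproduced.

```python
from collections import defaultdict


def rank_elements(calls, damping=0.85, iterations=50):
    """PageRank-style power iteration over a directed call graph.

    `calls` maps each caller to the list of elements it calls.
    Assumption: this stands in for the paper's link analysis step.
    """
    nodes = set(calls)
    for callees in calls.values():
        nodes.update(callees)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            callees = calls.get(v, [])
            if callees:
                # Split this element's score evenly among its callees.
                share = damping * score[v] / len(callees)
                for c in callees:
                    nxt[c] += share
            else:
                # Dangling element (calls nothing): spread its score uniformly.
                share = damping * score[v] / n
                for u in nodes:
                    nxt[u] += share
        score = nxt
    return score


def pick_seeds(score, k):
    """Return the k highest-scoring elements as candidate CC seeds."""
    return sorted(score, key=lambda v: (-score[v], v))[:k]


def group_seeds(seeds, coupling):
    """Group seeds by connected components of the coupling graph.

    Assumption: connected components replace real community detection.
    `coupling` is a list of undirected (seed, seed) edges.
    """
    seed_set = set(seeds)
    adj = defaultdict(set)
    for a, b in coupling:
        if a in seed_set and b in seed_set:
            adj[a].add(b)
            adj[b].add(a)
    groups, seen = [], set()
    for s in seeds:
        if s in seen:
            continue
        seen.add(s)
        stack, component = [s], []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(component))
    return groups


# Hypothetical call graph: logging and transaction code are called from many places.
calls = {
    "Order.save": ["Logger.log", "Tx.begin"],
    "User.update": ["Logger.log", "Tx.begin"],
    "Cart.add": ["Logger.log", "Tx.begin"],
    "Report.run": ["Logger.log"],
}
scores = rank_elements(calls)
seeds = pick_seeds(scores, k=2)
groups = group_seeds(seeds, coupling=[("Logger.log", "Tx.begin")])
```

In this toy setting, the groups of seeds would play the role of the constraints passed to the semi-supervised clustering step; how those constraints are actually enforced is specific to the proposed approach and is not sketched here.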