Recovering traceability links between source code and software documentation is an important research topic in software maintenance and software reuse. Many research efforts have addressed recovering traceability links between documentation and code elements (classes, interfaces, methods, etc.), mostly based on program analysis. However, existing approaches still establish many noisy links. In this paper, we propose a novel approach that classifies the code elements occurring in a document into contextual code elements and salient code elements. This lets us filter out the noisy traceability links between a software document and its contextual code elements, yielding a higher-quality link set. Our classifier is trained on the source code of the open-source project Lucene and 1,899 Stack Overflow answers about Lucene. We extract code elements from these documents, represent each code element as a 7-dimensional feature vector, and use a decision-tree-based learning model to classify each element as salient or not. In our experiments, the classifier achieves a precision of 70.7% in recognizing the salient code elements of these documents, a 12% improvement over Rigby's work. Depending on the classifier threshold, it filters out 56.5% to 69.3% of noisy traceability links. In practice, this can improve the quality of traceability links between source code and its related software documents.
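To illustrate how a salience classifier can prune traceability links, here is a minimal Python sketch. It assumes the classifier emits a per-element salience score in [0, 1] and that a link is dropped when its score falls below a chosen threshold. The `Link` record, the scores, the ground-truth labels, and the helper names `filter_links` and `noise_filtered_ratio` are all hypothetical. The paper's actual 7 features, decision-tree model, and thresholds are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Link:
    document_id: str    # hypothetical ID of a Stack Overflow answer
    code_element: str   # code element mentioned in that answer
    salience: float     # hypothetical classifier score: likelihood the element is salient
    is_noise: bool      # hypothetical ground truth: True if the element is only contextual


def filter_links(links, threshold):
    """Keep only links whose code element is classified as salient (score >= threshold)."""
    return [link for link in links if link.salience >= threshold]


def noise_filtered_ratio(links, threshold):
    """Fraction of noisy (contextual) links removed at the given threshold."""
    noisy = [link for link in links if link.is_noise]
    if not noisy:
        return 0.0
    removed = [link for link in noisy if link.salience < threshold]
    return len(removed) / len(noisy)


# Made-up example data, not taken from the paper's dataset.
links = [
    Link("answer-1", "IndexWriter", 0.90, False),
    Link("answer-1", "String", 0.20, True),
    Link("answer-1", "Document", 0.55, True),
    Link("answer-2", "QueryParser", 0.80, False),
    Link("answer-2", "List", 0.10, True),
    Link("answer-2", "Analyzer", 0.45, True),
]

for threshold in (0.3, 0.5, 0.6):
    kept = filter_links(links, threshold)
    ratio = noise_filtered_ratio(links, threshold)
    # Prints 4/50%, 3/75%, and 2/100% for the three thresholds on this toy data.
    print(f"threshold={threshold}: kept {len(kept)} links, filtered {ratio:.0%} of noise")
```

On this toy data, raising the threshold removes more contextual links. With a real classifier, a higher threshold would also risk discarding salient links, which is consistent with the abstract reporting a range of filtering rates for different thresholds.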