Many software libraries, especially commercial ones, provide API documentation in natural language to describe correct API usage. However, developers may still write code that is inconsistent with API documentation, partly because, as existing research shows, many developers are reluctant to read API documentation carefully. As these inconsistencies may indicate defects, researchers have proposed various detection approaches, and these approaches need many known specifications. As it is tedious to write specifications manually for all APIs, various approaches have been proposed to mine specifications automatically. In the literature, most existing mining approaches rely on analyzing client code, so they fail to mine specifications when client code is insufficient. Instead of analyzing client code, we propose an approach, called Doc2Spec, that infers resource specifications from API documentation in natural language. We evaluated our approach on the Javadocs of five libraries. The results show that our approach performs well on real-scale libraries and infers various specifications with relatively high precision, recall, and F-score. We further used the inferred specifications to detect defects in open source projects. The results show that specifications inferred by Doc2Spec are useful for detecting real defects in existing projects.

This paper is a revised, expanded version of a paper (Zhong et al. 2009b) presented at the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), which won the best paper award of the conference and the ACM SIGSOFT distinguished paper award. The work of this paper was done when Hao Zhong was a PhD student at Peking University under the supervision of Prof. Hong Mei, and the revisions over the previous ASE 2009 paper (Zhong et al. 2009b) were done after Hao Zhong became an assistant professor with the Chinese Academy of Sciences in 2009.

Autom Softw Eng (2011) 18:227-261