The development of a question answering (QA) system for application programming interface (API) documentation can greatly assist developers in API-related tasks. However, when deep learning technology is applied, API QA systems suffer from the spurious solution problem: the answer can literally appear at multiple positions (i.e., start-end indices) in the API documentation, though only one of them (called the golden solution) correctly answers the question given its context. The other, incorrect candidates (called spurious solutions) hinder the neural network model from learning reasonable solutions or correct answers. In this work, we propose Clean-and-Learn, an effective and robust method for API QA over documents. To reduce the spuriousness of the candidate solutions used for training, we design several scoring functions to rank the candidate occurrences (clean). Only high-quality (top-k) candidate solutions are involved in training. Then, we perform multi-task learning by weighing the losses computed from the top-k occurrences (learn). We evaluate our method on the constructed APIQASet dataset. The experimental results show that Clean-and-Learn achieves a ROUGE-L score of 75.8 and an accuracy of 70.5% in API QA, significantly outperforming state-of-the-art approaches.
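The clean-then-learn pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scoring function, the candidate representation, and the softmax loss weighting are all assumptions chosen for clarity.

```python
# Hypothetical sketch of the Clean-and-Learn objective: rank candidate answer
# occurrences by score, keep only the top-k (clean), then combine their losses
# with score-proportional weights (learn). Names and formulas are illustrative.
import math

def top_k_candidates(candidates, k):
    """Keep the k highest-scoring (start, end, score) candidates -- the 'clean' step."""
    return sorted(candidates, key=lambda c: c[2], reverse=True)[:k]

def weighted_loss(per_candidate_losses, scores):
    """Weigh each kept candidate's loss by its softmax-normalized score -- the 'learn' step."""
    total = sum(math.exp(s) for s in scores)
    weights = [math.exp(s) / total for s in scores]
    return sum(w * l for w, l in zip(weights, per_candidate_losses))

# Example: three occurrences of the answer span in the documentation, k = 2.
candidates = [(10, 14, 0.9), (42, 46, 0.7), (88, 92, 0.1)]
kept = top_k_candidates(candidates, k=2)
loss = weighted_loss([0.3, 1.2], [c[2] for c in kept])
```

Weighing the losses rather than training on a single candidate lets the model hedge across plausible occurrences instead of committing to one possibly spurious span.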