The overexpansion problem negatively affects the quality of query expansion.To improve the quality of queries for searching code, this paper proposed a DBN-based algorithm for effective query expansion. The deep belief network (DBN) model is trained on the code sequences and their change sequences, which aims to capture the meaningful terms during the evolution of source code. In contrast to previous studies, the proposed model not only extracts relevant terms to expand a query but also excludes irrelevant terms from the query.It addresses two problems in query expansion, including the overexpansion of the original query and the negative influence of the changed terms in the target source code. Experiments on both artificial queries and real queries show that the proposed algorithm outperforms several query expansion algorithms for code search.
KEYWORDSchange sequence, code search, deep learning, query expansion, semantics 1. A query q(A 1 ) contains only one term of A, the search engine does not return m AB or m AC as the query contains too few relevant terms.