Summarization of software artifacts is an active area of research in the software engineering community because it saves time and effort in tasks such as code search, duplicate bug report detection, and traceability link recovery. The goal of summarization is to produce a short, concise summary of a longer artifact. This paper reviews the state of the art of summarization techniques in the software engineering context. It gives a brief overview of the software artifacts that are most often summarized or that benefit from summarization, and briefly describes the general summarization process. It reviews papers published from 2010 to June 2017 and classifies the works into extractive and abstractive summarization. It also reviews the evaluation techniques used for summarizing software artifacts, discusses the open problems and challenges in this field, and outlines directions for future work for new researchers.
With the increasing popularity of open-source platforms, software data is easily available from various open-source tools such as GitHub, CVS, and SVN. More than 80 percent of the data in them is unstructured. Mining data from these repositories helps project managers, developers, and businesses gain useful insights. Most of the software artefacts in these repositories are in natural language form, which makes natural language processing (NLP) an important part of mining them for useful results. This paper reviews the application of NLP techniques in the field of Mining Software Repositories (MSR). It focuses mainly on sentiment analysis, summarization, traceability, norms mining, and mobile analytics, and presents the major NLP works in this area by surveying research papers from 2000 to 2018. The paper first describes the major artefacts in software repositories to which NLP techniques have been applied. Next, it presents some popular open-source NLP tools that have been used to perform NLP tasks. It then briefly discusses the state of NLP research in the MSR field. Finally, it lists the challenges in this field, gives pointers for future work, and concludes.