Software development is a collaborative task and, hence, involves different persons. Research has shown the relevance of social aspects in the development team for a successful and satisfying project closure. Especially the mood of a team has been proven to be of particular importance. Thus, project managers or project leaders want to be aware of situations in which negative mood is present to allow for interventions. So-called sentiment analysis tools offer a way to determine the mood based on text-based communication. In this paper, we present the results of a systematic literature review of sentiment analysis tools developed for or applied in the context of software engineering. Our results summarize insights from 80 papers with respect to (1) the application domain, (2) the purpose, (3) the used data sets, (4) the approaches for developing sentiment analysis tools and ( 5) the difficulties researchers face when applying sentiment analysis in the context of software projects. According to our results, sentiment analysis is frequently applied to open-source software projects, and most tools are based on support-vector machines. Despite the frequent use of sentiment analysis in software engineering, there are open issues, e.g., regarding the identification of irony or sarcasm, pointing to future research directions.
CCS CONCEPTS• Software and its engineering → Collaboration in software development; • Human-centered computing → Collaborative and social computing systems and tools.
Social aspects of software projects become increasingly important for research and practice. Different approaches analyze the sentiment of a development team, ranging from simply asking the team to so-called sentiment analysis on text-based communication. These sentiment analysis tools are trained using pre-labeled data sets from different sources, including GitHub and Stack Overflow.In this paper, we investigate if the labels of the statements in the data sets coincide with the perception of potential members of a software project team. Based on an international survey, we compare the median perception of 94 participants with the prelabeled data sets as well as every single participant's agreement with the predefined labels. Our results point to three remarkable findings: (1) Although the median values coincide with the predefined labels of the data sets in 62.5% of the cases, we observe a huge difference between the single participant's ratings and the labels; (2) there is not a single participant who totally agrees with the predefined labels; and (3) the data set whose labels are based on guidelines performs better than the ad hoc labeled data set.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.