Categorizing software applications for maintenance

McMillan, Collin; Linares‐Vásquez, Mario; Poshyvanyk, Denys; Grechanik, Mark

doi:10.1109/icsm.2011.6080801

Cited by 49 publications

(34 citation statements)

References 24 publications

“…These actionable guidelines are also pertinent to studies/approaches on software categorization [11,13,16], in which the lexical information in bytecode or source code is used to categorize the apps; given the widespread use of third-party libraries, such as Google Ads or Facebook for Android using the identifiers extracted from those libraries can reduce the variance and consequently impact the categorization process. In addition, studies aimed at identifying similar apps [15], which use non-textual based detection, should also consider the impact of third-party libraries and obfuscation practices.…”

Section: Discussion (mentioning)

confidence: 99%

Revisiting Android reuse studies in the context of code obfuscation and library usages

Linares‐Vásquez, Holtzhauer, Bernal-Cárdenas, et al. 2014

Proceedings of the 11th Working Conference on Mining Software Repositories

Self Cite

In recent years, studies of design and programming practices in mobile development have been gaining more attention from researchers. Several such empirical studies used Android applications (paid, free, and open source) to analyze factors such as size, quality, dependencies, reuse, and cloning. Most of the studies use executable files of the apps (APK files) instead of source code because of availability issues (most of the free apps available on the official Android market are not open source, but can still be downloaded and analyzed in APK format). However, using only APK files in empirical studies comes with some threats to the validity of the results. In this paper, we analyze some of these pertinent threats. In particular, we analyzed the impact of third-party libraries and code obfuscation practices on estimating the amount of reuse by class cloning in Android apps. When including and excluding third-party libraries from the analysis, we found statistically significant differences in the amount of class cloning in 24,379 free Android apps. Also, we found some evidence that obfuscation is responsible for increasing the number of false positives when detecting class clones. Finally, based on our findings, we provide a list of actionable guidelines for mining and analyzing large repositories of Android applications and minimizing these threats to validity.

“…[9]-[11] focus on analysing identifiers and comment terms in the source code to do categorization. McMillan et al. [12], [13] propose a brand-new approach which leverages the third-party API calls in the program as semantic anchors to categorize software. Most of these works only experiment on relatively small collections of projects with flat and coarse-grained categories like "Internet" and "Games/Entertainment".…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Categorization of Open Source Software by Online Profiles

Wang, Yin, et al. 2014

IEICE Trans. Inf. & Syst.

SUMMARY: The large amounts of freely available open source software over the Internet are fundamentally changing the traditional paradigms of software development. Efficient categorization of the massive projects for retrieving relevant software is of vital importance for Internet-based software development such as solution searching, best practices learning and so on. Many previous works have been conducted on software categorization by mining source code or byte code, but were verified on only relatively small collections of projects with coarse-grained categories or clusters. However, Internet-based software development requires finer-grained, more scalable and language-independent categorization approaches. In this paper, we propose a novel approach to hierarchically categorize software projects based on their online profiles. We design an SVM-based categorization framework and adopt a weighted combination strategy to aggregate different types of profile attributes from multiple repositories. Different basic classification algorithms and feature selection techniques are employed and compared. Extensive experiments are carried out on more than 21,000 projects across five repositories. The results show that our approach achieves significant improvements by using weighted combination. Compared to the previous work, our approach presents competitive results with a finer-grained and multi-layered category hierarchy with more than 120 categories. Unlike approaches that use source code or byte code, our approach is more effective for large-scale and language-independent software categorization. In addition, experiments suggest that hierarchical categorization combined with general keyword-based searching improves the retrieval efficiency and accuracy.

“…They treat every software system as a document consisting of a collection of words including code identifiers and comments parsed from source code, cluster topics based on topic similarities and categorize software by software-topic matrices. McMillan et al. [14] use API calls from third-party libraries as attributes for automatic categorization of software applications. They chose decision trees, naïve Bayes and support vector machines (SVM) to categorize applications, and find that SVM is the most effective.…”

Section: Related Work (mentioning)

confidence: 99%
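
The idea in the statement above, treating the API calls an application makes as classification attributes, can be illustrated with a minimal sketch. This is not the implementation from McMillan et al.: it shows only a simple multinomial naive Bayes classifier with add-one smoothing (one of the three classifier families the statement names), and the category names, API-call lists, and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (category, API calls found in the application).
TRAINING_APPS = [
    ("Networking", ["java.net.Socket", "java.net.URL", "java.io.InputStream"]),
    ("Networking", ["java.net.URL", "java.net.HttpURLConnection"]),
    ("Graphics", ["java.awt.Graphics2D", "java.awt.image.BufferedImage"]),
    ("Graphics", ["java.awt.Graphics2D", "javax.imageio.ImageIO", "java.io.InputStream"]),
]


def train_naive_bayes(apps):
    """Count how often each API call occurs in each category."""
    class_counts = Counter()             # number of applications per category
    token_counts = defaultdict(Counter)  # API-call counts per category
    vocab = set()                        # every API call seen during training
    for category, api_calls in apps:
        class_counts[category] += 1
        token_counts[category].update(api_calls)
        vocab.update(api_calls)
    return {"class_counts": class_counts, "token_counts": token_counts, "vocab": vocab}


def classify(model, api_calls):
    """Return the category with the highest log-posterior for the given API calls."""
    total_apps = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_category, best_score = None, -math.inf
    for category, n_apps in model["class_counts"].items():
        counts = model["token_counts"][category]
        total_tokens = sum(counts.values())
        score = math.log(n_apps / total_apps)  # log prior
        for call in api_calls:
            if call not in model["vocab"]:
                continue  # skip API calls never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[call] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train_naive_bayes(TRAINING_APPS)
print(classify(model, ["java.net.URL", "java.net.Socket"]))
```

Because the attributes are API-call names rather than identifiers or comments, a sketch like this does not depend on how developers named things in their own source code; a realistic study would of course need far more applications and a proper evaluation.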

“…The software engineering community has conducted plenty of efforts on discovering or retrieval of related software by mining source code identifiers [8,11,20], word frequencies [10], source code comments [19], API calls [14] and hybrid artifacts [1,3]. These works mainly concentrated on a few project repositories while little attention has been paid to the project profiles in global communities and made no use of software labels.…”

Section: Introduction (mentioning)

confidence: 99%

Labeled topic detection of open source software from mining mass textual project profiles

Wang, Yin, et al. 2012

Proceedings of the First International Workshop on Software Mining

Nowadays open source software has become an indispensable basis for both individual and industrial software engineering. Various kinds of labeling mechanisms like categories, keywords and tags are used in open source communities to annotate projects and facilitate the discovery of certain software. However, as large amounts of software have few or no labels attached, or the existing labels come from different ontology spaces, it is still hard to retrieve potentially topic-relevant software. This paper highlights the valuable semantic information of project descriptions and labels and proposes labeled software topic detection (LSTD), a hybrid approach combining topic models and ranking mechanisms to detect and enrich the topics of software by mining the large amount of textual software profiles, which can be employed to do software categorization and tag recommendation. LSTD makes use of labeled LDA to capture the semantic correlations between labels and descriptions and then constructs the label-based topic-word matrix. Based on the generated matrix and the generality of labels, LSTD designs a simple yet efficient algorithm to detect the latent topics of software that are expressed as relevant and popular labels. Comprehensive evaluations are conducted on the large-scale datasets of representative open source communities and the results validate the effectiveness of LSTD.
