Search citation statements
Paper Sections
Citation Types
Year Published
Publication Types
Relationship
Authors
Journals
Repairing code smells detected in the code or design of the system is one of the activities contributing to increasing the software quality. In this study, we investigate the impact of non-numerical information of software, such as project status information combined with machine learning techniques, on improving code smell detection. For this purpose, we constructed a dataset consisting of 22 systems with various project statuses, 12,040 classes, and 18 features that included 1935 large classes. A set of experiments was conducted with ten different machine learning techniques by dividing the dataset into training, validation, and testing sets to detect the large class code smell. Feature selection and data balancing techniques have been applied. The classifier’s performance was evaluated using six indicators: precision, recall, F-measure, MCC, ROC area, and Kappa tests. The preliminary experimental results reveal that feature selection and data balancing have poor influence on the accuracy of machine learning classifiers. Moreover, they vary their behavior when utilized in sets with different values for the selected project status information of their classes. The average value of classifiers performance when fed with status information is better than without. The Random Forest achieved the best behavior according to all performance indicators (100%) with status information, while AdaBoostM1 and SMO achieved the worst in most of them (> 86%). According to the findings of this study, providing machine learning techniques with project status information about the classes to be analyzed can improve the results of large class detection.
Repairing code smells detected in the code or design of the system is one of the activities contributing to increasing the software quality. In this study, we investigate the impact of non-numerical information of software, such as project status information combined with machine learning techniques, on improving code smell detection. For this purpose, we constructed a dataset consisting of 22 systems with various project statuses, 12,040 classes, and 18 features that included 1935 large classes. A set of experiments was conducted with ten different machine learning techniques by dividing the dataset into training, validation, and testing sets to detect the large class code smell. Feature selection and data balancing techniques have been applied. The classifier’s performance was evaluated using six indicators: precision, recall, F-measure, MCC, ROC area, and Kappa tests. The preliminary experimental results reveal that feature selection and data balancing have poor influence on the accuracy of machine learning classifiers. Moreover, they vary their behavior when utilized in sets with different values for the selected project status information of their classes. The average value of classifiers performance when fed with status information is better than without. The Random Forest achieved the best behavior according to all performance indicators (100%) with status information, while AdaBoostM1 and SMO achieved the worst in most of them (> 86%). According to the findings of this study, providing machine learning techniques with project status information about the classes to be analyzed can improve the results of large class detection.
Code smells are early warning signs of potential issues in software quality. Various techniques are used in code smell detection, including the Bayesian approach, rule-based automatic antipattern detection, antipattern identification utilizing B-splines, Support Vector Machine direct, SMURF (Support Vector Machines for design smell detection using relevant feedback), and immune-based detection strategy. Machine learning (ML) has taken a great stride in this area. This study includes relevant studies applying ML algorithms from 2005 to 2024 in a comprehensive manner for the survey to provide insight regarding code smell, ML algorithms frequently applied, and software metrics. Forty-two pertinent studies allow us to assess the efficacy of ML algorithms on selected datasets. After evaluating various studies based on open-source and project datasets, this study evaluated additional threats and obstacles to code smell detection, such as the lack of standardized code smell definitions, the difficulty of feature selection, and the challenges of handling large-scale datasets. The current studies only considered a few factors in identifying code smells, while in this study, several potential contributing factors to code smells are included. Several ML algorithms are examined, and various approaches, datasets, dataset languages, and software metrics are presented. This study provides the potential of ML algorithms to produce better results and fills a gap in the body of knowledge by providing class-wise distributions of the ML algorithms. Support Vector Machine, J48, Naive Bayes, and Random Forest models are the most common for detecting code smells. Researchers can find this study helpful in better anticipating and taking care of software development design and implementation issues. The findings from this study, which highlight the practical implications of ML algorithms in software quality improvement, will help software engineers fix problems during software design and development to ensure software quality.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.