Thanh Tung Khuat scite author profile

2018

In modern software development processes, software effort estimation plays a crucial role. The success or failure of projects depends greatly on the accuracy of effort and schedule estimates. Many studies have proposed novel models to improve the accuracy of predicted results; however, accurate effort estimation remains a challenging issue for researchers and practitioners, especially for projects using agile methodologies. This study introduces a novel formula based on team velocity and story point factors. The parameters of this formula are then optimized using swarm optimization algorithms. We also propose an improved algorithm combining the advantages of the artificial bee colony and particle swarm optimization algorithms. The experimental results indicated that our approaches outperformed methods in other studies in terms of the accuracy of predicted results.

Hyperbox-based machine learning algorithms: a comprehensive survey

2020

Evaluation of Sampling-Based Ensembles of Classifiers on Imbalanced Data for Software Defect Prediction Problems

2020

SN COMPUT. SCI.

Defect prediction in software projects plays a crucial role in reducing quality-related risk and increasing the capability to detect faulty program modules. Hence, classification approaches that anticipate software defect proneness from static code characteristics have attracted a great deal of attention in recent years. Several studies show that using a single classifier creates a performance bottleneck, whereas ensembles of classifiers can effectively enhance classification performance. However, the class imbalance of software defect data severely hinders the classification efficiency of ensemble learning. To cope with this problem, resampling methods are usually combined with ensemble models. This paper empirically assesses the importance of sampling for ensembles of various classifiers on imbalanced data in software defect prediction problems. Extensive experiments combining seven different kinds of classification algorithms, three sampling methods, and two balanced data learning schemata were conducted over ten datasets. Empirical results indicated that combining sampling techniques with the ensemble learning model has a positive effect on defect prediction performance for datasets with imbalanced class distributions.
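The idea of pairing resampling with an ensemble can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sampling method is plain random oversampling, the base learner is a toy nearest-centroid classifier, and the function names (`random_oversample`, `fit_ensemble`, `predict_ensemble`) are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def fit_centroids(X, y):
    """Toy base learner (an assumption): store one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def fit_ensemble(X, y, n_models=5, seed=0):
    """Train each member on its own independently rebalanced copy of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        X_bal, y_bal = random_oversample(X, y, rng)
        models.append(fit_centroids(X_bal, y_bal))
    return models


def predict_ensemble(models, row):
    """Combine member predictions by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

In this sketch the members differ only in which minority rows are duplicated, so diversity is limited; a fuller setup would likely vary the base learners or the training subsets as well.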

Ensemble learning for software fault prediction problem with imbalanced data

2019

IJECE

The fault prediction problem plays a crucial role in the software development process because it contributes to reducing defects and assists the testing process in moving towards fault-free software components. Therefore, many efforts aim to address this type of issue, usually adopting static code characteristics to construct fault classification models. One of the challenging problems influencing the performance of predictive classifiers is the high imbalance among patterns belonging to different classes. This paper aims to integrate sampling techniques and common classification techniques to form a useful ensemble model for the software defect prediction problem. The empirical results obtained on benchmark datasets of software projects have shown the promising performance of our proposal in comparison with individual classifiers.

A comparative study of general fuzzy min-max neural networks for pattern classification problems

Gabrys

2020

Neurocomputing

The general fuzzy min-max (GFMM) neural network is a generalization of fuzzy neural networks formed by hyperbox fuzzy sets for classification and clustering problems. Two principal algorithms are deployed to train this type of neural network, i.e., incremental learning and agglomerative learning. This paper presents a comprehensive empirical study of performance-influencing factors, advantages, and drawbacks of the general fuzzy min-max neural network on pattern classification problems. The subjects of this study include (1) the impact of maximum hyperbox size, (2) the influence of the similarity threshold and measures on the agglomerative learning algorithm, (3) the effect of data presentation order, and (4) comparative performance evaluation of the GFMM with other types of fuzzy min-max neural networks and prevalent machine learning algorithms. The experimental results on benchmark datasets widely used in machine learning showed the overall strong and weak points of the GFMM classifier. These outcomes also informed potential research directions for this class of machine learning algorithms in the future.

Index Terms: general fuzzy min-max, classification, fuzzy min-max neural network, hyperbox, pattern recognition
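For readers unfamiliar with hyperbox fuzzy sets, the following minimal sketch shows how a point's membership in a hyperbox and the maximum hyperbox size check might be computed. The abstract does not give these formulas; the sketch assumes the ramp-based membership function and per-dimension size constraint commonly associated with GFMM in the literature, and the names `ramp`, `membership`, `can_expand`, `gamma`, and `theta` are illustrative choices.

```python
def ramp(r, gamma):
    """Ramp threshold: 0 below zero, linear in r * gamma, capped at 1."""
    z = r * gamma
    if z > 1:
        return 1.0
    if z < 0:
        return 0.0
    return z


def membership(x, v, w, gamma=1.0):
    """Degree to which point x belongs to the hyperbox with min point v and max point w.

    Points inside the box get 1.0; outside, the value decays with the largest
    per-dimension violation, at a rate set by the sensitivity parameter gamma.
    """
    return min(
        min(1 - ramp(xi - wi, gamma), 1 - ramp(vi - xi, gamma))
        for xi, vi, wi in zip(x, v, w)
    )


def can_expand(x, v, w, theta):
    """Assumed size constraint: after absorbing x, no side may exceed theta."""
    return all(
        max(wi, xi) - min(vi, xi) <= theta
        for xi, vi, wi in zip(x, v, w)
    )
```

Under these assumptions, a smaller `theta` yields more, tighter hyperboxes, while a larger one yields fewer, coarser hyperboxes, which is why maximum hyperbox size is a natural factor to study.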