Introduction: K-nearest neighbor (k-NN) classification is a conventional non-parametric classifier that has been used as the baseline classifier in many pattern classification problems. It measures the distances between the test data and each of the training data to decide the final classification output. Case description: Because the Euclidean distance function is the most widely used distance metric in k-NN, no study has examined the classification performance of k-NN under different distance functions, especially for various medical domain problems. Therefore, the aim of this paper is to investigate whether the distance function can affect k-NN performance over different medical datasets. Our experiments are based on three different types of medical datasets containing categorical, numerical, and mixed types of data, and four different distance functions (Euclidean, cosine, Chi square, and Minkowski) are each used individually during k-NN classification. Discussion and evaluation: The experimental results show that the Chi square distance function is the best choice for the three different types of datasets, whereas the cosine and Euclidean (and Minkowski) distance functions perform the worst over the mixed type of datasets. Conclusions: In this paper, we demonstrate that the chosen distance function can affect the classification accuracy of the k-NN classifier. For the medical domain datasets including the categorical, numerical, and mixed types of data, k-NN based on the Chi square distance function performs the best.
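To make the role of the distance function concrete, here is a minimal Python sketch of k-NN with a pluggable distance. It is an illustration only, not the paper's implementation: the toy data, the labels, the function names, the Minkowski order `p=3`, and the particular Chi square form (one common variant for non-negative features) are all assumptions, since the abstract does not give the exact definitions used.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minkowski(a, b, p=3):
    # Generalizes Euclidean (p=2) and Manhattan (p=1); p=3 is an arbitrary choice here.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine(a, b):
    # 1 - cosine similarity; zero vectors are treated as maximally distant.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def chi_square(a, b):
    # Assumed form: sum of (x - y)^2 / (x + y) over non-negative features,
    # skipping terms whose denominator is zero.
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)

def knn_predict(train_X, train_y, query, k, distance):
    # Rank training points by distance to the query and take a majority vote
    # among the k closest labels.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature toy data, not taken from any medical dataset.
train_X = [(1.0, 3.0), (1.2, 2.8), (0.9, 3.1), (5.0, 1.0), (5.2, 0.8), (4.9, 1.1)]
train_y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
query = (1.1, 2.9)

for name, fn in [("euclidean", euclidean), ("minkowski", minkowski),
                 ("cosine", cosine), ("chi_square", chi_square)]:
    print(name, knn_predict(train_X, train_y, query, k=3, distance=fn))
```

Because the distance is passed in as a function, swapping metrics changes only the ranking of neighbors, which is the single point where the four functions compared in the abstract can lead to different predictions.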
Breast cancer is an all too common disease in women, which makes its effective prediction an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to choose the kernel function, and different kernel functions can result in different prediction performance. However, very few studies have focused on examining the prediction performance of SVM with different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational time of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles using the bagging method and RBF kernel based SVM ensembles using the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles using boosting perform better than the other classifiers.
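The two kernels named in the abstract have standard textbook definitions, sketched below in plain Python. This shows only the kernel functions themselves, not SVM training or the bagging and boosting ensembles; the function names and the default `gamma=0.5` are assumptions made for illustration.

```python
import math

def linear_kernel(x, y):
    # K(x, y) = x . y
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2); gamma is a tunable width parameter.
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(X, kernel):
    # Pairwise kernel values; an SVM solver works with this matrix
    # rather than with the raw feature vectors.
    return [[kernel(xi, xj) for xj in X] for xi in X]

# Hypothetical sample points.
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(gram_matrix(X, linear_kernel))
print(gram_matrix(X, rbf_kernel))
```

The linear kernel grows with the magnitude of the inputs, while the RBF kernel depends only on how far apart two points are and is bounded between 0 and 1, which is one reason the two can behave differently on the same dataset.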
Introduction: More and more universities are receiving accreditation from the Association to Advance Collegiate Schools of Business (AACSB), an international association for promoting quality teaching and learning at business schools. To be accredited, schools are required to meet a number of standards ensuring that certain levels of teaching quality and student learning are met. However, the literature offers a variety of views on the relationship between research and teaching: some studies have demonstrated that research and teaching are complementary elements of learning, but others disagree with these findings. Case description: Unlike past studies, we focus on analyzing the research performance of accredited schools during the periods before and after receiving accreditation. The objective is to determine whether performance has improved by comparing the same school's performance before and after accreditation. In this study, four AACSB accredited universities in Taiwan are analyzed, including one teaching oriented and three research oriented universities. Research performance is evaluated by comparing seven citation statistics: the number of papers published, number of citations, average number of citations per paper, average citations per year, h-index (annual), h-index, and g-index. Discussion and evaluation: The analysis results show that business schools demonstrated enhanced research performance after AACSB accreditation, but in most accredited schools the proportion of faculty members not actively doing research is larger than the proportion of active ones. Conclusion: This study shows that AACSB accreditation has a positive impact on research performance. The findings can be used as a reference for current non-accredited schools whose goals are to improve their research productivity and quality.
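Several of the citation statistics listed above can be computed directly from a list of per-paper citation counts. The Python sketch below uses the standard definitions of the h-index and g-index; the citation counts are made-up values, the function names are hypothetical, and the annual h-index is omitted because the abstract does not say how it was calculated.

```python
def h_index(citations):
    # Largest h such that h papers each have at least h citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def g_index(citations):
    # Largest g such that the top g papers together have at least g^2 citations
    # (capped at the number of papers in this simple form).
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g

def citations_per_paper(citations):
    # Average number of citations per paper; 0.0 for an empty record.
    return sum(citations) / len(citations) if citations else 0.0

# Hypothetical citation counts for one school's papers.
counts = [10, 8, 5, 4, 3]
print(len(counts), sum(counts), citations_per_paper(counts))
print(h_index(counts), g_index(counts))
```

Computing the same statistics on the papers published before accreditation and on those published after it gives the kind of before-and-after comparison the study describes.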
Purpose: Data mining is widely considered necessary in many business applications for effective decision making. The importance of business data mining is reflected by the numerous surveys in the literature that investigate related works using data mining techniques to solve specific business problems. However, there has been no recent study answering the following question: What are the widely used data mining techniques in business applications? Design/methodology/approach: The aim of this paper is to examine related surveys in the literature and thus to identify the frequently applied data mining techniques. To ensure the recency, relevance, and quality of the conclusions, the criteria for selecting related studies are that the works be published in reputed journals within the past 10 years. Findings: There are 33 different data mining techniques employed in eight different application areas. Most of them are supervised learning techniques, and the application area where such techniques are most often applied is bankruptcy prediction, followed by the areas of customer relationship management, fraud detection, intrusion detection, and recommender systems. Furthermore, the 10 most widely used data mining techniques for business applications are the decision tree (including C4.5 and CART), genetic algorithm, k-nearest neighbor, multilayer perceptron neural network, naïve Bayes, and support vector machine as the supervised learning techniques, and association rule, expectation maximization, and k-means as the unsupervised learning techniques. Originality/value: The originality of this paper lies in surveying the past ten years of survey and review articles about data mining in business applications in order to identify the most popular techniques.