Techniques for automated classification need to be efficient when applied to large datasets. Machine learning techniques such as neural networks have been successfully applied to this class of problem, but training times can grow rapidly as the size of the database increases. Desirable features of classification algorithms for large databases include linear time complexity, training with only a single pass of the data, and accountability for class assignment decisions. A new training algorithm for classifiers based on the Cerebellar Model Articulation Controller (CMAC) possesses these features. An empirical investigation of this algorithm has found it to be superior to the traditional CMAC training algorithm, both in accuracy and in the time required to learn mappings between input vectors and class labels.

Int. J. Comp. Intel. Appl. 2006.06:299-313.
D. Cornforth

Many algorithms for automated classification have an inherently non-linear relationship between the time taken by the algorithm to run and the number of training examples. Analysis methods that work well for small data sets become impractical when applied to larger ones. For example, training a neural network using back-propagation is known to be NP-complete [5]. Some studies suggest that evolutionary algorithms have polynomial time complexity [6]. The work presented here investigates classification algorithms based on the Cerebellar Model Articulation Controller (CMAC) [7], which have linear time complexity.

Global error minimization techniques, such as back-propagation, require multiple traversals of the data set during training. If the training set is very large, it cannot fit inside the memory of the machine. This results in multiple disk read/write operations, which are relatively costly in time and can contribute greatly to data processing time. Current approaches include compressing or summarizing the data set before processing, and redesigning analysis tools so that analysis can be completed with only one pass of the data. This paper shows how the original CMAC training algorithm, which normally uses an iterative global error minimization technique, may be adapted so that the training set only needs to be accessed once.

The usefulness of a classification algorithm may be enhanced by providing an explanation for each class assignment decision. This could take the form of a set of rules that contribute to the assignment, or a probability for each class, given the input. Black box methods such as neural networks do not naturally lend themselves to this form of analysis. The new algorithm described here provides accountability for class assignment decisions in the form of class probabilities.

In this paper, I propose the Kernel Addition Training Algorithm (KATA) as a more effective learning algorithm for the CMAC when used as a classifier.
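The single-pass, probability-producing idea described above can be sketched in code. The following is an illustrative reconstruction, not the paper's exact KATA formulation: it assumes a tiling-style CMAC over a bounded input range, and training simply accumulates per-class counts in each activated memory cell, so the data is read once and prediction returns a class probability estimate. All names and parameters (`n_tilings`, `n_bins`, the offset scheme) are assumptions for the sketch.

```python
import numpy as np

class CMACClassifier:
    """Sketch of a CMAC used as a classifier with single-pass,
    count-based training (illustrative, not the paper's exact KATA)."""

    def __init__(self, n_tilings=8, n_bins=16, n_classes=2, low=0.0, high=1.0):
        self.n_tilings = n_tilings
        self.n_bins = n_bins
        self.n_classes = n_classes
        self.low, self.high = low, high
        # one sparse table per tiling: cell index -> per-class count vector
        self.tables = [dict() for _ in range(n_tilings)]

    def _cells(self, x):
        # each tiling quantizes the input with a different offset,
        # so every input activates exactly one cell per tiling
        scaled = (np.asarray(x, dtype=float) - self.low) \
                 / (self.high - self.low) * self.n_bins
        for t in range(self.n_tilings):
            offset = t / self.n_tilings
            yield t, tuple(np.floor(scaled + offset).astype(int))

    def train_one(self, x, label):
        # single pass: increment the count for this class in every
        # activated cell -- no iterative error minimization needed
        for t, cell in self._cells(x):
            counts = self.tables[t].setdefault(cell, np.zeros(self.n_classes))
            counts[label] += 1

    def predict_proba(self, x):
        # sum per-class counts over activated cells and normalize,
        # giving class probabilities for accountable decisions
        total = np.zeros(self.n_classes)
        for t, cell in self._cells(x):
            total += self.tables[t].get(cell, np.zeros(self.n_classes))
        s = total.sum()
        if s == 0:
            return np.full(self.n_classes, 1.0 / self.n_classes)
        return total / s
```

Because training touches only the cells activated by each example, the cost per example is constant in the number of training examples, which is what gives the overall linear time complexity discussed above.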
The proposed method requires only a single pass of the data and provides a probability model for class assignment.