The article concerns the problem of classification based on independent data sets: local decision tables. The aim of the paper is to propose a classification model for dispersed data using a modified k-nearest neighbors algorithm and a neural network. A neural network, more specifically a multilayer perceptron, is used to combine the prediction results obtained from the local tables. These prediction results are represented at the measurement level (as vectors of class supports) and are generated using a modified k-nearest neighbors algorithm; the task of the neural network is to combine them and provide a common prediction. The article studies various neural network structures (different numbers of neurons in the hidden layer) and compares the results with those generated by other fusion methods: majority voting, the Borda count method, the sum rule, a method based on decision templates, and a method based on the theory of evidence. Based on the obtained results, it was found that the neural network always generates unambiguous decisions, which is a great advantage, as most of the other fusion methods generate ties. Moreover, when only unambiguous results are considered, the neural network gives much better results than the other fusion methods. If ambiguity is allowed, some fusion methods are slightly better, but only because they may generate several candidate decisions for a single test object.
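The abstract does not give the details of the modified k-nearest neighbors algorithm or the exact form of the measurement-level outputs, so the following Python sketch only illustrates the overall architecture: each local table produces a vector of class supports from a plain (unweighted) k-NN, the supports are concatenated, and a multilayer perceptron is trained to map the concatenated supports to a final decision. All function names, the synthetic data, and the sklearn-based fusion step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def knn_support(X_local, y_local, x, k=5, n_classes=3):
    """Measurement-level output of one local table: the fraction of the
    k nearest neighbours of x (Euclidean distance) in each class.
    A plain k-NN stands in here for the paper's modified variant."""
    dists = np.linalg.norm(X_local - x, axis=1)
    nearest = np.argsort(dists)[:k]
    counts = np.bincount(y_local[nearest], minlength=n_classes)
    return counts / k

def fusion_input(local_tables, x_parts, k=5, n_classes=3):
    """Concatenate the support vectors from all local tables; the
    concatenation is the input to the fusing multilayer perceptron."""
    return np.concatenate([
        knn_support(X, y, xp, k, n_classes)
        for (X, y), xp in zip(local_tables, x_parts)
    ])

# Illustrative usage: three local tables over different attribute subsets.
rng = np.random.default_rng(0)
dims = (4, 6, 5)
local_tables = [(rng.normal(size=(100, d)), rng.integers(0, 3, 100))
                for d in dims]
train_objs = [[rng.normal(size=d) for d in dims] for _ in range(200)]
train_y = rng.integers(0, 3, 200)

X_fused = np.array([fusion_input(local_tables, parts) for parts in train_objs])
# One hidden layer; the paper compares different hidden-layer sizes.
mlp = MLPClassifier(hidden_layer_sizes=(2 * X_fused.shape[1],), max_iter=500)
mlp.fit(X_fused, train_y)
```

Because the MLP outputs a single class per test object, this design cannot produce ties, which matches the unambiguity property reported in the abstract.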
This study concerns dispersed data stored in independent local tables with different sets of attributes. The paper proposes a new method for training a single neural network (a multilayer perceptron) based on dispersed data. The idea is to train local models with identical structures based on the local tables; however, because different sets of conditional attributes are present in the local tables, it is necessary to generate artificial objects to train the local models. The paper presents a study on the use of varying parameter values in the proposed method of creating artificial objects, with an exhaustive comparison in terms of the number of artificial objects generated from a single original object, the degree of data dispersion, data balancing, and different network structures (the number of neurons in the hidden layer). It was found that for data sets with a large number of objects, a smaller number of artificial objects is optimal, whereas for smaller data sets a greater number of artificial objects (three or four) produces better results. For large data sets, data balancing and the degree of dispersion have no significant impact on the quality of classification; rather, a greater number of neurons in the hidden layer (from three to five times the number of neurons in the input layer) produces better results.
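The abstract states that artificial objects must be generated because each local table covers only part of the full attribute set, but it does not describe the generation scheme. Below is a minimal Python sketch of one plausible scheme, assuming missing attribute values are filled by sampling from values of that attribute observed elsewhere; each original object then yields n_copies artificial objects over the common attribute set. The function name, the sampling strategy, and the default of three copies (chosen to match the "three or four" finding above) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def artificial_objects(obj, obj_attrs, all_attrs, value_pool, n_copies=3):
    """Expand one original object, known only on obj_attrs, into n_copies
    artificial objects over all_attrs. Missing attributes are filled by
    sampling from value_pool[attr], the values of that attribute observed
    elsewhere; the paper's exact filling scheme may differ."""
    copies = []
    for _ in range(n_copies):
        filled = [obj[obj_attrs.index(a)] if a in obj_attrs
                  else rng.choice(value_pool[a])
                  for a in all_attrs]
        copies.append(filled)
    return np.array(copies)

# Illustrative usage: a local table knows attributes "a" and "c" only.
all_attrs = ["a", "b", "c", "d"]
value_pool = {a: rng.normal(size=50) for a in all_attrs}
x = [0.7, -1.2]                      # observed values of "a" and "c"
print(artificial_objects(x, ["a", "c"], all_attrs, value_pool, n_copies=3))
```

Padding every object to the same attribute set in this way is what allows all local models to share an identical network structure, as the abstract requires.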