Background: Traditional data classification techniques usually divide the data space into subspaces, each representing a class. Such a division considers only physical attributes of the training data (e.g., distance, similarity, or distribution); this approach is called low-level classification. On the other hand, network- or graph-based approaches can capture spatial, functional, and topological relations among the data, providing so-called high-level classification. Network-based algorithms usually consist of two steps: network construction and classification. Although complex network measures are employed in the classification step to capture patterns of the input data, the network formation step is critical and not well explored. Some construction methods, such as the K-nearest neighbors algorithm (KNN) and the ε-radius technique, consider only strictly local information of the data and, moreover, depend on parameters that are not easy to set. Methods: We propose a network-based classification technique, named high-level classification on K-associated optimal graph (HL-KAOG), which combines the K-associated optimal graph with high-level prediction. In this way, the network construction algorithm is nonparametric and considers both local and global information of the training data. In addition, since the proposed technique combines low-level and high-level terms, it classifies data not only by physical features but also by checking the conformity of each test instance to the formation pattern of each class component. Computer simulations are conducted to assess the effectiveness of the proposed technique.
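The idea of mixing low-level and high-level evidence can be sketched as a convex combination of per-class scores. This is a minimal illustration only: the mixing weight `lam`, the score arrays, and the function names are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def combine_scores(low_level, high_level, lam):
    """Blend per-class membership scores (illustrative sketch).

    low_level:  shape (n_classes,), e.g. similarity/distance-based evidence
    high_level: shape (n_classes,), e.g. network pattern-conformity evidence
    lam:        weight in [0, 1] given to the high-level term (assumed form)
    """
    low = np.asarray(low_level, dtype=float)
    high = np.asarray(high_level, dtype=float)
    return (1.0 - lam) * low + lam * high

def classify(low_level, high_level, lam=0.5):
    # Predicted label = class with the largest combined score.
    return int(np.argmax(combine_scores(low_level, high_level, lam)))

# Example: physical features favor class 0, pattern conformity favors class 1.
low = [0.7, 0.3]
high = [0.2, 0.8]
print(classify(low, high, lam=0.0))  # only low-level evidence -> class 0
print(classify(low, high, lam=1.0))  # only high-level evidence -> class 1
```

When the data set exhibits a complex but well-defined formation pattern, increasing the weight of the high-level term lets the classifier favor pattern conformity over raw physical proximity.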
Results: The results show that a larger weight on the high-level term is required to achieve correct classification when the data set contains a complex but well-defined pattern. In this case, we also show that traditional classification algorithms are unable to identify such data patterns. Moreover, computer simulations on real-world data sets show that HL-KAOG and support vector machines provide similar results, and both outperform well-known techniques such as decision trees and K-nearest neighbors.
Conclusions: The proposed technique works with very few parameters and achieves good predictive performance in comparison with traditional techniques. In addition, the combination of high-level and low-level algorithms based on network components allows greater exploration of patterns in data sets.