Background: The surge in data generation has driven a paradigm shift towards big data, where the belief that “more data equals better performance” is challenged by limits on processing capacity and time. In the evolving landscape of machine learning and artificial intelligence, instance selection (IS) has become a key data-reduction technique that aims to shrink training sets without compromising the quality of machine learning models. Traditional IS methods, while efficient, often struggle with the size and complexity of the large datasets encountered in data mining.
Objective: This study reviews and evaluates graph reduction techniques, grounded in graph theory, as a novel approach to instance selection. The aim is to exploit the structure inherent in data represented as graphs to make instance selection more effective.
Methods: We conducted a comprehensive evaluation of 35 graph reduction techniques on 29 diverse classification datasets. The techniques were assessed on several metrics, including accuracy, F1 score, reduction rate, and computation time, and their performance was compared to clarify their suitability for different data sizes and types.
Results: The evaluation revealed significant potential in graph reduction methods, particularly their ability to achieve substantial reductions in data size while maintaining data integrity. The performance metrics indicate that these techniques can be highly effective, offering improvements across several aspects of instance selection.
Conclusion: This research contributes to the theoretical framework of graph-based instance selection and provides practical guidelines for applying these techniques in real-world scenarios. Our findings suggest that graph reduction methods are promising for preserving data quality and enhancing the efficiency of data processing in large and complex datasets.