Evaluating and benchmarking skin detectors is challenging because it involves multiple evaluation attributes and conflicting criteria. Although several evaluation and benchmarking techniques have been proposed, they have many limitations. In particular, multi-attribute benchmarking approaches that fix several attributes are limited in their ability to support reliable skin detection. This study therefore aims to develop a new framework for evaluating and benchmarking skin detection, based on artificial intelligence models and multi-criteria analysis. Two experiments are conducted for this purpose. The first experiment consists of two stages: (1) the development of a skin detector using multi-agent learning based on different color spaces, which creates a dataset of various color space samples for benchmarking; and (2) the evaluation and testing of the developed skin detector against multiple evaluation criteria (i.e., reliability, time complexity, and error rate within the dataset), which creates a decision matrix. The second experiment applies different decision-making techniques (AHP/SAW, AHP/MEW, AHP/HAW, AHP/TOPSIS, AHP/WSM, and AHP/WPM) to benchmark the results of the first experiment (i.e., the developed skin detector). Finally, we discuss the use of the mean, standard deviation, and paired-sample t-test to measure the correlations among the different techniques based on their ranking results.
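To make the benchmarking step more concrete, the following is a minimal sketch of how a decision matrix might be ranked with two of the techniques named above, SAW and TOPSIS. It is an illustration under stated assumptions, not the study's implementation: the detector values and criterion weights are invented for the example, the weights are simply given rather than derived through AHP, and the normalizations used (linear max/min scaling for SAW, vector normalization for TOPSIS) are common textbook choices that may differ from those in the study. The function names `saw_scores`, `topsis_scores`, and `rank` are hypothetical.

```python
import math


def saw_scores(matrix, weights, benefit):
    """Simple additive weighting with linear max/min normalization.

    matrix[i][j] is the value of alternative i on criterion j (all positive).
    benefit[j] is True if larger is better, False if smaller is better.
    """
    n_crit = len(weights)
    cols = [[row[j] for row in matrix] for j in range(n_crit)]
    scores = []
    for row in matrix:
        total = 0.0
        for j in range(n_crit):
            if benefit[j]:
                r = row[j] / max(cols[j])   # best benefit value maps to 1
            else:
                r = min(cols[j]) / row[j]   # best (lowest) cost value maps to 1
            total += weights[j] * r
        scores.append(total)
    return scores


def topsis_scores(matrix, weights, benefit):
    """TOPSIS closeness coefficients using vector normalization.

    Assumes the alternatives are not all identical, so the two
    distances are never both zero.
    """
    n_crit = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = [[row[j] for row in v] for j in range(n_crit)]
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


def rank(scores):
    """Return alternative indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical decision matrix: rows are detector variants, columns are
# reliability (benefit), time complexity (cost), and error rate (cost).
matrix = [
    [0.90, 1.2, 0.10],
    [0.85, 0.8, 0.12],
    [0.95, 2.0, 0.05],
]
weights = [0.5, 0.2, 0.3]          # assumed; the study derives weights with AHP
benefit = [True, False, False]

print("SAW ranking:   ", rank(saw_scores(matrix, weights, benefit)))
print("TOPSIS ranking:", rank(topsis_scores(matrix, weights, benefit)))
```

Because the two methods normalize and aggregate differently, their rankings can disagree, which is the kind of variation the statistical comparison of ranking results is meant to examine.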