Instance segmentation has gained attention in various computer vision fields, such as autonomous driving, drone control, and sports analysis. Many successful models have recently been developed, and they can be classified into two categories: accuracy-focused and speed-focused. Both accuracy and inference time are important for real-time applications of this task. However, the inference times reported for these models were measured on different hardware, which makes them difficult to compare. This study is the first to evaluate and compare the performance of state-of-the-art instance segmentation models by focusing on their inference time in a fixed experimental environment. A precise comparison requires identical test hardware and environment; hence, we report the accuracy and speed of the models in a fixed hardware environment for quantitative and qualitative analyses. Although speed-focused models run in real time on high-end GPUs, they face a trade-off between speed and accuracy when computing power is insufficient. The experimental results suggest that a feature pyramid network structure is worth considering when designing a real-time model, and that speed and accuracy must be balanced for real-time applications.