Internet of Things (IoT) devices connected to the Internet are exploding, which poses a significant threat for their management and security protection. IoT device identification is a prerequisite for discovering, monitoring, and protecting these devices. Although we can identify the device type easily through grabbing protocol banner information, both brand and model of different types of device are various and diverse. We should therefore utilize multi-protocol probes to improve the fineness of device identification and obtain the corresponding brand and model. However, it is still a challenge to balance between the multi-protocol probe overhead and the identification fineness. To solve this problem, we proposed a time-efficient multi-protocol probe scheme for fine-grain devices identification. We first adopted the concept of reinforcement learning to model the banner-based device identification process into a Markov decision process (MDP). Through the value iteration algorithm, an optimal multi-protocol probe sequence is generated for a type-known IoT device, and then the optimal multi-protocol probes sequence segment is extracted based on the gain threshold of identification accuracy. We took 132,835 webcams as the sample data to experiment. The experimental results showed that our optimal multi-protocol probes sequence segment could reduce the identification time of webcams’ brand and model by 50.76% and achieve the identification accuracy of 90.5% and 92.3% respectively. In addition, we demonstrated that our time-efficient optimal multi-protocol probe scheme could also significantly improve the identification efficiency of other IoT devices, such as routers and printers.