Well-known methods of scene extraction from video focus on analyzing the similarity between frames. However, not all of them analyze the composition of the scene in the image, which may remain unchanged during maintenance. Therefore, this paper proposes an algorithm for equipment maintenance scene detection based on human hand tracking. It rests on the assumption that, when technological equipment is serviced, a change in repair action can be determined from the position of the service engineer's hands. Thus, information about hand position, together with an algorithm that processes its changes, allows us to segment the video into the actions performed during the service. We process the time series of hand positions using spectral singular value decomposition for multivariate time series. To verify the algorithm, we performed maintenance on the control cabinet of a mining conveyor and recorded the work as a first-person video, which was then processed using the developed method. As a result, we obtained scenes corresponding to opening the control cabinet, de-energizing the unit, and checking the contacts with a multimeter buzzer test. A third-person video of motor service was processed in the same way. The algorithm separated it into scenes of removing screws, working with a multimeter, and disconnecting and replacing motor parts.
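To make the segmentation idea concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that the hand tracker yields an array of shape (frames, channels), for example the x and y coordinates of each hand, and it reads "spectral singular value decomposition for multivariate time series" as a change score in the style of multivariate singular spectrum analysis; that reading is an assumption. The names `lagged_vectors` and `ssa_change_score` and the parameters `lag`, `base`, `test`, and `rank` are hypothetical.

```python
import numpy as np


def lagged_vectors(series, start, stop, lag):
    """Stack flattened windows series[t:t+lag] for t in [start, stop) as columns."""
    cols = [series[t:t + lag].ravel() for t in range(start, stop)]
    return np.array(cols).T


def ssa_change_score(series, lag=10, base=40, test=20, rank=2):
    """Score each frame by how poorly the recent past explains the near future.

    series: array of shape (frames, channels) with hand coordinates (assumed input).
    Returns an array of length `frames`; high values suggest a change of action.
    """
    series = np.asarray(series, dtype=float)
    n = series.shape[0]
    scores = np.zeros(n)
    for t in range(base, n - test - lag + 1):
        # Windows that lie entirely in the `base` frames before frame t.
        past = lagged_vectors(series, t - base, t - lag + 1, lag)
        # Windows that start in the `test` frames from frame t onwards.
        future = lagged_vectors(series, t, t + test, lag)
        # Leading left singular vectors span the dominant past motion pattern.
        u, _, _ = np.linalg.svd(past, full_matrices=False)
        basis = u[:, :rank]
        # Energy of the future windows that falls outside that subspace.
        residual = future - basis @ (basis.T @ future)
        scores[t] = np.sum(residual ** 2) / (np.sum(future ** 2) + 1e-12)
    return scores
```

Under these assumptions, frames where the score peaks are candidate scene boundaries. How the peaks are thresholded or merged into scenes is a further design choice that this sketch does not cover.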