Since 1985, China’s pelagic fishing industry has developed rapidly, but it still faces problems such as aging vessel equipment, outdated fishing techniques, safety hazards, and non-standard operating practices. As pelagic fishing has expanded in recent years, monitoring crew members and their working environment has become increasingly important. However, the traditional approach of placing human observers on board suffers from high costs, low coverage, poor timeliness, and susceptibility to subjective factors. In contrast, an Electronic Monitoring System (EMS) can operate continuously under various weather conditions, produces more objective and transparent data more efficiently, and interferes less with fishing operations. This paper shows how the 3DCNN, LSTM+ResNet, and TimeSformer models can be applied to video-classification tasks and, for the first time, applies them to an EMS. It also tests and compares the performance of the three models on video classification and discusses the advantages and challenges of using them for video recognition. Through experiments, we obtained the accuracy and related metrics for video recognition with each model. The results show that the LSTM+ResNet-50 model performs best when NUM_FRAMES is set to 8, with an accuracy of 88.47%, an F1 score of 0.8881, and an mAP score of 0.8133. Analyzing EMS data for pelagic fishing can improve China’s performance level and management efficiency in pelagic fishing, and promote the development of the fishery knowledge service system and smart fishery engineering.