With advances in sensing, control, and robotic technologies, autonomous underwater vehicles (AUVs) have become useful assistants to human divers in performing various underwater operations. In current practice, divers must carry expensive, bulky, waterproof keyboards or joystick-based controllers to supervise and control AUVs. Diver action-based supervision is therefore becoming increasingly popular because it is more convenient, easier to use, faster, and more cost-effective. However, various environmental, diver, and sensing uncertainties make underwater diver action recognition challenging. To address this problem, this paper presents DARE, a diver action recognition encoder that is robust to underwater uncertainties and classifies diver actions, including sixteen gestures and three poses, with high accuracy. DARE fuses stereo pairs of underwater camera images using bi-channel convolutional layers for feature extraction, followed by a systematically designed decision tree of neural network classifiers. DARE is trained on the Cognitive Autonomous Diving Buddy (CADDY) dataset, which consists of a rich set of images of different diver actions in real underwater environments. DARE requires only a few milliseconds to classify one stereo pair, making it suitable for real-time implementation. The results show that DARE achieves up to 95.87% overall accuracy and 92% minimum class accuracy, demonstrating its robustness and reliability. Furthermore, a comparative evaluation against existing deep transfer learning architectures reveals that DARE improves the performance of baseline classifiers by up to 3.44% in overall accuracy and 30% in minimum class accuracy.

INDEX TERMS Autonomous underwater vehicles, diver action recognition, human-robot interaction, bi-channel convolutional neural networks, transfer learning.