The automated analysis of medical diagnostic videos, such as ultrasound and endoscopy, benefits clinical practice by improving the efficiency and accuracy of diagnosis. Deep learning techniques have shown remarkable success in analyzing these videos by automating tasks such as classification, detection, and segmentation. In this paper, we review the application of deep learning techniques to the analysis of medical diagnostic videos, with a focus on ultrasound and endoscopy. We selected papers in two major steps. First, we selected around 350 papers based on the relevance of their titles to our topic. Second, we applied our inclusion and exclusion criteria to retain the research articles that focus on deep learning and medical diagnostic videos. We found that convolutional neural networks (CNNs) and long short-term memory (LSTM) networks are the two most commonly used models, and that they achieve good results on different types of medical videos. We also highlight the limitations and open challenges in this field, such as labeling and preprocessing of medical videos, class imbalance, and time complexity, as well as combining expert knowledge, k-shot learning, live feedback from experts, and medical history with video data. Our review can encourage collaborative research with domain experts and patients to improve the diagnosis of diseases from medical videos.