Left ventricular hypertrophy (LVH) is a common clinical finding associated with adverse cardiovascular events. Mild LVH is difficult to diagnose promptly and accurately from subjective judgment alone, while quantitatively measuring the interventricular septum and left ventricular posterior wall is time‐consuming, labor‐intensive, and error‐prone. We propose a novel end‐to‐end automated method that rapidly detects mild LVH in echocardiographic videos without the need for quantitative measurements. Representative frames are first extracted from each echocardiographic video, and these frames are then classified automatically by the proposed network to detect LVH. The network architecture primarily consists of three key components: a feature extractor, a bidirectional LSTM, and an attention module. In our experiments, the ViT‐b model achieved 88% video classification accuracy when 32 frames were extracted, and the ViT‐l model achieved 92% video classification accuracy when 16 frames were extracted. These results indicate that extracting fewer frames per video can yield more accurate LVH diagnosis. The experiments demonstrate the superior performance and competitiveness of this method relative to other approaches, and its potential for clinical application.
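As a rough illustration of the first and last stages of the pipeline, the sketch below selects a fixed number of frames from a video and pools per-frame features with attention weights. It is not the implementation used in this work: the abstract does not state how representative frames are chosen or how attention is computed, so evenly spaced sampling and softmax-weighted averaging are assumptions, and the names `sample_frame_indices` and `attention_pool` are hypothetical. The feature extractor and bidirectional LSTM are omitted; `features` stands in for their per-frame outputs.

```python
import math


def sample_frame_indices(num_frames, num_samples):
    """Return evenly spaced frame indices (assumed sampling strategy)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Short video: keep every frame.
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the frame at the center of each of the num_samples segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Weighted average of per-frame feature vectors.

    features: list of equal-length feature vectors, one per frame
              (stand-ins for bidirectional LSTM outputs).
    scores:   one unnormalized attention score per frame.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

For example, `sample_frame_indices(64, 16)` picks 16 indices from a 64-frame clip, and `attention_pool` reduces the corresponding per-frame features to a single video-level vector that a classifier could consume.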