This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, which is specifically for the static summary of gesture videos. Considering that gesture videos usually have uncertain transition postures and unclear movement boundaries or inexplicable frames, we propose a three-step method where the first step involves browsing a video, the second step applies the MIESW method to select candidate key frames, and the third step removes most redundant key frames. In detail, the first step is to convert the video into a sequence of frames and adjust the size of the frames. In the second step, a key frame extraction algorithm named MIESW is executed. The inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window to group similar content of the video. Then, based on the entropy value of the frame and the average mutual information value of the frame group, the threshold method is applied to optimize the grouping, and the key frames are extracted. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames in these candidate key frames. The calculation of Precision, Recall, and Fmeasure are optimized from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted using our method provide high-quality video summaries and basically cover the main content of the gesture video.