With the ever-growing amount of digital video data becoming available, methods that facilitate video indexing and retrieval are increasingly needed. This paper presents a key-frame-based method that employs shot boundary detection and a "bag-of-visual-words" (BoW) representation built from local keypoints for key frame extraction and semantic concept detection. The performance of the BoW features is optimized by selecting appropriate representation settings. Once video frames are represented by BoW features, a spectral clustering algorithm generates the key frames of each shot, and these key frames are then classified using support vector machines for video indexing. Finally, a query-by-concept search is performed for video retrieval. The experimental results demonstrate that the proposed approach is capable of retrieving videos. Compared with the existing related method, the proposed method yields better results for key frame extraction and achieves a mean average precision (MAP) of 0.68 for the video retrieval model.
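
For readers unfamiliar with the retrieval metric reported above, the following is a minimal sketch of how mean average precision is commonly computed from ranked result lists. It is an illustration only, not the evaluation code used in this work: the function names `average_precision` and `mean_average_precision` are hypothetical, relevance judgments are assumed to be binary, and every relevant item is assumed to appear somewhere in the ranked list (no truncation depth is applied).

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list.

    relevance[i] is 1 if the item at rank i + 1 is relevant to the query
    concept and 0 otherwise. Assumption: all relevant items are in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)
```

Under these assumptions, a query whose ranked results are judged `[1, 0, 1, 0]` has precision 1/1 at the first relevant item and 2/3 at the second, so its average precision is 5/6; averaging such values over all query concepts gives the MAP.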