In this paper, we present a novel approach that exploits noisy shot-level visual concept detection to improve text-based video retrieval. In contrast to most related work in the field, we consider entire videos as the retrieval units and focus on queries that address the general subject matter (semantic theme) of a video. Retrieval is performed using a coherence-based query performance prediction framework. In this framework, we use video representations derived from the visual concepts detected in the videos to select the best possible search result given the query, the video collection, the available search mechanisms, and the resources for query modification. In addition to investigating the potential of this approach to outperform typical text-based video retrieval baselines, we explore the possibility of achieving further improvements in retrieval performance by combining our concept-based query performance indicators with indicators derived from the spoken content of the videos. The proposed retrieval approach is data-driven, requires no prior training, and relies exclusively on analysis of the video collection and of the different result lists returned for a given query. The experiments, performed on the MediaEval 2010 datasets, demonstrate the effectiveness of our approach.
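
To make the idea of coherence-based query performance prediction concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes each retrieved video is represented by a vector of visual concept detection scores, and scores a result list by the mean pairwise cosine similarity among its top-ranked videos. The function and parameter names (coherence_score, top_k, select_best_list) are hypothetical.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two concept-score vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def coherence_score(result_concepts: np.ndarray, top_k: int = 10) -> float:
    """Mean pairwise cosine similarity among the concept vectors of the
    top-k videos in a result list; a more coherent top of the ranking is
    taken as a signal of a better-performing query."""
    top = result_concepts[:top_k]
    sims = [cosine(top[i], top[j])
            for i in range(len(top)) for j in range(i + 1, len(top))]
    return float(np.mean(sims)) if sims else 0.0

def select_best_list(result_lists: list[np.ndarray]) -> int:
    """Hypothetical selection step: among result lists produced by
    different query modifications, keep the most coherent one."""
    return int(np.argmax([coherence_score(r) for r in result_lists]))
```

Under these assumptions, no training data is needed: the predictor compares candidate result lists using only the concept representations of the retrieved videos themselves.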