During the last decade, surveillance cameras have spread quickly; their spread is predicted to increase rapidly in the following years. Therefore, browsing and analyzing these vast amounts of created surveillance videos effectively is vital in surveillance applications. Recently, a video synopsis approach was proposed to reduce the surveillance video duration by rearranging the objects to present them in a portion of time. However, performing a synopsis for all the persons in the video is not efficacious for crowded videos. Different clustering and user-defined query methods are introduced to generate the video synopsis according to general descriptions such as color, size, class, and motion. This work presents a user-defined query synopsis video based on motion descriptions and specific visual appearance features such as gender, age, carrying something, having a baby buggy, and upper and lower clothing color. The proposed method assists the camera monitor in retrieving people who meet certain appearance constraints and people who enter a predefined area or move in a specific direction to generate the video, including a suspected person with specific features. After retrieving the persons, a whale optimization algorithm is applied to arrange these persons reserving chronological order, reducing collisions, and assuring a short synopsis video. The evaluation of the proposed work for the retrieval process in terms of precision, recall, and F1 score ranges from 83% to 100%, while for the video synopsis process, the synopsis video length compared to the original video is decreased by 68% to 93.2%, and the interacting tube pairs are preserved in the synopsis video by 78.6% to 100%.