In this paper, we address the problem of detecting conversation scenes in feature films and propose an efficient and robust method for this task. The method combines the structural information of movie scenes with low-level and mid-level features. We propose, and demonstrate, that a Finite State Machine (FSM) is suitable for detecting movie scenes with conversational settings. Two major characteristics of motion pictures, motion and audio, are used in our approach. The transitions of the FSM are determined by two mid-level features of each shot in the scene: the activity intensity and the face identity. Our FSM has been tested on over 50 clips, including both positive and negative examples, and produces convincing results.
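
As a rough illustration of how shot-level features could drive such an FSM, the following minimal sketch may help. It is not the machine proposed in this paper: the state names, the `Shot` type, the `ACTIVITY_THRESHOLD` value, and the acceptance rule (a minimum number of speaker changes across consecutive low-activity shots) are all assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Shot:
    """Hypothetical per-shot mid-level features (names are assumptions)."""
    activity: float          # activity intensity, assumed normalized to [0, 1]
    face_id: Optional[str]   # face identity label, or None if no face was found


ACTIVITY_THRESHOLD = 0.5     # assumed cutoff between "calm" and "active" shots


def is_conversation(shots: Iterable[Shot], min_changes: int = 2) -> bool:
    """Return True if the shot sequence reaches the (assumed) accept state.

    States: START -> SPEAKER -> accept. A calm shot with a face moves the
    machine forward; an active or faceless shot sends it back to START.
    """
    state = "START"
    last_face: Optional[str] = None
    changes = 0

    for shot in shots:
        if shot.activity > ACTIVITY_THRESHOLD or shot.face_id is None:
            # Active or faceless shot: reset the machine.
            state, last_face, changes = "START", None, 0
            continue

        if state == "START":
            # First calm shot with a face: remember who is on screen.
            state, last_face = "SPEAKER", shot.face_id
        elif shot.face_id != last_face:
            # A different face on a calm shot counts as one speaker change.
            changes += 1
            last_face = shot.face_id

        if changes >= min_changes:
            return True  # accept: enough alternation between faces

    return False


dialogue = [Shot(0.1, "A"), Shot(0.2, "B"), Shot(0.1, "A")]
chase = [Shot(0.9, None), Shot(0.8, "A"), Shot(0.95, None)]
print(is_conversation(dialogue), is_conversation(chase))
```

In this sketch the two features play distinct roles: activity intensity gates whether a shot can belong to a conversation at all, while face identity supplies the alternation pattern that the machine counts.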