Synchronized video editing, or mashup generation from multiple synchronized videos, has gained much attention due to its efficiency and low cost in processing videos to convey information. However, few existing methods focus on generating a personal mashup over a complete timeline from synchronized surveillance videos, which is increasingly in demand for effectively presenting personal activities without violating the privacy of others. To fill this gap, we develop a Reinforcement Learning (RL)-based personal mashup generation system, which assesses frame quality at a semantic level and formulates view selection as an RL problem to improve the efficiency of retrieving mashups with arbitrary starting points. Furthermore, we propose a framing objective to perform spatial editing, which enables the views to automatically zoom in and out so as to present the target people more comprehensively. Both qualitative and quantitative analyses are presented to demonstrate the effectiveness of the proposed frame quality measurements, the RL-based algorithm, and the framing objective.