In surveillance applications, humans and vehicles are the most commonly studied elements. Consequently, detecting and matching a person or a car that appears in several videos is a key problem. Many algorithms have been introduced, and a major related problem today is to precisely evaluate and compare these algorithms against a common ground truth. In this paper, our goal is to introduce a new dataset for evaluating multi-view methods. This dataset aims to pave the way for multidisciplinary approaches and applications such as 4D scene reconstruction, object identification/tracking, audio event detection, and multi-source metadata modeling and querying. To this end, we provide two sets of 25 synchronized videos with audio tracks, all depicting the same scene from multiple viewpoints; each set follows a detailed scenario consisting of the comings and goings of people and cars. Every video was annotated by regularly drawing bounding boxes around every moving object, each with a flag indicating whether the object is fully visible or occluded, its category (human or vehicle), visual details (for example, clothing types or colors), and the timestamps of its appearances and disappearances. Audio events are likewise annotated with a category and timestamps.
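To make the annotation structure concrete, the following is a minimal sketch of how such records could be represented in Python. All class and field names (BoundingBox, ObjectAnnotation, AudioEvent, and their attributes) are hypothetical illustrations chosen for this sketch; they are not the dataset's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BoundingBox:
    """One bounding box drawn on a single frame (hypothetical fields)."""
    frame: int       # frame index within the video
    x: int           # top-left corner, in pixels
    y: int
    width: int
    height: int
    occluded: bool   # True if the object is partially or fully hidden

@dataclass
class ObjectAnnotation:
    """Annotation track of one moving object in one video (hypothetical schema)."""
    object_id: int
    category: str                   # "human" or "vehicle"
    details: List[str]              # visual details, e.g. ["red jacket", "jeans"]
    boxes: List[BoundingBox] = field(default_factory=list)
    # (appearance, disappearance) timestamps in seconds, one pair per interval
    intervals: List[Tuple[float, float]] = field(default_factory=list)

@dataclass
class AudioEvent:
    """One annotated audio event (hypothetical schema)."""
    category: str   # e.g. "car engine", "door slam"
    start: float    # start timestamp, in seconds
    end: float      # end timestamp, in seconds
```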