Digital signage is widely used in digital-out-of-home (DOOH) advertising for marketing and business. Recently, combining a digital camera with digital signage has enabled advertisers to gather audience demographics for audience measurement, which helps them understand audience behavior and improve their business strategies. When an audience member faces the digital display, a vision-based DOOH system processes the face and broadcasts a personalized advertisement. Most digital signage is deployed in uncontrolled public environments, which poses two main challenges for a vision-based DOOH system that tracks audience movement: multiple adjacent faces and occlusion by passers-by. In this paper, a new framework is proposed that combines digital signage with a depth camera to track multiple faces in a three-dimensional (3D) environment. The proposed framework extracts each audience member's face centroid position (x, y) and depth information (z) and plots them on an aerial map to simulate audience movement that corresponds to the real-world environment. The advertiser can then measure advertising effectiveness through the audience's behavior.
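The mapping from a face centroid and its depth to a point on the aerial map can be illustrated with a minimal sketch. It assumes a standard pinhole camera model, which the abstract does not specify; the function name `to_aerial_map`, the `AerialPoint` type, and the intrinsic parameters `fx` (focal length in pixels) and `cx` (principal point column) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AerialPoint:
    x_m: float  # lateral offset from the camera's optical axis, in metres
    z_m: float  # distance from the camera along the optical axis, in metres


def to_aerial_map(u: float, v: float, depth_m: float,
                  fx: float, cx: float) -> AerialPoint:
    """Project a face centroid (u, v) in pixels with depth_m onto a top-down map.

    Assumption: pinhole model, so lateral offset = (u - cx) * depth / fx.
    The vertical coordinate v is not needed for a top-down view and is ignored.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return AerialPoint(x_m=(u - cx) * depth_m / fx, z_m=depth_m)


# Hypothetical example: two faces seen by a 640-pixel-wide depth camera.
faces = [(320.0, 240.0, 2.0), (420.0, 200.0, 2.0)]
points = [to_aerial_map(u, v, z, fx=500.0, cx=320.0) for u, v, z in faces]
```

Plotting the resulting `(x_m, z_m)` pairs frame by frame would give a top-down trace of each tracked face; handling adjacent faces and occlusion is outside the scope of this sketch.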