Analyzing people's behavior in smart environments using multimodal sensors requires answering a set of typical questions: who are the people, where are they, what activities are they doing, when, with whom are they interacting, and how. In this view, locating people or their faces and characterizing them (e.g. extracting their body or head orientation) addresses the first two questions (who and where), and is usually one of the first steps before applying higher-level multimodal scene analysis algorithms that address the other questions. In the last ten years, tracking algorithms have made considerable progress, particularly in indoor environments or for specific applications, where they have reached a maturity that allows their deployment in real systems. Nevertheless, several issues can still make tracking difficult: background clutter; potentially small object size; complex shape, appearance, and motion, and their changes over time or across camera views; inaccurate or rough scene calibration, or inconsistent camera calibration between views for 3D tracking; and real-time processing requirements. In what follows, we discuss some important aspects of tracking algorithms, and finally introduce the remainder of the chapter.

Scenarios and Set-ups. Scenarios and application needs strongly influence the physical environment considered, and therefore the set-up (where, how many, and what type of sensors are used) and choice of a tracking