Multi-user virtual environments, known as collaborative virtual environments (CVE), allow geographically distant people to engage in collaborative learning and team training. In this context, monitoring collaboration provides valuable and timely information on individual and group indicators, useful to human instructors or intelligent tutoring systems. CVE enable people to share a virtual space and interact through avatars, generating nonverbal behavior such as gaze direction or deictic gestures, a potential means to understand collaboration. This chapter presents an automated model and its inference mechanisms to evaluate collaboration in CVE, based on expert human rules applied to participants' nonverbal activity. The model performs a multi-layer analysis comprising data filtering, fuzzy classification, and rule-based inference, producing a high-level assessment of group collaboration. The approach was applied to a task-oriented session in which two participants assembled cubes in a CVE to form a figure.
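The three layers named above (data filtering, fuzzy classification, rule-based inference) can be sketched as a minimal pipeline. This is an illustrative sketch only, not the chapter's implementation: the smoothing window, the triangular membership partitions, the choice of gaze ratio and gesture rate as inputs, and the expert rules themselves are all hypothetical placeholders.

```python
def moving_average(values, window=3):
    """Layer 1 (data filtering): smooth noisy tracking signals."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(value):
    """Layer 2 (fuzzy classification): map a normalized indicator
    in [0, 1] to low/medium/high memberships (hypothetical partitions)."""
    return {"low":    tri(value, -0.4, 0.0, 0.5),
            "medium": tri(value,  0.2, 0.5, 0.8),
            "high":   tri(value,  0.5, 1.0, 1.4)}

def collaboration_level(gaze_ratio, gesture_rate):
    """Layer 3 (rule-based inference): Mamdani-style rules combining
    two nonverbal indicators into a collaboration score in [0, 1]."""
    gaze, gest = fuzzify(gaze_ratio), fuzzify(gesture_rate)
    # Hypothetical expert rules; rule strength = min of antecedents.
    rules = [
        (min(gaze["high"], gest["high"]), 1.0),      # strong collaboration
        (min(gaze["medium"], gest["medium"]), 0.5),  # moderate
        (max(gaze["low"], gest["low"]), 0.0),        # weak collaboration
    ]
    num = sum(strength * level for strength, level in rules)
    den = sum(strength for strength, _ in rules)
    return num / den if den else 0.0
```

With these placeholder rules, a pair who look at each other and gesture frequently (`collaboration_level(0.9, 0.9)`) scores higher than a disengaged pair (`collaboration_level(0.1, 0.1)`); the actual indicators and rule base would come from the expert human rules described in the chapter.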