To navigate the social world, humans heavily rely on gaze for non-verbal communication as it conveys information in a highly dynamic and complex, yet concise manner: For instance, humans utilize gaze effortlessly to direct and infer the attention of a possible interaction partner. Many traditional paradigms in social gaze research though rely on static ways of assessing gaze interaction, e.g., by using images or prerecorded videos as stimulus material. Emerging gaze contingent paradigms, in which algorithmically controlled virtual characters can respond flexibly to the gaze behavior of humans, provide high ecological validity. Ideally, these are based on models of human behavior which allow for precise, parameterized characterization of behavior, and should include variable interactive settings and different communicative states of the interacting agents. The present study provides a complete definition and empirical description of a behavioral parameter space of human gaze behavior in extended gaze encounters. To this end, we (i) modeled a shared 2D virtual environment on a computer screen in which a human could interact via gaze with an agent and simultaneously presented objects to create instances of joint attention and (ii) determined quantitatively the free model parameters (temporal and probabilistic) of behavior within this environment to provide a first complete, detailed description of the behavioral parameter space governing joint attention. This knowledge is essential to enable the modeling of interacting agents with a high degree of ecological validity, be it for cognitive studies or applications in human-robot interaction.