In the context of data visualization, as in other grounded settings, referents are created by the task in which the agents engage and are salient because they belong to the shared physical setting. Our focus is on resolving references to visualizations on large displays; crucially, reference resolution is directly involved in the process of creating new entities, namely new visualizations. First, we developed a reference resolution model for a conversational assistant, trained on controlled dialogues about data visualizations involving a single user. Second, we ported the conversational assistant, including its reference resolution model, to a different domain, supporting two users collaborating on a data exploration task. We explore how the new setting affects reference detection and resolution, compare performance in the controlled vs. unconstrained settings, and discuss the general lessons we draw from this adaptation.

* Co-first authors

Large displays better support exploration and collaboration (Andrews et al., 2011; Rupprecht et al., 2019; Lischke et al., 2020). In this paper, we focus on new entity establishment via reference in such contexts. We start from the Chicago-Crime-Vis corpus we collected a few years ago (Kumar et al., 2017), in which a user exploring crime data in Chicago interacts with a Visualization Expert (VE) whom they know to be a person generating visualizations on the screen remotely from a separate room. On the basis of Chicago-Crime-Vis, we designed and developed a version of our assistant called Articulate2 (Kumar et al., 2020). We will report the performance of Articulate2 on reference resolution, and especially reference establishment, with respect to the transcribed and annotated Chicago-Crime-Vis corpus, evaluated in an offline manner.
The second part of our paper discusses the challenges that arose when we ported Articulate2 to a new setting: two collaborators work together to assess COVID policies given geographic and demographic features of the data, and interact exclusively with the deployed Articulate+ (see Figure 1). We will illustrate the many issues that degrade performance, from speech processing errors, to the adaptation of models to new domains, to the inherently more complex setting in which the assistant now behaves like an overhearer of somebody else's conversation. For clarity, we will refer to Articulate2 in the city crime domain as Art-City-Asst, and to Articulate+ in the COVID domain as Art-COVID-Asst. A disclaimer before we proceed: the purpose

[Table header residue: Chicago-Crime-Vis (H) | COVID (A) | COVID (T)]