Effectively representing an object’s position and depth in augmented reality (AR) is crucial not only for realism, but also for enabling AR’s wider adoption in real-world applications. Domains such as architecture and building design cannot leverage AR’s advantages without effective representation of position. Prior work has examined how the human visual system perceives and interprets such cues in AR, but it has focused on application systems that use only a single AR modality, i.e., head-mounted display, tablet/handheld, or projection. Given the respective limitations of each modality regarding shared experience, stereo display, field of view, etc., prior work has overlooked the potential benefits of combining multiple AR modalities. By using multiple AR systems together, we can address the deficiencies of one modality by leveraging the strengths of another. This work examines methods for representing position in a multi-modal AR system consisting of a stereo head-mounted display and a ceiling-mounted projection system. Given that the AR content is rendered across two separate AR modalities, how does the user know which projected object corresponds to the object shown in their head-mounted display? We explore representations that correlate and fuse objects across modalities. In this paper, we review previous work on position and depth in AR, and then describe multiple representations for head-mounted and projector-based AR that can be paired across modalities. To the authors’ knowledge, this work represents the first step towards combining multiple AR modalities in which the content of each modality is designed explicitly to compensate for the deficiencies of the other.