Scalable heterogeneous computing (SHC) architectures are emerging in response to new requirements for low cost, power efficiency, and high performance. For example, many contemporary HPC systems use commodity Graphics Processing Units (GPUs) to supplement traditional multicore processors. Yet scientists continue to face challenges in using SHC systems. First and foremost, they must combine several programming models and then carefully optimize data movement among these models on each architecture. In this paper, we investigate a programming model for SHC systems that unifies access to the aggregate memory of the GPUs in the system. In particular, we extend the popular and easy-to-use Global Address Space (GAS) programming model to SHC systems. We explore multiple implementation options and demonstrate our solution, GA-GPU, in the context of Global Arrays, a library-based GAS model. We evaluate these options using kernels and NWChem, a scalable computational chemistry application. Our results show that GA-GPU offers considerable benefit to users in terms of programmability, and both our empirical results and our performance model project encouraging performance for future systems with a tightly integrated memory system.
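
To make the programming model concrete, the following is a minimal sketch of the standard Global Arrays C interface that GA-GPU builds on; the GPU-resident variant described in this paper is not shown, and the array name, dimensions, and memory sizes are illustrative assumptions only.

    /* Minimal Global Arrays example: create a distributed 2-D array and
     * access an arbitrary patch from any process via one-sided put/get.
     * With GA-GPU, the same calls would target aggregate GPU memory. */
    #include <stdio.h>
    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        GA_Initialize();
        MA_init(C_DBL, 1000000, 1000000);   /* internal memory allocator */

        int dims[2]  = {1024, 1024};        /* global matrix dimensions (illustrative) */
        int chunk[2] = {-1, -1};            /* let GA choose the distribution */
        int g_a = NGA_Create(C_DBL, 2, dims, "A", chunk);

        /* Any process may read or write any patch of the global array,
         * independent of which node physically holds the data. */
        double buf[16];
        int lo[2] = {0, 0}, hi[2] = {3, 3}, ld[1] = {4};
        for (int i = 0; i < 16; i++) buf[i] = (double)i;
        NGA_Put(g_a, lo, hi, buf, ld);
        GA_Sync();
        NGA_Get(g_a, lo, hi, buf, ld);

        if (GA_Nodeid() == 0)
            printf("buf[0] = %f\n", buf[0]);

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
        return 0;
    }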