Due to its rich memory model, the partitioned global address space (PGAS) parallel programming model strikes a balance between locality-awareness and the ease of use of the global address space model. Although locality-awareness can lead to high performance, supporting the PGAS memory model is associated with penalties that can hinder PGAS's potential for scalability and speed of execution. This is because mapping the PGAS memory model to the underlying system requires a mapping process that is done in software, thereby introducing substantial overhead for shared accesses even when they are local. Compiler optimizations have not been sufficient to offset this overhead. On the other hand, manual code optimizations can help, but this eliminates the productivity edge of PGAS. This article proposes a processor microarchitecture extension that can perform such address mapping in hardware with nearly no performance overhead. These extensions are then availed to compilers through extensions to the processor instructions. Thus, the need for manual optimizations is eliminated and the productivity of PGAS languages is unleashed. Using Unified Parallel C (UPC), a PGAS language, we present a case study of a prototype compiler and architecture support. Two different implementations of the system were realized. The first uses a full-system simulator, gem5, which evaluates the overall performance gain of the new hardware support. The second uses an FPGA Leon3 soft-core processor to verify implementation feasibility and to parameterize the cost of the new hardware. The new instructions show promising results on all tested codes, including the NAS Parallel Benchmark kernels in UPC. Performance improvements of up to 5.5× for unmodified codes, sometimes surpassing handoptimized performance, were demonstrated. We also show that our four-core FPGA prototype requires less than 2.4% of the overall chip's area.