Many scientific software applications, that solve complex compute-or data-intensive problems, such as large parallel simulations of physics phenomena, increasingly use HPC systems in order to achieve scientifically relevant results. An increasing number of HPC systems adopt heterogeneous node architectures, combining traditional multi-core CPUs with energy-efficient massively parallel accelerators, such as GPUs. The need to exploit the computing power of these systems, in conjunction with the lack of standardization in their hardware and/or programming frameworks, raises new issues with respect to scientific software development choices, which strongly impact software maintainability, portability and performance. Several new programming environments have been introduced recently, in order to address these issues. In particular, the OpenACC programming standard has been designed to ease the software development process for codes targeted for heterogeneous machines, helping to achieve code and performance portability. In this paper we present, as a specific OpenACC use case, an example of design, porting and optimization of an LQCD Monte Carlo code intended to be portable across present and future heterogeneous HPC architectures; we describe the design process, the most critical design choices and evaluate the tradeoff between portability and efficiency.