The ever-growing demand for more memory capacity from applications has always been a challenging factor in computer architecture. The advent of the Non Unified Memory Access (NUMA) architecture has achieved to work around the physical constraints of a single processor by providing more system memory using pools of processors, each with their own memory elements, but with variable access times. However, the efficient exploitation of such computing systems is a non-trivial task for software engineers. We have observed that the performance of more than half of the applications picked from two distinct benchmark suites is negatively affected when running on a NUMA machine, in the absence of manual tuning. This finding motivated us to develop a new profiling tool, so called PerfUtil, to study, characterize and better understand why those benchmarks have sub-optimal performance on NUMA machines. PerfUtil's effectiveness is based on its ability to track numerous events throughout the system at the managed runtime system level, that, ultimately, assists in demystifying NUMA peculiarities and accurately characterize managed applications profiles. CCS CONCEPTS • Software and its engineering → Software design engineering.