Processors with multiple sockets or chiplets are becoming increasingly common. These processors usually expose a single shared address space. However, due to hardware restrictions, they adopt a non-uniform memory access (NUMA) design, in which each processor accesses its local memory faster than remote memory. Reducing data motion is crucial to improving overall performance, so computations must run as close as possible to where their data resides. We propose a new approach that mitigates the NUMA effect on such systems. Our solution is based on OmpSs-2, a task-based parallel programming model similar to OpenMP. We first provide a simple API to allocate memory in NUMA systems using different policies. Then, by combining user-provided information that specifies the dependences between tasks with information collected in a global directory when data is allocated, we extend our runtime library to perform NUMA-aware work scheduling. Our heuristic considers data location, the distance between NUMA nodes, and the load of each NUMA node to seamlessly minimize both data motion costs and load imbalance. Our evaluation shows that our NUMA support can significantly mitigate the NUMA effect by reducing the number of remote accesses, thereby improving performance on most benchmarks, with speedups of up to 2x on a 2-NUMA machine and up to 7.1x on an 8-NUMA machine.
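To make the three inputs of the scheduling heuristic concrete (data location, inter-node distance, and per-node load), the following is a minimal illustrative sketch, not the OmpSs-2 runtime's implementation. The function name `pick_numa_node`, its parameters, the additive cost formula, and the `load_penalty` weight are all assumptions introduced here for illustration; the abstract does not specify how the three factors are combined.

```python
from typing import Sequence


def pick_numa_node(
    bytes_on_node: Sequence[int],
    distance: Sequence[Sequence[float]],
    queued_tasks: Sequence[int],
    load_penalty: float = 0.0,
) -> int:
    """Return the NUMA node with the lowest estimated cost for one task.

    Hypothetical cost model (an assumption, not taken from the paper):
      cost(n) = sum_m bytes_on_node[m] * distance[n][m]
                + load_penalty * queued_tasks[n]
    """
    num_nodes = len(bytes_on_node)
    best_node, best_cost = 0, float("inf")
    for candidate in range(num_nodes):
        # Data-motion term: each byte the task touches is weighted by the
        # distance between the candidate node and the node that holds it.
        motion = sum(
            bytes_on_node[m] * distance[candidate][m] for m in range(num_nodes)
        )
        # Load term: discourage nodes that already have many queued tasks.
        cost = motion + load_penalty * queued_tasks[candidate]
        if cost < best_cost:
            best_node, best_cost = candidate, cost
    return best_node


# Hypothetical 2-node machine: local distance 10, remote distance 20.
DIST = [[10, 20], [20, 10]]

# Most of the task's data lives on node 0 and both nodes are idle.
idle_choice = pick_numa_node([4096, 1024], DIST, [0, 0])

# Same data placement, but node 0 already has 8 queued tasks.
busy_choice = pick_numa_node([4096, 1024], DIST, [8, 0], load_penalty=5000.0)
```

In this sketch, `idle_choice` is node 0 because that is where most of the data resides, while `busy_choice` moves to node 1 once the assumed load penalty outweighs the extra remote-access cost. The trade-off between locality and balance is governed entirely by the hypothetical `load_penalty` weight.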