We present the parallel design and performance of the nested filtering factorization preconditioner (NFF), which can be used for solving linear systems arising from the discretization of a system of PDEs on unstructured grids. NFF has limited memory requirements, and it is based on a two level recursive decomposition that exploits a nested block arrow structure of the input matrix, obtained priorly by using graph partitioning techniques. It also allows to preserve several directions of interest of the input matrix to alleviate the effect of low frequency modes on the convergence of iterative methods. For a boundary value problem with highly heterogeneous coefficients, discretized on three-dimensional grids with 64 millions unknowns and 447 millions nonzero entries, we show experimentally that NFF scales up to 2048 cores of Genci's Bull system (Curie), and it is up to 2.6 times faster than the domain decomposition preconditioner Restricted Additive Schwarz implemented in PETSc.