Adapting source code to the specific characteristics of its host hardware is one way to optimize software. It lets a program take advantage of processor features designed to improve system performance. To achieve such a software/hardware fit without restricting the optimization to a few executions, one needs relevant performance models of the target hardware. This paper proposes a new method for optimizing software kernels based on their data-access patterns. The method builds a data-cache-miss model of a given application from its specific memory-access pattern. We apply it to evaluate several custom implementations of matrix data layouts. To validate the functional correctness of the generated models, we propose a reference algorithm that simulates a kernel's traversal of its data. Experimental results show that the proposed data alignment reduces the number of cache misses by up to 50% and the execution time by up to 30%. Finally, we show the need to integrate the impact of the Translation Lookaside Buffer (TLB) and the memory prefetcher into our performance models.
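To make the idea of simulating a kernel's traversal of its data concrete, the following is a minimal sketch that counts data-cache misses for two traversal orders of a square matrix. It is not the reference algorithm of this paper: the direct-mapped cache organization, the cache and line sizes, the 8-byte element size, and the names `count_misses`, `row_major_trace`, and `column_major_trace` are all assumptions chosen for illustration.

```python
def count_misses(addresses, cache_size=1024, line_size=64):
    """Count misses of a byte-address trace in an assumed direct-mapped cache."""
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this byte
        index = block % num_lines      # cache line this block maps to
        if tags[index] != block:       # block not resident: miss, then load it
            tags[index] = block
            misses += 1
    return misses


def row_major_trace(n, elem_size=8):
    """Addresses touched when an n x n row-major matrix is read row by row."""
    return [(i * n + j) * elem_size for i in range(n) for j in range(n)]


def column_major_trace(n, elem_size=8):
    """Addresses touched when the same row-major matrix is read column by column."""
    return [(i * n + j) * elem_size for j in range(n) for i in range(n)]


n = 64  # hypothetical matrix dimension
row_misses = count_misses(row_major_trace(n))
col_misses = count_misses(column_major_trace(n))
print("row-wise traversal misses:   ", row_misses)
print("column-wise traversal misses:", col_misses)
```

Under these assumed parameters, the row-wise traversal misses once per 64-byte line (512 misses), whereas the column-wise traversal misses on every one of its 4096 accesses, because consecutive accesses are 512 bytes apart and keep evicting each other from the same two cache lines. A real model would also need to account for associativity, the replacement policy, the TLB, and the prefetcher.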