This paper addresses the so-called split and shift methodology, which deals with the implementation of kernels whose size exceeds the physically implemented resources (local connections and weighting circuits) on synchronous cellular processor arrays (CPAs). This covers both the realization of large neighborhood operations and the reduction of the available hardware in order to lower area consumption. Two main goals are pursued in the development of the methodology: (1) minimum penalty in processing time and (2) absolutely no penalty at the functional level. The paper presents different techniques and guidelines for applying the methodology and introduces a Figure of Merit that evaluates them by relating area gains to time penalty. This, along with a kernel shape analysis, led us to propose more adequate configurations of weighting circuits and to justify the classical choice of North-East-West-South connectivity. To validate the methodology, we carry out several estimates on actual physical implementations, and we propose the realization on CPAs of the spin filters, scale invariant feature transform and speeded-up robust features algorithms. A more in-depth trade-off analysis is carried out on the implementation of the pixel level snakes algorithm.

... but it can interact with separated PEs, thanks to the propagative effects of the array dynamics of kernel application. The basic characteristic of local connectivity, together with its simple SIMD control, makes this kind of system very suitable for hardware implementation. However, as a consequence, the natural size of the kernels to be applied is limited to the smallest one (3 × 3).
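The propagative effect mentioned above can be illustrated with a minimal sketch, assuming plain full convolution with zero-padded borders: two passes of a 3 × 3 kernel reach the same 5 × 5 neighborhood as one pass of a single larger kernel. The helper `conv2d_full`, the binomial smoothing kernel and the random test image are hypothetical choices made only for this illustration; they are not taken from the paper, and the sketch does not reproduce the split and shift methodology itself.

```python
import numpy as np


def conv2d_full(a, k):
    """Full 2-D discrete convolution of a with k (borders treated as zero)."""
    ah, aw = a.shape
    kh, kw = k.shape
    out = np.zeros((ah + kh - 1, aw + kw - 1), dtype=float)
    # Accumulate one shifted, weighted copy of the input per kernel coefficient.
    for i in range(kh):
        for j in range(kw):
            out[i:i + ah, j:j + aw] += k[i, j] * a
    return out


# Illustrative 3 x 3 low-pass (binomial) kernel; an assumption, not from the paper.
k3 = np.array([[1, 2, 1],
               [2, 4, 2],
               [1, 2, 1]], dtype=float) / 16.0

# Composing two 3 x 3 kernels yields the equivalent 5 x 5 kernel.
k5 = conv2d_full(k3, k3)

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Two local passes versus one large-neighborhood pass.
two_pass = conv2d_full(conv2d_full(image, k3), k3)
one_pass = conv2d_full(image, k5)

print(k5.shape)
print(np.allclose(two_pass, one_pass))
```

The equality holds because convolution is associative. The sketch only goes from small kernels to a large one; on the array itself, each pass is still restricted to the 3 × 3 neighborhood.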
This size limit is, on the other hand, an important restriction on the functionality of a CPA, as larger neighborhoods are needed in several image processing primitives such as diffusion or low-pass filtering operations [1], halftoning [2], texture analysis [3], or matching and hit-and-miss operations [4]. Some of these are used in modern scale- and rotation-invariant feature extractors such as the scale invariant feature transform (SIFT) and speeded-up robust features (SURF) [5,6].

As indicated, a CPA can realize global processing that takes the whole image information into account, thanks to the propagative effects of the architecture. Accordingly, the interaction between remote neighbors can be solved through the recursive application of templates. In fact, a recursive process can be summarized as the application of a single large neighborhood (LN) template. Nevertheless, the inverse process, the decomposition of an LN template into minimum-sized templates, is not trivial, and different approaches have been developed to deal with this issue.

The challenge is then to implement any kind of LN operation while keeping the local connectivity and with affordable penalties in performance. Our goal is, in addition, to do so with the minimum impact on the architecture at the hardware level.

On the other hand, on focal-plane processors with a pixel-to-PE assignment, the area occupation is not only a matter of cost or h...