The design of embedded vision systems, in confronting the "Memory Wall", raises many challenges, for example in design cost, energy consumption and performance. This paper considers a variant of the Job Shop Scheduling Problem with tooling constraints that arises in this context, in which the completion time (makespan) is to be minimized. This objective corresponds to the performance of the produced circuit. Given a set of tasks and a set of prerequisites, the problem is to schedule all the tasks. A task can be processed only if all of its prerequisites (a specific subset of the prerequisite set) are loaded in the buffers and remain available throughout its operation. We discuss several integer linear programming formulations and point out their characteristics, namely their size and the quality of their linear programming relaxation bound. To solve large instances of this scheduling problem, we compare three sets of approaches, including a Constraint Programming model, two constructive greedy heuristics (published in previous work), two LocalSolver models, a Simulated Annealing algorithm and a Beam Search algorithm. Numerical experiments are conducted on 16 benchmark instances from the literature as well as on 12 real-life non-linear image processing kernels to validate the efficiency of these approaches.
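
As an illustration of the availability condition stated above, the following minimal Python sketch checks whether a candidate schedule keeps every prerequisite of a task resident in the buffers for the whole duration of that task, and reports the makespan. The data layout, the function names (`is_covered`, `check_schedule`) and the half-open integer time intervals are assumptions made for this illustration only; buffer capacity and loading costs are deliberately left out, and the sketch is not the formulation studied in the paper.

```python
from typing import Dict, List, Set, Tuple

# Assumption: time is measured in abstract integer units and every
# interval is half-open, i.e. (start, end) covers start <= t < end.
Interval = Tuple[int, int]


def is_covered(window: Interval, residency: List[Interval]) -> bool:
    """Return True if one residency interval contains the whole window."""
    start, end = window
    return any(lo <= start and end <= hi for lo, hi in residency)


def check_schedule(
    requires: Dict[str, Set[str]],
    task_windows: Dict[str, Interval],
    buffer_residency: Dict[str, List[Interval]],
) -> Tuple[bool, int]:
    """Check the availability condition and compute the makespan.

    requires         -- hypothetical map: task -> prerequisites it needs
    task_windows     -- hypothetical map: task -> (start, end) of its operation
    buffer_residency -- hypothetical map: prerequisite -> intervals during
                        which it is loaded in a buffer
    """
    # Every task must be scheduled.
    all_scheduled = set(requires) <= set(task_windows)
    # Every prerequisite must stay loaded during the whole operation.
    feasible = all_scheduled and all(
        is_covered(task_windows[task], buffer_residency.get(prereq, []))
        for task, prereqs in requires.items()
        for prereq in prereqs
    )
    # Makespan: completion time of the last task (0 for an empty schedule).
    makespan = max((end for _, end in task_windows.values()), default=0)
    return feasible, makespan


requires = {"t1": {"a", "b"}, "t2": {"b", "c"}}
task_windows = {"t1": (0, 3), "t2": (3, 5)}
good = {"a": [(0, 3)], "b": [(0, 5)], "c": [(2, 5)]}
bad = {"a": [(0, 3)], "b": [(0, 5)], "c": [(4, 5)]}  # "c" is loaded too late

print(check_schedule(requires, task_windows, good))
print(check_schedule(requires, task_windows, bad))
```

In the second call, prerequisite `c` is loaded only after task `t2` has started, so the schedule violates the availability condition even though its makespan is unchanged.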