This article tackles the entire lifecycle of an algorithm, from its design to its implementation. It presents a method for making efficient choices at algorithm design time based on the characteristics of the target hardware. Computing the optical flow of a stream of images remains a demanding task. Meanwhile, the use of Graphics Processing Units (GPUs) has become mainstream and allows substantial gains in processing frame rate. In this paper, we focus on a specific variational method (CLG [1]) in which linear systems have to be solved; they depend on two parameters, α and ρ. To solve the problem efficiently, we study convergence speed with respect to the model's parameters. We benchmark standard linear solvers with preconditioners to identify the fastest in terms of convergence per iteration. We then show that, once implemented on GPUs, the most efficient solver changes depending on the model parameters. For 640 × 480 images, with the right choice of solver and parameters, our implementation solves the system to a relative accuracy of 10⁻⁷ in 0.25 ms on a Titan V GPU. All results are aggregated over a 30-image set to increase confidence in their generalizability.
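To make the benchmarking idea concrete, the sketch below shows one way to measure convergence per iteration of a preconditioned iterative solver down to a relative residual of 10⁻⁷. It is an illustrative example only, not the authors' implementation: the Jacobi-preconditioned conjugate gradient routine `pcg`, the stand-in test matrix (a 2-D Laplacian plus a diagonal term weighted by a hypothetical parameter `alpha_param`), and the chosen tolerances are all assumptions made for the sketch, not the actual CLG system or solver set used in the paper.

```python
# Minimal sketch: count iterations needed by a Jacobi-preconditioned CG solver
# to reach a relative residual below 1e-7 on a sparse SPD stand-in system.
import numpy as np
import scipy.sparse as sp

def pcg(A, b, M_inv_diag, rtol=1e-7, max_iter=1000):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r          # apply the preconditioner
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) / b_norm < rtol:
            return x, k + 1     # converged: return solution and iteration count
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical stand-in system: discrete Laplacian smoothness term plus a
# diagonal contribution weighted by an assumed parameter alpha_param.
n, alpha_param = 64, 0.05
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))
     + alpha_param * sp.eye(n * n)).tocsr()
b = np.random.default_rng(0).standard_normal(n * n)

x, iters = pcg(A, b, M_inv_diag=1.0 / A.diagonal())
print(f"converged in {iters} iterations to relative residual < 1e-7")
```

Timing the same loop for several solver/preconditioner pairs across a range of parameter values is the kind of per-iteration comparison the abstract refers to; the paper itself performs this on the CLG linear systems and on GPU hardware.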