Permanent deformations in the lithosphere can occur in the brittle as well as in the ductile domain. For this reason, the inclusion of viscous creep and frictional plastic deformation is essential for geodynamic models. However, most currently available models of frictional plasticity are rate independent and therefore do not incorporate an internal length scale, which is an indispensable element for imposing a finite width on localized shear zones. Consequently, in computations of localization, whether analytical or numerical, the resulting shear zone widths tend to zero. In numerical computations, this manifests itself as a severe mesh sensitivity. Moreover, convergence of the global iterative procedure used to solve the nonlinear problem deteriorates, which reduces the reliability and quality of predictions. The viscosity that is inherent in deformation processes in the lithosphere can, in principle, remedy this mesh sensitivity. However, the elasto‐viscoplastic models commonly used in geodynamics assume a series arrangement of rheological elements (Maxwell‐type approach), which does not introduce an internal length scale. Here, we confirm that a different rheological arrangement, which puts a damper in parallel with the plastic slider (Kelvin‐type approach), does introduce an internal length scale. As a result, pressure, strain, and strain rate profiles across the shear bands converge to finite values as the grid spacing decreases. We demonstrate that this holds for nonassociated plasticity with constant frictional properties and with material softening with respect to cohesion. Finally, the introduction of Kelvin‐type viscoplasticity also significantly improves the global convergence of nonlinear solvers.
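To make the Kelvin‐type arrangement described above concrete, the following is a minimal sketch of a one-step stress update in which a damper acts in parallel with a frictional plastic slider. It is an illustration under stated assumptions, not the formulation used in this study: the stress is reduced to a single scalar invariant, the pressure is held fixed (zero dilatancy), no viscous creep element is placed in series, and the update is a backward-Euler step. The function name `kelvin_vp_return`, its parameters, and the viscoplastic viscosity `eta_vp` are hypothetical names introduced here.

```python
import math


def kelvin_vp_return(tau_trial, pressure, cohesion, phi, shear_modulus, dt, eta_vp):
    """One-step return mapping for a scalar Kelvin-type viscoplastic model.

    Assumed (hypothetical) yield condition with a damper parallel to the slider:
        F = tau - cohesion*cos(phi) - pressure*sin(phi) - eta_vp*lam_dot = 0
    Assumed elastic relaxation of the trial stress over the step dt:
        tau = tau_trial - shear_modulus*dt*lam_dot
    Returns the updated stress and the plastic multiplier rate lam_dot.
    """
    # Rate-independent frictional strength (pressure kept fixed in this sketch).
    tau_yield = cohesion * math.cos(phi) + pressure * math.sin(phi)
    f_trial = tau_trial - tau_yield
    if f_trial <= 0.0:
        # Trial state is inside the yield surface: no plastic flow.
        return tau_trial, 0.0
    # Effective "elastic viscosity" of the step.
    eta_e = shear_modulus * dt
    # Solving F = 0 for lam_dot; eta_vp adds to the denominator.
    lam_dot = f_trial / (eta_e + eta_vp)
    tau = tau_trial - eta_e * lam_dot
    return tau, lam_dot
```

With `eta_vp = 0` the stress returns exactly to the rate-independent yield strength. With `eta_vp > 0` the updated stress exceeds that strength by the overstress `eta_vp * lam_dot`, so the response becomes rate dependent; this rate dependence is the ingredient that the abstract associates with an internal length scale.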