The depth of field (DoF) of localization microscopes can be extended by placing a phase mask in the aperture stop of the objective. To optimize these masks and characterize their performance, defocus is generally modeled by a simple quadratic pupil phase term. However, this model ignores two essential characteristics of localization microscopy setups: a very high numerical aperture (NA) and a mismatch between the refractive indices of the immersion liquid and the sample. Using the more realistic high-NA image formation model of Gibson & Lanni (GL), we show that the DoF extension is simply reduced by an NA-dependent scaling factor. We also show that, provided this scaling factor is taken into account, masks optimized with the approximate quadratic model remain nearly optimal in the framework of the GL model. This result is important because it establishes that generic optimized masks can be used in setups with different NAs and immersion indices.
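
For orientation, the quadratic defocus term mentioned above can be compared with the non-paraxial defocus phase of a single homogeneous medium. This is only an illustrative sketch and not the GL model itself, which additionally accounts for the index mismatch between layers. The notation is assumed here and not taken from the text: $\rho \in [0,1]$ is the normalized pupil radius, $z$ the axial displacement, $\lambda$ the wavelength, $n$ the refractive index, and $\psi$ the defocus parameter; sign conventions vary between authors.

```latex
% Quadratic (paraxial) defocus model: a single parameter \psi
\varphi_{\mathrm{quad}}(\rho) = \psi\,\rho^{2}

% Non-paraxial defocus phase in a single medium of index n (illustrative)
\varphi_{\mathrm{NA}}(\rho) = \frac{2\pi}{\lambda}\, n\, z\,
    \sqrt{1 - \left(\frac{\mathrm{NA}\,\rho}{n}\right)^{2}}

% Expanding the square root for \mathrm{NA}\,\rho / n \ll 1:
\varphi_{\mathrm{NA}}(\rho) \approx \frac{2\pi}{\lambda}\, n\, z
    \;-\; \frac{\pi\,\mathrm{NA}^{2}\, z}{\lambda\, n}\,\rho^{2}
    \;-\; \dots
```

The first term is a constant phase with no effect on the image intensity, and the second recovers the quadratic model with $|\psi| = \pi\,\mathrm{NA}^{2}\,|z| / (\lambda n)$. The neglected higher-order terms grow as NA approaches $n$, which is the regime where the quadratic approximation becomes questionable.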