We parameterize sub‐grid scale (SGS) fluxes in sinusoidally forced two‐dimensional turbulence on the β‐plane at high Reynolds numbers (Re ∼25,000) using simple 2‐layer convolutional neural networks (CNNs) with only O(1000) parameters, two orders of magnitude fewer than in recent studies employing deeper CNNs with 8–10 layers; we obtain stable, accurate, and long‐term online (a posteriori) solutions at a 16× downscaling factor. Our methodology significantly improves training efficiency and the speed of online large eddy simulation runs, while offering insights into the physics of closure in such turbulent flows. Our approach benefits from an extensive hyperparameter search over the learning rate and weight decay coefficient, as well as from cyclical learning rate annealing, which leads to more robust and accurate online solutions than fixed learning rates. Our CNNs take either the coarse velocity or the coarse vorticity and strain fields as inputs, and output the two components of the deviatoric stress tensor, Sd. We minimize a loss between the SGS vorticity flux divergence computed from the high‐resolution solver and that obtained from the CNN‐modeled Sd, without requiring energy‐ or enstrophy‐preserving constraints. The success of shallow CNNs in accurately parameterizing this class of turbulent flows implies that the SGS stresses depend only weakly on non‐local features of the coarse fields; it also aligns with our physical conception that small scales are locally controlled by larger scales such as vortices and their strained filaments. Furthermore, 2‐layer CNN parameterizations are more likely to be interpretable.
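The loss described above can be illustrated with a minimal sketch. It rests on several assumptions that this abstract does not specify: a doubly periodic square domain of side 2π, pseudo-spectral derivatives, and the convention Sd = [[a, b], [b, -a]] with a = (τ11 - τ22)/2 and b = τ12. Under that convention the curl of the divergence of Sd reduces to (∂xx - ∂yy) b - 2 ∂xy a; the overall sign depends on how the SGS flux is defined. The function names are hypothetical, and a plain mean-squared error stands in for whatever loss the study actually uses.

```python
import numpy as np


def wavenumbers(n, length=2.0 * np.pi):
    """Angular wavenumber grids (kx, ky) for an n x n periodic domain."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    # indexing="ij": kx varies along axis 0 (x), ky along axis 1 (y)
    return np.meshgrid(k, k, indexing="ij")


def flux_divergence_from_stress(a, b, length=2.0 * np.pi):
    """Curl of div(Sd) for Sd = [[a, b], [b, -a]] (assumed convention).

    Returns (d_xx - d_yy) b - 2 d_xy a, evaluated spectrally.
    """
    kx, ky = wavenumbers(a.shape[0], length)
    a_hat = np.fft.fft2(a)
    b_hat = np.fft.fft2(b)
    # d/dx -> i*kx and d/dy -> i*ky, so
    #   (d_xx - d_yy) b -> (-kx^2 + ky^2) * b_hat
    #   -2 d_xy a       -> +2 * kx * ky * a_hat
    pi_hat = (-(kx**2) + ky**2) * b_hat + 2.0 * kx * ky * a_hat
    return np.real(np.fft.ifft2(pi_hat))


def mse_loss(pred, target):
    """Mean-squared error between two fields of equal shape."""
    return float(np.mean((pred - target) ** 2))


# Analytic check: a = cos(x + y), b = sin(2x)
n = 32
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
a_field = np.cos(X + Y)
b_field = np.sin(2.0 * X)
expected = -4.0 * np.sin(2.0 * X) + 2.0 * np.cos(X + Y)
modeled = flux_divergence_from_stress(a_field, b_field)
loss = mse_loss(modeled, expected)
```

In training, `a` and `b` would be the two CNN output channels and `expected` would be replaced by the flux divergence diagnosed from the filtered high-resolution solution. Because the isotropic part of the stress is a pure gradient, it drops out under the curl, which is consistent with the network needing to predict only the two deviatoric components.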