Coupled climate simulations that span several hundred years cannot be run at a high-enough spatial resolution to resolve mesoscale ocean dynamics. These mesoscale dynamics backscatter to macroscales. Recently, several studies have considered Deep Learning to parameterize subgrid forcing within macroscale ocean equations using data from idealized simulations. In this manuscript, we present a stochastic Deep Learning parameterization that is trained on data generated by CM2.6, a highresolution state-of-the-art coupled climate model with nominal resolution 1/10°. We train a Convolutional Neural Network for the subgrid momentum forcing using macroscale surface velocities from a few selected subdomains. At each location and each time step of the coarse grid, rather than predicting a single number, we predict the mean and standard deviation of a Gaussian probability distribution. This approach requires training our neural network to minimize a negative log-likelihood loss function rather than the Mean Square Error, which has been the standard in applications of Deep Learning to the problem of parameterizations. Each prediction of the mean subgrid forcing can be associated with an uncertainty estimate and can form the basis for a stochastic subgrid parameterization. Offline tests show that our parameterization generalizes well to the global oceans, and a climate with increased CO2 levels, without further training. We test our stochastic parameterization in an idealized shallow water model. The implementation is stable and improves some statistics of the flow. Our work demonstrates the potential of combining Deep Learning tools with a probabilistic approach in parameterizing unresolved ocean dynamics.