To make weather and climate models computationally affordable, small-scale processes are usually represented in terms of the large-scale, explicitly resolved processes using physics-based or semi-empirical parameterization schemes. Another approach, computationally more demanding but often more accurate, is super-parameterization (SP). SP involves integrating the equations of small-scale processes on high-resolution grids embedded within the low-resolution grid of large-scale processes. Recently, studies have used machine learning (ML) to develop data-driven parameterization (DD-P) schemes. Here, we propose a new approach, data-driven SP (DD-SP), in which the equations of the small-scale processes are integrated using a data-driven (and thus inexpensive) ML method such as a recurrent neural network. Employing multiscale Lorenz 96 systems as the testbed, we compare the cost and accuracy (in terms of both short-term prediction and long-term statistics) of parameterized low-resolution (PLR), SP, DD-P, and DD-SP models. We show that, at the same computational cost, DD-SP substantially outperforms PLR and is more accurate than DD-P, particularly when scale separation is lacking. DD-SP is much cheaper than SP, yet its accuracy is the same in reproducing long-term statistics (climate prediction) and often comparable in short-term forecasting (weather prediction). We also investigate generalization: when models trained on data from one system are applied to a more chaotic system, we find that they often do not generalize, particularly in terms of short-term prediction accuracy. However, we show that transfer learning, which involves retraining the data-driven model with a small amount of data from the new system, significantly improves generalization. Potential applications of DD-SP and transfer learning in climate/weather modeling are discussed.
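For readers unfamiliar with the testbed, the following is a minimal sketch of a multiscale Lorenz 96 system with slow (large-scale) variables coupled to fast (small-scale) variables. It assumes the commonly used two-scale form and illustrative parameter values (`F`, `h`, `b`, `c`, `K`, `J`); the number of tiers and the parameters used in this study are not stated in this section, so none of these choices should be read as the paper's configuration. The function names are hypothetical.

```python
import numpy as np


def l96_two_scale_tendency(X, Y, F=20.0, h=1.0, b=10.0, c=10.0):
    """Time derivatives of a two-scale Lorenz 96 system (assumed form).

    X: slow variables, shape (K,), periodic in k.
    Y: fast variables, shape (K * J,), J per slow variable, periodic overall.
    F, h, b, c: forcing, coupling strength, amplitude ratio, time-scale ratio
    (illustrative values, not taken from the paper).
    """
    K = X.size
    J = Y.size // K
    # Feedback of the fast variables onto each slow variable.
    coupling = (h * c / b) * Y.reshape(K, J).sum(axis=1)
    # np.roll(X, 1)[k] is X[k-1]; np.roll(X, -1)[k] is X[k+1].
    dX = np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2)) - X + F - coupling
    dY = (
        -c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
        - c * Y
        + (h * c / b) * np.repeat(X, J)
    )
    return dX, dY


def rk4_step(X, Y, dt, **params):
    """Advance both scales by one classical fourth-order Runge-Kutta step."""
    k1x, k1y = l96_two_scale_tendency(X, Y, **params)
    k2x, k2y = l96_two_scale_tendency(X + 0.5 * dt * k1x, Y + 0.5 * dt * k1y, **params)
    k3x, k3y = l96_two_scale_tendency(X + 0.5 * dt * k2x, Y + 0.5 * dt * k2y, **params)
    k4x, k4y = l96_two_scale_tendency(X + dt * k3x, Y + dt * k3y, **params)
    X_new = X + dt / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
    Y_new = Y + dt / 6.0 * (k1y + 2.0 * k2y + 2.0 * k3y + k4y)
    return X_new, Y_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, J = 8, 8  # illustrative sizes
    X = rng.standard_normal(K)
    Y = 0.1 * rng.standard_normal(K * J)
    for _ in range(100):
        X, Y = rk4_step(X, Y, dt=0.001)
    print(X)
```

In this picture, a fully resolved model integrates both `X` and `Y` numerically, a parameterized model replaces the `coupling` term with a function of `X` alone, and a DD-SP-style model would, in principle, replace the numerical update of `Y` with a trained surrogate while keeping the numerical update of `X`.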
Plain Language Summary
The weather/climate system involves intertwined physical processes acting on scales from centimeters (or even smaller) to tens of thousands of kilometers. Most weather/climate models used in practice include parameterization schemes that relate small-scale processes, which are not explicitly resolved (due to coarse spatiotemporal resolution), to large-scale processes that are resolved. Recently, studies have explored using machine learning for data-driven parameterization (DD-P) of small-scale (subgrid) processes. Here, we first introduce a novel way to leverage recent advances in deep learning to improve the modeling of subgrid processes. In this approach, called data-driven super-parameterization (DD-SP), deep learning is used for fast, data-driven integration of equations of small-scale processes, while other equations are integrated using conventional numerical methods. Employing a relatively simple chaotic system, we show the advantages of DD-SP over DD-P and conventional parameterizations. Second, we examine how these data-driven models generalize (extrapolate) from one system to other (e.g., more chaotic) systems. We demonstrate that these mod...