The Volterra framework plays an important role in digital pre-distortion (DPD) for power amplifier (PA) linearization. In practice, the full-Volterra (FV) model is avoided because of the so-called curse of dimensionality, which has motivated several Volterra pruning techniques in the literature. However, choosing the pruned-Volterra structure of a given complexity that most accurately describes a PA or DPD with unknown characteristics often requires trial and error. This article proposes a reduced-Volterra model that extends traditional memory polynomials, e.g., the generalized memory polynomial (GMP), by including higher-dimensional terms in a flexible and controlled way. This avoids the curse of dimensionality of the Volterra model and yields a model that can be applied to a wide range of PAs and DPDs. Furthermore, the article investigates the parsimonious sizing of the proposed model using least absolute shrinkage and selection operator (LASSO) and sparse-group LASSO (SGL) convex optimization, which significantly reduces the model's running cost while preserving its performance. In our experimental validation, the sparse versions of the proposed model achieved a 3 dB better adjacent channel power ratio (ACPR) than the conventional GMP model for the same or lower DPD running cost, at the expense of a more costly estimation, which is performed offline.
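To make the idea of LASSO-based sizing concrete, the following is a minimal sketch under stated assumptions, not the article's method. It uses a plain memory polynomial (simpler than the GMP or the proposed reduced-Volterra model), synthetic data, and a basic proximal-gradient (ISTA) solver chosen only for illustration; the sparse-group variant is not shown. The names `mp_basis`, `soft`, and `lasso_ista` are hypothetical helpers.

```python
import numpy as np

def mp_basis(x, K, M):
    """Memory-polynomial regressors x[n-m] * |x[n-m]|**k for m < M, k < K."""
    N = len(x)
    cols = []
    for m in range(M):
        # Delay the input by m samples, zero-padding the start.
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:N - m]])
        for k in range(K):
            cols.append(xm * np.abs(xm) ** k)
    return np.column_stack(cols)

def soft(w, t):
    """Complex soft-thresholding: shrink each magnitude by t, keep the phase."""
    mag = np.abs(w)
    return w * np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)

def lasso_ista(X, y, lam, iters=500):
    """Minimise 0.5*||y - X w||^2 + lam*||w||_1 by proximal gradient steps."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1], dtype=complex)
    for _ in range(iters):
        grad = X.conj().T @ (X @ w - y)
        w = soft(w - grad / L, lam / L)
    return w

# Synthetic example (assumed data, not measurements from the article).
rng = np.random.default_rng(0)
x = (rng.standard_normal(400) + 1j * rng.standard_normal(400)) / np.sqrt(2)
X = mp_basis(x, K=3, M=3)

# Assumed "true" system: a linear term plus a mild third-order term, no memory.
w_true = np.zeros(X.shape[1], dtype=complex)
w_true[0], w_true[2] = 1.0, -0.05 + 0.02j
y = X @ w_true

# Above lam_max the LASSO solution is all zeros; smaller values keep more terms.
lam_max = np.max(np.abs(X.conj().T @ y))
w_sparse = lasso_ista(X, y, 0.05 * lam_max)
w_zero = lasso_ista(X, y, 1.01 * lam_max)

print(np.count_nonzero(w_sparse), "of", X.shape[1], "coefficients kept")
```

The regularization weight trades accuracy against the number of retained coefficients, which is what determines the running cost of the resulting DPD. In this sketch the weight is set as a fraction of `lam_max` purely for convenience.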