Archetypes are typical population representatives in an extremal sense, where typicality is understood as the most extreme manifestation of a trait or feature. In linear feature space, archetypes approximate the convex hull of the data, allowing all data points to be expressed as convex mixtures of archetypes. However, it might not always be possible to identify meaningful archetypes in a given feature space. Because features are selected a priori, the resulting representation of the data might only be poorly approximated as a convex mixture. Simultaneously learning an appropriate feature space and identifying suitable archetypes addresses this problem. This paper introduces a generative formulation of the linear archetype model, parameterized by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a variational autoencoder, and an optimal representation with respect to the unknown archetypes can be learned end-to-end. The reformulation of linear archetypal analysis as a variational autoencoder naturally leads to an extension of the model to a deep variational information bottleneck, allowing the incorporation of arbitrarily complex side information during training. As a consequence, the answer to the question "What is typical in a given data set?" can be guided by this additional information. Furthermore, an alternative prior, based on a modified Dirichlet distribution, is proposed. On a theoretical level, this makes the relation to the original archetypal analysis model more explicit, where observations are modeled as convex mixtures of the archetypes.
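For reference, the following is a brief sketch of the standard linear archetypal analysis objective (Cutler and Breiman, 1994) that the abstract builds on; the symbols $X$, $A$, $B$, and $Z$ are introduced here for illustration and do not appear in the abstract itself. Given $n$ data points stacked as rows of $X \in \mathbb{R}^{n \times p}$, linear archetypal analysis seeks $k$ archetypes $Z = BX$ that are themselves convex combinations of data points, such that every data point is in turn approximated by a convex mixture of the archetypes:
$$
\min_{A,\,B} \; \lVert X - ABX \rVert_F^2
\quad \text{s.t.} \quad
a_{ij} \ge 0, \;\; \textstyle\sum_{j=1}^{k} a_{ij} = 1, \;\;
b_{jl} \ge 0, \;\; \textstyle\sum_{l=1}^{n} b_{jl} = 1,
$$
with mixture weights $A \in \mathbb{R}^{n \times k}$ and archetype weights $B \in \mathbb{R}^{k \times n}$. The double convexity constraint is what places the archetypes on (an approximation of) the convex hull of the data, which is the property the generative reformulation transfers into the latent space of the variational autoencoder.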