Offline evaluation is an essential complement to online experiments in the selection, improvement, tuning, and deployment of recommender systems. Offline methodologies for recommender system evaluation evolved from experimental practice in Machine Learning (ML) and Information Retrieval (IR). However, evaluating recommendations involves particularities that challenge the assumptions upon which ML and IR methodologies were developed. We recap and reflect on the development and current status of recommender system evaluation, providing an updated perspective. With a focus on offline evaluation, we review the adaptation of IR principles, procedures, and metrics, and the implications of those techniques when applied to recommender systems. At the same time, we identify the singularities of recommendation that require different responses or involve specific new needs. In addition, we provide an overview of important choices in the configuration of experiments that require particular care and understanding; discuss broader perspectives of evaluation, such as recommendation value beyond accuracy; and survey open challenges, such as experimental biases and the cyclic dimension of recommendation.