This work proposes a new benchmark of hard instances for the permutation flowshop scheduling problem with the objective of minimising the makespan. The benchmark consists of 240 large instances and 240 small instances, with up to 800 jobs and 60 machines. One aim of the work is to generate a benchmark that has the characteristics desired of any benchmark: it should be comprehensive, amenable to statistical analysis, and discriminant when several algorithms are compared. To select the hard instances, an exhaustive experimental procedure is carried out: thousands of instances are generated, and the hardest ones are selected according to a gap computed as the difference between very good upper and lower bounds for each instance. Extensive instance generation and computational experiments, which took almost six years of combined CPU time, demonstrate that the proposed benchmark is harder and has more discriminant power than the most common benchmark from the literature. Moreover, a website has been developed where researchers can share sets of instances, best known solutions, lower bounds, and related data for any combinatorial optimisation problem.
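To make the hardness criterion concrete, the following minimal sketch evaluates the makespan of a permutation and the gap between an upper and a lower bound. It is illustrative only and is not the procedure used in this work: the function names (`makespan`, `machine_lower_bound`, `gap`) are hypothetical, the lower bound is a simple machine-based bound rather than the very good bounds referred to above, and the processing times are assumed to be stored as `p[j][i]` for job `j` on machine `i`.

```python
from typing import Sequence


def makespan(perm: Sequence[int], p: Sequence[Sequence[int]]) -> int:
    """Makespan of a permutation flowshop schedule.

    p[j][i] is the processing time of job j on machine i (assumed layout).
    Every machine processes the jobs in the order given by perm.
    """
    m = len(p[0])
    completion = [0] * m  # completion time of the latest job on each machine
    for j in perm:
        completion[0] += p[j][0]
        for i in range(1, m):
            # A job starts on machine i once it has left machine i-1
            # and machine i has finished the previous job.
            completion[i] = max(completion[i], completion[i - 1]) + p[j][i]
    return completion[-1]


def machine_lower_bound(p: Sequence[Sequence[int]]) -> int:
    """Simple machine-based lower bound (an assumption for illustration).

    For each machine: total load on it, plus the shortest time any job
    needs to reach it, plus the shortest time any job needs after it.
    """
    n, m = len(p), len(p[0])
    best = 0
    for i in range(m):
        load = sum(p[j][i] for j in range(n))
        head = min(sum(p[j][:i]) for j in range(n))
        tail = min(sum(p[j][i + 1:]) for j in range(n))
        best = max(best, head + load + tail)
    return best


def gap(upper_bound: int, lower_bound: int) -> int:
    """Gap as the difference between an upper and a lower bound."""
    return upper_bound - lower_bound
```

Under these assumptions, the makespan of any feasible permutation is an upper bound, so an instance whose gap remains large after strong bounds have been computed is a candidate hard instance.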