Numerous grasp planning algorithms have been proposed since the 1980s. The grasping literature has expanded rapidly in recent years, building on greatly improved vision systems and computing power. Methods have been proposed to plan stable grasps on known objects (an exact 3D model is available), familiar objects (e.g. exploiting a priori known grasps for other objects of the same category), or novel object shapes observed during task execution. Few of these methods have ever been compared in a systematic way, and objective performance evaluation of such complex systems remains problematic. Difficulties and confounding factors include: different assumptions and amounts of a priori knowledge in different algorithms; different robots, hands, vision systems and setups in different labs; and different choices of objects to be grasped, driven by application needs. Furthermore, grasp planners can optimise different grasp quality metrics (empirical or theoretical stability measures) or other criteria, e.g. computational speed, or the combination of grasp selection with reachability considerations. While acknowledging and discussing the outstanding difficulties surrounding this complex topic, we propose a methodology for reproducible experiments that compare the performance of a variety of grasp planning algorithms. Our protocol attempts to improve the objectivity with which different grasp planners are compared by minimising the confounding influence of other key components of the grasping pipeline, e.g. vision and pose estimation. The protocol is demonstrated by evaluating two different grasp planners: a state-of-the-art model-free planner and a popular open-source model-based planner. We show results from real-robot experiments with a 7-DoF arm and a 2-finger hand, as well as simulation-based evaluations.