We use machine learning methods to predict which patents end up in court, using the population of US patents granted between 2002 and 2005. We analyze how different dimensions of an empirical analysis affect prediction performance: the number of observations, the number of patent characteristics, and the model choice. We find that extending the set of patent characteristics has the biggest impact on prediction performance. Small samples not only yield low predictive performance; their predictions are also particularly unstable. However, samples of only intermediate size are sufficient for reasonably stable performance. The model choice matters, too: more sophisticated machine learning methods can add value over a simple logistic regression. Our results provide practical advice to anyone building patent litigation models, e.g., for litigation insurance or for patent management more generally.