Mitochondria and plastids import thousands of proteins. Experimentally determining the localisation of these proteins remains a frequent task, but it can be resource-intensive or even impossible, especially for species that are not genetically accessible. Hence, hundreds of studies rely on (machine learning) algorithms that predict a protein's subcellular localisation from its sequence. How reliable these predictions are across evolutionarily diverse species is unknown. Here, we evaluate the performance of three commonly used algorithms (TargetP, Localizer and WoLFPSORT) for four photosynthetic eukaryotes for which experimental plastid and mitochondrial proteome data are available. The match between algorithm-based predictions and experimental data ranges from 75% to as low as 2%, with up to thousands of false-positive predictions. Results depend on the algorithm used and on the evolutionary distance between the training and query species. Specificity, sensitivity and precision analyses underscore severe limitations outside the training species, especially for plant mitochondria, for which performance borders on random sampling. The results highlight current issues associated with prediction algorithms and present an opportunity for the next generation of protein localisation prediction tools, which should train neural networks on an evolutionarily more diverse set of organelle proteins to optimise their performance and reliability.
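To make the evaluation metrics concrete, the following is a minimal sketch, not the pipeline used in this study. It assumes that the predictions of one algorithm and the experimental organelle proteome are each reduced to sets of protein identifiers drawn from the full proteome of a species. The helper `localisation_metrics`, the `Metrics` record and the toy identifiers are hypothetical. The `random_precision` field is the expected precision of picking proteins at random, which equals the fraction of the proteome that is experimentally localised to the organelle.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float       # TP / (TP + FN): experimental organelle proteins that were predicted
    specificity: float       # TN / (TN + FP): non-organelle proteins correctly not predicted
    precision: float         # TP / (TP + FP): predictions confirmed by experimental data
    random_precision: float  # expected precision of random sampling (= prevalence)


def localisation_metrics(proteome: set[str], experimental: set[str],
                         predicted: set[str]) -> Metrics:
    """Hypothetical helper comparing predicted and experimental organelle proteins.

    All arguments are sets of protein IDs; `experimental` and `predicted`
    are assumed to be subsets of `proteome`.
    """
    tp = len(predicted & experimental)           # predicted and experimentally confirmed
    fp = len(predicted - experimental)           # predicted but not found experimentally
    fn = len(experimental - predicted)           # found experimentally but missed
    tn = len(proteome - predicted - experimental)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return Metrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        precision=ratio(tp, tp + fp),
        random_precision=ratio(len(experimental), len(proteome)),
    )


if __name__ == "__main__":
    # Hypothetical toy data: 10 proteins, 4 experimentally localised, 5 predicted.
    proteome = {f"P{i}" for i in range(10)}
    experimental = {"P0", "P1", "P2", "P3"}
    predicted = {"P0", "P1", "P4", "P5", "P6"}
    print(localisation_metrics(proteome, experimental, predicted))
    # Metrics(sensitivity=0.5, specificity=0.5, precision=0.4, random_precision=0.4)
```

In this toy case the precision equals the random-sampling baseline, which illustrates what it means for a predictor's performance to border on random sampling.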