Sub-species nomenclature systems of pathogens are increasingly based on sequence data. The use of phylogenetics to identify and differentiate between clusters of genetically similar pathogens is particularly prevalent in virology, from the nomenclature of human papillomaviruses to highly pathogenic avian influenza (HPAI) H5Nx viruses. These nomenclature systems rely on absolute genetic distance thresholds to define the maximum genetic divergence tolerated between viruses designated as closely related. However, the phylogenetic clustering methods used in these nomenclature systems are limited by the arbitrariness of setting intra- and inter-cluster diversity thresholds. The lack of a consensus ground truth to define well-delineated, meaningful phylogenetic subpopulations amplifies the difficulties in identifying an informative distance threshold. Consequently, phylogenetic clustering often becomes an exploratory, ad hoc exercise.

Phylogenetic Clustering by Linear Integer Programming (PhyCLIP) was developed to provide a statistically principled phylogenetic clustering framework that negates the need for an arbitrarily defined distance threshold. Using the pairwise patristic distance distributions of an input phylogeny, PhyCLIP parameterises the intra- and inter-cluster divergence limits as statistical bounds in an integer linear programming model, which is subsequently optimised to cluster as many sequences as possible. When applied to the haemagglutinin phylogeny of HPAI H5Nx viruses, PhyCLIP was not only able to recapitulate the current WHO/OIE/FAO H5 nomenclature system but also further delineated informative higher-resolution clusters that capture geographically distinct subpopulations of viruses. PhyCLIP is pathogen-agnostic and can be generalised to a wide variety of research questions concerning the identification of biologically informative clusters in pathogen phylogenies.
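To make the idea of a distance limit derived from the data (rather than fixed in advance) concrete, the following minimal Python sketch computes the mean pairwise patristic distance of several candidate clusters and derives an upper bound from the distribution of those means. It is an illustration only, not PhyCLIP's implementation: the specific bound (median plus `gamma` times the median absolute deviation), the parameter `gamma`, the function names, and the toy distances are all assumptions introduced here, and the integer linear programming step is omitted.

```python
from itertools import combinations
from statistics import mean, median


def mean_pairwise_distance(tips, dist):
    """Mean patristic distance over all pairs of tips in one candidate cluster."""
    return mean([dist[frozenset(pair)] for pair in combinations(tips, 2)])


def within_cluster_limit(values, gamma=1.0):
    """Assumed robust bound: median + gamma * median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med + gamma * mad


# Hypothetical patristic distances for six tips (toy data, not from the paper).
tips = ["A", "B", "C", "D", "E", "F"]
close_pairs = {("A", "B"): 0.01, ("C", "D"): 0.02, ("E", "F"): 0.03}
dist = {}
for x, y in combinations(tips, 2):
    if (x, y) in close_pairs:
        dist[frozenset((x, y))] = close_pairs[(x, y)]
    elif "E" in (x, y) or "F" in (x, y):
        dist[frozenset((x, y))] = 0.30  # E and F are distant from everything else
    else:
        dist[frozenset((x, y))] = 0.10  # the A/B and C/D pairs are moderately distant

# Candidate clusters, e.g. the tips descending from internal nodes of the tree.
candidates = {
    "ab": ["A", "B"],
    "cd": ["C", "D"],
    "ef": ["E", "F"],
    "abcd": ["A", "B", "C", "D"],
    "all": tips,
}

means = {name: mean_pairwise_distance(members, dist)
         for name, members in candidates.items()}
limit = within_cluster_limit(list(means.values()), gamma=1.0)

# Candidates whose internal divergence stays within the data-derived bound.
accepted = {name for name, value in means.items() if value <= limit}
print(f"limit = {limit:.3f}, accepted = {sorted(accepted)}")
```

In this toy example the bound is set by the distribution of candidate-cluster divergences itself, so the three tight pairs pass while the two broader groupings do not. A full method would then choose among the admissible candidates by optimisation, as the abstract describes.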
PhyCLIP is freely available at