Epitope-based vaccines have revolutionized vaccine research in the last decades. Due to their complex nature, bioinformatics plays a pivotal role in their development. However, existing algorithms address only specific parts of the design process or are unable to provide formal guarantees on the quality of the solution. Here we present a unifying formalism of the general epitope vaccine design problem that tackles all phases of the design process simultaneously and combines all prevalent design principles. We then demonstrate how to formulate the developed formalism as an integer linear program, which guarantees optimality of the designs. This makes it possible to explore new regions of the vaccine design space, analyze the trade-offs between the design phases, and balance the many requirements of vaccines.

In recent years, vaccines based on T-cell epitopes, so-called epitope-based vaccines (EVs), have become widely used as therapeutic treatments in cancer immunotherapy [1-4] and prophylactically against infectious diseases [5-10]. Compared to regular attenuated vaccines, EVs offer several advantages [11]. Since EVs are based on small peptide sequences, they can be rapidly produced using well-established technologies and easily stored freeze-dried [11]. EVs also do not bear the risk of reversion to virulence, as they do not contain any infectious material, and the selection of epitopes can be tailored to address the genetic variability of a pathogen and that of a targeted population or individual, increasing the vaccine's potential efficacy [11].

To aid the design process, bioinformatics approaches have been developed to (1) discover potential candidate epitopes, (2) select a set of epitopes for vaccination, and (3) assemble the selected epitopes into the final vaccine (Figure 1A).
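As a rough illustration of the selection step (2), and of why formal guarantees on solution quality matter, the following minimal Python sketch compares a greedy heuristic with an exhaustive (and therefore optimal) search on a toy instance. The epitope names, the allele labels, and the coverage objective are hypothetical and chosen only for illustration; they are not taken from any of the cited methods, and the exhaustive search merely stands in for an exact solver on this tiny example.

```python
from itertools import combinations

# Hypothetical toy data: the set of alleles each candidate epitope covers.
COVERAGE = {
    "e1": {"A1", "A2", "A3", "A4"},
    "e2": {"A1", "A2", "A5"},
    "e3": {"A3", "A4", "A6"},
}


def covered(selection):
    """Return the set of alleles covered by the selected epitopes."""
    return set().union(*(COVERAGE[e] for e in selection))


def greedy_select(k):
    """Pick k epitopes, each time adding the one that yields the largest coverage."""
    chosen = []
    for _ in range(k):
        remaining = [e for e in sorted(COVERAGE) if e not in chosen]
        best = max(remaining, key=lambda e: len(covered(chosen + [e])))
        chosen.append(best)
    return chosen


def exact_select(k):
    """Enumerate every subset of size k and return one with maximal coverage."""
    best = max(combinations(sorted(COVERAGE), k), key=lambda s: len(covered(s)))
    return list(best)


if __name__ == "__main__":
    for name, select in (("greedy", greedy_select), ("exact", exact_select)):
        picked = select(2)
        print(name, picked, len(covered(picked)))
```

On this instance the greedy heuristic first commits to the single broadest epitope and ends up covering five alleles, whereas the exhaustive search finds a pair that covers all six. Enumeration scales exponentially with the number of candidates, which is one motivation for exact formulations such as integer linear programs.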
Most of the proposed selection and assembly approaches focus either on peptide cocktail vaccines (Figure 1A(3a)) or on so-called string-of-beads vaccines (Figure 1A(3b)), which are polypeptides connecting the epitopes either directly or by short spacer sequences. Vider-Shalit et al., for example, developed a genetic algorithm that selects epitopes to maximize the coverage of viral and human variation while simultaneously optimizing the ordering of the string-of-beads to increase efficacy [12]. Toussaint et al. proposed an approach that selects a fixed number of epitopes to maximize vaccine immunogenicity using integer linear programming (ILP) [13], and later established a method to find the optimal string-of-beads ordering based on a traveling salesperson problem (TSP) embedding [14], which was recently extended by Schubert et al. to incorporate optimal spacer sequences as well [15]. Lundegaard et al. proposed a greedy algorithm for epitope selection that maximizes antigen and population coverage using a submodular function formulation [16].

Recent studies suggest that through the usage of artificial proteins of overlapping epitopes, so-called mosaic vaccines (Figure 1A(3c)), both depth and breadth of the T-cell r...