“…[39] Specifically, we reduced our search space to the 11³ = 1331 candidates in the set defined by X ∈ {Ala, Gly, Glu, Ile, Leu, Met, Phe, Trp, Tyr, Val, Asp} to avoid charged and/or polar amino acids expected to interfere with low-pH triggered assembly [31] and to focus on those residues that have exhibited good assembly behavior in previous experimental work [22, 99, 100, 101]. We perform active learning over DXXX-OPV3-XXXD sequences following the four-part protocol (molecular simulation, VAE latent space embedding, GPR surrogate model construction, and optimal selection of next candidates) described in Section 2.2 and illustrated in Fig. 2.…”
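
The size of the candidate set can be checked with a short enumeration. The sketch below is illustrative only and is not taken from the cited work: it assumes the three X positions are chosen independently from the 11 listed residues, and the names `RESIDUES`, `enumerate_candidates`, and `format_sequence` are hypothetical. The mirrored ordering of the two peptide wings in `format_sequence` is likewise an assumption made for display purposes; the excerpt states only that the family is written DXXX-OPV3-XXXD and contains 11³ members.

```python
from itertools import product

# The 11 residues allowed at each X position, as listed in the excerpt.
RESIDUES = ["Ala", "Gly", "Glu", "Ile", "Leu", "Met",
            "Phe", "Trp", "Tyr", "Val", "Asp"]


def enumerate_candidates(residues=RESIDUES, n_positions=3):
    """Return every ordered choice of residues for the X positions."""
    return list(product(residues, repeat=n_positions))


def format_sequence(triple):
    """Render one candidate as a DXXX-OPV3-XXXD style string.

    Assumption (not stated in the excerpt): the right wing mirrors the
    left wing, so a single triple determines the whole molecule.
    """
    left = "-".join(triple)
    right = "-".join(reversed(triple))
    return f"Asp-{left}-OPV3-{right}-Asp"


candidates = enumerate_candidates()
print(len(candidates))  # 1331, i.e. 11**3
print(format_sequence(candidates[0]))
```

Because each candidate is a plain tuple, the same list could serve as the pool from which an active-learning loop draws its next sequences to simulate, although how the cited work represents candidates internally is not specified in the excerpt.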