“…These similarity scoring methods have been used for a variety of tasks, such as calculating the similarity of entries in a drug molecule library, designing new drug molecules, ranking search results, and calculating the magnitude of a chemical change from one small molecule to another. Specialized machine learning methods also exist for calculating the similarity of sequence-defined biomacromolecules, such as proteins, peptides, and polysaccharides. Both small molecules and sequence-defined biomacromolecules have well-defined, deterministic structures that are easily represented by graphs, with atoms (or molecular fragments) as nodes and bonds as edges. In contrast, the vast majority of synthetic polymers are described by stochastic graphs that represent molecular ensembles or distributions. Previous studies have represented polymers by their monomers and compositions and have applied methods similar to those developed for small molecules to measure pairwise polymer similarity. However, those methods apply only to polymers with simple topologies, such as homopolymers and copolymers. Because they do not account for the variety of topologies and stochastic configurations available to polymers, they cannot produce accurate and meaningful similarity scores for polymers with complex topologies and stochastic structures, such as star, graft, and segmented polymers.…”
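The following Python sketch illustrates why composition-only representations miss topology. It reduces two toy polymers to their monomer fractions and scores them with cosine similarity. The toy polymer graphs, the helpers `composition`, `cosine_similarity`, and `branch_points`, and the choice of cosine similarity are all hypothetical. They are not the methods used in the previous studies. A linear diblock chain and a three-arm star built from the same repeat units receive a perfect composition score even though their connectivity differs.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy polymers written as graphs: each node is a repeat unit
# labelled by its monomer ("A" or "B"); each edge links two bonded units.
# Both contain three A units and three B units.
LINEAR = {
    "labels": ["A", "A", "A", "B", "B", "B"],
    # Unbranched diblock chain: A-A-A-B-B-B
    "edges": [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
}
STAR = {
    "labels": ["A", "A", "A", "B", "B", "B"],
    # Core unit 0 with three arms: 0-1-3, 0-2-4, and 0-5
    "edges": [(0, 1), (1, 3), (0, 2), (2, 4), (0, 5)],
}


def composition(polymer):
    """Mole fraction of each monomer; all connectivity is discarded."""
    counts = Counter(polymer["labels"])
    total = sum(counts.values())
    return {monomer: n / total for monomer, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse composition vectors (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / sqrt(norm_sq)


def branch_points(polymer):
    """Number of units bonded to more than two neighbouring units."""
    degree = Counter()
    for u, v in polymer["edges"]:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for d in degree.values() if d > 2)


similarity = cosine_similarity(composition(LINEAR), composition(STAR))
print(f"composition similarity: {similarity:.3f}")  # composition similarity: 1.000
print("branch points:", branch_points(LINEAR), "vs", branch_points(STAR))  # branch points: 0 vs 1
```

Any representation that keeps only monomer identities and fractions maps both architectures to the same vector. No similarity function applied to that vector can tell them apart. The branch-point count is a simple graph property that separates them, which shows the difference lies in the discarded connectivity.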