A multiparameter model comprising 1D and 2D QSPR/QSAR descriptors
was proposed and validated for binary systems of phenolic acids. The
approach optimizes the regression coefficients to maximize the
percentage of true positives in a pool of systems forming either
simple binary eutectics or cocrystals. The training
set consisted of 58 eutectics and 168 cocrystals. The collection of solid
dispersions used for model generation comprised literature data supplemented
with our new experimental results. Of the 1445 descriptors computable
in PaDEL, only the 13 orthogonal descriptors with the highest predictive
power were retained. The analysis revealed the importance
of the parameters characterizing atom types (naaN, SHsOH, SsssN, nHeteroRing,
maxHBint6, C1SP2), autocorrelation functions (ATSC1i, AATSC1v, MATS8m,
GATS1i), and other measures of molecular structure (WTPT-5, MLFER_A,
MDEN-22). The proposed approach is simple and requires only the structural
information encoded in a canonical SMILES string. Inverting the cocrystal
screening problem, that is, focusing on a homogeneous group of coformers
for cocrystallization with a variety of drugs rather than seeking coformers
for a particular active pharmaceutical ingredient, proved to be very
efficient. This provided valuable guidance for selecting pairs for
cocrystallization with a probability of about 80%.
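
The two ideas summarized above, keeping a small set of weakly intercorrelated descriptors and scoring a set of regression coefficients by the percentage of true positives, can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the greedy correlation filter, and the 0.5 correlation cutoff are assumptions made for illustration, and the descriptor matrix is assumed to have been computed beforehand (for example with PaDEL) from SMILES strings.

```python
import numpy as np


def select_orthogonal(X, y, max_abs_corr=0.5, n_keep=13):
    """Greedy filter (illustrative assumption, not the published method).

    Columns of X are ranked by |Pearson r| with the class label y, and a
    column is kept only if its |r| with every column already kept does not
    exceed max_abs_corr. Constant columns give NaN correlations and are
    therefore never kept.
    """
    n_features = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    kept = []
    for j in np.argsort(-relevance):  # most relevant first
        weakly_correlated = all(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_abs_corr
            for k in kept
        )
        if weakly_correlated:
            kept.append(int(j))
        if len(kept) == n_keep:
            break
    return kept


def true_positive_percentage(X, y, coef, intercept=0.0, threshold=0.0):
    """Percentage of actual positives (y == 1) whose linear score
    X @ coef + intercept lies above the threshold."""
    score = X @ coef + intercept
    predicted_positive = score > threshold
    actual_positive = y == 1
    hits = np.sum(predicted_positive & actual_positive)
    return 100.0 * hits / np.sum(actual_positive)
```

In such a sketch, `true_positive_percentage` would serve as the objective that an optimizer maximizes over `coef` and `intercept`, with `y == 1` standing for cocrystal formation and `y == 0` for a simple eutectic; the choice of optimizer is left open here.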