This article contributes a highly
accurate model for predicting
the melting points (MPs) of medicinal chemistry compounds. The model
was developed using the largest published data set, comprising more
than 47k compounds. The distributions of MPs in drug-like and drug
lead sets showed that >90% of molecules melt within [50, 250] °C.
The final model achieved an RMSE of less than 33 °C for molecules
in this temperature interval, which is the most important for medicinal
chemistry users. This performance was achieved using a consensus model
that was significantly more accurate than
the individual models. We found that compounds with reactive and unstable
groups were overrepresented among outlying compounds. These compounds
could decompose during storage or measurement, thus introducing experimental
errors. While filtering the data by removing outliers generally increased
the accuracy of individual models, it did not significantly affect
the results of the consensus models. None of the three distance-to-model
measures analyzed allowed us to flag molecules whose MP values fell outside
the applicability domain of the model. We believe that this negative
result and the public availability of data from this article will
encourage future studies to develop better approaches to define the
applicability domain of models. The final model, MP data, and identified
reactive groups are available online at .
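
To illustrate the two quantities reported above, the following minimal Python sketch computes an RMSE restricted to the [50, 250] °C interval and compares individual models with a consensus of their predictions. It rests on several assumptions: the consensus is taken here as a simple arithmetic mean of the individual predictions, which may differ from the averaging scheme actually used; the function names (`rmse`, `consensus`, `rmse_in_interval`) are hypothetical; and the numbers are synthetic toy values, not data from this article.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def consensus(predictions):
    """Average the per-compound predictions of several models.

    Assumption: a plain arithmetic mean stands in for the consensus model.
    `predictions` is a list of equal-length lists, one per individual model.
    """
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def rmse_in_interval(y_true, y_pred, low=50.0, high=250.0):
    """RMSE over compounds whose measured MP lies within [low, high] (°C)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t <= high]
    if not pairs:
        raise ValueError("no compounds fall inside the interval")
    kept_true, kept_pred = zip(*pairs)
    return rmse(list(kept_true), list(kept_pred))


# Synthetic toy values in °C (illustrative only, not from the article).
measured = [100.0, 150.0, 200.0, 300.0]
model_a = [110.0, 140.0, 215.0, 250.0]
model_b = [90.0, 165.0, 190.0, 260.0]

consensus_pred = consensus([model_a, model_b])

rmse_a = rmse_in_interval(measured, model_a)
rmse_b = rmse_in_interval(measured, model_b)
rmse_consensus = rmse_in_interval(measured, consensus_pred)

print(f"model A:   {rmse_a:.2f} °C")
print(f"model B:   {rmse_b:.2f} °C")
print(f"consensus: {rmse_consensus:.2f} °C")
```

In this toy case the errors of the two models partly cancel when averaged, so the consensus RMSE is lower than either individual RMSE. The fourth compound (300 °C) is excluded from all three scores because its measured MP lies outside the interval.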