Electrophilicity (E) is one of the most important parameters for
understanding the reactivity of an organic molecule. Although
the theoretical electrophilicity index (ω) has been associated
with E in a small homologous series, the use of ω to predict E in a structurally heterogeneous
set of compounds is not a trivial task. In this study, a robust ensemble
model is created using Mayr’s database of reactivity parameters.
A combination of topological and quantum mechanical descriptors and
different machine learning algorithms is employed for the model’s
development. The predictive ability of the model is assessed using several
statistical parameters, and the model is validated by means of
a training/test partition, an applicability domain, and a y-scrambling test. The global ensemble model presents a $Q^2_{5\text{-fold}}$ of 0.909 and a $Q^2_{\text{ext}}$ of 0.912, demonstrating an excellent
predictive performance for E values and showing
that ω is not a good descriptor for the prediction
of E, especially in the case of neutral compounds. ElectroPredictor, a noncommercial Python application, is developed to predict E. QM9, a well-known
large dataset containing 133885 neutral molecules, is used to perform
a virtual screening (94.0% coverage). Finally, the 10 most electrophilic
molecules are analyzed as possible new Mayr electrophiles,
which have not yet been experimentally tested. This study confirms
that predicting E with precision requires an ensemble model built with
nonlinear machine learning algorithms and topographic descriptors, with
molecules separated into charged and neutral compounds.
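
The cross-validated statistic and the y-scrambling test mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. A one-descriptor least-squares line stands in for the ensemble model, the data are synthetic, and all names (`fit_line`, `q2_kfold`, `y_scrambling`) are hypothetical. Q² is computed here as 1 - PRESS/SS_tot with the global mean of y as reference, which is one of several conventions in use.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def q2_kfold(xs, ys, k=5, seed=0):
    """Cross-validated Q^2 = 1 - PRESS / SS_tot over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    press = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Accumulate squared errors on the held-out fold only.
        press += sum((ys[i] - (a * xs[i] + b)) ** 2 for i in fold)
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


def y_scrambling(xs, ys, n_rounds=20, seed=0):
    """Recompute Q^2 after randomly permuting the response values."""
    rng = random.Random(seed)
    scores = []
    for r in range(n_rounds):
        shuffled = list(ys)
        rng.shuffle(shuffled)
        scores.append(q2_kfold(xs, shuffled, seed=r))
    return scores


# Synthetic example (assumption): a noisy linear descriptor-response relation.
rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x - 5.0 + rng.gauss(0, 0.3) for x in xs]

q2_real = q2_kfold(xs, ys)
q2_scrambled = y_scrambling(xs, ys)
print("Q2 (5-fold):", round(q2_real, 3))
print("max Q2 after y-scrambling:", round(max(q2_scrambled), 3))
```

A model that captures a real descriptor-response relationship should give a high Q² on the original data and a much lower Q² once the responses are permuted. Comparable scores in both cases would point to chance correlation.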