Chagas disease affects 8–11 million people worldwide, most
of them living in Latin America. Moreover, migratory phenomena have
spread the infection beyond endemic areas. Developing new pharmacological
therapies is paramount, because the pharmacological profiles of the two
currently marketed drugs, nifurtimox and benznidazole, need to be
improved. Cruzain, a parasitic cysteine
protease, is one of the most attractive biological targets due to
its roles in parasite survival and immune evasion. In this work, we
compiled and curated a database of diverse cruzain inhibitors previously
reported in the literature. From this data set, quantitative structure–activity
relationship (QSAR) models for the prediction of their pIC₅₀ values
were generated using k-nearest neighbors
and random forest algorithms. Local and global models were calculated
and compared. The statistical parameters for internal and external
validation indicate significant predictive ability, with leave-one-out
cross-validated q² values of approximately 0.66 and 0.61 and external
R² coefficients
of 0.725 and 0.766. The applicability domain is quantitatively defined,
according to QSAR good practices, using the leverage and similarity
methods. The models described in this work are readily available in
a Python script for the discovery of novel cruzain inhibitors.
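
For readers unfamiliar with the internal validation statistic quoted above, the following is a minimal sketch of how a leave-one-out q² can be computed for a k-nearest neighbors regressor. It is not the authors' script: the function names `knn_predict` and `q2_loo`, the single toy descriptor, and the toy pIC₅₀ values are all hypothetical, and the sketch assumes the common definition q² = 1 − PRESS / SS_tot with SS_tot taken about the mean of the full data set.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`
    (Euclidean distance). Hypothetical helper, not from the paper."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def q2_loo(X, y, k=3):
    """Leave-one-out q2: each compound is predicted by a model built
    from all the other compounds, then q2 = 1 - PRESS / SS_tot."""
    y_mean = sum(y) / len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_predict(train_X, train_y, X[i], k)
        press += (y[i] - pred) ** 2
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Toy data (assumed for illustration only): one descriptor per compound
# and made-up pIC50 values.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]

print(f"q2_loo (k=2) = {q2_loo(X, y, k=2):.3f}")
```

In practice the descriptors would be computed from the curated inhibitor structures, and a library implementation of the regressor and the cross-validation loop would normally replace these hand-written helpers.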