Solute carriers (SLCs) are relatively underexplored compared to
other prominent protein families such as kinases and G protein-coupled
receptors. However, proteins from the SLC family play an essential
role in various diseases. One such SLC is the high-affinity norepinephrine
transporter (NET/SLC6A2). In contrast to most other SLCs, the NET
has been relatively well studied. However, the chemical space of known
ligands has a low chemical diversity, making it challenging to identify
chemically novel ligands. Here, a computational screening pipeline
was developed to find new NET inhibitors. The approach expands the
chemical space available for modeling NET by adding the chemical space of
related proteins, which were selected using similarity networks. Prior proteochemometric
models added data from related proteins, but here we use a data-driven
approach to select the optimal proteins to add to the modeled data
set. After optimizing the data set, the proteochemometric model was
optimized using stepwise feature selection. The final model was created
using a two-step approach combining several proteochemometric machine
learning models through stacking. This model was applied to the extensive
virtual compound database of Enamine: the 22,000 top-predicted compounds
out of 600 million virtual compounds were clustered, yielding 46
chemically diverse candidates. A subselection of 32 candidates
was synthesized and subsequently tested using an impedance-based assay.
Five hit compounds with sub-micromolar inhibitory potencies toward NET
were identified (hit rate 16%), making them promising for follow-up
experimental research. This study demonstrates a data-driven approach
to diversify known chemical space to identify novel ligands and is,
to our knowledge, the first to select this set based on the sequence
similarity of related targets.
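
The protein-selection idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the similarity network connects two proteins whenever their pairwise sequence similarity reaches a cutoff, and that every protein reachable from NET in that network is added to the data set. The function name `select_related_proteins`, the protein labels, the similarity values, and the threshold are all hypothetical.

```python
from collections import deque


def select_related_proteins(similarity, target, threshold):
    """Return proteins reachable from `target` through edges whose
    pairwise similarity is at least `threshold` (target excluded)."""
    # Build an undirected adjacency map from the pairwise scores.
    neighbors = {}
    for (a, b), score in similarity.items():
        if score >= threshold:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    # Breadth-first search outward from the target protein.
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen - {target})


# Hypothetical pairwise similarities (fractions in [0, 1]); not measured values.
toy_similarity = {
    ("NET", "protein_A"): 0.78,
    ("NET", "protein_B"): 0.55,
    ("protein_A", "protein_C"): 0.62,
    ("protein_B", "protein_D"): 0.30,
}

selected = select_related_proteins(toy_similarity, "NET", threshold=0.5)
print(selected)
```

In this toy network, raising the threshold shrinks the selected set and lowering it admits more distant proteins. That trade-off between added chemical diversity and relevance to the target is what a data-driven selection has to balance.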