High-performance organic semiconductors (OSCs) can be designed
by identifying functional units and their roles in determining
the material properties. Herein, we present a polymer-unit fingerprint
(PUFp) generation framework, the “Python-based polymer-unit-recognition
script” (PURS), which identifies the subunits (“polymer units”)
of a polymer and generates the corresponding PUFp. Using
678 collected OSC data points, machine learning (ML) models that take
PUFp as the structural input are used to determine structure–mobility
relationships, reaching a classification accuracy of 85.2%.
A polymer-unit library consisting of 445 units is constructed, and
the key polymer units affecting the mobility of OSCs are identified.
By investigating how combinations of polymer units relate to mobility performance,
a scheme for designing OSCs by combining ML approaches and PUFp information
is proposed. This scheme not only passively predicts OSC mobility
but also actively provides structural guidance for high-mobility OSC
material design. The proposed scheme demonstrates the ability to screen
materials through pre-evaluation and classification ML steps and is
an alternative methodology for applying ML in high-mobility OSC discovery.
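
As an illustration of how a unit-based fingerprint might serve as a structural input, the following minimal sketch builds a unit library from a few polymers and encodes each polymer as a binary presence vector over that library. It is not the PURS implementation: the unit-recognition step is skipped (each polymer is assumed to be already split into named units), the unit names and polymers are hypothetical placeholders, and the binary presence encoding is an assumption rather than the published PUFp definition.

```python
from typing import Dict, List

# Hypothetical toy input: polymers already decomposed into named units.
# In the actual workflow this decomposition would come from a
# unit-recognition step, which is not reproduced here.
polymers: Dict[str, List[str]] = {
    "P1": ["unit_A", "unit_D", "unit_A"],
    "P2": ["unit_B", "unit_D"],
    "P3": ["unit_A", "unit_C"],
}


def build_unit_library(polymers: Dict[str, List[str]]) -> List[str]:
    """Collect every distinct unit across all polymers, in sorted order."""
    return sorted({unit for units in polymers.values() for unit in units})


def fingerprint(units: List[str], library: List[str]) -> List[int]:
    """Encode a polymer as 1/0 flags marking which library units it contains.

    Assumption: a presence/absence encoding; repeat counts are ignored.
    """
    present = set(units)
    return [1 if unit in present else 0 for unit in library]


library = build_unit_library(polymers)
pufp = {name: fingerprint(units, library) for name, units in polymers.items()}

for name, bits in pufp.items():
    print(name, bits)
```

Vectors of this kind, paired with mobility class labels, could then be passed to any standard classifier; with a library of 445 units, each fingerprint would have 445 entries under this encoding.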