Many electrical equipment malfunction texts are collected during power system operation and maintenance procedures. These texts usually contain crucial information for maintenance and condition monitoring. Because these power system malfunction texts are characterized by multidomain vocabularies, complex syntactic structures, and long sentences, it is challenging for automated systems to capture their semantic meaning and essential information. To address this issue, we propose a hybrid natural language processing (hybrid-NLP) algorithm to extract entities that represent electrical equipment. This algorithm is composed of a dictionary-based method, a language technology platform (LTP) tool, and a bidirectional encoder representations from transformers-conditional random field (BERT-CRF) model. Significantly, the softmax output layer of the bidirectional encoder representations from transformers (BERT) model is replaced by a conditional random field (CRF) layer to strengthen the contextual relationships between words and thus mitigate the problem of locally optimal label choices for individual words. The effectiveness of the proposed hybrid-NLP method is verified on a realistic dataset. Moreover, a statistical analysis is conducted to provide a reference for the operation and maintenance of power systems.

INDEX TERMS Electrical equipment malfunction text, natural language processing, entity extraction, BERT-CRF model.
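The motivation for replacing the softmax output layer with a CRF can be illustrated with a small decoding sketch. It is not the authors' implementation: the label set (`O`, `B-EQU`, `I-EQU`), the emission scores, and the transition scores below are all made-up assumptions chosen to show the effect. Per-token argmax (what a softmax layer does at inference) picks each label independently, whereas CRF-style Viterbi decoding scores the whole label sequence, including label-to-label transitions.

```python
# Illustrative sketch only: labels and all scores are hypothetical,
# not taken from the paper.
LABELS = ["O", "B-EQU", "I-EQU"]

# One row of per-label scores per token (as a BERT encoder might emit).
EMISSIONS = [
    [0.1, 2.0, 0.5],
    [1.0, 0.2, 0.9],
    [0.3, 0.1, 1.5],
]

# TRANSITIONS[i][j] = score of moving from label i to label j.
# "O" -> "I-EQU" is heavily penalised because it is an invalid tag sequence.
TRANSITIONS = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.5],
]


def greedy_decode(emissions):
    """Pick the best label for each token independently (softmax-style)."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """Return the label sequence maximising emission + transition scores."""
    n_labels = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending at label j on this token.
            best_prev = max(
                range(n_labels),
                key=lambda i: scores[i] + transitions[i][j],
            )
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + row[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=scores.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print([LABELS[i] for i in greedy_decode(EMISSIONS)])
    print([LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)])
```

With these assumed scores, the greedy decoder yields `B-EQU, O, I-EQU`, which contains the invalid step `O` to `I-EQU`, while Viterbi decoding yields the consistent span `B-EQU, I-EQU, I-EQU`. In a trained BERT-CRF model the transition scores would be learned jointly with the encoder rather than set by hand.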