The manner in which a prognostics problem is framed is critical to solving it with an appropriate method. Recently, data-driven prognostics techniques have demonstrated enormous potential when used alone or as part of a hybrid solution in conjunction with physics-based models. Historical maintenance data constitutes a critical element of any data-driven approach to prognostics, such as supervised machine learning. The historical data is used to create the training and testing data sets with which the machine learning model is developed. Supervised machine learning methods require categorical classes for prediction; however, faults of interest in US Army Ground Vehicle Maintenance Records appear as natural language text descriptions rather than as a finite set of discrete labels.
Transforming linguistically complex data into a set of prognostics classes is necessary for applying supervised machine learning approaches to prognostics. Manually labeling fault description instances is effective but extremely time-consuming; thus, an automated approach to labeling is preferred. The approach described in this paper examines key aspects of the fault text relevant to enabling automatic labeling. A method was developed based on the hypothesis that a given fault description can be generalized into a category. This method uses various natural language processing (NLP) techniques and a priori knowledge of ground vehicle faults to assign classes to the maintenance fault descriptions.
The core component of the method used in this paper is a Word2Vec word-embedding model. Word embeddings are used in conjunction with a token-oriented rule-based data structure for document classification. This methodology tags text with user-provided classes using a corpus of similar text fields as its training set. Once fault classes are reliably assigned to each description, supervised machine learning can be applied to these classes using the related maintenance information that preceded the fault.
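As a rough illustration of how word embeddings and token-oriented rules might combine to label a fault description, consider the minimal Python sketch below. It is not the implementation used in this paper. The three-dimensional vectors are invented stand-ins for a trained Word2Vec model, and the class names, seed tokens, similarity threshold, and the `label` function are all hypothetical. Each token in a description is compared by cosine similarity with the seed tokens of every class, and the class of the best match above the threshold is assigned.

```python
import math
import re

# Toy 3-dimensional vectors standing in for a trained Word2Vec model.
# The values are invented for illustration only.
EMBEDDINGS = {
    "leak": (0.9, 0.1, 0.0),
    "seeping": (0.8, 0.2, 0.1),
    "charging": (0.1, 0.9, 0.0),
    "voltage": (0.05, 0.95, 0.1),
    "output": (0.2, 0.8, 0.1),
    "gasket": (0.5, 0.0, 0.5),
    "alternator": (0.0, 0.5, 0.5),
}

# Hypothetical token-oriented rules: each user-provided class is tied
# to seed tokens chosen from a priori knowledge of the faults.
RULES = {
    "GASKET_LEAK": ["leak"],
    "ALTERNATOR_NOT_CHARGING": ["charging"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def label(description, threshold=0.9):
    """Return the class whose seed token is most similar to any token
    in the description, or "UNLABELED" if nothing reaches the threshold."""
    best_label, best_score = "UNLABELED", threshold
    for token in tokenize(description):
        vec = EMBEDDINGS.get(token)
        if vec is None:  # token is outside the toy vocabulary
            continue
        for cls, seeds in RULES.items():
            for seed in seeds:
                score = cosine(vec, EMBEDDINGS[seed])
                if score >= best_score:
                    best_label, best_score = cls, score
    return best_label


print(label("Gasket seeping oil at housing"))
print(label("Alternator low voltage output"))
print(label("Replaced windshield wiper"))
```

In this sketch, a description that never uses a seed word can still be labeled, because a token such as "seeping" lies close to the seed "leak" in the embedding space. Descriptions with no token near any seed remain unlabeled and can be left for manual review.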
This method was developed for labeling US Army Ground Vehicle Maintenance Records, but is general enough to be applied to any natural language data set accompanied by a priori knowledge of its contents for consistent labeling. In addition to applications in machine learning, the generated labels are also conducive to general summarization and case-by-case analysis of faults. The maintenance components of interest for the current application are alternators and gaskets, with future development directed towards determining the remaining useful life (RUL) of these components based on the labeled data.