A population-level analysis is proposed to address data sparsity when building predictive models for engineering infrastructure. Utilizing an interpretable hierarchical Bayesian approach and operational fleet data, domain expertise is naturally encoded (and appropriately shared) between different subgroups, representing (1) use-type, (2) component, or (3) operating condition. Specifically, domain expertise is exploited to constrain the model via assumptions (and prior distributions) allowing the methodology to automatically share information between similar assets, improving the survival analysis of a truck fleet (15% and 13% increases in predictive log-likelihood of hazard) and power prediction in a wind farm (up to 82% reduction in the standard deviation of maximum output prediction). In each asset management example, a set of correlated functions is learnt over the fleet, in a combined inference, to learn a population model. Parameter estimation is improved when subfleets are allowed to share correlated information at different levels in the hierarchy; the (averaged) reduction in standard deviation for interpretable parameters in the survival analysis is 70%, alongside 32% in wind farm power models. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The statistical correlations enable knowledge transfer via Bayesian transfer learning, and the correlations can be inspected to inform which assets share information for which effect (i.e., parameter). Successes in both case studies demonstrate the wide applicability in practical infrastructure monitoring, since the approach is naturally adapted between interpretable fleet models of different in situ examples.
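The idea that data-poor groups borrow statistical strength from data-rich ones can be illustrated with a much simpler model than the one in the abstract. The sketch below assumes a normal-normal model with known observation variance `sigma2`, and a fixed population mean `mu` and between-group variance `tau2`. These names, and the function `partial_pool`, are hypothetical and chosen for illustration. A full hierarchical Bayesian model would infer the population-level quantities jointly rather than fix them.

```python
def partial_pool(observations, mu, tau2, sigma2):
    """Posterior mean and variance of one group's mean under partial pooling.

    Assumed model (illustrative only):
        group_mean ~ Normal(mu, tau2)          # population-level prior
        y_i        ~ Normal(group_mean, sigma2) # observations in the group
    """
    n = len(observations)
    data_precision = n / sigma2      # information carried by the group's own data
    prior_precision = 1.0 / tau2     # information carried by the population
    post_var = 1.0 / (data_precision + prior_precision)
    # Precision-weighted combination of the data and the population mean.
    post_mean = post_var * (sum(observations) / sigma2 + mu * prior_precision)
    return post_mean, post_var


# A group with no data falls back on the population; a data-rich group
# is dominated by its own observations.
sparse_mean, sparse_var = partial_pool([], mu=10.0, tau2=4.0, sigma2=1.0)
rich_mean, rich_var = partial_pool([12.0] * 100, mu=10.0, tau2=4.0, sigma2=1.0)
```

In this simplified setting the posterior variance of every group is at most `tau2`, which mirrors, in spirit only, the reduction in parameter uncertainty that the abstract reports for shared inference.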
The Weibull time-to-event recurrent neural network (WTTE-RNN) is a simple and versatile prognosis algorithm that works by optimising a Weibull survival function using a recurrent neural network. It combines the sequential modelling of the recurrent neural network with the ability of the Weibull loss function to incorporate censored data. The goal of this paper is to present the first industrial use case of WTTE-RNN for prognosis. Prognosis of turbocharger conditions in a fleet of heavy-duty trucks is presented here, where the condition data used in the case study were recorded as a time series of sparsely sampled histograms. The experiments compare prediction models trained on data from the entire fleet of trucks with models trained on data from clustered sub-fleets, and it is concluded that clustering is beneficial only as long as the training dataset is large enough for the model not to overfit. Moreover, censored data from assets that did not fail are shown to improve prediction performance when incorporated into the optimisation of the Weibull loss function. Overall, this paper concludes that WTTE-RNN-based failure predictions enable predictive maintenance policies, which are enhanced by identifying the sub-fleets of similar trucks.
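How a Weibull loss can accommodate censored data may be clearer from a small sketch. The code below assumes the continuous Weibull parameterisation with scale `alpha` and shape `beta`, and evaluates the standard right-censored log-likelihood: an observed failure contributes the log-hazard minus the cumulative hazard, while a censored record contributes only the (negative) cumulative hazard. The function names are hypothetical, and `alpha` and `beta` are fixed here, whereas in WTTE-RNN they would be outputs of the recurrent network. This is not the paper's implementation.

```python
import math


def weibull_log_likelihood(t, observed, alpha, beta):
    """Log-likelihood of one time t under Weibull(scale=alpha, shape=beta).

    observed=True  -> failure seen at time t
    observed=False -> right-censored: the asset survived at least until t
    """
    if t <= 0 or alpha <= 0 or beta <= 0:
        raise ValueError("t, alpha and beta must be positive")
    # log h(t), with hazard h(t) = (beta / alpha) * (t / alpha) ** (beta - 1)
    log_hazard = math.log(beta) - math.log(alpha) + (beta - 1.0) * math.log(t / alpha)
    # H(t) = (t / alpha) ** beta, so the survival function is exp(-H(t))
    cumulative_hazard = (t / alpha) ** beta
    return (log_hazard if observed else 0.0) - cumulative_hazard


def weibull_loss(times, observed_flags, alpha, beta):
    """Mean negative log-likelihood over a batch of records."""
    total = sum(
        weibull_log_likelihood(t, u, alpha, beta)
        for t, u in zip(times, observed_flags)
    )
    return -total / len(times)


# Two failures and one censored record, all scored with the same parameters.
loss = weibull_loss([2.0, 3.5, 5.0], [True, True, False], alpha=4.0, beta=1.5)
```

Censored records still constrain the parameters through the survival term, so they do not need to be discarded during training.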
Data-driven prognostic models are becoming more prevalent in many areas, ranging from heavy trucks to gas turbines. Certain prognostic models need labeled failures, which can then be used as positive examples when modelling the prognostic problem. Unfortunately, standard algorithms for creating prognostic models can suffer when the labeled data are unbalanced with respect to class distribution, leading to prognostic models with poor performance. In this paper we present a methodology for creating synthetic data that can be used to augment the underrepresented class and hence dramatically increase the performance of the data-driven predictive model. In our study we utilize data collected from heavy trucks and focus on predicting failure of one engine component that is crucial for the operation of heavy trucks. We examine different ways of generating synthetic examples in a low-dimensional setting. Three of the six methods studied do not improve performance compared to using only the original data. The other three methods, which are based on interpolation, are superior to using only the original data, with SMOTE outperforming the two other interpolation methods. SMOTE lowers the estimated cost on test data by 67% compared to a model trained on the original data set only.
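The interpolation idea behind SMOTE can be sketched in a few lines. The code below is a minimal illustration, not the study's implementation: it picks a minority-class point, selects one of its `k` nearest minority neighbours, and places a synthetic point at a random position on the line segment between them. The function name `smote_like_samples` and its parameters are hypothetical, and the example data are invented for illustration.

```python
import random


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority points.

    minority: list of equal-length tuples of floats (at least two points).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Illustrative minority class: the four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_samples(minority_points, n_new=10, k=2, seed=1)
```

Because every synthetic point lies between two existing minority points, the augmented class stays inside the region already occupied by real examples, which is one plausible reason interpolation-based methods behave well in a low-dimensional setting.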