Abstract: Mission-critical supercomputing environments (MCSE) depend on equipment that is powered continuously, so any failure reduces their availability and increases operating costs. Reviewing the studies on this problem makes it possible to identify existing solutions that automatically detect behavior patterns and anticipate future failures. This systematic literature review therefore aims to identify and summarize the contributions in this specific field of knowledge, with a view to greater quality and reliability of the results, which were classified and synthesized. This study contributes both to predictive maintenance applied to MCSE and to computer science through the application of artificial intelligence.
Given the growth and availability of computing power, Artificial Intelligence (AI) techniques have been applied to industrial equipment and computing devices to identify operational anomalies and to predict the remaining useful life (RUL) of equipment, performing better than traditional predictive maintenance. In this context, this research develops a neural network for predictive maintenance in mission-critical supercomputing environments (MCSE). The network uses deep learning to predict the RUL of a piece of equipment before failures occur, based on real, unlabeled historical data collected by sensors installed in a supercomputing environment. The method follows a hybrid approach that combines a Fully Convolutional Neural Network, Long Short-Term Memory and a Multilayer Perceptron. Comparing the predicted and observed RUL values for the moments preceding equipment failure, the results showed a Pearson R of 0.87, an R² of 0.77, a Factor of 2 (FA2) of 0.89 and a Normalized Mean Square Error (NMSE) of 0.79. We therefore conclude that the proposed approach predicts the RUL well, improving the ability to anticipate failures in the MCSE and further increasing its availability and operating time.
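The four metrics reported above can be computed as in the following minimal Python sketch. The abstract does not give their formulas, so the definitions are assumptions: R² is taken as the coefficient of determination, FA2 as the fraction of predictions within a factor of two of the observed value, and NMSE in the form mean((O - P)²) / (mean(O) · mean(P)) that is commonly paired with FA2. The function names are hypothetical and the sample values are toy numbers, not data from the study.

```python
import math
from typing import Sequence

# Hypothetical helpers: the abstract reports these metrics but not their
# exact formulas, so the definitions below are assumptions.


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted RUL."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (std_o * std_p)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fa2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Fraction of pairs with 0.5 <= pred / obs <= 2.

    Pairs with obs <= 0 (e.g. RUL = 0 at the failure instant) are skipped
    because the ratio is undefined there; this handling is an assumption.
    """
    pairs = [(o, p) for o, p in zip(obs, pred) if o > 0]
    if not pairs:
        raise ValueError("FA2 needs at least one positive observation")
    hits = sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0)
    return hits / len(pairs)


def nmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Normalized MSE: mean((O - P)^2) / (mean(O) * mean(P))."""
    n = len(obs)
    mse = sum((o - p) ** 2 for o, p in zip(obs, pred)) / n
    return mse / ((sum(obs) / n) * (sum(pred) / n))


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]   # toy RUL values, not study data
    predicted = [12.0, 18.0, 33.0, 37.0]
    print(f"Pearson R = {pearson_r(observed, predicted):.3f}")
    print(f"R^2       = {r_squared(observed, predicted):.3f}")
    print(f"FA2       = {fa2(observed, predicted):.2f}")
    print(f"NMSE      = {nmse(observed, predicted):.4f}")
```

With these toy values every prediction lies within a factor of two of its observation, so FA2 is 1.0. Values of Pearson R, R² and FA2 closer to 1 indicate better agreement, while under this NMSE definition a lower value is better.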