Fault detection and handling are crucial tasks in cloud systems. As these infrastructures grow and evolve, manual monitoring and intervention become increasingly infeasible. To address this issue, monitoring systems have been developed to track the behavior of the various components of cloud systems (e.g., nodes), as well as the applications served in the virtual environment. Nowadays, most cloud environments offer graphics accelerators to their users, which introduces new challenges for monitoring and fault detection. At the same time, applying GPUs to deep learning can also help detect incorrect behavior. This paper gives a short overview of cloud monitoring and fault detection methods, focusing on GPU-enabled nodes.