End-to-End Diagnosis of Cloud Systems Against Intermittent Faults


Chao Wang, Zhongchuan Fu, Yanyan Huo




Diagnosing intermittent faults is challenging because they manifest randomly as a result of intricate underlying mechanisms. Conventional diagnosis methods are no longer effective for these faults, especially in hierarchical environments such as cloud computing. This paper proposes a fault diagnosis method that can effectively identify and locate intermittent faults originating from (but not limited to) processors in the cloud computing environment. The method is end-to-end in that it does not rely on manual feature extraction tailored to each application scenario, making it more generalizable than conventional neural network-based methods. It requires no additional fault detection mechanisms and is implemented in software at almost zero hardware cost. The proposed method achieves higher fault diagnosis accuracy than a back-propagation (BP) network, reaching 97.98% with low latency.