Machine Learning-Enhanced Predictive Maintenance for High-Performance Computing (HPC) Systems

Authors

  • Juan Elijah, Albert Randy (Department of Computer Engineering, Arizona State University)

Keywords:

Predictive Maintenance, Machine Learning, High-Performance Computing (HPC), Failure Prediction, Anomaly Detection, Time-Series Forecasting, AI-Driven System Monitoring, Fault Diagnosis, Downtime Reduction, System Reliability.

Abstract

High-Performance Computing (HPC) systems play a crucial role in scientific simulations, artificial intelligence workloads, and large-scale data processing. However, the complexity and scale of HPC infrastructure make it difficult to keep systems reliable, predict failures, and limit downtime. Machine learning (ML)-enhanced predictive maintenance addresses these challenges by using historical system logs, sensor data, and operational metrics to forecast failures before they occur. This paper explores the integration of ML techniques, such as deep learning, anomaly detection, and time-series forecasting, to improve fault prediction, optimize maintenance schedules, and reduce unplanned downtime. We propose a hybrid predictive maintenance framework that combines supervised and unsupervised learning models for anomaly classification and real-time system monitoring. Experimental evaluations on real-world HPC datasets indicate that the approach improves system reliability and resource efficiency. The findings suggest that ML-driven predictive maintenance not only improves HPC system performance but also contributes to cost savings and energy efficiency by minimizing unnecessary maintenance interventions.
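
To make the idea of a hybrid framework concrete, the following is a minimal sketch and not the authors' implementation. It assumes a rolling z-score as the unsupervised anomaly detector and a nearest-centroid classifier as the supervised stage; the sensor readings, feature choices, fault labels, and the threshold of 3.0 are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def rolling_zscores(series, window=5):
    """Unsupervised stage: score each point against the preceding window."""
    scores = []
    for i, x in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:
            scores.append(0.0)  # not enough history to score this point
            continue
        mu, sigma = mean(history), stdev(history)
        scores.append(0.0 if sigma == 0 else (x - mu) / sigma)
    return scores


def fit_centroids(samples, labels):
    """Supervised stage: average the feature vectors of each labeled class."""
    groups = {}
    for vec, label in zip(samples, labels):
        groups.setdefault(label, []).append(vec)
    return {
        label: [mean(col) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def classify(vec, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )


# Hypothetical node temperature readings (deg C); the last value is a spike.
temps = [61.0, 62.0, 61.5, 62.5, 61.0, 62.0, 78.0]
scores = rolling_zscores(temps)
flagged = [i for i, z in enumerate(scores) if abs(z) > 3.0]  # assumed threshold

# Hypothetical labeled history: (temperature in deg C, fan speed in thousands of RPM).
train_x = [(80.0, 1.0), (79.0, 1.2), (77.0, 6.0), (81.0, 6.5)]
train_y = ["fan_failure", "fan_failure", "thermal_overload", "thermal_overload"]
centroids = fit_centroids(train_x, train_y)

# Classify the flagged reading together with its (hypothetical) fan speed.
diagnosis = classify((78.0, 1.1), centroids)
print(flagged, diagnosis)
```

In this sketch the detector needs no labels and only decides whether a reading is unusual, while the classifier uses labeled past incidents to suggest a likely fault type. A production system would likely replace both stages with models suited to the data, such as the deep learning and time-series forecasting methods the abstract mentions.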

Published

2011-11-17

How to Cite

Machine Learning-Enhanced Predictive Maintenance for High-Performance Computing (HPC) Systems. (2011). International Journal of Machine Learning Research in Cybersecurity and Artificial Intelligence, 2(1), 10-17. https://ijmlrcai.com/index.php/Journal/article/view/347
