| dc.description.abstract |
Reactive, threshold-based autoscaling mechanisms in Kubernetes are inadequate for modern cloud-native applications, as they fail to manage dynamic workloads efficiently, leading to performance degradation and wasted resources. This inherent "scaling lag" creates a critical need for an automated, intelligent, and proactive management paradigm. This research addresses the challenge by introducing NimbusGuard, a novel, intelligent orchestration framework designed for proactive recovery and performance optimization in Kubernetes. The proposed solution overcomes the limitations of traditional autoscalers by synergistically integrating a Long Short-Term Memory (LSTM) network for predictive forecasting with a Deep Q-Network (DQN) agent for adaptive, multi-objective decision-making. A key contribution of this work is the practical implementation of this intelligence within a stateful LangGraph workflow, which includes a crucial MCP safety validation layer that orchestrates and validates all scaling decisions, preventing system instability. In empirical benchmarks against industry-standard autoscalers, NimbusGuard demonstrated substantially superior performance, reducing the average time to scale by 80% compared to the default Horizontal Pod Autoscaler (HPA) and by over 33% compared to KEDA. This research contributes a production-aware framework that bridges the gap between theoretical AI models and their practical, safe deployment, offering a tangible solution that enhances the performance, efficiency, and resilience of modern cloud-native applications. |
en_US |