| dc.description.abstract |
Reactive, threshold-based autoscaling mechanisms in Kubernetes are inadequate for modern cloud-native applications, as they fail to manage dynamic workloads efficiently, leading to performance degradation and wasted resources. This inherent "scaling lag" creates a critical need for an automated, intelligent, and proactive management paradigm. This research addresses the challenge by introducing NimbusGuard, a novel, intelligent orchestration framework designed for proactive recovery and performance optimization in Kubernetes. The proposed solution overcomes the limitations of traditional autoscalers by synergistically integrating a Long Short-Term Memory (LSTM) network for predictive forecasting with a Deep Q-Network (DQN) agent for adaptive, multi-objective decision-making. A key contribution of this work is the practical implementation of this intelligence within a stateful LangGraph workflow, which includes a crucial MCP safety validation layer that orchestrates and validates all scaling decisions, preventing system instability. In empirical benchmarks against industry-standard autoscalers, NimbusGuard demonstrated substantially superior performance, reducing the average time to scale by 80% compared to the default Horizontal Pod Autoscaler (HPA) and by over 33% compared to KEDA. This research contributes a production-aware framework that bridges the gap between theoretical AI models and their practical, safe deployment, offering a tangible solution that enhances the performance, efficiency, and resilience of modern cloud-native applications. |
en_US |