K8sGPT is an open-source CLI tool that combines SRE expertise with AI capabilities to automatically diagnose and remediate Kubernetes cluster issues. It provides 12+ built-in analyzers covering Pods, Services, Deployments, and more, with data anonymization and local AI model support via Ollama. It supports multiple AI backends including OpenAI, Azure OpenAI, Google Vertex AI, and Amazon Bedrock.

Kubernetes has become the de facto standard for container orchestration, but managing production-grade clusters presents significant challenges. When applications experience failures—whether Pods stuck in Pending state, CrashLoopBackOff errors, or service connectivity issues—troubleshooting requires deep SRE expertise and extensive manual investigation across multiple system components.
K8sGPT addresses these challenges by encoding SRE knowledge into a modular analyzer architecture combined with artificial intelligence capabilities. This open-source CLI tool automatically diagnoses Kubernetes cluster issues and provides actionable remediation recommendations in natural language.
At its core, K8sGPT leverages 12+ built-in analyzers that cover critical Kubernetes resources including Pods, Services, Deployments, Ingress controllers, StatefulSets, and more. Each analyzer is designed by experienced SREs to detect common failure patterns and provide intelligent insights. When you run an analysis, the tool examines your cluster state, identifies anomalies, and leverages large language models to explain the root causes in plain English.
The project has gained significant traction in the Kubernetes community, with adoption by organizations including Spectro Cloud, Nethopper, and Upstage AI. K8sGPT has been featured on Product Hunt and received recognition from HelloGitHub, demonstrating strong community validation. The tool is distributed under the Apache 2.0 open-source license, ensuring full transparency and extensibility.
What distinguishes K8sGPT from traditional monitoring solutions is its AI-powered approach. Rather than simply alerting on symptoms, K8sGPT analyzes relationships between resources, understands common failure modes, and provides context-aware explanations that help teams resolve issues faster—even without deep Kubernetes expertise.
K8sGPT provides a comprehensive suite of features designed for Kubernetes operators who need intelligent, automated troubleshooting capabilities. The platform combines advanced AI analysis with enterprise-grade security and extensibility.
AI-Powered Analysis forms the foundation of the platform. The tool employs sophisticated algorithms that examine cluster state across multiple dimensions, correlating events, resource conditions, and configuration issues. When you execute k8sgpt analyze --explain, the system generates detailed natural language explanations of detected problems. The stats mode enables performance profiling, showing execution time for each analyzer so operators can identify bottlenecks in their analysis workflows.
Data Anonymization addresses critical security concerns for production environments. K8sGPT automatically replaces sensitive identifiers—including StatefulSet names, Service endpoints, Pod labels, and Node identifiers—with reversible placeholders before transmitting data to AI backends. This feature covers 12+ resource types, ensuring comprehensive protection when using external AI providers.
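The reversible-placeholder idea can be pictured with a short Python sketch. This is illustrative only, not K8sGPT's actual implementation (the real pipeline lives in the k8sgpt codebase and is written in Go): identifiers are swapped for opaque tokens before text leaves the cluster, and the mapping stays local so AI responses can be de-anonymized.

```python
# Illustrative sketch of reversible anonymization: real identifiers are
# replaced with opaque placeholders before text is sent to an AI backend,
# and the mapping is kept locally so responses can be restored.
import secrets


class Anonymizer:
    def __init__(self):
        self.forward = {}  # real name -> placeholder
        self.reverse = {}  # placeholder -> real name

    def mask(self, text: str, identifiers: list[str]) -> str:
        for name in identifiers:
            if name not in self.forward:
                token = f"obj-{secrets.token_hex(4)}"
                self.forward[name] = token
                self.reverse[token] = name
            text = text.replace(name, self.forward[name])
        return text

    def unmask(self, text: str) -> str:
        for token, name in self.reverse.items():
            text = text.replace(token, name)
        return text


anon = Anonymizer()
masked = anon.mask("Pod payment-api in namespace shop is in CrashLoopBackOff",
                   ["payment-api", "shop"])
restored = anon.unmask(masked)
```

Because the mapping is held only on the client, the AI backend sees structure (a Pod is crash-looping) without ever seeing your real resource names.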
Multiple AI Provider Support offers flexibility in backend selection. Organizations can choose from OpenAI, Azure OpenAI, Google Vertex AI, Amazon Bedrock, IBM WatsonX, Cohere, Hugging Face, or run entirely locally with Ollama. The modular backend architecture allows seamless provider switching through the k8sgpt auth command.
Auto Remediation empowers teams to automatically apply AI-suggested fixes. This feature operates under user control—operators decide whether to enable automatic remediation based on their operational maturity and risk tolerance.
MCP Server Integration exposes Kubernetes operations as standardized tools and resources through the Model Context Protocol. The server provides 12+ tools for cluster analysis and resource management, 3 read-only resource types, and 3 guided troubleshooting prompts. Support for both stdio and HTTP modes enables integration with Claude Desktop and custom AI workflows.
K8sGPT serves a diverse range of users—from individual developers managing personal clusters to enterprise platform teams operating large-scale production environments. Understanding common use cases helps you determine whether the tool aligns with your operational needs.
Application Troubleshooting represents the most frequent use case. When Pods remain in Pending state due to resource constraints, enter CrashLoopBackOff due to configuration errors, or services become unreachable, K8sGPT analyzes the underlying conditions and provides specific remediation steps. Teams report receiving root cause analysis within seconds—a process that previously required manual investigation across multiple CLI tools and documentation references.
Cluster Health Monitoring suits organizations requiring proactive issue detection. The k8sgpt-operator deploys within your Kubernetes cluster, enabling scheduled analyses that integrate with Prometheus metrics and Alertmanager alerting. This approach transforms reactive troubleshooting into preventive maintenance, identifying potential risks before they impact production workloads.
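With the operator installed, analyses are driven by a K8sGPT custom resource. The sketch below follows the shape documented for the operator; field names can change between versions, so verify against the CRD installed in your cluster:

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: openai
    model: gpt-4o
    secret:
      name: k8sgpt-secret    # assumed Secret holding your API key
      key: openai-api-key
  noCache: false
```

The operator reconciles this resource, runs analyses on a schedule, and surfaces findings as Results objects that monitoring stacks can scrape.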
Production Data Protection concerns organizations operating in regulated industries or sensitive environments. By utilizing the --anonymize flag, all Kubernetes object names, labels, and identifiers are replaced with placeholders before reaching AI backends. Alternatively, running Ollama locally keeps all data within your network boundary—essential for financial services, healthcare, and government deployments.
Claude Desktop Integration appeals to teams preferring natural language interaction with their infrastructure. Starting the MCP server with k8sgpt serve --mcp exposes cluster operations to Claude, enabling queries like "Why is my ingress controller failing?" or "Show me all pods with memory issues."
For daily troubleshooting tasks, the local CLI approach provides the fastest feedback loop. Deploy the k8sgpt-operator when you need continuous monitoring with alerting integration. Use --anonymize in production to protect sensitive metadata regardless of your AI backend choice.
Installing and configuring K8sGPT takes minutes, with multiple deployment options to match different operational requirements. Choose the approach that best fits your workflow.
Installation varies by operating system. macOS and Linux users benefit from Homebrew: brew install k8sgpt. Windows users download pre-built binaries from the releases page and add the executable to their PATH. For Kubernetes-native deployment, the k8sgpt-operator can be installed via Helm or directly from manifests.
AI Provider Configuration uses the authentication system. Run k8sgpt auth add to configure your preferred provider:
# OpenAI configuration
k8sgpt auth add --backend openai --model gpt-4o --password $OPENAI_API_KEY
# Azure OpenAI
k8sgpt auth add --backend azureopenai --password $AZURE_OPENAI_KEY --baseurl https://your-resource.openai.azure.com/
# Local Ollama
k8sgpt auth add --backend ollama --baseurl http://localhost:11434
Basic Analysis begins with the k8sgpt analyze command. The default execution scans your current Kubernetes context and reports issues:
# Simple analysis
k8sgpt analyze
# Detailed AI explanation
k8sgpt analyze --explain
# Filter to specific analyzers
k8sgpt analyze --filter=Pod,Service
# Anonymize sensitive data before sending to AI
k8sgpt analyze --explain --anonymize
Configuration Files follow platform conventions. macOS stores settings at ~/Library/Application Support/k8sgpt/k8sgpt.yaml, Linux at ~/.config/k8sgpt/k8sgpt.yaml, and Windows at %LOCALAPPDATA%\k8sgpt\k8sgpt.yaml. These files control default providers, analyzers, output formatting, and cache settings.
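For orientation, a k8sgpt.yaml typically records the providers you registered with k8sgpt auth add. The snippet below is a sketch of the general shape, not an authoritative schema; the exact keys vary by version, and the file is normally written for you rather than edited by hand:

```yaml
# Illustrative k8sgpt.yaml shape -- normally generated by `k8sgpt auth add`
ai:
  defaultprovider: openai
  providers:
    - name: openai
      model: gpt-4o
      password: sk-your-api-key   # stored API key
```

If in doubt, manage this file exclusively through the k8sgpt auth subcommands.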
MCP Server Setup for Claude Desktop integration requires starting the server and configuring your AI assistant. Stdio mode works directly with Claude Desktop, while HTTP mode suits custom integrations:
# Stdio mode for Claude Desktop
k8sgpt serve --mcp
# HTTP mode for custom applications
k8sgpt serve --mcp --mcp-http --mcp-port 8089
Always use the --anonymize flag in production environments when using external AI providers. This ensures Kubernetes object names, labels, and annotations are replaced with placeholders before data leaves your infrastructure.
K8sGPT's architecture reflects modern cloud-native design principles, emphasizing modularity, extensibility, and enterprise security requirements. Understanding the technical foundations helps platform engineers evaluate integration possibilities and performance characteristics.
Modular Analyzer Architecture forms the system's backbone. Each analyzer operates as an independent module that inspects specific Kubernetes resource types. The framework handles common concerns—Kubernetes API client management, error handling, result aggregation—allowing analyzer developers to focus on domain logic. Custom analyzers follow a defined schema and expose functionality through gRPC services, enabling extensions for organization-specific requirements.
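The analyzer contract can be pictured with a minimal Python sketch. All names here are hypothetical and for illustration only; K8sGPT's real analyzers are written in Go, and custom analyzers plug in over gRPC using the project's published schemas:

```python
# Hypothetical sketch of the modular analyzer pattern: each analyzer
# inspects one resource type and returns findings; the framework owns
# client setup, error handling, and result aggregation.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Result:
    kind: str   # resource kind, e.g. "Pod"
    name: str   # resource name
    error: str  # detected failure pattern


class Analyzer(Protocol):
    def analyze(self, resources: list[dict]) -> list[Result]: ...


class PodAnalyzer:
    """Flags Pods whose containers are waiting in CrashLoopBackOff."""

    def analyze(self, resources: list[dict]) -> list[Result]:
        findings = []
        for pod in resources:
            for status in pod.get("containerStatuses", []):
                reason = status.get("waiting", {}).get("reason")
                if reason == "CrashLoopBackOff":
                    findings.append(Result("Pod", pod["name"], reason))
        return findings


def run(analyzers: list[Analyzer], resources: list[dict]) -> list[Result]:
    # Framework concern: run every analyzer and aggregate the results.
    results: list[Result] = []
    for analyzer in analyzers:
        results.extend(analyzer.analyze(resources))
    return results
```

The key design point is that each analyzer knows nothing about the others; the framework composes them, which is what makes adding an organization-specific analyzer a purely additive change.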
Built-in Analyzers provide out-of-the-box coverage for core Kubernetes resources. The default set includes podAnalyzer for Pod lifecycle issues, pvcAnalyzer for persistent volume claims, rsAnalyzer for ReplicaSet problems, serviceAnalyzer for service connectivity, eventAnalyzer for cluster events, ingressAnalyzer for ingress controller issues, statefulSetAnalyzer for StatefulSet concerns, deploymentAnalyzer for deployment operations, jobAnalyzer and cronJobAnalyzer for batch workloads, nodeAnalyzer for node health, and webhook analyzers for admission controller issues. ConfigMap analysis rounds out the core coverage.
Optional Analyzers extend functionality for advanced use cases: hpaAnalyzer for horizontal pod autoscaling, pdbAnalyzer for PodDisruptionBudget validation, networkPolicyAnalyzer for network policy auditing, Gateway API analyzers (gatewayClass, gateway, httproute), logAnalyzer for log aggregation, storageAnalyzer for storage class analysis, and securityAnalyzer for security context validation.
Security Posture demonstrates enterprise readiness. The project maintains OpenSSF Best Practices certification, confirming adherence to security development lifecycle requirements. Apache 2.0 licensing provides clear usage rights. FOSSA compliance checking ensures dependency licensing compatibility. The data anonymization pipeline operates before any external communication, replacing identifiers with reversible tokens that preserve analysis utility while protecting sensitive information.
Remote Caching optimizes multi-cluster operations. When managing numerous clusters, redundant analysis consumes unnecessary AI API credits. K8sGPT supports S3-compatible storage, Azure Blob Storage, and Google Cloud Storage for result caching, enabling result sharing across cluster environments.
Traditional monitoring tools like Prometheus, Grafana, or Datadog focus on metrics collection and alerting—they identify that something is wrong but leave root cause analysis to operators. K8sGPT combines anomaly detection with AI-powered diagnosis, explaining issues in natural language and suggesting specific remediation steps. Rather than reviewing dozens of metrics dashboards, operators receive actionable insights directly.
K8sGPT provides multiple privacy layers. The --anonymize flag automatically replaces Kubernetes object names, labels, annotations, and other identifiers with placeholders before data reaches AI backends. For maximum privacy, deploy Ollama locally and configure K8sGPT to use http://localhost:11434—all analysis occurs within your network with zero external data transmission. Remote caching supports S3, Azure Blob, and GCS with appropriate access controls.
The platform supports extensive provider options: OpenAI (GPT-4, GPT-4o), Azure OpenAI Service, Google Vertex AI, Amazon Bedrock (Claude, Titan, Llama models), IBM WatsonX, Cohere, Hugging Face (inference endpoints), and Ollama for local deployment. The provider-agnostic architecture allows switching backends without changing analysis workflows.
Configure the MCP server by running k8sgpt serve --mcp in stdio mode. In Claude Desktop, add the server configuration pointing to your k8sgpt executable. Once connected, you can issue natural language queries like "Analyze my default namespace" or "Why is the payment-service deployment failing?" The MCP protocol exposes tools for cluster analysis, resource management, and event retrieval.
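A typical claude_desktop_config.json entry looks roughly like the following; adjust the command path to wherever the k8sgpt binary is installed on your machine:

```json
{
  "mcpServers": {
    "k8sgpt": {
      "command": "k8sgpt",
      "args": ["serve", "--mcp"]
    }
  }
}
```

After restarting Claude Desktop, the k8sgpt tools appear in the assistant's tool list and can be invoked through natural language queries.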
Not necessarily. The local CLI mode handles most troubleshooting scenarios—simply point kubectl to your cluster context and run analysis. Deploy the k8sgpt-operator when you need continuous monitoring, scheduled analyses, or integration with Prometheus/Alertmanager for automated alerting. The operator approach suits production environments requiring proactive issue detection.
K8sGPT itself is free open-source software under Apache 2.0 licensing. Costs depend on your AI backend choice: OpenAI and cloud providers charge per API token based on their pricing. Running Ollama locally eliminates API costs but requires compute resources. The k8sgpt-operator runs on your existing Kubernetes infrastructure with minimal resource overhead.