Production Deployment
Deploy Kubernetes in production: managed cloud services, cluster management, high availability, and disaster recovery strategies.
Managed Kubernetes Services
Cloud-managed Kubernetes services handle control plane operations: EKS (AWS), GKE (Google Cloud), and AKS (Azure). The provider runs the API server, etcd, scheduler, and controller manager; you manage worker nodes and applications.
Managed services provide automatic upgrades, integrated monitoring, and cloud-native storage/networking. Choose based on your cloud provider and ecosystem requirements.
- EKS control plane costs ~$73/month per cluster
- GKE Autopilot manages nodes automatically
- AKS offers a free control plane tier (the paid Standard tier adds an uptime SLA)
# EKS
eksctl create cluster --name prod --region us-east-1 \
  --nodegroup-name workers --node-type t3.large --nodes 3

# GKE (with --region, --num-nodes is the count per zone, not the cluster total)
gcloud container clusters create prod \
  --num-nodes 3 --machine-type e2-standard-4 --region us-central1
Cluster Management
Manage cluster lifecycle with infrastructure as code (Terraform, Pulumi). Version control cluster configuration. Automate node provisioning, upgrades, and scaling.
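As one way to keep cluster configuration in version control, the EKS cluster from the earlier `eksctl create cluster` command could be described in an eksctl config file. This is a minimal sketch: the file name `cluster.yaml` and the `minSize`/`maxSize` bounds are assumptions, not values from this guide.

```yaml
# cluster.yaml (hypothetical file name), kept in the same repo as the rest of the infrastructure code
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod
  region: us-east-1
managedNodeGroups:
  - name: workers
    instanceType: t3.large
    desiredCapacity: 3
    minSize: 3   # assumed lower bound for scaling
    maxSize: 6   # assumed upper bound for scaling
```

A file like this would be applied with `eksctl create cluster -f cluster.yaml`, so changes to the cluster go through the same review process as application code.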
Use node pools (or node groups) to separate workloads: run system pods on dedicated nodes and application pods on auto-scaling pools. Taints keep pods off a node unless they carry a matching toleration, so together they control pod placement.
# Node pool with taint for system workloads
eksctl create nodegroup --cluster prod \
  --name system --node-type t3.medium --nodes 2 \
  --node-labels role=system \
  --node-taints dedicated=system:NoSchedule
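A pod that should run on the tainted system pool needs a toleration for the taint, plus a node selector if it should land only there. The following is a sketch that reuses the `role=system` label and `dedicated=system:NoSchedule` taint from the command above; the pod name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: system-agent                           # hypothetical name
spec:
  nodeSelector:
    role: system                               # schedule only onto nodes labeled role=system
  tolerations:
    - key: dedicated
      operator: Equal
      value: system
      effect: NoSchedule                       # matches the taint on the system pool
  containers:
    - name: agent
      image: registry.example.com/agent:1.0    # placeholder image
```

The toleration alone only permits scheduling onto the tainted nodes; the node selector is what steers the pod to them.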
High Availability
Run control plane components across multiple availability zones. Worker nodes should span at least three AZs. Deploy application replicas with pod anti-affinity to spread across nodes and zones.
Use PodDisruptionBudgets to maintain minimum availability during voluntary disruptions such as node upgrades and cluster autoscaler scale-downs. With a sound HA architecture, 99.9% uptime (about 8.8 hours of downtime per year) is a reasonable target.
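A PodDisruptionBudget for a workload labeled `app: web` might look like the sketch below. The name `web-pdb` and the `minAvailable` value are assumptions; pick a value below the replica count so that drains can still make progress.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  minAvailable: 2        # assumed; evictions are refused if they would leave fewer than 2 ready pods
  selector:
    matchLabels:
      app: web
```

The budget applies only to voluntary evictions such as `kubectl drain`; it does not protect against node crashes.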
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: web
            topologyKey: topology.kubernetes.io/zone
Disaster Recovery
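Back up cluster state regularly with Velero. It backs up Kubernetes resources through the API server and snapshots persistent volumes, rather than taking etcd snapshots directly, which suits managed services where etcd is not accessible. Test restore procedures quarterly. Document RTO and RPO targets per service tier.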
Multi-region failover requires DNS-based traffic switching and cross-region data replication. Start with single-region HA before adding multi-region complexity.
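# Velero backup
# Note: a working AWS install typically also needs --plugins, credentials
# (--secret-file or --no-secret), and --backup-location-config
velero install --provider aws --bucket k8s-backups
velero backup create daily-backup --include-namespaces production
velero restore create --from-backup daily-backup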
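To run backups on a recurring basis rather than by hand, Velero provides a `Schedule` resource. The sketch below assumes Velero is installed in its default `velero` namespace; the cron expression and retention period are illustrative choices, not values from this guide.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero            # assumes the default Velero install namespace
spec:
  schedule: "0 2 * * *"        # assumed: every day at 02:00
  template:
    includedNamespaces:
      - production
    ttl: 720h0m0s              # assumed retention of 30 days
```

Backups created this way are named after the schedule plus a timestamp, and `velero restore create --from-schedule daily-backup` restores from the most recent one.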
Production Checklist
Before going live: enable RBAC and audit logging, configure network policies, set resource quotas, install monitoring and alerting, configure backup, test disaster recovery, and document runbooks.
Ongoing operations: regular Kubernetes version upgrades, node image updates, certificate rotation, cost optimization reviews, and security audits.
# Production readiness checklist
# ✓ RBAC configured with least privilege
# ✓ Network policies enforced
# ✓ Resource requests/limits on all pods
# ✓ Liveness and readiness probes configured
# ✓ Monitoring and alerting operational
# ✓ Backup and restore tested
# ✓ PodDisruptionBudgets defined
# ✓ Ingress TLS configured
# ✓ Image scanning in CI pipeline
# ✓ Runbooks for common incidents
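Two of the items above, network policies and resource quotas, can be sketched as manifests for the `production` namespace. The resource names and all quota values are assumptions for illustration and should be sized to the actual workload.

```yaml
# Deny all ingress traffic to pods in the namespace by default;
# allow rules are then added per application.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
  namespace: production
spec:
  podSelector: {}              # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
---
# Cap the total resources the namespace can request (values are assumed).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota       # hypothetical name
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
```

Network policies take effect only when the cluster's network plugin enforces them, and once a quota covers CPU or memory, pods without matching requests and limits are rejected in that namespace.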