Production Deployment

Deploy Kubernetes in production — managed cloud services, cluster management, high availability, and disaster recovery strategies.

Managed Kubernetes Services

Managed Kubernetes services such as EKS (AWS), GKE (Google Cloud), and AKS (Azure) operate the control plane for you: the API server, etcd, scheduler, and controller manager. You manage worker nodes and applications.

Managed services provide automated or assisted upgrades, integrated monitoring, and cloud-native storage and networking. Choose based on your existing cloud provider and ecosystem requirements.

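EKS charges about $0.10 per hour for each cluster's control plane, roughly $73/month
GKE Autopilot provisions and manages nodes automatically
AKS offers a Free tier with no control plane charge; the paid Standard tier adds an uptime SLA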
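
The monthly figure for EKS follows from an hourly control plane fee. As a rough sketch, assuming a rate of $0.10 per cluster-hour and a 730-hour average month (both are assumptions for illustration; check current provider pricing), the arithmetic looks like this:

```python
# Rough monthly control plane cost estimate.
# ASSUMPTIONS: the hourly rate passed in and the 730-hour month are illustrative,
# not quoted from a provider price list.
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def monthly_control_plane_cost(hourly_rate: float, clusters: int = 1) -> float:
    """Return the estimated monthly fee in dollars for `clusters` control planes."""
    return round(hourly_rate * HOURS_PER_MONTH * clusters, 2)


if __name__ == "__main__":
    print(monthly_control_plane_cost(0.10))     # one cluster
    print(monthly_control_plane_cost(0.10, 3))  # e.g. dev, staging, and prod clusters
```

The fee is charged per cluster, so splitting environments into separate clusters multiplies it; worker node costs come on top and usually dominate.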

# EKS
eksctl create cluster --name prod --region us-east-1 \
  --nodegroup-name workers --node-type t3.large --nodes 3

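# GKE (regional cluster: --num-nodes is per zone, so 1 node in each of the typical 3 zones gives 3 nodes)
gcloud container clusters create prod \
  --num-nodes 1 --machine-type e2-standard-4 --region us-central1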

Cluster Management

Manage cluster lifecycle with infrastructure as code (Terraform, Pulumi). Version control cluster configuration. Automate node provisioning, upgrades, and scaling.

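Use node pools (called node groups on EKS) to separate workloads: system pods on dedicated nodes, application pods on auto-scaling pools. A taint keeps pods off a node unless they carry a matching toleration; pair the toleration with a node selector or node affinity so the system pods actually land on the dedicated nodes.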

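# Node group with a taint for system workloads. eksctl takes taints from a config file:
#   eksctl create nodegroup --config-file system-nodegroup.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata: {name: prod, region: us-east-1}
managedNodeGroups:
  - name: system
    instanceType: t3.medium
    desiredCapacity: 2
    labels: {role: system}
    taints: [{key: dedicated, value: system, effect: NoSchedule}]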
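
The taint above only repels pods; whether a given pod may still be scheduled depends on its tolerations. The following is a simplified sketch of the documented matching rules, not the scheduler's actual implementation. The class and function names are hypothetical, and details such as NoExecute eviction timing are left out.

```python
# Simplified model of how tolerations are matched against node taints.
# ASSUMPTION: names (Taint, Toleration, tolerates, can_schedule) are illustrative;
# this is not Kubernetes source code and omits tolerationSeconds handling.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # None with operator "Exists" tolerates every key
    operator: str = "Equal"       # "Equal" or "Exists"
    value: Optional[str] = None
    effect: Optional[str] = None  # None tolerates every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if toleration.effect is not None and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        return toleration.key is None or toleration.key == taint.key
    # operator "Equal": key and value must both match
    return toleration.key == taint.key and (toleration.value or "") == taint.value


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits a node only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


if __name__ == "__main__":
    system_taint = Taint("dedicated", "system", "NoSchedule")
    system_pod = [Toleration(key="dedicated", value="system", effect="NoSchedule")]
    print(can_schedule(system_pod, [system_taint]))  # True
    print(can_schedule([], [system_taint]))          # False
```

A toleration only permits scheduling on the tainted nodes. To make system pods prefer or require them, the pod spec also needs a node selector or node affinity on the `role=system` label.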

High Availability

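Run control plane components across multiple availability zones. Managed services handle this for you, although on some (for example GKE) it requires choosing a regional cluster rather than a zonal one. Spread worker nodes across at least three AZs, and deploy application replicas with pod anti-affinity so they spread across nodes and zones.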

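Use PodDisruptionBudgets to maintain minimum availability during voluntary disruptions such as node drains for upgrades and cluster autoscaler scale-down. A 99.9% uptime target allows about 8.8 hours of downtime per year, which is realistic only with this kind of HA architecture in place.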

# Pod template spec (for example, under a Deployment's spec.template)
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: web
            topologyKey: topology.kubernetes.io/zone
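
The PodDisruptionBudget mentioned above has no example in this section, so here is a minimal sketch that builds one as JSON, which `kubectl` accepts alongside YAML. The name `web-pdb`, the `production` namespace, and `minAvailable=2` are assumptions chosen to match the `app: web` label used in the anti-affinity rule.

```python
# Build a PodDisruptionBudget manifest as JSON.
# ASSUMPTIONS: the name "web-pdb", the namespace, and minAvailable=2 are illustrative.
import json


def pod_disruption_budget(name: str, namespace: str, app: str, min_available: int) -> dict:
    """Return a policy/v1 PodDisruptionBudget selecting pods labelled app=<app>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


if __name__ == "__main__":
    manifest = pod_disruption_budget("web-pdb", "production", "web", min_available=2)
    print(json.dumps(manifest, indent=2))
```

Saved as a script, the output can be piped to `kubectl apply -f -`. With three replicas and `minAvailable: 2`, a node drain may evict only one `web` pod at a time.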

Disaster Recovery

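Back up cluster state regularly with Velero. It backs up Kubernetes resources through the API server and takes persistent volume snapshots; it does not back up etcd directly, which managed services do not expose. Test restore procedures quarterly. Document RTO (recovery time objective) and RPO (recovery point objective) targets per service tier.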
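
One way to make the RPO targets actionable is to compare them with the backup interval. This sketch assumes the worst-case data loss equals the full interval between two successful backups. The tier names and hour values are hypothetical.

```python
# Check whether a backup schedule can meet each service tier's RPO.
# ASSUMPTIONS: tier names and hour values are hypothetical; worst-case data loss is
# taken to be the full interval between two successful backups.
from typing import Dict, List

RPO_HOURS: Dict[str, float] = {"critical": 1, "standard": 24, "batch": 72}


def tiers_missing_rpo(backup_interval_hours: float, rpo_hours: Dict[str, float]) -> List[str]:
    """Return tiers whose RPO is shorter than the backup interval, sorted by name."""
    return sorted(tier for tier, rpo in rpo_hours.items() if backup_interval_hours > rpo)


if __name__ == "__main__":
    # A daily backup cannot satisfy a one-hour RPO.
    print(tiers_missing_rpo(24, RPO_HOURS))  # ['critical']
```

Tiers flagged this way need either a more frequent backup schedule or continuous replication at the data layer.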

Multi-region failover requires DNS-based traffic switching and cross-region data replication. Start with single-region HA before adding multi-region complexity.

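# Velero install; the plugin version, region, and credentials path are examples, adjust to your setup
velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket k8s-backups --backup-location-config region=us-east-1 \
  --secret-file ./credentials-velero
# One-off backup of the production namespace, then a restore from it
velero backup create daily-backup --include-namespaces production
velero restore create --from-backup daily-backup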

Production Checklist

Before going live: enable RBAC and audit logging, configure network policies, set resource quotas, install monitoring and alerting, configure backup, test disaster recovery, and document runbooks.

Ongoing operations: regular Kubernetes version upgrades, node image updates, certificate rotation, cost optimization reviews, and security audits.