Advanced · 13 min read

Health Checks & Resource Limits

Configure container health checks, resource limits, and restart policies for reliable, production-grade container operations.

Docker Health Checks

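Health checks determine whether a container is functioning, not just running. Define one with the HEALTHCHECK instruction in a Dockerfile or the healthcheck key in compose.yml. Docker runs the check command at the configured interval: a container begins in the starting state, becomes healthy once a check passes, and is marked unhealthy after the configured number of consecutive failures.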

Orchestrators use health status for routing decisions: unhealthy containers are removed from load balancers. In Compose, depends_on with condition: service_healthy holds back a dependent service until the dependency's health check passes.

start_period gives the app time to boot before checks count as failures
Use CMD-SHELL for commands with pipes or operators
Health check commands must exist inside the container image

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

# compose.yml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 40s
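
To make the interaction between interval, retries, and start_period concrete, here is a minimal sketch that replays a sequence of probe results and reports the status after each one. It is a simplified model, not Docker's implementation: the function name simulateHealth is hypothetical, and it assumes each probe runs one interval after the previous one, starting one interval after container start.

```javascript
// Simplified model of how a health status is derived from probe results.
// Assumption: probe i (0-based) runs (i + 1) * intervalS seconds after start.
function simulateHealth(probeResults, { intervalS = 30, retries = 3, startPeriodS = 0 } = {}) {
  let status = 'starting';
  let consecutiveFailures = 0;
  // Failures only count once the start period has elapsed or a probe has passed.
  let started = false;
  const timeline = [];

  probeResults.forEach((passed, i) => {
    const elapsedS = (i + 1) * intervalS;
    if (elapsedS > startPeriodS) started = true;

    if (passed) {
      started = true;
      consecutiveFailures = 0;
      status = 'healthy';
    } else if (started) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= retries) status = 'unhealthy';
    }
    // A failure inside the start period is ignored and leaves the status unchanged.
    timeline.push(status);
  });

  return timeline;
}

// Same settings as the compose example: the probe at t=30s fails without penalty.
const bootTimeline = simulateHealth([false, false, false, true], {
  intervalS: 30,
  retries: 3,
  startPeriodS: 40,
});
console.log(bootTimeline.join(' -> '));
```

With these inputs the container stays in starting through three failed probes (only two of which count) and turns healthy on the fourth. Without a start period, three counted failures in a row would have marked it unhealthy.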

Writing Effective Health Endpoints

Health endpoints should verify critical dependencies — database connectivity, cache availability, and external service reachability. Return 200 for healthy, 503 for unhealthy, with JSON details.

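Separate liveness (is the process alive?) from readiness (can it handle traffic?). On Kubernetes these are distinct probes: liveness failures trigger restarts, while readiness failures remove the container from rotation without restarting it. Docker itself has a single health check per container, so decide which of the two questions that check should answer.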

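// /health endpoint
// Assumes `db` is an already-configured database client exposing query().
const express = require('express');
const app = express();

app.get('/health', async (req, res) => {
  try {
    // A trivial query confirms the database connection is usable.
    await db.query('SELECT 1');
    res.json({ status: 'healthy', uptime: process.uptime() });
  } catch (err) {
    // A 503 makes `curl -f` exit non-zero, so the health check fails.
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});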
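
One way to keep liveness and readiness separate is to build each response with its own small function and let the route handlers only send the result. The sketch below is illustrative: summarizeChecks, livenessResponse, and the critical flag are hypothetical names, and the dependency results are passed in rather than measured.

```javascript
// Turn individual dependency results into an HTTP status and JSON body.
// `critical` marks dependencies whose failure should take the instance out of rotation.
function summarizeChecks(checks) {
  const failedCritical = checks.filter((c) => c.critical && !c.ok);
  const details = {};
  for (const c of checks) {
    details[c.name] = c.ok ? 'ok' : `failed: ${c.error || 'unknown error'}`;
  }
  const ready = failedCritical.length === 0;
  return {
    statusCode: ready ? 200 : 503,
    body: { status: ready ? 'ready' : 'not ready', checks: details },
  };
}

// Liveness stays trivial on purpose: it only proves the process can answer.
function livenessResponse(uptimeSeconds) {
  return { statusCode: 200, body: { status: 'alive', uptime: uptimeSeconds } };
}

// A failing non-critical cache is reported but does not fail readiness.
const readiness = summarizeChecks([
  { name: 'database', critical: true, ok: true },
  { name: 'cache', critical: false, ok: false, error: 'connection refused' },
]);
console.log(readiness.statusCode, JSON.stringify(readiness.body));
```

In an Express handler the result would be sent with res.status(result.statusCode).json(result.body). Keeping the decision logic free of HTTP and database code makes it easy to unit test.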

Memory Limits

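Set memory limits with --memory (hard limit) and --memory-reservation (soft limit). When a container exceeds the hard limit, the kernel's OOM killer terminates a process inside it, which usually stops the container. The soft limit is not enforced continuously: it takes effect when the host runs low on memory, at which point the kernel tries to reclaim the container's memory back toward the reservation. Setting --memory-swap to the same value as --memory prevents the container from using swap.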

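Monitor memory usage with docker stats and set limits based on observed peak usage plus headroom. Node.js applications need limits that account for V8 heap growth: the heap is only part of the process footprint, so keep V8's --max-old-space-size below the container limit to leave room for buffers, stacks, and native allocations.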

docker run -d \
  --memory=512m \
  --memory-reservation=256m \
  --memory-swap=512m \
  myapp:latest

# compose.yml
deploy:
  resources:
    limits:
      memory: 512M
    reservations:
      memory: 256M
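
The sizing advice above comes down to a little arithmetic. The sketch below uses three hypothetical helpers to parse a Docker-style memory string, derive a limit from an observed peak, and suggest a V8 old-space size. The 25% headroom and 75% heap fraction are illustrative starting points, not recommendations from Docker or Node.js.

```javascript
// Parse Docker-style memory strings such as "512m" or "1g" into bytes.
function parseMemory(value) {
  const match = /^(\d+(?:\.\d+)?)([bkmg])?$/i.exec(String(value).trim());
  if (!match) throw new Error(`Unrecognized memory value: ${value}`);
  const units = { b: 1, k: 1024, m: 1024 ** 2, g: 1024 ** 3 };
  const unit = (match[2] || 'b').toLowerCase();
  return Math.floor(Number(match[1]) * units[unit]);
}

// Container limit = observed peak plus a fractional headroom (assumed 25%).
function suggestLimitBytes(peakBytes, headroom = 0.25) {
  return Math.ceil(peakBytes * (1 + headroom));
}

// V8 old-space size in MiB as a fraction of the container limit (assumed 75%),
// leaving the rest for buffers, stacks, and native allocations.
function suggestMaxOldSpaceMb(limitBytes, heapFraction = 0.75) {
  return Math.floor((limitBytes / 1024 ** 2) * heapFraction);
}

const limitBytes = parseMemory('512m');
console.log(`node --max-old-space-size=${suggestMaxOldSpaceMb(limitBytes)} server.js`);
```

For the 512m limit used above this suggests a 384 MiB old space. Here server.js stands in for the application's entry point.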

CPU Limits

Limit CPU with --cpus (e.g., 1.5 for one and a half cores) or --cpu-shares for relative weighting. CPU limits prevent a single container from starving others on shared hosts.

CPU limits do not guarantee minimum CPU — use reservations for that. On Kubernetes, requests guarantee minimum allocation while limits cap maximum usage.

--cpus=0.5 allows half a core of CPU time
cpu-shares are relative and only matter under contention: 512 against the default 1024 means half the weight
Without limits, containers can use all available host CPU

docker run -d \
  --cpus=1.0 \
  --cpu-shares=512 \
  myapp:latest

# compose.yml
deploy:
  resources:
    limits:
      cpus: '1.0'
    reservations:
      cpus: '0.25'
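
The two CPU controls behave differently, and a small calculation shows how. The sketch below uses hypothetical helper names. It converts a --cpus value into the equivalent scheduler quota, assuming Docker's default 100 ms period, and works out the fraction of CPU time each container receives from its shares when every container is busy.

```javascript
const DEFAULT_PERIOD_US = 100000; // default CFS period: 100 ms

// --cpus=N corresponds to a quota of N * period microseconds of CPU time per period.
function cpusToQuota(cpus, periodUs = DEFAULT_PERIOD_US) {
  return { periodUs, quotaUs: Math.round(cpus * periodUs) };
}

// Under full contention, a container's CPU fraction is its shares over the total.
// When other containers are idle, shares impose no cap at all.
function sharesToFractions(sharesByContainer) {
  const total = Object.values(sharesByContainer).reduce((sum, s) => sum + s, 0);
  const fractions = {};
  for (const [name, shares] of Object.entries(sharesByContainer)) {
    fractions[name] = shares / total;
  }
  return fractions;
}

console.log(cpusToQuota(0.5));
console.log(sharesToFractions({ api: 1024, worker: 512 }));
```

With shares of 1024 and 512, the worker gets half as much CPU time as the api container under contention, which is one third of the total rather than one half.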

Restart Policies and Recovery

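Restart policies control container behavior after exit. no (the default) never restarts the container. on-failure restarts it after a non-zero exit code, optionally up to a maximum number of retries (for example on-failure:5). always restarts it regardless of exit code. unless-stopped behaves like always, except that a manually stopped container stays stopped even after the Docker daemon restarts.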

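Combine health checks with restart policies and orchestrator-level recovery for self-healing deployments. Restart policies react to a container exiting, not to an unhealthy status, so on a standalone Docker host an unhealthy container keeps running until something stops it. An orchestrator such as Swarm replaces tasks that fail their health checks automatically.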

docker run -d --restart unless-stopped myapp

# Swarm restart policy
deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
    window: 120s
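
The four docker run restart policies can be summarized as a decision function. This is a simplified model for reasoning about the rules, not Docker's implementation. The function shouldRestart and its event names ('exit' and 'daemon-restart') are hypothetical, and the model ignores restart backoff delays.

```javascript
// Decide whether a container would be started again under a given policy.
// event: 'exit' (the container process ended) or 'daemon-restart' (dockerd came back up).
function shouldRestart({
  policy,
  event,
  exitCode = 0,
  manuallyStopped = false,
  restartCount = 0,
  maxRetries = Infinity,
}) {
  if (event === 'daemon-restart') {
    if (policy === 'always') return true; // even if it was stopped by hand
    if (policy === 'unless-stopped') return !manuallyStopped;
    return false; // 'no' and 'on-failure' do not react to daemon restarts
  }

  // A manual stop is never undone immediately.
  if (manuallyStopped) return false;

  switch (policy) {
    case 'on-failure':
      return exitCode !== 0 && restartCount < maxRetries;
    case 'always':
    case 'unless-stopped':
      return true;
    default:
      return false; // 'no'
  }
}

// A crash (exit code 1) under on-failure with retries remaining.
console.log(shouldRestart({ policy: 'on-failure', event: 'exit', exitCode: 1, maxRetries: 3 }));
```

The difference between always and unless-stopped only shows up in the daemon-restart branch, which is why unless-stopped is often the safer default for hosts that reboot for maintenance.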

Monitoring and Alerting

Export container metrics to Prometheus via cAdvisor or the Docker daemon metrics endpoint. Alert on: container restarts, OOM kills, health check failures, and resource usage approaching limits.

Set up dashboards showing CPU, memory, network, and disk I/O per container. Correlate metric spikes with deployment events to catch regressions quickly.

# Enable Docker metrics in daemon.json
{
  "metrics-addr": "127.0.0.1:9323",
  "experimental": true
}

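# Alert on container restarts
# Prometheus: rate(container_restart_count[5m]) > 0
# Note: the metric name depends on your exporter. With cAdvisor, one common
# substitute is changes(container_start_time_seconds[5m]) > 0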
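
The alert conditions listed above can also be prototyped as plain logic before they are translated into Prometheus rules. The sketch below is illustrative only: evaluateAlerts, the field names on the sample object, and the 90% thresholds are all assumptions, and the sample values would come from whatever metrics pipeline is in use.

```javascript
// Return the names of the alerts that a single container sample would trigger.
function evaluateAlerts(sample, { memoryThreshold = 0.9, cpuThreshold = 0.9 } = {}) {
  const alerts = [];
  if (sample.restartsInWindow > 0) alerts.push('container-restarted');
  if (sample.oomKilled) alerts.push('oom-killed');
  if (sample.healthStatus === 'unhealthy') alerts.push('health-check-failing');

  // Ratios are only meaningful when a limit is actually configured.
  if (sample.memoryLimitBytes > 0 &&
      sample.memoryUsageBytes / sample.memoryLimitBytes >= memoryThreshold) {
    alerts.push('memory-near-limit');
  }
  if (sample.cpuLimitCores > 0 &&
      sample.cpuUsageCores / sample.cpuLimitCores >= cpuThreshold) {
    alerts.push('cpu-near-limit');
  }
  return alerts;
}

// 480 MiB used of a 512 MiB limit is about 94%, above the assumed 90% threshold.
const MiB = 1024 * 1024;
console.log(evaluateAlerts({
  restartsInWindow: 0,
  oomKilled: false,
  healthStatus: 'healthy',
  memoryUsageBytes: 480 * MiB,
  memoryLimitBytes: 512 * MiB,
  cpuUsageCores: 0.3,
  cpuLimitCores: 1.0,
}));
```

Alerting on usage relative to the configured limit, rather than on absolute values, keeps the same rule valid for containers of different sizes.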