Implementing Zero-Downtime Deployments

Deployment patterns that let teams ship faster without disrupting users or paging on-call engineers.

Baikal Signal
Release engineering with fewer sharp edges and cleaner rollback paths.

Zero-downtime deployment keeps your service available to users while a new version rolls out. This guide covers three common strategies, the health checks they depend on, and rollback procedures.

Deployment Strategies

Three main approaches exist for zero-downtime deployments:

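  • Blue-Green: Run two identical environments and switch all traffic from one to the other at once
  • Rolling: Gradually replace old instances with new ones in small batches
  • Canary: Route a small percentage of traffic to the new version first, then widen the rollout if it stays healthy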
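The canary split in the last bullet can be sketched in Python. The technique is an assumption here and is not taken from any particular load balancer: hash a stable user ID into 10,000 buckets and send the buckets below the canary percentage to the new version. The function name route is hypothetical.

```python
import hashlib


def route(user_id: str, canary_percent: float) -> str:
    """Return "canary" or "stable" for a user (hypothetical helper).

    Hashing keeps each user on the same version across requests, and
    raising canary_percent only moves additional users onto the canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u, 5) == "canary" for u in users) / len(users)
    print(f"canary share at 5%: {share:.1%}")  # close to 5%, not exact
```

Hashing is used instead of random sampling so that a user does not bounce between versions on every request. Because the bucket threshold only grows, raising the canary from 5% to 10% keeps the existing canary users on the canary.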

Blue-Green Deployments

Maintain two production environments. Deploy to the inactive one, verify it, then switch traffic:

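```bash
# Deploy to the green environment
kubectl apply -f deployment-green.yaml

# Wait until every green pod passes its readiness probe
kubectl wait --for=condition=Ready pod -l env=green --timeout=120s

# Switch traffic by pointing the Service selector at green
kubectl patch service myapp -p '{"spec":{"selector":{"env":"green"}}}'

# Monitor for issues; if problems appear, switch back instantly:
# kubectl patch service myapp -p '{"spec":{"selector":{"env":"blue"}}}'
```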

Advantages

  • Instant rollback capability
  • Test production environment before switching
  • No mixed version state

Disadvantages

  • Requires 2x infrastructure
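  • Database migrations need special handling, because both environments usually share one database and its schema must work with both versions
  • Stateful applications are harder to switch over, because sessions and in-flight work live in the old environment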
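Blue-green rollback is instant because the old environment keeps running, so switching back only repoints the Service selector. Below is a toy in-memory Python model of that switch. It is not a Kubernetes client. The class BlueGreenRouter and the is_healthy callback are hypothetical stand-ins for the Service and a real health check.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BlueGreenRouter:
    """Toy stand-in for a Service selector: all traffic goes to one color."""

    active: str = "blue"
    previous: List[str] = field(default_factory=list)

    def switch_to(self, color: str, is_healthy: Callable[[str], bool]) -> bool:
        """Move traffic to `color` only if it passes the health check."""
        if color == self.active:
            return True
        if not is_healthy(color):
            return False  # never route traffic to an unhealthy environment
        self.previous.append(self.active)
        self.active = color
        return True

    def rollback(self) -> str:
        """Point traffic back at the last environment; it was never torn down."""
        if self.previous:
            self.active = self.previous.pop()
        return self.active


if __name__ == "__main__":
    router = BlueGreenRouter()
    router.switch_to("green", is_healthy=lambda color: True)
    print(router.active)      # green
    print(router.rollback())  # blue
```

The same model shows why blue-green costs 2x infrastructure: rollback only works while the environment you switched away from is still running.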

Rolling Updates

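Replace instances gradually to maintain availability. The Deployment below includes the selector and pod template a manifest needs to apply cleanly. The app: myapp labels and the image myapp:2.0.0 are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Allow up to 2 extra pods (8 total) during the update
      maxUnavailable: 1  # At most 1 of the 6 desired pods unavailable at a time
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp          # must match spec.selector.matchLabels
    spec:
      containers:
        - name: myapp
          image: myapp:2.0.0  # placeholder image tag
```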

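The update process with these settings:

  1. Create new pods with the updated version, up to maxSurge (2) beyond the desired 6
  2. Wait for each new pod's readiness probe to pass
  3. Terminate old pods, never letting more than maxUnavailable (1) of the desired pods be unavailable
  4. Repeat until all pods run the new version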
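The batching that maxSurge and maxUnavailable allow can be sketched as a small Python simulation. This is a simplified model and not the Deployment controller's actual algorithm. It assumes every new pod passes its readiness probe within one step. The function name rolling_update is hypothetical.

```python
from typing import List, Tuple


def rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> List[Tuple[int, int]]:
    """Simulate a rolling update; return (old, new) ready-pod counts per step.

    Simplifying assumption: new pods become ready within the step they are
    created, so old pods can be removed in the same step.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    min_available = replicas - max_unavailable
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Surge: create new pods until the total reaches replicas + max_surge.
        new += min(replicas - new, replicas + max_surge - (old + new))
        # Scale down: remove old pods while keeping at least min_available ready.
        old -= min(old, old + new - min_available)
        history.append((old, new))
    return history


if __name__ == "__main__":
    # Settings from the manifest above: 6 replicas, maxSurge 2, maxUnavailable 1.
    for step, (old, new) in enumerate(rolling_update(6, 2, 1)):
        print(f"step {step}: {old} old, {new} new")
```

With the manifest's settings, the simulation finishes in three steps and never drops below 5 ready pods. If both limits are 0, the rollout cannot make progress. This model rejects that case, and Kubernetes does not allow that combination either.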

Health Checks

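Health checks are critical for zero-downtime deployments. Kubernetes sends traffic to a pod only after its readiness probe passes, and it restarts a container whose liveness probe fails. Implement both: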

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 3
```

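Your readiness endpoint (/ready) should verify:

  • Database connectivity
  • Availability of required dependencies
  • Critical background workers running

Keep the liveness endpoint (/healthz) limited to the process itself. If it also checks the database, a database outage fails every pod's liveness probe and Kubernetes restarts all of them, which turns a dependency problem into a full outage.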
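The probe configuration above targets /healthz and /ready on port 8080. Below is a minimal Python sketch of both endpoints that uses only the standard library's http.server. The READINESS_CHECKS entries are hypothetical placeholders for real checks, such as a trivial database query. A production service would normally expose these routes from its existing web framework instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]

# Hypothetical checks; replace each lambda with a real probe of that dependency.
READINESS_CHECKS: Dict[str, Check] = {
    "database": lambda: True,
    "dependencies": lambda: True,
    "workers": lambda: True,
}


def readiness_report(checks: Dict[str, Check]) -> Tuple[int, Dict[str, bool]]:
    """Run every check; return HTTP 200 only if all pass, else 503."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:  # a check that raises counts as failing
            results[name] = False
    return (200 if all(results.values()) else 503), results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/healthz":
            # Liveness: answer if the process can serve HTTP; no dependency checks.
            self._reply(200, {"status": "ok"})
        elif self.path == "/ready":
            status, results = readiness_report(READINESS_CHECKS)
            self._reply(status, results)
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, status: int, body: dict) -> None:
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Listen on the port the probes target; blocks forever."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the listener. readiness_report is kept separate from the handler so the check logic can be tested without opening a socket.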

Rollback Procedures

Always have a rollback plan. With Kubernetes:

```bash
# View deployment history
kubectl rollout history deployment/myapp

# Roll back to the previous version
kubectl rollout undo deployment/myapp

# Roll back to a specific revision
kubectl rollout undo deployment/myapp --to-revision=3
```
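kubectl rollout undo returns as soon as the rollback is requested, so scripts usually follow it with kubectl rollout status to wait for the rolled-back pods. Below is a minimal Python sketch of that pairing. It assumes kubectl is on the PATH and already points at the right cluster. The function names and the 120s default timeout are hypothetical choices. The run parameter lets you swap in a different command runner.

```python
import subprocess
from typing import Callable, List, Optional

Command = List[str]


def run_checked(cmd: Command) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def rollback(
    deployment: str,
    revision: Optional[int] = None,
    timeout: str = "120s",
    run: Callable[[Command], None] = run_checked,
) -> List[Command]:
    """Undo the last rollout (or go to `revision`), then wait for it to finish."""
    undo = ["kubectl", "rollout", "undo", f"deployment/{deployment}"]
    if revision is not None:
        undo.append(f"--to-revision={revision}")
    # rollout status blocks until the rollout completes or the timeout expires.
    wait = ["kubectl", "rollout", "status", f"deployment/{deployment}", f"--timeout={timeout}"]
    commands = [undo, wait]
    for cmd in commands:
        run(cmd)
    return commands
```

For example, rollback("myapp", revision=3) issues the same undo command as the last line above and then blocks until the Deployment has settled. If it does not settle within the timeout, it raises an error.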

Automated Rollback

Trigger a rollback automatically when the error rate stays above a threshold. In pseudocode:

```text
if error_rate > 5% for 2 minutes:
    trigger rollback
    alert on-call team
    create incident
```
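The "for 2 minutes" condition means the breach must be sustained, so a single noisy sample does not trigger a rollback. Below is a minimal Python sketch of that timer. It assumes error and request counts are sampled periodically from your metrics system. The class name ErrorRateGuard and the on_breach callback are hypothetical, and on_breach is where the rollback, alert, and incident steps would go.

```python
import time
from typing import Callable, Optional


class ErrorRateGuard:
    """Call on_breach once the error rate stays above threshold for window_s."""

    def __init__(
        self,
        threshold: float = 0.05,  # 5%
        window_s: float = 120.0,  # 2 minutes
        on_breach: Callable[[], None] = lambda: None,  # hypothetical hook
    ) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.on_breach = on_breach
        self.breach_started: Optional[float] = None
        self.fired = False

    def observe(self, errors: int, requests: int, now: Optional[float] = None) -> bool:
        """Record one sample; return True only on the sample that fires."""
        now = time.monotonic() if now is None else now
        rate = errors / requests if requests else 0.0
        if rate <= self.threshold:
            self.breach_started = None  # dipped back under: restart the clock
            return False
        if self.breach_started is None:
            self.breach_started = now
        if not self.fired and now - self.breach_started >= self.window_s:
            self.fired = True  # fire at most once per guard instance
            self.on_breach()
            return True
        return False
```

Each observe() call takes the counts for one scrape interval. Resetting the timer whenever the rate drops back under the threshold is what turns "above 5%" into "above 5% for 2 minutes". Without the reset, two brief spikes two minutes apart would trigger a rollback.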

Summary

Zero-downtime deployment is achievable with proper planning. Choose the strategy that fits your constraints, implement comprehensive health checks, and always have a tested rollback procedure. Start with rolling updates for stateless services and consider blue-green for critical systems.