Implementing Zero-Downtime Deployments

Baikal Signal

Release engineering with fewer sharp edges and cleaner rollback paths.

Deployment Strategies
Blue-Green Deployments
Rolling Updates
Health Checks
Rollback Procedures

Zero-downtime deployment ensures your service remains available during updates. This guide covers proven strategies and implementation patterns.

Deployment Strategies

Three main approaches exist for zero-downtime deployments:

Blue-Green: Run two identical environments and switch traffic between them in one step
Rolling: Gradually replace instances in small batches
Canary: Route a small percentage of traffic to the new version first
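
Canary routing is the only one of the three strategies not shown in the Kubernetes snippets in this guide, so here is a minimal sketch of the idea. It assumes each request carries some stable identifier; the function name pick_version and the "canary"/"stable" labels are hypothetical, and a real setup would usually do this in the load balancer or service mesh rather than in application code.

```python
import hashlib


def pick_version(request_id: str, canary_percent: int) -> str:
    """Send a stable share of callers to the canary.

    Hashing the ID keeps a given caller on the same version across
    requests, which makes canary errors easier to trace.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    hits = sum(pick_version(f"user-{i}", 5) == "canary" for i in range(10_000))
    print(f"{hits / 100:.1f}% of simulated users hit the canary")
```

Raising canary_percent step by step only adds callers to the canary; those already on it stay there, because each caller's bucket never changes.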

Blue-Green Deployments

Maintain two production environments. Deploy to the inactive one, verify it, then switch traffic:

# Deploy to green environment
kubectl apply -f deployment-green.yaml

# Verify green is healthy
kubectl get pods -l env=green

# Switch traffic
kubectl patch service myapp -p '{"spec":{"selector":{"env":"green"}}}'

# Monitor for issues
# If problems, switch back instantly (assumes the old pods are labeled env=blue):
# kubectl patch service myapp -p '{"spec":{"selector":{"env":"blue"}}}'
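
If the switch is scripted rather than typed by hand, the patch body can be generated instead of copied. The sketch below assumes the two environments are labeled env=blue and env=green (only the green label appears in the commands above); inactive_env and selector_patch are hypothetical helper names.

```python
import json


def inactive_env(active: str) -> str:
    """Return the environment that is not currently serving traffic."""
    envs = {"blue": "green", "green": "blue"}
    if active not in envs:
        raise ValueError(f"unknown environment: {active!r}")
    return envs[active]


def selector_patch(target: str) -> str:
    """Build the JSON body passed to `kubectl patch service myapp -p`."""
    patch = {"spec": {"selector": {"env": target}}}
    return json.dumps(patch, separators=(",", ":"))


if __name__ == "__main__":
    target = inactive_env("blue")
    print(selector_patch(target))
```

Because rollback is the same patch with the other label, the same two functions cover both directions of the switch.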

Advantages

Instant rollback capability
Test the new version in a production environment before switching
No mixed version state

Disadvantages

Requires 2x infrastructure
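Database migrations need special handling: schema changes must stay compatible with both versions while either can receive traffic
Stateful applications are harder to switch, because sessions and local state do not move with the traffic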

Rolling Updates

Replace instances gradually to maintain availability:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # Allow 2 extra pods during update
      maxUnavailable: 1  # At most 1 pod down at a time
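
To see what these two settings mean for capacity, the following sketch computes the pod-count bounds they imply. Both fields also accept percentages; Kubernetes rounds a maxSurge percentage up and a maxUnavailable percentage down. The helper names resolve and rollout_bounds are hypothetical.

```python
import math
from typing import Tuple, Union

Setting = Union[int, str]


def resolve(value: Setting, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a string like "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas: int, max_surge: Setting,
                   max_unavailable: Setting) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    print(rollout_bounds(6, 2, 1))
```

For the manifest above this gives at least 5 available pods and at most 8 pods in total, so the cluster needs headroom for two extra pods during the update.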

The update process:

Create a new pod with the updated version
Wait for its readiness probe to pass
Terminate an old pod
Repeat until all pods are updated
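
The steps above can be sketched as a small simulation. This is a simplified model, not the Deployment controller's actual algorithm: it assumes every new pod becomes ready within a step and works in batches sized by maxSurge and maxUnavailable. The function name simulate_rollout is hypothetical.

```python
from typing import List, Tuple


def simulate_rollout(replicas: int, max_surge: int,
                     max_unavailable: int) -> List[Tuple[int, int]]:
    """Return (old_pods, new_pods) after each step of a simplified rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    ceiling = replicas + max_surge        # most pods allowed at once
    floor = replicas - max_unavailable    # fewest pods allowed at once
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Create new pods, limited by the surge headroom.
        new += min(replicas - new, ceiling - (old + new))
        # Terminate old pods without dropping below the availability floor.
        old -= min(old, (old + new) - floor)
        steps.append((old, new))
    return steps


if __name__ == "__main__":
    for old, new in simulate_rollout(6, 2, 1):
        print(f"old={old} new={new} total={old + new}")
```

With maxSurge=1 and maxUnavailable=0 the model replaces exactly one pod per step, which matches the one-at-a-time description; larger values finish in fewer steps at the cost of more spare capacity or less headroom.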

Health Checks

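Health checks are critical for zero-downtime deployments. A readiness probe keeps traffic away from a pod until it can serve requests, and a liveness probe restarts a container that has stopped responding. Implement both: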

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 3

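Keep the liveness endpoint (/healthz) lightweight, since a failing liveness probe restarts the container. The readiness endpoint (/ready) should verify:

Database connectivity
Availability of required dependencies
Critical background workers are running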
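
One way to wire these checks into an endpoint is to run each one and map the combined result to an HTTP status, since an httpGet probe treats any status from 200 to 399 as success. The sketch below is framework-agnostic; the function name readiness and the check names are hypothetical, and the lambdas stand in for real connectivity tests.

```python
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run every check and return an HTTP status code plus a per-check report."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failed"
        except Exception as exc:  # a crashing check counts as a failure
            report[name] = f"error: {exc}"
    status = 200 if all(v == "ok" for v in report.values()) else 503
    return status, report


if __name__ == "__main__":
    status, report = readiness({
        "database": lambda: True,       # placeholder for a real query
        "dependencies": lambda: True,   # placeholder for a downstream ping
        "workers": lambda: False,       # placeholder for a worker heartbeat
    })
    print(status, report)
```

Returning the per-check report in the response body makes a failing probe much easier to diagnose from logs or a quick curl.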

Rollback Procedures

Always have a rollback plan. With Kubernetes:

# View deployment history
kubectl rollout history deployment/myapp

# Rollback to previous version
kubectl rollout undo deployment/myapp

# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=3

Automated Rollback

Trigger an automatic rollback when the error rate stays above a threshold for a sustained period:

if error_rate > 5% for 2 minutes:
  trigger rollback
  alert oncall team
  create incident
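
The rule above can be sketched as a small state machine that fires only when the breach has lasted the whole window. This is a minimal illustration, assuming error and request counts are sampled periodically from your metrics system; the class name ErrorRateMonitor is hypothetical, and the rollback, alert, and incident actions are left to the caller.

```python
from typing import Optional


class ErrorRateMonitor:
    """Signal a rollback when the error rate stays above a threshold."""

    def __init__(self, threshold: float = 0.05, window_seconds: float = 120.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.breach_started: Optional[float] = None  # when the rate first went over

    def observe(self, timestamp: float, errors: int, requests: int) -> bool:
        """Record one sample; return True once the breach spans the full window."""
        rate = errors / requests if requests else 0.0
        if rate <= self.threshold:
            self.breach_started = None  # recovered, so reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = timestamp
        return timestamp - self.breach_started >= self.window_seconds


if __name__ == "__main__":
    monitor = ErrorRateMonitor()
    for t in (0, 60, 120):
        if monitor.observe(t, errors=10, requests=100):
            print(f"t={t}s: trigger rollback, alert on-call, create incident")
```

Requiring the breach to persist avoids rolling back on a single noisy sample, at the cost of serving errors for up to the window length before reacting.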

Summary

Zero-downtime deployment is achievable with proper planning. Choose the strategy that fits your constraints, implement comprehensive health checks, and always have a tested rollback procedure. Start with rolling updates for stateless services and consider blue-green for critical systems.
