Hi all,



Thanks for your time last week. With the active-active cluster requirement confirmed, here is a summary of the key points:

1. Two solutions discussed:

· Replicate the NLB ---> ALB ---> EKS cluster flow: the easier migration path from the current single EKS cluster to an active-active setup. For blue/green deployments, however, DNS records may be cached on the client side. To mitigate this:
    · Set the Route 53 DNS TTL to 60 seconds.
    · Shift traffic gradually: reduce traffic to the blue environment before updates, monitor NLB connection draining, then complete the cutover.
    · Consider client-side connection refresh mechanisms if persistent connections keep traffic from following the routing change.
· TargetGroupBinding approach: (1) detaching the current ALBs from aws-load-balancer-controller management is high risk: potential ALB recreation (through Ingress recreation), ALB cache management, etc.; (2) it is unclear how to remove the Ingress object safely, as it may not be deleted gracefully.
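
To make the gradual traffic shift more concrete, here is a minimal Python sketch of a shift plan. It is only an illustration under assumptions: the names `ShiftStep` and `plan_shift` are hypothetical, the percentages are examples, and the 300-second drain window is assumed to match the target group deregistration delay in your environment. Each step waits at least one DNS TTL plus the drain window before the next weight change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftStep:
    blue_weight: int   # weight for the blue (old) environment
    green_weight: int  # weight for the green (new) environment
    wait_seconds: int  # minimum pause before the next step


def plan_shift(green_percents, ttl_seconds=60, drain_seconds=300):
    """Build a blue-to-green shift plan from a list of green traffic percentages.

    Each step waits one DNS TTL (so cached answers can expire) plus the
    connection-draining window. Both defaults are assumptions to adjust.
    """
    if ttl_seconds <= 0 or drain_seconds < 0:
        raise ValueError("ttl_seconds must be positive, drain_seconds non-negative")
    steps = []
    previous = -1
    for pct in green_percents:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        if pct <= previous:
            raise ValueError("percentages must strictly increase")
        previous = pct
        steps.append(ShiftStep(100 - pct, pct, ttl_seconds + drain_seconds))
    if previous != 100:
        raise ValueError("the last step must send 100% of traffic to green")
    return steps


if __name__ == "__main__":
    for step in plan_shift([10, 50, 100]):
        print(step)
```

Between steps you would still check NLB/ALB metrics by hand or with your own tooling before moving on; the plan only fixes the order and the minimum waiting time.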

2. Tips and recommendations for active-active EKS clusters:

· Stay stateless from both the application and the EKS perspective: don't store session data locally, don't keep configuration in local files, etc.
· Avoid circular traffic patterns between the EKS clusters.
· Kube-burner is meant for EKS control plane scale testing.
· We recommend validating that DR in the Ningxia region works as expected.
· We recommend building an observability platform for easier monitoring and tracking.


In addition, key considerations for upgrading EKS clusters with blue/green:

Network Planning and Resource Isolation
Pay careful attention to network planning for both the new and the existing cluster. Because both EKS clusters will be deployed in the same VPC, it is critical to keep the subnet configurations, security group settings, and resource tags strictly distinct between the clusters so that no resources are deleted by accident.
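
As a rough pre-flight check for this isolation, a small Python sketch like the one below could be run against an inventory of both clusters. Everything here is an assumption for illustration: the function names, the CIDR ranges, the resource dictionaries, and the `cluster` tag key are hypothetical and not taken from your environment.

```python
import ipaddress


def overlapping_subnets(blue_cidrs, green_cidrs):
    """Return (blue, green) CIDR pairs that overlap; an empty list means isolated."""
    blue = [ipaddress.ip_network(c) for c in blue_cidrs]
    green = [ipaddress.ip_network(c) for c in green_cidrs]
    return [(str(b), str(g)) for b in blue for g in green if b.overlaps(g)]


def mistagged(resources, cluster_name, tag_key="cluster"):
    """Return IDs of resources whose tag does not name the expected cluster."""
    return [
        r["id"]
        for r in resources
        if r.get("tags", {}).get(tag_key) != cluster_name
    ]


if __name__ == "__main__":
    # Hypothetical example ranges: the second pair overlaps on purpose.
    print(overlapping_subnets(["10.0.0.0/20"], ["10.0.16.0/20"]))
    print(overlapping_subnets(["10.0.0.0/20"], ["10.0.8.0/21"]))
    print(mistagged(
        [{"id": "sg-1", "tags": {"cluster": "blue"}}, {"id": "sg-2", "tags": {}}],
        "blue",
    ))
```

Any non-empty result would be worth resolving before cleanup of the old cluster starts, since that is when an ambiguous tag or shared subnet is most likely to cause an accidental deletion.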

ALB Metrics and Health Check Configuration
Monitor ALB metrics closely and configure proper health checks for critical services. If ALB connection counts accurately reflect business traffic patterns, prioritize monitoring these metrics as key performance indicators.

Route 53 Weighted Routing TTL Configuration
When managing traffic through Route 53 weighted routing, set the TTL to no more than 60 seconds so that traffic can be switched quickly.
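
For reference, a weighted record change could look roughly like the Python sketch below. It only builds the change batch as a plain dictionary, in the shape the Route 53 `ChangeResourceRecordSets` API expects, and refuses a TTL above 60 seconds. The helper names, the record name, and the NLB hostnames are hypothetical. It also assumes non-alias CNAME records: alias records do not accept a TTL, so the check would not apply to them.

```python
MAX_TTL_SECONDS = 60  # ceiling recommended in these notes


def weighted_cname(name, set_identifier, target, weight, ttl=60):
    """Build one UPSERT change for a weighted, non-alias CNAME record."""
    if not 0 <= weight <= 255:
        raise ValueError("Route 53 weights must be between 0 and 255")
    if not 0 < ttl <= MAX_TTL_SECONDS:
        raise ValueError(f"TTL must be between 1 and {MAX_TTL_SECONDS} seconds")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": set_identifier,  # distinguishes the weighted records
            "Weight": weight,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }


def change_batch(name, blue_target, green_target, green_weight, ttl=60):
    """Build a change batch that splits 100 weight units between blue and green."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return {
        "Comment": f"blue/green shift: green={green_weight}",
        "Changes": [
            weighted_cname(name, "blue", blue_target, 100 - green_weight, ttl),
            weighted_cname(name, "green", green_target, green_weight, ttl),
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    batch = change_batch(
        "api.example.com",
        "blue-nlb.example.com",
        "green-nlb.example.com",
        green_weight=10,
    )
    print(batch)
```

With boto3, a dictionary like this would be passed as the `ChangeBatch` argument of `change_resource_record_sets`, together with your hosted zone ID.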

Testing Environment Validation
We strongly recommend testing the entire setup thoroughly in a test environment before implementing it in production.
 
 