# High Availability Deployment Across Data Centers

## Scenario Requirements

The customer's data center environment consists of a single Kubernetes (k8s) cluster spanning Data Center A and Data Center B. They want to deploy a Redis cluster with 3 leader (master) and 3 follower (replica) nodes to achieve high availability across the data centers, so that Redis continues to provide service even when one data center goes offline.
## Solution

To meet the high availability requirement across the two data centers, the Redis replicas need to be deployed as follows:

- 3 leader replicas running on cluster nodes k8s-node-01, k8s-node-02, and k8s-node-06
- 3 follower replicas running on cluster nodes k8s-node-03, k8s-node-04, and k8s-node-05
- Each cluster node runs exactly one Redis replica

This solution achieves the deployment goals through workload scheduling strategies: weighted node affinity combined with required workload (pod) anti-affinity.
> **Note:** Ensure that each node has sufficient resources; otherwise, scheduling may fail due to a resource shortage.
## 1. Label Configuration

### Redis Workload Labels

To schedule the leader and follower replicas separately, the Redis replicas are distinguished by labels:

| Redis Replica | Label |
| --- | --- |
| redis-leader | app:redis-leader |
| redis-follower | app:redis-follower |
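Once the Redis instance is deployed, a quick check that the pods carry the expected labels (a minimal sketch; the selectors assume the workload labels from the table above):

```shell
# List the leader and follower pods by their workload label
kubectl get pods -l app=redis-leader
kubectl get pods -l app=redis-follower
```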
### Cluster Node Labels

To allocate the leader and follower replicas to two different data centers, two topology domains need to be defined across the 6 cluster nodes. The labels for each cluster node are as follows:

| Cluster Node | Label | Topology Domain |
| --- | --- | --- |
| k8s-node-01 | az1:node01 | az1 |
| k8s-node-02 | az1:node02 | az1 |
| k8s-node-06 | az1:node03 | az1 |
| k8s-node-04 | az2:node01 | az2 |
| k8s-node-05 | az2:node02 | az2 |
| k8s-node-03 | az2:node03 | az2 |
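A minimal sketch of applying these node labels with kubectl, using the node names and label key/value pairs from the table above:

```shell
# Topology domain az1 (Data Center A)
kubectl label node k8s-node-01 az1=node01
kubectl label node k8s-node-02 az1=node02
kubectl label node k8s-node-06 az1=node03

# Topology domain az2 (Data Center B)
kubectl label node k8s-node-04 az2=node01
kubectl label node k8s-node-05 az2=node02
kubectl label node k8s-node-03 az2=node03

# Verify the labels
kubectl get nodes --show-labels
```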
## 2. Scheduling Configuration

The redis-leader and redis-follower replicas need to be scheduled into different topology domains, so a separate affinity policy is configured for each workload.

**redis-leader**
```yaml
# Required workload anti-affinity within the topology domain az1: each of the
# cluster nodes k8s-node-01, k8s-node-02, and k8s-node-06 carries a unique value
# for the az1 label key, so every node forms its own topology domain and at most
# one redis-leader replica is scheduled per node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - redis-leader
        topologyKey: az1
  # Weighted node affinity: prefer scheduling the redis-leader replicas onto the
  # az1 nodes, in descending order of weight: k8s-node-01 (100), k8s-node-02 (90),
  # k8s-node-06 (80).
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: az1
              operator: In
              values:
                - node01
      - weight: 90
        preference:
          matchExpressions:
            - key: az1
              operator: In
              values:
                - node02
      - weight: 80
        preference:
          matchExpressions:
            - key: az1
              operator: In
              values:
                - node03
```
**redis-follower**
```yaml
# Required workload anti-affinity within the topology domain az2: each of the
# cluster nodes k8s-node-03, k8s-node-04, and k8s-node-05 carries a unique value
# for the az2 label key, so every node forms its own topology domain and at most
# one redis-follower replica is scheduled per node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - redis-follower
        topologyKey: az2
  # Weighted node affinity: prefer scheduling the redis-follower replicas onto the
  # az2 nodes, in descending order of weight: k8s-node-04 (100), k8s-node-05 (90),
  # k8s-node-03 (80).
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: az2
              operator: In
              values:
                - node01
      - weight: 90
        preference:
          matchExpressions:
            - key: az2
              operator: In
              values:
                - node02
      - weight: 80
        preference:
          matchExpressions:
            - key: az2
              operator: In
              values:
                - node03
```
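After both workloads are deployed, the resulting placement can be checked with kubectl; the set-based selector below assumes the workload labels defined earlier:

```shell
# Expect exactly one Redis replica per node: leaders on the az1 nodes
# (k8s-node-01, k8s-node-02, k8s-node-06) and followers on the az2 nodes
kubectl get pods -l 'app in (redis-leader, redis-follower)' -o wide
```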
## Handling Data Center Outages

### Data Center A Offline

When Data Center A goes offline, two redis-leader replicas go offline with it, and the entire Redis cluster can no longer provide normal service.

**Solution**

Use the redis-cli tool to connect to any redis-follower replica in Data Center B and manually promote it to a leader:
```shell
# Connect to a follower node
redis-cli -h <ip> -p <port>
# Authenticate; the password can be found on the instance overview page of the middleware module
auth <password>
# Promote this node to leader; TAKEOVER does not require agreement from the
# remaining masters, so it works even while the failed leaders are unreachable
cluster failover takeover
# Check the node's role information; it should now report master
role
```
After the role conversion of a replica in Data Center B, the cluster can resume service. When Data Center A comes back online, the original redis-leader replica rejoins the Redis instance as a follower.
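To confirm that the cluster has recovered, its state can be inspected from any reachable replica using standard Redis Cluster commands:

```shell
# cluster_state should report "ok" once the takeover completes
redis-cli -h <ip> -p <port> -a <password> cluster info
# List every node with its current role and slot assignment
redis-cli -h <ip> -p <port> -a <password> cluster nodes
```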
### Data Center B Offline

When Data Center B goes offline, only one redis-leader replica goes offline with it, so the Redis service is not interrupted and no manual intervention is required.