© 2025 Sejin Cha. All rights reserved.
Configuring Redis Cluster Auto Failover

Progress: Done
Tags: DevOps

Build Up


Setting up a Redis Cluster with Docker [docker]

What


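Failover
Failover means handing a failed node's role over to a standby node so that the service keeps running after a failure.
Auto failover
Auto failover means this recovery happens automatically when a node fails, without an operator stepping in.
 
Turning on Redis cluster mode does not by itself give you auto failover: a master can only be failed over if it has a replica to promote, so the cluster has to be set up to handle this.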

Why


  • In a service that runs 24 hours a day, an outage inconveniences users and increases churn, and a developer responding by hand takes time.
  • Data can also be lost while the developer is still responding.
 

How


Inducing a failure in a three-node Redis cluster

 
  1. Deliberately crash the node on port 7002
$ docker exec -it redis-master-2 bash
$ redis-cli -c -p 7002
127.0.0.1:7002> debug segfault
Error: Connection reset by peer
  2. Check the logs for 7002
$ docker logs -f redis-master-2
* FAIL message received from 3c349984f0bb61490c170ab68f2617a35d9581d6 about 79816979a6dd4b226e476121dd385ed6c25e5151
# Cluster state changed: fail
 
  3. Check the cluster state from 7001
$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> set a b
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7001> cluster info
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:10922
cluster_slots_pfail:0
cluster_slots_fail:5462
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:3
cluster_my_epoch:1
cluster_stats_messages_ping_sent:435
cluster_stats_messages_pong_sent:461
cluster_stats_messages_sent:896
cluster_stats_messages_ping_received:459
cluster_stats_messages_pong_received:434
cluster_stats_messages_meet_received:2
cluster_stats_messages_fail_received:1
cluster_stats_messages_received:896
127.0.0.1:7001> cluster nodes
027a002ecc012b61a5997f151ad01bccbb65d1c0 127.0.0.1:7001@17001 myself,master - 0 1624260599000 1 connected 0-5460
3c349984f0bb61490c170ab68f2617a35d9581d6 127.0.0.1:7003@17003 master - 0 1624260599580 3 connected 10923-16383
79816979a6dd4b226e476121dd385ed6c25e5151 127.0.0.1:7002@17002 master,fail - 1624260495388 1624260493352 2 disconnected 5461-10922
  • 127.0.0.1:7001> set a b
    • The write is rejected with CLUSTERDOWN: one of the three masters is down and has no replica to take over, so its slots (5461-10922) are no longer served. By default Redis Cluster stops accepting queries when any slot is uncovered.
  • cluster_state:fail
    • The cluster state is fail, so the cluster cannot be used.
  • 127.0.0.1:7001> cluster nodes
    • Listing the node states shows that the 7002 master is flagged fail and is disconnected.
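The cluster_slots_ok and cluster_slots_fail figures above can be reproduced from the cluster nodes output alone. A minimal sketch, assuming the standard cluster nodes line layout (slot ranges start at the ninth field); the function name covered_slots is made up for this example:

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def covered_slots(cluster_nodes_output: str) -> int:
    """Count hash slots served by masters that are not flagged fail."""
    covered = 0
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a node line
        flags = fields[2].split(",")
        if "master" not in flags or "fail" in flags:
            continue
        # Fields after link-state are slot ranges ("0-5460") or single slots ("42").
        for slot_range in fields[8:]:
            if slot_range.startswith("["):
                continue  # slot in migration, not a plain range
            start, _, end = slot_range.partition("-")
            covered += int(end or start) - int(start) + 1
    return covered


# Output copied from the `cluster nodes` call above.
SAMPLE = """\
027a002ecc012b61a5997f151ad01bccbb65d1c0 127.0.0.1:7001@17001 myself,master - 0 1624260599000 1 connected 0-5460
3c349984f0bb61490c170ab68f2617a35d9581d6 127.0.0.1:7003@17003 master - 0 1624260599580 3 connected 10923-16383
79816979a6dd4b226e476121dd385ed6c25e5151 127.0.0.1:7002@17002 master,fail - 1624260495388 1624260493352 2 disconnected 5461-10922
"""

ok = covered_slots(SAMPLE)
print(ok, TOTAL_SLOTS - ok)  # 10922 5462
```

The two numbers match cluster_slots_ok:10922 and cluster_slots_fail:5462 in the cluster info output.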
 
  4. Restart 7002
$ docker restart redis-master-2
redis-master-2
$ docker ps | grep redis-master-2
de5e52fb0428 redis:6.2.3 "docker-entrypoint.s…" 9 minutes ago Up 8 seconds redis-master-2
  5. Go back to 7001 and check the cluster state
$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> cluster nodes
027a002ecc012b61a5997f151ad01bccbb65d1c0 127.0.0.1:7001@17001 myself,master - 0 1624260765000 1 connected 0-5460
3c349984f0bb61490c170ab68f2617a35d9581d6 127.0.0.1:7003@17003 master - 0 1624260767224 3 connected 10923-16383
79816979a6dd4b226e476121dd385ed6c25e5151 127.0.0.1:7002@17002 master - 0 1624260766218 2 connected 5461-10922
 
  6. Confirm that the cluster works normally again
127.0.0.1:7001> set a b
-> Redirected to slot [15495] located at 127.0.0.1:7003
OK
127.0.0.1:7003> get a
"b"
127.0.0.1:7003> set b c
-> Redirected to slot [3300] located at 127.0.0.1:7001
OK
127.0.0.1:7001> set c d
-> Redirected to slot [7365] located at 127.0.0.1:7002
OK
127.0.0.1:7002> get c
"d"
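The redirects above follow from how Redis Cluster maps a key to a slot: CRC16 (XMODEM variant) of the key, modulo 16384, using only the hash tag between { and } when there is one. A minimal sketch using Python's binascii.crc_hqx, which computes the same CRC; the SLOT_OWNERS table is copied from the slot ranges in this three-node setup, and the function names are made up for this example:

```python
from binascii import crc_hqx


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty {tag} is used for hashing
            data = data[start + 1:end]
    return crc_hqx(data, 0) % 16384


# Slot ranges taken from the `cluster nodes` output of the three-node cluster.
SLOT_OWNERS = {
    range(0, 5461): "127.0.0.1:7001",
    range(5461, 10923): "127.0.0.1:7002",
    range(10923, 16384): "127.0.0.1:7003",
}


def owner(key: str) -> str:
    """Return the node that serves the slot of the given key."""
    slot = key_slot(key)
    for slots, node in SLOT_OWNERS.items():
        if slot in slots:
            return node
    raise ValueError(f"slot {slot} is not covered")


for k in ("a", "b", "c"):
    print(k, key_slot(k), owner(k))
```

For the keys a, b, and c this gives slots 15495, 3300, and 7365, the same slots and target nodes that redis-cli reported.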
 

Inducing a failure with six nodes

 
Expected scenario
When the m1 node dies, its slave should be promoted automatically to take over m1's role, so that Redis itself does not become a single point of failure.
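For this promotion to happen, the slave has to win an election: according to the Redis Cluster specification it needs votes from a majority of the masters. A minimal sketch of that arithmetic, with made-up function names, ignoring timeouts and replica eligibility:

```python
def votes_needed(total_masters: int) -> int:
    """Majority of masters required to authorize a replica's promotion."""
    return total_masters // 2 + 1


def can_fail_over(total_masters: int, failed_masters: int) -> bool:
    """True if enough masters are still reachable to form a majority."""
    return total_masters - failed_masters >= votes_needed(total_masters)


print(votes_needed(3))      # 2
print(can_fail_over(3, 1))  # True: two of three masters can still vote
print(can_fail_over(3, 2))  # False: a single master is not a majority
```

With three masters, losing one still leaves a majority, which is why the six-node layout (three masters, three slaves) can survive a single master failure.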
  1. Deliberately stop a node
$ docker stop redis-master-2
  2. Check the slave's log
$ docker logs -f redis-slave-2
# Connection with master lost.
* Caching the disconnected master state.
* Reconnecting to MASTER 192.168.56.101:7001
* MASTER <-> REPLICA sync started
# Error condition on socket for SYNC: Connection refused
...
# Failover election won: I'm the new master.
# Cluster state changed: ok
  3. Check the cluster state
$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> cluster nodes
30a99d668af3ddda16e2a9d3ee97fb53a5ebfa6d 192.168.56.100:7002@17002 myself,slave 22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 0 1624434594000 3 connected
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434596533 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434595524 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 master,fail - 1624434574580 1624434573000 2 disconnected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434595524 1 connected 0-5460
e0d9ee09b593889cd093d217a16a0b535e6abef2 192.168.56.101:7002@17002 slave 5b56d458a0d8e64d5f40ece0a99713dcb9c70723 0 1624434595626 1 connected
  • 094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master
    • 192.168.56.102:7002, which used to be a slave, has been promoted to master and now serves slots 5461-10922.
  • c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 master,fail
    • The master node that went down is in the fail state and disconnected.
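The promotion does not start the instant the master is flagged as failed. The Redis Cluster specification describes a short wait before a slave asks for votes: 500 ms, plus a random 0 to 500 ms, plus 1000 ms per rank (rank 0 being the slave with the most up-to-date replication offset). A minimal sketch of that formula; the function name and the injectable rng parameter are made up for this example:

```python
import random
from typing import Optional


def election_delay_ms(replica_rank: int, rng: Optional[random.Random] = None) -> int:
    """Delay before a replica starts an election, per the Redis Cluster spec.

    DELAY = 500 ms + random(0..500) ms + REPLICA_RANK * 1000 ms
    """
    rng = rng or random.Random()
    return 500 + rng.randint(0, 500) + replica_rank * 1000


# In this setup every master has a single slave, so the rank is always 0.
delay = election_delay_ms(0)
print(500 <= delay <= 1000)  # True
```

The rank term only matters when a master has several slaves; it gives the most up-to-date one a head start in the election.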
 
  4. Bring the stopped node back up
$ docker restart redis-master-2
 
  5. Check the cluster state again
$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> cluster nodes
30a99d668af3ddda16e2a9d3ee97fb53a5ebfa6d 192.168.56.100:7002@17002 myself,slave 22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 0 1624434646000 3 connected
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434646576 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434646071 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 slave 094af2ab1db0d147d7f475f3954429ae7d18dee0 0 1624434647079 7 connected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434646575 1 connected 0-5460
e0d9ee09b593889cd093d217a16a0b535e6abef2 192.168.56.101:7002@17002 slave 5b56d458a0d8e64d5f40ece0a99713dcb9c70723 0 1624434646576 1 connected
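The output above shows that the restarted node did not take its master role back: it rejoined as a slave of the node that was promoted. One way to read that off mechanically is to build a master-to-slaves map from the cluster nodes output. A minimal sketch, assuming the standard field layout; replication_map is a made-up name:

```python
from collections import defaultdict


def replication_map(cluster_nodes_output: str) -> dict:
    """Map each master's address to the sorted addresses of its slaves."""
    addr_by_id = {}
    slaves_by_master_id = defaultdict(list)
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a node line
        node_id, master_id = fields[0], fields[3]
        addr = fields[1].split("@")[0]  # drop the cluster bus port
        flags = fields[2].split(",")
        addr_by_id[node_id] = addr
        if "slave" in flags and master_id != "-":
            slaves_by_master_id[master_id].append(addr)
    return {
        addr_by_id[master_id]: sorted(slaves)
        for master_id, slaves in slaves_by_master_id.items()
        if master_id in addr_by_id
    }


# Output copied from the `cluster nodes` call above (after the restart).
SAMPLE = """\
30a99d668af3ddda16e2a9d3ee97fb53a5ebfa6d 192.168.56.100:7002@17002 myself,slave 22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 0 1624434646000 3 connected
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434646576 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434646071 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 slave 094af2ab1db0d147d7f475f3954429ae7d18dee0 0 1624434647079 7 connected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434646575 1 connected 0-5460
e0d9ee09b593889cd093d217a16a0b535e6abef2 192.168.56.101:7002@17002 slave 5b56d458a0d8e64d5f40ece0a99713dcb9c70723 0 1624434646576 1 connected
"""

for master, slaves in replication_map(SAMPLE).items():
    print(master, "<-", slaves)
```

In the result, 192.168.56.102:7002 (the promoted node) has 192.168.56.101:7001 (the former master) as its slave, so every master is backed by a slave again.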
 
Docker Compose layout for Redis
redis-node1:
  platform: linux/x86_64 # for M1 macOS
  image: redis:6.2
  container_name: redis-node1
  volumes: # share the prepared config file with the container through a volume
    - ./redis-cluster/redis.conf:/usr/local/etc/redis/redis.conf
  command: redis-server /usr/local/etc/redis/redis.conf
  ports:
    - "6380:6380"
    - "6381:6381"
    - "6379:6379"
    - "6382:6382"
    - "6383:6383"
    - "6384:6384"
redis-node2:
  network_mode: "service:redis-node1"
  platform: linux/x86_64
  image: redis:6.2
  container_name: redis-node2
  volumes:
    - ./redis-cluster/redis1.conf:/usr/local/etc/redis/redis.conf
  command: redis-server /usr/local/etc/redis/redis.conf
redis-node3:
  network_mode: "service:redis-node1"
  platform: linux/x86_64
  image: redis:6.2
  container_name: redis-node3
  volumes:
    - ./redis-cluster/redis2.conf:/usr/local/etc/redis/redis.conf
  command: redis-server /usr/local/etc/redis/redis.conf
redis-slave1:
  network_mode: "service:redis-node1"
  platform: linux/x86_64
  image: redis:6.2
  container_name: redis-slave1
  volumes:
    - ./redis-cluster/redis-slave1.conf:/usr/local/etc/redis/redis.conf
  command: redis-server /usr/local/etc/redis/redis.conf
redis-slave2:
  network_mode: "service:redis-node1"
  platform: linux/x86_64
  image: redis:6.2
  container_name: redis-slave2
  volumes:
    - ./redis-cluster/redis-slave2.conf:/usr/local/etc/redis/redis.conf
  command: redis-server /usr/local/etc/redis/redis.conf
redis-slave3:
  network_mode: "service:redis-node1"
  platform: linux/x86_64
  image: redis:6.2
  container_name: redis-slave3
  volumes:
    - ./redis-cluster/redis-slave3.conf:/usr/local/etc/redis/redis.conf
  command: redis-server /usr/local/etc/redis/redis.conf
redis-cluster-entry:
  network_mode: "service:redis-node1"
  platform: linux/x86_64
  image: redis:6.2
  container_name: redis-cluster-entry
  command: redis-cli --cluster create 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384 --cluster-yes --cluster-replicas 1
  depends_on:
    - redis-node1
    - redis-node2
    - redis-node3
  restart: on-failure
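The six config files mounted above are not shown in this post. Because every container shares redis-node1's network namespace, each Redis process has to listen on its own port (6379 to 6384) and must have cluster mode enabled. A minimal sketch that renders such files; the file-to-port order, the 5000 ms node timeout, and the nodes-<port>.conf name are assumptions, not the actual files used here:

```python
# File names come from the compose volumes; their port assignment is assumed.
CONF_FILES = [
    "redis.conf",
    "redis1.conf",
    "redis2.conf",
    "redis-slave1.conf",
    "redis-slave2.conf",
    "redis-slave3.conf",
]
BASE_PORT = 6379


def render_conf(port: int, node_timeout_ms: int = 5000) -> str:
    """Render a minimal cluster-mode redis.conf for one node."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",
        f"cluster-config-file nodes-{port}.conf",  # cluster state file, one per node
        f"cluster-node-timeout {node_timeout_ms}",  # ms before a node is considered failing
    ]
    return "\n".join(lines) + "\n"


def render_all() -> dict:
    """Return {file name: config text} for ports 6379..6384."""
    return {name: render_conf(BASE_PORT + i) for i, name in enumerate(CONF_FILES)}


print(render_all()["redis-slave3.conf"])
```

With --cluster-replicas 1 and six nodes, redis-cli --cluster create assigns three masters and one slave per master; which process ends up as a master is decided by redis-cli, not by the file names.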

 REFER


Scaling with Redis Cluster: Horizontal scaling with Redis Cluster
https://redis.io/docs/management/scaling/