PostgreSQL HA with Patroni + etcd on Docker


Patroni + etcd seems to be the go-to setup these days for making databases redundant in container and virtualized environments, so I dove in and studied it!

The overall picture: a single etcd instance acting as the DCS (distributed configuration store), with two Patroni-managed PostgreSQL nodes, node1 and node2.

etcd serves as the SSOT (Single Source of Truth) here. I'm running only a single instance, but in a real deployment you should run several members (an odd number) so that the store stays available by quorum (majority vote) and the primary can be monitored and managed reliably. Since this article is about studying Patroni + PostgreSQL behavior, please forgive the single-node shortcut.
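
If you do run a multi-member etcd cluster, etcdctl (bundled in the quay.io/coreos/etcd image) can confirm member and quorum health; a quick check along these lines:

$ sudo docker exec etcd etcdctl endpoint health
$ sudo docker exec etcd etcdctl member list --write-out=table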

The directory tree used in this article is as follows.

.
 |-- Dockerfile.patroni
 |-- docker-compose.yml
 |-- etcd-data
 |-- node1-data
 |-- node2-data
 `-- patroni
     |-- node1.yml
     `-- node2.yml

Build a PostgreSQL image containing Patroni and related packages

Dockerfile.patroni

FROM postgres:16

USER root

RUN apt-get update -y && \
    apt-get install -y python3 python3-venv python3-pip jq && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m venv /opt/patroni-venv

RUN /opt/patroni-venv/bin/pip install --upgrade pip && \
    /opt/patroni-venv/bin/pip install "patroni[etcd]" psycopg2-binary

ENV PATH="/opt/patroni-venv/bin:$PATH"

RUN mkdir -p /var/lib/postgresql/data && \
    chown -R postgres:postgres /var/lib/postgresql

USER postgres

CMD ["bash"]

Build the image

$ sudo docker build -f Dockerfile.patroni -t patroni-pg:16 .

$ sudo docker images | grep patroni-pg
WARNING: This output is designed for human readability. For machine-readable output, please use --format.
patroni-pg:16                          a35649c86e35        873MB             0B
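
As a quick sanity check that the venv is on PATH inside the new image, Patroni should be able to report its version (output omitted):

$ sudo docker run --rm patroni-pg:16 patroni --version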

Create the Patroni configurations

Configuration for node1

./patroni/node1.yml

scope: demo-cluster
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd:
  host: etcd:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: on
        max_wal_senders: 10
        max_replication_slots: 10
    initdb:
      - encoding: UTF8
      - data-checksums
    users:
      postgres:
        password: admin@pass
        options:
          - superuser
          - createdb
      replicator:
        password: replica@pass
        options:
          - replication

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    superuser:
      username: postgres
      password: admin@pass
    replication:
      username: replicator
      password: replica@pass
  parameters:
    shared_buffers: 256MB
    max_connections: 100
  pg_hba:
    - local all all trust
    - host all all 0.0.0.0/0 md5
    - host replication replicator 0.0.0.0/0 md5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
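
Incidentally, recent Patroni releases can lint a configuration file without starting anything; a sanity check might look like this (using the --validate-config option):

$ sudo docker run --rm -v $PWD/patroni/node1.yml:/etc/patroni.yml patroni-pg:16 \
    patroni --validate-config /etc/patroni.yml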

Configuration for node2

./patroni/node2.yml

scope: demo-cluster
name: node2

restapi:
  listen: 0.0.0.0:8008
  connect_address: node2:8008

etcd:
  host: etcd:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: on
        max_wal_senders: 10
        max_replication_slots: 10
    initdb:
      - encoding: UTF8
      - data-checksums
    users:
      postgres:
        password: admin@pass
        options:
          - superuser
          - createdb
      replicator:
        password: replica@pass
        options:
          - replication

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node2:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    superuser:
      username: postgres
      password: admin@pass
    replication:
      username: replicator
      password: replica@pass
  parameters:
    shared_buffers: 256MB
    max_connections: 100
  pg_hba:
    - local all all trust
    - host all all 0.0.0.0/0 md5
    - host replication replicator 0.0.0.0/0 md5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false

Create the docker-compose file

./docker-compose.yml

services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.11
    container_name: etcd
    command:
      - /usr/local/bin/etcd
      - --name=etcd0
      - --data-dir=/etcd-data
      - --listen-client-urls=http://0.0.0.0:2379
      - --advertise-client-urls=http://etcd:2379
      - --listen-peer-urls=http://0.0.0.0:2380
      - --initial-advertise-peer-urls=http://etcd:2380
      - --initial-cluster=etcd0=http://etcd:2380
      - --initial-cluster-state=new
      - --enable-v2=true
    volumes:
      - ./etcd-data:/etcd-data
    ports:
      - "2379:2379"
    networks:
      - patroni-net
    restart: unless-stopped

  node1:
    image: patroni-pg:16
    container_name: patroni-node1
    depends_on:
      - etcd
    user: "postgres"
    environment:
      PATRONI_LOG_LEVEL: INFO
    volumes:
      - ./node1-data:/var/lib/postgresql/data
      - ./patroni/node1.yml:/etc/patroni.yml
    command: ["patroni", "/etc/patroni.yml"]
    ports:
      - "5432:5432"   # node1 PostgreSQL
      - "8008:8008"   # Patroni REST for node1
    networks:
      - patroni-net
    restart: unless-stopped

  node2:
    image: patroni-pg:16
    container_name: patroni-node2
    depends_on:
      - etcd
    user: "postgres"
    environment:
      PATRONI_LOG_LEVEL: INFO
    volumes:
      - ./node2-data:/var/lib/postgresql/data
      - ./patroni/node2.yml:/etc/patroni.yml
    command: ["patroni", "/etc/patroni.yml"]
    ports:
      - "5433:5432"   # node1 PostgreSQL
      - "8009:8008"   # Patroni REST for node2
    networks:
      - patroni-net
    restart: unless-stopped

networks:
  patroni-net:
    driver: bridge
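
Before starting anything, docker-compose can validate the file itself; with the --quiet flag it prints nothing when the file is well-formed:

$ sudo docker-compose config --quiet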

Set permissions

The data directories must be writable by the postgres user inside the containers; 999:999 is the postgres uid/gid in the official postgres image.

% sudo chmod 700 ./node1-data ./node2-data ./etcd-data
% sudo chown -R 999:999 ./node1-data ./node2-data

Startup

First, start only etcd. The image probably hasn't been pulled yet, so the first run will take a while.

% sudo docker-compose up -d etcd
Creating network "pg-patroni-ha_patroni-net" with driver "bridge"
Pulling etcd (quay.io/coreos/etcd:v3.5.11)...
v3.5.11: Pulling from coreos/etcd
63b450eae87c: Pull complete
960043b8858c: Pull complete
b4ca4c215f48: Pull complete
eebb06941f3e: Pull complete
02cd68c0cbf6: Pull complete
d3c894b5b2b0: Pull complete
b40161cd83fc: Pull complete
46ba3f23f1d3: Pull complete
4fa131a1b726: Pull complete
2dee0c551ea9: Pull complete
14b7aea02afd: Pull complete
2670ef230b3a: Pull complete
18b2a73de8d9: Pull complete
9c0f3e2fb65e: Pull complete
Digest: sha256:8429758911821e36a15e2fe6ae75cfd563a501c97af0b3d119c74921f5fb6f19
Status: Downloaded newer image for quay.io/coreos/etcd:v3.5.11
Creating etcd ... done

$ sudo docker logs etcd

Note: check the log for errors and make sure the ports are listening properly.

Next, start node1 and node2.

% sudo docker-compose up -d node1 node2

Note: within the docker-compose network the services are addressed as node1 and node2, but from plain docker commands you use the container names patroni-node1 and patroni-node2.
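
To double-check name resolution inside the compose network, getent (present in the Debian-based postgres image) can be run from a node container:

$ sudo docker exec patroni-node1 getent hosts etcd node2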

Verifying operation

Check node1's startup log.

% sudo docker logs -f patroni-node1
2026-01-10 04:24:23,321 INFO: No PostgreSQL configuration items changed, nothing to reload.
2026-01-10 04:24:23,350 INFO: Systemd integration is not supported
2026-01-10 04:24:23,360 INFO: Lock owner: None; I am node1
2026-01-10 04:24:23,363 INFO: waiting for leader to bootstrap

And node2's startup log.

$ sudo docker logs -f patroni-node2
2026-01-10 05:03:17,654 INFO: No PostgreSQL configuration items changed, nothing to reload.
2026-01-10 05:03:17,678 INFO: Systemd integration is not supported
2026-01-10 05:03:17,682 WARNING: Postgresql is not running.
2026-01-10 05:03:17,682 INFO: Lock owner: node1; I am node2
2026-01-10 05:03:17,684 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202307071
  Database system identifier: 7593529639153778709
  Database cluster state: shut down in recovery
  pg_control last modified: Sat Jan 10 05:03:03 2026
  Latest checkpoint location: 0/3000228
  Latest checkpoint's REDO location: 0/30001F0
  Latest checkpoint's REDO WAL file: 000000070000000000000003
  Latest checkpoint's TimeLineID: 7
  Latest checkpoint's PrevTimeLineID: 7
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:744
  Latest checkpoint's NextOID: 16389
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 723
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 744
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Sat Jan 10 04:31:29 2026
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/30002D8
  Min recovery ending loc's timeline: 7
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: replica
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 0
  Mock authentication nonce: ea482b8120dd4ae<snip>c90d74394

2026-01-10 05:03:17,685 INFO: Lock owner: node1; I am node2
2026-01-10 05:03:17,694 INFO: Local timeline=7 lsn=0/30002D8
2026-01-10 05:03:17,721 INFO: primary_timeline=7
2026-01-10 05:03:17,721 INFO: Lock owner: node1; I am node2
2026-01-10 05:03:17,725 INFO: starting as a secondary
2026-01-10 05:03:18,132 INFO: postmaster pid=23
2026-01-10 05:03:18.135 UTC [23] LOG:  starting PostgreSQL 16.10 (Debian 16.10-1.pgdg13+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
2026-01-10 05:03:18.135 UTC [23] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2026-01-10 05:03:18.153 UTC [23] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2026-01-10 05:03:18.164 UTC [27] LOG:  database system was shut down in recovery at 2026-01-10 05:03:03 UTC
2026-01-10 05:03:18.164 UTC [27] LOG:  entering standby mode
2026-01-10 05:03:18.174 UTC [27] LOG:  redo starts at 0/30001F0
2026-01-10 05:03:18.174 UTC [27] LOG:  consistent recovery state reached at 0/30002D8
2026-01-10 05:03:18.174 UTC [27] LOG:  invalid record length at 0/30002D8: expected at least 24, got 0
2026-01-10 05:03:18.174 UTC [23] LOG:  database system is ready to accept read-only connections
localhost:5432 - accepting connections
2026-01-10 05:03:18.201 UTC [28] LOG:  started streaming WAL from primary at 0/3000000 on timeline 7
localhost:5432 - accepting connections
2026-01-10 05:03:18,233 INFO: Lock owner: node1; I am node2
2026-01-10 05:03:18,233 INFO: establishing a new patroni heartbeat connection to postgres
2026-01-10 05:03:18,260 INFO: no action. I am (node2), a secondary, and following a leader (node1)
2026-01-10 05:03:20,285 INFO: no action. I am (node2), a secondary, and following a leader (node1)

Checking Patroni's status

// node1

$ curl http://192.168.1.147:8008/
{"state": "running", "postmaster_start_time": "2026-01-10 04:31:29.119878+00:00", "role": "primary", "server_version": 160010, "xlog": {"location": 50332376}, "timeline": 7, "replication": [{"usename": "replicator", "application_name": "node2", "client_addr": "172.24.0.4", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1768020910, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node1"}}


// node2

$ curl http://192.168.1.147:8009/
{"state": "running", "postmaster_start_time": "2026-01-10 04:54:46.642707+00:00", "role": "replica", "server_version": 160010, "xlog": {"received_location": 50332376, "replayed_location": 50332376, "replayed_timestamp": null, "paused": false}, "timeline": 7, "replication_state": "streaming", "dcs_last_seen": 1768020920, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node2"}}

Note: confirm that node1 reports role: primary and node2 reports role: replica.
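
patronictl, which is installed alongside Patroni, shows the same information as a table; for example:

$ sudo docker exec -it patroni-node1 patronictl -c /etc/patroni.yml list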

Now check that PostgreSQL replication is actually working.

First confirm that node2 has no relations.

$ sudo docker exec -it patroni-node2 psql -U postgres -d postgres
psql (16.10 (Debian 16.10-1.pgdg13+1))
Type "help" for help.

postgres=# \d
Did not find any relations.

Confirm that node1 is empty as well, then create a table.

$ sudo docker exec -it patroni-node1 psql -U postgres -d postgres
psql (16.10 (Debian 16.10-1.pgdg13+1))
Type "help" for help.

postgres=# \d
Did not find any relations.

postgres=# CREATE TABLE employees (
postgres(# id SERIAL PRIMARY KEY,
postgres(# name TEXT NOT NULL,
postgres(# role TEXT NOT NULL,
postgres(# salary NUMERIC(10,2)
postgres(# );
CREATE TABLE

postgres=# INSERT INTO employees (name, role, salary) VALUES
('saka', 'DBA', 100.00),
('yoko', 'Developer', 200.00);
INSERT 0 2

Confirm the data has been replicated to node2.

$ sudo docker exec -it patroni-node2 psql -U postgres -d postgres
psql (16.10 (Debian 16.10-1.pgdg13+1))
Type "help" for help.

postgres=# \d
                List of relations
 Schema |       Name       |   Type   |  Owner
--------+------------------+----------+----------
 public | employees        | table    | postgres
 public | employees_id_seq | sequence | postgres
(2 rows)

postgres=# SELECT * FROM employees;
 id | name |   role    | salary
----+------+-----------+--------
  1 | saka | DBA       | 100.00
  2 | yoko | Developer | 200.00
(2 rows)

Check the recovery status.

postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

Note: t means true, i.e. the node is in recovery and therefore read-only.
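
The same check can be run against both nodes in one go; a small helper along these lines:

# Which node is read-only (t) and which accepts writes (f)?
for n in patroni-node1 patroni-node2; do
  echo -n "$n: "
  sudo docker exec "$n" psql -U postgres -Atc 'SELECT pg_is_in_recovery();'
done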

Verifying failover and failback

Failover

Triggering a failure and watching the switchover

Stop node1, then confirm that node2 takes over as the active node.

$ sudo docker stop patroni-node1 ; sudo docker logs -f patroni-node2
patroni-node1
<snip>
2026-01-10 05:21:37,231 INFO: no action. I am (node2), the leader with the lock

Note: node2 has been promoted from node1's secondary to leader.
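
Stopping the container simulates a crash. For a planned role change when both nodes are healthy, patronictl offers a controlled switchover instead; roughly:

$ sudo docker exec -it patroni-node1 patronictl -c /etc/patroni.yml \
    switchover --candidate node2 --force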

Check that node2 now accepts writes.

postgres=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

postgres=# INSERT INTO employees(name, role, salary) VALUES('nanasi', 'DevOps', 500.00);
INSERT 0 1

postgres=# SELECT * FROM employees;
 id |  name  |   role    | salary
----+--------+-----------+--------
  1 | saka   | DBA       | 100.00
  2 | yoko   | Developer | 200.00
 34 | nanasi | DevOps    | 500.00
(3 rows)

The id jumping from 2 to 34 is expected: PostgreSQL WAL-logs sequence values 32 at a time, so after a failover the sequence resumes past the pre-allocated block.

Recovering the failed node

Let's bring node1 back. After recovery, node1 should come up as a replica while node2 stays primary.

$ sudo docker start patroni-node1
patroni-node1

$ curl http://192.168.1.147:8008/
{"state": "running", "postmaster_start_time": "2026-01-10 05:53:07.375152+00:00", "role": "replica", "server_version": 160010, "xlog": {"received_location": 50579312, "replayed_location": 50579312, "replayed_timestamp": "2026-01-10 05:30:39.238103+00:00", "paused": false}, "timeline": 8, "replication_state": "streaming", "dcs_last_seen": 1768024407, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node1"}}

$ curl http://192.168.1.147:8009/
{"state": "running", "postmaster_start_time": "2026-01-10 05:03:18.158145+00:00", "role": "primary", "server_version": 160010, "xlog": {"location": 50579312}, "timeline": 8, "replication": [{"usename": "replicator", "application_name": "node1", "client_addr": "172.24.0.3", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1768024417, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node2"}}

Note: node1 is now the replica and node2 the primary, as expected.
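
If you only care about the role field, jq trims the JSON down (assuming jq is installed on the host):

$ curl -s http://192.168.1.147:8008/ | jq -r .role
$ curl -s http://192.168.1.147:8009/ | jq -r .role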

Failback

This time, stopping node2 hands the primary role back to node1.

$ sudo docker stop patroni-node2; sudo docker logs -f patroni-node1
<snip>
2026-01-10 04:24:35,763 INFO: no action. I am (node1), a secondary, and following a leader (node2)
   :
   :
2026-01-10 04:31:30,283 INFO: no action. I am (node1), the leader with the lock


$ sudo docker exec -it patroni-node1 psql -U postgres -d postgres -c 'SELECT pg_is_in_recovery();'
 pg_is_in_recovery
-------------------
 f
(1 row)

$ curl http://192.168.1.147:8008/
{"state": "running", "postmaster_start_time": "2026-01-10 05:53:07.375152+00:00", "role": "primary", "server_version": 160010, "xlog": {"location": 50579712}, "timeline": 9, "dcs_last_seen": 1768025242, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node1"}}

Start node2 again and check.

$ sudo docker start patroni-node2
patroni-node2

$ sudo docker logs -f patroni-node2
<snip>
2026-01-10 06:09:12,393 INFO: no action. I am (node2), a secondary, and following a leader (node1)

$ curl http://192.168.1.147:8009/
{"state": "running", "postmaster_start_time": "2026-01-10 06:08:56.881990+00:00", "role": "replica", "server_version": 160010, "xlog": {"received_location": 50579712, "replayed_location": 50579712, "replayed_timestamp": null, "paused": false}, "timeline": 9, "replication_state": "streaming", "dcs_last_seen": 1768025532, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node2"}}

Appendix

What happens if etcd goes down?

While etcd was down, both nodes dropped to replica state: without the DCS the leader lock cannot be renewed, so Patroni demotes the primary to read-only to avoid split-brain (note cluster_unlocked: true in the output below).

$ docker stop etcd
etcd

$ curl http://192.168.1.147:8008/
{"state": "running", "postmaster_start_time": "2026-01-10 06:18:22.938926+00:00", "role": "replica", "server_version": 160010, "xlog": {"received_location": 50579832, "replayed_location": 50579832, "replayed_timestamp": null, "paused": false}, "timeline": 9, "replication": [{"usename": "replicator", "application_name": "node2", "client_addr": "172.24.0.4", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "cluster_unlocked": true, "dcs_last_seen": 1768025882, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node1"}}

$ curl http://192.168.1.147:8009/
{"state": "running", "postmaster_start_time": "2026-01-10 06:08:56.881990+00:00", "role": "replica", "server_version": 160010, "xlog": {"received_location": 50579832, "replayed_location": 50579832, "replayed_timestamp": null, "paused": false}, "timeline": 9, "replication_state": "streaming", "cluster_unlocked": true, "dcs_last_seen": 1768025882, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node2"}}

After etcd came back, node2 took over as leader and the cluster returned to normal operation.

$ sudo docker exec -it patroni-node1 psql -U postgres -d postgres -c 'SELECT pg_is_in_recovery();'
 pg_is_in_recovery
-------------------
 t
(1 row)

$ sudo docker exec -it patroni-node2 psql -U postgres -d postgres -c 'SELECT pg_is_in_recovery();'
 pg_is_in_recovery
-------------------
 f
(1 row)


$ curl http://192.168.1.147:8008/
{"state": "running", "postmaster_start_time": "2026-01-10 06:18:22.938926+00:00", "role": "replica", "server_version": 160010, "xlog": {"received_location": 50580112, "replayed_location": 50580112, "replayed_timestamp": null, "paused": false}, "timeline": 10, "replication_state": "streaming", "dcs_last_seen": 1768026319, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node1"}}

$ curl http://192.168.1.147:8009/
{"state": "running", "postmaster_start_time": "2026-01-10 06:08:56.881990+00:00", "role": "primary", "server_version": 160010, "xlog": {"location": 50580112}, "timeline": 10, "replication": [{"usename": "replicator", "application_name": "node1", "client_addr": "172.24.0.3", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1768026329, "database_system_identifier": "7593529639153778709", "patroni": {"version": "4.1.0", "scope": "demo-cluster", "name": "node2"}}

How maximum_lag_on_failover works

1. Who uses this value

The value is read by the Patroni process itself:

  • etcd → stores the value
  • Patroni → reads it from etcd and evaluates it at failover time

2. What it controls

At failover time, Patroni evaluates, for each replica,

"how far behind the current primary's WAL is this node?"

using pg_stat_replication, pg_last_wal_receive_lsn(), and similar sources. The lag is measured as

current WAL position - replica's received WAL position

and compared in bytes. maximum_lag_on_failover is the upper limit Patroni will tolerate.
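
You can see the same kind of lag numbers by querying the primary directly; a minimal check:

$ sudo docker exec patroni-node1 psql -U postgres -c \
    "SELECT application_name,
            pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
       FROM pg_stat_replication;"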


3. What 1048576 means

1048576 bytes = 1 MiB

So this setting means:

"a replica may still be promoted as long as it is no more than 1 MB behind."


4. What happens at failover

Suppose the primary dies with the following replicas:

 node     | WAL lag
----------+---------
 replica1 | 200 KB
 replica2 | 800 KB
 replica3 | 3 MB

Then:

  • replica1 → eligible
  • replica2 → eligible
  • replica3 → excluded

Patroni treats any node whose lag exceeds maximum_lag_on_failover as an unsafe candidate and excludes it from promotion.


5. Why this limit is needed

Without it, the following accident can happen:

  1. The primary dies.
  2. A badly lagging replica is promoted.
  3. Transactions that clients believed were committed vanish.
  4. It looks like data corruption.

maximum_lag_on_failover directly specifies the maximum amount of data loss you are willing to accept (effectively, your RPO).


6. Relationship to pg_rewind

Even with use_pg_rewind: true:

  • pg_rewind is a tool for making a diverged node catch up
  • but WAL that was never shipped cannot be recovered

so this limit still matters.


7. Practical guidance

 system            | suggested value
-------------------+-----------------
 banking / billing | 0 or 64 KB
 business DB       | 1 MB to 10 MB
 logs / analytics  | 100 MB and up

For a business-oriented setup like the one here (e.g. Django or Redmine apps on PostgreSQL), 1 MB is a fairly conservative and sensible choice.
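
Because the value lives in the DCS, it can also be changed cluster-wide at runtime rather than by editing the YAML; with patronictl's edit-config -s/--set option this would look roughly like:

$ sudo docker exec -it patroni-node1 patronictl -c /etc/patroni.yml \
    edit-config -s maximum_lag_on_failover=10485760 --force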


Summary

maximum_lag_on_failover: 1048576

means: "when Patroni promotes a replica to be the new primary, it only considers nodes whose WAL lag is within 1 MB."

It is a guardrail against data loss, and it takes effect at the core of Patroni's failover algorithm, not inside PostgreSQL itself.
