HighGo Database
Contents
Environment
Purpose
Details
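Environment
Platform: N/A
Version: 4.5
Purpose
In an HAC cluster environment, the current data directory sometimes has to be deleted and the database re-created. This procedure brings the cluster back up quickly without reinstalling.
Details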
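1. Stop the hghac service on all nodes, delete the original data directory, and run initdb again on the primary node (the existing HAC cluster configuration files stay unchanged).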
[root@db data]# systemctl stop hghac-vip
[root@db data]# initdb -e sm4 -c "echo *******" -D /db/hgdbdata/data
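2. Start the HAC service on node 1. The cluster information is now displayed abnormally: the member list is empty and the hghac-vip service exits with a failure.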
[root@db data]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) +-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
[root@db data]# systemctl status hghac-vip
● hghac-vip.service - hghac
   Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri -03-18 12:16:07 CST; 3min 7s ago
  Process: 44961 ExecStart=/opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml (code=exited, status=1/FAILURE)
 Main PID: 44961 (code=exited, status=1/FAILURE)

Mar 18 12:16:05 db systemd[1]: Started hghac.
Mar 18 12:16:07 db systemd[1]: hghac-vip.service: main process exited, code=exited, status=1/FAILURE
Mar 18 12:16:07 db systemd[1]: Unit hghac-vip.service entered failed state.
Mar 18 12:16:07 db systemd[1]: hghac-vip.service failed.
[root@db data]# systemctl start hghac-vip
[root@db data]# systemctl status hghac-vip
● hghac-vip.service - hghac
   Loaded: loaded (/etc/systemd/system/hghac-vip.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri -03-18 12:19:26 CST; 2min 13s ago
  Process: 45581 ExecStart=/opt/HighGo/tools/hghac/hghac /opt/HighGo/tools/hghac/hghac.yaml (code=exited, status=1/FAILURE)
 Main PID: 45581 (code=exited, status=1/FAILURE)

Mar 18 12:19:24 db systemd[1]: Started hghac.
Mar 18 12:19:26 db systemd[1]: hghac-vip.service: main process exited, code=exited, status=1/FAILURE
Mar 18 12:19:26 db systemd[1]: Unit hghac-vip.service entered failed state.
Mar 18 12:19:26 db systemd[1]: hghac-vip.service failed.
3. The HAC cluster log reports that the cluster identifier no longer matches the original one (because the database was re-created):
[root@db hghalog]# pwd
/db/hgdbdata/hghalog
[root@db hghalog]# tail -f patroni.log
-03-18 12:16:06,807 INFO: Selected new etcd server http://192.168.80.111:2379
-03-18 12:16:06,828 INFO: No PostgreSQL configuration items changed, nothing to reload.
-03-18 12:16:06,890 CRITICAL: system ID mismatch, node hghaca belongs to a different cluster: 7072987311974756506 != 7076286699020760566
-03-18 12:19:25,967 INFO: Selected new etcd server http://192.168.80.113:2379
-03-18 12:19:25,992 INFO: No PostgreSQL configuration items changed, nothing to reload.
-03-18 12:19:26,063 CRITICAL: system ID mismatch, node hghaca belongs to a different cluster: 7072987311974756506 != 7076286699020760566
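The two identifiers in the CRITICAL line are easy to misread in a long log. The sketch below pulls them out of the most recent mismatch line so they can be compared with the cluster ID printed by hghactl list. The function name extract_mismatch_ids is hypothetical and not part of hghac; the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of hghac): read log text on stdin and print
# the two system IDs from the last "system ID mismatch" line as
# "<first-id> <second-id>", in the order they appear in the message.
# Assumes the message format shown in the log excerpt above.
extract_mismatch_ids() {
    sed -n 's/.*system ID mismatch.*: \([0-9][0-9]*\) != \([0-9][0-9]*\).*/\1 \2/p' | tail -n 1
}

# Example (log path taken from the transcript above):
#   tail -n 200 /db/hgdbdata/hghalog/patroni.log | extract_mismatch_ids
```

In the log above, the first ID (7072987311974756506) is the one hghactl list still shows for cluster ha, and the second (7076286699020760566) is the one the cluster reports after the fix in step 5.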
4. After the etcd and hghac services are restarted on every node, the same error is still reported.
[root@db ~]# /opt/HighGo/tools/hghac/etcd/amd64/etcdctl endpoint status --write-out=table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.80.111:2379 | ddbfd190d03ca278 |  3.4.15 |   20 kB |     false |      false |       218 |    1686066 |            1686066 |        |
| http://192.168.80.112:2379 | 1c703f0b65f7bddb |  3.4.15 |   20 kB |     false |      false |       218 |    1686066 |            1686066 |        |
| http://192.168.80.113:2379 | 92255e8f5c9ebfcd |  3.4.15 |   20 kB |      true |      false |       218 |    1686066 |            1686066 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@db ~]# systemctl start hghac-vip
[root@db ~]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7072987311974756506) +-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
5. Root cause: the etcd data files still record the old cluster identifier, so the etcd data must be regenerated.
[root@db etcd]# pwd
/opt/HighGo/tools/etcd
[root@db etcd]# ls
hgdw1.etcd
[root@db etcd]# mv hgdw1.etcd hgdw1.etcd.bak    <-- rename or delete this directory on all nodes
[root@db etcd]# systemctl stop etcd
[root@db etcd]# systemctl start etcd
[root@db etcd]# pwd
/opt/HighGo/tools/etcd
[root@db etcd]# ll
total 0
drwx------  3 root root 20 Mar 18 12:34 hgdw1.etcd
drwx------. 3 root root 20 Mar 18 12:27 hgdw1.etcd.bak    <-- restarting etcd regenerates the hgdw1.etcd directory and all files under it
[root@db etcd]# /opt/HighGo/tools/hghac/etcd/amd64/etcdctl endpoint status --write-out=table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.80.111:2379 | ddbfd190d03ca278 |  3.4.15 |   20 kB |      true |      false |         2 |          8 |                  8 |        |
| http://192.168.80.112:2379 | 1c703f0b65f7bddb |  3.4.15 |   20 kB |     false |      false |         2 |          8 |                  8 |        |
| http://192.168.80.113:2379 | 92255e8f5c9ebfcd |  3.4.15 |   20 kB |     false |      false |         2 |          8 |                  8 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
[root@db etcd]# systemctl start hghac-vip    <-- start HAC now; the cluster information is displayed normally
[root@db etcd]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7076286699020760566) ----+---------+----+-----------+-----------------+
| Member | Host                | Role   | State   | TL | Lag in MB | Pending restart |
+--------+---------------------+--------+---------+----+-----------+-----------------+
| hghaca | 192.168.80.111:5866 | Leader | running |  2 |           | *               |
+--------+---------------------+--------+---------+----+-----------+-----------------+
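Renaming the etcd data directory by hand on every node is easy to get wrong, for example by overwriting an earlier hgdw1.etcd.bak. The sketch below wraps the rename in a small function that refuses to run on a missing directory and uses a timestamped backup name. The function name backup_etcd_dir and the timestamp suffix are assumptions of this sketch, not part of etcd or hghac. Stopping etcd before the rename is also this sketch's ordering; the transcript above renamed the directory first.

```shell
#!/bin/sh
# Hypothetical helper (not part of etcd or hghac): rename an etcd data
# directory to "<dir>.bak.<timestamp>" instead of deleting it.
# Prints the backup path on success, returns non-zero on failure.
backup_etcd_dir() {
    dir=${1%/}                      # drop a trailing slash, if any
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" && printf '%s\n' "$backup"
}

# Possible use on each node (directory path taken from the transcript above):
#   systemctl stop etcd
#   backup_etcd_dir /opt/HighGo/tools/etcd/hgdw1.etcd
#   systemctl start etcd
```

Keeping the old directory under a distinct name leaves a way back if the new cluster does not come up as expected.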
Start HAC on the other nodes. The result is as follows:
[root@db etcd]# /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list
+ Cluster: ha (7076286699020760566) -----+---------+----+-----------+-----------------+
| Member | Host                | Role    | State   | TL | Lag in MB | Pending restart |
+--------+---------------------+---------+---------+----+-----------+-----------------+
| hghaca | 192.168.80.111:5866 | Leader  | running |  2 |           | *               |
| hghacb | 192.168.80.112:5866 | Replica | running |  2 |         0 | *               |
| hghacc | 192.168.80.113:5866 | Replica | running |  2 |         0 | *               |
+--------+---------------------+---------+---------+----+-----------+-----------------+
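The final check can also be scripted rather than read by eye. The sketch below parses the table printed by hghactl list and succeeds only when at least one member is listed, exactly one member has the Leader role, and every member is in the running state. The function name check_members is hypothetical. The column positions and the state string "running" are taken from the output above, and other versions may print different columns or state names.

```shell
#!/bin/sh
# Hypothetical helper (not part of hghac): read "hghactl list" table output
# on stdin. Exit status 0 only if at least one member is listed, exactly one
# member is Leader, and every member is in state "running".
# Assumes the column order Member | Host | Role | State shown above.
check_members() {
    awk -F'|' '
        NF >= 6 {
            member = $2; role = $4; state = $5
            gsub(/^ +| +$/, "", member)
            gsub(/^ +| +$/, "", role)
            gsub(/^ +| +$/, "", state)
            # Skip the header row and rows without a member name.
            if (member == "" || member == "Member") next
            total++
            if (role == "Leader") leaders++
            if (state != "running") bad++
        }
        END { exit !(total > 0 && leaders == 1 && bad == 0) }
    '
}

# Possible use (command taken from the transcript above):
#   /opt/HighGo/tools/hghac/hghactl -c /opt/HighGo/tools/hghac/hghac.yaml list | check_members \
#       && echo "cluster members look healthy"
```

With the empty member table from steps 2 and 4, the function returns a non-zero status, so it can be used to detect the failure as well as to confirm the fix.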