An upgrade of an RGW environment: Ceph 12.2.12 upgraded to 14.2.4, skipping the intermediate version 13 release. Note: upgrading is dangerous and must be done with care. There is no undo once an upgrade begins, and I take no responsibility for any data loss caused by the upgrade or related operations.
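Before touching anything, it is worth recording the cluster's current state as a baseline to compare against after the upgrade. A minimal pre-flight check (all standard commands; ceph versions lists the running version of every daemon):
ceph -s
ceph versions
ceph osd tree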
In the yum repo configuration, replace the old
https://mirrors.aliyun.com/ceph/rpm-luminous/el7/x86_64/
with
https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/
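If the repo is defined in the usual /etc/yum.repos.d/ceph.repo (the file name is an assumption; adjust it to wherever your repo actually lives), the swap can be done with a single sed call on every node:
# assumes the repo file is /etc/yum.repos.d/ceph.repo
sed -i 's/rpm-luminous/rpm-nautilus/g' /etc/yum.repos.d/ceph.repo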
Then refresh the yum metadata; a plain install is enough to upgrade the binary packages.
yum clean all
yum makecache
yum install ceph ceph-radosgw
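Before restarting anything, it does no harm to confirm on each node that the binaries really were upgraded:
rpm -q ceph ceph-radosgw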
After the packages are upgraded, restart the daemons one type at a time with the commands below, in this order: MON, then MGR, then OSD, and finally RGW.
systemctl restart ceph-mon@*
systemctl restart ceph-mgr@*
systemctl restart ceph-osd@*
systemctl restart ceph-radosgw@*
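Once every daemon on every node has been restarted, ceph versions should report 14.2.4 for all daemon types. The upstream Nautilus upgrade notes also recommend raising the minimum allowed OSD release at that point:
ceph versions
# only after every OSD reports 14.2.4, per the Nautilus upgrade notes
ceph osd require-osd-release nautilus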
After the upgrade, two types of warnings appear: “Legacy BlueStore stats reporting” and “3 monitors have not enabled msgr2”. The first is raised because the underlying BlueStore statistics structures changed in the new version. The second is raised because the new version enables the msgr2 messenger module by default and expects the monitors to listen on it.
[root@demohost-229 supdev]# ceph -s
  cluster:
    id:     a293ad23-f310-480b-ab2a-5629f2aeef45
    health: HEALTH_WARN
            Legacy BlueStore stats reporting detected on 6 OSD(s)
            3 monitors have not enabled msgr2

  services:
    mon: 3 daemons, quorum demohost-227,demohost-228,demohost-229 (age 4m)
    mgr: demohost-229(active, since 4m), standbys: demohost-227, demohost-228
    osd: 6 osds: 6 up, 6 in
    rgw: 3 daemons active (demohost-227, demohost-228, demohost-229)

  data:
    pools:   7 pools, 184 pgs
    objects: 279.96k objects, 92 GiB
    usage:   295 GiB used, 3.0 TiB / 3.3 TiB avail
    pgs:     184 active+clean

  io:
    client:   55 KiB/s rd, 0 B/s wr, 55 op/s rd, 37 op/s wr
[root@demohost-229 supdev]# ceph -v
ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
[root@demohost-227 supdev]# ceph health detail
HEALTH_WARN Legacy BlueStore stats reporting detected on 6 OSD(s); 3 monitors have not enabled msgr2
BLUESTORE_LEGACY_STATFS Legacy BlueStore stats reporting detected on 6 OSD(s)
    osd.0 legacy statfs reporting detected, suggest to run store repair to get consistent statistic reports
    osd.1 legacy statfs reporting detected, suggest to run store repair to get consistent statistic reports
    osd.2 legacy statfs reporting detected, suggest to run store repair to get consistent statistic reports
    osd.3 legacy statfs reporting detected, suggest to run store repair to get consistent statistic reports
    osd.4 legacy statfs reporting detected, suggest to run store repair to get consistent statistic reports
    osd.5 legacy statfs reporting detected, suggest to run store repair to get consistent statistic reports
MON_MSGR2_NOT_ENABLED 3 monitors have not enabled msgr2
    mon.demohost-227 is not bound to a msgr2 port, only v1:172.17.61.227:6789/0
    mon.demohost-228 is not bound to a msgr2 port, only v1:172.17.61.228:6789/0
    mon.demohost-229 is not bound to a msgr2 port, only v1:172.17.61.229:6789/0
First fix the OSD-related warning. The procedure: stop the OSD service, run “ceph-bluestore-tool repair”, then start the OSD service again, and repeat this for every OSD in turn. Taking osd.1 as an example:
[root@demohost-227 supdev]# systemctl stop ceph-osd@1
[root@demohost-227 supdev]# ls /var/lib/ceph/osd/ceph-1
activate.monmap block bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done osd_key ready require_osd_release type whoami
[root@demohost-227 supdev]# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1
2019-12-02 14:41:06.607 7faf98bfcf80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: legacy statfs record found, removing
2019-12-02 14:41:06.607 7faf98bfcf80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool 8
2019-12-02 14:41:06.607 7faf98bfcf80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool a
2019-12-02 14:41:06.607 7faf98bfcf80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool c
2019-12-02 14:41:06.607 7faf98bfcf80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool d
2019-12-02 14:41:06.607 7faf98bfcf80 -1 bluestore(/var/lib/ceph/osd/ceph-1) fsck error: missing Pool StatFS record for pool ffffffffffffffff
repair success
[root@demohost-227 supdev]# systemctl start ceph-osd@1
[root@demohost-227 supdev]# ceph -s
  cluster:
    id:     a293ad23-f310-480b-ab2a-5629f2aeef45
    health: HEALTH_WARN
            Legacy BlueStore stats reporting detected on 5 OSD(s)
            3 monitors have not enabled msgr2

  services:
    mon: 3 daemons, quorum demohost-227,demohost-228,demohost-229 (age 11m)
    mgr: demohost-229(active, since 11m), standbys: demohost-227, demohost-228
    osd: 6 osds: 6 up, 6 in
    rgw: 3 daemons active (demohost-227, demohost-228, demohost-229)

  data:
    pools:   7 pools, 184 pgs
    objects: 279.96k objects, 92 GiB
    usage:   294 GiB used, 3.0 TiB / 3.3 TiB avail
    pgs:     184 active+clean

  io:
    recovery: 367 B/s, 5 objects/s
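The warning count has dropped to 5 OSDs, so the remaining ones are repaired the same way. A minimal loop sketch, assuming it is run on each host with only that host's own OSD ids (the ids below are illustrative), pausing between OSDs so recovery can settle:
# run on each host, listing only the OSD ids that live on that host (illustrative ids)
for id in 0 2 3 4 5; do
    systemctl stop ceph-osd@$id
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$id
    systemctl start ceph-osd@$id
    sleep 60    # crude pause; check that 'ceph -s' is healthy before the next OSD
done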
Next, fix the msgr2 warning; running the enable command on any one node is enough.
[root@demohost-229 supdev]# ceph -s
  cluster:
    id:     a293ad23-f310-480b-ab2a-5629f2aeef45
    health: HEALTH_WARN
            3 monitors have not enabled msgr2

  services:
    mon: 3 daemons, quorum demohost-227,demohost-228,demohost-229 (age 19m)
    mgr: demohost-229(active, since 19m), standbys: demohost-227, demohost-228
    osd: 6 osds: 6 up, 6 in
    rgw: 3 daemons active (demohost-227, demohost-228, demohost-229)

  data:
    pools:   7 pools, 184 pgs
    objects: 279.96k objects, 92 GiB
    usage:   293 GiB used, 3.0 TiB / 3.3 TiB avail
    pgs:     184 active+clean

  io:
    client:   7.1 KiB/s rd, 7 op/s rd, 0 op/s wr
    recovery: 156 B/s, 2 objects/s
[root@demohost-227 tools]# ceph mon enable-msgr2
[root@demohost-227 tools]# ceph -s
  cluster:
    id:     a293ad23-f310-480b-ab2a-5629f2aeef45
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum demohost-227,demohost-228,demohost-229 (age 13s)
    mgr: demohost-229(active, since 22m), standbys: demohost-227, demohost-228
    osd: 6 osds: 6 up, 6 in
    rgw: 3 daemons active (demohost-227, demohost-228, demohost-229)

  data:
    pools:   7 pools, 184 pgs
    objects: 279.96k objects, 92 GiB
    usage:   293 GiB used, 3.0 TiB / 3.3 TiB avail
    pgs:     184 active+clean

  io:
    client:   14 KiB/s rd, 0 B/s wr, 13 op/s rd, 10 op/s wr
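To double-check that the monitors are now bound to the msgr2 port (3300 by default) alongside the legacy v1 port 6789, ceph mon dump shows both addresses per monitor, something like:
ceph mon dump
# expected per-mon line (illustrative):
# 0: [v2:172.17.61.227:3300/0,v1:172.17.61.227:6789/0] mon.demohost-227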
The upgrade procedure itself is not complicated, but all kinds of odd problems can surface along the way. Keep upgrades within minor versions where possible; a jump across major releases like this one can trip up even a seasoned operator, so be very careful.