0 Revision History
No. | Revision date | Changes
1 | 2021/11/18 | Initial version
1 Summary
This post covers common Ceph health alerts, how to handle them, and problems encountered while doing so.
2 Environment
2.1 Ceph version
[cephadmin@proceph01 ~]$ ceph -v
ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
[cephadmin@proceph01 ~]$
2.2 Operating system version
[cephadmin@proceph01 ~]$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
[cephadmin@proceph01 ~]$
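3 Common alerts and how to handle them
3.1 pgs not deep-scrubbed in time
Handling this alert can itself trigger new alerts. Those can be ignored; they clear on their own after a while. Expect the whole process to take some time.
3.1.1 Error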
[cephadmin@proceph01 ~]$ ceph health detail
HEALTH_WARN 1 pgs not deep-scrubbed in time
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 1.d6 not deep-scrubbed since 2021-11-06 02:49:03.880981
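Why the PG is flagged: according to the Ceph health-check documentation, PG_NOT_DEEP_SCRUBBED is raised once a PG's last deep scrub is older than osd_deep_scrub_interval plus mon_warn_pg_not_deep_scrubbed_ratio of that interval. Assuming the defaults (604800 s, i.e. one week, and 0.75), the threshold is 604800 × 1.75 = 1058400 s = 12.25 days, so a PG last deep-scrubbed on 2021-11-06 02:49 would be flagged from around 2021-11-18. The sketch below only reproduces that arithmetic; the variable and function names are hypothetical, and your cluster or pool may override the defaults (check with ceph config get), so verify the rule against the docs for your version.

```shell
#!/usr/bin/env bash
# Sketch only: reproduce the PG_NOT_DEEP_SCRUBBED threshold arithmetic.
# Assumed defaults: osd_deep_scrub_interval = 604800 s (one week) and
# mon_warn_pg_not_deep_scrubbed_ratio = 0.75, written as 75 percent so that
# bash integer arithmetic can be used. Override via these (hypothetical) variables.
DEEP_SCRUB_INTERVAL=${DEEP_SCRUB_INTERVAL:-604800}
WARN_RATIO_PCT=${WARN_RATIO_PCT:-75}

# Print the age in seconds after which a PG is reported: interval * (1 + ratio).
deep_scrub_warn_threshold() {
  echo $(( DEEP_SCRUB_INTERVAL * (100 + WARN_RATIO_PCT) / 100 ))
}

# deep_scrub_overdue LAST_EPOCH NOW_EPOCH
# Exit status 0 if the last deep scrub is older than the threshold, 1 otherwise.
deep_scrub_overdue() {
  local last=$1 now=$2
  (( now - last > $(deep_scrub_warn_threshold) ))
}
```

For example, deep_scrub_overdue "$(date -d '2021-11-06 02:49:03' +%s)" "$(date +%s)" returns 0 once the threshold has passed (this relies on GNU date -d, as shipped with CentOS 7).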
3.1.2 Fix
[cephadmin@proceph01 ~]$ ceph pg deep-scrub 1.d6
instructing pg 1.d6 on osd.17 to deep-scrub
[cephadmin@proceph01 ~]$
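The command above handles one PG. When ceph health detail lists several, a small loop can pass each PG id to ceph pg deep-scrub. This is a sketch that assumes the Nautilus line format "pg <pgid> not deep-scrubbed since <timestamp>"; the function names and the DRY_RUN switch are hypothetical helpers, not part of Ceph.

```shell
#!/usr/bin/env bash
# Read `ceph health detail` text on stdin and print one overdue PG id per line.
# Only the per-PG lines contain "not deep-scrubbed since"; the summary lines
# say "not deep-scrubbed in time" and are therefore skipped.
overdue_deep_scrub_pgs() {
  awk '/not deep-scrubbed since/ { print $2 }'
}

# Ask Ceph to deep-scrub every overdue PG.
# With DRY_RUN=1 the commands are only printed, not executed.
deep_scrub_overdue_pgs() {
  ceph health detail | overdue_deep_scrub_pgs | while read -r pg; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ceph pg deep-scrub $pg"
    else
      # </dev/null keeps ceph from consuming the PG list on stdin
      ceph pg deep-scrub "$pg" </dev/null
    fi
  done
}
```

Running DRY_RUN=1 deep_scrub_overdue_pgs first shows which PGs would be scrubbed. Each deep scrub adds load of the kind described below, so with many PGs it may be worth spacing them out.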
While it runs, the deep scrub raises new alerts, and it takes a while: on my cluster (three nodes, 18 OSDs) it ran for 2 to 3 hours.
[cephadmin@proceph01 ~]$ ceph health detail
HEALTH_WARN 1 pgs not deep-scrubbed in time; 10 slow ops, oldest one blocked for 52 sec, osd.17 has slow ops
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 1.d6 not deep-scrubbed since 2021-11-06 02:49:03.880981
SLOW_OPS 10 slow ops, oldest one blocked for 52 sec, osd.17 has slow ops
[cephadmin@proceph01 ~]$
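The SLOW_OPS warning names osd.17, the same OSD the deep scrub was sent to ("instructing pg 1.d6 on osd.17"), which fits the observation that the new alert comes from the scrub itself. The helpers below are a hypothetical sketch for checking that correlation from the two command outputs; they assume the message formats shown above.

```shell
#!/usr/bin/env bash
# Read `ceph health detail` on stdin; print each OSD named in "osd.N has slow ops" once.
slow_ops_osds() {
  grep -oE 'osd\.[0-9]+ has slow ops' | awk '{ print $1 }' | sort -u
}

# Read the `ceph pg deep-scrub` reply on stdin, e.g.
# "instructing pg 1.d6 on osd.17 to deep-scrub", and print the OSD (osd.17).
deep_scrub_target_osd() {
  awk '$1 == "instructing" && $4 == "on" { print $5 }'
}

# slow_ops_on OSD: exit 0 if OSD (e.g. osd.17) reports slow ops in the health text on stdin.
slow_ops_on() {
  slow_ops_osds | grep -qxF "$1"
}
```

If the OSD reporting slow ops is not the scrub target, the slow ops may have another cause and should not simply be ignored; ceph daemon osd.<id> dump_ops_in_flight, run on that OSD's host, lists the ops currently in flight.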
[cephadmin@proceph01 ~]$ ceph -s
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            1 pgs not deep-scrubbed in time
            0 slow ops, oldest one blocked for 39 sec, osd.17 has slow ops
  services:
    mon: 3 daemons, quorum proceph01,proceph02,proceph03 (age 4M)
    mgr: proceph01(active, since 5M), standbys: proceph03, proceph02
    osd: 18 osds: 18 up (since 4M), 18 in (since 4M)
  data:
    pools:   1 pools, 512 pgs
    objects: 5.98M objects, 22 TiB
    usage:   67 TiB used, 64 TiB / 131 TiB avail
    pgs:     508 active+clean
             4 active+clean+scrubbing+deep
  io:
    client:   47 MiB/s rd, 17 MiB/s wr, 234 op/s rd, 1.24k op/s wr
[cephadmin@proceph01 ~]$
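To keep an eye on scrub activity while waiting, the pgs section of ceph -s can be reduced to one number. This sketch assumes each state line ends in "<count> <state>", as in the output above; the function names and the 60-second default are illustrative. The count does not have to reach zero: in the final output below, 3 PGs are still active+clean+scrubbing+deep while health is already HEALTH_OK, most likely routine scheduled scrubs.

```shell
#!/usr/bin/env bash
# Read `ceph -s` output on stdin and print how many PGs are in a state that
# contains "scrubbing+deep" (summed in case several such states are listed).
count_deep_scrubbing() {
  awk '$NF ~ /scrubbing[+]deep/ { n += $(NF-1) } END { print n + 0 }'
}

# Print a timestamped count every INTERVAL seconds (default 60) until interrupted.
watch_deep_scrubbing() {
  local interval=${1:-60}
  while true; do
    printf '%s  deep-scrubbing PGs: %s\n' "$(date '+%F %T')" "$(ceph -s | count_deep_scrubbing)"
    sleep "$interval"
  done
}
```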
3.1.3 After the fix
[cephadmin@proceph01 ~]$ ceph -s
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum proceph01,proceph02,proceph03 (age 4M)
    mgr: proceph01(active, since 5M), standbys: proceph03, proceph02
    osd: 18 osds: 18 up (since 4M), 18 in (since 4M)
  data:
    pools:   1 pools, 512 pgs
    objects: 6.02M objects, 23 TiB
    usage:   67 TiB used, 64 TiB / 131 TiB avail
    pgs:     509 active+clean
             3 active+clean+scrubbing+deep
  io:
    client:   13 MiB/s rd, 54 MiB/s wr, 40 op/s rd, 1.52k op/s wr
[cephadmin@proceph01 ~]$
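Instead of re-running ceph -s by hand, one can poll until the specific PG drops out of ceph health detail. A sketch, assuming the same "pg <pgid> not deep-scrubbed since ..." line format; the function name and the defaults (check every 300 s, at most 48 checks, about 4 hours) are hypothetical, chosen only to cover the 2 to 3 hours observed here.

```shell
#!/usr/bin/env bash
# wait_for_deep_scrub PGID [INTERVAL_SECONDS] [MAX_CHECKS]
# Exit 0 as soon as PGID is no longer reported as not deep-scrubbed, 1 on timeout.
wait_for_deep_scrub() {
  local pgid=$1 interval=${2:-300} max_tries=${3:-48} i
  for (( i = 1; i <= max_tries; i++ )); do
    # The trailing " not" avoids matching a longer id such as 1.d6x
    if ! ceph health detail | grep -qF "pg $pgid not deep-scrubbed"; then
      echo "pg $pgid: deep-scrub warning cleared"
      return 0
    fi
    sleep "$interval"
  done
  echo "pg $pgid: still flagged after $max_tries checks" >&2
  return 1
}
```

For this case: wait_for_deep_scrub 1.d6 && ceph -s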