Ceph Maintenance Series (2): Removing OSDs

1 Summary

This article describes how to remove one or more OSDs (disks) from a Ceph node.

2 Environment Information

2.1 OS Version

[root@proceph05 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core) 
[root@proceph05 ~]#

2.2 Ceph Version

[cephadmin@proceph05 ~]$ ceph -v
ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
[cephadmin@proceph05 ~]$

2.3 Cluster Overview

The cluster currently has 5 Ceph nodes, each with 6 disks (30 OSDs in total).

3 Implementation

This procedure follows the official documentation: https://docs.ceph.com/en/nautilus/rados/operations/add-or-rm-osds/

3.1 Check OSD Status

ceph osd tree
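
The tree output groups each OSD under its host bucket. To double-check which host a particular OSD lives on before touching it, ceph osd find can also be used; a minimal sketch for osd.29, the OSD removed later in this article:

# print the CRUSH location (including the host) of the OSD about to be removed
ceph osd find 29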

3.2 Take the OSD out of the Cluster

ceph osd out osd.29
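
Here osd.29 is marked out; since osd.27 and osd.28 are being retired as well (their rebalancing shows up in the next step), the same command can simply be looped over every OSD on the list. A sketch:

# mark each OSD being retired as "out" so CRUSH starts migrating its data elsewhere
for id in 27 28 29; do
    ceph osd out osd.${id}
done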

3.3 Observe the Data Migration

[cephadmin@proceph05 ~]$ ceph -w
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            10 nearfull osd(s)
            1 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 19 pgs backfill_toofull
            4 pgs not deep-scrubbed in time
            8 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum proceph01,proceph02,proceph03,proceph04,proceph05 (age 5w)
    mgr: proceph01(active, since 16M), standbys: proceph03, proceph02, proceph04, proceph05
    osd: 30 osds: 30 up (since 9M), 27 in (since 41h); 20 remapped pgs

  data:
    pools:   1 pools, 512 pgs
    objects: 13.89M objects, 53 TiB
    usage:   158 TiB used, 61 TiB / 218 TiB avail
    pgs:     541129/41679906 objects misplaced (1.298%)
             486 active+clean
             19  active+remapped+backfill_toofull
             6   active+clean+scrubbing+deep
             1   active+remapped+backfilling

  io:
    client:   1018 KiB/s rd, 41 MiB/s wr, 65 op/s rd, 2.83k op/s wr
    recovery: 13 MiB/s, 3 objects/s

  progress:
    Rebalancing after osd.28 marked out
      [=========================.....]
    Rebalancing after osd.29 marked out
      [==========================....]
    Rebalancing after osd.27 marked out
      [=========================.....]
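
ceph -w streams every cluster event, which gets noisy during a long rebalance. A lighter-weight alternative (a sketch using the standard watch utility) is to refresh the placement-group summary periodically and proceed only once all 512 PGs are back to active+clean:

# refresh the PG summary every 30 seconds; wait for all PGs to become active+clean
# before stopping the OSD daemons in the next step
watch -n 30 ceph pg stat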

3.4 Stopping the OSD

On the server hosting the OSDs to be removed, stop the OSD service as the root user.

[root@proceph05 ~]# systemctl status ceph-osd@28
● ceph-osd@28.service - Ceph object storage daemon osd.28
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: active (running) since 三 2022-06-29 15:01:50 CST; 9 months 20 days ago
 Main PID: 32987 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@28.service
           └─32987 /usr/bin/ceph-osd -f --cluster ceph --id 28 --setuser ceph --setgroup ceph
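
The stop command itself did not make it into the capture; following the official documentation linked earlier, it would be:

# stop the daemon of the OSD being removed (here osd.28; repeat for the other retired OSDs)
systemctl stop ceph-osd@28
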
[root@proceph05 ~]# systemctl status ceph-osd@28
● ceph-osd@28.service - Ceph object storage daemon osd.28
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: inactive (dead) since 四 2023-04-20 09:52:55 CST; 1min 18s ago
  Process: 32987 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
 Main PID: 32987 (code=exited, status=0/SUCCESS)

Check: the number of up OSDs has dropped, but the cluster still reports 30 OSDs in total.

[cephadmin@proceph05 ~]$ ceph -s
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            10 nearfull osd(s)
            1 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 14 pgs backfill_toofull
            Degraded data redundancy: 377301/41316627 objects degraded (0.913%), 14 pgs degraded, 14 pgs undersized
            12 pgs not deep-scrubbed in time
            9 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum proceph01,proceph02,proceph03,proceph04,proceph05 (age 5w)
    mgr: proceph01(active, since 16M), standbys: proceph03, proceph02, proceph04, proceph05
    osd: 30 osds: 27 up (since 5m), 27 in (since 2d); 14 remapped pgs

3.5 Removing the OSD

Run the following as the cephadmin user:

[cephadmin@proceph05 ~]$ ceph osd purge 29 --yes-i-really-mean-it
purged osd.29
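
osd.27 and osd.28, which were marked out earlier, can be purged the same way once their daemons are stopped; a sketch:

# purge the remaining retired OSDs from the CRUSH map, the auth database and the OSD map
for id in 27 28; do
    ceph osd purge ${id} --yes-i-really-mean-it
done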

Check the result:

[cephadmin@proceph05 ~]$ ceph -s
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            10 nearfull osd(s)
            1 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 13 pgs backfill_toofull
            Degraded data redundancy: 350543/41316651 objects degraded (0.848%), 13 pgs degraded, 13 pgs undersized
            12 pgs not deep-scrubbed in time
            9 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum proceph01,proceph02,proceph03,proceph04,proceph05 (age 5w)
    mgr: proceph01(active, since 16M), standbys: proceph03, proceph02, proceph04, proceph05
    osd: 29 osds: 27 up (since 7m), 27 in (since 2d); 93 remapped pgs

The total OSD count has dropped from 30 to 29.
It can also be verified with ceph osd tree that the purged OSD is gone.
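
A quick sketch of that check:

# the purged OSD should no longer appear anywhere in the CRUSH tree
ceph osd tree | grep "osd.29"    # expect no output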

3.6 Remove the Corresponding Disk from the Server

My approach is to take the disk offline from the server's BMC, then check in Ceph whether any OSD has gone down.
If none has, the offline was safe; if one has (which means the wrong disk was taken offline), bring the disk back online immediately and restart the affected OSD.
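
A quick way to confirm the offline did not take out a live OSD (a sketch; any equivalent check of OSD up/down state works):

# the "up" count reported here should be unchanged after the disk goes offline
ceph osd stat
# no entry in the CRUSH tree should report a down status
ceph osd tree | grep down    # expect no output

If an OSD does end up down, bring the disk back online from the BMC and restart its daemon; the reset-failed / start sequence below (shown for osd.25) does exactly that.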

[cephadmin@proceph05 ~]$ sudo systemctl reset-failed ceph-osd@25
[cephadmin@proceph05 ~]$ sudo systemctl start ceph-osd@25
[cephadmin@proceph05 ~]$ sudo systemctl status ceph-osd@25
● ceph-osd@25.service - Ceph object storage daemon osd.25
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: active (running) since Thu 2023-04-20 10:27:59 CST; 33s ago
  Process: 276984 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 276990 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@25.service
           └─276990 /usr/bin/ceph-osd -f --cluster ceph --id 25 --setuser ceph --setgroup ceph