Common Ceph Problems

1. CephFS Problem Diagnosis

1.1 Cannot Create a Filesystem

Creating a new CephFS fails with Error EINVAL: pool 'rbd-ssd' already contains some objects. Use an empty pool instead. Workaround:

ceph fs new cephfs rbd-ssd rbd-hdd --force

1.2 mds.0 is damaged

This appeared after a power failure. The MDS process reports: Error recovering journal 0x200: (5) Input/output error. Diagnosis:

# Check cluster health
ceph health detail
# HEALTH_ERR mds rank 0 is damaged; mds cluster is degraded
# mds.0 is damaged

# Filesystem details: the only MDS, Boron, cannot start
ceph fs status
# cephfs - 0 clients
# ======
# +------+--------+-----+----------+-----+------+
# | Rank | State  | MDS | Activity | dns | inos |
# +------+--------+-----+----------+-----+------+
# |  0   | failed |     |          |     |      |
# +------+--------+-----+----------+-----+------+
# +---------+----------+-------+-------+
# |   Pool  |   type   |  used | avail |
# +---------+----------+-------+-------+
# | rbd-ssd | metadata |  138k |  106G |
# | rbd-hdd |   data   | 4903M | 2192G |
# +---------+----------+-------+-------+
# +-------------+
# | Standby MDS |
# +-------------+
# |    Boron    |
# +-------------+

# Show the damage details
ceph tell mds.0 damage
# terminate called after throwing an instance of 'std::out_of_range'
#   what():  map::at
# Aborted

# Try to mark the rank repaired -- no effect
ceph mds repaired 0

# Try to export the CephFS journal -- no effect
cephfs-journal-tool journal export backup.bin
# 2019-10-17 16:21:34.179043 7f0670f41fc0 -1 Header 200.00000000 is unreadable
# 2019-10-17 16:21:34.179062 7f0670f41fc0 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
# Error ((5) Input/output error)

# Try to recover from the journal -- no effect
# Write all recoverable inodes/dentries in the journal back to the backing store (if newer than what is on disk)
cephfs-journal-tool event recover_dentries summary
# Events by type:
# Errors: 0
# 2019-10-17 16:22:00.836521 7f2312a86fc0 -1 Header 200.00000000 is unreadable

# Try to truncate the journal -- no effect
cephfs-journal-tool journal reset
# got error -5 from Journaler, failing
# 2019-10-17 16:22:14.263610 7fe6717b1700  0 client.6494353.journaler.resetter(ro) error getting journal off disk
# Error ((5) Input/output error)

# Delete and recreate the filesystem -- data is lost
ceph fs rm cephfs --yes-i-really-mean-it

## The same problem happened again later

# A deep scrub shows that object 200.00000000 is inconsistent
ceph osd deep-scrub all
40.14 shard 14: soid 40:292cf221:::200.00000000:head data_digest 0x6ebfd975 != data_digest 0x9e943993 from auth oi 40:292cf221:::200.00000000:head(22366'34 mds.0.902:1 dirty|data_digest|omap_digest s 90 uv 34 dd 9e943993 od ffffffff alloc_hint [0 0 0])
40.14 deep-scrub 0 missing, 1 inconsistent objects
40.14 deep-scrub 1 errors

# Inspect the inconsistent RADOS object
rados list-inconsistent-obj 40.14 --format=json-pretty
{
  "epoch": 23060,
  "inconsistents": [{
    "object": {
      "name": "200.00000000"
    },
    "errors": [],
    # Cause of the error: the data digests do not match
    "union_shard_errors": [
      "data_digest_mismatch_info"
    ],
    "selected_object_info": {
      "oid": {
        "oid": "200.00000000"
      }
    },
    "shards": [{
      "osd": 7,
      "primary": true,
      "errors": [],
      "size": 90,
      "omap_digest": "0xffffffff"
    }, {
      "osd": 14,
      "primary": false,
      # errors: the shards disagree and the bad shard cannot be determined. Possible values:
      #    data_digest_mismatch: this replica's digest differs from the primary's
      #    size_mismatch: this replica's length differs from the primary's
      #    read_error: there may be a disk error
      # Here the two replicas' digests disagree
      "errors": [
        "data_digest_mismatch_info"
      ],
      "size": 90,
      "omap_digest": "0xffffffff",
      "data_digest": "0x6ebfd975"
    }]
  }]
}
# Switched to handling the inconsistency: stop OSD.14, flush its journal, start OSD.14, run a PG repair.
# No effect. A PG repair makes Ceph overwrite the inconsistent replica with the authoritative copy,
# but that does not always work -- here the primary replica's data digest info is itself bad.

# Delete the damaged object
rados -p rbd-ssd rm 200.00000000

2. OSD Problem Diagnosis

2.1 Crashes Immediately After Startup

This can usually be considered a Ceph bug, sometimes triggered by the on-disk data state. In some cases, setting the crashing OSD's weight to zero lets it recover:

# Try to work around osd.17 crashing right after startup
ceph osd reweight 17 0
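
Once osd.17 stays up again, its weight can be restored; a one-line follow-up that is not part of the original note:

ceph osd reweight 17 1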

3. PG Problem Diagnosis

3.1 All PGs Stuck in unknown

If every PG of a newly created pool is stuck in this state, the likely cause is a broken CRUSH map. You can set osd_crush_update_on_start to true so that the cluster adjusts the CRUSH map automatically; see the sketch below.
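
A minimal ceph.conf sketch, assuming the option is applied to all OSDs:

[osd]
osd crush update on start = true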

3.2 Stuck in peering

ceph -s shows the following state, which does not recover for a long time:

  cluster:
    health: HEALTH_WARN
            Reduced data availability: 2 pgs inactive, 2 pgs peering
            19 slow requests are blocked > 32 sec

  data:
    pgs:     0.391% pgs not active
             510 active+clean
             2   peering

In this case, the Pods using the affected PGs were stuck in Unknown status.

Check the PGs stuck in the inactive state:

ceph pg dump_stuck inactive
PG_STAT STATE   UP     UP_PRIMARY ACTING ACTING_PRIMARY  
17.68   peering [3,12]          3 [3,12]              3
16.32   peering [4,12]          4 [4,12]              4 

Dump the diagnostics of one of the PGs; an excerpt:

// ceph pg 17.68 query
{
  "info": {
    "stats": {
      "state": "peering",
      "stat_sum": {
        "num_objects_dirty": 5
      },
      "up": [3, 12],
      "acting": [3, 12],
      // which OSD the PG is blocked by
      "blocked_by": [12],
      "up_primary": 3,
      "acting_primary": 3
    }
  },
  "recovery_state": [
    // if all were well, the first element would be "name": "Started/Primary/Active"
    {
      "name": "Started/Primary/Peering/GetInfo",
      "enter_time": "2018-06-11 18:32:39.594296",
      // but it is stuck requesting info from OSD 12
      "requested_info_from": [{
        "osd": "12"
      }]
    },
    {
      "name": "Started/Primary/Peering"
    },
    {
      "name": "Started"
    }
  ]
}

This does not give a clear reason why osd.12 is blocking peering.

Check the logs: osd.12 is on 10.0.0.104, osd.3 is on 10.0.0.100, and the latter is the primary OSD.

The osd.3 log shows that, starting at 18:26, heartbeat checks against all other OSDs failed. At that time 10.0.0.100 was under very high load and effectively hung.

The osd.12 log is flooded with messages like the following from around 18:26:

osd.12 466 heartbeat_check: no reply from 10.0.0.100:6803 osd.4 since back 2018-06-11 18:26:44.973982 ...

Heartbeats were still failing at 18:44. After restarting osd.12, everything returned to normal.

3.3 incomplete

Check the stuck PGs:

ceph pg dump_stuck
# PG_STAT STATE      UP     UP_PRIMARY ACTING ACTING_PRIMARY
# 17.79   incomplete [9,17]          9 [9,17]              9
# 32.1c   incomplete [16,9]         16 [16,9]             16
# 17.30   incomplete [16,9]         16 [16,9]             16
# 31.35   incomplete [9,17]          9 [9,17]              9

Query the diagnostics of PG 17.30:

// ceph pg 17.30 query
{
  "state": "incomplete",
  "info": {
    "pgid": "17.30",
    "stats": {
      // blocked by osd.11, which no longer exists
      "blocked_by": [11],
      "up_primary": 16,
      "acting_primary": 16
    }
  },
  // recovery history
  "recovery_state": [
    {
      "name": "Started/Primary/Peering/Incomplete",
      "enter_time": "2018-06-17 04:48:45.185352",
      // final state: there is no complete copy of this PG
      "comment": "not enough complete instances of this PG"
    },
    {
      "name": "Started/Primary/Peering",
      "enter_time": "2018-06-17 04:48:45.131904",
      "probing_osds": ["9", "16", "17"],
      // it wants to probe an OSD that no longer exists
      "down_osds_we_would_probe": [11],
      "peering_blocked_by_detail": [{
        "detail": "peering_blocked_by_history_les_bound"
      }]
    }
  ]
}

You can see that PG 17.30 wants to look for authoritative data on osd.11, which has been permanently lost. In this situation you can try to forcibly mark the PG as complete.

First, stop the PG's primary OSD: service ceph-osd@16 stop

Then run the following tool:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16  --pgid 17.30 --op mark-complete
# Marking complete 
# Marking complete succeeded

Finally, start the PG's primary OSD again: service ceph-osd@16 start

3.4 stale Caused by a Single Replica

Without replication, a single OSD going down makes its data unavailable:

ceph health detail
# Note that the acting set has only a single member
# pg 2.21 is stuck stale for 688.372740, current state stale+active+clean, last acting [7]
# Other PGs' acting sets do not
# pg 3.4f is active+recovering+degraded, acting [9,1]

If the OSD really has suffered a hardware failure, the data is lost. You also cannot run query operations against such a PG.

3.5 inconsistent

Locate the primary OSD of the problematic PGs, stop it, flush its journal, then repair the PGs:

ceph health detail
# HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
# OSD_SCRUB_ERRORS 2 scrub errors
# PG_DAMAGED Possible data damage: 2 pgs inconsistent
#     pg 15.33 is active+clean+inconsistent, acting [8,9]
#     pg 15.61 is active+clean+inconsistent, acting [8,16]

# Find the machine hosting the OSD
ceph osd find 8

# Log in to the machine hosting osd.8
systemctl stop ceph-osd@8.service
ceph-osd -i 8 --flush-journal
systemctl start ceph-osd@8.service
ceph pg repair 15.61

4. Object Problem Diagnosis

4.1 unfound

This happens when the OSD holding the authoritative copy of an object goes down or is removed. For example, for two OSDs paired on the same PG:

  1. osd.1 goes down
  2. osd.2 handles some writes alone
  3. osd.1 comes back up
  4. osd.1 and osd.2 peer; because of the writes osd.2 handled alone, the missing objects are queued for recovery on osd.1
  5. Before recovery finishes, osd.2 goes down or is removed

In this sequence of events, osd.1 knows an authoritative copy exists but cannot find it. Requests against the affected objects are blocked until the OSD holding the authoritative copy comes back online.

Run the following command to locate the affected PG:

ceph health detail | grep unfound
# OBJECT_UNFOUND 1/90055 objects unfound (0.001%)
#     pg 33.3e has 1 unfound objects
#    pg 33.3e is active+recovery_wait+degraded, acting [17,6], 1 unfound

Next, locate the affected object:

// ceph pg 33.3e list_missing
{
  "offset": {
    "oid": "",
    "key": "",
    "snapid": 0,
    "hash": 0,
    "max": 0,
    "pool": -9223372036854775808,
    "namespace": ""
  },
  "num_missing": 1,
  "num_unfound": 1,
  "objects": [{
    "oid": {
      // the missing object
      "oid": "obj_delete_at_hint.0000000066",
      "key": "",
      "snapid": -2,
      "hash": 2846662078,
      "max": 0,
      "pool": 33,
      "namespace": ""
    },
    "need": "1723'1412",
    "have": "0'0",
    "flags": "none",
    "locations": []
  }],
  "more": false
}

If too many objects are missing, more is shown as true.

Run the following command to view the PG's diagnostics:

// ceph pg 33.3e query
{
  "state": "active+recovery_wait+degraded",
  "recovery_state": [{
    "name": "Started/Primary/Active",
    "enter_time": "2018-06-16 15:03:32.873855",
    // OSDs that might hold the unfound objects
    "might_have_unfound": [{
      "osd": "6",
      "status": "already probed"
    }, {
      "osd": "11",
      "status": "osd is down"
    }]
  }]
}

osd.11 in the output above had previously failed (hardware) and was removed, which means the unfound object cannot be recovered. You can mark it:

# Roll back to the previous version, or forget the object entirely if it was newly created. Not supported for EC pools
ceph pg 33.3e mark_unfound_lost revert
# Make Ceph forget that the unfound objects ever existed
ceph pg 33.3e mark_unfound_lost delete 

5. ceph-deploy

5.1 TypeError: ‘Logger’ object is not callable

Replace line 376 of /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py with:

LOG.info(line.decode('utf-8')) 

5.2 Could not locate executable ‘ceph-volume’ make sure it is installed and available

Install ceph-deploy 1.5.39; version 2.0.0 only supports luminous:

apt remove ceph-deploy
apt install ceph-deploy=1.5.39 -y
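
To keep apt from pulling 2.0.0 back in on the next upgrade, the package can also be pinned; this extra step is an assumption, not part of the original note:

apt-mark hold ceph-deploy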

5.3 ceph -s Hangs After Deploying MONs

In my environment this happened because the MON nodes picked up the IP of an LVS virtual interface as their public addr. Fix the configuration by explicitly specifying each MON's address:

[mon.master01-10-5-38-24]
public addr = 10.5.38.24
cluster addr = 10.5.38.24

[mon.master02-10-5-38-39]
public addr = 10.5.38.39
cluster addr = 10.5.38.39

[mon.master03-10-5-39-41]
public addr = 10.5.39.41
cluster addr = 10.5.39.41

6. ceph-helm

Deploying in my environment ran into a series of permission-related problems. If you hit the same issues and do not care about security, you can change the configuration:

# kubectl -n ceph edit configmap ceph-etc
apiVersion: v1
data:
  ceph.conf: |
    [global]
    fsid = 08adecc5-72b1-4c57-b5b7-a543cd8295e7
    mon_host = ceph-mon.ceph.svc.k8s.gmem.cc
    # add the following three lines
    auth client required = none
    auth cluster required = none
    auth service required = none
    [osd]
    # in a large cluster a separate cluster network can significantly improve performance
    cluster_network = 10.0.0.0/16
    ms_bind_port_max = 7100
    public_network = 10.0.0.0/16
kind: ConfigMap

If you need to keep the cluster secure, see the cases below.

6.1 ceph-mgr Reports Operation not permitted

  • Symptom
    The Pod never starts. The container log shows:
    timeout 10 ceph --cluster ceph auth get-or-create mgr.xenial-100 mon 'allow profile mgr' osd 'allow *' mds 'allow *' -o /var/lib/ceph/mgr/ceph-xenial-100/keyring
    0 librados: client.admin authentication error (1) Operation not permitted

  • Analysis
    Connect to a reachable ceph-mon and run:

    kubectl -n ceph exec -it ceph-mon-nhx52 -c ceph-mon -- ceph

    The same error appears, which suggests the client.admin keyring in use is wrong. Log in to the ceph-mon and list the auth entries:

    # kubectl -n ceph exec -it ceph-mon-nhx52 -c ceph-mon bash
    # ceph --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-xenial-100/keyring auth list
    installed auth entries:
    client.admin
        key: AQAXPdtaAAAAABAA6wd1kCog/XtV9bSaiDHNhw==
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *
    client.bootstrap-mds
        key: AQAgPdtaAAAAABAAFPgqn4/zM5mh8NhccPWKcw==
        caps: [mon] allow profile bootstrap-mds
    client.bootstrap-osd
        key: AQAUPdtaAAAAABAASbfGQ/B/PY4Imoa4Gxsa2Q==
        caps: [mon] allow profile bootstrap-osd
    client.bootstrap-rgw
        key: AQAJPdtaAAAAABAAswtFjgQWahHsuy08Egygrw==
        caps: [mon] allow profile bootstrap-rgw

    The client.admin keyring currently in use, however, contains:

[client.admin]
  key = AQAda9taAAAAABAAgWIsgbEiEsFRJQq28hFgTQ==
  auid = 0
  caps mds = "allow"
  caps mon = "allow *"
  caps osd = "allow *"
  caps mgr = "allow *"

The contents differ. The client.admin keyring obtained via auth list turns out to be the valid one:

ceph --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-xenial-100/keyring auth get client.admin > client.admin.keyyring
ceph --name client.admin --keyring client.admin.keyyring # OK

Checking /etc/ceph/ceph.client.admin.keyring in each Pod shows that they are all mounted from the Secret ceph-client-admin-keyring. So how is that Secret generated? Run:

kubectl -n ceph get job --output=yaml --export | grep ceph-client-admin-keyring -B 50

This shows that the Job ceph-storage-keys-generator creates the Secret. Its Pod log records the keyring being generated and the Secret being created. Looking further at the Pod's resource definition, the script responsible, /opt/ceph/ceph-storage-key.sh, is mounted from the ceph-storage-key.sh entry of the ConfigMap ceph-bin.

The simplest fix is to edit the Secret and replace it with the keyring that is actually valid in the cluster:

# Export the Secret definition
kubectl -n ceph get secret ceph-client-admin-keyring --output=yaml --export > ceph-client-admin-keyring
# Base64-encode the valid keyring
cat client.admin.keyyring | base64
# Replace the encoded value in the Secret with the Base64 output above, then re-apply the Secret
kubectl -n ceph apply -f ceph-client-admin-keyring

The Secret pvc-ceph-client-key also stores the admin user's key; its content must be replaced with the valid one as well:

kubectl -n ceph edit secret  pvc-ceph-client-key

6.2 PVC Cannot Be Provisioned

The cause is similar to the previous problem: permissions again.

Describe the PVC that cannot be bound:

# kubectl -n ceph describe pvc
Normal   Provisioning        53s   ceph.com/rbd ceph-rbd-provisioner-5544dcbcf5-n846s 708edb2c-4619-11e8-abf2-e672650d97a2  External provisioner is provisioning volume for claim "ceph/ceph-pvc"
Warning  ProvisioningFailed  53s   ceph.com/rbd ceph-rbd-provisioner-5544dcbcf5-n846s 708edb2c-4619-11e8-abf2-e672650d97a2  Failed to provision volume with StorageClass "general": failed to create rbd image: exit status 1, command output: 2018-04-22 13:44:35.269967 7fb3e3e3ad80 -1 did not load config file, using default settings.
2018-04-22 13:44:35.297828 7fb3e3e3ad80 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
7fb3e3e3ad80  0 librados: client.admin authentication error (1) Operation not permitted

rbd-provisioner reads the StorageClass definition to find the credentials it needs:

# kubectl -n ceph get storageclass --output=yaml
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: general
  parameters:
    adminId: admin
    adminSecretName: pvc-ceph-conf-combined-storageclass
    adminSecretNamespace: ceph
    imageFeatures: layering
    imageFormat: "2"
    monitors: ceph-mon.ceph.svc.k8s.gmem.cc:6789
    pool: rbd
    userId: admin
    userSecretName: pvc-ceph-client-key
  provisioner: ceph.com/rbd
  reclaimPolicy: Delete

Two Secrets are involved: pvc-ceph-conf-combined-storageclass and pvc-ceph-client-key. Write the correct keyring content into both; a sketch follows.
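
A hedged sketch of updating them. The exact keys inside each Secret depend on how ceph-helm created them, so inspect each Secret first; the base64 step assumes you edit the data field directly:

# bare key of client.admin (base64-encoded, as a Secret's data field requires)
ceph auth get-key client.admin | base64
# paste the value into the appropriate field of each Secret
kubectl -n ceph edit secret pvc-ceph-client-key
kubectl -n ceph edit secret pvc-ceph-conf-combined-storageclass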

6.3 PVC Cannot Be Attached

  • Symptom:
    The PVC is provisioned and the RBD image can be mapped with the Ceph CLI, but the Pod cannot start. Describing it shows:

    auth: unable to find a keyring on /etc/ceph/keyring: (2) No such file or directory
    monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication
    librados: client.admin authentication error (95) Operation not supported

  • Fix:
    Copy ceph.client.admin.keyring to /etc/ceph/keyring on the node, as sketched below.
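
    A one-line sketch of that fix, run on the node where the Pod is scheduled:

    cp /etc/ceph/ceph.client.admin.keyring /etc/ceph/keyring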

6.4 ceph-osd Reports Operation not permitted

Same cause as the previous problem. Check the log of the container that cannot start:

kubectl -n ceph logs ceph-osd-dev-vdb-bjnbm -c osd-prepare-pod
# ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring health                                                   
# 0 librados: client.bootstrap-osd authentication error (1) Operation not permitted                                               
# [errno 1] error connecting to the cluster

Looking further, /var/lib/ceph/bootstrap-osd/ceph.keyring is mounted from the ceph.keyring entry of the Secret ceph-bootstrap-osd-keyring:

# kubectl -n ceph get secret ceph-bootstrap-osd-keyring --output=yaml --export
apiVersion: v1
data:
  ceph.keyring: W2NsaWVudC5ib290c3RyYXAtb3NkXQogIGtleSA9IEFRQVlhOXRhQUFBQUFCQUFSQ2l1bVY1NFpOU2JGVWwwSDZnYlJ3PT0KICBjYXBzIG1vbiA9ICJhbGxvdyBwcm9maWxlIGJvb3RzdHJhcC1vc2QiCgo=
kind: Secret
metadata:
  creationTimestamp: null
  name: ceph-bootstrap-osd-keyring
  selfLink: /api/v1/namespaces/ceph/secrets/ceph-bootstrap-osd-keyring
type: Opaque

# After Base64 decoding, it contains:
[client.bootstrap-osd]
  key = AQAYa9taAAAAABAARCiumV54ZNSbFUl0H6gbRw==
  caps mon = "allow profile bootstrap-osd"

Get the keyring that is actually valid:

kubectl -n ceph exec -it ceph-mon-nhx52 -c ceph-mon -- ceph --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-xenial-100/keyring auth get client.bootstrap-osd
# Note: the first line of the output, exported keyring for client.bootstrap-osd, is not part of the keyring
[client.bootstrap-osd]
  key = AQAUPdtaAAAAABAASbfGQ/B/PY4Imoa4Gxsa2Q==
  caps mon = "allow profile bootstrap-osd"

Edit the Secret with kubectl -n ceph edit secret ceph-bootstrap-osd-keyring and replace its content with the keyring above.

6.5 ceph-osd Reports No cluster conf with fsid

  • Error message:

    # kubectl -n ceph logs  ceph-osd-dev-vdc-cpkxh -c osd-activate-pod
    ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid 08adecc5-72b1-4c57-b5b7-a543cd8295e7
    # Every OSD reports the same error

    The corresponding config file content:

    kubectl -n ceph get configmap ceph-etc --output=yaml
    apiVersion: v1
    data:
      ceph.conf: |
        [global]
        fsid = a4426e8a-c46d-4407-95f1-911a23a0dd6e
        mon_host = ceph-mon.ceph.svc.k8s.gmem.cc
        [osd]
        cluster_network = 10.0.0.0/16
        ms_bind_port_max = 7100
        public_network = 10.0.0.0/16
    kind: ConfigMap
    metadata:
      name: ceph-etc
      namespace: ceph

    The fsid values do not match. Fixing the fsid in the ConfigMap resolves the problem, for example:
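
    A sketch of that edit; per the error above, 08adecc5-72b1-4c57-b5b7-a543cd8295e7 is the fsid the OSDs expect:

    kubectl -n ceph edit configmap ceph-etc
    # set fsid under data.ceph.conf to 08adecc5-72b1-4c57-b5b7-a543cd8295e7, then recreate the failing OSD pods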

6.6 Container Cannot Attach the PV

  • Error messages:
    describe pod shows: timeout expired waiting for volumes to attach/mount for pod
    kubelet reports: executable file not found in $PATH, rbd output

  • Analysis:
    Dynamically provisioned persistent volumes involve two phases:

    1. Provisioning: originally a control-plane job; controller-manager would need the rbd command to create images in the Ceph cluster for K8S. This duty is now handled by the rbd-provisioner from the external_storage project
    2. Attach/detach: handled by the kubelet on the node running the Pod that uses the volume. Those nodes need the rbd command installed and a valid configuration file
  • Solution:

    # Install the client tools
    apt install -y ceph-common
    # Copy the following files from a ceph-mon node:
    # /etc/ceph/ceph.client.admin.keyring
    # /etc/ceph/ceph.conf
    

    After applying the above, if you then get rbd: map failed exit status 110, rbd output: rbd: sysfs write failed In some cases useful info is found in syslog, check the kernel log:

    dmesg | tail
    # [ 3004.833252] libceph: mon0 10.0.0.100:6789 feature set mismatch, my 106b84a842a42 
    #     < server's 40106b84a842a42, missing 400000000000000
    # [ 3004.840980] libceph: mon0 10.0.0.100:6789 missing required protocol features
    

    Checking the feature table earlier in this article, the missing bit is CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING, which requires kernel 4.5 or newer.
    The simplest fix is to upgrade the kernel:

    # Desktop
    apt install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04 -y
    # Server
    apt install --install-recommends linux-generic-hwe-16.04 -y

    sudo apt-get remove linux-headers-4.4.* -y && \
    sudo apt-get remove linux-image-4.4.* -y && \
    sudo apt-get autoremove -y && \
    sudo update-grub
    

    Alternatively, adjust the CRUSH tunables profile down to hammer:

    ceph osd crush tunables hammer
    # adjusted tunables profile to hammer
    

6.7 OSD Fails to Start: File Name Too Long

Error message: ERROR: osd init failed: (36) File name too long

Cause: the OSD's filesystem is EXT4, which limits the size of stored xattrs; use XFS if at all possible.

Workaround: modify the config file as follows:

 osd_max_object_name_len = 256
osd_max_object_namespace_len = 64

6.8 Cannot Open /proc/0/cmdline

Error message: Fail to open '/proc/0/cmdline' error No such file or directory

Cause: on CentOS 7, deploying ceph-mon and a directory-based ceph-osd on the same node (via Helm) produced this error; separating them made it disappear. In addition, the nodes running the MONs had a virtual IP whose subnet matched Ceph's cluster/public network, which made some OSDs listen on the wrong address.

The problem appeared again later; that time the cause was a virtual interface lo:ngress on the same subnet as eth0, which again made the OSD pick the wrong network.

The fix is to hard-code the OSD's listen addresses:

[osd.2]                                                                                                                                                                          
public addr = 10.0.4.1                                                                                                                                                           
cluster addr = 10.0.4.1  

6.9 Cannot Map the RBD Image

Error message: Input/output error; dmesg | tail shows more detail.

Possible causes:

  1. On CentOS 7 the error indicates the client lacks the feature CEPH_FEATURE_CRUSH_V4 (1000000000000). Fix: change the bucket algorithm to straw (see the sketch below). Note that OSDs added later still default to straw2; the image used here is tagged tag-build-master-luminous-ubuntu-16.04.
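
A hedged sketch of one way to switch existing buckets from straw2 to straw by editing the CRUSH map offline (the file names are illustrative):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt and change "alg straw2" to "alg straw" for the affected buckets
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new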

6.10 write error: File name too long

CephFS provisioned by external storage works, but reads and writes fail with this error. The cause is that the file path is too long; this depends on the underlying filesystem, and to stay compatible with machines using Ext filesystems we limited osd_max_object_name_len.

Workaround: name the directory with namespace + PVC name instead of a UUID. Modify cephfs-provisioner.go around line 118:

// name the share after the PVC's namespace and name instead of a random UUID
share := fmt.Sprintf("%s-%s", options.PVC.Namespace, options.PVC.Name)
// name the user the same way
user := fmt.Sprintf("%s-%s", options.PVC.Namespace, options.PVC.Name)

Then rebuild.

7. Kubernetes-Related Issues

7.1 rbd image * is still being used

describe pod shows:

rbd image rbd-unsafe/kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6 is still being used

This means another client is still using the image. Attempting to delete the image fails:

rbd rm rbd-unsafe/kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6
librbd::image::RemoveRequest: 0x560e39df9af0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout. 

To find out who the watcher is, run:

rbd status rbd-unsafe/kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6
Watchers:
    watcher=10.5.39.12:0/1652752791 client.94563 cookie=18446462598732840961

So 10.5.39.12 is holding the image.

Another way to find watchers is via the image's header object. First get the image details:

rbd info rbd-unsafe/kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6
rbd image 'kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6':
    size 8192 MB in 2048 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.134474b0dc51
    format: 2
    features: layering
    flags: 
    create_timestamp: Wed Jul 11 17:49:51 2018

The block_name_prefix field is rbd_data.134474b0dc51; replacing data with header gives the header object, rbd_header.134474b0dc51. Then run:

rados listwatchers -p rbd-unsafe rbd_header.134474b0dc51
watcher=10.5.39.12:0/1652752791 client.94563 cookie=18446462598732840961

Since 10.5.39.12 is holding the image, disconnect it. On that machine, list the currently mapped RBD images:

rbd showmapped
id pool       image                                                       snap device  
0  rbd-unsafe kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6 -    /dev/rbd0 
1  rbd-unsafe kubernetes-dynamic-pvc-0729f9a6-84f0-11e8-9b75-5a3f858854b1 -    /dev/rbd1 

rbd0 is mapped on this machine but not mounted. Unmap it:

rbd unmap /dev/rbd0

Checking the image status again shows there are no watchers left:

rbd status rbd-unsafe/kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6
Watchers: none

7.2 rbd: map failed signal: aborted (core dumped)

kubectl describe reports Unable to mount volumes for pod... timeout expired waiting for volumes to attach or mount for pod...

The target RBD image has no watcher, and the kubelet on the Pod's node reports rbd: map failed signal: aborted (core dumped). An rbd unmap had previously been run on that machine.

Mapping the image manually with rbd map made the problem go away, for example:
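
A sketch of the manual step; the pool/image name below is the illustrative one from section 7.1:

rbd map rbd-unsafe/kubernetes-dynamic-pvc-c0ac2cff-84ef-11e8-9a2a-566b651a72d6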

8. OSD Cannot Start After a Power Failure

journal do_read_entry: bad header magic

Error message: journal do_read_entry(156389376): bad header magic ... FAILED assert(interval.last > last)

This is a known bug in 12.2: after a power failure an OSD may fail to start, possibly with data loss.

9. Miscellaneous

9.1 Couldn't init storage provider (RADOS)

An RGW instance fails to start; journalctl shows the message above.

For more detail, check the RGW log:

2020-10-22 16:51:55.771035 7fb1b0f20e80  0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 2546439
2020-10-22 16:51:55.792872 7fb1b0f20e80  0 librados: client.rgw.ceph02 authentication error (22) Invalid argument
2020-10-22 16:51:55.793450 7fb1b0f20e80 -1 Couldn't init storage provider (RADOS)

So it is an authentication problem.

Get the command line from systemctl status ceph-radosgw@rgw.$RGW_HOST and run it manually:

radosgw -f --cluster ceph  --name client.rgw.ceph02 --setuser ceph --setgroup ceph -d --debug_ms 1

It fails with the same error. Adding the --keyring parameter fixes it:

radosgw -f --cluster ceph --name client.rgw.ceph02 \
    --setuser ceph --setgroup ceph -d --debug_ms 1 \
    --keyring=/var/lib/ceph/radosgw/ceph-rgw.ceph02/keyring

So the systemd unit was simply not finding the keyring.

9.2 Cannot Enable the Prometheus Module on Machines with IPv6 Disabled

Error message: Unhandled exception from module 'prometheus' while running on mgr.master01-10-5-38-24: error('No socket could be created',)

Fix: ceph config-key set mgr/prometheus/server_addr 0.0.0.0

9.3 Repeated mon ... clock skew Warnings

The warning threshold for clock skew is too low. Add the following to the global section and restart the MONs:

mon clock drift allowed = 2
mon clock drift warn backoff = 30

Or apply it immediately with:

ceph tell mon.* injectargs '--mon_clock_drift_allowed=2'
ceph tell mon.* injectargs '--mon_clock_drift_warn_backoff=30'

Or check the NTP setup to make clock synchronization more accurate.

9.4 Deep Scrub Causes High IO

Deep scrubbing is IO-intensive. If it cannot finish for a long time, you can disable it:

ceph osd set noscrub
ceph osd set nodeep-scrub

Once the problem is resolved, re-enable it:

ceph osd unset noscrub
ceph osd unset nodeep-scrub 

When CFQ is the IO scheduler, you can adjust the priority of the OSD disk threads:

# Set the scheduler
echo cfq > /sys/block/sda/queue/scheduler

# Check an OSD's current disk thread priority class
ceph daemon osd.4 config get osd_disk_thread_ioprio_class

# Adjust the IO priority
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
# IOPRIO_CLASS_RT is the highest, IOPRIO_CLASS_IDLE the lowest
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'

If the measures above do not solve the problem, consider tuning the following parameters:

osd_deep_scrub_stride = 131072
# Range of chunks scrubbed per pass
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd scrub during recovery = false
osd deep scrub interval = 2592000
osd scrub max interval = 2592000
# Number of concurrent scrubs per OSD
osd max scrubs = 1
# Scrub time window
osd scrub begin hour = 2
osd scrub end hour = 6
# Do not scrub when the system load exceeds this value
osd scrub load threshold = 4
# Sleep 0.1 s after each scrub chunk
osd scrub sleep = 0.1
# Disk thread priority
osd disk thread ioprio priority = 7
osd disk thread ioprio class = idle

9.5 Forcing an Unmap

If the watcher has been blacklisted, unmapping the image fails with: rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy

You can force the unmap with: rbd unmap -o force ...

9.6 Not Active+Clean After Increasing pg_num and pgp_num

Some PGs get stuck, possibly because the number of PGs allowed per OSD is limited. Raise the global option mon_max_pg_per_osd and restart the MONs.

Also note: after each PG-count adjustment, wait until the cluster is active+clean again before making the next adjustment; see the sketch below.
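
A hedged sketch of both steps; the limit value and the pool name rbd are illustrative:

# raise the per-OSD PG limit in ceph.conf, then restart the MONs
[global]
mon_max_pg_per_osd = 400

# grow a pool in steps, waiting for active+clean between adjustments
ceph osd pool set rbd pg_num 256
ceph osd pool set rbd pgp_num 256
ceph -s   # wait for all PGs to become active+clean before the next increase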

9.7 Cannot Delete an RBD Image

The K8S PV corresponding to the second image below has already been deleted:

rbd ls
# kubernetes-dynamic-pvc-35350b13-46b8-11e8-bde0-a2c14c93573f
# kubernetes-dynamic-pvc-78740b26-46eb-11e8-8349-e6e3339859d4

But the corresponding RBD image was not deleted. Try deleting it manually:

rbd remove kubernetes-dynamic-pvc-78740b26-46eb-11e8-8349-e6e3339859d4

It fails:

2018-04-23 13:37:25.559444 7f919affd700 -1 librbd::image::RemoveRequest: 0x5598e77831d0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete…failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

Check the image's status:

# rbd info kubernetes-dynamic-pvc-78740b26-46eb-11e8-8349-e6e3339859d4
rbd image 'kubernetes-dynamic-pvc-78740b26-46eb-11e8-8349-e6e3339859d4':
    size 8192 MB in 2048 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.1003e238e1f29
    format: 2
    features: layering
    flags: 
    create_timestamp: Mon Apr 23 11:42:59 2018

# rbd status kubernetes-dynamic-pvc-78740b26-46eb-11e8-8349-e6e3339859d4
Watchers:
    watcher=10.0.0.101:0/4275384344 client.65597 cookie=18446462598732840963

Check on 10.0.0.101:

# df | grep e6e3339859d4
/dev/rbd2        8125880  251560   7438508   4% /var/lib/kubelet/plugins/kubernetes.io/rbd/rbd/rbd-image-kubernetes-dynamic-pvc-78740b26-46eb-11e8-8349-e6e3339859d4

After restarting the kubelet on that node, the RBD image could be deleted.

9.8 Error EEXIST: entity osd.9 exists but key does not match

# Delete the key
ceph auth del osd.9
# Gather keys from the target host again
ceph-deploy --username ceph-ops gatherkeys Carbon

9.9 Not Active+Clean After Creating a New Pool

pgs:     12.413% pgs unknown
         20.920% pgs not active
         768 active+clean
         241 creating+activating
         143 unknown 

This was probably caused by too large a total PG count; after lowering the number of PGs, the cluster quickly became active+clean.

9.10 Orphaned Pod Cannot Be Cleaned Up

Error message: Orphaned pod "a9621c0e-41ee-11e8-9407-deadbeef00a0" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them

Temporary workaround:

rm -rf /var/lib/kubelet/pods/a9621c0e-41ee-11e8-9407-deadbeef00a0/volumes/rook.io~rook/

9.11 OSD Startup Error: ERROR: osd init failed: (1) Operation not permitted

A possible cause is that the OSD's keyring does not match the one the MON has. For an OSD with ID 14, replace the content of /var/lib/ceph/osd/ceph-14/keyring on the host with the first two lines of the output of ceph auth get osd.14, as sketched below.
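
A sketch of that fix, using osd.14 as in the text:

ceph auth get osd.14
# copy the [osd.14] line and the key = ... line into the on-disk keyring
vi /var/lib/ceph/osd/ceph-14/keyring
# then restart the OSD
systemctl restart ceph-osd@14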

9.12 Mount failed with ‘(11) Resource temporarily unavailable’

Running ceph-objectstore-tool without stopping the OSD first causes this error; stop the daemon first, as sketched below.
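
A minimal sketch, reusing osd.16 from the incomplete-PG example above: stop the daemon, run the tool, then start the daemon again.

systemctl stop ceph-osd@16
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 --op list
systemctl start ceph-osd@16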

9.13 neither public_addr nor public_network keys are defined for monitors

This error appears when adding MON nodes with ceph-deploy. Add a public_network setting to the global section of the config file, for example:
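
A sketch; the subnet below is taken from earlier examples in this article and must match your actual public network:

[global]
public_network = 10.0.0.0/16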

9.14 PV Stuck in Terminating After Deletion

Possible causes:

  1. The corresponding PVC has not been deleted and still references the PV; delete the PVC first

9.15 chown: cannot access '/var/log/ceph': No such file or directory

The OSD fails to start with the error above. You can configure:

ceph:
  storage:
    osd_log: /var/log

9.16 HEALTH_WARN application not enabled on

# arguments: <pool> <application>
ceph osd pool application enable rbd block-devices

10. Diagnostics

Debug Logging

Note: verbose logging can exceed 1 GB per hour; if the system disk fills up, the node stops working.

10.1 Temporarily Enabling Debug Logs

# Push the setting via centralized configuration
ceph tell osd.0 config set debug_osd 0/5

# Or set it on the target host, directly against the OSD daemon
ceph daemon osd.0 config set debug_osd 0/5

10.2 Configuring Log Levels

Log levels can be customized per subsystem:

# debug {subsystem} = {log-level}/{memory-level}
[global]
debug ms = 1/5

[mon]
debug mon = 20
debug paxos = 1/5
debug auth = 2

[osd]
debug osd = 1/5
debug filestore = 1/5
debug journal = 1
debug monc = 5/20

[mds]
debug mds = 1
debug mds balancer = 1
debug mds log = 1
debug mds migrator = 1

Subsystem list:

Subsystem / Log level / Memory log level
default 0 5
lockdep 0 1
context 0 1
crush 1 1
mds 1 5
mds balancer 1 5
mds locker 1 5
mds log 1 5
mds log expire 1 5
mds migrator 1 5
buffer 0 1
timer 0 1
filer 0 1
striper 0 1
objecter 0 1
rados 0 5
rbd 0 5
rbd mirror 0 5
rbd replay 0 5
journaler 0 5
objectcacher 0 5
client 0 5
osd 1 5
optracker 0 5
objclass 0 5
filestore 1 3
journal 1 3
ms 0 5
mon 1 5
monc 0 10
paxos 1 5
tp 0 5
auth 1 5
crypto 1 5
finisher 1 1
reserver 1 1
heartbeatmap 1 5
perfcounter 1 5
rgw 1 5
rgw sync 1 5
civetweb 1 10
javaclient 1 5
asok 1 5
throttle 1 1
refs 0 0
compressor 1 5
bluestore 1 5
bluefs 1 5
bdev 1 3
kstore 1 5
rocksdb 4 5
leveldb 4 5
memdb 4 5
fuse 1 5
mgr 1 5
mgrc 1 5
dpdk 1 5
eventtrace 1 5

10.3 Speeding Up Log Rotation

If disk space is limited, configure /etc/logrotate.d/ceph to rotate logs more aggressively:

rotate 7
weekly
size 500M
compress
sharedscripts

Then add a cron job to check and rotate periodically:

30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1