uname
Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.x86_64 #1 SMP Mon May 24 12:14:55 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
mount | grep nfs
200.22.252.66:/data0/media on /data0/media type nfs4 (rw,relatime,sync,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=200.22.252.67,local_lock=none,addr=200.22.252.66)
client端df -h
卡住,读写不可用。umount -l <挂载点>
后,无法重新挂载,重启client的操作系统后,恢复。
client端报错:
Nov 14 13:21:32 localhost kernel: [2762097.294397] nfs: server 200.22.252.66 not responding, still trying
server端报错(暂不确定是否相关):
Nov 14 13:02:17 localhost kernel: [2761217.103877] nfsd4_validate_stateid: 26 callbacks suppressed
...
Nov 14 13:02:17 localhost kernel: [2761217.104230] NFSD: client 200.22.252.69 testing state ID with incorrect client ID
社区stable仓库的linux.4.19.y分支执行命令git log -L:nfsd4_validate_stateid:fs/nfsd/nfs4state.c
可以看到nfsd4_validate_stateid
函数还经过以下修改:
600df3856f0b nfsd: Remove incorrect check in nfsd4_validate_stateid
c6ac11906599 nfsd4: kill warnings on testing stateids with mismatched clientids
在这两个补丁未合入时,test stateid操作时,server端检测到stateid不匹配,server返回给client端错误NFS4ERR_BAD_STATEID
,client端不会调用 free stateid 操作,函数nfs41_test_and_free_expired_stateid
返回错误值-NFS4ERR_BAD_STATEID
,但client端的后续处理流程和执行free stateid操作返回-NFS4ERR_EXPIRED
错误的处理流程没什么区别,为什么会导致client端不断发起test stateid,还没搞明白。
server端代码:
nfsd4_test_stateid
nfsd4_validate_stateidif (!same_clid(&stateid->si_opaque.so_clid, &cl->cl_clientid))
"NFSD: client %s testing state ID with incorrect client ID\n", addr_str);
pr_warn_ratelimited(
printk_ratelimited
__ratelimit
___ratelimit"%s: %d callbacks suppressed\n"
printk_deferred(KERN_WARNING return nfserr_bad_stateid
client端代码:
nfs41_test_and_free_expired_stateid// return -NFS4ERR_BAD_STATEID
nfs41_test_stateid
_nfs41_test_stateid
.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_TEST_STATEID]
status = nfs4_call_sync_sequence != NFS_OKreturn -NFS4ERR_BAD_STATEID // -res.status, 10025
// 只有 -NFS4ERR_EXPIRED, -NFS4ERR_ADMIN_REVOKED, -NFS4ERR_DELEG_REVOKED 三种错误,才会调用 free stateid
// 在问题场景下,永远不会调用 free stateid
nfs41_free_stateid
.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_FREE_STATEID]return -NFS4ERR_EXPIRED
client端的open stateid相关代码流程:
nfs4_state_manager
nfs4_do_reclaim
nfs4_reclaim_open_state
.recover_open = nfs41_open_expired,
nfs41_open_expired
nfs41_check_open_stateid
nfs41_test_and_free_expired_stateid
update_open_stateid
nfs4_test_and_free_stateid// nfs_v4_1_minor_ops
.test_and_free_expired = nfs41_test_and_free_expired_stateid, nfs41_test_and_free_expired_stateid