环境信息:
uname -a
Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.x86_64 #1 SMP Mon May 24 12:14:55 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
modinfo rpcrdma
filename: /lib/modules/4.19.90-24.4.v2101.ky10.x86_64/updates/net/sunrpc/xprtrdma/rpcrdma.ko
version: 2.0.1
license: Dual BSD/GPL
description: rpcrdma dummy kernel module
author: Alaa Hleihel
srcversion: 6AFF4B70A07D55D1FAD40A4
depends: mlx_compat
retpoline: Y
name: rpcrdma
vermagic: 4.19.90-24.4.v2101.ky10.x86_64 SMP mod_unload modversions
报错rdma协议不支持:
echo 'rdma 20049' > /proc/fs/nfsd/portlist
-bash: echo: write error: Protocol not supported
打开rpc的调试开关:
echo 0x7fff > /proc/sys/sunrpc/rpc_debug
报错:
25017.682688] svc: creating transport rdma[20049]
[25017.685068] svc: transport rdma not found, err 93
[25017.685070] svc: svc_destroy(nfsd, 9) [
正常情况下的日志为:
modprobe rpcrdma
[ 129.419173][ 8] PKCS#7 signature not signed with a trusted key
[ 129.432317][ 8] SVCRDMA Module Init, register RPC RDMA transport
[ 129.433320][ 8] svcrdma_ord : 16
[ 129.433928][ 8] max_requests : 32
[ 129.434565][ 8] max_bc_requests : 2
[ 129.435273][ 8] max_inline : 4096'rdma'
[ 129.436078][ 8] svc: Adding svc transport class 'rdma-bc'
[ 129.437013][ 8] svc: Adding svc transport class
[ 129.438179][ 8] RPC: Registered rdma transport module.
[ 129.439090][ 8] RPC: Registered rdma backchannel transport module.
[ 129.440200][ 8] RPCRDMA Module Init, register RPC RDMA transport
[ 129.441287][ 8] Defaults:
[ 129.441762][ 8] Slots 128
[ 129.441762][ 8] MaxInlineRead 4096
[ 129.441762][ 8] MaxInlineWrite 4096
[ 129.443476][ 8] Padding 0
[ 129.443476][ 8] Memreg 5
'rdma 20049' > /proc/fs/nfsd/portlist
echo
[ 137.108816][ 7] svc: creating transport rdma[20049]
[ 137.110479][ 7] svcrdma: Creating RDMA listener
[ 137.112040][ 7] svc: creating transport rdma[20049] [ 137.113683][ 7] svcrdma: Creating RDMA listener
查看可以用 kprobe 跟踪的函数:
cat /sys/kernel/debug/tracing/available_filter_functions
跟踪_svc_create_xprt
和__svc_xpo_create
函数:
echo 'p:p__svc_create_xprt _svc_create_xprt' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/p__svc_create_xprt/enable
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/p__svc_create_xprt/enable
# echo '-:p__svc_create_xprt' >> /sys/kernel/debug/tracing/kprobe_events
echo 'p:p___svc_xpo_create __svc_xpo_create' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/p___svc_xpo_create/enable
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/p___svc_xpo_create/enable
# echo '-:p___svc_xpo_create' >> /sys/kernel/debug/tracing/kprobe_events
查看kprobe日志:
echo 0 > /sys/kernel/debug/tracing/trace # 清除trace信息
cat /sys/kernel/debug/tracing/trace_pipe &
跟踪svc_reg_xprt_class
函数时,发现根本没走到:
echo 'r:r_svc_reg_xprt_class svc_reg_xprt_class ret=$retval' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/enable
# echo stacktrace > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/trigger
# echo '!stacktrace' > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/trigger
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/enable
# echo '-:r_svc_reg_xprt_class' >> /sys/kernel/debug/tracing/kprobe_events
使用modinfo rpcrdma
查看:
filename: /lib/modules/4.19.90-24.4.v2101.ky10.x86_64/updates/net/sunrpc/xprtrdma/rpcrdma.ko
version: 2.0.1
license: Dual BSD/GPL
description: rpcrdma dummy kernel module
author: Alaa Hleihel
srcversion: 6AFF4B70A07D55D1FAD40A4
depends: mlx_compat
retpoline: Y
name: rpcrdma
vermagic: 4.19.90-24.4.v2101.ky10.x86_64 SMP mod_unload modversions
发现rpcrdma
模块在updates
目录下,且依赖的是mlx_compat
。
当加载rpcrdma
模块时,将rdma
加到svc_xprt_class_list
链表中,在执行命令echo 'rdma 20049' > /proc/fs/nfsd/portlist
时,走到_svc_create_xprt
函数时在svc_xprt_class_list
链表中找rdma
:
nfsd_fill_super"portlist", &transaction_ops, S_IWUSR|S_IRUGO}
[NFSD_Ports] = {
write_ports
__write_ports
__write_ports_addxprt
svc_create_xprt"svc: creating transport %s[%d]\n",
dprintk(
_svc_create_xprt// 从 svc_xprt_class_list 链表中找
__svc_xpo_create// xcl->xcl_ops->xpo_create
svc_rdma_create "svcrdma: Creating RDMA listener\n")
dprintk(
rpc_rdma_init
svc_rdma_init
svc_reg_xprt_class"svc: Adding svc transport class"
dprintk(// 加到 svc_xprt_class_list 链表中
list_add_tail(&xcl->xcl_list, &svc_xprt_class_list)
重新编译内核,使用modinfo rpcrdma
查看:
filename: /lib/modules/4.19.90-24/kernel/net/sunrpc/xprtrdma/rpcrdma.ko
alias: xprtrdma
alias: svcrdma
license: Dual BSD/GPL
description: RPC/RDMA Transport
author: Open Grid Computing and Network Appliance, Inc.
srcversion: FDF863B8AE9371858D8AF75
depends: ib_core,sunrpc,rdma_cm
retpoline: Y
intree: Y
name: rpcrdma
vermagic: 4.19.90-24 SMP mod_unload modversions
echo 'rdma 20049' > /proc/fs/nfsd/portlist
执行成功。
出问题时/lib/modules/4.19.90-24.4.v2101.ky10.x86_64/updates/net/sunrpc/xprtrdma/rpcrdma.ko
下的ko是编译驱动时重新编译的,依赖的是mlx_compat
模块。因此只要使用最原生的rpcrdma
模块就能解决此问题。