环境信息:
uname -a
Linux localhost.localdomain 4.19.90-24.4.v2101.ky10.x86_64 #1 SMP Mon May 24 12:14:55 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
modinfo rpcrdma
filename: /lib/modules/4.19.90-24.4.v2101.ky10.x86_64/updates/net/sunrpc/xprtrdma/rpcrdma.ko
version: 2.0.1
license: Dual BSD/GPL
description: rpcrdma dummy kernel module
author: Alaa Hleihel
srcversion: 6AFF4B70A07D55D1FAD40A4
depends: mlx_compat
retpoline: Y
name: rpcrdma
vermagic: 4.19.90-24.4.v2101.ky10.x86_64 SMP mod_unload modversions报错rdma协议不支持:
echo 'rdma 20049' > /proc/fs/nfsd/portlist
-bash: echo: write error: Protocol not supported打开rpc的调试开关:
echo 0x7fff > /proc/sys/sunrpc/rpc_debug报错:
[25017.682688] svc: creating transport rdma[20049]
[25017.685068] svc: transport rdma not found, err 93
[25017.685070] svc: svc_destroy(nfsd, 9)正常情况下的日志为:
modprobe rpcrdma
[ 129.419173][ 8] PKCS#7 signature not signed with a trusted key
[ 129.432317][ 8] SVCRDMA Module Init, register RPC RDMA transport
[ 129.433320][ 8] svcrdma_ord : 16
[ 129.433928][ 8] max_requests : 32
[ 129.434565][ 8] max_bc_requests : 2
[ 129.435273][ 8] max_inline : 4096
[ 129.436078][ 8] svc: Adding svc transport class 'rdma'
[ 129.437013][ 8] svc: Adding svc transport class 'rdma-bc'
[ 129.438179][ 8] RPC: Registered rdma transport module.
[ 129.439090][ 8] RPC: Registered rdma backchannel transport module.
[ 129.440200][ 8] RPCRDMA Module Init, register RPC RDMA transport
[ 129.441287][ 8] Defaults:
[ 129.441762][ 8] Slots 128
[ 129.441762][ 8] MaxInlineRead 4096
[ 129.441762][ 8] MaxInlineWrite 4096
[ 129.443476][ 8] Padding 0
[ 129.443476][ 8] Memreg 5
echo 'rdma 20049' > /proc/fs/nfsd/portlist
[ 137.108816][ 7] svc: creating transport rdma[20049]
[ 137.110479][ 7] svcrdma: Creating RDMA listener
[ 137.112040][ 7] svc: creating transport rdma[20049]
[ 137.113683][ 7] svcrdma: Creating RDMA listener查看可以用 kprobe 跟踪的函数:
cat /sys/kernel/debug/tracing/available_filter_functions跟踪_svc_create_xprt和__svc_xpo_create函数:
echo 'p:p__svc_create_xprt _svc_create_xprt' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/p__svc_create_xprt/enable
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/p__svc_create_xprt/enable
# echo '-:p__svc_create_xprt' >> /sys/kernel/debug/tracing/kprobe_events
echo 'p:p___svc_xpo_create __svc_xpo_create' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/p___svc_xpo_create/enable
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/p___svc_xpo_create/enable
# echo '-:p___svc_xpo_create' >> /sys/kernel/debug/tracing/kprobe_events查看kprobe日志:
echo 0 > /sys/kernel/debug/tracing/trace # 清除trace信息
cat /sys/kernel/debug/tracing/trace_pipe &跟踪svc_reg_xprt_class函数时,发现根本没走到:
echo 'r:r_svc_reg_xprt_class svc_reg_xprt_class ret=$retval' >> /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/enable
# echo stacktrace > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/trigger
# echo '!stacktrace' > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/trigger
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/r_svc_reg_xprt_class/enable
# echo '-:r_svc_reg_xprt_class' >> /sys/kernel/debug/tracing/kprobe_events使用modinfo rpcrdma查看:
filename: /lib/modules/4.19.90-24.4.v2101.ky10.x86_64/updates/net/sunrpc/xprtrdma/rpcrdma.ko
version: 2.0.1
license: Dual BSD/GPL
description: rpcrdma dummy kernel module
author: Alaa Hleihel
srcversion: 6AFF4B70A07D55D1FAD40A4
depends: mlx_compat
retpoline: Y
name: rpcrdma
vermagic: 4.19.90-24.4.v2101.ky10.x86_64 SMP mod_unload modversions发现rpcrdma模块在updates目录下,且依赖的是mlx_compat。
当加载rpcrdma模块时,将rdma加到svc_xprt_class_list链表中,在执行命令echo 'rdma 20049' > /proc/fs/nfsd/portlist时,走到_svc_create_xprt函数时在svc_xprt_class_list链表中找rdma:
nfsd_fill_super
[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO}
write_ports
__write_ports
__write_ports_addxprt
svc_create_xprt
dprintk("svc: creating transport %s[%d]\n",
_svc_create_xprt
// 从 svc_xprt_class_list 链表中找
__svc_xpo_create // 找到了才会执行到这里
svc_rdma_create // xcl->xcl_ops->xpo_create
dprintk("svcrdma: Creating RDMA listener\n")
return -EPROTONOSUPPORT // 找不到返回错误码
rpc_rdma_init
svc_rdma_init
svc_reg_xprt_class
dprintk("svc: Adding svc transport class"
// 加到 svc_xprt_class_list 链表中
list_add_tail(&xcl->xcl_list, &svc_xprt_class_list)重新编译内核,使用modinfo rpcrdma查看:
filename: /lib/modules/4.19.90-24/kernel/net/sunrpc/xprtrdma/rpcrdma.ko
alias: xprtrdma
alias: svcrdma
license: Dual BSD/GPL
description: RPC/RDMA Transport
author: Open Grid Computing and Network Appliance, Inc.
srcversion: FDF863B8AE9371858D8AF75
depends: ib_core,sunrpc,rdma_cm
retpoline: Y
intree: Y
name: rpcrdma
vermagic: 4.19.90-24 SMP mod_unload modversionsecho 'rdma 20049' > /proc/fs/nfsd/portlist执行成功。
出问题时/lib/modules/4.19.90-24.4.v2101.ky10.x86_64/updates/net/sunrpc/xprtrdma/rpcrdma.ko下的ko是编译驱动时重新编译的,依赖的是mlx_compat模块。因此只要使用最原生的rpcrdma模块就能解决此问题。