Environment information:
crash> sys
KERNEL: usr/lib/debug/lib/modules/4.19.90-52.39.v2207.ky10.aarch64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Wed May 28 02:04:57 CST 2025
UPTIME: 06:43:45
LOAD AVERAGE: 4.08, 4.00, 3.83
TASKS: 4811
NODENAME: k8s-node01
RELEASE: 4.19.90-52.39.v2207.ky10.aarch64
VERSION: #4 SMP Wed Jun 5 15:52:50 CST 2024
MACHINE: aarch64 (unknown Mhz)
MEMORY: 64 GB
PANIC: "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000"
Extract the debuginfo and load the vmcore:
rpm2cpio kernel-debuginfo-4.19.90-52.39.v2207.ky10.aarch64.rpm | cpio -div
crash usr/lib/debug/lib/modules/4.19.90-52.39.v2207.ky10.aarch64/vmlinux vmcore
crash> mod -s nfsd
crash> mod -s sunrpc
View the crashing stack:
crash> bt
PC: ffff000048111014 [__queue_work+180]
Disassembly:
crash> dis -l __queue_work
...
/usr/src/debug/kernel-4.19.90/linux-4.19.90-52.39.v2207.ky10.aarch64/kernel/workqueue.c: 577
# unbound_pwq_by_node() -> rcu_dereference_raw()
0xffff000048110ff8 <__queue_work+152>: sxtw x0, w0
0xffff000048110ffc <__queue_work+156>: add x0, x0, #0x22
/usr/src/debug/kernel-4.19.90/linux-4.19.90-52.39.v2207.ky10.aarch64/./include/linux/compiler.h: 310
# rcu_dereference_raw() -> READ_ONCE() -> __READ_ONCE() -> __read_once_size()
0xffff000048111000 <__queue_work+160>: ldr x19, [x24,x0,lsl #3] # the value of pwq
...
/usr/src/debug/kernel-4.19.90/linux-4.19.90-52.39.v2207.ky10.aarch64/kernel/workqueue.c: 1400
0xffff000048111010 <__queue_work+176>: cbnz x0, 0xffff000048111100 <__queue_work+416>
# Use the value in register X19 as a memory address, load 64 bits from that address, and store the result in register X2
0xffff000048111014 <__queue_work+180>: ldr x2, [x19] # access pwq->pool
So the crash happened in nfsd4_run_cb() -> queue_work() -> queue_work_on() -> __queue_work().
Combining this with:
crash> struct pool_workqueue -o
struct pool_workqueue {
   [0] struct worker_pool *pool;
The "... at virtual address 0000000000000000" in the log shows that what was dereferenced is the first member of struct pool_workqueue, so the NULL pointer dereference happened while executing the following code:
// kernel/workqueue.c: 1400
1400 if (last_pool && last_pool != pwq->pool) { // pwq is NULL
The global callback_wq:
crash> rd callback_wq
ffff000042a55a70: ffff80010eb50600
Looking again at the disassembly of __queue_work():
...
0xffff000048110f74 <__queue_work+20>: mov x24, x1 # copy the value in register x1 into register x24
...
0xffff000048110f8c <__queue_work+44>: ldr w0, [x24,#256] # load 32-bit data from memory address x24 + 256 into register w0
...
0xffff000048110fdc <__queue_work+124>: add x1, x26, #0xb48 # add the immediate 0xb48 to register x26, storing the result in register x1
...
0xffff000048111000 <__queue_work+160>: ldr x19, [x24,x0,lsl #3] # element address: x24 + (x0 << 3); read element x0 of the 8-byte-element array at x24 and store its value in x19
...
On aarch64, integer arguments are passed in registers x0–x7 in order, so the second argument of __queue_work(), struct workqueue_struct *wq, is in x24: ffff80042c343400.
This is not the same as the current value of callback_wq.
crash> struct workqueue_struct ffff80042c343400
struct workqueue_struct {
...
dfl_pwq = 0x0,
...
The NULL pointer dereference occurred in __queue_work():
svc_process
  svc_process_common          // versp->vs_dispatch()
    nfsd_dispatch             // proc->pc_func()
      nfsd4_proc_compound     // op->opdesc->op_func()
        nfsd4_create_session
          nfsd4_init_conn
            nfsd4_probe_callback_sync
              nfsd4_probe_callback
                nfsd4_run_cb  // include/linux/workqueue.h
                  queue_work
                    queue_work_on
                      __queue_work
                        if (wq->flags & WQ_UNBOUND) {              // condition satisfied
                        if (last_pool && last_pool != pwq->pool) { // pwq is NULL
"nfsd: last server has exited" was printed twice within a short time, indicating that two processes reached nfsd_last_thread() at the same time:
Process 1 (starting nfsd):
write
  ksys_write
    vfs_write
      __vfs_write
        nfsctl_transaction_write
          write_threads
            nfsd_svc
              nfsd_startup_net
                nfsd_startup_generic
                  nfsd_users++
                  nfs4_state_start
                    nfsd4_create_callback_queue
                      callback_wq = alloc_ordered_workqueue()
                nfs4_state_start_net
                  printk(KERN_INFO "NFSD: starting %ld-second grace period (net %x)\n", ...)

Process 2 (stopping nfsd):
nfsd
  nfsd_destroy
    svc_shutdown_net
      nfsd_last_thread
        nfsd_shutdown_net
          nfs4_state_shutdown_net
            nfs4_state_destroy_net
              destroy_client
                __destroy_client
                  nfsd4_shutdown_callback
                    flush_workqueue
          nfsd_shutdown_generic
            --nfsd_users
            nfs4_state_shutdown
              nfsd4_destroy_callback_queue
                destroy_workqueue(callback_wq)
                  if (!(wq->flags & WQ_UNBOUND)) {  // condition not satisfied
                  wq->dfl_pwq = NULL
                  put_pwq_unlocked
        printk(KERN_WARNING "nfsd: last server has exited, flushing export cache\n")
If nfsd_startup_generic() and nfsd_shutdown_generic() could run concurrently, this problem could occur; but the nfsd_users variable is protected by the nfsd_mutex lock, so the two paths cannot race. For the detailed discussion, see "nfsd: convert the nfsd_users to atomic_t".
Re: [PATCH 2/3] nfsd: use kref and new mutex for global config management
Re: [RFC PATCH] nfsd: convert the nfsd_users to atomic_t
Merge the following patches:
[PATCH v2 1/5] 1054e8ffc5c4 nfsd: prevent callback tasks running concurrently
  Adds a flag to prevent nfsd4_callback work items from running concurrently.
[PATCH] 38f080f3cd19 NFSD: Move callback_wq into struct nfs4_client
  Originally there was only one callback_wq workqueue; this makes each nfs4_client structure own its own workqueue.
Patch touching __free_client(): 59f8e91b75ec nfsd4: use reference count to free client
Patch touching nfs4_state_start(): [PATCH RFC v25 3/7] d76cc46b37e1 NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd
[PATCH 00/20 v3] SUNRPC: clean up server thread management