日志如下:
1146536.498525] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[1146536.499331] Mem abort info:
[1146536.499604] ESR = 0x96000047
[1146536.499894] Exception class = DABT (current EL), IL = 32 bits
[1146536.500406] SET = 0, FnV = 0
[1146536.500690] EA = 0, S1PTW = 0
[1146536.500982] Data abort info:
[1146536.501255] ISV = 0, ISS = 0x00000047
[1146536.501610] CM = 0, WnR = 1
[1146536.501889] user pgtable: 64k pages, 48-bit VAs, pgdp = 00000000edff596a
[1146536.502457] [0000000000000008] pgd=00000007bfa00003, pud=00000007bfa00003, pmd=0000000760650003, pte=0000000000000000
[1146536.503333] Internal error: Oops: 96000047 [#1] SMP
[...
1146536.506426] Process cifsd (pid: 1592, stack limit = 0x000000000b974547)
[1146536.507128] CPU: 5 PID: 1592 Comm: cifsd Kdump: loaded Not tainted 4.19.90-25.31.v2101.ky10.aarch64 #1
[1146536.508095] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[1146536.508830] pstate: 80c00005 (Nzcv daif +PAN +UAO)
[1146536.509389] pc : cifs_reconnect+0x214/0x538 [cifs]
[1146536.509923] lr : cifs_reconnect+0x230/0x538 [cifs]
[1146536.510448] sp : ffff80077eea3c10
[1146536.510833] x29: ffff80077eea3c10 x28: ffff800769b18800
[1146536.511406] x27: ffff800766ee9180 x26: ffff80077b7d4200
[1146536.511982] x25: ffff80077eea3c90 x24: ffff800766ee91c0
[1146536.512564] x23: ffff0000030d5000 x22: ffff0000030d5800
[1146536.513143] x21: ffff0000030d5898 x20: ffff0000030d5000
[1146536.513725] x19: ffff800766ee9000 x18: 0000000000000000
[1146536.514309] x17: 0000000000000000 x16: 0000000000000000
[1146536.514886] x15: 0000000000000000 x14: 0000000000000000
[1146536.515455] x13: 0000000000000000 x12: 0000000000000000
[1146536.516029] x11: 0000000000000001 x10: 00000000ac130045
[1146536.516607] x9 : 0000000000000001 x8 : 0000000000000015
[1146536.517181] x7 : 00000000ffffffff x6 : ffff8007ff0b11f8
[1146536.517763] x5 : ffff000008c5e1b8 x4 : ffff800763672680
[1146536.518341] x3 : ffff80077b7dbf00 x2 : 0000000000000000
[1146536.518928] x1 : ffff80077b7d4200 x0 : ffff80077b7d4200
[1146536.519507] Call trace:
[1146536.519807] cifs_reconnect+0x214/0x538 [cifs]
[1146536.520312] cifs_readv_from_socket+0x158/0x250 [cifs]
[1146536.520876] cifs_read_from_socket+0x4c/0x60 [cifs]
[1146536.521425] cifs_demultiplex_thread+0xd0/0x828 [cifs]
[1146536.521991] kthread+0x134/0x138
[1146536.523049] ret_from_fork+0x10/0x18
[1146536.524044] Code: f9400022 540001c0 f9400423 aa0103e0 (f9000443)
[1146536.525246] SMP: stopping secondary CPUs
[1146536.533373] Starting crashdump kernel...
[1146536.535656] Bye! [
scripts/faddr2line fs/cifs/cifs.ko.debug cifs_reconnect+0x214/0x538
cifs_reconnect+0x214/0x538:
__list_del at /usr/src/debug/kernel-4.19.90/linux-4.19.90-25.31.v2101.ky10.aarch64/./include/linux/list.h:105
(inlined by) __list_del_entry at /usr/src/debug/kernel-4.19.90/linux-4.19.90-25.31.v2101.ky10.aarch64/./include/linux/list.h:120
(inlined by) list_del_init at /usr/src/debug/kernel-4.19.90/linux-4.19.90-25.31.v2101.ky10.aarch64/./include/linux/list.h:159
(inlined by) cifs_reconnect at /usr/src/debug/kernel-4.19.90/linux-4.19.90-25.31.v2101.ky10.aarch64/fs/cifs/connect.c:456
crash> dis -lx cifs_reconnect
...
/usr/src/debug/kernel-4.19.90/linux-4.19.90-25.31.v2101.ky10.aarch64/fs/cifs/connect.c: 457 # 因为list_del_init是内联函数
0xffff000003050748 <cifs_reconnect+0x210>: mov x0, x1
/usr/src/debug/kernel-4.19.90/linux-4.19.90-25.31.v2101.ky10.aarch64/./include/linux/list.h: 105
0xffff00000305074c <cifs_reconnect+0x214>: str x3, [x2,#8]
crash> struct list_head -o
struct list_head {
0] struct list_head *next;
[8] struct list_head *prev;
[
}SIZE: 16
panic发生在cifs_reconnect -> list_del_init -> __list_del_entry -> __list_del
中的next->prev = prev;
,next == NULL
,解引用prev
(偏移量为8)时发生空指针解引用。
修复补丁abe57073d08c CIFS: Fix retry mid list corruption on reconnects
当smb server重启或网络故障时,在cifs_reconnect
中将pending_mid_q
链表中的struct mid_q_entry
移到retry_list
链表(这时有GlobalMid_Lock
锁保护),然后在没有锁保护的情况下从retry_list
链表中删除,在从retry_list
链表中删除的过程中有可能执行到cifs_delete_mid
中将struct mid_q_entry
释放,这时就会发生use-after-free。
kthread
cifs_demultiplex_thread
cifs_read_from_socket
cifs_readv_from_socketif (length <= 0) // 当server服务重启时
cifs_reconnect
server->tcpStatus = CifsNeedReconnectif (server->tcpStatus == CifsNeedReconnect) // 条件满足
cifs_reconnect// 在GlobalMid_Lock的保护下将pending_mid_q链表中的mid移到retry_list链表
list_move(&mid_entry->qhead, &retry_list)// 从retry_list链表中删除,无任何锁保护
list_del_init
cifs_send_recv
compound_send_recv// ses->server->ops->setup_request
smb2_setup_request
smb2_get_mid_entry
*mid = smb2_mid_entry_alloc// 加到pending_mid_q链表中,有GlobalMid_Lock保护
list_add_tail
wait_for_response// 等待 dequeue_mid 中 mid_state 改变,同时从pending_mid_q链表中删除
wait_event_freezekillable_unsafe(server->response_q, midQ->mid_state != MID_REQUEST_SUBMITTED)
out:
cifs_delete_mid// 在GlobalMid_Lock的保护下从链表中删除, 正常流程下不在任何一个链表中
list_del_init
kthread
cifs_demultiplex_thread
standard_receive3
cifs_handle_standard
handle_mid
dequeue_mid// 在GlobalMid_Lock的保护下从链表中删除(可能是pending_mid_q链表也可能是retry_list链表)
list_del_init(&mid->qhead)
以下方法只能保证执行到cifs_reconnect -> list_del_init
,暂时还没有构造出use-after-free的时序。
test.sh
文件:
while true
do
cat /mnt/file > /dev/null # /mnt是挂载点
echo 3 > /proc/sys/vm/drop_caches
done
bash test.sh &
# 如果没进入cifs_reconnect -> list_del_init,等test.sh脚本中cat命令执行成功后再继续重启smbd服务
systemctl restart smbd.service