文件系统变成只读,系统重启后恢复。
环境信息:
mount | grep btrfs
/dev/mapper/datavg-dockerlv on /var/lib/docker type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/)
/dev/mapper/datavg-dockerlv on /var/lib/docker/btrfs type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/btrfs)
uname -a
Linux pjdhcpaasnap8pn 4.19.90-17.ky10.x86_64 #1 SMP Sun Jun 28 15:41:49 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
内核日志:
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328533] ------------[ cut here ]------------
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328594] WARNING: CPU: 42 PID: 460094 at fs/btrfs/extent-tree.c:6805 __btrfs_free_extent.isra.71+0x65c/0xbd0 [btrfs]
...
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328647] CPU: 42 PID: 460094 Comm: httpWorkerThrea Kdump: loaded Tainted: G W 4.19.90-17.ky10.x86_64 #1
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328648] Hardware name: Sugon H620-G30/65N32-US, BIOS 0SSSX245 06/07/2020
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328656] RIP: 0010:__btrfs_free_extent.isra.71+0x65c/0xbd0 [btrfs]
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328659] Code: 44 89 e9 ba c8 1a 00 00 48 c7 c6 a0 aa 06 c1 e8 fc ba 09 00 e9 fe fa ff ff 0f 0b 44 89 ee 48 c7 c7 68 7c 07 c1 e8 14 e9 2d e5 <0f> 0b e9 ce fa ff ff 41 83 f8 29 0f 86 d6 01 00 00 48 8b 7c 24 08
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328660] RSP: 0018:ffffae7f9f6e78c8 EFLAGS: 00010286
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328661] RAX: 0000000000000000 RBX: 000000000000a0c9 RCX: 0000000000000006
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328662] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff90a4ffa96850
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328663] RBP: 00000470931db000 R08: 000000000210773e R09: 0000000000000004
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328664] R10: 00000000ffffffe4 R11: 0000000000000001 R12: ffff909b302e2a80
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328664] R13: 00000000ffffffe4 R14: 0000000000000000 R15: 000000000003fc62
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328666] FS: 00007f4aedae2700(0000) GS:ffff90a4ffa80000(0000) knlGS:0000000000000000
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328667] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328667] CR2: 00007efce4073ff0 CR3: 0000006413d3c000 CR4: 00000000003406e0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328668] Call Trace:
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328682] __btrfs_run_delayed_refs+0x4f8/0x1150 [btrfs]
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328707] btrfs_run_delayed_refs+0xe7/0x1b0 [btrfs]
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328717] btrfs_truncate_inode_items+0xa56/0xeb0 [btrfs]
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328728] btrfs_evict_inode+0x496/0x4f0 [btrfs]
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328733] evict+0xd2/0x1a0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328736] __dentry_kill+0xdd/0x180
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328738] dentry_kill+0x4d/0x260
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328740] dput+0x183/0x200
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328743] __fput+0x118/0x1f0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328747] task_work_run+0x8a/0xb0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328750] do_exit+0x2ec/0xbf0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328753] do_group_exit+0x3a/0xa0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328755] get_signal+0x13f/0x7a0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328758] do_signal+0x36/0x610
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328768] exit_to_usermode_loop+0x71/0xe0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328770] do_syscall_64+0x1a3/0x1d0
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328773] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328779] RIP: 0033:0x7f4b12fae10c
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328785] Code: Bad RIP value.
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328785] RSP: 002b:00007f4aedae1550 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328786] RAX: fffffffffffffe00 RBX: 00007f4b0d1914e0 RCX: 00007f4b12fae10c
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328787] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f4b0d19150c
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328787] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000794039ff8
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328788] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4b0d1914b8
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328789] R13: 0000000000008223 R14: 0000000000000000 R15: 00007f4b0d19150c
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328790] ---[ end trace 74a7f2461a0f9473 ]---
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.328961] BTRFS: error (device dm-9) in __btrfs_free_extent:6805: errno=-28 No space left
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.439057] BTRFS info (device dm-9): forced readonly
Dec 8 15:25:17 pjdhcpaasnap8pn kernel: [13439425.439067] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2935: errno=-28 No space left
...
Dec 8 15:25:28 pjdhcpaasnap8pn kernel: [13439435.962898] BTRFS warning (device dm-9): Skipping commit of aborted transaction.
Dec 8 15:25:28 pjdhcpaasnap8pn kernel: [13439435.962902] BTRFS: error (device dm-9) in cleanup_transaction:1860: errno=-28 No space left
git log fs/btrfs/
中有一个补丁日志有点像,0d977e0eba23 btrfs: reset replace target device to allocation state on close。
lookup_extent_backref
函数中可能返回-ENOSPC
错误的地方非常多,不太可能把每一种可能性都分析出来。
exit_to_usermode_loop
do_signal
get_signal
do_group_exit
do_exit
task_work_run
__fput
__fput
dput
dentry_kill
__dentry_kill
evict
btrfs_evict_inode
btrfs_truncate_inode_items
btrfs_run_delayed_refs
__btrfs_run_delayed_refs
run_one_delayed_ref// 暂时不确定是走到这里还是走到 run_delayed_data_ref
run_delayed_tree_ref
__btrfs_free_extent28
ret = lookup_extent_backref = -ENOSPC = -
btrfs_abort_transaction
run_delayed_data_ref
__btrfs_free_extent
btrfs_abort_transaction
btrfs_abort_transaction
btrfs_abort_transaction
__btrfs_abort_transaction
__btrfs_handle_fs_error"BTRFS: error (device %s) in %s:%d: errno=%d %s (%pV)\n"
pr_crit(if (sb_rdonly(sb)) // 条件不满足(不是只读)才会向下执行
// 设置为只读
sb->s_flags |= SB_RDONLY "forced readonly"); btrfs_info(fs_info,
Question about btrfs no space left error and forced readonly
在生产环境下,如果必须要快速恢复环境,且这个环境是可以接受重启系统,就可以尝试手动导出vmcore,vmcore中的信息有时对分析问题很有帮助:
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger