Age | Commit message (Collapse) | Author | Files | Lines |
|
commit 851e99ebeec3f4a672bb5010cf1ece095acee447 upstream.
Al Viro brought it to my attention that the dentries may not be filled
when the parse_options() is called, causing the call to set_gid() to
possibly crash. It should only be called if parse_options() succeeds
totally anyway.
He suggested the logical place to do the update is in apply_options().
Link: https://lore.kernel.org/all/20220225165219.737025658@goodmis.org/
Link: https://lkml.kernel.org/r/20220225153426.1c4cab6b@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 48b27b6b5191 ("tracefs: Set all files to the same group ownership as the mount option")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 84ec758fb2daa236026506868c8796b0500c047d ]
When configfs_register_subsystem() or configfs_unregister_subsystem()
is executing link_group() or unlink_group(),
it is possible that two processes add or delete list concurrently.
Some unfortunate interleavings of them can cause kernel panic.
One of cases is:
A --> B --> C --> D
A <-- B <-- C <-- D
delete list_head *B | delete list_head *C
--------------------------------|-----------------------------------
configfs_unregister_subsystem | configfs_unregister_subsystem
unlink_group | unlink_group
unlink_obj | unlink_obj
list_del_init | list_del_init
__list_del_entry | __list_del_entry
__list_del | __list_del
// next == C |
next->prev = prev |
| next->prev = prev
prev->next = next |
| // prev == B
| prev->next = next
Fix this by adding mutex when calling link_group() or unlink_group(),
but parent configfs_subsystem is NULL when config_item is root.
So I create a mutex configfs_subsystem_mutex.
Fixes: 7063fbf22611 ("[PATCH] configfs: User-driven configuration filesystem")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f240762f88b4b1b58561939ffd44837759756477 upstream.
Looping ~65535 times doing kmalloc() calls can trigger soft lockups,
especially with DEBUG features (like KASAN).
[ 253.536212] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [b219417889:12575]
[ 253.544433] Modules linked in: vfat fat i2c_mux_pca954x i2c_mux spidev cdc_acm xhci_pci xhci_hcd sha3_generic gq(O)
[ 253.544451] CPU: 64 PID: 12575 Comm: b219417889 Tainted: G S O 5.17.0-smp-DEV #801
[ 253.544457] RIP: 0010:kernel_text_address (./include/asm-generic/sections.h:192 ./include/linux/kallsyms.h:29 kernel/extable.c:67 kernel/extable.c:98)
[ 253.544464] Code: 0f 93 c0 48 c7 c1 e0 63 d7 a4 48 39 cb 0f 92 c1 20 c1 0f b6 c1 5b 5d c3 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 89 fb <48> c7 c0 00 00 80 a0 41 be 01 00 00 00 48 39 c7 72 0c 48 c7 c0 40
[ 253.544468] RSP: 0018:ffff8882d8baf4c0 EFLAGS: 00000246
[ 253.544471] RAX: 1ffff1105b175e00 RBX: ffffffffa13ef09a RCX: 00000000a13ef001
[ 253.544474] RDX: ffffffffa13ef09a RSI: ffff8882d8baf558 RDI: ffffffffa13ef09a
[ 253.544476] RBP: ffff8882d8baf4d8 R08: ffff8882d8baf5e0 R09: 0000000000000004
[ 253.544479] R10: ffff8882d8baf5e8 R11: ffffffffa0d59a50 R12: ffff8882eab20380
[ 253.544481] R13: ffffffffa0d59a50 R14: dffffc0000000000 R15: 1ffff1105b175eb0
[ 253.544483] FS: 00000000016d3380(0000) GS:ffff88af48c00000(0000) knlGS:0000000000000000
[ 253.544486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 253.544488] CR2: 00000000004af0f0 CR3: 00000002eabfa004 CR4: 00000000003706e0
[ 253.544491] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 253.544492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 253.544494] Call Trace:
[ 253.544496] <TASK>
[ 253.544498] ? io_queue_sqe (fs/io_uring.c:7143)
[ 253.544505] __kernel_text_address (kernel/extable.c:78)
[ 253.544508] unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19)
[ 253.544514] arch_stack_walk (arch/x86/kernel/stacktrace.c:27)
[ 253.544517] ? io_queue_sqe (fs/io_uring.c:7143)
[ 253.544521] stack_trace_save (kernel/stacktrace.c:123)
[ 253.544527] ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515)
[ 253.544531] ? ____kasan_kmalloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:515)
[ 253.544533] ? __kasan_kmalloc (mm/kasan/common.c:524)
[ 253.544535] ? kmem_cache_alloc_trace (./include/linux/kasan.h:270 mm/slab.c:3567)
[ 253.544541] ? io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828)
[ 253.544544] ? __io_queue_sqe (fs/io_uring.c:?)
[ 253.544551] __kasan_kmalloc (mm/kasan/common.c:524)
[ 253.544553] kmem_cache_alloc_trace (./include/linux/kasan.h:270 mm/slab.c:3567)
[ 253.544556] ? io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828)
[ 253.544560] io_issue_sqe (fs/io_uring.c:4556 fs/io_uring.c:4589 fs/io_uring.c:6828)
[ 253.544564] ? __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
[ 253.544567] ? __kasan_slab_alloc (mm/kasan/common.c:39 mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
[ 253.544569] ? kmem_cache_alloc_bulk (mm/slab.h:732 mm/slab.c:3546)
[ 253.544573] ? __io_alloc_req_refill (fs/io_uring.c:2078)
[ 253.544578] ? io_submit_sqes (fs/io_uring.c:7441)
[ 253.544581] ? __se_sys_io_uring_enter (fs/io_uring.c:10154 fs/io_uring.c:10096)
[ 253.544584] ? __x64_sys_io_uring_enter (fs/io_uring.c:10096)
[ 253.544587] ? do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 253.544590] ? entry_SYSCALL_64_after_hwframe (??:?)
[ 253.544596] __io_queue_sqe (fs/io_uring.c:?)
[ 253.544600] io_queue_sqe (fs/io_uring.c:7143)
[ 253.544603] io_submit_sqe (fs/io_uring.c:?)
[ 253.544608] io_submit_sqes (fs/io_uring.c:?)
[ 253.544612] __se_sys_io_uring_enter (fs/io_uring.c:10154 fs/io_uring.c:10096)
[ 253.544616] __x64_sys_io_uring_enter (fs/io_uring.c:10096)
[ 253.544619] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 253.544623] entry_SYSCALL_64_after_hwframe (??:?)
Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: io-uring <io-uring@vger.kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220215041003.2394784-1-eric.dumazet@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea1d1ca4025ac6c075709f549f9aa036b5b6597d upstream.
Check item size before accessing the device item to avoid out of bound
access, similar to inode_item check.
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0c982944af27d131d3b74242f3528169f66950ad upstream.
while mounting the crafted image, out-of-bounds access happens:
[350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
[350.429636] index 1048096 is out of range for type 'page *[16]'
[350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1
[350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
[350.429772] Call Trace:
[350.429774] <TASK>
[350.429776] dump_stack_lvl+0x47/0x5c
[350.429780] ubsan_epilogue+0x5/0x50
[350.429786] __ubsan_handle_out_of_bounds+0x66/0x70
[350.429791] btrfs_get_16+0xfd/0x120 [btrfs]
[350.429832] check_leaf+0x754/0x1a40 [btrfs]
[350.429874] ? filemap_read+0x34a/0x390
[350.429878] ? load_balance+0x175/0xfc0
[350.429881] validate_extent_buffer+0x244/0x310 [btrfs]
[350.429911] btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
[350.429935] end_bio_extent_readpage+0x3af/0x850 [btrfs]
[350.429969] ? newidle_balance+0x259/0x480
[350.429972] end_workqueue_fn+0x29/0x40 [btrfs]
[350.429995] btrfs_work_helper+0x71/0x330 [btrfs]
[350.430030] ? __schedule+0x2fb/0xa40
[350.430033] process_one_work+0x1f6/0x400
[350.430035] ? process_one_work+0x400/0x400
[350.430036] worker_thread+0x2d/0x3d0
[350.430037] ? process_one_work+0x400/0x400
[350.430038] kthread+0x165/0x190
[350.430041] ? set_kthread_struct+0x40/0x40
[350.430043] ret_from_fork+0x1f/0x30
[350.430047] </TASK>
[350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
check_leaf() is checking the leaf:
corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912)
leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE
leaf 29396992 flags 0x1(WRITTEN) backref revision 1
fs uuid a62e00e8-e94e-4200-8217-12444de93c2e
chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40
generation 0 transid 0 size 0 nbytes 17592186044416
block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136
sequence 100305 flags 0x2409000(none)
atime 0.0 (1970-01-01 08:00:00)
ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03)
mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00)
otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58)
item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48
dev extent chunk_tree 3
chunk_objectid 256 chunk_offset 12582912 length 8388608
chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
The corrupted leaf of device tree has an inode item. The leaf passed
checksum and others checks in validate_extent_buffer until check_leaf_item().
Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we
are in the device tree. Since the
item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access
is triggered.
The item end vs leaf boundary check has been done before
check_leaf_item(), so fix it by checking item size in check_inode_item()
before access of the inode item in extent buffer.
Other check functions except check_dev_item() in check_leaf_item()
have their item size checks.
The commit for check_dev_item() is followed.
No regression observed during running fstests.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
CC: stable@vger.kernel.org # 5.10+
CC: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 848fdd62399c638e65a1512616acaa5de7d5c5e8 ]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit d19e0183a88306acda07f4a01fedeeffe2a2a06b upstream.
The result of the writeback, whether it is an ENOSPC or an EIO, or
anything else, does not inhibit the NFS client from reporting the
correct file timestamps.
Fixes: 79566ef018f5 ("NFS: Getattr doesn't require data sync semantics")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e0caaf75d443e02e55e146fd75fe2efc8aed5540 upstream.
Commit ac795161c936 (NFSv4: Handle case where the lookup of a directory
fails) [1], part of Linux since 5.17-rc2, introduced a regression, where
a symbolic link on an NFS mount to a directory on another NFS does not
resolve(?) the first time it is accessed:
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Fixes: ac795161c936 ("NFSv4: Handle case where the lookup of a directory fails")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dd5532a4994bfda0386eb2286ec00758cee08444 ]
Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
call, which means that quotacalls like Q_SYNC never see the error. This
doesn't seem right, so fix that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2719c7160dcfaae1f73a1c0c210ad3281c19022e ]
If we fail to synchronize the filesystem while preparing to freeze the
fs, abort the freeze.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2e7be9db125a0bf940c5d65eb5c40d8700f738b5 upstream.
Currently if we get IO error while doing send then we abort without
logging information about which file caused issue. So log it to help
with debugging.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 24d7275ce2791829953ed4e72f68277ceb2571c6 upstream.
The syzbot reported the below BUG:
kernel BUG at include/linux/page-flags.h:785!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
Call Trace:
page_mapcount include/linux/mm.h:837 [inline]
smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466
smaps_pte_entry fs/proc/task_mmu.c:538 [inline]
smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601
walk_pmd_range mm/pagewalk.c:128 [inline]
walk_pud_range mm/pagewalk.c:205 [inline]
walk_p4d_range mm/pagewalk.c:240 [inline]
walk_pgd_range mm/pagewalk.c:277 [inline]
__walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379
walk_page_vma+0x277/0x350 mm/pagewalk.c:530
smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768
smap_gather_stats fs/proc/task_mmu.c:741 [inline]
show_smap+0xc6/0x440 fs/proc/task_mmu.c:822
seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272
seq_read+0x3e0/0x5b0 fs/seq_file.c:162
vfs_read+0x1b5/0x600 fs/read_write.c:479
ksys_read+0x12d/0x250 fs/read_write.c:619
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The reproducer was trying to read /proc/$PID/smaps when calling
MADV_FREE at the mean time. MADV_FREE may split THPs if it is called
for partial THP. It may trigger the below race:
CPU A CPU B
----- -----
smaps walk: MADV_FREE:
page_mapcount()
PageCompound()
split_huge_page()
page = compound_head(page)
PageDoubleMap(page)
When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.
This could be fixed by elevated refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems overkilling because the race just could happen when PMD is
split so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1 as
Kirill pointed out.
Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount(). Don't skip getting mapcount
for device private entries since they do track references with mapcount.
Pagemap also has the similar issue although it was not reported. Fixed
it as well.
[shy828301@gmail.com: v4]
Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()]
Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com
Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream.
Commit 054aa8d439b9 ("fget: check that the fd still exists after getting
a ref to it") fixed a race with getting a reference to a file just as it
was being closed. It was a fairly minimal patch, and I didn't think
re-checking the file pointer lookup would be a measurable overhead,
since it was all right there and cached.
But I was wrong, as pointed out by the kernel test robot.
The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
quite noticeably. Admittedly it seems to be a very artificial test:
doing "poll()" system calls on regular files in a very tight loop in
multiple threads.
That means that basically all the time is spent just looking up file
descriptors without ever doing anything useful with them (not that doing
'poll()' on a regular file is useful to begin with). And as a result it
shows the extra "re-check fd" cost as a sore thumb.
Happily, the regression is fixable by just writing the code to loook up
the fd to be better and clearer. There's still a cost to verify the
file pointer, but now it's basically in the noise even for that
benchmark that does nothing else - and the code is more understandable
and has better comments too.
[ Side note: this patch is also a classic case of one that looks very
messy with the default greedy Myers diff - it's much more legible with
either the patience of histogram diff algorithm ]
Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Carel Si <beibei.si@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a8d54baba7c65db2d3278873def61f8d3753d766 ]
An fs_location attribute returns a string that can be ipv4, ipv6,
or DNS name. An ip location can have a port appended to it and if
no port is present a default port needs to be set. If rpc_pton()
fails to parse, try calling rpc_uaddr2socaddr() that can convert
an universal address.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f5b27cc6761e27ee6387a24df1a99ca77b360fea ]
Make nfs_parse_server_name available outside of nfs4namespace.c.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 90e12a3191040bd3854d3e236c35921e4e92a044 ]
Remove the check for the zero length fs_locations reply in the
xdr decoding, and instead check for that in the migration code.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b05bf5c63b326ce1da84ef42498d8e0e292e694c ]
When decode_devicenotify_args() exits with no entries, we need to
ensure that the struct cb_devicenotifyargs is initialised to
{ 0, NULL } in order to avoid problems in
nfs4_callback_devicenotify().
Reported-by: <rtm@csail.mit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fbd2057e5329d3502a27491190237b6be52a1cb6 ]
kstrdup() returns NULL when some internal memory errors happen, it is
better to check the return value of it so to catch the memory error in
time.
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2c52c8376db7160a1dd8a681c61c9258405ef143 ]
When the bitmask of the attributes doesn't include the security label,
don't bother printing it. Since the label might not be null terminated,
adjust the printing format accordingly.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b5e7b59c3480f355910f9d2c6ece5857922a5e54 ]
Currently the nfs_access_get_cached family of functions report a
'struct nfs_access_entry' as the result, with both .mask and .cred set.
However the .cred is never used. This is probably good and there is no
guarantee that it won't be freed before use.
Change to only report the 'mask' - as this is all that is used or needed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6a4d333d540041d244b2fca29b8417bfde20af81 upstream.
NFSv3 and NFSv4 use u64 offset values on the wire. Record these values
verbatim without the implicit type case to loff_t.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6260d9a56ab352b54891ec66ab0eced57d55abc6 upstream.
Ensure that a client cannot specify a WRITE range that falls in a
byte range outside what the kernel's internal types (such as loff_t,
which is signed) can represent. The kiocb iterators, invoked in
nfsd_vfs_write(), should properly limit write operations to within
the underlying file system's s_maxbytes.
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 468d126dab45718feeb728319be20bd869a5eaa7 upstream.
For some long forgotten reason, the nfs_client cl_flags field is
initialised in nfs_get_client() instead of being initialised at
allocation time. This quirk was harmless until we moved the call to
nfs_create_rpc_client().
Fixes: dd99e9f98fbf ("NFSv4: Initialise connection to the server in nfs4_alloc_client()")
Cc: stable@vger.kernel.org # 4.8.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8fca8a2b0a822f7936130af7299d2fd7f0a66714 upstream.
should not use fast commit log data directly, add le32_to_cpu().
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 0b5b5a62b945 ("ext4: use ext4_ext_remove_space() for fast commit replay delete range")
Cc: stable@kernel.org
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220126063146.2302-1-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cdce59a1549190b66f8e3fe465c2b2f714b98a94 upstream.
Current code does not fully takes care of krealloc() error case, which
could lead to silent memory corruption or a kernel bug. This patch
fixes that.
Also it cleans up some duplicated error handling logic from various
functions in fast_commit.c file.
Reported-by: luo penghao <luo.penghao@zte.com.cn>
Suggested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/62e8b6a1cce9359682051deb736a3c0953c9d1e9.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 897026aaa73eb2517dfea8d147f20ddb0b813044 upstream.
While running "./check -I 200 generic/475" it sometimes gives below
kernel BUG(). Ideally we should not call ext4_write_inline_data() if
ext4_create_inline_data() has failed.
<log snip>
[73131.453234] kernel BUG at fs/ext4/inline.c:223!
<code snip>
212 static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
213 void *buffer, loff_t pos, unsigned int len)
214 {
<...>
223 BUG_ON(!EXT4_I(inode)->i_inline_off);
224 BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
This patch handles the error and prints out a emergency msg saying potential
data loss for the given inode (since we couldn't restore the original
inline_data due to some previous error).
[ 9571.070313] EXT4-fs (dm-0): error restoring inline_data for inode -- potential data loss! (inode 1703982, error -30)
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/9f4cd7dfd54fa58ff27270881823d94ddf78dd07.1642416995.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 31a074a0c62dc0d2bfb9b543142db4fe27f9e5eb upstream.
For now in ext4_mb_new_blocks_simple, if we found a block which
should be excluded then will switch to next group, this may
probably cause 'group' run out of range.
Change to check next block in the same group when get a block should
be excluded. Also change the search range to EXT4_CLUSTERS_PER_GROUP
and add error checking.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20220110035141.1980-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 599ea31d13617c5484c40cdf50d88301dc351cfc upstream.
During fast commit replay procedure, we clear inode blocks bitmap in
ext4_ext_clear_bb(), this may cause ext4_mb_new_blocks_simple() allocate
blocks still in use.
Make ext4_fc_record_regions() also record physical disk regions used by
inodes during replay procedure. Then ext4_mb_new_blocks_simple() can
excludes these blocks in use.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Link: https://lore.kernel.org/r/20220110035141.1980-2-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ab451ea952fe9d7afefae55ddb28943a148247fe upstream.
From RFC 7530 Section 16.34.5:
o The server has not recorded an unconfirmed { v, x, c, *, * } and
has recorded a confirmed { v, x, c, *, s }. If the principals of
the record and of SETCLIENTID_CONFIRM do not match, the server
returns NFS4ERR_CLID_INUSE without removing any relevant leased
client state, and without changing recorded callback and
callback_ident values for client { x }.
The current code intends to do what the spec describes above but
it forgot to set 'old' to NULL resulting to the confirmed client
to be expired.
Fixes: 2b63482185e6 ("nfsd: fix clid_inuse on mount with security change")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit bb902cb47cf93b33cd92b3b7a4019330a03ef57f upstream.
This patch adds accounting flags to fs_context and legacy_fs_context
allocation sites so that kernel could correctly charge these objects.
We have written a PoC to demonstrate the effect of the missing-charging
bugs. The PoC takes around 1,200MB unaccounted memory, while it is
charged for only 362MB memory usage. We evaluate the PoC on QEMU x86_64
v5.2.90 + Linux kernel v5.10.19 + Debian buster. All the limitations
including ulimits and sysctl variables are set as default. Specifically,
the hard NOFILE limit and nr_open in sysctl are both 1,048,576.
/*------------------------- POC code ----------------------------*/
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/file.h>
#include <time.h>
#include <sys/wait.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
#include <sched.h>
#include <fcntl.h>
#include <linux/mount.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)
#define STACK_SIZE (8 * 1024)
#ifndef __NR_fsopen
#define __NR_fsopen 430
#endif
static inline int fsopen(const char *fs_name, unsigned int flags)
{
return syscall(__NR_fsopen, fs_name, flags);
}
static char thread_stack[512][STACK_SIZE];
int thread_fn(void* arg)
{
for (int i = 0; i< 800000; ++i) {
int fsfd = fsopen("nfs", FSOPEN_CLOEXEC);
if (fsfd == -1) {
errExit("fsopen");
}
}
while(1);
return 0;
}
int main(int argc, char *argv[]) {
int thread_pid;
for (int i = 0; i < 1; ++i) {
thread_pid = clone(thread_fn, thread_stack[i] + STACK_SIZE, \
SIGCHLD, NULL);
}
while(1);
return 0;
}
/*-------------------------- end --------------------------------*/
Link: https://lkml.kernel.org/r/1626517201-24086-1-git-send-email-nglaive@gmail.com
Signed-off-by: Yutian Yang <nglaive@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <shenwenbo@zju.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e804861bd4e69cc5fe1053eedcb024982dde8e48 upstream.
Quota disable ioctl starts a transaction before waiting for the qgroup
rescan worker completes. However, this wait can be infinite and results
in deadlock because of circular dependency among the quota disable
ioctl, the qgroup rescan worker and the other task with transaction such
as block group relocation task.
The deadlock happens with the steps following:
1) Task A calls ioctl to disable quota. It starts a transaction and
waits for qgroup rescan worker completes.
2) Task B such as block group relocation task starts a transaction and
joins to the transaction that task A started. Then task B commits to
the transaction. In this commit, task B waits for a commit by task A.
3) Task C as the qgroup rescan worker starts its job and starts a
transaction. In this transaction start, task C waits for completion
of the transaction that task A started and task B committed.
This deadlock was found with fstests test case btrfs/115 and a zoned
null_blk device. The test case enables and disables quota, and the
block group reclaim was triggered during the quota disable by chance.
The deadlock was also observed by running quota enable and disable in
parallel with 'btrfs balance' command on regular null_blk devices.
An example report of the deadlock:
[372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
[372.479944] Not tainted 5.16.0-rc8 #7
[372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[372.493898] task:kworker/u16:6 state:D stack: 0 pid: 103 ppid: 2 flags:0x00004000
[372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
[372.510782] Call Trace:
[372.514092] <TASK>
[372.521684] __schedule+0xb56/0x4850
[372.530104] ? io_schedule_timeout+0x190/0x190
[372.538842] ? lockdep_hardirqs_on+0x7e/0x100
[372.547092] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[372.555591] schedule+0xe0/0x270
[372.561894] btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
[372.570506] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
[372.578875] ? free_unref_page+0x3f2/0x650
[372.585484] ? finish_wait+0x270/0x270
[372.591594] ? release_extent_buffer+0x224/0x420 [btrfs]
[372.599264] btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
[372.607157] ? lock_release+0x3a9/0x6d0
[372.613054] ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
[372.620960] ? do_raw_spin_lock+0x11e/0x250
[372.627137] ? rwlock_bug.part.0+0x90/0x90
[372.633215] ? lock_is_held_type+0xe4/0x140
[372.639404] btrfs_work_helper+0x1ae/0xa90 [btrfs]
[372.646268] process_one_work+0x7e9/0x1320
[372.652321] ? lock_release+0x6d0/0x6d0
[372.658081] ? pwq_dec_nr_in_flight+0x230/0x230
[372.664513] ? rwlock_bug.part.0+0x90/0x90
[372.670529] worker_thread+0x59e/0xf90
[372.676172] ? process_one_work+0x1320/0x1320
[372.682440] kthread+0x3b9/0x490
[372.687550] ? _raw_spin_unlock_irq+0x24/0x50
[372.693811] ? set_kthread_struct+0x100/0x100
[372.700052] ret_from_fork+0x22/0x30
[372.705517] </TASK>
[372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
[372.729827] Not tainted 5.16.0-rc8 #7
[372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[372.767106] task:btrfs-transacti state:D stack: 0 pid: 2347 ppid: 2 flags:0x00004000
[372.787776] Call Trace:
[372.801652] <TASK>
[372.812961] __schedule+0xb56/0x4850
[372.830011] ? io_schedule_timeout+0x190/0x190
[372.852547] ? lockdep_hardirqs_on+0x7e/0x100
[372.871761] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[372.886792] schedule+0xe0/0x270
[372.901685] wait_current_trans+0x22c/0x310 [btrfs]
[372.919743] ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
[372.938923] ? finish_wait+0x270/0x270
[372.959085] ? join_transaction+0xc75/0xe30 [btrfs]
[372.977706] start_transaction+0x938/0x10a0 [btrfs]
[372.997168] transaction_kthread+0x19d/0x3c0 [btrfs]
[373.013021] ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs]
[373.031678] kthread+0x3b9/0x490
[373.047420] ? _raw_spin_unlock_irq+0x24/0x50
[373.064645] ? set_kthread_struct+0x100/0x100
[373.078571] ret_from_fork+0x22/0x30
[373.091197] </TASK>
[373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds.
[373.114147] Not tainted 5.16.0-rc8 #7
[373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373.130393] task:btrfs state:D stack: 0 pid: 3145 ppid: 3141 flags:0x00004000
[373.140998] Call Trace:
[373.145501] <TASK>
[373.149654] __schedule+0xb56/0x4850
[373.155306] ? io_schedule_timeout+0x190/0x190
[373.161965] ? lockdep_hardirqs_on+0x7e/0x100
[373.168469] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[373.175468] schedule+0xe0/0x270
[373.180814] wait_for_commit+0x104/0x150 [btrfs]
[373.187643] ? test_and_set_bit+0x20/0x20 [btrfs]
[373.194772] ? kmem_cache_free+0x124/0x550
[373.201191] ? btrfs_put_transaction+0x69/0x3d0 [btrfs]
[373.208738] ? finish_wait+0x270/0x270
[373.214704] ? __btrfs_end_transaction+0x347/0x7b0 [btrfs]
[373.222342] btrfs_commit_transaction+0x44d/0x2610 [btrfs]
[373.230233] ? join_transaction+0x255/0xe30 [btrfs]
[373.237334] ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs]
[373.245251] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
[373.253296] relocate_block_group+0x105/0xc20 [btrfs]
[373.260533] ? mutex_lock_io_nested+0x1270/0x1270
[373.267516] ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs]
[373.275155] ? merge_reloc_roots+0x710/0x710 [btrfs]
[373.283602] ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs]
[373.291934] ? kmem_cache_free+0x124/0x550
[373.298180] btrfs_relocate_block_group+0x35c/0x930 [btrfs]
[373.306047] btrfs_relocate_chunk+0x85/0x210 [btrfs]
[373.313229] btrfs_balance+0x12f4/0x2d20 [btrfs]
[373.320227] ? lock_release+0x3a9/0x6d0
[373.326206] ? btrfs_relocate_chunk+0x210/0x210 [btrfs]
[373.333591] ? lock_is_held_type+0xe4/0x140
[373.340031] ? rcu_read_lock_sched_held+0x3f/0x70
[373.346910] btrfs_ioctl_balance+0x548/0x700 [btrfs]
[373.354207] btrfs_ioctl+0x7f2/0x71b0 [btrfs]
[373.360774] ? lockdep_hardirqs_on_prepare+0x410/0x410
[373.367957] ? lockdep_hardirqs_on_prepare+0x410/0x410
[373.375327] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
[373.383841] ? find_held_lock+0x2c/0x110
[373.389993] ? lock_release+0x3a9/0x6d0
[373.395828] ? mntput_no_expire+0xf7/0xad0
[373.402083] ? lock_is_held_type+0xe4/0x140
[373.408249] ? vfs_fileattr_set+0x9f0/0x9f0
[373.414486] ? selinux_file_ioctl+0x349/0x4e0
[373.420938] ? trace_raw_output_lock+0xb4/0xe0
[373.427442] ? selinux_inode_getsecctx+0x80/0x80
[373.434224] ? lockdep_hardirqs_on+0x7e/0x100
[373.440660] ? force_qs_rnp+0x2a0/0x6b0
[373.446534] ? lock_is_held_type+0x9b/0x140
[373.452763] ? __blkcg_punt_bio_submit+0x1b0/0x1b0
[373.459732] ? security_file_ioctl+0x50/0x90
[373.466089] __x64_sys_ioctl+0x127/0x190
[373.472022] do_syscall_64+0x3b/0x90
[373.477513] entry_SYSCALL_64_after_hwframe+0x44/0xae
[373.484823] RIP: 0033:0x7f8f4af7e2bb
[373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb
[373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003
[373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0
[373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001
[373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002
[373.546506] </TASK>
[373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds.
[373.559383] Not tainted 5.16.0-rc8 #7
[373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[373.575748] task:btrfs state:D stack: 0 pid: 3146 ppid: 2168 flags:0x00000000
[373.586314] Call Trace:
[373.590846] <TASK>
[373.595121] __schedule+0xb56/0x4850
[373.600901] ? __lock_acquire+0x23db/0x5030
[373.607176] ? io_schedule_timeout+0x190/0x190
[373.613954] schedule+0xe0/0x270
[373.619157] schedule_timeout+0x168/0x220
[373.625170] ? usleep_range_state+0x150/0x150
[373.631653] ? mark_held_locks+0x9e/0xe0
[373.637767] ? do_raw_spin_lock+0x11e/0x250
[373.643993] ? lockdep_hardirqs_on_prepare+0x17b/0x410
[373.651267] ? _raw_spin_unlock_irq+0x24/0x50
[373.657677] ? lockdep_hardirqs_on+0x7e/0x100
[373.664103] wait_for_completion+0x163/0x250
[373.670437] ? bit_wait_timeout+0x160/0x160
[373.676585] btrfs_quota_disable+0x176/0x9a0 [btrfs]
[373.683979] ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs]
[373.691340] ? down_write+0xd0/0x130
[373.696880] ? down_write_killable+0x150/0x150
[373.703352] btrfs_ioctl+0x3945/0x71b0 [btrfs]
[373.710061] ? find_held_lock+0x2c/0x110
[373.716192] ? lock_release+0x3a9/0x6d0
[373.722047] ? __handle_mm_fault+0x23cd/0x3050
[373.728486] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
[373.737032] ? set_pte+0x6a/0x90
[373.742271] ? do_raw_spin_unlock+0x55/0x1f0
[373.748506] ? lock_is_held_type+0xe4/0x140
[373.754792] ? vfs_fileattr_set+0x9f0/0x9f0
[373.761083] ? selinux_file_ioctl+0x349/0x4e0
[373.767521] ? selinux_inode_getsecctx+0x80/0x80
[373.774247] ? __up_read+0x182/0x6e0
[373.780026] ? count_memcg_events.constprop.0+0x46/0x60
[373.787281] ? up_write+0x460/0x460
[373.792932] ? security_file_ioctl+0x50/0x90
[373.799232] __x64_sys_ioctl+0x127/0x190
[373.805237] do_syscall_64+0x3b/0x90
[373.810947] entry_SYSCALL_64_after_hwframe+0x44/0xae
[373.818102] RIP: 0033:0x7f1383ea02bb
[373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb
[373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003
[373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
[373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a
[373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000
[373.879838] </TASK>
[373.884018]
Showing all locks held in the system:
[373.894250] 3 locks held by kworker/4:1/58:
[373.900356] 1 lock held by khungtaskd/63:
[373.906333] #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260
[373.917307] 3 locks held by kworker/u16:6/103:
[373.923938] #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320
[373.936555] #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320
[373.951109] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs]
[373.964027] 2 locks held by less/1803:
[373.969982] #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80
[373.981295] #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060
[373.992969] 1 lock held by btrfs-transacti/2347:
[373.999893] #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs]
[374.015872] 3 locks held by btrfs/3145:
[374.022298] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs]
[374.034456] #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs]
[374.047646] #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs]
[374.063295] 4 locks held by btrfs/3146:
[374.069647] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs]
[374.081601] #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs]
[374.094283] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs]
[374.106885] #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs]
[374.126780] =============================================
To avoid the deadlock, wait for the qgroup rescan worker to complete
before starting the transaction for the quota disable ioctl. Clear
BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to
request the worker to complete. On transaction start failure, set the
BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag
changes can be done safely since the function btrfs_quota_disable is not
called concurrently because of fs_info->subvol_sem.
Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid
another qgroup rescan worker to start after the previous qgroup worker
completed.
CC: stable@vger.kernel.org # 5.4+
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ee12595147ac1fbfb5bcb23837e26dd58d94b15d upstream.
This code calls fd_install() which gives the userspace access to the fd.
Then if copy_info_records_to_user() fails it calls put_unused_fd(fd) but
that will not release it and leads to a stale entry in the file
descriptor table.
Generally you can't trust the fd after a call to fd_install(). The fix
is to delay the fd_install() until everything else has succeeded.
Fortunately it requires CAP_SYS_ADMIN to reach this code so the security
impact is less.
Fixes: f644bc449b37 ("fanotify: fix copy_event_to_user() fid error clean up")
Link: https://lore.kernel.org/r/20220128195656.GA26981@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a37d9a17f099072fe4d3a9048b0321978707a918 upstream.
Apparently, there are some applications |