summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2025-10-12f2fs: fix to do sanity check on node footer for non inode dnodeChao Yu5-23/+46
commit c18ecd99e0c707ef8f83cace861cbc3162f4fdf1 upstream. As syzbot reported below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/file.c:1243! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5354 Comm: syz.0.0 Not tainted 6.17.0-rc1-syzkaller-00211-g90d970cade8e #0 PREEMPT(full) RIP: 0010:f2fs_truncate_hole+0x69e/0x6c0 fs/f2fs/file.c:1243 Call Trace: <TASK> f2fs_punch_hole+0x2db/0x330 fs/f2fs/file.c:1306 f2fs_fallocate+0x546/0x990 fs/f2fs/file.c:2018 vfs_fallocate+0x666/0x7e0 fs/open.c:342 ksys_fallocate fs/open.c:366 [inline] __do_sys_fallocate fs/open.c:371 [inline] __se_sys_fallocate fs/open.c:369 [inline] __x64_sys_fallocate+0xc0/0x110 fs/open.c:369 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1e65f8ebe9 w/ a fuzzed image, f2fs may encounter panic due to it detects inconsistent truncation range in direct node in f2fs_truncate_hole(). The root cause is: a non-inode dnode may has the same footer.ino and footer.nid, so the dnode will be parsed as an inode, then ADDRS_PER_PAGE() may return wrong blkaddr count which may be 923 typically, by chance, dn.ofs_in_node is equal to 923, then count can be calculated to 0 in below statement, later it will trigger panic w/ f2fs_bug_on(, count == 0 || ...). count = min(end_offset - dn.ofs_in_node, pg_end - pg_start); This patch introduces a new node_type NODE_TYPE_NON_INODE, then allowing passing the new_type to sanity_check_node_footer in f2fs_get_node_folio() to detect corruption that a non-inode dnode has the same footer.ino and footer.nid. Scripts to reproduce: mkfs.f2fs -f /dev/vdb mount /dev/vdb /mnt/f2fs touch /mnt/f2fs/foo touch /mnt/f2fs/bar dd if=/dev/zero of=/mnt/f2fs/foo bs=1M count=8 umount /mnt/f2fs inject.f2fs --node --mb i_nid --nid 4 --idx 0 --val 5 /dev/vdb mount /dev/vdb /mnt/f2fs xfs_io /mnt/f2fs/foo -c "fpunch 6984k 4k" Cc: stable@kernel.org Reported-by: syzbot+b9c7ffd609c3f09416ab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/68a68e27.050a0220.1a3988.0002.GAE@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-28Merge tag 'mm-hotfixes-stable-2025-09-27-22-35' of ↵Linus Torvalds2-4/+9
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 6 of these fixes are for MM. All singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call() mailmap: add entry for Bence Csókás fs/proc/task_mmu: check p->vec_buf for NULL kmsan: fix out-of-bounds access to shadow memory mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count mm/hugetlb: fix folio is still mapped when deleted
2025-09-26Merge tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-12/+89
Pull smb client fixes from Steve French: - Fix unlink bug - Fix potential out of bounds access in processing compound requests * tag 'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix wrong index reference in smb2_compound_op() smb: client: handle unlink(2) of files open by different clients
2025-09-26Merge tag 'vfs-6.17-rc8.fixes' of ↵Linus Torvalds10-16/+50
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Prevent double unlock in netfs - Fix a NULL pointer dereference in afs_put_server() - Fix a reference leak in netfs * tag 'vfs-6.17-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: fix reference leak afs: Fix potential null pointer dereference in afs_put_server netfs: Prevent duplicate unlocking
2025-09-26smb: client: fix wrong index reference in smb2_compound_op()Sang-Heon Jeon1-1/+1
In smb2_compound_op(), the loop that processes each command's response uses wrong indices when accessing response bufferes. This incorrect indexing leads to improper handling of command results. Also, if incorrectly computed index is greather than or equal to MAX_COMPOUND, it can cause out-of-bounds accesses. Fixes: 3681c74d342d ("smb: client: handle lack of EA support in smb2_query_path_info()") # 6.14 Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-26netfs: fix reference leakMax Kellermann8-14/+47
Commit 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") modified netfs_alloc_request() to initialize the reference counter to 2 instead of 1. The rationale was that the requet's "work" would release the second reference after completion (via netfs_{read,write}_collection_worker()). That works most of the time if all goes well. However, it leaks this additional reference if the request is released before the I/O operation has been submitted: the error code path only decrements the reference counter once and the work item will never be queued because there will never be a completion. This has caused outages of our whole server cluster today because tasks were blocked in netfs_wait_for_outstanding_io(), leading to deadlocks in Ceph (another bug that I will address soon in another patch). This was caused by a netfs_pgpriv2_begin_copy_to_cache() call which failed in fscache_begin_write_operation(). The leaked netfs_io_request was never completed, leaving `netfs_inode.io_count` with a positive value forever. All of this is super-fragile code. Finding out which code paths will lead to an eventual completion and which do not is hard to see: - Some functions like netfs_create_write_req() allocate a request, but will never submit any I/O. - netfs_unbuffered_read_iter_locked() calls netfs_unbuffered_read() and then netfs_put_request(); however, netfs_unbuffered_read() can also fail early before submitting the I/O request, therefore another netfs_put_request() call must be added there. A rule of thumb is that functions that return a `netfs_io_request` do not submit I/O, and all of their callers must be checked. For my taste, the whole netfs code needs an overhaul to make reference counting easier to understand and less fragile & obscure. But to fix this bug here and now and produce a patch that is adequate for a stable backport, I tried a minimal approach that quickly frees the request object upon early failure. I decided against adding a second netfs_put_request() each time because that would cause code duplication which obscures the code further. Instead, I added the function netfs_put_failed_request() which frees such a failed request synchronously under the assumption that the reference count is exactly 2 (as initially set by netfs_alloc_request() and never touched), verified by a WARN_ON_ONCE(). It then deinitializes the request object (without going through the "cleanup_work" indirection) and frees the allocation (with RCU protection to protect against concurrent access by netfs_requests_seq_start()). All code paths that fail early have been changed to call netfs_put_failed_request() instead of netfs_put_request(). Additionally, I have added a netfs_put_request() call to netfs_unbuffered_read() as explained above because the netfs_put_failed_request() approach does not work there. Fixes: 20d72b00ca81 ("netfs: Fix the request's work item to not require a ref") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-25fs/proc/task_mmu: check p->vec_buf for NULLJakub Acs1-0/+3
When the PAGEMAP_SCAN ioctl is invoked with vec_len = 0 reaches pagemap_scan_backout_range(), kernel panics with null-ptr-deref: [ 44.936808] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 44.937797] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 44.938391] CPU: 1 UID: 0 PID: 2480 Comm: reproducer Not tainted 6.17.0-rc6 #22 PREEMPT(none) [ 44.939062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 44.939935] RIP: 0010:pagemap_scan_thp_entry.isra.0+0x741/0xa80 <snip registers, unreliable trace> [ 44.946828] Call Trace: [ 44.947030] <TASK> [ 44.949219] pagemap_scan_pmd_entry+0xec/0xfa0 [ 44.952593] walk_pmd_range.isra.0+0x302/0x910 [ 44.954069] walk_pud_range.isra.0+0x419/0x790 [ 44.954427] walk_p4d_range+0x41e/0x620 [ 44.954743] walk_pgd_range+0x31e/0x630 [ 44.955057] __walk_page_range+0x160/0x670 [ 44.956883] walk_page_range_mm+0x408/0x980 [ 44.958677] walk_page_range+0x66/0x90 [ 44.958984] do_pagemap_scan+0x28d/0x9c0 [ 44.961833] do_pagemap_cmd+0x59/0x80 [ 44.962484] __x64_sys_ioctl+0x18d/0x210 [ 44.962804] do_syscall_64+0x5b/0x290 [ 44.963111] entry_SYSCALL_64_after_hwframe+0x76/0x7e vec_len = 0 in pagemap_scan_init_bounce_buffer() means no buffers are allocated and p->vec_buf remains set to NULL. This breaks an assumption made later in pagemap_scan_backout_range(), that page_region is always allocated for p->vec_buf_index. Fix it by explicitly checking p->vec_buf for NULL before dereferencing. Other sites that might run into same deref-issue are already (directly or transitively) protected by checking p->vec_buf. Note: From PAGEMAP_SCAN man page, it seems vec_len = 0 is valid when no output is requested and it's only the side effects caller is interested in, hence it passes check in pagemap_scan_get_args(). This issue was found by syzkaller. Link: https://lkml.kernel.org/r/20250922082206.6889-1-acsjakub@amazon.de Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Jakub Acs <acsjakub@amazon.de> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jinjiang Tu <tujinjiang@huawei.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Penglei Jiang <superman.xpt@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-25mm/hugetlb: fix folio is still mapped when deletedJinjiang Tu1-4/+6
Migration may be raced with fallocating hole. remove_inode_single_folio will unmap the folio if the folio is still mapped. However, it's called without folio lock. If the folio is migrated and the mapped pte has been converted to migration entry, folio_mapped() returns false, and won't unmap it. Due to extra refcount held by remove_inode_single_folio, migration fails, restores migration entry to normal pte, and the folio is mapped again. As a result, we triggered BUG in filemap_unaccount_folio. The log is as follows: BUG: Bad page cache in process hugetlb pfn:156c00 page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00 head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0 aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file" flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: f4(hugetlb) page dumped because: still mapped when deleted CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x4f/0x70 filemap_unaccount_folio+0xc4/0x1c0 __filemap_remove_folio+0x38/0x1c0 filemap_remove_folio+0x41/0xd0 remove_inode_hugepages+0x142/0x250 hugetlbfs_fallocate+0x471/0x5a0 vfs_fallocate+0x149/0x380 Hold folio lock before checking if the folio is mapped to avold race with migration. Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com Fixes: 4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-25afs: Fix potential null pointer dereference in afs_put_serverZhen Ni1-1/+2
afs_put_server() accessed server->debug_id before the NULL check, which could lead to a null pointer dereference. Move the debug_id assignment, ensuring we never dereference a NULL server pointer. Fixes: 2757a4dc1849 ("afs: Fix access after dec in put functions") Cc: stable@vger.kernel.org Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-24Merge tag 'for-6.17-rc7-tag' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more regression fix for a problem in zoned mode: mounting would fail if the number of open and active zones reached a common limit that didn't use to be checked" * tag 'for-6.17-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: don't fail mount needlessly due to too many active zones
2025-09-23btrfs: zoned: don't fail mount needlessly due to too many active zonesJohannes Thumshirn1-0/+6
Previously BTRFS did not look at a device's reported max_open_zones limit, but starting with commit 04147d8394e8 ("btrfs: zoned: limit active zones to max_open_zones"), zoned BTRFS limited the number of concurrently used block-groups to the number of max_open_zones a device reported, if it hadn't already reported a number of max_active_zones. Starting with commit 04147d8394e8 the number of open zones is treated the same way as active zones. But this leads to mount failures on filesystems which have been used before 04147d8394e8 because too many zones are in an open state. Ignore the new limitations on these filesystems, so zones can be finished or evacuated. Reported-by: Yuwei Han <hrx@bupt.moe> Link: https://lore.kernel.org/all/2F48A90AF7DDF380+1790bcfd-cb6f-456b-870d-7982f21b5eae@bupt.moe/ Fixes: 04147d8394e8 ("btrfs: zoned: limit active zones to max_open_zones") Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-21smb: client: handle unlink(2) of files open by different clientsPaulo Alcantara1-11/+88
In order to identify whether a certain file is open by a different client, start the unlink process by sending a compound request of CREATE(DELETE_ON_CLOSE) + CLOSE with only FILE_SHARE_DELETE bit set in smb2_create_req::ShareAccess. If the file is currently open, then the server will fail the request with STATUS_SHARING_VIOLATION, in which case we'll map it to -EBUSY, so __cifs_unlink() will fall back to silly-rename the file. This fixes the following case where open(O_CREAT) fails with -ENOENT (STATUS_DELETE_PENDING) due to file still open by a different client. * Before patch $ mount.cifs //srv/share /mnt/1 -o ...,nosharesock $ mount.cifs //srv/share /mnt/2 -o ...,nosharesock $ cd /mnt/1 $ touch foo $ exec 3<>foo $ cd /mnt/2 $ rm foo $ touch foo touch: cannot touch 'foo': No such file or directory $ exec 3>&- * After patch $ mount.cifs //srv/share /mnt/1 -o ...,nosharesock $ mount.cifs //srv/share /mnt/2 -o ...,nosharesock $ cd /mnt/1 $ touch foo $ exec 3<>foo $ cd /mnt/2 $ rm foo $ touch foo $ exec 3>&- Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Frank Sorenson <sorenson@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-21smb: server: use disable_work_sync in transport_rdma.cStefan Metzmacher1-3/+3
This makes it safer during the disconnect and avoids requeueing. It's ok to call disable_work[_sync]() more than once. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-21smb: server: don't use delayed_work for post_recv_credits_workStefan Metzmacher1-10/+8
If we are using a hardcoded delay of 0 there's no point in using delayed_work it only adds confusion. The client also uses a normal work_struct and now it is easier to move it to the common smbdirect_socket. Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-20Merge tag 'for-6.17-rc6-tag' of ↵Linus Torvalds5-21/+43
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull a few more btrfs fixes from David Sterba: - in tree-checker, fix wrong size of check for inode ref item - in ref-verify, handle combination of mount options that allow partially damaged extent tree (reported by syzbot) - additional validation of compression mount option to catch invalid string as level * tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: reject invalid compression level btrfs: ref-verify: handle damaged extent root tree btrfs: tree-checker: fix the incorrect inode ref size check
2025-09-19Merge tag '6.17-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds4-34/+64
Pull smb client fixes from Steve French: - Two unlink fixes: one for rename and one for deferred close - Four smbdirect/RDMA fixes: fix buffer leak in negotiate, two fixes for races in smbd_destroy, fix offset and length checks in recv_done * tag '6.17-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error path smb: client: fix file open check in __cifs_unlink() smb: client: let smbd_destroy() call disable_work_sync(&info->post_send_credits_work) smb: client: use disable[_delayed]_work_sync in smbdirect.c smb: client: fix filename matching of deferred files smb: client: let recv_done verify data_offset, data_length and remaining_data_length
2025-09-18smb: client: fix smbdirect_recv_io leak in smbd_negotiate() error pathStefan Metzmacher1-1/+3
During tests of another unrelated patch I was able to trigger this error: Objects remaining on __kmem_cache_shutdown() Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-18smb: client: fix file open check in __cifs_unlink()Paulo Alcantara1-2/+15
Fix the file open check to decide whether or not silly-rename the file in SMB2+. Fixes: c5ea3065586d ("smb: client: fix data loss due to broken rename(2)") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Frank Sorenson <sorenson@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-18btrfs: reject invalid compression levelQu Wenruo3-18/+33
Inspired by recent changes to compression level parsing in 6db1df415d73fc ("btrfs: accept and ignore compression level for lzo") it turns out that we do not do any extra validation for compression level input string, thus allowing things like "compress=lzo:invalid" to be accepted without warnings. Although we accept levels that are beyond the supported algorithm ranges, accepting completely invalid level specification is not correct. Fix the too loose checks for compression level, by doing proper error handling of kstrtoint(), so that we will reject not only too large values (beyond int range) but also completely wrong levels like "lzo:invalid". Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-17Merge tag 'mm-hotfixes-stable-2025-09-17-21-10' of ↵Linus Torvalds2-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "15 hotfixes. 11 are cc:stable and the remainder address post-6.16 issues or aren't considered necessary for -stable kernels. 13 of these fixes are for MM. The usual shower of singletons, plus - fixes from Hugh to address various misbehaviors in get_user_pages() - patches from SeongJae to address a quite severe issue in DAMON - another series also from SeongJae which completes some fixes for a DAMON startup issue" * tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zram: fix slot write race condition nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/* samples/damon/mtier: avoid starting DAMON before initialization samples/damon/prcl: avoid starting DAMON before initialization samples/damon/wsse: avoid starting DAMON before initialization MAINTAINERS: add Lance Yang as a THP reviewer MAINTAINERS: add Jann Horn as rmap reviewer mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control mm/damon/core: introduce damon_call_control->dealloc_on_cancel mm: folio_may_be_lru_cached() unless folio_test_large() mm: revert "mm: vmscan.c: fix OOM on swap stress test" mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch" mm/gup: local lru_add_drain() to avoid lru_add_drain_all() mm/gup: check ref_count instead of lru before migration
2025-09-18btrfs: ref-verify: handle damaged extent root treeDavid Sterba1-1/+8
Syzbot hits a problem with enabled ref-verify, ignorebadroots and a fuzzed/damaged extent tree. There's no fallback option like in other places that can deal with it so disable the whole ref-verify as it is just a debugging feature. Reported-by: syzbot+9c3e0cdfbfe351b0bc0e@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/0000000000001b6052062139be1c@google.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-18btrfs: tree-checker: fix the incorrect inode ref size checkQu Wenruo1-2/+2
[BUG] Inside check_inode_ref(), we need to make sure every structure, including the btrfs_inode_extref header, is covered by the item. But our code is incorrectly using "sizeof(iref)", where @iref is just a pointer. This means "sizeof(iref)" will always be "sizeof(void *)", which is much smaller than "sizeof(struct btrfs_inode_extref)". This will allow some bad inode extrefs to sneak in, defeating tree-checker. [FIX] Fix the typo by calling "sizeof(*iref)", which is the same as "sizeof(struct btrfs_inode_extref)", and will be the correct behavior we want. Fixes: 71bf92a9b877 ("btrfs: tree-checker: Add check for INODE_REF") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-17smb: client: let smbd_destroy() call ↵Stefan Metzmacher1-0/+3
disable_work_sync(&info->post_send_credits_work) In smbd_destroy() we may destroy the memory so we better wait until post_send_credits_work is no longer pending and will never be started again. I actually just hit the case using rxe: WARNING: CPU: 0 PID: 138 at drivers/infiniband/sw/rxe/rxe_verbs.c:1032 rxe_post_recv+0x1ee/0x480 [rdma_rxe] ... [ 5305.686979] [ T138] smbd_post_recv+0x445/0xc10 [cifs] [ 5305.687135] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687149] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687185] [ T138] ? __pfx_smbd_post_recv+0x10/0x10 [cifs] [ 5305.687329] [ T138] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 5305.687356] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687368] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687378] [ T138] ? _raw_spin_unlock_irqrestore+0x11/0x60 [ 5305.687389] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687399] [ T138] ? get_receive_buffer+0x168/0x210 [cifs] [ 5305.687555] [ T138] smbd_post_send_credits+0x382/0x4b0 [cifs] [ 5305.687701] [ T138] ? __pfx_smbd_post_send_credits+0x10/0x10 [cifs] [ 5305.687855] [ T138] ? __pfx___schedule+0x10/0x10 [ 5305.687865] [ T138] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 5305.687875] [ T138] ? queue_delayed_work_on+0x8e/0xa0 [ 5305.687889] [ T138] process_one_work+0x629/0xf80 [ 5305.687908] [ T138] ? srso_alias_return_thunk+0x5/0xfbef5 [ 5305.687917] [ T138] ? __kasan_check_write+0x14/0x30 [ 5305.687933] [ T138] worker_thread+0x87f/0x1570 ... It means rxe_post_recv was called after rdma_destroy_qp(). This happened because put_receive_buffer() was triggered by ib_drain_qp() and called: queue_work(info->workqueue, &info->post_send_credits_work); Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-17smb: client: use disable[_delayed]_work_sync in smbdirect.cStefan Metzmacher1-3/+3
This makes it safer during the disconnect and avoids requeueing. It's ok to call disable[delayed_]work[_sync]() more than once. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: 050b8c374019 ("smbd: Make upper layer decide when to destroy the transport") Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-17smb: client: fix filename matching of deferred filesPaulo Alcantara3-27/+21
Fix the following case where the client would end up closing both deferred files (foo.tmp & foo) after unlink(foo) due to strstr() call in cifs_close_deferred_file_under_dentry(): fd1 = openat(AT_FDCWD, "foo", O_WRONLY|O_CREAT|O_TRUNC, 0666); fd2 = openat(AT_FDCWD, "foo.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666); close(fd1); close(fd2); unlink("foo"); Fixes: e3fc065682eb ("cifs: Deferred close performance improvements") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Cc: Frank Sorenson <sorenson@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-17smb: client: let recv_done verify data_offset, data_length and ↵Stefan Metzmacher1-1/+19
remaining_data_length This is inspired by the related server fixes. Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-17Merge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds1-58/+125
Pull smb server fixes from Steve French: - Two fixes for remaining_data_length and offset checks in receive path - Don't go over max SGEs which caused smbdirect send to fail (and trigger disconnect) * tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES
2025-09-17Merge tag 'for-6.17-rc6-tag' of ↵Linus Torvalds5-12/+15
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - in zoned mode, turn assertion to proper code when reserving space in relocation block group - fix search key of extended ref (hardlink) when replaying log - fix initialization of file extent tree on filesystems without no-holes feature - add harmless data race annotation to block group comparator * tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: annotate block group access with data_race() when sorting for reclaim btrfs: initialize inode::file_extent_tree after i_mode has been set btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg() btrfs: fix invalid extref key setup when replaying dentry
2025-09-15netfs: Prevent duplicate unlockingLizhi Xu1-1/+1
The filio lock has been released here, so there is no need to jump to error_folio_unlock to release it again. Reported-by: syzbot+b73c7d94a151e2ee1e9b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b73c7d94a151e2ee1e9b Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15btrfs: annotate block group access with data_race() when sorting for reclaimFilipe Manana1-1/+8
When sorting the block group list for reclaim we are using a block group's used bytes counter without taking the block group's spinlock, so we can race with a concurrent task updating it (at btrfs_update_block_group()), which makes tools like KCSAN unhappy and report a race. Since the sorting is not strictly needed from a functional perspective and such races should rarely cause any ordering changes (only load/store tearing could cause them), not to mention that after the sorting the ordering may no longer be accurate due to concurrent allocations and deallocations of extents in a block group, annotate the accesses to the used counter with data_race() to silence KCSAN and similar tools. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-15btrfs: initialize inode::file_extent_tree after i_mode has been setaustinchang2-9/+5
btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is a regular file. In the beginning of btrfs_read_locked_inode(), the i_mode hasn't been read from inode item, then file_extent_tree won't be used at all in volumes without NO_HOLES. Fix this by calling btrfs_init_file_extent_tree() after i_mode is initialized in btrfs_read_locked_inode(). Fixes: 3d7db6e8bd22e6 ("btrfs: don't allocate file extent tree for non regular files") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: austinchang <austinchang@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-15btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()Johannes Thumshirn1-1/+1
When moving a block-group to the dedicated data relocation space-info in btrfs_zoned_reserve_data_reloc_bg() it is asserted that the newly created block group for data relocation does not contain any zone_unusable bytes. But on disks with zone_capacity < zone_size, the difference between zone_size and zone_capacity is accounted as zone_unusable. Instead of asserting that the block-group does not contain any zone_unusable bytes, remove them from the block-groups total size. Reported-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/linux-block/CAHj4cs8-cS2E+-xQ-d2Bj6vMJZ+CwT_cbdWBTju4BV35LsvEYw@mail.gmail.com/ Fixes: daa0fde322350 ("btrfs: zoned: fix data relocation block group reservation") Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-15btrfs: fix invalid extref key setup when replaying dentryFilipe Manana1-1/+1
The offset for an extref item's key is not the object ID of the parent dir, otherwise we would not need the extref item and would use plain ref items. Instead the offset is the result of a hash computation that uses the object ID of the parent dir and the name associated to the entry. So fix this by setting the key offset at replay_one_name() to be the result of calling btrfs_extref_hash(). Fixes: 725af92a6251 ("btrfs: Open-code name_in_log_ref in replay_one_name") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-14ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_sizeStefan Metzmacher1-1/+10
This is inspired by the check for data_offset + data_length. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable@vger.kernel.org Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-14ksmbd: smbdirect: validate data_offset and data_length field of ↵Namjae Jeon1-8/+9
smb_direct_data_transfer If data_offset and data_length of smb_direct_data_transfer struct are invalid, out of bounds issue could happen. This patch validate data_offset and data_length field in recv_done. Cc: stable@vger.kernel.org Fixes: 2ea086e35c3d ("ksmbd: add buffer validation for smb direct") Reviewed-by: Stefan Metzmacher <metze@samba.org> Reported-by: Luigino Camastra, Aisle Research <luigino.camastra@aisle.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-14smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGESStefan Metzmacher1-50/+107
We should not use more sges for ib_post_send() than we told the rdma device in rdma_create_qp()! Otherwise ib_post_send() will return -EINVAL, so we disconnect the connection. Or with the current siw.ko we'll get 0 from ib_post_send(), but will never ever get a completion for the request. I've already sent a fix for siw.ko... So we need to make sure smb_direct_writev() limits the number of vectors we pass to individual smb_direct_post_send_data() calls, so that we don't go over the queue pair limits. Commit 621433b7e25d ("ksmbd: smbd: relax the count of sges required") was very strange and I guess only needed because SMB_DIRECT_MAX_SEND_SGES was 8 at that time. It basically removed the check that the rdma device is able to handle the number of sges we try to use. While the real problem was added by commit ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write") as it used the minumun of device->attrs.max_send_sge and device->attrs.max_sge_rd, with the problem that device->attrs.max_sge_rd is always 1 for iWarp. And that limitation should only apply to RDMA Read operations. For now we keep that limitation for RDMA Write operations too, fixing that is a task for another day as it's not really required a bug fix. Commit 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs") lowered SMB_DIRECT_MAX_SEND_SGES to 6, which is also used by our client code. And that client code enforces device->attrs.max_send_sge >= 6 since commit d2e81f92e5b7 ("Decrease the number of SMB3 smbdirect client SGEs") and (briefly looking) only the i40w driver provides only 3, see I40IW_MAX_WQ_FRAGMENT_COUNT. But currently we'd require 4 anyway, so that would not work anyway, but now it fails early. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Hyunchul Lee <hyc.lee@gmail.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: linux-rdma@vger.kernel.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Fixes: ddbdc861e37c ("ksmbd: smbd: introduce read/write credits for RDMA read/write") Fixes: 621433b7e25d ("ksmbd: smbd: relax the count of sges required") Fixes: 2b4eeeaa9061 ("ksmbd: decrease the number of SMB3 smbdirect server SGEs") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-09-14Merge tag 'x86-urgent-2025-09-14' of ↵Linus Torvalds3-7/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix a CPU topology parsing bug on AMD guests, and address a lockdep warning in the resctrl filesystem" * tag 'x86-urgent-2025-09-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Eliminate false positive lockdep warning when reading SNC counters x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
2025-09-13Merge tag 'erofs-for-6.17-rc6-fixes' of ↵Linus Torvalds5-36/+65
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Fix invalid algorithm dereference in encoded extents - Add missing dax_break_layout_final(), since recent FSDAX fixes didn't cover EROFS - Arrange long xattr name prefixes more properly * tag 'erofs-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix long xattr name prefix placement erofs: fix runtime warning on truncate_folio_batch_exceptionals() erofs: fix invalid algorithm for encoded extents
2025-09-13nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*Nathan Chancellor2-6/+6
When accessing one of the files under /sys/fs/nilfs2/features when CONFIG_CFI_CLANG is enabled, there is a CFI violation: CFI failure at kobj_attr_show+0x59/0x80 (target: nilfs_feature_revision_show+0x0/0x30; expected type: 0xfc392c4d) ... Call Trace: <TASK> sysfs_kf_seq_show+0x2a6/0x390 ? __cfi_kobj_attr_show+0x10/0x10 kernfs_seq_show+0x104/0x15b seq_read_iter+0x580/0xe2b ... When the kobject of the kset for /sys/fs/nilfs2 is initialized, its ktype is set to kset_ktype, which has a ->sysfs_ops of kobj_sysfs_ops. When nilfs_feature_attr_group is added to that kobject via sysfs_create_group(), the kernfs_ops of each files is sysfs_file_kfops_rw, which will call sysfs_kf_seq_show() when ->seq_show() is called. sysfs_kf_seq_show() in turn calls kobj_attr_show() through ->sysfs_ops->show(). kobj_attr_show() casts the provided attribute out to a 'struct kobj_attribute' via container_of() and calls ->show(), resulting in the CFI violation since neither nilfs_feature_revision_show() nor nilfs_feature_README_show() match the prototype of ->show() in 'struct kobj_attribute'. Resolve the CFI violation by adjusting the second parameter in nilfs_feature_{revision,README}_show() from 'struct attribute' to 'struct kobj_attribute' to match the expected prototype. Link: https://lkml.kernel.org/r/20250906144410.22511-1-konishi.ryusuke@gmail.com Fixes: aebe17f68444 ("nilfs2: add /sys/fs/nilfs2/features group") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509021646.bc78d9ef-lkp@intel.com/ Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13Merge tag 'ceph-for-6.17-rc6' of https://github.com/ceph/ceph-clientLinus Torvalds7-123/+219
Pull ceph fixes from Ilya Dryomov: "A fix for a race condition around r_parent tracking that took a long time to track down from Alex and some fixes for potential crashes on accessing invalid memory from Max and myself. All marked for stable" * tag 'ceph-for-6.17-rc6' of https://github.com/ceph/ceph-client: libceph: fix invalid accesses to ceph_connection_v1_info ceph: fix crash after fscrypt_encrypt_pagecache_blocks() error ceph: always call ceph_shift_unused_folios_left() ceph: fix race condition where r_parent becomes stale before sending message ceph: fix race condition validating r_parent before applying state
2025-09-13Merge tag 'driver-core-6.17-rc6' of ↵Linus Torvalds1-20/+38
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Fix UAF in cgroup pressure polling by using kernfs_get_active_of() to prevent operations on released file descriptors - Fix unresolved intra-doc link in the documentation of struct Device when CONFIG_DRM != y - Update the DMA Rust MAINTAINERS entry * tag 'driver-core-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: MAINTAINERS: Update the DMA Rust entry kernfs: Fix UAF in polling when open file is released rust: device: fix unresolved link to drm::Device
2025-09-12Merge tag 'v6.17-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds8-75/+293
Pull smb client fixes from Steve French: "Two smb3 client fixes, both for stable: - Fix encryption problem with multiple compounded ops - Fix rename error cases that could lead to data corruption" * tag 'v6.17-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix data loss due to broken rename(2) smb: client: fix compound alignment with encryption
2025-09-12erofs: fix long xattr name prefix placementGao Xiang3-6/+16
Currently, xattr name prefixes are forcibly placed into the packed inode if the fragments feature is enabled, and users have no option to put them in plain form directly on disk. This is inflexible. First, as mentioned above, users should be able to store unwrapped long xattr name prefixes unconditionally (COMPAT_PLAIN_XATTR_PFX). Second, since we now have the new metabox inode to store metadata, it should be used when available instead of the packed inode. Fixes: 414091322c63 ("erofs: implement metadata compression") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-09-11Merge tag 'for-6.17-rc5-tag' of ↵Linus Torvalds5-16/+56
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull