summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
9 dayssysfs: don't remove existing directory on update failureGreg Kroah-Hartman1-1/+1
commit 237557b8a81ab948e8332f7c0058e758f081c0a3 upstream. When sysfs_update_group() is called for a named group and create_files() fails (e.g. -ENOMEM), internal_create_group() calls kernfs_remove(kn) on the group directory. In the update path, kn was obtained via kernfs_find_and_get() and refers to a directory that already existed before this call. Removing it silently destroys a sysfs group that the caller did not create. Only remove the directory if we created it ourselves. On update failure the directory remains as it is left empty by remove_files() inside create_files(), but can be repopulated by a retry. Cc: Rajat Jain <rajatja@google.com> Fixes: c855cf2759d2 ("sysfs: Fix internal_create_group() for named group updates") Cc: stable <stable@kernel.org> Assisted-by: gkh_clanker_t1000 Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/2026052003-uniquely-hastily-c093@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 dayssmb: client: reject userspace cifs.spnego descriptionsAsim Viladi Oglu Manizada1-0/+16
commit 3da1fdf4efbc490041eb4f836bf596201203f8f2 upstream. cifs.spnego key descriptions contain authority-bearing fields such as pid, uid, creduid, and upcall_target that cifs.upcall treats as kernel-originating inputs. However, userspace can also create keys of this type through request_key(2) or add_key(2), allowing those fields to be supplied without CIFS origin. Only accept cifs.spnego descriptions while CIFS is using its private spnego_cred to request the key. Fixes: f1d662a7d5e5 ("[CIFS] Add upcall files for cifs to use spnego/kerberos") Assisted-by: avom-custom-harness:gpt-5.5-qwen3.6-mod-mix Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Asim Viladi Oglu Manizada <manizada@pm.me> Signed-off-by: Steve French <stfrench@microsoft.com> [Salvatore Bonaccorso: Apply changes to fs/cifs/cifs_spnego.c instead of fs/smb/client/cifs_spnego.c before 38c8a9a52082 ("smb: move client and server files to common directory fs/smb") in v6.4-rc1 and backported to v6.1.36] Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysceph: fix a buffer leak in __ceph_setxattr()Viacheslav Dubeyko1-0/+1
commit 5d3cc36b4e77a27ce7b686b7c59c7072bcb3fa8e upstream. The old_blob in __ceph_setxattr() can store ci->i_xattrs.prealloc_blob value during the retry. However, it is never called the ceph_buffer_put() for the old_blob object. This patch fixes the issue of the buffer leak. Cc: stable@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysbtrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()Mark Harmstone1-1/+1
[ Upstream commit 82323b1a7088b7a5c3e528a5d634bff447fa286f ] submit_one_async_extent() calls btrfs_reserve_extent(), which decrements bytes_may_use. If the call btrfs_create_io_em() fails, we jump to out_free_reserve, which calls extent_clear_unlock_delalloc(). Because we're specifying EXTENT_DO_ACCOUNTING, i.e. EXTENT_CLEAR_META_RESV | EXTENT_CLEAR_DATA_RESV, this decreases bytes_may_use again. This can lead to problems later on, as an initial write can fail only for the writeback to silently ENOSPC. Fix this by replacing EXTENT_DO_ACCOUNTING with EXTENT_CLEAR_META_RESV. This parallels a4fe134fc1d8eb ("btrfs: fix a double release on reserved extents in cow_one_range()"), which is the same fix in cow_one_range(). Fixes: 151a41bc46df ("Btrfs: fix what bits we clear when erroring out from delalloc") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysbtrfs: merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACKQu Wenruo3-26/+18
[ Upstream commit 6869b0a8be775e920be54ee9b69a743ca20d8332 ] PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK are two defines used in __process_pages_contig(), to let the function know to clear page dirty bit and then set page writeback. However page writeback and dirty bits are conflicting (at least for sector size == PAGE_SIZE case), this means these two have to be always updated together. This means we can merge PAGE_CLEAR_DIRTY and PAGE_SET_WRITEBACK to PAGE_START_WRITEBACK. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: 82323b1a7088 ("btrfs: fix double-decrement of bytes_may_use in submit_one_async_extent()") Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysfs/adfs: validate nzones in adfs_validate_bblk()Bae Yeonju1-0/+3
[ Upstream commit dd9d3e16c2d5fa166e13dce07413be51f42c8f5d ] Reject ADFS disc records with a zero zone count during boot block validation, before the disc record is used. When nzones is 0, adfs_read_map() passes it to kmalloc_array(0, ...) which returns ZERO_SIZE_PTR, and adfs_map_layout() then writes to dm[-1], causing an out-of-bounds write before the allocated buffer. adfs_validate_dr0() already rejects nzones != 1 for old-format images. Add the equivalent check to adfs_validate_bblk() for new-format images so that a crafted image with nzones == 0 is rejected at probe time. Found by syzkaller. Fixes: f6f14a0d71b0 ("fs/adfs: map: move map-specific sb initialisation to map.c") Signed-off-by: Bae Yeonju <iwasbaeyz@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysnfs/blocklayout: Fix compilation error (`make W=1`) in bl_write_pagelist()Andy Shevchenko1-3/+1
[ Upstream commit f83c8dda456ce4863f346aa26d88efa276eda35d ] Clang compiler is not happy about set but unused variable (when dprintk() is no-op): .../blocklayout/blocklayout.c:384:9: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] Remove a leftover from the previous cleanup. Fixes: 3a6fd1f004fc ("pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist") Acked-by: Anna Schumaker <anna.schumkaer@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysocfs2: validate group add input before cachingZhengYuan Huang1-5/+7
[ Upstream commit 70b672833f4025341c11b22c7f83778a5cd611bc ] [BUG] OCFS2_IOC_GROUP_ADD can trigger a BUG_ON in ocfs2_set_new_buffer_uptodate(): kernel BUG at fs/ocfs2/uptodate.c:509! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:ocfs2_set_new_buffer_uptodate+0x194/0x1e0 fs/ocfs2/uptodate.c:509 Code: ffffe88f 42b9fe4c 89e64889 dfe8b4df Call Trace: ocfs2_group_add+0x3f1/0x1510 fs/ocfs2/resize.c:507 ocfs2_ioctl+0x309/0x6e0 fs/ocfs2/ioctl.c:887 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 x64_sys_call+0x1144/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7bbfb55a966d [CAUSE] ocfs2_group_add() calls ocfs2_set_new_buffer_uptodate() on a user-controlled group block before ocfs2_verify_group_and_input() validates that block number. That helper is only valid for newly allocated metadata and asserts that the block is not already present in the chosen metadata cache. The code also uses INODE_CACHE(inode) even though the group descriptor belongs to main_bm_inode and later journal accesses use that cache context instead. [FIX] Validate the on-disk group descriptor before caching it, then add it to the metadata cache tracked by INODE_CACHE(main_bm_inode). Keep the validation failure path separate from the later cleanup path so we only remove the buffer from that cache after it has actually been inserted. This keeps the group buffer lifetime consistent across validation, journaling, and cleanup. Link: https://lkml.kernel.org/r/20260410020209.3786348-1-gality369@gmail.com Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysocfs2: validate bg_bits during freefrag scanZhengYuan Huang1-1/+17
[ Upstream commit 8f687eeed3da3012152b0f9473f578869de0cd7b ] [BUG] A crafted filesystem can trigger an out-of-bounds bitmap walk when OCFS2_IOC_INFO is issued with OCFS2_INFO_FL_NON_COHERENT. BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline] BUG: KASAN: use-after-free in _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] BUG: KASAN: use-after-free in test_bit_le include/asm-generic/bitops/le.h:21 [inline] BUG: KASAN: use-after-free in ocfs2_info_freefrag_scan_chain fs/ocfs2/ioctl.c:495 [inline] BUG: KASAN: use-after-free in ocfs2_info_freefrag_scan_bitmap fs/ocfs2/ioctl.c:588 [inline] BUG: KASAN: use-after-free in ocfs2_info_handle_freefrag fs/ocfs2/ioctl.c:662 [inline] BUG: KASAN: use-after-free in ocfs2_info_handle_request+0x1c66/0x3370 fs/ocfs2/ioctl.c:754 Read of size 8 at addr ffff888031bce000 by task syz.0.636/1435 Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xd1/0x650 mm/kasan/report.c:482 kasan_report+0xfb/0x140 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:186 [inline] kasan_check_range+0x11c/0x200 mm/kasan/generic.c:200 __kasan_check_read+0x11/0x20 mm/kasan/shadow.c:31 instrument_atomic_read include/linux/instrumented.h:68 [inline] _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] test_bit_le include/asm-generic/bitops/le.h:21 [inline] ocfs2_info_freefrag_scan_chain fs/ocfs2/ioctl.c:495 [inline] ocfs2_info_freefrag_scan_bitmap fs/ocfs2/ioctl.c:588 [inline] ocfs2_info_handle_freefrag fs/ocfs2/ioctl.c:662 [inline] ocfs2_info_handle_request+0x1c66/0x3370 fs/ocfs2/ioctl.c:754 ocfs2_info_handle+0x18d/0x2a0 fs/ocfs2/ioctl.c:828 ocfs2_ioctl+0x632/0x6e0 fs/ocfs2/ioctl.c:913 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 ... [CAUSE] ocfs2_info_freefrag_scan_chain() uses on-disk bg_bits directly as the bitmap scan limit. The coherent path reads group descriptors through ocfs2_read_group_descriptor(), which validates the descriptor before use. The non-coherent path uses ocfs2_read_blocks_sync() instead and skips that validation, so an impossible bg_bits value can drive the bitmap walk past the end of the block. [FIX] Compute the bitmap capacity from the filesystem format with ocfs2_group_bitmap_size(), report descriptors whose bg_bits exceeds that limit, and clamp the scan to the computed capacity. This keeps the freefrag report going while avoiding reads beyond the buffer. Link: https://lkml.kernel.org/r/20260410034220.3825769-1-gality369@gmail.com Fixes: d24a10b9f8ed ("Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysocfs2: fix listxattr handling when the buffer is fullZhengYuan Huang1-2/+2
[ Upstream commit d12f558e6200b3f47dbef9331ed6d115d2410e59 ] [BUG] If an OCFS2 inode has both inline and block-based xattrs, listxattr() can return a size larger than the caller's buffer when the inline names consume that buffer exactly. kernel BUG at mm/usercopy.c:102! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:usercopy_abort+0xb7/0xd0 mm/usercopy.c:102 Call Trace: __check_heap_object+0xe3/0x120 mm/slub.c:8243 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:250 [inline] __check_object_size+0x5c5/0x780 mm/usercopy.c:215 check_object_size include/linux/ucopysize.h:22 [inline] check_copy_size include/linux/ucopysize.h:59 [inline] copy_to_user include/linux/uaccess.h:219 [inline] listxattr+0xb0/0x170 fs/xattr.c:926 filename_listxattr fs/xattr.c:958 [inline] path_listxattrat+0x137/0x320 fs/xattr.c:988 __do_sys_listxattr fs/xattr.c:1001 [inline] __se_sys_listxattr fs/xattr.c:998 [inline] __x64_sys_listxattr+0x7f/0xd0 fs/xattr.c:998 ... [CAUSE] Commit 936b8834366e ("ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().") replaced the old per-handler list accounting with ocfs2_xattr_list_entry(), but it kept using size == 0 to detect probe mode. That assumption stops being true once ocfs2_listxattr() finishes the inline-xattr pass. If the inline names fill the caller buffer exactly, the block-xattr pass runs with a non-NULL buffer and a remaining size of zero. ocfs2_xattr_list_entry() then skips the bounds check, keeps counting block names, and returns a positive size larger than the supplied buffer. [FIX] Detect probe mode by testing whether the destination buffer pointer is NULL instead of whether the remaining size is zero. That restores the pre-refactor behavior and matches the OCFS2 getxattr helpers. Once the remaining buffer reaches zero while more names are left, the block-xattr pass now returns -ERANGE instead of reporting a size larger than the allocated list buffer. Link: https://lkml.kernel.org/r/20260410040339.3837162-1-gality369@gmail.com Fixes: 936b8834366e ("ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysocfs2/dlm: fix off-by-one in dlm_match_regions() region comparisonJunrui Luo1-1/+1
[ Upstream commit 01b61e8dda9b0fdb0d4cda43de25f4e390554d7b ] The local-vs-remote region comparison loop uses '<=' instead of '<', causing it to read one entry past the valid range of qr_regions. The other loops in the same function correctly use '<'. Fix the loop condition to use '<' for consistency and correctness. Link: https://lkml.kernel.org/r/SYBPR01MB78813DA26B50EC5E01F00566AF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Fixes: ea2034416b54 ("ocfs2/dlm: Add message DLM_QUERY_REGION") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysocfs2/dlm: validate qr_numregions in dlm_match_regions()Junrui Luo1-0/+8
[ Upstream commit 7ab3fbb01bc6d79091bc375e5235d360cd9b78be ] Patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()". In dlm_match_regions(), the qr_numregions field from a DLM_QUERY_REGION network message is used to drive loops over the qr_regions buffer without sufficient validation. This series fixes two issues: - Patch 1 adds a bounds check to reject messages where qr_numregions exceeds O2NM_MAX_REGIONS. The o2net layer only validates message byte length; it does not constrain field values, so a crafted message can set qr_numregions up to 255 and trigger out-of-bounds reads past the 1024-byte qr_regions buffer. - Patch 2 fixes an off-by-one in the local-vs-remote comparison loop, which uses '<=' instead of '<', reading one entry past the valid range even when qr_numregions is within bounds. This patch (of 2): The qr_numregions field from a DLM_QUERY_REGION network message is used directly as loop bounds in dlm_match_regions() without checking against O2NM_MAX_REGIONS. Since qr_regions is sized for at most O2NM_MAX_REGIONS (32) entries, a crafted message with qr_numregions > 32 causes out-of-bounds reads past the qr_regions buffer. Add a bounds check for qr_numregions before entering the loops. Link: https://lkml.kernel.org/r/SYBPR01MB7881A334D02ACEE5E0645801AF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Link: https://lkml.kernel.org/r/SYBPR01MB788166F524AD04E262E174BEAF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Fixes: ea2034416b54 ("ocfs2/dlm: Add message DLM_QUERY_REGION") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysquota: Fix race of dquot_scan_active() with quota deactivationJan Kara1-8/+30
[ Upstream commit e93ab401da4b2e2c1b8ef2424de2f238d51c8b2d ] dquot_scan_active() can race with quota deactivation in quota_release_workfn() like: CPU0 (quota_release_workfn) CPU1 (dquot_scan_active) ============================== ============================== spin_lock(&dq_list_lock); list_replace_init( &releasing_dquots, &rls_head); /* dquot X on rls_head, dq_count == 0, DQ_ACTIVE_B still set */ spin_unlock(&dq_list_lock); synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); list_for_each_entry(dquot, &inuse_list, dq_inuse) { /* finds dquot X */ dquot_active(X) -> true atomic_inc(&X->dq_count); } spin_unlock(&dq_list_lock); spin_lock(&dq_list_lock); dquot = list_first_entry(&rls_head); WARN_ON_ONCE(atomic_read(&dquot->dq_count)); The problem is not only a cosmetic one as under memory pressure the caller of dquot_scan_active() can end up working on freed dquot. Fix the problem by making sure the dquot is removed from releasing list when we acquire a reference to it. Fixes: 869b6ea1609f ("quota: Fix slow quotaoff") Reported-by: Sam Sun <samsun1006219@gmail.com> Link: https://lore.kernel.org/all/CAEkJfYPTt3uP1vAYnQ5V2ZWn5O9PLhhGi5HbOcAzyP9vbXyjeg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 dayspstore/ram: fix resource leak when ioremap() failsCole Leavitt1-0/+4
[ Upstream commit 2ddb69f686ef7a621645e97fc7329c50edf5d0e5 ] In persistent_ram_iomap(), ioremap() or ioremap_wc() may return NULL on failure. Currently, if this happens, the function returns NULL without releasing the memory region acquired by request_mem_region(). This leads to a resource leak where the memory region remains reserved but unusable. Additionally, the caller persistent_ram_buffer_map() handles NULL correctly by returning -ENOMEM, but without this check, a NULL return combined with request_mem_region() succeeding leaves resources in an inconsistent state. This is the ioremap() counterpart to commit 05363abc7625 ("pstore: ram_core: fix incorrect success return when vmap() fails") which fixed a similar issue in the vmap() path. Fixes: 404a6043385d ("staging: android: persistent_ram: handle reserving and mapping memory") Signed-off-by: Cole Leavitt <cole@unwrap.rs> Link: https://patch.msgid.link/20260225235406.11790-1-cole@unwrap.rs Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysnilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()Deepanshu Kartikey1-0/+6
[ Upstream commit be3e5d10643d3be1cbac9d9939f220a99253f980 ] nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead blocks by comparing it with the current block number bd_blocknr. If they differ, the block is considered dead and skipped. However, bd_oblocknr should never be 0 since block 0 typically stores the primary superblock and is never a valid GC target block. A corrupted ioctl request with bd_oblocknr set to 0 causes the comparison to incorrectly match when the lookup returns -ENOENT and sets bd_blocknr to 0, bypassing the dead block check and calling nilfs_bmap_mark() on a non-existent block. This causes nilfs_btree_do_lookup() to return -ENOENT, triggering the WARN_ON(ret == -ENOENT). Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at the beginning of each iteration. [ryusuke: slightly modified the commit message and comments for accuracy] Fixes: 7942b919f732 ("nilfs2: ioctl operations") Reported-by: syzbot+98a040252119df0506f8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=98a040252119df0506f8 Suggested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Reported-by: syzbot+466a45fcfb0562f5b9a0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=466a45fcfb0562f5b9a0 Cc: Junjie Cao <junjie.cao@linux.dev> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysfs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_STARTHyungJung Joo1-0/+6
[ Upstream commit 0621c385fda1376e967f37ccd534c26c3e511d14 ] omfs_fill_super() rejects oversized s_sys_blocksize values (> PAGE_SIZE), but it does not reject values smaller than OMFS_DIR_START (0x1b8 = 440). Later, omfs_make_empty() uses sbi->s_sys_blocksize - OMFS_DIR_START as the length argument to memset(). Since s_sys_blocksize is u32, a crafted filesystem image with s_sys_blocksize < OMFS_DIR_START causes an unsigned underflow there, wrapping to a value near 2^32. That drives a ~4 GiB memset() from bh->b_data + OMFS_DIR_START and overwrites kernel memory far beyond the backing block buffer. Add the corresponding lower-bound check alongside the existing upper-bound check in omfs_fill_super(), so that malformed images are rejected during superblock validation before any filesystem data is processed. Fixes: a3ab7155ea21 ("omfs: add directory routines") Signed-off-by: Hyungjung Joo <jhj140711@gmail.com> Link: https://patch.msgid.link/20260317054827.1822061-1-jhj140711@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysisofs: validate block number from NFS file handle in isofs_export_igetMichael Bommarito1-1/+1
commit 24376458138387fb251e782e624c7776e9826796 upstream. isofs_fh_to_dentry() and isofs_fh_to_parent() pass an attacker- controlled block number (ifid->block or ifid->parent_block) from the NFS file handle to isofs_export_iget(), which only rejects block == 0 before calling isofs_iget() and ultimately sb_bread(). A crafted file handle with fh_len sufficient to pass the check added by commit 0405d4b63d08 ("isofs: Prevent the use of too small fid") can still drive the server to read any in-range block on the backing device as if it were an iso_directory_record. That earlier fix was assigned CVE-2025-37780. sb_bread() on an out-of-range block returns NULL cleanly via the EIO path, so there is no memory-safety violation. For in-range reads of adjacent-partition data on the same block device, the unrelated bytes end up in iso_inode_info fields that reach the NFS client as dentry metadata. The deployment surface (isofs exported over NFS from loop-mounted images) is narrow and requires an authenticated NFS peer, but the malformed-file-handle class is reportable as hardening next to the existing CVE-2025-37780 fix. Reject block >= ISOFS_SB(sb)->s_nzones in isofs_export_iget() so the check covers both isofs_fh_to_dentry() and isofs_fh_to_parent() call sites with a single line. Fixes: 0405d4b63d08 ("isofs: Prevent the use of too small fid") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260419212155.2169382-3-michael.bommarito@gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysisofs: validate Rock Ridge CE continuation extent against volume sizeMichael Bommarito1-0/+9
commit a36d990f591320e9dd379ab30063ebfe91d47e1f upstream. rock_continue() reads rs->cont_extent verbatim from the Rock Ridge CE record and passes it to sb_bread() without checking that the block number is within the mounted ISO 9660 volume. commit e595447e177b ("[PATCH] rock.c: handle corrupted directories") added cont_offset and cont_size rejection for the CE continuation but did not validate the extent block number itself. commit f54e18f1b831 ("isofs: Fix infinite looping over CE entries") later capped the CE chain length at RR_MAX_CE_ENTRIES = 32 but again left the block number unchecked. With a crafted ISO mounted via udisks2 (desktop optical auto-mount) or via CAP_SYS_ADMIN mount, rs->cont_extent can therefore point at an out-of-range block or at blocks belonging to an adjacent filesystem on the same block device. sb_bread() on an out-of-range block returns NULL cleanly via the block layer EIO path, so there is no memory-safety violation. For in-range reads of adjacent- filesystem data, the CE buffer is parsed as Rock Ridge records and only the text of SL sub-records reaches userspace through readlink(), which makes the info-leak channel narrow and difficult to exploit; still, rejecting the malformed CE outright matches the rejection shape already present in the same function for cont_offset and cont_size. Add an ISOFS_SB(sb)->s_nzones bounds check to rock_continue() next to the existing offset/size rejection, printing the same corrupted-directory-entry notice. Fixes: f54e18f1b831 ("isofs: Fix infinite looping over CE entries") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260419212155.2169382-2-michael.bommarito@gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysudf: reject descriptors with oversized CRC lengthMichael Bommarito1-2/+6
commit 55d41b0a20128e86b9e960dd2e3f0a2d69a18df7 upstream. udf_read_tagged() skips CRC verification when descCRCLength + sizeof(struct tag) exceeds the block size. A crafted UDF image can set descCRCLength to an oversized value to bypass CRC validation entirely; the descriptor is then accepted based solely on the 8-bit tag checksum, which is trivially recomputable. Reject such descriptors instead of silently accepting them. A legitimate single-block descriptor should never have a CRC length that exceeds the block. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Assisted-by: Codex:gpt-5-4 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260413211240.853662-1-michael.bommarito@gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysfanotify: fix false positive on permission eventsMiklos Szeredi2-8/+12
commit 7746e3bd4cc19b5092e00d32d676e329bfcb6900 upstream. fsnotify_get_mark_safe() may return false for a mark on an unrelated group, which results in bypassing the permission check. Fix by skipping over detached marks that are not in the current group. CC: stable@vger.kernel.org Fixes: abc77577a669 ("fsnotify: Provide framework for dropping SRCU lock in ->handle_event") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260410144950.156160-1-mszeredi@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysinotify: fix watch count leak when fsnotify_add_inode_mark_locked() failsChia-Ming Chang1-0/+1
commit 6a320935fa4293e9e599ec9f85dc9eb3be7029f8 upstream. When fsnotify_add_inode_mark_locked() fails in inotify_new_watch(), the error path calls inotify_remove_from_idr() but does not call dec_inotify_watches() to undo the preceding inc_inotify_watches(). This leaks a watch count, and repeated failures can exhaust the max_user_watches limit with -ENOSPC even when no watches are active. Prior to commit 1cce1eea0aff ("inotify: Convert to using per-namespace limits"), the watch count was incremented after fsnotify_add_mark_locked() succeeded, so this path was not affected. The conversion moved inc_inotify_watches() before the mark insertion without adding the corresponding rollback. Add the missing dec_inotify_watches() call in the error path. Fixes: 1cce1eea0aff ("inotify: Convert to using per-namespace limits") Cc: stable@vger.kernel.org Signed-off-by: Chia-Ming Chang <chiamingc@synology.com> Signed-off-by: robbieko <robbieko@synology.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://patch.msgid.link/20260224093442.3076294-1-chiamingc@synology.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()Sohei Koyama1-1/+3
commit 77d059519382bd66283e6a4e83ee186e87e7708f upstream. The commit c8e008b60492 ("ext4: ignore xattrs past end") introduced a refcount leak in when block_csum is false. ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to get iloc.bh, but never releases it with brelse(). Fixes: c8e008b60492 ("ext4: ignore xattrs past end") Signed-off-by: Sohei Koyama <skoyama@ddn.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Link: https://patch.msgid.link/20260406074830.8480-1-skoyama@ddn.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysuserfaultfd: allow registration of ranges below mmap_min_addrDenis M. Karpov1-2/+0
commit 161ce69c2c89781784b945d8e281ff2da9dede9c upstream. The current implementation of validate_range() in fs/userfaultfd.c performs a hard check against mmap_min_addr. This is redundant because UFFDIO_REGISTER operates on memory ranges that must already be backed by a VMA. Enforcing mmap_min_addr or capability checks again in userfaultfd is unnecessary and prevents applications like binary compilers from using UFFD for valid memory regions mapped by application. Remove the redundant check for mmap_min_addr. We started using UFFD instead of the classic mprotect approach in the binary translator to track application writes. During development, we encountered this bug. The translator cannot control where the translated application chooses to map its memory and if the app requires a low-address area, UFFD fails, whereas mprotect would work just fine. I believe this is a genuine logic bug rather than an improvement, and I would appreciate including the fix in stable. Link: https://lore.kernel.org/20260409103345.15044-1-komlomal@gmail.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Denis M. Karpov <komlomal@gmail.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysext2: reject inodes with zero i_nlink and valid mode in ext2_iget()Vasiliy Kovalev1-3/+11
commit 25947cc5b2374cd5bf627fe3141496444260d04f upstream. ext2_iget() already rejects inodes with i_nlink == 0 when i_mode is zero or i_dtime is set, treating them as deleted. However, the case of i_nlink == 0 with a non-zero mode and zero dtime slips through. Since ext2 has no orphan list, such a combination can only result from filesystem corruption - a legitimate inode deletion always sets either i_dtime or clears i_mode before freeing the inode. A crafted image can exploit this gap to present such an inode to the VFS, which then triggers WARN_ON inside drop_nlink() (fs/inode.c) via ext2_unlink(), ext2_rename() and ext2_rmdir(): WARNING: CPU: 3 PID: 609 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336 CPU: 3 UID: 0 PID: 609 Comm: syz-executor Not tainted 6.12.77+ #1 Call Trace: <TASK> inode_dec_link_count include/linux/fs.h:2518 [inline] ext2_unlink+0x26c/0x300 fs/ext2/namei.c:295 vfs_unlink+0x2fc/0x9b0 fs/namei.c:4477 do_unlinkat+0x53e/0x730 fs/namei.c:4541 __x64_sys_unlink+0xc6/0x110 fs/namei.c:4587 do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> WARNING: CPU: 0 PID: 646 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336 CPU: 0 UID: 0 PID: 646 Comm: syz.0.17 Not tainted 6.12.77+ #1 Call Trace: <TASK> inode_dec_link_count include/linux/fs.h:2518 [inline] ext2_rename+0x35e/0x850 fs/ext2/namei.c:374 vfs_rename+0xf2f/0x2060 fs/namei.c:5021 do_renameat2+0xbe2/0xd50 fs/namei.c:5178 __x64_sys_rename+0x7e/0xa0 fs/namei.c:5223 do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> WARNING: CPU: 0 PID: 634 at fs/inode.c:336 drop_nlink+0xad/0xd0 fs/inode.c:336 CPU: 0 UID: 0 PID: 634 Comm: syz-executor Not tainted 6.12.77+ #1 Call Trace: <TASK> inode_dec_link_count include/linux/fs.h:2518 [inline] ext2_rmdir+0xca/0x110 fs/ext2/namei.c:311 vfs_rmdir+0x204/0x690 fs/namei.c:4348 do_rmdir+0x372/0x3e0 fs/namei.c:4407 __x64_sys_unlinkat+0xf0/0x130 fs/namei.c:4577 do_syscall_64+0xf5/0x220 arch/x86/entry/common.c:78 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Extend the existing i_nlink == 0 check to also catch this case, reporting the corruption via ext2_error() and returning -EFSCORRUPTED. This rejects the inode at load time and prevents it from reaching any of the namei.c paths. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Link: https://patch.msgid.link/20260404152011.2590197-1-kovalev@altlinux.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysocfs2: split transactions in dio completion to avoid credit exhaustionHeming Zhao1-29/+45
commit d647c5b2fbf81560818dacade360abc8c00a9665 upstream. During ocfs2 dio operations, JBD2 may report warnings via following call trace: ocfs2_dio_end_io_write ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ocfs2_try_to_merge_extent ocfs2_extend_rotate_transaction ocfs2_extend_trans jbd2__journal_restart start_this_handle output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to handle extents in a batch of transaction. Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should only be removed from the orphan list after the extent tree update is complete. This ensures that if a crash occurs in the middle of extent tree updates, we won't leave stale blocks beyond EOF. This patch also changes the logic for updating the inode size and removing orphan, making it similar to ext4_dio_write_end_io(). Both operations are performed only when everything looks good. Finally, thanks to Jans and Joseph for providing the bug fix prototype and suggestions. Link: https://lkml.kernel.org/r/20260402134328.27334-2-heming.zhao@suse.com Signed-off-by: Heming Zhao <heming.zhao@suse.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysfuse: quiet down complaints in fuse_conn_limit_writeDarrick J. Wong1-2/+2
commit 129a45f9755a89f573c6a513a6b9e3d234ce89b0 upstream. gcc 15 complains about an uninitialized variable val that is passed by reference into fuse_conn_limit_write: control.c: In function ‘fuse_conn_congestion_threshold_write’: include/asm-generic/rwonce.h:55:37: warning: ‘val’ may be used uninitialized [-Wmaybe-uninitialized] 55 | *(volatile typeof(x) *)&(x) = (val); \ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ include/asm-generic/rwonce.h:61:9: note: in expansion of macro ‘__WRITE_ONCE’ 61 | __WRITE_ONCE(x, val); \ | ^~~~~~~~~~~~ control.c:178:9: note: in expansion of macro ‘WRITE_ONCE’ 178 | WRITE_ONCE(fc->congestion_threshold, val); | ^~~~~~~~~~ control.c:166:18: note: ‘val’ was declared here 166 | unsigned val; | ^~~ Unfortunately there's enough macro spew involved in kstrtoul_from_user that I think gcc gives up on its analysis and sprays the above warning. AFAICT it's not actually a bug, but we could just zero-initialize the variable to enable using -Wmaybe-uninitialized to find real problems. Previously we would use some weird uninitialized_var annotation to quiet down the warnings, so clearly this code has been like this for quite some time. Cc: stable@vger.kernel.org # v5.9 Fixes: 3f649ab728cda8 ("treewide: Remove uninitialized_var() usage") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysfuse: reject oversized dirents in page cacheSamuel Page1-0/+4
commit 51a8de6c50bf947c8f534cd73da4c8f0a13e7bed upstream. fuse_add_dirent_to_cache() computes a serialized dirent size from the server-controlled namelen field and copies the dirent into a single page-cache page. The existing logic only checks whether the dirent fits in the remaining space of the current page and advances to a fresh page if not. It never checks whether the dirent itself exceeds PAGE_SIZE. As a result, a malicious FUSE server can return a dirent with namelen=4095, producing a serialized record size of 4120 bytes. On 4 KiB page systems this causes memcpy() to overflow the cache page by 24 bytes into the following kernel page. Reject dirents that cannot fit in a single page before copying them into the readdir cache. Fixes: 69e34551152a ("fuse: allow caching readdir") Cc: stable@vger.kernel.org # v6.16+ Assisted-by: Bynario AI Signed-off-by: Samuel Page <sam@bynar.io> Reported-by: Qi Tang <tpluszz77@gmail.com> Reported-by: Zijun Hu <nightu@northwestern.edu> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260420090139.662772-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 dayscifs: Fix connections leak when tlink setup failedZhang Xiaoxu1-4/+13
commit 1dcdf5f5b2137185cbdd5385f29949ab3da4f00c upstream. If the tlink setup failed, lost to put the connections, then the module refcnt leak since the cifsd kthread not exit. Also leak the fscache info, and for next mount with fsc, it will print the follow errors: CIFS: Cache volume key already in use (cifs,127.0.0.1:445,TEST) Let's check the result of tlink setup, and do some cleanup. Fixes: 56c762eb9bee ("cifs: Refactor out cifs_mount()") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> [ kovalev: bp to fix CVE-2022-49822; adapted to use direct xid/ses/tcon variables instead of mnt_ctx struct fields due to the older kernel not having the corresponding cifs_mount() refactoring (see upstream commit c88f7dcd6d64); additionally NULL out mntdata after dfs_cache_add_vol() transfers its ownership to vol_list, otherwise the new error path from mount_setup_tlink() failure would double-free it via kfree(mntdata) in the error: label ] Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
9 daysgfs2: Validate i_depth for exhash directoriesAndrew Price2-4/+6
[ Upstream commit 557c024ca7250bb65ae60f16c02074106c2f197b ] A fuzzer test introduced corruption that ends up with a depth of 0 in dir_e_read(), causing an undefined shift by 32 at: index = hash >> (32 - dip->i_depth); As calculated in an open-coded way in dir_make_exhash(), the minimum depth for an exhash directory is ilog2(sdp->sd_hash_ptrs) and 0 is invalid as sdp->sd_hash_ptrs is fixed as sdp->bsize / 16 at mount time. So we can avoid the undefined behaviour by checking for depth values lower than the minimum in gfs2_dinode_in(). Values greater than the maximum are already being checked for there. Also switch the calculation in dir_make_exhash() to use ilog2() to clarify how the depth is calculated. Tested with the syzkaller repro.c and xfstests '-g quick'. Reported-by: syzbot+4708579bb230a0582a57@syzkaller.appspotmail.com Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> [ To maintain consistency in error handling in gfs2_dinode_in(), use "goto corrupt" in v5.10. ] Signed-off-by: Ruohan Lan <ruohanlan@aliyun.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysocfs2: fix possible deadlock between unlink and dio_end_io_writeJoseph Qi1-2/+1
[ Upstream commit b02da26a992db0c0e2559acbda0fc48d4a2fd337 ] ocfs2_unlink takes orphan dir inode_lock first and then ip_alloc_sem, while in ocfs2_dio_end_io_write, it acquires these locks in reverse order. This creates an ABBA lock ordering violation on lock classes ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE] and ocfs2_file_ip_alloc_sem_key. Lock Chain #0 (orphan dir inode_lock -> ip_alloc_sem): ocfs2_unlink ocfs2_prepare_orphan_dir ocfs2_lookup_lock_orphan_dir inode_lock(orphan_dir_inode) <- lock A __ocfs2_prepare_orphan_dir ocfs2_prepare_dir_for_insert ocfs2_extend_dir ocfs2_expand_inline_dir down_write(&oi->ip_alloc_sem) <- Lock B Lock Chain #1 (ip_alloc_sem -> orphan dir inode_lock): ocfs2_dio_end_io_write down_write(&oi->ip_alloc_sem) <- Lock B ocfs2_del_inode_from_orphan() inode_lock(orphan_dir_inode) <- Lock A Deadlock Scenario: CPU0 (unlink) CPU1 (dio_end_io_write) ------ ------ inode_lock(orphan_dir_inode) down_write(ip_alloc_sem) down_write(ip_alloc_sem) inode_lock(orphan_dir_inode) Since ip_alloc_sem is to protect allocation changes, which is unrelated with operations in ocfs2_del_inode_from_orphan. So move ocfs2_del_inode_from_orphan out of ip_alloc_sem to fix the deadlock. Link: https://lkml.kernel.org/r/20260306032211.1016452-1-joseph.qi@linux.alibaba.com Reported-by: syzbot+67b90111784a3eac8c04@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=67b90111784a3eac8c04 Fixes: a86a72a4a4e0 ("ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysfs/ocfs2: fix comments mentioning i_mutexhongnanli11-16/+16
[ Upstream commit 137cebf9432eae024d0334953ed92a2a78619b52 ] inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: b02da26a992d ("ocfs2: fix possible deadlock between unlink and dio_end_io_write") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysocfs2: fix out-of-bounds write in ocfs2_write_end_inlineJoseph Qi1-0/+10
[ Upstream commit 7bc5da4842bed3252d26e742213741a4d0ac1b14 ] KASAN reports a use-after-free write of 4086 bytes in ocfs2_write_end_inline, called from ocfs2_write_end_nolock during a copy_