summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-10-10smb3: fix incorrect mode displayed for read-only filesSteve French1-8/+11
commit 2f3017e7cc7515e0110a3733d8dca84de2a1d23d upstream. Commands like "chmod 0444" mark a file readonly via the attribute flag (when mapping of mode bits into the ACL are not set, or POSIX extensions are not negotiated), but they were not reported correctly for stat of directories (they were reported ok for files and for "ls"). See example below: root:~# ls /mnt2 -l total 12 drwxr-xr-x 2 root root 0 Sep 21 18:03 normaldir -rwxr-xr-x 1 root root 0 Sep 21 23:24 normalfile dr-xr-xr-x 2 root root 0 Sep 21 17:55 readonly-dir -r-xr-xr-x 1 root root 209716224 Sep 21 18:15 readonly-file root:~# stat -c %a /mnt2/readonly-dir 755 root:~# stat -c %a /mnt2/readonly-file 555 This fixes the stat of directories when ATTR_READONLY is set (in cases where the mode can not be obtained other ways). root:~# stat -c %a /mnt2/readonly-dir 555 Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10smb: client: use actual path when queryfswangrong4-10/+26
commit a421e3fe0e6abe27395078f4f0cec5daf466caea upstream. Due to server permission control, the client does not have access to the shared root directory, but can access subdirectories normally, so users usually mount the shared subdirectories directly. In this case, queryfs should use the actual path instead of the root directory to avoid the call returning an error (EACCES). Signed-off-by: wangrong <wangrong@uniontech.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10gfs2: fix double destroy_workqueue errorJulian Sun2-1/+3
commit 6cb9df81a2c462b89d2f9611009ab43ae8717841 upstream. When gfs2_fill_super() fails, destroy_workqueue() is called within gfs2_gl_hash_clear(), and the subsequent code path calls destroy_workqueue() on the same work queue again. This issue can be fixed by setting the work queue pointer to NULL after the first destroy_workqueue() call and checking for a NULL pointer before attempting to destroy the work queue again. Reported-by: syzbot+d34c2a269ed512c531b0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d34c2a269ed512c531b0 Fixes: 30e388d57367 ("gfs2: Switch to a per-filesystem glock workqueue") Cc: stable@vger.kernel.org Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10sysctl: avoid spurious permanent empty tablesThomas Weißschuh1-3/+8
commit 559d4c6a9d3b60f239493239070eb304edaea594 upstream. The test if a table is a permanently empty one, inspects the address of the registered ctl_table argument. However as sysctl_mount_point is an empty array and does not occupy and space it can end up sharing an address with another object in memory. If that other object itself is a "struct ctl_table" then registering that table will fail as it's incorrectly recognized as permanently empty. Avoid this issue by adding a dummy element to the array so that is not empty anymore. Explicitly register the table with zero elements as otherwise the dummy element would be recognized as a sentinel element which would lead to a runtime warning from the sysctl core. While the issue seems not being encountered at this time, this seems mostly to be due to luck. Also a future change, constifying sysctl_mount_point and root_table, can reliably trigger this issue on clang 18. Given that empty arrays are non-standard in the first place it seems prudent to avoid them if possible. Fixes: 4a7b29f65094 ("sysctl: move sysctl type to ctl_table_header") Fixes: a35dd3a786f5 ("sysctl: drop now unnecessary out-of-bounds check") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Closes: https://lore.kernel.org/oe-lkp/202408051453.f638857e-lkp@intel.com Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10NFSD: Fix NFSv4's PUTPUBFH operationChuck Lever1-9/+1
commit 202f39039a11402dcbcd5fece8d9fa6be83f49ae upstream. According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH. Replace the XDR decoder for PUTPUBFH with a "noop" since we no longer want the minorversion check, and PUTPUBFH has no arguments to decode. (Ideally nfsd4_decode_noop should really be called nfsd4_decode_void). PUTPUBFH should now behave just like PUTROOTFH. Reported-by: Cedric Blancher <cedric.blancher@gmail.com> Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1") Cc: Dan Shelton <dan.f.shelton@gmail.com> Cc: Roland Mainz <roland.mainz@nrubsig.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10nfsd: map the EBADMSG to nfserr_io to avoid warningLi Lingfeng1-0/+1
commit 340e61e44c1d2a15c42ec72ade9195ad525fd048 upstream. Ext4 will throw -EBADMSG through ext4_readdir when a checksum error occurs, resulting in the following WARNING. Fix it by mapping EBADMSG to nfserr_io. nfsd_buffered_readdir iterate_dir // -EBADMSG -74 ext4_readdir // .iterate_shared ext4_dx_readdir ext4_htree_fill_tree htree_dirblock_to_tree ext4_read_dirblock __ext4_read_dirblock ext4_dirblock_csum_verify warn_no_space_for_csum __warn_no_space_for_csum return ERR_PTR(-EFSBADCRC) // -EBADMSG -74 nfserrno // WARNING [ 161.115610] ------------[ cut here ]------------ [ 161.116465] nfsd: non-standard errno: -74 [ 161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0 [ 161.118596] Modules linked in: [ 161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138 [ 161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qe mu.org 04/01/2014 [ 161.123601] RIP: 0010:nfserrno+0x9d/0xd0 [ 161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6 05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33 [ 161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286 [ 161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a [ 161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827 [ 161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021 [ 161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8 [ 161.135244] FS: 0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000 [ 161.136695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0 [ 161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 161.141519] PKRU: 55555554 [ 161.142076] Call Trace: [ 161.142575] ? __warn+0x9b/0x140 [ 161.143229] ? nfserrno+0x9d/0xd0 [ 161.143872] ? report_bug+0x125/0x150 [ 161.144595] ? handle_bug+0x41/0x90 [ 161.145284] ? exc_invalid_op+0x14/0x70 [ 161.146009] ? asm_exc_invalid_op+0x12/0x20 [ 161.146816] ? nfserrno+0x9d/0xd0 [ 161.147487] nfsd_buffered_readdir+0x28b/0x2b0 [ 161.148333] ? nfsd4_encode_dirent_fattr+0x380/0x380 [ 161.149258] ? nfsd_buffered_filldir+0xf0/0xf0 [ 161.150093] ? wait_for_concurrent_writes+0x170/0x170 [ 161.151004] ? generic_file_llseek_size+0x48/0x160 [ 161.151895] nfsd_readdir+0x132/0x190 [ 161.152606] ? nfsd4_encode_dirent_fattr+0x380/0x380 [ 161.153516] ? nfsd_unlink+0x380/0x380 [ 161.154256] ? override_creds+0x45/0x60 [ 161.155006] nfsd4_encode_readdir+0x21a/0x3d0 [ 161.155850] ? nfsd4_encode_readlink+0x210/0x210 [ 161.156731] ? write_bytes_to_xdr_buf+0x97/0xe0 [ 161.157598] ? __write_bytes_to_xdr_buf+0xd0/0xd0 [ 161.158494] ? lock_downgrade+0x90/0x90 [ 161.159232] ? nfs4svc_decode_voidarg+0x10/0x10 [ 161.160092] nfsd4_encode_operation+0x15a/0x440 [ 161.160959] nfsd4_proc_compound+0x718/0xe90 [ 161.161818] nfsd_dispatch+0x18e/0x2c0 [ 161.162586] svc_process_common+0x786/0xc50 [ 161.163403] ? nfsd_svc+0x380/0x380 [ 161.164137] ? svc_printk+0x160/0x160 [ 161.164846] ? svc_xprt_do_enqueue.part.0+0x365/0x380 [ 161.165808] ? nfsd_svc+0x380/0x380 [ 161.166523] ? rcu_is_watching+0x23/0x40 [ 161.167309] svc_process+0x1a5/0x200 [ 161.168019] nfsd+0x1f5/0x380 [ 161.168663] ? nfsd_shutdown_threads+0x260/0x260 [ 161.169554] kthread+0x1c4/0x210 [ 161.170224] ? kthread_insert_work_sanity_check+0x80/0x80 [ 161.171246] ret_from_fork+0x1f/0x30 Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10nfsd: fix delegation_blocked() to block correctly for at least 30 secondsNeilBrown1-2/+3
commit 45bb63ed20e02ae146336412889fe5450316a84f upstream. The pair of bloom filtered used by delegation_blocked() was intended to block delegations on given filehandles for between 30 and 60 seconds. A new filehandle would be recorded in the "new" bit set. That would then be switch to the "old" bit set between 0 and 30 seconds later, and it would remain as the "old" bit set for 30 seconds. Unfortunately the code intended to clear the old bit set once it reached 30 seconds old, preparing it to be the next new bit set, instead cleared the *new* bit set before switching it to be the old bit set. This means that the "old" bit set is always empty and delegations are blocked between 0 and 30 seconds. This patch updates bd->new before clearing the set with that index, instead of afterwards. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Cc: stable@vger.kernel.org Fixes: 6282cd565553 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10exfat: fix memory leak in exfat_load_bitmap()Yuezhang Mo1-5/+5
commit d2b537b3e533f28e0d97293fe9293161fe8cd137 upstream. If the first directory entry in the root directory is not a bitmap directory entry, 'bh' will not be released and reassigned, which will cause a memory leak. Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodateLizhi Xu1-1/+2
commit 33b525cef4cff49e216e4133cc48452e11c0391e upstream. When doing cleanup, if flags without OCFS2_BH_READAHEAD, it may trigger NULL pointer dereference in the following ocfs2_set_buffer_uptodate() if bh is NULL. Link: https://lkml.kernel.org/r/20240902023636.1843422-3-joseph.qi@linux.alibaba.com Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside") Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Heming Zhao <heming.zhao@suse.com> Suggested-by: Heming Zhao <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> [4.20+] Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: fix null-ptr-deref when journal load failed.Julian Sun1-3/+4
commit 5784d9fcfd43bd853654bb80c87ef293b9e8e80a upstream. During the mounting process, if journal_reset() fails because of too short journal, then lead to jbd2_journal_load() fails with NULL j_sb_buffer. Subsequently, ocfs2_journal_shutdown() calls jbd2_journal_flush()->jbd2_cleanup_journal_tail()-> __jbd2_update_log_tail()->jbd2_journal_update_sb_log_tail() ->lock_buffer(journal->j_sb_buffer), resulting in a null-pointer dereference error. To resolve this issue, we should check the JBD2_LOADED flag to ensure the journal was properly loaded. Additionally, use journal instead of osb->journal directly to simplify the code. Link: https://syzkaller.appspot.com/bug?extid=05b9b39d8bdfe1a0861f Link: https://lkml.kernel.org/r/20240902030844.422725-1-sunjunchao2870@gmail.com Fixes: f6f50e28f0cb ("jbd2: Fail to load a journal if it is too short") Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Reported-by: syzbot+05b9b39d8bdfe1a0861f@syzkaller.appspotmail.com Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: remove unreasonable unlock in ocfs2_read_blocksLizhi Xu1-1/+0
commit c03a82b4a0c935774afa01fd6d128b444fd930a1 upstream. Patch series "Misc fixes for ocfs2_read_blocks", v5. This series contains 2 fixes for ocfs2_read_blocks(). The first patch fix the issue reported by syzbot, which detects bad unlock balance in ocfs2_read_blocks(). The second patch fixes an issue reported by Heming Zhao when reviewing above fix. This patch (of 2): There was a lock release before exiting, so remove the unreasonable unlock. Link: https://lkml.kernel.org/r/20240902023636.1843422-1-joseph.qi@linux.alibaba.com Link: https://lkml.kernel.org/r/20240902023636.1843422-2-joseph.qi@linux.alibaba.com Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside") Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ab134185af9ef88dfed5 Tested-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: cancel dqi_sync_work before freeing oinfoJoseph Qi1-2/+6
commit 35fccce29feb3706f649726d410122dd81b92c18 upstream. ocfs2_global_read_info() will initialize and schedule dqi_sync_work at the end, if error occurs after successfully reading global quota, it will trigger the following warning with CONFIG_DEBUG_OBJECTS_* enabled: ODEBUG: free active (active state 0) object: 00000000d8b0ce28 object type: timer_list hint: qsync_work_fn+0x0/0x16c This reports that there is an active delayed work when freeing oinfo in error handling, so cancel dqi_sync_work first. BTW, return status instead of -1 when .read_file_info fails. Link: https://syzkaller.appspot.com/bug?extid=f7af59df5d6b25f0febd Link: https://lkml.kernel.org/r/20240904071004.2067695-1-joseph.qi@linux.alibaba.com Fixes: 171bf93ce11f ("ocfs2: Periodic quota syncing") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reported-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com Tested-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: reserve space for inline xattr before attaching reflink treeGautham Ananthakrishna2-12/+25
commit 5ca60b86f57a4d9648f68418a725b3a7de2816b0 upstream. One of our customers reported a crash and a corrupted ocfs2 filesystem. The crash was due to the detection of corruption. Upon troubleshooting, the fsck -fn output showed the below corruption [EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record, but fsck believes the largest valid value is 227. Clamp the next record value? n The stat output from the debugfs.ocfs2 showed the following corruption where the "Next Free Rec:" had overshot the "Count:" in the root metadata block. Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856) FS Generation: 904309833 (0x35e6ac49) CRC32: 00000000 ECC: 0000 Type: Regular Attr: 0x0 Flags: Valid Dynamic Features: (0x16) HasXattr InlineXattr Refcounted Extended Attributes Block: 0 Extended Attributes Inline Size: 256 User: 0 (root) Group: 0 (root) Size: 281320357888 Links: 1 Clusters: 141738 ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024 atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024 mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024 dtime: 0x0 -- Wed Dec 31 17:00:00 1969 Refcount Block: 2777346 Last Extblk: 2886943 Orphan Slot: 0 Sub Alloc Slot: 0 Sub Alloc Bit: 14 Tree Depth: 1 Count: 227 Next Free Rec: 230 ## Offset Clusters Block# 0 0 2310 2776351 1 2310 2139 2777375 2 4449 1221 2778399 3 5670 731 2779423 4 6401 566 2780447 ....... .... ....... ....... .... ....... The issue was in the reflink workfow while reserving space for inline xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the time this function is called the reflink tree is already recreated at the destination inode from the source inode. At this point, this function reserves space for inline xattrs at the destination inode without even checking if there is space at the root metadata block. It simply reduces the l_count from 243 to 227 thereby making space of 256 bytes for inline xattr whereas the inode already has extents beyond this index (in this case up to 230), thereby causing corruption. The fix for this is to reserve space for inline metadata at the destination inode before the reflink tree gets recreated. The customer has verified the fix. Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@oracle.com Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink") Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: fix uninit-value in ocfs2_get_block()Joseph Qi1-3/+2
commit 2af148ef8549a12f8025286b8825c2833ee6bcb8 upstream. syzbot reported an uninit-value BUG: BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159 ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159 do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225 mpage_readahead+0x43f/0x840 fs/mpage.c:374 ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381 read_pages+0x193/0x1110 mm/readahead.c:160 page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273 do_page_cache_ra mm/readahead.c:303 [inline] force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332 force_page_cache_readahead mm/internal.h:347 [inline] generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106 vfs_fadvise mm/fadvise.c:185 [inline] ksys_fadvise64_64 mm/fadvise.c:199 [inline] __do_sys_fadvise64 mm/fadvise.c:214 [inline] __se_sys_fadvise64 mm/fadvise.c:212 [inline] __x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212 x64_sys_call+0xe11/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:222 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is uninitialized. So the error log will trigger the above uninit-value access. The error log is out-of-date since get_blocks() was removed long time ago. And the error code will be logged in ocfs2_extent_map_get_blocks() once ocfs2_get_cluster() fails, so fix this by only logging inode and block. Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.com Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com Tested-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com Cc: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ocfs2: fix the la space leak when unmounting an ocfs2 volumeHeming Zhao1-0/+19
commit dfe6c5692fb525e5e90cefe306ee0dffae13d35f upstream. This bug has existed since the initial OCFS2 code. The code logic in ocfs2_sync_local_to_main() is wrong, as it ignores the last contiguous free bits, which causes an OCFS2 volume to lose the last free clusters of LA window on each umount command. Link: https://lkml.kernel.org/r/20240719114310.14245-1-heming.zhao@suse.com Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Su Yue <glass.su@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10jbd2: correctly compare tids with tid_geq function in jbd2_fc_begin_commitKemeng Shi1-1/+1
commit f0e3c14802515f60a47e6ef347ea59c2733402aa upstream. Use tid_geq to compare tids to work over sequence number wraps. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20240801013815.2393869-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns errorBaokun Li1-2/+5
commit f5cacdc6f2bb2a9bf214469dd7112b43dd2dd68a upstream. In __jbd2_log_wait_for_space(), we might call jbd2_cleanup_journal_tail() to recover some journal space. But if an error occurs while executing jbd2_cleanup_journal_tail() (e.g., an EIO), we don't stop waiting for free space right away, we try other branches, and if j_committing_transaction is NULL (i.e., the tid is 0), we will get the following complain: ============================================ JBD2: I/O error when updating journal superblock for sdd-8. __jbd2_log_wait_for_space: needed 256 blocks and only had 217 space available __jbd2_log_wait_for_space: no way to get more journal space in sdd-8 ------------[ cut here ]------------ WARNING: CPU: 2 PID: 139804 at fs/jbd2/checkpoint.c:109 __jbd2_log_wait_for_space+0x251/0x2e0 Modules linked in: CPU: 2 PID: 139804 Comm: kworker/u8:3 Not tainted 6.6.0+ #1 RIP: 0010:__jbd2_log_wait_for_space+0x251/0x2e0 Call Trace: <TASK> add_transaction_credits+0x5d1/0x5e0 start_this_handle+0x1ef/0x6a0 jbd2__journal_start+0x18b/0x340 ext4_dirty_inode+0x5d/0xb0 __mark_inode_dirty+0xe4/0x5d0 generic_update_time+0x60/0x70 [...] ============================================ So only if jbd2_cleanup_journal_tail() returns 1, i.e., there is nothing to clean up at the moment, continue to try to reclaim free space in other ways. Note that this fix relies on commit 6f6a6fda2945 ("jbd2: fix ocfs2 corrupt when updating journal superblock fails") to make jbd2_cleanup_journal_tail return the correct error code. Fixes: 8c3f25d8950c ("jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240718115336.2554501-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10parisc: Fix stack start for ADDR_NO_RANDOMIZE personalityHelge Deller1-1/+2
commit f31b256994acec6929306dfa86ac29716e7503d6 upstream. Fix the stack start address calculation for the parisc architecture in setup_arg_pages() when address randomization is disabled. When the ADDR_NO_RANDOMIZE process personality is disabled there is no need to add additional space for the stack. Note that this patch touches code inside an #ifdef CONFIG_STACK_GROWSUP hunk, which is why only the parisc architecture is affected since it's the only Linux architecture where the stack grows upwards. Without this patch you will find the stack in the middle of some mapped libaries and suddenly limited to 6MB instead of 8MB: root@parisc:~# setarch -R /bin/bash -c "cat /proc/self/maps" 00010000-00019000 r-xp 00000000 08:05 1182034 /usr/bin/cat 00019000-0001a000 rwxp 00009000 08:05 1182034 /usr/bin/cat 0001a000-0003b000 rwxp 00000000 00:00 0 [heap] f90c4000-f9283000 r-xp 00000000 08:05 1573004 /usr/lib/hppa-linux-gnu/libc.so.6 f9283000-f9285000 r--p 001bf000 08:05 1573004 /usr/lib/hppa-linux-gnu/libc.so.6 f9285000-f928a000 rwxp 001c1000 08:05 1573004 /usr/lib/hppa-linux-gnu/libc.so.6 f928a000-f9294000 rwxp 00000000 00:00 0 f9301000-f9323000 rwxp 00000000 00:00 0 [stack] f98b4000-f98e4000 r-xp 00000000 08:05 1572869 /usr/lib/hppa-linux-gnu/ld.so.1 f98e4000-f98e5000 r--p 00030000 08:05 1572869 /usr/lib/hppa-linux-gnu/ld.so.1 f98e5000-f98e9000 rwxp 00031000 08:05 1572869 /usr/lib/hppa-linux-gnu/ld.so.1 f9ad8000-f9b00000 rw-p 00000000 00:00 0 f9b00000-f9b01000 r-xp 00000000 00:00 0 [vdso] With the patch the stack gets correctly mapped at the end of the process memory map: root@panama:~# setarch -R /bin/bash -c "cat /proc/self/maps" 00010000-00019000 r-xp 00000000 08:13 16385582 /usr/bin/cat 00019000-0001a000 rwxp 00009000 08:13 16385582 /usr/bin/cat 0001a000-0003b000 rwxp 00000000 00:00 0 [heap] fef29000-ff0eb000 r-xp 00000000 08:13 16122400 /usr/lib/hppa-linux-gnu/libc.so.6 ff0eb000-ff0ed000 r--p 001c2000 08:13 16122400 /usr/lib/hppa-linux-gnu/libc.so.6 ff0ed000-ff0f2000 rwxp 001c4000 08:13 16122400 /usr/lib/hppa-linux-gnu/libc.so.6 ff0f2000-ff0fc000 rwxp 00000000 00:00 0 ff4b4000-ff4e4000 r-xp 00000000 08:13 16121913 /usr/lib/hppa-linux-gnu/ld.so.1 ff4e4000-ff4e6000 r--p 00030000 08:13 16121913 /usr/lib/hppa-linux-gnu/ld.so.1 ff4e6000-ff4ea000 rwxp 00032000 08:13 16121913 /usr/lib/hppa-linux-gnu/ld.so.1 ff6d7000-ff6ff000 rw-p 00000000 00:00 0 ff6ff000-ff700000 r-xp 00000000 00:00 0 [vdso] ff700000-ff722000 rwxp 00000000 00:00 0 [stack] Reported-by: Camm Maguire <camm@maguirefamily.org> Signed-off-by: Helge Deller <deller@gmx.de> Fixes: d045c77c1a69 ("parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures") Fixes: 17d9822d4b4c ("parisc: Consider stack randomization for mmap base only when necessary") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix off by one issue in alloc_flex_gd()Baokun Li1-8/+10
commit 6121258c2b33ceac3d21f6a221452692c465df88 upstream. Wesley reported an issue: ================================================================== EXT4-fs (dm-5): resizing filesystem from 7168 to 786432 blocks ------------[ cut here ]------------ kernel BUG at fs/ext4/resize.c:324! CPU: 9 UID: 0 PID: 3576 Comm: resize2fs Not tainted 6.11.0+ #27 RIP: 0010:ext4_resize_fs+0x1212/0x12d0 Call Trace: __ext4_ioctl+0x4e0/0x1800 ext4_ioctl+0x12/0x20 __x64_sys_ioctl+0x99/0xd0 x64_sys_call+0x1206/0x20d0 do_syscall_64+0x72/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e ================================================================== While reviewing the patch, Honza found that when adjusting resize_bg in alloc_flex_gd(), it was possible for flex_gd->resize_bg to be bigger than flexbg_size. The reproduction of the problem requires the following: o_group = flexbg_size * 2 * n; o_size = (o_group + 1) * group_size; n_group: [o_group + flexbg_size, o_group + flexbg_size * 2) o_size = (n_group + 1) * group_size; Take n=0,flexbg_size=16 as an example: last:15 |o---------------|--------------n-| o_group:0 resize to n_group:30 The corresponding reproducer is: img=test.img rm -f $img truncate -s 600M $img mkfs.ext4 -F $img -b 1024 -G 16 8M dev=`losetup -f --show $img` mkdir -p /tmp/test mount $dev /tmp/test resize2fs $dev 248M Delete the problematic plus 1 to fix the issue, and add a WARN_ON_ONCE() to prevent the issue from happening again. [ Note: another reproucer which this commit fixes is: img=test.img rm -f $img truncate -s 25MiB $img mkfs.ext4 -b 4096 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 $img truncate -s 3GiB $img dev=`losetup -f --show $img` mkdir -p /tmp/test mount $dev /tmp/test resize2fs $dev 3G umount $dev losetup -d $dev -- TYT ] Reported-by: Wesley Hershberger <wesley.hershberger@canonical.com> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081231 Reported-by: Stéphane Graber <stgraber@stgraber.org> Closes: https://lore.kernel.org/all/20240925143325.518508-1-aleksandr.mikhalitsyn@canonical.com/ Tested-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Tested-by: Eric Sandeen <sandeen@redhat.com> Fixes: 665d3e0af4d3 ("ext4: reduce unnecessary memory allocation in alloc_flex_gd()") Cc: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240927133329.1015041-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: mark fc as ineligible using an handle in ext4_xattr_set()Luis Henriques (SUSE)1-1/+2
commit 04e6ce8f06d161399e5afde3df5dcfa9455b4952 upstream. Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch moves the call to this function so that an handle can be used. If a transaction fails to start, then there's not point in trying to mark the filesystem as ineligible, and an error will eventually be returned to user-space. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-3-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: use handle to mark fc as ineligible in __track_dentry_update()Luis Henriques (SUSE)1-8/+11
commit faab35a0370fd6e0821c7a8dd213492946fc776f upstream. Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch fixes the calls to this function in __track_dentry_update() by adding an extra parameter to the callback used in ext4_fc_track_template(). Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-2-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix fast commit inode enqueueing during a full journal commitLuis Henriques (SUSE)2-2/+15
commit 6db3c1575a750fd417a70e0178bdf6efa0dd5037 upstream. When a full journal commit is on-going, any fast commit has to be enqueued into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing is done only once, i.e. if an inode is already queued in a previous fast commit entry it won't be enqueued again. However, if a full commit starts _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to be done into FC_Q_STAGING. And this is not being done in function ext4_fc_track_template(). This patch fixes the issue by re-enqueuing an inode into the STAGING queue during the fast commit clean-up callback when doing a full commit. However, to prevent a race with a fast-commit, the clean-up callback has to be called with the journal locked. This bug was found using fstest generic/047. This test creates several 32k bytes files, sync'ing each of them after it's creation, and then shutting down the filesystem. Some data may be loss in this operation; for example a file may have it's size truncated to zero. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix incorrect tid assumption in jbd2_journal_shrink_checkpoint_list()Luis Henriques (SUSE)1-2/+5
commit 7a6443e1dad70281f99f0bd394d7fd342481a632 upstream. Function jbd2_journal_shrink_checkpoint_list() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Don't assume that and use two extra boolean variables to control the loop iterations and keep track of the first and last tid. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-4-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit()Luis Henriques (SUSE)1-4/+7
commit dd589b0f1445e1ea1085b98edca6e4d5dedb98d0 upstream. Function ext4_wait_for_tail_page_commit() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Don't assume that and invoke jbd2_log_wait_commit() if the journal had a committing transaction instead. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-2-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: update orig_path in ext4_find_extent()Baokun Li2-2/+2
commit 5b4b2dcace35f618fe361a87bae6f0d13af31bc1 upstream. In ext4_find_extent(), if the path is not big enough, we free it and set *orig_path to NULL. But after reallocating and successfully initializing the path, we don't update *orig_path, in which case the caller gets a valid path but a NULL ppath, and this may cause a NULL pointer dereference or a path memory leak. For example: ext4_split_extent path = *ppath = 2000 ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path = 2000); *orig_path = path = NULL; path = kcalloc() = 3000 ext4_split_extent_at(*ppath = NULL) path = *ppath; ex = path[depth].p_ext; // NULL pointer dereference! ================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000010 CPU: 6 UID: 0 PID: 576 Comm: fsstress Not tainted 6.11.0-rc2-dirty #847 RIP: 0010:ext4_split_extent_at+0x6d/0x560 Call Trace: <TASK> ext4_split_extent.isra.0+0xcb/0x1b0 ext4_ext_convert_to_initialized+0x168/0x6c0 ext4_ext_handle_unwritten_extents+0x325/0x4d0 ext4_ext_map_blocks+0x520/0xdb0 ext4_map_blocks+0x2b0/0x690 ext4_iomap_begin+0x20e/0x2c0 [...] ================================================================== Therefore, *orig_path is updated when the extent lookup succeeds, so that the caller can safely use path or *ppath. Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix access to uninitialised lock in fc replay pathLuis Henriques (SUSE)1-1/+2
commit 23dfdb56581ad92a9967bcd720c8c23356af74c1 upstream. The following kernel trace can be triggered with fstest generic/629 when executed against a filesystem with fast-commit feature enabled: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x66/0x90 register_lock_class+0x759/0x7d0 __lock_acquire+0x85/0x2630 ? __find_get_block+0xb4/0x380 lock_acquire+0xd1/0x2d0 ? __ext4_journal_get_write_access+0xd5/0x160 _raw_spin_lock+0x33/0x40 ? __ext4_journal_get_write_access+0xd5/0x160 __ext4_journal_get_write_access+0xd5/0x160 ext4_reserve_inode_write+0x61/0xb0 __ext4_mark_inode_dirty+0x79/0x270 ? ext4_ext_replay_set_iblocks+0x2f8/0x450 ext4_ext_replay_set_iblocks+0x330/0x450 ext4_fc_replay+0x14c8/0x1540 ? jread+0x88/0x2e0 ? rcu_is_watching+0x11/0x40 do_one_pass+0x447/0xd00 jbd2_journal_recover+0x139/0x1b0 jbd2_journal_load+0x96/0x390 ext4_load_and_init_journal+0x253/0xd40 ext4_fill_super+0x2cc6/0x3180 ... In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in function ext4_check_bdev_write_error(). Unfortunately, at this point this spinlock has not been initialized yet. Moving it's initialization to an earlier point in __ext4_fill_super() fixes this splat. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix timer use-after-free on failed mountXiaxi Shen1-1/+1
commit 0ce160c5bdb67081a62293028dc85758a8efb22a upstream. Syzbot has found an ODEBUG bug in ext4_fill_super The del_timer_sync function cancels the s_err_report timer, which reminds about filesystem errors daily. We should guarantee the timer is no longer active before kfree(sbi). When filesystem mounting fails, the flow goes to failed_mount3, where an error occurs when ext4_stop_mmpd is called, causing a read I/O failure. This triggers the ext4_handle_error function that ultimately re-arms the timer, leaving the s_err_report timer active before kfree(sbi) is called. Fix the issue by canceling the s_err_report timer after calling ext4_stop_mmpd. Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Reported-and-tested-by: syzbot+59e0101c430934bc9a36@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=59e0101c430934bc9a36 Link: https://patch.msgid.link/20240715043336.98097-1-shenxiaxi26@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix double brelse() the buffer of the extents pathBaokun Li1-0/+1
commit dcaa6c31134c0f515600111c38ed7750003e1b9c upstream. In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been released, otherwise it may be released twice. An example of what triggers this is as follows: split2 map split1 |--------|-------|--------| ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_split_convert_extents // path->p_depth == 0 ext4_split_extent // 1. do split1 ext4_split_extent_at |ext4_ext_insert_extent | ext4_ext_create_new_leaf | ext4_ext_grow_indepth | le16_add_cpu(&neh->eh_depth, 1) | ext4_find_extent | // return -ENOMEM |// get error and try zeroout |path = ext4_find_extent | path->p_depth = 1 |ext4_ext_try_to_merge | ext4_ext_try_to_merge_up | path->p_depth = 0 | brelse(path[1].p_bh) ---> not set to NULL here |// zeroout success // 2. update path ext4_find_extent // 3. do split2 ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth le16_add_cpu(&neh->eh_depth, 1) ext4_find_extent path[0].p_bh = NULL; path->p_depth = 1 read_extent_tree_block ---> return err // path[1].p_bh is still the old value ext4_free_ext_path ext4_ext_drop_refs // path->p_depth == 1 brelse(path[1].p_bh) ---> brelse a buffer twice Finally got the following WARRNING when removing the buffer from lru: ============================================ VFS: brelse: Trying to free free buffer WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 RIP: 0010:__brelse+0x58/0x90 Call Trace: <TASK> __find_get_block+0x6e7/0x810 bdev_getblk+0x2b/0x480 __ext4_get_inode_loc+0x48a/0x1240 ext4_get_inode_loc+0xb2/0x150 ext4_reserve_inode_write+0xb7/0x230 __ext4_mark_inode_dirty+0x144/0x6a0 ext4_ext_insert_extent+0x9c8/0x3230 ext4_ext_map_blocks+0xf45/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ============================================ Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: aovid use-after-free in ext4_ext_insert_extent()Baokun Li1-0/+1
commit a164f3a432aae62ca23d03e6d926b122ee5b860d upstream. As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and cause UAF. Below is a sample trace with dummy values: ext4_ext_insert_extent path = *ppath = 2000 ext4_ext_create_new_leaf(ppath) ext4_find_extent(ppath) path = *ppath = 2000 if (depth > path[0].p_maxdepth) kfree(path = 2000); *ppath = path = NULL; path = kcalloc() = 3000 *ppath = 3000; return path; /* here path is still 2000, UAF! */ eh = path[depth].p_hdr ================================================================== BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330 Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179 CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866 Call Trace: <TASK> ext4_ext_insert_extent+0x26d4/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 [...] Allocated by task 179: ext4_find_extent+0x81c/0x1f70 ext4_ext_map_blocks+0x146/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] Freed by task 179: kfree+0xcb/0x240 ext4_find_extent+0x7c0/0x1f70 ext4_ext_insert_extent+0xa26/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] ================================================================== So use *ppath to update the path to avoid the above problem. Reported-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: drop ppath from ext4_ext_replay_update_ex() to avoid double-freeBaokun Li1-11/+10
commit 5c0f4cc84d3a601c99bc5e6e6eb1cbda542cce95 upstream. When calling ext4_force_split_extent_at() in ext4_ext_replay_update_ex(), the 'ppath' is updated but it is the 'path' that is freed, thus potentially triggering a double-free in the following process: ext4_ext_replay_update_ex ppath = path ext4_force_split_extent_at(&ppath) ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path) ---> path First freed *orig_path = path = NULL ---> null ppath kfree(path) ---> path double-free !!! So drop the unnecessary ppath and use path directly to avoid this problem. And use ext4_find_extent() directly to update path, avoiding unnecessary memory allocation and freeing. Also, propagate the error returned by ext4_find_extent() instead of using strange error codes. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-8-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space()Luis Henriques (SUSE)1-2/+5
commit 972090651ee15e51abfb2160e986fa050cfc7a40 upstream. Function __jbd2_log_wait_for_space() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Don't assume that and invoke jbd2_log_wait_commit() if the journal had a committing transaction instead. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msg