summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-11-14smb: client: leave only string and id in smb_version_valueshw24Enzo Matsumiya9-262/+101
Remove all other values as they're the same for SMB 3.0+. For SMB 2.1 handle the only 2 different values (.req_capabilities and .create_lease_size) in-place. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->calc_signature opEnzo Matsumiya5-52/+37
Make smb{2,3}_calc_signature() static as it's only called inside smb2transport.c. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->*transform* opsEnzo Matsumiya5-38/+30
- Add is_transform_hdr() and try_decrypt() functions to connect.c - Expose smb3_init_transform_rq() and smb3_receive_transform() in smb2proto.h Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->fallocate opEnzo Matsumiya4-13/+17
Expose smb3_fallocate() and use it directly instead. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->duplicate_extents opEnzo Matsumiya5-16/+9
Expose it in smb2proto.h and update comment in trace.h. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->parse_lease_buf opEnzo Matsumiya3-35/+21
Make it static in smb2pdu.c as it's used only in a single place. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->create_lease_buf opEnzo Matsumiya3-71/+48
Add a generic function for both create_lease v1 and v2. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->set_oplock_level opEnzo Matsumiya2-59/+58
Add set_oplock_level() that chooses the correct function based if SMB 2.1 or >= 3.0. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->set_integrity opEnzo Matsumiya4-12/+6
Call smb3_set_integrity() directly if protocol is >= 3.0. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->query_server_interfaces opEnzo Matsumiya6-29/+12
Call SMB3_request_interfaces() directly instead. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove ->validate_negotiate opEnzo Matsumiya4-21/+11
Make it static as it's only used in a single place. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove md5 secmech (SMB1)Enzo Matsumiya2-2/+0
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: remove some leftover bits from SMB1 codeEnzo Matsumiya2-5/+4
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: update Kconfig to remove old infoEnzo Matsumiya1-23/+3
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: replace function pointers with smb{2,3}_* functionsEnzo Matsumiya21-670/+423
Continuation of previous commit. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-14smb: client: replace function pointers of common operations for theHenrique Carvalho23-888/+405
common function Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
2024-11-12smb: client: remove big chunk of SMB1-only codeEnzo Matsumiya20-5067/+96
Remove no longer needed files, structs, macros, functions that were only used by SMB1. TODO: there are probably leftovers, so remove them as they're found. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-12smb: client: delete cifssmb.c and remove deleted functions fromHenrique Carvalho2-5665/+0
cifsproto.h Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
2024-11-12smb: client: remove support for CONFIG_CIFS_POSIXHenrique Carvalho4-589/+0
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
2024-11-12smb: remove CIFS_ALLOW_INSECURE_LEGACY and CIFS_POSIX from KconfigHenrique Carvalho1-27/+0
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
2024-11-12smb: delete smb1ops.c and remove CIFS_ALLOW_INSECURE_LEGACY fromHenrique Carvalho2-1195/+0
Makefile Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
2024-11-12smb: client: remove support for SMB 1.0 and 2.0 mountsEnzo Matsumiya10-168/+82
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
2024-11-12smb: remove support for CONFIG_CIFS_ALLOW_INSECURE_LEGACYHenrique Carvalho13-3182/+2
This commit removes all code guarded by the macro CONFIG_CIFS_ALLOW_INSECURE_LEGACY as a first step of the cifs module cleanup. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
2024-11-08fs/ntfs3: Sequential field availability check in mi_enum_attr()Konstantin Komarov1-8/+7
commit 090f612756a9720ec18b0b130e28be49839d7cb5 upstream. The code is slightly reformatted to consistently check field availability without duplication. Fixes: 556bdf27c2dd ("ntfs3: Add bounds checking to mi_enum_attr()") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08btrfs: fix defrag not merging contiguous extents due to merged extent mapsFilipe Manana1-5/+5
[ Upstream commit 77b0d113eec49a7390ff1a08ca1923e89f5f86c6 ] When running defrag (manual defrag) against a file that has extents that are contiguous and we already have the respective extent maps loaded and merged, we end up not defragging the range covered by those contiguous extents. This happens when we have an extent map that was the result of merging multiple extent maps for contiguous extents and the length of the merged extent map is greater than or equals to the defrag threshold length. The script below reproduces this scenario: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a 256K file with 4 extents of 64K each. xfs_io -f -c "falloc 0 64K" \ -c "pwrite 0 64K" \ -c "falloc 64K 64K" \ -c "pwrite 64K 64K" \ -c "falloc 128K 64K" \ -c "pwrite 128K 64K" \ -c "falloc 192K 64K" \ -c "pwrite 192K 64K" \ $MNT/foo umount $MNT echo -n "Initial number of file extent items: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 128K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 128K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 256K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 256K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l Running it: $ ./test.sh Initial number of file extent items: 4 Number of file extent items after defrag with 128K threshold: 4 Number of file extent items after defrag with 256K threshold: 4 The 4 extents don't get merged because we have an extent map with a size of 256K that is the result of merging the individual extent maps for each of the four 64K extents and at defrag_lookup_extent() we have a value of zero for the generation threshold ('newer_than' argument) since this is a manual defrag. As a consequence we don't call defrag_get_extent() to get an extent map representing a single file extent item in the inode's subvolume tree, so we end up using the merged extent map at defrag_collect_targets() and decide not to defrag. Fix this by updating defrag_lookup_extent() to always discard extent maps that were merged and call defrag_get_extent() regardless of the minimum generation threshold ('newer_than' argument). A test case for fstests will be sent along soon. CC: stable@vger.kernel.org # 6.1+ Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08btrfs: fix extent map merging not happening for adjacent extentsFilipe Manana1-1/+6
[ Upstream commit a0f0625390858321525c2a8d04e174a546bd19b3 ] If we have 3 or more adjacent extents in a file, that is, consecutive file extent items pointing to adjacent extents, within a contiguous file range and compatible flags, we end up not merging all the extents into a single extent map. For example: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \ -c "pwrite -b 64K 64K 64K" \ -c "pwrite -b 64K 128K 64K" \ -c "pwrite -b 64K 192K 64K" \ /mnt/sdc/foo After all the ordered extents complete we unpin the extent maps and try to merge them, but instead of getting a single extent map we get two because: 1) When the first ordered extent completes (file range [0, 64K)) we unpin its extent map and attempt to merge it with the extent map for the range [64K, 128K), but we can't because that extent map is still pinned; 2) When the second ordered extent completes (file range [64K, 128K)), we unpin its extent map and merge it with the previous extent map, for file range [0, 64K), but we can't merge with the next extent map, for the file range [128K, 192K), because this one is still pinned. The merged extent map for the file range [0, 128K) gets the flag EXTENT_MAP_MERGED set; 3) When the third ordered extent completes (file range [128K, 192K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [0, 128K), but we can't because that extent map has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false due to different flags) while the extent map for the range [128K, 192K) doesn't have that flag set. We also can't merge it with the next extent map, for file range [192K, 256K), because that one is still pinned. At this moment we have 3 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 192K). One for file range [192K, 256K) which is still pinned; 4) When the fourth and final extent completes (file range [192K, 256K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [128K, 192K), which succeeds since none of these extent maps have the EXTENT_MAP_MERGED flag set. So we end up with 2 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set. Since after merging extent maps we don't attempt to merge again, that is, merge the resulting extent map with the one that is now preceding it (and the one following it), we end up with those two extent maps, when we could have had a single extent map to represent the whole file. Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag. While this doesn't present any functional issue, it prevents the merging of extent maps which allows to save memory, and can make defrag not merging extents too (that will be addressed in the next patch). Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08xfs: fix finding a last resort AG in xfs_filestream_pick_agChristoph Hellwig2-21/+17
[ Upstream commit dc60992ce76fbc2f71c2674f435ff6bde2108028 ] When the main loop in xfs_filestream_pick_ag fails to find a suitable AG it tries to just pick the online AG. But the loop for that uses args->pag as loop iterator while the later code expects pag to be set. Fix this by reusing the max_pag case for this last resort, and also add a check for impossible case of no AG just to make sure that the uninitialized pag doesn't even escape in theory. Reported-by: syzbot+4125a3c514e3436a02e6@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: syzbot+4125a3c514e3436a02e6@syzkaller.appspotmail.com Fixes: f8f1ed1ab3baba ("xfs: return a referenced perag from filestreams allocator") Cc: <stable@vger.kernel.org> # v6.3 Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()Zhihao Cheng1-0/+1
[ Upstream commit aec8e6bf839101784f3ef037dcdb9432c3f32343 ] Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable@vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08ocfs2: pass u64 to ocfs2_truncate_inline maybe overflowEdward Adam Davis1-0/+8
[ Upstream commit bc0a2f3a73fcdac651fca64df39306d1e5ebe3b0 ] Syzbot reported a kernel BUG in ocfs2_truncate_inline. There are two reasons for this: first, the parameter value passed is greater than ocfs2_max_inline_data_with_xattr, second, the start and end parameters of ocfs2_truncate_inline are "unsigned int". So, we need to add a sanity check for byte_start and byte_len right before ocfs2_truncate_inline() in ocfs2_remove_inode_range(), if they are greater than ocfs2_max_inline_data_with_xattr return -EINVAL. Link: https://lkml.kernel.org/r/tencent_D48DB5122ADDAEDDD11918CFB68D93258C07@qq.com Fixes: 1afc32b95233 ("ocfs2: Write support for inline data") Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reported-by: syzbot+81092778aac03460d6b7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fork: do not invoke uffd on fork if error occursLorenzo Stoakes1-0/+28
[ Upstream commit f64e67e5d3a45a4a04286c47afade4b518acd47b ] Patch series "fork: do not expose incomplete mm on fork". During fork we may place the virtual memory address space into an inconsistent state before the fork operation is complete. In addition, we may encounter an error during the fork operation that indicates that the virtual memory address space is invalidated. As a result, we should not be exposing it in any way to external machinery that might interact with the mm or VMAs, machinery that is not designed to deal with incomplete state. We specifically update the fork logic to defer khugepaged and ksm to the end of the operation and only to be invoked if no error arose, and disallow uffd from observing fork events should an error have occurred. This patch (of 2): Currently on fork we expose the virtual address space of a process to userland unconditionally if uffd is registered in VMAs, regardless of whether an error arose in the fork. This is performed in dup_userfaultfd_complete() which is invoked unconditionally, and performs two duties - invoking registered handlers for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down userfaultfd_fork_ctx objects established in dup_userfaultfd(). This is problematic, because the virtual address space may not yet be correctly initialised if an error arose. The change in commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") makes this more pertinent as we may be in a state where entries in the maple tree are not yet consistent. We address this by, on fork error, ensuring that we roll back state that we would otherwise expect to clean up through the event being handled by userland and perform the memory freeing duty otherwise performed by dup_userfaultfd_complete(). We do this by implementing a new function, dup_userfaultfd_fail(), which performs the same loop, only decrementing reference counts. Note that we perform mmgrab() on the parent and child mm's, however userfaultfd_ctx_put() will mmdrop() this once the reference count drops to zero, so we will avoid memory leaks correctly here. Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: Jann Horn <jannh@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08btrfs: fix error propagation of split biosNaohiro Aota2-24/+16
[ Upstream commit d48e1dea3931de64c26717adc2b89743c7ab6594 ] The purpose of btrfs_bbio_propagate_error() shall be propagating an error of split bio to its original btrfs_bio, and tell the error to the upper layer. However, it's not working well on some cases. * Case 1. Immediate (or quick) end_bio with an error When btrfs sends btrfs_bio to mirrored devices, btrfs calls btrfs_bio_end_io() when all the mirroring bios are completed. If that btrfs_bio was split, it is from btrfs_clone_bioset and its end_io function is btrfs_orig_write_end_io. For this case, btrfs_bbio_propagate_error() accesses the orig_bbio's bio context to increase the error count. That works well in most cases. However, if the end_io is called enough fast, orig_bbio's (remaining part after split) bio context may not be properly set at that time. Since the bio context is set when the orig_bbio (the last btrfs_bio) is sent to devices, that might be too late for earlier split btrfs_bio's completion. That will result in NULL pointer dereference. That bug is easily reproducible by running btrfs/146 on zoned devices [1] and it shows the following trace. [1] You need raid-stripe-tree feature as it create "-d raid0 -m raid1" FS. BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 UID: 0 PID: 13 Comm: kworker/u32:1 Not tainted 6.11.0-rc7-BTRFS-ZNS+ #474 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-btrfs-5) RIP: 0010:btrfs_bio_end_io+0xae/0xc0 [btrfs] BTRFS error (device dm-0): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0 RSP: 0018:ffffc9000006f248 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888005a7f080 RCX: ffffc9000006f1dc RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff888005a7f080 RBP: ffff888011dfc540 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff82e508e0 R11: 0000000000000005 R12: ffff88800ddfbe58 R13: ffff888005a7f080 R14: ffff888005a7f158 R15: ffff888005a7f158 FS: 0000000000000000(0000) GS:ffff88803ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 0000000002e22006 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body.cold+0x19/0x26 ? page_fault_oops+0x13e/0x2b0 ? _printk+0x58/0x73 ? do_user_addr_fault+0x5f/0x750 ? exc_page_fault+0x76/0x240 ? asm_exc_page_fault+0x22/0x30 ? btrfs_bio_end_io+0xae/0xc0 [btrfs] ? btrfs_log_dev_io_error+0x7f/0x90 [btrfs] btrfs_orig_write_end_io+0x51/0x90 [btrfs] dm_submit_bio+0x5c2/0xa50 [dm_mod] ? find_held_lock+0x2b/0x80 ? blk_try_enter_queue+0x90/0x1e0 __submit_bio+0xe0/0x130 ? ktime_get+0x10a/0x160 ? lockdep_hardirqs_on+0x74/0x100 submit_bio_noacct_nocheck+0x199/0x410 btrfs_submit_bio+0x7d/0x150 [btrfs] btrfs_submit_chunk+0x1a1/0x6d0 [btrfs] ? lockdep_hardirqs_on+0x74/0x100 ? __folio_start_writeback+0x10/0x2c0 btrfs_submit_bbio+0x1c/0x40 [btrfs] submit_one_bio+0x44/0x60 [btrfs] submit_extent_folio+0x13f/0x330 [btrfs] ? btrfs_set_range_writeback+0xa3/0xd0 [btrfs] extent_writepage_io+0x18b/0x360 [btrfs] extent_write_locked_range+0x17c/0x340 [btrfs] ? __pfx_end_bbio_data_write+0x10/0x10 [btrfs] run_delalloc_cow+0x71/0xd0 [btrfs] btrfs_run_delalloc_range+0x176/0x500 [btrfs] ? find_lock_delalloc_range+0x119/0x260 [btrfs] writepage_delalloc+0x2ab/0x480 [btrfs] extent_write_cache_pages+0x236/0x7d0 [btrfs] btrfs_writepages+0x72/0x130 [btrfs] do_writepages+0xd4/0x240 ? find_held_lock+0x2b/0x80 ? wbc_attach_and_unlock_inode+0x12c/0x290 ? wbc_attach_and_unlock_inode+0x12c/0x290 __writeback_single_inode+0x5c/0x4c0 ? do_raw_spin_unlock+0x49/0xb0 writeback_sb_inodes+0x22c/0x560 __writeback_inodes_wb+0x4c/0xe0 wb_writeback+0x1d6/0x3f0 wb_workfn+0x334/0x520 process_one_work+0x1ee/0x570 ? lock_is_held_type+0xc6/0x130 worker_thread+0x1d1/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xee/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: dm_mod btrfs blake2b_generic xor raid6_pq rapl CR2: 0000000000000020 * Case 2. Earlier completion of orig_bbio for mirrored btrfs_bios btrfs_bbio_propagate_error() assumes the end_io function for orig_bbio is called last among split bios. In that case, btrfs_orig_write_end_io() sets the bio->bi_status to BLK_STS_IOERR by seeing the bioc->error [2]. Otherwise, the increased orig_bio's bioc->error is not checked by anyone and return BLK_STS_OK to the upper layer. [2] Actually, this is not true. Because we only increases orig_bioc->errors by max_errors, the condition "atomic_read(&bioc->error) > bioc->max_errors" is still not met if only one split btrfs_bio fails. * Case 3. Later completion of orig_bbio for un-mirrored btrfs_bios In contrast to the above case, btrfs_bbio_propagate_error() is not working well if un-mirrored orig_bbio is completed last. It sets orig_bbio->bio.bi_status to the btrfs_bio's error. But, that is easily over-written by orig_bbio's completion status. If the status is BLK_STS_OK, the upper layer would not know the failure. * Solution Considering the above cases, we can only save the error status in the orig_bbio (remaining part after split) itself as it is always available. Also, the saved error status should be propagated when all the split btrfs_bios are finished (i.e, bbio->pending_ios == 0). This commit introduces "status" to btrfs_bbio and saves the first error of split bios to original btrfs_bio's "status" variable. When all the split bios are finished, the saved status is loaded into original btrfs_bio's status. With this commit, btrfs/146 on zoned devices does not hit the NULL pointer dereference anymore. Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios") CC: stable@vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08btrfs: merge btrfs_orig_bbio_end_io() into btrfs_bio_end_io()Qu Wenruo1-18/+11
[ Upstream commit 9ca0e58cb752b09816f56f7a3147a39773d5e831 ] There are only two differences between the two functions: - btrfs_orig_bbio_end_io() does extra error propagation This is mostly to allow tolerance for write errors. - btrfs_orig_bbio_end_io() does extra pending_ios check This check can handle both the original bio, or the cloned one. (All accounting happens in the original one). This makes btrfs_orig_bbio_end_io() a much safer call. In fact we already had a double freeing error due to usage of btrfs_bio_end_io() in the error path of btrfs_submit_chunk(). So just move the whole content of btrfs_orig_bbio_end_io() into btrfs_bio_end_io(). For normal paths this brings no change, because they are already calling btrfs_orig_bbio_end_io() in the first place. For error paths (not only inside bio.c but also external callers), this change will introduce extra checks, especially for external callers, as they will error out without submitting the btrfs bio. But considering it's already in the error path, such slower but much safer checks are still an overall win. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Stable-dep-of: d48e1dea3931 ("btrfs: fix error propagation of split bios") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08nilfs2: fix potential deadlock with newly created symlinksRyusuke Konishi1-0/+3
commit b3a033e3ecd3471248d474ef263aadc0059e516a upstream. Syzbot reported that page_symlink(), called by nilfs_symlink(), triggers memory reclamation involving the filesystem layer, which can result in circular lock dependencies among the reader/writer semaphore nilfs->ns_segctor_sem, s_writers percpu_rwsem (intwrite) and the fs_reclaim pseudo lock. This is because after commit 21fc61c73c39 ("don't put symlink bodies in pagecache into highmem"), the gfp flags of the page cache for symbolic links are overwritten to GFP_KERNEL via inode_nohighmem(). This is not a problem for symlinks read from the backing device, because the __GFP_FS flag is dropped after inode_nohighmem() is called. However, when a new symlink is created with nilfs_symlink(), the gfp flags remain overwritten to GFP_KERNEL. Then, memory allocation called from page_symlink() etc. triggers memory reclamation including the FS layer, which may call nilfs_evict_inode() or nilfs_dirty_inode(). And these can cause a deadlock if they are called while nilfs->ns_segctor_sem is held: Fix this issue by dropping the __GFP_FS flag from the page cache GFP flags of newly created symlinks in the same way that nilfs_new_inode() and __nilfs_read_inode() do, as a workaround until we adopt nofs allocation scope consistently or improve the locking constraints. Link: https://lkml.kernel.org/r/20241020050003.4308-1-konishi.ryusuke@gmail.com Fixes: 21fc61c73c39 ("don't put symlink bodies in pagecache into highmem") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9ef37ac20608f4836256 Tested-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08nilfs2: fix kernel bug due to missing clearing of checked flagRyusuke Konishi1-0/+1
commit 41e192ad2779cae0102879612dfe46726e4396aa upstream. Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08NFSD: Never decrement pending_async_copies on errorChuck Lever1-3/+1
[ Upstream commit 8286f8b622990194207df9ab852e0f87c60d35e9 ] The error flow in nfsd4_copy() calls cleanup_async_copy(), which already decrements nn->pending_async_copies. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08NFSD: Initialize struct nfsd4_copy earlierChuck Lever1-2/+2
[ Upstream commit 63fab04cbd0f96191b6e5beedc3b643b01c15889 ] Ensure the refcount and async_copies fields are initialized early. cleanup_async_copy() will reference these fields if an error occurs in nfsd4_copy(). If they are not correctly initialized, at the very least, a refcount underflow occurs. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations") Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08NFS: remove revoked delegation from server's delegation listDai Ngo1-0/+5
[ Upstream commit 7ef60108069b7e3cc66432304e1dd197d5c0a9b5 ] After the delegation is returned to the NFS server remove it from the server's delegations list to reduce the time it takes to scan this list. Network trace captured while running the below script shows the time taken to service the CB_RECALL increases gradually due to the overhead of traversing the delegation list in nfs_delegation_find_inode_server. The NFS server in this test is a Solaris server which issues CB_RECALL when receiving the all-zero stateid in the SETATTR. mount=/mnt/data for i in $(seq 1 20) do echo $i mkdir $mount/testtarfile$i time tar -C $mount/testtarfile$i -xf 5000_files.tar done Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08cifs: Fix creating native symlinks pointing to current or parent directoryPali Rohár1-3/+14
[ Upstream commit 63271b7d569fbe924bccc7dadc17d3d07a4e5f7a ] Calling 'ln -s . symlink' or 'ln -s .. symlink' creates symlink pointing to some object name which ends with U+F029 unicode codepoint. This is because trailing dot in the object name is replaced by non-ASCII unicode codepoint. So Linux SMB client currently is not able to create native symlink pointing to current or parent directory on Windows SMB server which can be read by either on local Windows server or by any other SMB client which does not implement compatible-reverse character replacement. Fix this problem in cifsConvertToUTF16() function which is doing that character replacement. Function comment already says that it does not need to handle special cases '.' and '..', but after introduction of native symlinks in reparse point form, this handling is needed. Note that this change depends on the previous change "cifs: Improve creating native symlinks pointing to directory". Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08cifs: Improve creating native symlinks pointing to directoryPali Rohár3-4/+164
[ Upstream commit 3eb40512530e4f64f819d8e723b6f41695dace5a ] SMB protocol for native symlinks distinguish between symlink to directory and symlink to file. These two symlink types cannot be exchanged, which means that symlink of file type pointing to directory cannot be resolved at all (and vice-versa). Windows follows this rule for local filesystems (NTFS) and also for SMB. Linux SMB client currenly creates all native symlinks of file type. Which means that Windows (and some other SMB clients) cannot resolve symlinks pointing to directory created by Linux SMB client. As Linux system does not distinguish between directory and file symlinks, its API does not provide enough information for Linux SMB client during creating of native symlinks. Add some heuristic into the Linux SMB client for choosing the correct symlink type during symlink creation. Check if the symlink target location ends with slash, or last path component is dot or dot-dot, and check if the target location on SMB share exists and is a directory. If at least one condition is truth then create a new SMB symlink of directory type. Otherwise create it as file type symlink. This change improves interoperability with Windows systems. Windows systems would be able to resolve more SMB symlinks created by Linux SMB client which points to existing directory. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Additional check in ntfs_file_releaseKonstantin Komarov1-1/+8
[ Upstream commit 031d6f608290c847ba6378322d0986d08d1a645a ] Reported-by: syzbot+8c652f14a0fde76ff11d@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Fix general protection fault in run_is_mapped_fullKonstantin Komarov1-1/+4
[ Upstream commit a33fb016e49e37aafab18dc3c8314d6399cb4727 ] Fixed deleating of a non-resident attribute in ntfs_create_inode() rollback. Reported-by: syzbot+9af29acd8f27fbce94bc@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Additional check in ni_clear()Konstantin Komarov1-1/+3
[ Upstream commit d178944db36b3369b78a08ba520de109b89bf2a9 ] Checking of NTFS_FLAGS_LOG_REPLAYING added to prevent access to uninitialized bitmap during replay process. Reported-by: syzbot+3bfd2cc059ab93efcdb4@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Fix possible deadlock in mi_readKonstantin Komarov1-1/+1
[ Upstream commit 03b097099eef255fbf85ea6a786ae3c91b11f041 ] Mutex lock with another subclass used in ni_lock_dir(). Reported-by: syzbot+bc7ca0ae4591cb2550f9@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Add rough attr alloc_size checkKonstantin Komarov1-0/+3
[ Upstream commit c4a8ba334262e9a5c158d618a4820e1b9c12495c ] Reported-by: syzbot+c6d94bedd910a8216d25@syzkaller.appspotmail.com Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Stale inode instead of badKonstantin Komarov1-3/+7
[ Upstream commit 1fd21919de6de245b63066b8ee3cfba92e36f0e9 ] Fixed the logic of processing inode with wrong sequence number. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Fix warning possible deadlock in ntfs_set_stateKonstantin Komarov1-1/+1
[ Upstream commit 5b2db723455a89dc96743d34d8bdaa23a402db2f ] Use non-zero subkey to skip analyzer warnings. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Reported-by: syzbot+c2ada45c23d98d646118@syzkaller.appspotmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08fs/ntfs3: Check if more than chunk-size bytes are writtenAndrew Ballance1-0/+3
[ Upstream commit 9931122d04c6d431b2c11b5bb7b10f28584067f0 ] A incorrectly formatted chunk may decompress into more than LZNT_CHUNK_SIZE bytes and a index out of bounds will occur in s_max_off. Signed-off-by: Andrew Ballance <andrewjballance@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08ntfs3: Add bounds checking to mi_enum_attr()lei lu1-13/+10
[ Upstream commit 556bdf27c2dd5c74a9caacbe524b943a6cd42d99 ] Added bounds checking to make sure that every attr don't stray beyond valid memory region. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08smb: client: set correct device number on nfs reparse pointsPaulo Alcantara1-2/+2
[ Upstream commit a9de67336a4aa3ff2e706ba023fb5f7ff681a954 ] Fix major and minor numbers set on special files created with NFS reparse points. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08smb: client: fix parsing of device numbersPaulo Alcantara2-11/+4
[ Upstream commit 663f295e35594f4c2584fc68c28546b747b637cd ] Report correct major and minor numbers from special files created with NFS reparse points. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>