summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-01-18ext4: fix double-free of blocks due to wrong extents moved_lenBaokun Li1-4/+2
In ext4_move_extents(), moved_len is only updated when all moves are successfully executed, and only discards orig_inode and donor_inode preallocations when moved_len is not zero. When the loop fails to exit after successfully moving some extents, moved_len is not updated and remains at 0, so it does not discard the preallocations. If the moved extents overlap with the preallocated extents, the overlapped extents are freed twice in ext4_mb_release_inode_pa() and ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is incremented twice. Hence when trim is executed, a zero-division bug is triggered in mb_update_avg_fragment_size() because bb_free is not zero and bb_fragments is zero. Therefore, update move_len after each extent move to avoid the issue. Reported-by: Wei Chen <harperchen1110@gmail.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base") CC: <stable@vger.kernel.org> # 3.18 Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18exfat: fix zero the unwritten part for dio readYuezhang Mo1-4/+3
For dio read, bio will be leave in flight when a successful partial aio read have been setup, blockdev_direct_IO() will return -EIOCBQUEUED. In the case, iter->iov_offset will be not advanced, the oops reported by syzbot will occur if revert iter->iov_offset with iov_iter_revert(). The unwritten part had been zeroed by aio read, so there is no need to zero it in dio read. Reported-by: syzbot+fd404f6b03a58e8bc403@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fd404f6b03a58e8bc403 Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-19/+37
Pull kvm updates from Paolo Bonzini: "Generic: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures. - Clean up Kconfigs that all KVM architectures were selecting - New functionality around "guest_memfd", a new userspace API that creates an anonymous file and returns a file descriptor that refers to it. guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to switch a memory area between guest_memfd and regular anonymous memory. - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify per-page attributes for a given page of guest memory; right now the only attribute is whether the guest expects to access memory via guest_memfd or not, which in Confidential SVMs backed by SEV-SNP, TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM). x86: - Support for "software-protected VMs" that can use the new guest_memfd and page attributes infrastructure. This is mostly useful for testing, since there is no pKVM-like infrastructure to provide a meaningfully reduced TCB. - Fix a relatively benign off-by-one error when splitting huge pages during CLEAR_DIRTY_LOG. - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE. - Use more generic lockdep assertions in paths that don't actually care about whether the caller is a reader or a writer. - let Xen guests opt out of having PV clock reported as "based on a stable TSC", because some of them don't expect the "TSC stable" bit (added to the pvclock ABI by KVM, but never set by Xen) to be set. - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL. - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always flushes on nested transitions, i.e. always satisfies flush requests. This allows running bleeding edge versions of VMware Workstation on top of KVM. - Sanity check that the CPU supports flush-by-ASID when enabling SEV support. - On AMD machines with vNMI, always rely on hardware instead of intercepting IRET in some cases to detect unmasking of NMIs - Support for virtualizing Linear Address Masking (LAM) - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state prior to refreshing the vPMU model. - Fix a double-overflow PMU bug by tracking emulated counter events using a dedicated field instead of snapshotting the "previous" counter. If the hardware PMC count triggers overflow that is recognized in the same VM-Exit that KVM manually bumps an event count, KVM would pend PMIs for both the hardware-triggered overflow and for KVM-triggered overflow. - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds. - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time. ARM64: - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. Loongarch: - Optimization for memslot hugepage checking - Cleanup and fix some HW/SW timer issues - Add LSX/LASX (128bit/256bit SIMD) support RISC-V: - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Support for reporting steal time along with selftest s390: - Bugfixes Selftests: - Fix an annoying goof where the NX hugepage test prints out garbage instead of the magic token needed to run the test. - Fix build errors when a header is delete/moved due to a missing flag in the Makefile. - Detect if KVM bugged/killed a selftest's VM and print out a helpful message instead of complaining that a random ioctl() failed. - Annotate the guest printf/assert helpers with __printf(), and fix the various bugs that were lurking due to lack of said annotation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits) x86/kvm: Do not try to disable kvmclock if it was not enabled KVM: x86: add missing "depends on KVM" KVM: fix direction of dependency on MMU notifiers KVM: introduce CONFIG_KVM_COMMON KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache RISC-V: KVM: selftests: Add get-reg-list test for STA registers RISC-V: KVM: selftests: Add steal_time test support RISC-V: KVM: selftests: Add guest_sbi_probe_extension RISC-V: KVM: selftests: Move sbi_ecall to processor.c RISC-V: KVM: Implement SBI STA extension RISC-V: KVM: Add support for SBI STA registers RISC-V: KVM: Add support for SBI extension registers RISC-V: KVM: Add SBI STA info to vcpu_arch RISC-V: KVM: Add steal-update vcpu request RISC-V: KVM: Add SBI STA extension skeleton RISC-V: paravirt: Implement steal-time support RISC-V: Add SBI STA extension definitions RISC-V: paravirt: Add skeleton for pv-time support RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr() ...
2024-01-17Merge tag 'ubifs-for-linus-6.8-rc1' of ↵Linus Torvalds5-28/+40
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: "UBI: - Use in-tree fault injection framework and add new injection types - Fix for a memory leak in the block driver UBIFS: - kernel-doc fixes - Various minor fixes" * tag 'ubifs-for-linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: block: fix memleak in ubiblock_create() ubifs: fix kernel-doc warnings mtd: Add several functions to the fail_function list ubi: Reserve sufficient buffer length for the input mask ubi: Add six fault injection type for testing ubi: Split io_failures into write_failure and erase_failure ubi: Use the fault injection framework to enhance the fault injection capability ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path ubifs: Check @c->dirty_[n|p]n_cnt and @c->nroot state under @c->lp_mutex ubifs: describe function parameters ubifs: auth.c: fix kernel-doc function prototype warning ubifs: use crypto_shash_tfm_digest() in ubifs_hmac_wkm()
2024-01-17Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds1-0/+1
Pull fscrypt fix from Eric Biggers: "Fix a bug in my change to how f2fs frees its superblock info (which was part of changing the timing of fscrypt keyring destruction)" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: f2fs: fix double free of f2fs_sb_info
2024-01-17Merge tag 'vfs-6.8-rc1.fixes' of ↵Linus Torvalds1-22/+28
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains two fixes for the current merge window. The listmount changes that you requested and a fix for a fsnotify performance regression: - The proposed listmount changes are currently under my authorship. I wasn't sure whether you'd wanted to be author as the patch wasn't signed off. If you do I'm happy if you just apply your own patch. I've tested the patch with my sh4 cross-build setup. And confirmed that a) the build failure with sh on current upstream is reproducible and that b) the proposed patch fixes the build failure. That should only leave the task of fixing put_user on sh. - The fsnotify regression was caused by moving one of the hooks out of the security hook in preparation for other fsnotify work. This meant that CONFIG_SECURITY would have compiled out the fsnotify hook before but didn't do so now. That lead to up to 6% performance regression in some io_uring workloads that compile all fsnotify and security checks out. Fix this by making sure that the relevant hooks are covered by the already existing CONFIG_FANOTIFY_ACCESS_PERMISSIONS where the relevant hook belongs" * tag 'vfs-6.8-rc1.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: fs: rework listmount() implementation fsnotify: compile out fsnotify permission hooks if !FANOTIFY_ACCESS_PERMISSIONS
2024-01-17Merge tag 'mm-hotfixes-stable-2024-01-12-16-52' of ↵Linus Torvalds1-11/+13
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "For once not mostly MM-related. 17 hotfixes. 10 address post-6.7 issues and the other 7 are cc:stable" * tag 'mm-hotfixes-stable-2024-01-12-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: userfaultfd: avoid huge_zero_page in UFFDIO_MOVE MAINTAINERS: add entry for shrinker selftests: mm: hugepage-vmemmap fails on 64K page size systems mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval mailmap: switch email for Tanzir Hasan mailmap: add old address mappings for Randy kernel/crash_core.c: make __crash_hotplug_lock static efi: disable mirror feature during crashkernel kexec: do syscore_shutdown() in kernel_kexec mailmap: update entry for Manivannan Sadhasivam fs/proc/task_mmu: move mmu notification mechanism inside mm lock mm: zswap: switch maintainers to recently active developers and reviewers scripts/decode_stacktrace.sh: optionally use LLVM utilities kasan: avoid resetting aux_lock lib/Kconfig.debug: disable CONFIG_DEBUG_INFO_BTF for Hexagon MAINTAINERS: update LTP maintainers kdump: defer the insertion of crashkernel resources
2024-01-16cifs: remove redundant variable tcon_existColin Ian King1-3/+3
The variable tcon_exist is being assigned however it is never read, the variable is redundant and can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'tcon_exist' is used in the enclosing expression, the value is never actually readfrom 'tcon_exist' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-16eventfs: Use kcalloc() instead of kzalloc()Erick Archer1-3/+3
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. [1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/linux-trace-kernel/20240115181658.4562-1-erick.archer@gmx.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Erick Archer <erick.archer@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-16eventfs: Do not create dentries nor inodes in iterate_sharedSteven Rostedt (Google)1-15/+5
The original eventfs code added a wrapper around the dcache_readdir open callback and created all the dentries and inodes at open, and increment their ref count. A wrapper was added around the dcache_readdir release function to decrement all the ref counts of those created inodes and dentries. But this proved to be buggy[1] for when a kprobe was created during a dir read, it would create a dentry between the open and the release, and because the release would decrement all ref counts of all files and directories, that would include the kprobe directory that was not there to have its ref count incremented in open. This would cause the ref count to go to negative and later crash the kernel. To solve this, the dentries and inodes that were created and had their ref count upped in open needed to be saved. That list needed to be passed from the open to the release, so that the release would only decrement the ref counts of the entries that were incremented in the open. Unfortunately, the dcache_readdir logic was already using the file->private_data, which is the only field that can be used to pass information from the open to the release. What was done was the eventfs created another descriptor that had a void pointer to save the dcache_readdir pointer, and it wrapped all the callbacks, so that it could save the list of entries that had their ref counts incremented in the open, and pass it to the release. The wrapped callbacks would just put back the dcache_readdir pointer and call the functions it used so it could still use its data[2]. But Linus had an issue with the "hijacking" of the file->private_data (unfortunately this discussion was on a security list, so no public link). Which we finally agreed on doing everything within the iterate_shared callback and leave the dcache_readdir out of it[3]. All the information needed for the getents() could be created then. But this ended up being buggy too[4]. The iterate_shared callback was not the right place to create the dentries and inodes. Even Christian Brauner had issues with that[5]. An attempt was to go back to creating the inodes and dentries at the open, create an array to store the information in the file->private_data, and pass that information to the other callbacks.[6] The difference between that and the original method, is that it does not use dcache_readdir. It also does not up the ref counts of the dentries and pass them. Instead, it creates an array of a structure that saves the dentry's name and inode number. That information is used in the iterate_shared callback, and the array is freed in the dir release. The dentries and inodes created in the open are not used for the iterate_share or release callbacks. Just their names and inode numbers. Linus did not like that either[7] and just wanted to remove the dentries being created in iterate_shared and use the hard coded inode numbers. [ All this while Linus enjoyed an unexpected vacation during the merge window due to lack of power. ] [1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/ [2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home/ [3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org/ [4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/ [5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/ [6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/ [7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-16eventfs: Have the inodes all for files and directories all be the sameSteven Rostedt (Google)1-0/+10
The dentries and inodes are created in the readdir for the sole purpose of getting a consistent inode number. Linus stated that is unnecessary, and that all inodes can have the same inode number. For a virtual file system they are pretty meaningless. Instead use a single unique inode number for all files and one for all directories. Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ajay Kaher <ajay.kaher@broadcom.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-16fs/ntfs3: Use kvfree to free memory allocated by kvmallocKonstantin Komarov4-7/+7
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-01-15erofs: Don't use certain unnecessary folio_*() functionsDavid Howells1-3/+3
Filesystems should use folio->index and folio->mapping, instead of folio_index(folio), folio_mapping() and folio_file_mapping() since they know that it's in the pagecache. Change this automagically with: perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/erofs/*.c perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/erofs/*.c perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/erofs/*.c Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Yue Hu <huyue2@coolpad.com> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: linux-erofs@lists.ozlabs.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240115144635.1931422-1-hsiangkao@linux.alibaba.com
2024-01-15ceph: get rid of passing callbacks in __dentry_leases_walk()Al Viro1-8/+13
__dentry_leases_walk() gets a callback and calls it for a bunch of denties; there are exactly two callers and we already have a flag telling them apart - lwc->dir_lease. Seeing that indirect calls are costly these days, let's get rid of the callback and just call the right function directly. Has a side benefit of saner signatures... [ xiubli: a minor fix in the commit title ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: d_obtain_{alias,root}(ERR_PTR(...)) will do the right thingAl Viro1-2/+0
Clean up the code. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: fix invalid pointer access if get_quota_realm return ERR_PTRWenchao Hao1-17/+22
This issue is reported by smatch that get_quota_realm() might return ERR_PTR but we did not handle it. It's not a immediate bug, while we still should address it to avoid potential bugs if get_quota_realm() is changed to return other ERR_PTR in future. Set ceph_snap_realm's pointer in get_quota_realm()'s to address this issue, the pointer would be set to NULL if get_quota_realm() failed to get struct ceph_snap_realm, so no ERR_PTR would happen any more. [ xiubli: minor code style clean up ] Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: remove duplicated code in ceph_netfs_issue_read()Xiubo Li1-2/+2
When allocating an osd request the libceph.ko will add the 'read_from_replica' flag by default. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: send oldest_client_tid when renewing capsXiubo Li1-5/+17
Update the oldest_client_tid via the session renew caps msg to make sure that the MDSs won't pile up the completed request list in a very large size. [ idryomov: drop inapplicable comment ] Link: https://tracker.ceph.com/issues/63364 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: rename create_session_open_msg() to create_session_full_msg()Xiubo Li1-3/+5
Makes the create session msg helper to be more general and could be used by other ops. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: select FS_ENCRYPTION_ALGS if FS_ENCRYPTIONEric Biggers1-0/+1
The kconfig options for filesystems that support FS_ENCRYPTION are supposed to select FS_ENCRYPTION_ALGS. This is needed to ensure that required crypto algorithms get enabled as loadable modules or builtin as is appropriate for the set of enabled filesystems. Do this for CEPH_FS so that there aren't any missing algorithms if someone happens to have CEPH_FS as their only enabled filesystem that supports encryption. Cc: stable@vger.kernel.org Fixes: f061feda6c54 ("ceph: add fscrypt ioctls and ceph.fscrypt.auth vxattr") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: fix deadlock or deadcode of misusing dget()Xiubo Li1-6/+3
The lock order is incorrect between denty and its parent, we should always make sure that the parent get the lock first. But since this deadcode is never used and the parent dir will always be set from the callers, let's just remove it. Link: https://lore.kernel.org/r/20231116081919.GZ1957730@ZenIV Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: try to allocate a smaller extent map for sparse readXiubo Li3-3/+23
In fscrypt case and for a smaller read length we can predict the max count of the extent map. And for small read length use cases this could save some memories. [ idryomov: squash into a single patch to avoid build break, drop redundant variable in ceph_alloc_sparse_ext_map() ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: reinitialize mds feature bit even when session in openVenky Shankar1-1/+1
Following along the same lines as per the user-space fix. Right now this isn't really an issue with the ceph kernel driver because of the feature bit laginess, however, that can change over time (when the new snaprealm info type is ported to the kernel driver) and depending on the MDS version that's being upgraded can cause message decoding issues - so, fix that early on. Link: http://tracker.ceph.com/issues/63188 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-15ceph: skip reconnecting if MDS is not readyXiubo Li1-1/+2
When MDS closed the session the kclient will send to reconnect to it immediately, but if the MDS just restarted and still not ready yet, such as still in the up:replay state and the sessionmap journal logs hasn't be replayed, the MDS will close the session. And then the kclient could remove the session and later when the mdsmap is in RECONNECT phrase it will skip reconnecting. But the MDS will wait until timeout and then evict the kclient. Just skip sending the reconnection request until the MDS is ready. Link: https://tracker.ceph.com/issues/62489 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-01-14ksmbd: only v2 leases handle the directoryNamjae Jeon1-0/+6
When smb2 leases is disable, ksmbd can send oplock break notification and cause wait oplock break ack timeout. It may appear like hang when accessing a directory. This patch make only v2 leases handle the directory. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-14ksmbd: fix UAF issue in ksmbd_tcp_new_connection()Namjae Jeon4-18/+13
The race is between the handling of a new TCP connection and its disconnection. It leads to UAF on `struct tcp_transport` in ksmbd_tcp_new_connection() function. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-22991 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-14ksmbd: validate mech token in session setupNamjae Jeon3-5/+23
If client send invalid mech token in session setup request, ksmbd validate and make the error if it is invalid. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-22890 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-13erofs: fix inconsistent per-file compression formatGao Xiang2-11/+14
EROFS can select compression algorithms on a per-file basis, and each per-file compression algorithm needs to be marked in the on-disk superblock for initialization. However, syzkaller can generate inconsistent crafted images that use an unsupported algorithmtype for specific inodes, e.g. use MicroLZMA algorithmtype even it's not set in `sbi->available_compr_algs`. This can lead to an unexpected "BUG: kernel NULL pointer dereference" if the corresponding decompressor isn't built-in. Fix this by checking against `sbi->available_compr_algs` for each m_algorithmformat request. Incorrect !erofs_sb_has_compr_cfgs preset bitmap is now fixed together since it was harmless previously. Reported-by: <bugreport@ubisectech.com> Fixes: 8f89926290c4 ("erofs: get compression algorithms directly on mapping") Fixes: 622ceaddb764 ("erofs: lzma compression support") Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20240113150602.1471050-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-01-13fs: rework listmount() implementationChristian Brauner1-22/+28
Linus pointed out that there's error handling and naming issues in the that we should rewrite: * Perform the access checks for the buffer before actually doing any work instead of doing it during the iteration. * Rename the arguments to listmount() and do_listmount() to clarify what the arguments are used for. * Get rid of the pointless ctr variable and overflow checking. * Get rid of the pointless speculation check. Link: https://lore.kernel.org/r/CAHk-=wjh6Cypo8WC-McXgSzCaou3UXccxB+7PVeSuGR8AjCphg@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-12f2fs: fix double free of f2fs_sb_infoEric Biggers1-0/+1
kill_f2fs_super() is called even if f2fs_fill_super() fails. f2fs_fill_super() frees the struct f2fs_sb_info, so it must set sb->s_fs_info to NULL to prevent it from being freed again. Fixes: 275dca4630c1 ("f2fs: move release of block devices to after kill_block_super()") Reported-by: <syzbot+8f477ac014ff5b32d81f@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/lkml/0000000000006cb174060ec34502@google.com Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/linux-f2fs-devel/20240113005747.38887-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-01-12Merge tag 'exfat-for-6.8-rc1' of ↵Linus Torvalds5-92/+335
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Replace the internal table lookup algorithm with the hweight library and ffs of the bitops library. - Handle the two types of stream entry, valid data size (has been written) and data size separately. It improves compatibility with two differently sized files created on Windows. * tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: do not zero the extended part exfat: change to get file size from DataLength exfat: using ffs instead of internal logic exfat: using hweight instead of internal logic
2024-01-12Merge tag 'pull-bcachefs-fix' of ↵Linus Torvalds2-17/+30
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bcachefs locking fix from Al Viro: "Fix broken locking in bch2_ioctl_subvolume_destroy()" * tag 'pull-bcachefs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bch2_ioctl_subvolume_destroy(): fix locking new helper: user_path_locked_at()
2024-01-12fs/proc/task_mmu: move mmu notification mechanism inside mm lockMuhammad Usama Anjum1-11/+13
Move mmu notification mechanism inside mm lock to prevent race condition in other components which depend on it. The notifier will invalidate memory range. Depending upon the number of iterations, different memory ranges would be invalidated. The following warning would be removed by this patch: WARNING: CPU: 0 PID: 5067 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 kvm_mmu_notifier_change_pte+0x860/0x960 arch/x86/kvm/../../../virt/kvm/kvm_main.c:734 There is no behavioural and performance change with this patch when there is no component registered with the mmu notifier. [akpm@linux-foundation.org: narrow the scope of `range', per Sean] Link: https://lkml.kernel.org/r/20240109112445.590736-1-usama.anjum@collabora.com Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reported-by: syzbot+81227d2bd69e9dedb802@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/000000000000f6d051060c6785bc@google.com/ Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Andrei Vagin <avagin@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-11Merge tag 'f2fs-for-6.8-rc1' of ↵Linus Torvalds13-254/+260
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this series, we've some progress to support Zoned block device regarding to the power-cut recovery flow and enabling checkpoint=disable feature which is essential for Android OTA. Other than that, some patches touched sysfs entries and tracepoints which are minor, while several bug fixes on error handlers and compression flows are good to improve the overall stability. Enhancements: - enable checkpoint=disable for zoned block device - sysfs entries such as discard status, discard_io_aware, dir_level - tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(), f2fs_new_inode() - use shared inode lock during f2fs_fiemap() and f2fs_seek_block() Bug fixes: - address some power-cut recovery issues on zoned block device - handle errors and logics on do_garbage_collect(), f2fs_reserve_new_block(), f2fs_move_file_range(), f2fs_recover_xattr_data() - don't set FI_PREALLOCATED_ALL for partial write - fix to update iostat correctly in f2fs_filemap_fault() - fix to wait on block writeback for post_read case - fix to tag gcing flag on page during block migration - restrict max filesize for 16K f2fs - fix to avoid dirent corruption - explicitly null-terminate the xattr list There are also several clean-up patches to remove dead codes and better readability" * tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (33 commits) f2fs: show more discard status by sysfs f2fs: Add error handling for negative returns from do_garbage_collect f2fs: Constrain the modification range of dir_level in the sysfs f2fs: Use wait_event_freezable_timeout() for freezable kthread f2fs: fix to check return value of f2fs_recover_xattr_data f2fs: don't set FI_PREALLOCATED_ALL for partial write f2fs: fix to update iostat correctly in f2fs_filemap_fault() f2fs: fix to check compress file in f2fs_move_file_range() f2fs: fix to wait on block writeback for post_read case f2fs: fix to tag gcing flag on page during block migration f2fs: add tracepoint for f2fs_vm_page_mkwrite() f2fs: introduce f2fs_invalidate_internal_cache() for cleanup f2fs: update blkaddr in __set_data_blkaddr() for cleanup f2fs: introduce get_dnode_addr() to clean up codes f2fs: delete obsolete FI_DROP_CACHE f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN f2fs: Restrict max filesize for 16K f2fs f2fs: let's finish or reset zones all the time f2fs: check write pointers when checkpoint=disable f2fs: fix write pointers on zoned device after roll forward ...
2024-01-11Merge tag '6.8-rc-smb-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds8-58/+70
Pull smb server updates from Steve French: - memory allocation fix - three lease fixes, including important rename fix - read only share fix - thread freeze fix - three cleanup fixes (two kernel doc related) - locking fix in setting EAs - packet header validation fix * tag '6.8-rc-smb-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Add missing set_freezable() for freezable kthread ksmbd: free ppace array on error in parse_dacl ksmbd: send lease break notification on FILE_RENAME_INFORMATION ksmbd: don't allow O_TRUNC open on read-only share ksmbd: vfs: fix all kernel-doc warnings ksmbd: auth: fix most kernel-doc warnings ksmbd: Remove usage of the deprecated ida_simple_xx() API ksmbd: don't increment epoch if current state and request state are same ksmbd: fix potential circular locking issue in smb2_set_ea() ksmbd: set v2 lease version on lease upgrade ksmbd: validate the zero field of packet header
2024-01-11Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds14-96/+18
Pull misc filesystem updates from Al Viro: "Misc cleanups (the part that hadn't been picked by individual fs trees)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: apparmorfs: don't duplicate kfree_link() orangefs: saner arguments passing in readdir guts ocfs2_find_match(): there's no such thing as NULL or negative ->d_parent reiserfs_add_entry(): get rid of pointless namelen checks __ocfs2_add_entry(), ocfs2_prepare_dir_for_insert(): namelen checks ext4_add_entry(): ->d_name.len is never 0 befs: d_obtain_alias(ERR_PTR(...)) will do the right thing affs: d_obtain_alias(ERR_PTR(...)) will do the right thing /proc/sys: use d_splice_alias() calling conventions to simplify failure exits hostfs: use d_splice_alias() calling conventions to simplify failure exits udf_fiiter_add_entry(): check for zero ->d_name.len is bogus... udf: d_obtain_alias(ERR_PTR(...)) will do the right thing... udf: d_splice_alias() will do the right thing on ERR_PTR() inode nfsd: kill stale comment about simple_fill_super() requirements bfs_add_entry(): get rid of pointless ->d_name.len checks nilfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing... zonefs: d_splice_alias() will do the right thing on ERR_PTR() inode
2024-01-11Merge tag 'pull-dcache' of ↵Linus Torvalds14-570/+311
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ...
2024-01-11Merge tag 'pull-rename' of ↵Linus Torvalds17-126/+167
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change
2024-01-11Merge tag 'pull-minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds2-57/+38
Pull minixfs updates from Al Viro: "minixfs kmap_local_page() switchover and related fixes - very similar to sysv series" * tag 'pull-minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: minixfs: switch to kmap_local_page() minixfs: Use dir_put_page() in minix_unlink() and minix_rename() minixfs: change the signature of dir_get_page() minixfs: use offset_in_page()
2024-01-11Merge tag 'docs-6.8' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull documentation update from Jonathan Corbet: "Another moderately busy cycle for documentation, including: - The minimum Sphinx requirement has been raised to 2.4.4, following a warning that was added in 6.2 - Some reworking of the Documentation/process front page to, hopefully, make it more useful - Various kernel-doc tweaks to, for example, make it deal properly with __counted_by annotations - We have also restored a warning for documentation of nonexistent structure members that disappeared a while back. That had the delightful consequence of adding some 600 warnings to the docs build. A sustained effort by Randy, Vegard, and myself has addressed almost all of those, bringing the documentation back into sync with the code. The fixes are going through the appropriate maintainer trees - Various improvements to the HTML rendered docs, including automatic links to Git revisions and a nice new pulldown to make translations easy to access - Speaking of translations, more of those for Spanish and Chinese ... plus the usual stream of documentation updates and typo fixes" * tag 'docs-6.8' of git://git.lwn.net/linux: (57 commits) MAINTAINERS: use tabs for indent of CONFIDENTIAL COMPUTING THREAT MODEL A reworked process/index.rst ring-buffer/Documentation: Add documentation on buffer_percent file Translated the RISC-V architecture boot documentation. Docs: remove mentions of fdformat from util-linux Docs/zh_CN: Fix the meaning of DEBUG to pr_debug() Documentation: move driver-api/dcdbas to userspace-api/ Documentation: move driver-api/isapnp to userspace-api/ Documentation/core-api : fix typo in workqueue Documentation/trace: Fixed typos in the ftrace FLAGS section kernel-doc: handle a void function without producing a warning scripts/get_abi.pl: ignore some temp files docs: kernel_abi.py: fix command injection scripts/get_abi: fix source path leak CREDITS, MAINTAINERS, docs/process/howto: Update man-pages' maintainer docs: translations: add translations links when they exist kernel-doc: Align quick help and the code MAINTAINERS: add reviewer for Spanish translations docs: ignore __counted_by attribute in structure definitions scripts: kernel-doc: Clarify missing struct member description ..
2024-01-12btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_argsQu Wenruo1-0/+4