path: root/fs/xfs
Age | Commit message | Author | Files | Lines
2025-08-26 | xfs: do not propagate ENODATA disk errors into xattr code | Eric Sandeen | 2 | -0/+13
ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code; namely, that the requested attribute name could not be found. However, a medium error from disk may also return ENODATA. At best, this medium error may escape to userspace as "attribute not found" when in fact it's an IO (disk) error. At worst, we may oops in xfs_attr_leaf_get() when we do: error = xfs_attr_leaf_hasname(args, &bp); if (error == -ENOATTR) { xfs_trans_brelse(args->trans, bp); return error; } because an ENODATA/ENOATTR error from disk leaves us with a null bp, and the xfs_trans_brelse will then null-deref it. As discussed on the list, we really need to modify the lower level IO functions to trap all disk errors and ensure that we don't let unique errors like this leak up into higher xfs functions - many like this should be remapped to EIO. However, this patch directly addresses a reported bug in the xattr code, and should be safe to backport to stable kernels. A larger-scope patch to handle more unique errors at lower levels can follow later. (Note, prior to 07120f1abdff we did not oops, but we did return the wrong error code to userspace.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Cc: stable@vger.kernel.org # v5.9+ Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
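The remapping idea described above can be illustrated outside the kernel. The following is only a sketch, not the patch itself: `struct buf`, `read_leaf()` and `read_leaf_remapped()` are hypothetical names invented for this example, and the "disk" is simulated by passing the error in as a parameter. It shows why a caller that treats -ENODATA as "attribute not found" must never see that value from the I/O layer, since the buffer pointer is NULL on an I/O failure.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer handed back by a read helper. */
struct buf {
	int dummy;
};

/*
 * Hypothetical low-level read. On failure it returns a negative errno
 * and leaves *bpp NULL; a medium error may surface here as -ENODATA.
 */
static int read_leaf(int disk_error, struct buf *storage, struct buf **bpp)
{
	*bpp = NULL;
	if (disk_error)
		return disk_error;
	*bpp = storage;
	return 0;
}

/*
 * Wrapper that keeps -ENODATA reserved for "attribute not found":
 * an -ENODATA coming from the I/O layer is reported as -EIO instead,
 * so callers never pair "not found" with a NULL buffer.
 */
static int read_leaf_remapped(int disk_error, struct buf *storage,
			      struct buf **bpp)
{
	int error = read_leaf(disk_error, storage, bpp);

	if (error == -ENODATA)
		error = -EIO;
	return error;
}
```

A caller that releases the buffer on -ENODATA is then safe, because that code can only originate from the lookup logic and no longer from the disk.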
2025-08-19 | xfs: reject swapon for inodes on a zoned file system earlier | Christoph Hellwig | 1 | -0/+3
No point in going down into the iomap mapping loop when we know it will be rejected. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-19 | xfs: kick off inodegc when failing to reserve zoned blocks | Christoph Hellwig | 1 | -0/+6
XFS processes truncating unlinked inodes asynchronously and thus the free space pool only sees them with a delay. The non-zoned write path thus calls into inodegc to accelerate this processing before failing an allocation due to the lack of free blocks. Do the same for the zoned space reservation. Fixes: 0bb2193056b5 ("xfs: add support for zoned space reservations") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
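The "flush deferred frees before giving up" pattern can be sketched in a few lines of userspace C. This is a toy model under stated assumptions: `struct space_pool`, `flush_pending_gc()` and `reserve_blocks()` are hypothetical names, and the deferred work is modelled as a simple counter rather than real inode garbage collection.

```c
#include <errno.h>

/* Hypothetical model: blocks already free plus blocks awaiting deferred GC. */
struct space_pool {
	long free_blocks;
	long pending_gc_blocks;
};

/* Stand-in for kicking the deferred processing: release pending blocks now. */
static void flush_pending_gc(struct space_pool *p)
{
	p->free_blocks += p->pending_gc_blocks;
	p->pending_gc_blocks = 0;
}

/*
 * Reserve count blocks. If the pool looks too small, first flush the
 * deferred work and only fail with -ENOSPC if space is still short.
 */
static int reserve_blocks(struct space_pool *p, long count)
{
	if (p->free_blocks < count)
		flush_pending_gc(p);
	if (p->free_blocks < count)
		return -ENOSPC;
	p->free_blocks -= count;
	return 0;
}
```

Without the flush step, a reservation could fail even though enough space is about to be returned to the pool.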
2025-08-19 | xfs: remove xfs_last_used_zone | Christoph Hellwig | 1 | -43/+2
This was my first attempt at caching the last used zone. But it turns out for O_DIRECT or RWF_DONTCACHE that operate concurrently or in very short sequence, the bmap btree does not record a written extent yet, so it fails. Because it then still finds the last written zone it can lead to a weird ping-pong around a few zones with writers seeing different values. Remove it entirely as the later added xfs_cached_zone actually does a much better job enforcing the locality as the zone is associated with the inode in the MRU cache as soon as the zone is selected. Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-19 | xfs: Default XFS_RT to Y if CONFIG_BLK_DEV_ZONED is enabled | Damien Le Moal | 1 | -0/+1
XFS support for zoned block devices requires the realtime subvolume support (XFS_RT) to be enabled. Change the default configuration value of XFS_RT from N to CONFIG_BLK_DEV_ZONED to align with this requirement. This change still allows the user to disable XFS_RT if this feature is not desired for the user's use case. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11 | xfs: split xfs_zone_record_blocks | Christoph Hellwig | 2 | -13/+30
xfs_zone_record_blocks not only records successfully written blocks that now back file data, but is also used for blocks speculatively written by garbage collection that were never linked to an inode and instantly become invalid. Split the latter functionality out to be easier to understand. This also makes it clear that we don't need to attach the rmap inode to a transaction for the skipped blocks case as we never dirty any persistent data structure. Also make the argument order to xfs_zone_record_blocks a bit more natural. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11 | xfs: fix scrub trace with null pointer in quotacheck | Andrey Albershteyn | 1 | -1/+1
The quotacheck doesn't initialize sc->ip. Cc: stable@vger.kernel.org # v6.8 Fixes: 21d7500929c8a0 ("xfs: improve dquot iteration for scrub") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11 | xfs: reject max_atomic_write mount option for no reflink | John Garry | 1 | -0/+19
If the FS has no reflink, then atomic writes greater than 1x block are not supported. As such, for no reflink it is pointless to accept setting max_atomic_write when it cannot be supported, so reject max_atomic_write mount option in this case. It could be still possible to accept max_atomic_write option of size 1x block if HW atomics are supported, so check for this specifically. Fixes: 4528b9052731 ("xfs: allow sysadmins to specify a maximum atomic write limit at mount time") Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
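The acceptance rule described in this message can be written down as a small predicate. This is a hedged sketch only: `struct fs_caps` and `validate_max_atomic_write()` are hypothetical names, -EINVAL is an assumed error code, and the function is assumed to be called only when the mount option was actually given. Limits that apply when reflink is available are not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the properties the rule depends on. */
struct fs_caps {
	bool has_reflink;	/* file system supports reflink */
	bool hw_atomics;	/* device supports hardware atomic writes */
	uint32_t block_size;	/* file system block size in bytes */
};

/*
 * Validate a requested max_atomic_write value (in bytes).
 * Without reflink, only a single-block atomic write backed by hardware
 * support can work, so everything else is rejected.
 */
static int validate_max_atomic_write(const struct fs_caps *caps,
				     uint64_t max_bytes)
{
	if (caps->has_reflink)
		return 0;	/* other limits apply; not modelled here */
	if (max_bytes == caps->block_size && caps->hw_atomics)
		return 0;
	return -EINVAL;		/* assumed error code for this sketch */
}
```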
2025-08-11 | xfs: disallow atomic writes on DAX | John Garry | 3 | -5/+17
Atomic writes are not currently supported for DAX, but two problems exist: - we may go down DAX write path for IOCB_ATOMIC, which does not handle IOCB_ATOMIC properly - we report non-zero atomic write limits in statx (for DAX inodes) We may want atomic writes support on DAX in future, but just disallow for now. For this, ensure when IOCB_ATOMIC is set that we check the write size versus the atomic write min and max before branching off to the DAX write path. This is not strictly required for DAX, as we should not get this far in the write path as FMODE_CAN_ATOMIC_WRITE should not be set. In addition, due to reflink being supported for DAX, we automatically get CoW-based atomic writes support being advertised. Remedy this by disallowing atomic writes for a DAX inode for both sw and hw modes. Reported-by: Darrick J. Wong <djwong@kernel.org> Fixes: 9dffc58f2384 ("xfs: update atomic write limits") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11 | xfs: remove XFS_IBULK_SAME_AG | Christoph Hellwig | 3 | -17/+7
Add a new field to struct xfs_ibulk to directly pass XFS_IWALK* flags, and thus remove the need to indirect the SAME_AG flag through XFS_IBULK*. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11 | xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flags | Christoph Hellwig | 1 | -1/+5
Fix up xfs_inumbers to now pass in the XFS_IBULK* flags into the flags argument to xfs_inobt_walk, which expects the XFS_IWALK* flags. Currently passing the wrong flags works for non-debug builds because the only XFS_IWALK* flag has the same encoding as the corresponding XFS_IBULK* flag, but in debug builds it can trigger an assert that no incorrect flag is passed. Instead just extract the relevant flag. Fixes: 5b35d922c52798 ("xfs: Decouple XFS_IBULK flags from XFS_IWALK flags") Cc: <stable@vger.kernel.org> # v5.19 Reported-by: cen zhang <zzzccc427@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
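The difference between passing one flag namespace straight into another and extracting the relevant flag can be sketched as follows. All names and values here are hypothetical (`IBULK_*`, `IWALK_*`, `ibulk_to_iwalk()`, `walk()`); the encodings are deliberately chosen to differ so that a mix-up is visible, whereas the commit message notes that the real flags happened to share an encoding.

```c
/* Hypothetical caller-facing flags. */
#define IBULK_SAME_AG	(1U << 0)
#define IBULK_OTHER	(1U << 1)

/* Hypothetical walker flags, in a separate namespace. */
#define IWALK_SAME_AG	(1U << 4)
#define IWALK_ALL_FLAGS	(IWALK_SAME_AG)

/* Translate explicitly: pick out only the flag the walker understands. */
static unsigned int ibulk_to_iwalk(unsigned int ibulk_flags)
{
	unsigned int iwalk_flags = 0;

	if (ibulk_flags & IBULK_SAME_AG)
		iwalk_flags |= IWALK_SAME_AG;
	return iwalk_flags;
}

/*
 * Walker entry point with a debug-style sanity check: any bit outside
 * the walker's own flag set is rejected (returns -1 in this sketch).
 */
static int walk(unsigned int iwalk_flags)
{
	if (iwalk_flags & ~IWALK_ALL_FLAGS)
		return -1;
	return 0;
}
```

Calling `walk(ibulk_to_iwalk(flags))` passes the check for any caller flags, while `walk(flags)` trips it as soon as a caller-only bit is set.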
2025-08-11 | xfs: fix frozen file system assert in xfs_trans_alloc | Christoph Hellwig | 1 | -1/+1
Commit 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc") moved the assert for a frozen file system to after the sb_start_intwrite call that ensures it doesn't run on frozen file systems, and thus allows it to be triggered incorrectly. Fix that by moving it back to where it belongs. Fixes: 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-31 | Merge tag 'mm-stable-2025-07-30-15-25' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. 
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. 
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. 
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. 
Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-28 | Merge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux | Linus Torvalds | 1 | -5/+0
Pull block updates from Jens Axboe: - MD pull request via Yu: - call del_gendisk synchronously (Xiao) - cleanup unused variable (John) - cleanup workqueue flags (Ryo) - fix faulty rdev can't be removed during resync (Qixing) - NVMe pull request via Christoph: - try PCIe function level reset on init failure (Keith Busch) - log TLS handshake failures at error level (Maurizio Lombardi) - pci-epf: do not complete commands twice if nvmet_req_init() fails (Rick Wertenbroek) - misc cleanups (Alok Tiwari) - Removal of the pktcdvd driver This has been more than a decade coming at this point, and some recently revealed breakages that had it causing issues even for cases where it isn't required made me re-pull the trigger on this one. It's known broken and nobody has stepped up to maintain the code - Series for ublk supporting batch commands, enabling the use of multishot where appropriate - Speed up ublk exit handling - Fix for the two-stage elevator fixing which could leak data - Convert NVMe to use the new IOVA based API - Increase default max transfer size to something more reasonable - Series fixing write operations on zoned DM devices - Add tracepoints for zoned block device operations - Prep series working towards improving blk-mq queue management in the presence of isolated CPUs - Don't allow updating of the block size of a loop device that is currently under exclusively ownership/open - Set chunk sectors from stacked device stripe size and use it for the atomic write size limit - Switch to folios in bcache read_super() - Fix for CD-ROM MRW exit flush handling - Various tweaks, fixes, and cleanups * tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits) block: restore two stage elevator switch while running nr_hw_queue update cdrom: Call cdrom_mrw_exit from cdrom_release function sunvdc: Balance device refcount in vdc_port_mpgroup_check nvme-pci: try function level reset on init failure dm: split write BIOs on zone boundaries when zone append is not 
emulated block: use chunk_sectors when evaluating stacked atomic write limits dm-stripe: limit chunk_sectors to the stripe size md/raid10: set chunk_sectors limit md/raid0: set chunk_sectors limit block: sanitize chunk_sectors for atomic write limits ilog2: add max_pow_of_two_factor() nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails nvme-tcp: log TLS handshake failures at error level docs: nvme: fix grammar in nvme-pci-endpoint-target.rst nvme: fix typo in status code constant for self-test in progress nvmet: remove redundant assignment of error code in nvmet_ns_enable() nvme: fix incorrect variable in io cqes error message nvme: fix multiple spelling and grammar issues in host drivers block: fix blk_zone_append_update_request_bio() kernel-doc md/raid10: fix set but not used variable in sync_request_write() ...
2025-07-28 | Merge tag 'vfs-6.17-rc1.iomap' of ↵ | Linus Torvalds | 5 | -88/+146
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: - Refactor the iomap writeback code and split the generic and ioend/bio based writeback code. There are two methods that define the split between the generic writeback code, and the implementation of it, and all knowledge of ioends and bios now sits below that layer. - Add fuse iomap support for buffered writes and dirty folio writeback. This is needed so that granular uptodate and dirty tracking can be used in fuse when large folios are enabled. This has two big advantages. For writes, instead of the entire folio needing to be read into the page cache, only the relevant portions need to be. For writeback, only the dirty portions need to be written back instead of the entire folio. * tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: refactor writeback to use iomap_writepage_ctx inode fuse: hook into iomap for invalidating and checking partial uptodateness fuse: use iomap for folio laundering fuse: use iomap for writeback fuse: use iomap for buffered writes iomap: build the writeback code without CONFIG_BLOCK iomap: add read_folio_range() handler for buffered writes iomap: improve argument passing to iomap_read_folio_sync iomap: replace iomap_folio_ops with iomap_write_ops iomap: export iomap_writeback_folio iomap: move folio_unlock out of iomap_writeback_folio iomap: rename iomap_writepage_map to iomap_writeback_folio iomap: move all ioend handling to ioend.c iomap: add public helpers for uptodate state manipulation iomap: hide ioends from the generic writeback code iomap: refactor the writeback interface iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks iomap: pass more arguments using the iomap writeback context iomap: header diet
2025-07-28 | Merge tag 'vfs-6.17-rc1.fileattr' of ↵ | Linus Torvalds | 2 | -11/+11
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. 
In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-28 | Merge tag 'vfs-6.17-rc1.mmap_prepare' of ↵ | Linus Torvalds | 1 | -7/+8
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduced f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does not require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series converts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems.
We additionally do not update resctrl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, sysfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-24 | xfs: don't use a xfs_log_iovec for ri_buf in log recovery | Christoph Hellwig | 17 | -155/+157
ri_buf just holds a pointer/len pair and is not a log iovec used for writing to the log. Switch to use a kvec instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: don't use a xfs_log_iovec for attr_item names and values | Christoph Hellwig | 3 | -66/+60
These buffers are not directly logged, just use a kvec and remove the xlog_copy_from_iovec helper only used here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: use better names for size members in xfs_log_vec | Christoph Hellwig | 3 | -24/+25
The lv_size member counts the size of the entire allocation, rename it to lv_alloc_size to make that clear. The lv_buf_len member tracks how much of lv_buf has been used up to format the log item, rename it to lv_buf_used to make that more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: cleanup the ordered item logic in xlog_cil_insert_format_items | Christoph Hellwig | 1 | -18/+13
Split out handling of ordered items into a single branch in xlog_cil_insert_format_items so that the rest of the code becomes more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: don't pass the old lv to xfs_cil_prepare_item | Christoph Hellwig | 1 | -9/+7
By the time xfs_cil_prepare_item is called, the old lv is still pointed to by the log item. Take it from there instead of spreading the old lv logic over xlog_cil_insert_format_items and xfs_cil_prepare_item. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: remove unused trace event xfs_reflink_cow_enospc | Steven Rostedt | 1 | -1/+0
The call to the event xfs_reflink_cow_enospc was removed when the COW handling was merged into xfs_file_iomap_begin_delay, but the trace event itself was not. Remove it. Fixes: db46e604adf8 ("xfs: merge COW handling into xfs_file_iomap_begin_delay") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: remove unused trace event xfs_discard_rtrelax | Steven Rostedt | 1 | -1/+0
The trace event xfs_discard_rtrelax was added but never used. Remove it. Fixes: a330cae8a7147 ("xfs: Remove header files which are included more than once") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: remove unused trace event xfs_log_cil_return | Steven Rostedt | 1 | -1/+0
The trace event xfs_log_cil_return was added but never used. Remove it. Fixes: c1220522ef405 ("xfs: grant heads track byte counts, not LSNs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | xfs: remove unused trace event xfs_dqreclaim_dirty | Steven Rostedt | 1 | -1/+0
The tracepoint trace_xfs_dqreclaim_dirty was removed with other code removed from xfs_qm_dquot_isolate() but the defined tracepoint was not. Fixes: d62016b1a2df ("xfs: avoid dquot buffer pin deadlock") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 | fs/xfs: replace strncpy with memtostr_pad() | Pranav Tyagi | 2 | -3/+2
Replace the deprecated strncpy() with memtostr_pad(). This also avoids the need for separate zeroing using memset(). Mark the sb_fname buffer with __nonstring, as its size is XFSLABEL_MAX and it therefore has no terminating NUL. Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
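The copy-and-pad behaviour that makes the separate memset() unnecessary can be approximated in userspace. This is a sketch under assumptions, not the kernel's memtostr_pad() implementation: `label_to_str_pad()` and `LABEL_MAX` are hypothetical names, and the source is modelled as a fixed-size field that may use every byte and so may lack a NUL terminator.

```c
#include <stddef.h>
#include <string.h>

#define LABEL_MAX 12	/* hypothetical size of the fixed-width source field */

/*
 * Copy a fixed-size field that may lack a NUL terminator into a
 * destination buffer, stopping at the first NUL in the source (if any),
 * then zero-fill the remainder so the result is always NUL-terminated
 * and never exposes stale bytes. dst_size must be greater than zero.
 */
static void label_to_str_pad(char *dst, size_t dst_size,
			     const char *src, size_t src_size)
{
	size_t len = 0;

	while (len < src_size && len < dst_size - 1 && src[len] != '\0')
		len++;
	memcpy(dst, src, len);
	memset(dst + len, 0, dst_size - len);
}
```

With a destination of `LABEL_MAX + 1` bytes, a source that fills all `LABEL_MAX` bytes is copied in full and still ends up terminated, which plain strncpy() does not guarantee.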
2025-07-24 | xfs: Remove unused label in xfs_dax_notify_dev_failure | Alan Huang | 1 | -1/+0
Fixes: e967dc40d501 ("xfs: return the allocated transaction from xfs_trans_alloc_empty") Signed-off-by: Alan Huang <mmpgouride@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: improve the comments in xfs_select_zone_nowait (Christoph Hellwig; 1 file, -8/+2)
The comment at the top of the function is outdated, and the parts that are still correct duplicate information in a comment inside the function. Remove the top-of-function comment and instead improve a comment inside the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: improve the comments in xfs_max_open_zones (Christoph Hellwig; 1 file, -6/+9)
Describe the rationale for the decisions a bit better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: stop passing an inode to the zone space reservation helpers (Christoph Hellwig; 4 files, -27/+22)
None of them actually needs the inode, the mount is enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: rename oz_write_pointer to oz_allocated (Christoph Hellwig; 5 files, -29/+28)
This member just tracks how much space we handed out for sequential write required zones. Only for conventional space is it actually the pointer where things are written; otherwise zone append manages that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: use a uint32_t to cache i_used_blocks in xfs_init_zone (Christoph Hellwig; 1 file, -1/+1)
i_used_blocks is a uint32_t, so use the same type for the local variable caching it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: improve the xg_active_ref check in xfs_group_free (Christoph Hellwig; 1 file, -1/+2)
Split up the XFS_IS_CORRUPT statement so that it immediately shows if the reference counter overflowed or underflowed. I ran into this quite a bit when developing the zoned allocator, and had to reapply the patch for some work recently. We might as well just apply it upstream given that freeing a group is far removed from performance-critical code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
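The idea of splitting one "counter is not zero" check into two directional checks can be sketched in plain C as below. This is only an illustration of the pattern, not the XFS code: check_ref_on_free() and its messages are hypothetical, and the real check uses XFS_IS_CORRUPT on xg_active_ref.

```c
#include <stdio.h>

/*
 * Hypothetical check for a reference count that is expected to be zero
 * when the object is freed. Testing each direction separately makes the
 * report say which way the count is off, instead of a bare "!= 0".
 */
static int check_ref_on_free(int active_ref)
{
	if (active_ref > 0) {
		fprintf(stderr, "active_ref too high on free: %d\n",
			active_ref);
		return 1;
	}
	if (active_ref < 0) {
		fprintf(stderr, "active_ref too low on free: %d\n",
			active_ref);
		return -1;
	}
	return 0;	/* count is balanced */
}
```

The extra branch is cheap, which matches the commit's point that the free path is not performance critical.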
2025-07-24 xfs: remove the xlog_ticket_t typedef (Christoph Hellwig; 2 files, -5/+5)
Almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: remove xrep_trans_{alloc,cancel}_hook_dummy (Christoph Hellwig; 4 files, -46/+6)
XFS stopped using current->journal_info in commit f2e812c1522d ("xfs: don't use current->journal_info"), so there is no point in saving and restoring it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: return the allocated transaction from xchk_trans_alloc_empty (Christoph Hellwig; 11 files, -41/+18)
xchk_trans_alloc_empty can't return errors, so return the allocated transaction directly instead of an output double pointer argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: return the allocated transaction from xfs_trans_alloc_empty (Christoph Hellwig; 17 files, -99/+31)
xfs_trans_alloc_empty can't return errors, so return the allocated transaction directly instead of an output double pointer argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
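The interface change described in this commit message and the preceding one (an allocator that cannot fail returning its object directly instead of through an output double pointer) can be sketched in userspace C as follows. All names here (struct txn, txn_alloc_nofail(), txn_alloc_empty_old(), txn_alloc_empty()) are hypothetical, and aborting on allocation failure is an assumption made so that the sketch has no error path.

```c
#include <stdio.h>
#include <stdlib.h>

struct txn {			/* hypothetical transaction object */
	unsigned int flags;
};

/* Allocation with no error path for callers: abort rather than return NULL. */
static struct txn *txn_alloc_nofail(void)
{
	struct txn *tp = calloc(1, sizeof(*tp));

	if (!tp) {
		perror("calloc");
		abort();
	}
	return tp;
}

/* Old shape: always returns 0, yet every caller must check the result. */
static int txn_alloc_empty_old(struct txn **tpp)
{
	*tpp = txn_alloc_nofail();
	return 0;
}

/* New shape: the object is the return value, so callers need no error check. */
static struct txn *txn_alloc_empty(void)
{
	return txn_alloc_nofail();
}
```

With the second form a caller can write `struct txn *tp = txn_alloc_empty();` and drop the dead error-handling branch, which is consistent with the net line reduction shown for these commits.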
2025-07-24 xfs: don't use xfs_trans_reserve in xfs_trans_roll (Christoph Hellwig; 1 file, -79/+54)
xfs_trans_roll uses xfs_trans_reserve to basically just call into xfs_log_regrant while bypassing the rest of xfs_trans_reserve. Open code the call to xfs_log_regrant in xfs_trans_roll and simplify xfs_trans_reserve now that it never regrants and always asks for a log reservation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc (Christoph Hellwig; 1 file, -24/+28)
xfs_trans_alloc_empty only shares the very basic transaction structure allocation and initialization with xfs_trans_alloc. Split out a new __xfs_trans_alloc helper for that and otherwise decouple xfs_trans_alloc_empty from xfs_trans_alloc. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: don't use xfs_trans_reserve in xfs_trans_reserve_more (Christoph Hellwig; 1 file, -3/+12)
xfs_trans_reserve_more just tries to allocate additional blocks and/or rtextents and is otherwise unrelated to the transaction reservation logic. Open code the block and rtextent reservation in xfs_trans_reserve_more to prepare for simplifying xfs_trans_reserve. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: use xfs_trans_reserve_more in xfs_trans_reserve_more_inode (Christoph Hellwig; 1 file, -2/+1)
Instead of duplicating the empty transaction reservation definition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: refactor xfs_btree_diff_two_ptrs() to take advantage of cmp_int() (Fedor Pchelkin; 3 files, -8/+8)
Use cmp_int() to yield the result of a three-way-comparison instead of performing subtractions with extra casts. Thus also rename the function to make its name clearer in purpose. Found by Linux Verification Center (linuxtesting.org). Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
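To illustrate why a three-way comparison is safer than a subtraction with casts, here is a minimal sketch. It assumes cmp_int() is defined along the lines of the macro below; diff_by_subtraction() and cmp_u64() are hypothetical helpers written for this example and are not the XFS routines touched by the commit.

```c
#include <stdint.h>

/* Assumed definition: yields -1, 0 or 1 without any subtraction. */
#define cmp_int(l, r)	(((l) > (r)) - ((l) < (r)))

/*
 * Subtraction-style comparison: the unsigned difference wraps around,
 * so the sign of the result can be wrong for operands far apart.
 */
static int64_t diff_by_subtraction(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b);
}

/* Three-way comparison: the result depends only on the ordering. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return cmp_int(a, b);
}
```

For a = 0 and b = UINT64_MAX the subtraction wraps to 1, which claims a is greater than b, while cmp_u64() returns -1. Because the result is always one of three small values, a plain int return type is sufficient.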
2025-07-24 xfs: use a proper variable name and type for storing a comparison result (Fedor Pchelkin; 1 file, -11/+12)
Perhaps that's just my silly imagination, but 'diff' doesn't look like a good name for a variable holding the result of a three-way comparison (-1, 0, 1), which is what ->cmp_key_with_cur() returns. The name implies an actual difference between two integer values, but that's not true anymore after the recent refactoring. Declaring it as int64_t is also misleading now; a plain int is more than enough. Found by Linux Verification Center (linuxtesting.org). Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: refactor cmp_key_with_cur routines to take advantage of cmp_int() (Fedor Pchelkin; 9 files, -74/+30)
The net value of these functions is to determine the result of a three-way-comparison between operands of the same type. Simplify the code using cmp_int() to eliminate potential errors with opencoded casts and subtractions. This also means we can change the return value type of cmp_key_with_cur routines from int64_t to int and make the interface a bit clearer. Found by Linux Verification Center (linuxtesting.org). Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: refactor cmp_two_keys routines to take advantage of cmp_int() (Fedor Pchelkin; 9 files, -84/+48)
The net value of these functions is to determine the result of a three-way-comparison between operands of the same type. Simplify the code using cmp_int() to eliminate potential errors with opencoded casts and subtractions. This also means we can change the return value type of cmp_two_keys routines from int64_t to int and make the interface a bit clearer. Found by Linux Verification Center (linuxtesting.org). Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: rename key_diff routines (Fedor Pchelkin; 10 files, -25/+28)
key_diff routines compare a key value with a cursor value. Make the naming a bit more self-descriptive. Found by Linux Verification Center (linuxtesting.org). Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-07-24 xfs: rename diff_two_keys routines (Fedor Pchelkin; 10 files, -35/+35)