summaryrefslogtreecommitdiff
path: root/fs/libfs.c
AgeCommit message (Collapse)AuthorFilesLines
2024-02-05filelock: split leases out of struct file_lockJeff Layton1-1/+1
Add a new struct file_lease and move the lease-specific fields from struct file_lock to it. Convert the appropriate API calls to take struct file_lease instead, and convert the callers to use them. There is zero overlap between the lock manager operations for file locks and the ones for file leases, so split the lease-related operations off into a new lease_manager_operations struct. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-47-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-11Merge tag 'pull-dcache' of ↵Linus Torvalds1-36/+26
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Change of locking rules for __dentry_kill(), regularized refcounting rules in that area, assorted cleanups and removal of weird corner cases (e.g. now ->d_iput() on child is always called before the parent might hit __dentry_kill(), etc)" * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) dcache: remove unnecessary NULL check in dget_dlock() kill DCACHE_MAY_FREE __d_unalias() doesn't use inode argument d_alloc_parallel(): in-lookup hash insertion doesn't need an RCU variant get rid of DCACHE_GENOCIDE d_genocide(): move the extern into fs/internal.h simple_fill_super(): don't bother with d_genocide() on failure nsfs: use d_make_root() d_alloc_pseudo(): move setting ->d_op there from the (sole) caller kill d_instantate_anon(), fold __d_instantiate_anon() into remaining caller retain_dentry(): introduce a trimmed-down lockless variant __dentry_kill(): new locking scheme d_prune_aliases(): use a shrink list switch select_collect{,2}() to use of to_shrink_list() to_shrink_list(): call only if refcount is 0 fold dentry_kill() into dput() don't try to cut corners in shrink_lock_dentry() fold the call of retain_dentry() into fast_dput() Call retain_dentry() with refcount 0 dentry_kill(): don't bother with retain_dentry() on slow path ...
2023-11-25Merge branches 'work.dcache-misc' and 'work.dcache2' into work.dcacheAl Viro1-24/+21
2023-11-25simple_fill_super(): don't bother with d_genocide() on failureAl Viro1-12/+5
Failing ->fill_super() will be followed by ->kill_sb(), which should include kill_litter_super() if the call of simple_fill_super() had been asked to create anything besides the root dentry. So there's no need to empty the partially populated tree - it will be trimmed by inevitable kill_litter_super(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25dentry: switch the lists of children to hlistAl Viro1-24/+21
Saves a pointer per struct dentry and actually makes the things less clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use of hlist_for_... iterators. A couple of new helpers - d_first_child() and d_next_sibling(), to make the expressions less awful. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-20libfs: getdents() should return 0 after reaching EODChuck Lever1-3/+11
The new directory offset helpers don't conform with the convention of getdents() returning no more entries once a directory file descriptor has reached the current end-of-directory. To address this, copy the logic from dcache_readdir() to mark the open directory file descriptor once EOD has been reached. Seeking resets the mark. Reported-by: Tavian Barnes <tavianator@tavianator.com> Closes: https://lore.kernel.org/linux-fsdevel/20231113180616.2831430-1-tavianator@tavianator.com/ Fixes: 6faddda69f62 ("libfs: Add directory operations for stable offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170043792492.4628.15646203084646716134.stgit@bazille.1015granger.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-07Merge tag 'vfs-6.7.fsid' of ↵Linus Torvalds1-0/+44
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is availabe on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handlers for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles
2023-10-28fs: fix build error with CONFIG_EXPORTFS=m or not definedAmir Goldstein1-0/+41
Many of the filesystems that call the generic exportfs helpers do not select the EXPORTFS config. Move generic_encode_ino32_fh() to libfs.c, same as generic_fh_to_*() to avoid having to fix all those config dependencies. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310262151.renqMvme-lkp@intel.com/ Fixes: dfaf653dc415 ("exportfs: make ->encode_fh() a mandatory method for NFS export") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231026204540.143217-1-amir73il@gmail.com Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28fs: report f_fsid from s_dev for "simple" filesystemsAmir Goldstein1-0/+3
There are many "simple" filesystems (*) that report null f_fsid in statfs(2). Those "simple" filesystems report sb->s_dev as the st_dev field of the stat syscalls for all inodes of the filesystem (**). In order to enable fanotify reporting of events with fsid on those "simple" filesystems, report the sb->s_dev number in f_fsid field of statfs(2). (*) For most of the "simple" filesystem refered to in this commit, the ->statfs() operation is simple_statfs(). Some of those fs assign the simple_statfs() method directly in their ->s_op struct and some assign it indirectly via a call to simple_fill_super() or to pseudo_fs_fill_super() with either custom or "simple" s_op. We also make the same change to efivarfs and hugetlbfs, although they do not use simple_statfs(), because they use the simple_* inode opreations (e.g. simple_lookup()). (**) For most of the "simple" filesystems, the ->getattr() method is not assigned, so stat() is implemented by generic_fillattr(). A few "simple" filesystem use the simple_getattr() method which also calls generic_fillattr() to fill most of the stat struct. The two exceptions are procfs and 9p. procfs implements several different ->getattr() methods, but they all end up calling generic_fillattr() to fill the st_dev field from sb->s_dev. 9p has more complicated ->getattr() methods, but they too, end up calling generic_fillattr() to fill the st_dev field from sb->s_dev. Note that 9p and kernfs also call simple_statfs() from custom ->statfs() methods which already fill the f_fsid field, but v9fs_statfs() calls simple_statfs() only in case f_fsid was not filled and kenrfs_statfs() overwrites f_fsid after calling simple_statfs(). Link: https://lore.kernel.org/r/20230919094820.g5bwharbmy2dq46w@quack3/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023143049.2944970-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18fs: new accessor methods for atime and mtimeJeff Layton1-11/+30
Recently, we converted the ctime accesses in the kernel to use new accessor functions. Linus recently pointed out though that if we add accessors for the atime and mtime, then that would allow us to seamlessly change how these timestamps are stored in the inode. Add new accessor functions for the atime and mtime that mirror the accessors for the ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185239.80830-1-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-09-20direct_write_fallback(): on error revert the ->ki_pos update from buffered writeAl Viro1-0/+1
If we fail filemap_write_and_wait_range() on the range the buffered write went into, we only report the "number of bytes which we direct-written", to quote the comment in there. Which is fine, but buffered write has already advanced iocb->ki_pos, so we need to roll that back. Otherwise we end up with e.g. write(2) advancing position by more than the amount it reports having written. Fixes: 182c25e9c157 "filemap: update ki_pos in generic_perform_write" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Message-Id: <20230827214518.GU3390869@ZenIV> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-31Merge tag 'ext4_for_linus-6.6-rc1' of ↵Linus Torvalds1-12/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Many ext4 and jbd2 cleanups and bug fixes: - Cleanups in the ext4 remount code when going to and from read-only - Cleanups in ext4's multiblock allocator - Cleanups in the jbd2 setup/mounting code paths - Performance improvements when appending to a delayed allocation file - Miscellaneous syzbot and other bug fixes" * tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits) ext4: fix slab-use-after-free in ext4_es_insert_extent() libfs: remove redundant checks of s_encoding ext4: remove redundant checks of s_encoding ext4: reject casefold inode flag without casefold feature ext4: use LIST_HEAD() to initialize the list_head in mballoc.c ext4: do not mark inode dirty every time when appending using delalloc ext4: rename s_error_work to s_sb_upd_work ext4: add periodic superblock update check ext4: drop dio overwrite only flag and associated warning ext4: add correct group descriptors and reserved GDT blocks to system zone ext4: remove unused function declaration ext4: mballoc: avoid garbage value from err ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple() ext4: change the type of blocksize in ext4_mb_init_cache() ext4: fix unttached inode after power cut with orphan file feature enabled jbd2: correct the end of the journal recovery scan range ext4: ext4_get_{dev}_journal return proper error value ext4: cleanup ext4_get_dev_journal() and ext4_get_journal() jbd2: jbd2_journal_init_{dev,inode} return proper error return value jbd2: drop useless error tag in jbd2_journal_wipe() ...
2023-08-28Merge tag 'v6.6-vfs.misc' of ↵Linus Torvalds1-21/+21
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems. Features: - Block mode changes on symlinks and rectify our broken semantics - Report file modifications via fsnotify() for splice - Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up - Use synchronous fput for the close system call Cleanups: - Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper - Simplify epoll allocation helper - Convert simple_write_begin and simple_write_end to use a folio - Convert page_cache_pipe_buf_confirm() to use a folio - Simplify __range_close to avoid pointless locking - Disable per-cpu buffer head cache for isolated cpus - Port ecryptfs to kmap_local_page() api - Remove redundant initialization of pointer buf in pipe code - Unexport the d_genocide() function which is only used within core vfs - Replace printk(KERN_ERR) and WARN_ON() with WARN() Fixes: - Fix various kernel-doc issues - Fix refcount underflow for eventfds when used as EFD_SEMAPHORE - Fix a mainly theoretical issue in devpts - Check the return value of __getblk() in reiserfs - Fix a racy assert in i_readcount_dec - Fix integer conversion issues in various functions - Fix LSM security context handling during automounts that prevented NFS superblock sharing" * tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ...
2023-08-28Merge tag 'v6.6-vfs.tmpfs' of ↵Linus Torvalds1-0/+248
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull libfs and tmpfs updates from Christian Brauner: "This cycle saw a lot of work for tmpfs that required changes to the vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this cycle. Things will go back to mm next cycle. Features ======== - By far the biggest work is the quota support for tmpfs. New tmpfs quota infrastructure is added to support it and a new QFMT_SHMEM uapi option is exposed. This offers user and group quotas to tmpfs (project quotas will be added later). Similar to other filesystems tmpfs quota are not supported within user namespaces yet. - Add support for user xattrs. While tmpfs already supports security xattrs (security.*) and POSIX ACLs for a long time it lacked support for user xattrs (user.*). With this pull request tmpfs will be able to support a limited number of user xattrs. This is accompanied by a fix (see below) to limit persistent simple xattr allocations. - Add support for stable directory offsets. Currently tmpfs relies on the libfs provided cursor-based mechanism for readdir. This causes issues when a tmpfs filesystem is exported via NFS. NFS clients do not open directories. Instead, each server-side readdir operation opens the directory, reads it, and then closes it. Since the cursor state for that directory is associated with the opened file it is discarded after each readdir operation. Such directory offsets are not just cached by NFS clients but also various userspace libraries based on these clients. As it stands there is no way to invalidate the caches when directory offsets have changed and the whole application depends on unchanging directory offsets. At LSFMM we discussed how to solve this problem and decided to support stable directory offsets. libfs now allows filesystems like tmpfs to use an xarrary to map a directory offset to a dentry. This mechanism is currently only used by tmpfs but can be supported by others as well. Fixes ===== - Change persistent simple xattrs allocations in libfs from GFP_KERNEL to GPF_KERNEL_ACCOUNT so they're subject to memory cgroup limits. Since this is a change to libfs it affects both tmpfs and kernfs. - Correctly verify {g,u}id mount options. A new filesystem context is created via fsopen() which records the namespace that becomes the owning namespace of the superblock when fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are mountable in namespaces. However, fsconfig() calls can occur in a namespace different from the namespace where fsopen() has been called. Currently, when fsconfig() is called to set {g,u}id mount options the requested {g,u}id is mapped into a k{g,u}id according to the namespace where fsconfig() was called from. The resulting k{g,u}id is not guaranteed to be resolvable in the namespace of the filesystem (the one that fsopen() was called in). This means it's possible for an unprivileged user to create files owned by any group in a tmpfs mount since it's possible to set the setid bits on the tmpfs directory. The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid such bugs. The new mount api's cross-namespace delegation abilities are already widely used. Having talked to a bunch of userspace this is the most faithful solution with minimal regression risks" * tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs mm: invalidation check mapping before folio_contains tmpfs: trivial support for direct IO tmpfs,xattr: enable limited user extended attributes tmpfs: track free_ispace instead of free_inodes xattr: simple_xattr_set() return old_xattr to be freed tmpfs: verify {g,u}id mount options correctly shmem: move spinlock into shmem_recalc_inode() to fix quota support libfs: Remove parent dentry locking in offset_iterate_dir() libfs: Add a lock class for the offset map's xa_lock shmem: stable directory offsets shmem: Refactor shmem_symlink() libfs: Add directory operations for stable offsets shmem: fix quota lock nesting in huge hole handling shmem: Add default quota limit mount options shmem: quota support shmem: prepare shmem quota infrastructure quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks shmem: make shmem_get_inode() return ERR_PTR instead of NULL shmem: make shmem_inode_acct_block() return error
2023-08-27libfs: remove redundant checks of s_encodingEric Biggers1-12/+2
Now that neither ext4 nor f2fs allows inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-08-21libfs: Convert simple_write_begin and simple_write_end to use a folioMatthew Wilcox (Oracle)1-20/+20
Remove a number of implicit calls to compound_head() and various calls to compatibility functions. This is not sufficient to enable support for large folios; generic_perform_write() must be converted first. Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230821141322.2535459-1-willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09libfs: Remove parent dentry locking in offset_iterate_dir()Chuck Lever1-7/+4
Since offset_iterate_dir() does not walk the parent's d_subdir list nor does it manipulate the parent's d_child, there doesn't seem to be a reason to hold the parent's d_lock. The offset_ctx's xarray can be sufficiently protected with just the RCU read lock. Flame graph data captured during the git regression run shows a 20% reduction in CPU cycles consumed in offset_find_next(). Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202307171640.e299f8d5-oliver.sang@intel.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <169030957098.157536.9938425508695693348.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09libfs: Add a lock class for the offset map's xa_lockChuck Lever1-0/+3
Tie the dynamically-allocated xarray locks into a single class so contention on the directory offset xarrays can be observed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <169020933088.160441.9405180953116076087.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09libfs: Add directory operations for stable offsetsChuck Lever1-0/+248
Create a vector of directory operations in fs/libfs.c that handles directory seeks and readdir via stable offsets instead of the current cursor-based mechanism. For the moment these are unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <168814732984.530310.11190772066786107220.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09fs: pass the request_mask to generic_fillattrJeff Layton1-2/+2
generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-24kernfs: convert to ctime accessor functionsJeff Layton1-9/+10
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-54-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-10fs: Pass argument to fcntl_setlease as intLuca Vizzarro1-1/+1
The interface for fcntl expects the argument passed for the command F_SETLEASE to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Message-Id: <20230414152459.816046-3-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-10fs: new helper: simple_rename_timestampJeff Layton1-9/+27
A rename potentially involves updating 4 different inode timestamps. Add a function that handles the details sanely, and convert the libfs.c callers to use it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705185812.579118-3-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-09fs: factor out a direct_write_fallback helperChristoph Hellwig1-0/+41
Add a helper dealing with handling the syncing of a buffered write fallback for direct I/O. Link: https://lkml.kernel.org/r/20230601145904.1385409-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Hannes Reinecke <hare@suse.de> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-03fs: consolidate duplicate dt_type helpersJeff Layton1-7/+2
There are three copies of the same dt_type helper sprinkled around the tree. Convert them to use the common fs_umode_to_dtype function instead, which has the added advantage of properly returning DT_UNKNOWN when given a mode that contains an unrecognized type. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Phillip Potter <phil@philpotter.co.uk> Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230330104144.75547-1-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-02-20Merge tag 'fs.idmapped.v6.3' of ↵Linus Torvalds1-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-01-26fs: uninline inode_query_iversionJeff Layton1-0/+36
Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->getattr() to pass mnt_idmapChristian Brauner1-4/+4
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner1-5/+5
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-30libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed valueAkinobu Mita1-3/+19
Patch series "fix error when writing negative value to simple attribute files". The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), but some attribute files want to accept a negative value. This patch (of 3): The simple attribute files do not accept a negative value since the commit 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()"), so we have to use a 64-bit value to write a negative value. This adds DEFINE_SIMPLE_ATTRIBUTE_SIGNED for a signed value. Link: https://lkml.kernel.org/r/20220919172418.45257-1-akinobu.mita@gmail.com Link: https://lkml.kernel.org/r/20220919172418.45257-2-akinobu.mita@gmail.com Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()") Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-03fs: uninline inode_maybe_inc_iversion()Andrew Morton1-0/+46
It has many callsites and is large. text data bss dec hex filename 91796 15984 512 108292 1a704 mm/shmem.o-before 91180 15984 512 107676 1a49c mm/shmem.o-after Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09fs: Convert simple_readpage to simple_read_folioMatthew Wilcox (Oracle)1-7/+7
This is a full folio conversion; it is prepared to handle folios of arbitrary size. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-08fs: Remove flags parameter from aops->write_beginMatthew Wilcox (Oracle)1-1/+1
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-08fs: Remove aop flags parameter from grab_cache_page_write_begin()Matthew Wilcox (Oracle)1-1/+1
There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-03-16fs: Convert __set_page_dirty_no_writeback to noop_dirty_folioMatthew Wilcox (Oracle)1-2/+2
This is a mechanical change. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15fs: Remove noop_invalidatepage()Matthew Wilcox (Oracle)1-11/+0
We used to have to use noop_invalidatepage() to prevent block_invalidatepage() from being called, but that behaviour is now gone. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-01-20unicode: clean up the Kconfig symbol confusionChristoph Hellwig1-5/+5
Turn the CONFIG_UNICODE symbol into a tristate that generates some always built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol. Note that a lot of the IS_ENABLED() checks could be turned from cpp statements into normal ifs, but this change is intended to be fairly mechanic, so that should be cleaned up later. Fixes: 2b3d04787012 ("unicode: Add utf8-data module") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2021-11-03libfs: Support RENAME_EXCHANGE in simple_rename()Lorenz Bauer1-1/+4
Allow atomic exchange via RENAME_EXCHANGE when using simple_rename. This affects binderfs, ramfs, hubetlbfs and bpffs. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20211028094724.59043-3-lmb@cloudflare.com
2021-11-03libfs: Move shmem_exchange to simple_rename_exchangeLorenz Bauer1-0/+24
Move shmem_exchange and make it available to other callers. Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20211028094724.59043-2-lmb@cloudflare.com
2021-06-29fs: remove noop_set_page_dirty()Matthew Wilcox (Oracle)1-16/+0
Use __set_page_dirty_no_writeback() instead. This will set the dirty bit on the page, which will be used to avoid calling set_page_dirty() in the future. It will have no effect on actually writing the page back, as the pages are not on any LRU lists. [akpm@linux-foundation.org: export __set_page_dirty_no_writeback() to modules] Link: https://lkml.kernel.org/r/20210615162342.1669332-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29fs: remove anon_set_page_dirty()Matthew Wilcox (Oracle)1-10/+1
Use __set_page_dirty_no_writeback() instead. This will set the dirty bit on the page, which will be used to avoid calling set_page_dirty() in the future. It will have no effect on actually writing the page back, as the pages are not on any LRU lists. Link: https://lkml.kernel.org/r/20210615162342.1669332-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29fs: move ramfs_aops to libfsChristoph Hellwig1-4/+13
Move the ramfs aops to libfs and reuse them for kernfs and configfs. Thosw two did not wire up ->set_page_dirty before and now get __set_page_dirty_no_writeback, which is the right one for no-writeback address_space usage. Drop the now unused exports of the libfs helpers only used for ramfs-style pagecache usage. Link: https://lkml.kernel.org/r/20210614061512.3966143-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-23libfs: fix kernel-doc for mnt_usernsRandy Dunlap1-0/+1
Fix kernel-doc warning in libfs.c. ../fs/libfs.c:498: warning: Function parameter or member 'mnt_userns' not described in 'simple_setattr' Link: https://lore.kernel.org/r/20210216042929.8931-2-rdunlap@infradead.org/ Fixes: 549c7297717c ("fs: make helpers idmap mount aware") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Al