summaryrefslogtreecommitdiff
path: root/fs/kernfs/inode.c
AgeCommit message (Collapse)AuthorFilesLines
2023-10-30Merge tag 'vfs-6.7.ctime' of ↵Linus Torvalds1-3/+3
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessors function makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...
2023-10-18kernfs: convert to new timestamp accessorsJeff Layton1-3/+3
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-47-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-09kernfs: move kernfs_xattr_handlers to .rodataWedson Almeida Filho1-1/+1
This makes it harder for accidental or malicious changes to kernfs_xattr_handlers at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Link: https://lore.kernel.org/r/20230930050033.41174-18-wedsonaf@gmail.com Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-28Merge tag 'v6.6-vfs.tmpfs' of ↵Linus Torvalds1-17/+29
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull libfs and tmpfs updates from Christian Brauner: "This cycle saw a lot of work for tmpfs that required changes to the vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this cycle. Things will go back to mm next cycle. Features ======== - By far the biggest work is the quota support for tmpfs. New tmpfs quota infrastructure is added to support it and a new QFMT_SHMEM uapi option is exposed. This offers user and group quotas to tmpfs (project quotas will be added later). Similar to other filesystems tmpfs quota are not supported within user namespaces yet. - Add support for user xattrs. While tmpfs already supports security xattrs (security.*) and POSIX ACLs for a long time it lacked support for user xattrs (user.*). With this pull request tmpfs will be able to support a limited number of user xattrs. This is accompanied by a fix (see below) to limit persistent simple xattr allocations. - Add support for stable directory offsets. Currently tmpfs relies on the libfs provided cursor-based mechanism for readdir. This causes issues when a tmpfs filesystem is exported via NFS. NFS clients do not open directories. Instead, each server-side readdir operation opens the directory, reads it, and then closes it. Since the cursor state for that directory is associated with the opened file it is discarded after each readdir operation. Such directory offsets are not just cached by NFS clients but also various userspace libraries based on these clients. As it stands there is no way to invalidate the caches when directory offsets have changed and the whole application depends on unchanging directory offsets. At LSFMM we discussed how to solve this problem and decided to support stable directory offsets. libfs now allows filesystems like tmpfs to use an xarrary to map a directory offset to a dentry. This mechanism is currently only used by tmpfs but can be supported by others as well. Fixes ===== - Change persistent simple xattrs allocations in libfs from GFP_KERNEL to GPF_KERNEL_ACCOUNT so they're subject to memory cgroup limits. Since this is a change to libfs it affects both tmpfs and kernfs. - Correctly verify {g,u}id mount options. A new filesystem context is created via fsopen() which records the namespace that becomes the owning namespace of the superblock when fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are mountable in namespaces. However, fsconfig() calls can occur in a namespace different from the namespace where fsopen() has been called. Currently, when fsconfig() is called to set {g,u}id mount options the requested {g,u}id is mapped into a k{g,u}id according to the namespace where fsconfig() was called from. The resulting k{g,u}id is not guaranteed to be resolvable in the namespace of the filesystem (the one that fsopen() was called in). This means it's possible for an unprivileged user to create files owned by any group in a tmpfs mount since it's possible to set the setid bits on the tmpfs directory. The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid such bugs. The new mount api's cross-namespace delegation abilities are already widely used. Having talked to a bunch of userspace this is the most faithful solution with minimal regression risks" * tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs mm: invalidation check mapping before folio_contains tmpfs: trivial support for direct IO tmpfs,xattr: enable limited user extended attributes tmpfs: track free_ispace instead of free_inodes xattr: simple_xattr_set() return old_xattr to be freed tmpfs: verify {g,u}id mount options correctly shmem: move spinlock into shmem_recalc_inode() to fix quota support libfs: Remove parent dentry locking in offset_iterate_dir() libfs: Add a lock class for the offset map's xa_lock shmem: stable directory offsets shmem: Refactor shmem_symlink() libfs: Add directory operations for stable offsets shmem: fix quota lock nesting in huge hole handling shmem: Add default quota limit mount options shmem: quota support shmem: prepare shmem quota infrastructure quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks shmem: make shmem_get_inode() return ERR_PTR instead of NULL shmem: make shmem_inode_acct_block() return error
2023-08-09xattr: simple_xattr_set() return old_xattr to be freedHugh Dickins1-17/+29
tmpfs wants to support limited user extended attributes, but kernfs (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR) already supports user extended attributes through simple xattrs: but limited by a policy (128KiB per inode) too liberal to be used on tmpfs. To allow a different limiting policy for tmpfs, without affecting the policy for kernfs, change simple_xattr_set() to return the replaced or removed xattr (if any), leaving the caller to update their accounting then free the xattr (by simple_xattr_free(), renamed from the static free_simple_xattr()). Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <158c6585-2aa7-d4aa-90ff-f7c3f8fe407c@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09fs: pass the request_mask to generic_fillattrJeff Layton1-1/+1
generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-24kernfs: convert to ctime accessor functionsJeff Layton1-3/+2
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-54-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-03-29kernfs: Introduce separate rwsem to protect inode attributes.Imran Khan1-8/+8
Right now a global per-fs rwsem (kernfs_rwsem) synchronizes multiple kernfs operations. On a large system with few hundred CPUs and few hundred applications simultaneoulsy trying to access sysfs, this results in multiple sys_open(s) contending on kernfs_rwsem via kernfs_iop_permission and kernfs_dop_revalidate. For example on a system with 384 cores, if I run 200 instances of an application which is mostly executing the following loop: for (int loop = 0; loop <100 ; loop++) { for (int port_num = 1; port_num < 2; port_num++) { for (int gid_index = 0; gid_index < 254; gid_index++ ) { char ret_buf[64], ret_buf_lo[64]; char gid_file_path[1024]; int ret_len; int ret_fd; ssize_t ret_rd; ub4 i, saved_errno; memset(ret_buf, 0, sizeof(ret_buf)); memset(gid_file_path, 0, sizeof(gid_file_path)); ret_len = snprintf(gid_file_path, sizeof(gid_file_path), "/sys/class/infiniband/%s/ports/%d/gids/%d", dev_name, port_num, gid_index); ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC); if (ret_fd < 0) { printf("Failed to open %s\n", gid_file_path); continue; } /* Read the GID */ ret_rd = read(ret_fd, ret_buf, 40); if (ret_rd == -1) { printf("Failed to read from file %s, errno: %u\n", gid_file_path, saved_errno); continue; } close(ret_fd); } } I see contention around kernfs_rwsem as follows: path_openat | |----link_path_walk.part.0.constprop.0 | | | |--49.92%--inode_permission | | | | | --48.69%--kernfs_iop_permission | | | | | |--18.16%--down_read | | | | | |--15.38%--up_read | | | | | --14.58%--_raw_spin_lock | | | | | ----- | | | |--29.08%--walk_component | | | | | --29.02%--lookup_fast | | | | | |--24.26%--kernfs_dop_revalidate | | | | | | | |--14.97%--down_read | | | | | | | --9.01%--up_read | | | | | --4.74%--__d_lookup | | | | | --4.64%--_raw_spin_lock | | | | | ---- Having a separate per-fs rwsem to protect kernfs inode attributes, will avoid the above mentioned contention and result in better performance as can bee seen below: path_openat | |----link_path_walk.part.0.constprop.0 | | | | | |--27.06%--inode_permission | | | | | --25.84%--kernfs_iop_permission | | | | | |--9.29%--up_read | | | | | |--8.19%--down_read | | | | | --7.89%--_raw_spin_lock | | | | | ---- | | | |--22.42%--walk_component | | | | | --22.36%--lookup_fast | | | | | |--16.07%--__d_lookup | | | | | | | --16.01%--_raw_spin_lock | | | | | | | ---- | | | | | --6.28%--kernfs_dop_revalidate | | | | | |--3.76%--down_read | | | | | --2.26%--up_read As can be seen from the above data the overhead due to both kerfs_iop_permission and kernfs_dop_revalidate have gone down and this also reduces overall run time of the earlier mentioned loop. Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20230309110932.2889010-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner1-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->permission() to pass mnt_idmapChristian Brauner1-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->getattr() to pass mnt_idmapChristian Brauner1-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner1-3/+3
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-23kernfs: fix all kernel-doc warnings and multiple typosRandy Dunlap1-4/+4
Fix kernel-doc warnings. Many of these are about a function's return value, so use the kernel-doc Return: format to fix those Use % prefix on numeric constant values. dir.c: fix typos/spellos file.c fix typo: s/taret/target/ Fix all of these kernel-doc warnings: dir.c:305: warning: missing initial short description on line: * kernfs_name_hash dir.c:137: warning: No description found for return value of 'kernfs_path_from_node_locked' dir.c:196: warning: No description found for return value of 'kernfs_name' dir.c:224: warning: No description found for return value of 'kernfs_path_from_node' dir.c:292: warning: No description found for return value of 'kernfs_get_parent' dir.c:312: warning: No description found for return value of 'kernfs_name_hash' dir.c:404: warning: No description found for return value of 'kernfs_unlink_sibling' dir.c:588: warning: No description found for return value of 'kernfs_node_from_dentry' dir.c:806: warning: No description found for return value of 'kernfs_find_ns' dir.c:879: warning: No description found for return value of 'kernfs_find_and_get_ns' dir.c:904: warning: No description found for return value of 'kernfs_walk_and_get_ns' dir.c:927: warning: No description found for return value of 'kernfs_create_root' dir.c:996: warning: No description found for return value of 'kernfs_root_to_node' dir.c:1016: warning: No description found for return value of 'kernfs_create_dir_ns' dir.c:1048: warning: No description found for return value of 'kernfs_create_empty_dir' dir.c:1306: warning: No description found for return value of 'kernfs_next_descendant_post' dir.c:1568: warning: No description found for return value of 'kernfs_remove_self' dir.c:1630: warning: No description found for return value of 'kernfs_remove_by_name_ns' dir.c:1667: warning: No description found for return value of 'kernfs_rename_ns' file.c:66: warning: No description found for return value of 'of_on' file.c:88: warning: No description found for return value of 'kernfs_deref_open_node_locked' file.c:1036: warning: No description found for return value of '__kernfs_create_file' inode.c:100: warning: No description found for return value of 'kernfs_setattr' mount.c:160: warning: No description found for return value of 'kernfs_root_from_sb' mount.c:198: warning: No description found for return value of 'kernfs_node_dentry' mount.c:302: warning: No description found for return value of 'kernfs_super_ns' mount.c:318: warning: No description found for return value of 'kernfs_get_tree' symlink.c:28: warning: No description found for return value of 'kernfs_create_link' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221112031456.22980-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-20kernfs: dont take i_lock on inode attr readIan Kent1-4/+0
The kernfs write lock is held when the kernfs node inode attributes are updated. Therefore, when either kernfs_iop_getattr() or kernfs_iop_permission() are called the kernfs node inode attributes won't change. Consequently concurrent kernfs_refresh_inode() calls always copy the same values from the kernfs node. So there's no need to take the inode i_lock to get consistent values for generic_fillattr() and generic_permission(), the kernfs read lock is sufficient. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/166606036215.13363.1288735296954908554.stgit@donald.themaw.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-24kernfs: switch global kernfs_rwsem lock to per-fs lockMinchan Kim1-8/+14
The kernfs implementation has big lock granularity(kernfs_rwsem) so every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock. It makes trouble for some cases to wait the global lock for a long time even though they are totally independent contexts each other. A general example is process A goes under direct reclaim with holding the lock when it accessed the file in sysfs and process B is waiting the lock with exclusive mode and then process C is waiting the lock until process B could finish the job after it gets the lock from process A. This patch switches the global kernfs_rwsem to per-fs lock, which put the rwsem into kernfs_root. Suggested-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: use i_lock to protect concurrent inode updatesIan Kent1-6/+12
The inode operations .permission() and .getattr() use the kernfs node write lock but all that's needed is the read lock to protect against partial updates of these kernfs node fields which are all done under the write lock. And .permission() is called frequently during path walks and can cause quite a bit of contention between kernfs node operations and path walks when the number of concurrent walks is high. To change kernfs_iop_getattr() and kernfs_iop_permission() to take the rw sem read lock instead of the write lock an additional lock is needed to protect against multiple processes concurrently updating the inode attributes and link count in kernfs_refresh_inode(). The inode i_lock seems like the sensible thing to use to protect these inode attribute updates so use it in kernfs_refresh_inode(). The last hunk in the patch, applied to kernfs_fill_super(), is possibly not needed but taking the lock was present originally. I prefer to continue to take it to protect against a partial update of the source kernfs fields during the call to kernfs_refresh_inode() made by kernfs_get_inode(). Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642771474.63632.16295959115893904470.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: switch kernfs to use an rwsemIan Kent1-8/+8
The kernfs global lock restricts the ability to perform kernfs node lookup operations in parallel during path walks. Change the kernfs mutex to an rwsem so that, when opportunity arises, node searches can be done in parallel with path walk lookups. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642770946.63632.2218304587223241374.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-29fs: move ramfs_aops to libfsChristoph Hellwig1-7/+1
Move the ramfs aops to libfs and reuse them for kernfs and configfs. Thosw two did not wire up ->set_page_dirty before and now get __set_page_dirty_no_writeback, which is the right one for no-writeback address_space usage. Drop the now unused exports of the libfs helpers only used for ramfs-style pagecache usage. Link: https://lkml.kernel.org/r/20210614061512.3966143-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-24fs: make helpers idmap mount awareChristian Brauner1-3/+6
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24stat: handle idmapped mountsChristian Brauner1-1/+1
The generic_fillattr() helper fills in the basic attributes associated with an inode. Enable it to handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace before we store the uid and gid. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-12-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24acl: handle idmapped mountsChristian Brauner1-0/+2
The posix acl permission checking helpers determine whether a caller is privileged over an inode according to the acls associated with the inode. Add helpers that make it possible to handle acls on idmapped mounts. The vfs and the filesystems targeted by this first iteration make use of posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to translate basic posix access and default permissions such as the ACL_USER and ACL_GROUP type according to the initial user namespace (or the superblock's user namespace) to and from the caller's current user namespace. Adapt these two helpers to handle idmapped mounts whereby we either map from or into the mount's user namespace depending on in which direction we're translating. Similarly, cap_convert_nscap() is used by the vfs to translate user namespace and non-user namespace aware filesystem capabilities from the superblock's user namespace to the caller's user namespace. Enable it to handle idmapped mounts by accounting for the mount's user namespace. In addition the fileystems targeted in the first iteration of this patch series make use of the posix_acl_chmod() and, posix_acl_update_mode() helpers. Both helpers perform permission checks on the target inode. Let them handle idmapped mounts. These two helpers are called when posix acls are set by the respective filesystems to handle this case we extend the ->set() method to take an additional user namespace argument to pass the mount's user namespace down. Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24attr: handle idmapped mountsChristian Brauner1-2/+2
When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24namei: make permission helpers idmapped mount awareChristian Brauner1-1/+1
The two helpers inode_permission() and generic_permission() are used by the vfs to perform basic permission checking by verifying that the caller is privileged over an inode. In order to handle idmapped mounts we extend the two helpers with an additional user namespace argument. On idmapped mounts the two helpers will make sure to map the inode according to the mount's user namespace and then peform identical permission checks to inode_permission() and generic_permission(). If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-03-16kernfs: Add option to enable user xattrsDaniel Xu1-0/+89
User extended attributes are useful as metadata storage for kernfs consumers like cgroups. Especially in the case of cgroups, it is useful to have a central metadata store that multiple processes/services can use to coordinate actions. A concrete example is for userspace out of memory killers. We want to let delegated cgroup subtree owners (running as non-root) to be able to say "please avoid killing this cgroup". This is especially important for desktop linux as delegated subtrees owners are less likely to run as root. This patch introduces a new flag, KERNFS_ROOT_SUPPORT_USER_XATTR, that lets kernfs consumers enable user xattr support. An initial limit of 128 entries or 128KB -- whichever is hit first -- is placed per cgroup because xattrs come from kernel memory and we don't want to let unprivileged users accidentally eat up too much kernel memory. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-16kernfs: Add removed_size out param for simple_xattr_setDaniel Xu1-1/+1
This helps set up size accounting in the next commit. Without this out param, it's difficult to find out the removed xattr size without taking a lock for longer and walking the xattr linked list twice. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-12-08kernfs: don't bother with timestamp truncationAl Viro1-3/+3
kernfs users are not going to have limited range or granularity anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-12kernfs: convert kernfs_node->id from union kernfs_node_id to u64Tejun Heo1-2/+2
kernfs_node->id is currently a union kernfs_node_id which represents either a 32bit (ino, gen) pair or u64 value. I can't see much value in the usage of the union - all that's needed is a 64bit ID which the current code is already limited to. Using a union makes the code unnecessarily complicated and prevents using 64bit ino without adding practical benefits. This patch drops union kernfs_node_id and makes kernfs_node->id a u64. ino is stored in the lower 32bits and gen upper. Accessors - kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the ino and gen. This simplifies ID handling less cumbersome and will allow using 64bit inos on supported archs. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexei Starovoitov <ast@kernel.org>
2019-08-30timestamp_truncate: Replace users of timespec64_truncDeepa Dinamani1-4/+3
Update the inode timestamp updates to use timestamp_truncate() instead of timespec64_trunc(). The change was mostly generated by the following coccinelle script. virtual context virtual patch @r1 depends on patch forall@ struct inode *inode; identifier i_xtime =~ "^i_[acm]time$"; expression e; @@ inode->i_xtime = - timespec64_trunc( + timestamp_truncate( ..., - e); + inode); Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: adrian.hunter@intel.com Cc: dedekind1@gmail.com Cc: gregkh@linuxfoundation.org Cc: hch@lst.de Cc: jaegeuk@kernel.org Cc: jlbec@evilplan.org Cc: richard@nod.at Cc: tj@kernel.org Cc: yuchao0@huawei.com Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-ntfs-dev@lists.sourceforge.net Cc: linux-mtd@lists.infradead.org
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428Thomas Gleixner1-2/+1
Based on 1 normalized pattern(s): this file is released under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 68 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-04kernfs: fix xattr name handling in LSM helpersOndrej Mosnacek1-41/+21
The implementation of kernfs_security_xattr_*() helpers reuses the kernfs_node_xattr_*() functions, which take the suffix of the xattr name and extract full xattr name from it using xattr_full_name(). However, this function relies on the fact that the suffix passed to xattr handlers from VFS is always constructed from the full name by just incerementing the pointer. This doesn't necessarily hold for the callers of kernfs_security_xattr_*(), so their usage will easily lead to out-of-bounds access. Fix this by moving the xattr name reconstruction to the VFS xattr handlers and replacing the kernfs_security_xattr_*() helpers with more general kernfs_xattr_*() helpers that take full xattr name and allow accessing all kernfs node's xattrs. Reported-by: kernel test robot <rong.a.chen@intel.com> Fixes: b230d5aba2d1 ("LSM: add new hook for kernfs node initialization") Fixes: ec882da5cda9 ("selinux: implement the kernfs_init_security hook") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-03-20LSM: add new hook for kernfs node initializationOndrej Mosnacek1-9/+39
This patch introduces a new security hook that is intended for initializing the security data for newly created kernfs nodes, which provide a way of storing a non-default security context, but need to operate independently from mounts (and therefore may not have an associated inode at the moment of creation). The main motivation is to allow kernfs nodes to inherit the context of the parent under SELinux, similar to the behavior of security_inode_init_security(). Other LSMs may implement their own logic for handling the creation of new nodes. This patch also adds helper functions to <linux/kernfs.h> for getting/setting security xattrs of a kernfs node so that LSMs hooks are able to do their job. Other important attributes should be accessible direcly in the kernfs_node fields (in case there is need for more, then new helpers should be added to kernfs.h along with the patch that needs them). Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> [PM: more manual merge fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-03-20kernfs: use simple_xattrs for security attributesOndrej Mosnacek1-53/+2
Replace the special handling of security xattrs with simple_xattrs, as is already done for the trusted xattrs. This simplifies the code and allows LSMs to use more than just a single xattr to do their business. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> [PM: manual merge fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-03-20kernfs: do not alloc iattrs in kernfs_xattr_getOndrej Mosnacek1-4/+14
This is a read-only operation, so we can simply return -ENODATA if kn->iattr is NULL. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> [PM: minor merge fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-03-20kernfs: clean up struct kernfs_iattrsOndrej Mosnacek1-27/+20
Right now, kernfs_iattrs embeds the whole struct iattr, even though it doesn't really use half of its fields... This both leads to wasting space and makes the code look awkward. Let's just list the few fields we need directly in struct kernfs_iattrs. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> [PM: merged a number of chunks manually due to fuzz] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-02-08kernfs: Allocating memory for kernfs_iattrs with kmem_cache.Ayush Mittal1-1/+1
Creating a new cache for kernfs_iattrs. Currently, memory is allocated with kzalloc() which always gives aligned memory. On ARM, this is 64 byte aligned. To avoid the wastage of memory in aligning the size requested, a new cache for kernfs_iattrs is created. Size of struct kernfs_iattrs is 80 Bytes. On ARM, it will come in kmalloc-128 slab. and it will come in kmalloc-192 slab if debug info is enabled. Extra bytes taken 48 bytes. Total number of objects created : 4096 Total saving = 48*4096 = 192 KB After creating new slab(When debug info is enabled) : sh-3.2# cat /proc/slabinfo ... kernfs_iattrs_cache 4069 4096 128 32 1 : tunables 0 0 0 : slabdata 128 128 0 ... All testing has been done on ARM target. Signed-off-by: Ayush Mittal <ayush.m@samsung.com> Signed-off-by: Vaneet Narang <v.narang@samsung.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-20kernfs: allow creating kernfs objects with arbitrary uid/gidDmitry Torokhov1-1/+1
This change allows creating kernfs files and directories with arbitrary uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments. The "simple" kernfs_create_file() and kernfs_create_dir() are left alone and always create objects belonging to the global root. When creating symlinks ownership (uid/gid) is taken from the target kernfs object. Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05vfs: change inode times to use struct timespec64Deepa Dinamani1-4/+4
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return ti