summaryrefslogtreecommitdiff
path: root/fs/kernfs/dir.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-30kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in ↵Tejun Heo1-11/+20
kernfs_find_and_get_node_by_id() The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id() which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF programs including e.g. the ones that attach to functions which are holding the scheduler rq lock. Consider the following BPF program: SEC("fentry/__set_cpus_allowed_ptr_locked") int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p, struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf) { struct cgroup *cgrp = bpf_cgroup_from_id(p->cgroups->dfl_cgrp->kn->id); if (cgrp) { bpf_printk("%d[%s] in %s", p->pid, p->comm, cgrp->kn->name); bpf_cgroup_release(cgrp); } return 0; } __set_cpus_allowed_ptr_locked() is called with rq lock held and the above BPF program calls bpf_cgroup_from_id() within leading to the following lockdep warning: ===================================================== WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted ----------------------------------------------------- repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70 and this task is already holding: ffff888237ced698 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0 which would create a new lock dependency: (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2} ... Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kernfs_idr_lock); local_irq_disable(); lock(&rq->__lock); lock(kernfs_idr_lock); <Interrupt> lock(&rq->__lock); *** DEADLOCK *** ... Call Trace: dump_stack_lvl+0x55/0x70 dump_stack+0x10/0x20 __lock_acquire+0x781/0x2a40 lock_acquire+0xbf/0x1f0 _raw_spin_lock+0x2f/0x40 kernfs_find_and_get_node_by_id+0x1e/0x70 cgroup_get_from_id+0x21/0x240 bpf_cgroup_from_id+0xe/0x20 bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a bpf_trampoline_6442545632+0x4f/0x1000 __set_cpus_allowed_ptr_locked+0x5/0x5a0 sched_setaffinity+0x1b3/0x290 __x64_sys_sched_setaffinity+0x4f/0x60 do_syscall_64+0x40/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e Let's fix it by protecting kernfs_node and kernfs_root with RCU and making kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of kernfs_idr_lock. This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit. Combined with the preceding rearrange patch, the net increase is 8 bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrea Righi <andrea.righi@canonical.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-11Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock"Tejun Heo1-13/+10
This reverts commit dad3fb67ca1cbef87ce700e83a55835e5921ce8a. The commit converted kernfs_idr_lock to an IRQ-safe raw_spinlock because it could be acquired while holding an rq lock through bpf_cgroup_from_id(). However, kernfs_idr_lock is held while doing GPF_NOWAIT allocations which involves acquiring an non-IRQ-safe and non-raw lock leading to the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Not tainted ----------------------------- swapper/0/0 is trying to lock: dfbcd488 (&c->lock){....}-{3:3}, at: local_lock_acquire+0x0/0xa4 other info that might help us debug this: context-{5:5} 2 locks held by swapper/0/0: #0: dfbc9c60 (lock){+.+.}-{3:3}, at: local_lock_acquire+0x0/0xa4 #1: c0c012a8 (kernfs_idr_lock){....}-{2:2}, at: __kernfs_new_node.constprop.0+0x68/0x258 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc5-kzm9g-00251-g655022a45b1c #578 Hardware name: Generic SH73A0 (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x90 dump_stack_lvl from __lock_acquire+0x3cc/0x168c __lock_acquire from lock_acquire+0x274/0x30c lock_acquire from local_lock_acquire+0x28/0xa4 local_lock_acquire from ___slab_alloc+0x234/0x8a8 ___slab_alloc from __slab_alloc.constprop.0+0x30/0x44 __slab_alloc.constprop.0 from kmem_cache_alloc+0x7c/0x148 kmem_cache_alloc from radix_tree_node_alloc.constprop.0+0x44/0xdc radix_tree_node_alloc.constprop.0 from idr_get_free+0x110/0x2b8 idr_get_free from idr_alloc_u32+0x9c/0x108 idr_alloc_u32 from idr_alloc_cyclic+0x50/0xb8 idr_alloc_cyclic from __kernfs_new_node.constprop.0+0x88/0x258 __kernfs_new_node.constprop.0 from kernfs_create_root+0xbc/0x154 kernfs_create_root from sysfs_init+0x18/0x5c sysfs_init from mnt_init+0xc4/0x220 mnt_init from vfs_caches_init+0x6c/0x88 vfs_caches_init from start_kernel+0x474/0x528 start_kernel from 0x0 Let's rever the commit. It's undesirable to spread out raw spinlock usage anyway and the problem can be solved by protecting the lookup path with RCU instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrea Righi <andrea.righi@canonical.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/CAMuHMdV=AKt+mwY7svEq5gFPx41LoSQZ_USME5_MEdWQze13ww@mail.gmail.com Link: https://lore.kernel.org/r/20240109214828.252092-2-tj@kernel.org Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-04kernfs: convert kernfs_idr_lock to an irq safe raw spinlockAndrea Righi1-10/+13
bpf_cgroup_from_id() is basically a wrapper to cgroup_get_from_id(), that is relying on kernfs to determine the right cgroup associated to the target id. As a kfunc, it has the potential to be attached to any function through BPF, particularly in contexts where certain locks are held. However, kernfs is not using an irq safe spinlock for kernfs_idr_lock, that means any kernfs function that is acquiring this lock can be interrupted and potentially hit bpf_cgroup_from_id() in the process, triggering a deadlock. For example, it is really easy to trigger a lockdep splat between kernfs_idr_lock and rq->_lock, attaching a small BPF program to __set_cpus_allowed_ptr_locked() that just calls bpf_cgroup_from_id(): ===================================================== WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected 6.7.0-rc7-virtme #5 Not tainted ----------------------------------------------------- repro/131 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: ffffffffb2dc4578 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1d/0x80 and this task is already holding: ffff911cbecaf218 (&rq->__lock){-.-.}-{2:2}, at: task_rq_lock+0x50/0xc0 which would create a new lock dependency: (&rq->__lock){-.-.}-{2:2} -> (kernfs_idr_lock){+.+.}-{2:2} but this new dependency connects a HARDIRQ-irq-safe lock: (&rq->__lock){-.-.}-{2:2} ... which became HARDIRQ-irq-safe at: lock_acquire+0xbf/0x2b0 _raw_spin_lock_nested+0x2e/0x40 scheduler_tick+0x5d/0x170 update_process_times+0x9c/0xb0 tick_periodic+0x27/0xe0 tick_handle_periodic+0x24/0x70 __sysvec_apic_timer_interrupt+0x64/0x1a0 sysvec_apic_timer_interrupt+0x6f/0x80 asm_sysvec_apic_timer_interrupt+0x1a/0x20 memcpy+0xc/0x20 arch_dup_task_struct+0x15/0x30 copy_process+0x1ce/0x1eb0 kernel_clone+0xac/0x390 kernel_thread+0x6f/0xa0 kthreadd+0x199/0x230 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1b/0x30 to a HARDIRQ-irq-unsafe lock: (kernfs_idr_lock){+.+.}-{2:2} ... which became HARDIRQ-irq-unsafe at: ... lock_acquire+0xbf/0x2b0 _raw_spin_lock+0x30/0x40 __kernfs_new_node.isra.0+0x83/0x280 kernfs_create_root+0xf6/0x1d0 sysfs_init+0x1b/0x70 mnt_init+0xd9/0x2a0 vfs_caches_init+0xcf/0xe0 start_kernel+0x58a/0x6a0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xc5/0xe0 secondary_startup_64_no_verify+0x178/0x17b other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kernfs_idr_lock); local_irq_disable(); lock(&rq->__lock); lock(kernfs_idr_lock); <Interrupt> lock(&rq->__lock); *** DEADLOCK *** Prevent this deadlock condition converting kernfs_idr_lock to a raw irq safe spinlock. The performance impact of this change should be negligible and it also helps to prevent similar deadlock conditions with any other subsystems that may depend on kernfs. Fixes: 332ea1f697be ("bpf: Add bpf_cgroup_from_id() kfunc") Cc: stable <stable@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231229074916.53547-1-andrea.righi@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-15kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy()Kees Cook1-17/+17
One of the last remaining users of strlcpy() in the kernel is kernfs_path_from_node_locked(), which passes back the problematic "length we _would_ have copied" return value to indicate truncation. Convert the chain of all callers to use the negative return value (some of which already doing this explicitly). All callers were already also checking for negative return values, so the risk to missed checks looks very low. In this analysis, it was found that cgroup1_release_agent() actually didn't handle the "too large" condition, so this is technically also a bug fix. :) Here's the chain of callers, and resolution identifying each one as now handling the correct return value: kernfs_path_from_node_locked() kernfs_path_from_node() pr_cont_kernfs_path() returns void kernfs_path() sysfs_warn_dup() return value ignored cgroup_path() blkg_path() bfq_bic_update_cgroup() return value ignored TRACE_IOCG_PATH() return value ignored TRACE_CGROUP_PATH() return value ignored perf_event_cgroup() return value ignored task_group_path() return value ignored damon_sysfs_memcg_path_eq() return value ignored get_mm_memcg_path() return value ignored lru_gen_seq_show() return value ignored cgroup_path_from_kernfs_id() return value ignored cgroup_show_path() already converted "too large" error to negative value cgroup_path_ns_locked() cgroup_path_ns() bpf_iter_cgroup_show_fdinfo() return value ignored cgroup1_release_agent() wasn't checking "too large" error proc_cgroup_show() already converted "too large" to negative value Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Waiman Long <longman@redhat.com> Cc: <cgroups@vger.kernel.org> Co-developed-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20231116192127.1558276-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231212211741.164376-3-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-15kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy()Kees Cook1-5/+5
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Nothing actually checks the return value coming from kernfs_name_locked(), so this has no impact on error paths. The caller hierarchy is: kernfs_name_locked() kernfs_name() pr_cont_kernfs_name() return value ignored cgroup_name() current_css_set_cg_links_read() return value ignored print_page_owner_memcg() return value ignored Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20231116192127.1558276-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231212211741.164376-2-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-15kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy()Kees Cook1-3/+3
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20231116192127.1558276-1-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231212211741.164376-1-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-15fs/kernfs/dir: obey S_ISGIDMax Kellermann1-0/+12
Handling of S_ISGID is usually done by inode_init_owner() in all other filesystems, but kernfs doesn't use that function. In kernfs, struct kernfs_node is the primary data structure, and struct inode is only created from it on demand. Therefore, inode_init_owner() can't be used and we need to imitate its behavior. S_ISGID support is useful for the cgroup filesystem; it allows subtrees managed by an unprivileged process to retain a certain owner gid, which then enables sharing access to the subtree with another unprivileged process. -- v1 -> v2: minor coding style fix (comment) Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231208093310.297233-2-max.kellermann@ionos.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-01Merge tag 'driver-core-6.6-rc1' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of driver core updates and additions for 6.6-rc1. Included in here are: - stable kernel documentation updates - class structure const work from Ivan on various subsystems - kernfs tweaks - driver core tests! - kobject sanity cleanups - kobject structure reordering to save space - driver core error code handling fixups - other minor driver core cleanups All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits) driver core: Call in reversed order in device_platform_notify_remove() driver core: Return proper error code when dev_set_name() fails kobject: Remove redundant checks for whether ktype is NULL kobject: Add sanity check for kset->kobj.ktype in kset_register() drivers: base: test: Add missing MODULE_* macros to root device tests drivers: base: test: Add missing MODULE_* macros for platform devices tests drivers: base: Free devm resources when unregistering a device drivers: base: Add basic devm tests for platform devices drivers: base: Add basic devm tests for root devices kernfs: fix missing kernfs_iattr_rwsem locking docs: stable-kernel-rules: mention that regressions must be prevented docs: stable-kernel-rules: fine-tune various details docs: stable-kernel-rules: make the examples for option 1 a proper list docs: stable-kernel-rules: move text around to improve flow docs: stable-kernel-rules: improve structure by changing headlines base/node: Remove duplicated include kernfs: attach uuid for every kernfs and report it in fsid kernfs: add stub helper for kernfs_generic_poll() x86/resctrl: make pseudo_lock_class a static const structure x86/MSR: make msr_class a static const structure ...
2023-08-12kernfs: fix missing kernfs_iattr_rwsem lockingIan Kent1-0/+4
When the kernfs_iattr_rwsem was introduced a case was missed. The update of the kernfs directory node child count was also protected by the kernfs_rwsem and needs to be included in the change so that the child count (and so the inode n_link attribute) does not change while holding the rwsem for read. Fixes: 9caf69614225 ("kernfs: Introduce separate rwsem to protect inode attributes.") Cc: stable <stable@kernel.org> Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-By: Imran Khan <imran.f.khan@oracle.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Eric Sandeen <sandeen@sandeen.net> Link: https://lore.kernel.org/r/169128520941.68052.15749253469930138901.stgit@donald.themaw.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-10tmpfs,xattr: enable limited user extended attributesHugh Dickins1-1/+1
Enable "user." extended attributes on tmpfs, limiting them by tracking the space they occupy, and deducting that space from the limited ispace (unless tmpfs mounted with nr_inodes=0 to leave that ispace unlimited). tmpfs inodes and simple xattrs are both unswappable, and have to be in lowmem on a 32-bit highmem kernel: so the ispace limit is appropriate for xattrs, without any need for a further mount option. Add simple_xattr_space() to give approximate but deterministic estimate of the space taken up by each xattr: with simple_xattrs_free() outputting the space freed if required (but kernfs and even some tmpfs usages do not require that, so don't waste time on strlen'ing if not needed). Security and trusted xattrs were already supported: for consistency and simplicity, account them from the same pool; though there's a small risk that a tmpfs with enough space before would now be considered too small. When extended attributes are used, "df -i" does show more IUsed and less IFree than can be explained by the inodes: document that (manpage later). xfstests tests/generic which were not run on tmpfs before but now pass: 020 037 062 070 077 097 103 117 337 377 454 486 523 533 611 618 728 with no new failures. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <2e63b26e-df46-5baa-c7d6-f9a8dd3282c5@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-31kernfs: fix missing kernfs_idr_lock to remove an ID from the IDRMuchun Song1-0/+2
The root->ino_idr is supposed to be protected by kernfs_idr_lock, fix it. Fixes: 488dee96bb62 ("kernfs: allow creating kernfs objects with arbitrary uid/gid") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230523024017.24851-1-songmuchun@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-27Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds1-9/+17
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
2023-04-03fs: consolidate duplicate dt_type helpersJeff Layton1-7/+1
There are three copies of the same dt_type helper sprinkled around the tree. Convert them to use the common fs_umode_to_dtype function instead, which has the added advantage of properly returning DT_UNKNOWN when given a mode that contains an unrecognized type. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Phillip Potter <phil@philpotter.co.uk> Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230330104144.75547-1-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-03-29kernfs: change kernfs_rename_lock into a read-write lock.Imran Khan1-9/+9
kernfs_rename_lock protects a node's ->parent and thus kernfs topology. Thus it can be used in cases that rely on a stable kernfs topology. Change it to a read-write lock for better scalability. Suggested by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20230309110932.2889010-4-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-29kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.Imran Khan1-0/+1
Right now per-fs kernfs_rwsem protects list of kernfs_super_info instances for a kernfs_root. Since kernfs_rwsem is used to synchronize several other operations across kernfs and since most of these operations don't impact kernfs_super_info, we can use a separate per-fs rwsem to synchronize access to list of kernfs_super_info. This helps in reducing contention around kernfs_rwsem and also allows operations that change/access list of kernfs_super_info to proceed without contending for kernfs_rwsem. Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20230309110932.2889010-3-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-29kernfs: Introduce separate rwsem to protect inode attributes.Imran Khan1-0/+7
Right now a global per-fs rwsem (kernfs_rwsem) synchronizes multiple kernfs operations. On a large system with few hundred CPUs and few hundred applications simultaneoulsy trying to access sysfs, this results in multiple sys_open(s) contending on kernfs_rwsem via kernfs_iop_permission and kernfs_dop_revalidate. For example on a system with 384 cores, if I run 200 instances of an application which is mostly executing the following loop: for (int loop = 0; loop <100 ; loop++) { for (int port_num = 1; port_num < 2; port_num++) { for (int gid_index = 0; gid_index < 254; gid_index++ ) { char ret_buf[64], ret_buf_lo[64]; char gid_file_path[1024]; int ret_len; int ret_fd; ssize_t ret_rd; ub4 i, saved_errno; memset(ret_buf, 0, sizeof(ret_buf)); memset(gid_file_path, 0, sizeof(gid_file_path)); ret_len = snprintf(gid_file_path, sizeof(gid_file_path), "/sys/class/infiniband/%s/ports/%d/gids/%d", dev_name, port_num, gid_index); ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC); if (ret_fd < 0) { printf("Failed to open %s\n", gid_file_path); continue; } /* Read the GID */ ret_rd = read(ret_fd, ret_buf, 40); if (ret_rd == -1) { printf("Failed to read from file %s, errno: %u\n", gid_file_path, saved_errno); continue; } close(ret_fd); } } I see contention around kernfs_rwsem as follows: path_openat | |----link_path_walk.part.0.constprop.0 | | | |--49.92%--inode_permission | | | | | --48.69%--kernfs_iop_permission | | | | | |--18.16%--down_read | | | | | |--15.38%--up_read | | | | | --14.58%--_raw_spin_lock | | | | | ----- | | | |--29.08%--walk_component | | | | | --29.02%--lookup_fast | | | | | |--24.26%--kernfs_dop_revalidate | | | | | | | |--14.97%--down_read | | | | | | | --9.01%--up_read | | | | | --4.74%--__d_lookup | | | | | --4.64%--_raw_spin_lock | | | | | ---- Having a separate per-fs rwsem to protect kernfs inode attributes, will avoid the above mentioned contention and result in better performance as can bee seen below: path_openat | |----link_path_walk.part.0.constprop.0 | | | | | |--27.06%--inode_permission | | | | | --25.84%--kernfs_iop_permission | | | | | |--9.29%--up_read | | | | | |--8.19%--down_read | | | | | --7.89%--_raw_spin_lock | | | | | ---- | | | |--22.42%--walk_component | | | | | --22.36%--lookup_fast | | | | | |--16.07%--__d_lookup | | | | | | | --16.01%--_raw_spin_lock | | | | | | | ---- | | | | | --6.28%--kernfs_dop_revalidate | | | | | |--3.76%--down_read | | | | | --2.26%--up_read As can be seen from the above data the overhead due to both kerfs_iop_permission and kernfs_dop_revalidate have gone down and this also reduces overall run time of the earlier mentioned loop. Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20230309110932.2889010-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-24Merge tag 'driver-core-6.3-rc1' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...
2023-01-19kernfs: remove an unused if statement in kernfs_path_from_node_locked()Zhen Lei1-3/+0
It makes no sense to call kernfs_path_from_node_locked() with NULL buf, and no one is doing that right now. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221126111634.1994-1-thunder.leizhen@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-23kernfs: fix all kernel-doc warnings and multiple typosRandy Dunlap1-32/+50
Fix kernel-doc warnings. Many of these are about a function's return value, so use the kernel-doc Return: format to fix those Use % prefix on numeric constant values. dir.c: fix typos/spellos file.c fix typo: s/taret/target/ Fix all of these kernel-doc warnings: dir.c:305: warning: missing initial short description on line: * kernfs_name_hash dir.c:137: warning: No description found for return value of 'kernfs_path_from_node_locked' dir.c:196: warning: No description found for return value of 'kernfs_name' dir.c:224: warning: No description found for return value of 'kernfs_path_from_node' dir.c:292: warning: No description found for return value of 'kernfs_get_parent' dir.c:312: warning: No description found for return value of 'kernfs_name_hash' dir.c:404: warning: No description found for return value of 'kernfs_unlink_sibling' dir.c:588: warning: No description found for return value of 'kernfs_node_from_dentry' dir.c:806: warning: No description found for return value of 'kernfs_find_ns' dir.c:879: warning: No description found for return value of 'kernfs_find_and_get_ns' dir.c:904: warning: No description found for return value of 'kernfs_walk_and_get_ns' dir.c:927: warning: No description found for return value of 'kernfs_create_root' dir.c:996: warning: No description found for return value of 'kernfs_root_to_node' dir.c:1016: warning: No description found for return value of 'kernfs_create_dir_ns' dir.c:1048: warning: No description found for return value of 'kernfs_create_empty_dir' dir.c:1306: warning: No description found for return value of 'kernfs_next_descendant_post' dir.c:1568: warning: No description found for return value of 'kernfs_remove_self' dir.c:1630: warning: No description found for return value of 'kernfs_remove_by_name_ns' dir.c:1667: warning: No description found for return value of 'kernfs_rename_ns' file.c:66: warning: No description found for return value of 'of_on' file.c:88: warning: No description found for return value of 'kernfs_deref_open_node_locked' file.c:1036: warning: No description found for return value of '__kernfs_create_file' inode.c:100: warning: No description found for return value of 'kernfs_setattr' mount.c:160: warning: No description found for return value of 'kernfs_root_from_sb' mount.c:198: warning: No description found for return value of 'kernfs_node_dentry' mount.c:302: warning: No description found for return value of 'kernfs_super_ns' mount.c:318: warning: No description found for return value of 'kernfs_get_tree' symlink.c:28: warning: No description found for return value of 'kernfs_create_link' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221112031456.22980-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-21Merge 6.1-rc6 into driver-core-nextGreg Kroah-Hartman1-2/+12
We need the kernfs changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id()Tejun Heo1-2/+12
c25491747b21 ("kernfs: Add KERNFS_REMOVING flags") made kernfs_find_and_get_node_by_id() test kernfs_active() instead of KERNFS_ACTIVATED. kernfs_find_and_get_by_id() is called without holding the kernfs_rwsem triggering the following lockdep warning. WARNING: CPU: 1 PID: 6191 at fs/kernfs/dir.c:36 kernfs_active+0xe8/0x120 fs/kernfs/dir.c:38 Modules linked in: CPU: 1 PID: 6191 Comm: syz-executor.1 Not tainted 6.0.0-syzkaller-09413-g4899a36f91a9 #0 Hardware name: linux,dummy-virt (DT) pstate: 10000005 (nzcV daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kernfs_active+0xe8/0x120 fs/kernfs/dir.c:36 lr : lock_is_held include/linux/lockdep.h:283 [inline] lr : kernfs_active+0x94/0x120 fs/kernfs/dir.c:36 sp : ffff8000182c7a00 x29: ffff8000182c7a00 x28: 0000000000000002 x27: 0000000000000001 x26: ffff00000ee1f6a8 x25: 1fffe00001dc3ed5 x24: 0000000000000000 x23: ffff80000ca1fba0 x22: ffff8000089efcb0 x21: 0000000000000001 x20: ffff0000091181d0 x19: ffff0000091181d0 x18: ffff00006a9e6b88 x17: 0000000000000000 x16: 0000000000000000 x15: ffff00006a9e6bc4 x14: 1ffff00003058f0e x13: 1fffe0000258c816 x12: ffff700003058f39 x11: 1ffff00003058f38 x10: ffff700003058f38 x9 : dfff800000000000 x8 : ffff80000e482f20 x7 : ffff0000091d8058 x6 : ffff80000e482c60 x5 : ffff000009402ee8 x4 : 1ffff00001bd1f46 x3 : 1fffe0000258c6d1 x2 : 0000000000000003 x1 : 00000000000000c0 x0 : 0000000000000000 Call trace: kernfs_active+0xe8/0x120 fs/kernfs/dir.c:38 kernfs_find_and_get_node_by_id+0x6c/0x140 fs/kernfs/dir.c:708 __kernfs_fh_to_dentry fs/kernfs/mount.c:102 [inline] kernfs_fh_to_dentry+0x88/0x1fc fs/kernfs/mount.c:128 exportfs_decode_fh_raw+0x104/0x560 fs/exportfs/expfs.c:435 exportfs_decode_fh+0x10/0x5c fs/exportfs/expfs.c:575 do_handle_to_path fs/fhandle.c:152 [inline] handle_to_path fs/fhandle.c:207 [inline] do_handle_open+0x2a4/0x7b0 fs/fhandle.c:223 __do_compat_sys_open_by_handle_at fs/fhandle.c:277 [inline] __se_compat_sys_open_by_handle_at fs/fhandle.c:274 [inline] __arm64_compat_sys_open_by_handle_at+0x6c/0x9c fs/fhandle.c:274 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x6c/0x260 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0xc4/0x254 arch/arm64/kernel/syscall.c:142 do_el0_svc_compat+0x40/0x70 arch/arm64/kernel/syscall.c:212 el0_svc_compat+0x54/0x140 arch/arm64/kernel/entry-common.c:772 el0t_32_sync_handler+0x90/0x140 arch/arm64/kernel/entry-common.c:782 el0t_32_sync+0x190/0x194 arch/arm64/kernel/entry.S:586 irq event stamp: 232 hardirqs last enabled at (231): [<ffff8000081edf70>] raw_spin_rq_unlock_irq kernel/sched/sched.h:1367 [inline] hardirqs last enabled at (231): [<ffff8000081edf70>] finish_lock_switch kernel/sched/core.c:4943 [inline] hardirqs last enabled at (231): [<ffff8000081edf70>] finish_task_switch.isra.0+0x200/0x880 kernel/sched/core.c:5061 hardirqs last disabled at (232): [<ffff80000c888bb4>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:404 softirqs last enabled at (228): [<ffff800008010938>] _stext+0x938/0xf58 softirqs last disabled at (207): [<ffff800008019380>] ____do_softirq+0x10/0x20 arch/arm64/kernel/irq.c:79 ---[ end trace 0000000000000000 ]--- The lockdep warning in kernfs_active() is there to ensure that the activated state stays stable for the caller. For kernfs_find_and_get_node_by_id(), all that's needed is ensuring that a node which has never been activated can't be looked up and guaranteeing lookup success when the caller knows the node to be active, both of which can be achieved by testing the active count without holding the kernfs_rwsem. Fix the spurious warning by introducing __kernfs_active() which doesn't have the lockdep annotation. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot+590ce62b128e79cf0a35@syzkaller.appspotmail.com Fixes: c25491747b21 ("kernfs: Add KERNFS_REMOVING flags") Cc: Amir Goldstein <amir73il@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/Y0SwqBsZ9BMmZv6x@slm.duckdns.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-20kernfs: dont take i_lock on revalidateIan Kent1-7/+17
In kernfs_dop_revalidate() when the passed in dentry is negative the dentry directory is checked to see if it has changed and if so the negative dentry is discarded so it can refreshed. During this check the dentry inode i_lock is taken to mitigate against a possible concurrent rename. But if it's racing with a rename, becuase the dentry is negative, it can't be the source it must be the target and it must be going to do a d_move() otherwise the rename will return an error. In this case the parent dentry of the target will not change, it will be the same over the d_move(), only the source dentry parent may change so the inode i_lock isn't needed. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/166606036967.13363.9336408133975631967.stgit@donald.themaw.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24kernfs: fix use-after-free in __kernfs_removeChristian A. Ehrhardt1-1/+4
Syzkaller managed to trigger concurrent calls to kernfs_remove_by_name_ns() for the same file resulting in a KASAN detected use-after-free. The race occurs when the root node is freed during kernfs_drain(). To prevent this acquire an additional reference for the root of the tree that is removed before calling __kernfs_remove(). Found by syzkaller with the following reproducer (slab_nomerge is required): syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0) r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0) close(r0) pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800) mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}}) Sample report: ================================================================== BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline] BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857 CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3e60b #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433 kasan_report+0xa3/0x130 mm/kasan/report.c:495 kernfs_type include/linux/kernfs.h:335 [inline] kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f725f983aed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000 RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000 R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000 </TASK> Allocated by task 855: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:437 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470 kasan_slab_alloc include/linux/kasan.h:224 [inline] slab_post_alloc_hook mm/slab.h:727 [inline] slab_alloc_node mm/slub.c:3243 [inline] slab_alloc mm/slub.c:3251 [inline] __kmem_cache_alloc_lru mm/slub.c:3258 [inline] kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268 kmem_cache_zalloc include/linux/slab.h:723 [inline] __kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593 kernfs_new_node fs/kernfs/dir.c:655 [inline] kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010 sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59 create_dir lib/kobject.c:63 [inline] kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223 kobject_add_varg lib/kobject.c:358 [inline] kobject_init_and_add+0x101/0x160 lib/kobject.c:441 sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 857: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:367 [inline] ____kasan_slab_free mm/kasan/common.c:329 [inline] __kasan_slab_free+0x108/0x190 mm/kasan/common.c:375 kasan_slab_free include/linux/kasan.h:200 [inline] slab_free_hook mm/slub.c:1754 [inline] slab_free_freelist_hook mm/slub.c:1780 [inline] slab_free mm/slub.c:3534 [inline] kmem_cache_free+0x9c/0x340 mm/slub.c:3551 kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547 kernfs_put+0x42/0x50 fs/kernfs/dir.c:521 __kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5