summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2023-06-22kbuild: generate KSYMTAB entries by modpostMasahiro Yamada1-0/+12
Commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") made modpost output CRCs in the same way whether the EXPORT_SYMBOL() is placed in *.c or *.S. For further cleanups, this commit applies a similar approach to the entire data structure of EXPORT_SYMBOL(). The EXPORT_SYMBOL() compilation is split into two stages. When a source file is compiled, EXPORT_SYMBOL() will be converted into a dummy symbol in the .export_symbol section. For example, EXPORT_SYMBOL(foo); EXPORT_SYMBOL_NS_GPL(bar, BAR_NAMESPACE); will be encoded into the following assembly code: .section ".export_symbol","a" __export_symbol_foo: .asciz "" /* license */ .asciz "" /* name space */ .balign 8 .quad foo /* symbol reference */ .previous .section ".export_symbol","a" __export_symbol_bar: .asciz "GPL" /* license */ .asciz "BAR_NAMESPACE" /* name space */ .balign 8 .quad bar /* symbol reference */ .previous They are mere markers to tell modpost the name, license, and namespace of the symbols. They will be dropped from the final vmlinux and modules because the *(.export_symbol) will go into /DISCARD/ in the linker script. Then, modpost extracts all the information about EXPORT_SYMBOL() from the .export_symbol section, and generates the final C code: KSYMTAB_FUNC(foo, "", ""); KSYMTAB_FUNC(bar, "_gpl", "BAR_NAMESPACE"); KSYMTAB_FUNC() (or KSYMTAB_DATA() if it is data) is expanded to struct kernel_symbol that will be linked to the vmlinux or a module. With this change, EXPORT_SYMBOL() works in the same way for *.c and *.S files, providing the following benefits. [1] Deprecate EXPORT_DATA_SYMBOL() In the old days, EXPORT_SYMBOL() was only available in C files. To export a symbol in *.S, EXPORT_SYMBOL() was placed in a separate *.c file. arch/arm/kernel/armksyms.c is one example written in the classic manner. Commit 22823ab419d8 ("EXPORT_SYMBOL() for asm") removed this limitation. Since then, EXPORT_SYMBOL() can be placed close to the symbol definition in *.S files. It was a nice improvement. However, as that commit mentioned, you need to use EXPORT_DATA_SYMBOL() for data objects on some architectures. In the new approach, modpost checks symbol's type (STT_FUNC or not), and outputs KSYMTAB_FUNC() or KSYMTAB_DATA() accordingly. There are only two users of EXPORT_DATA_SYMBOL: EXPORT_DATA_SYMBOL_GPL(empty_zero_page) (arch/ia64/kernel/head.S) EXPORT_DATA_SYMBOL(ia64_ivt) (arch/ia64/kernel/ivt.S) They are transformed as follows and output into .vmlinux.export.c KSYMTAB_DATA(empty_zero_page, "_gpl", ""); KSYMTAB_DATA(ia64_ivt, "", ""); The other EXPORT_SYMBOL users in ia64 assembly are output as KSYMTAB_FUNC(). EXPORT_DATA_SYMBOL() is now deprecated. [2] merge <linux/export.h> and <asm-generic/export.h> There are two similar header implementations: include/linux/export.h for .c files include/asm-generic/export.h for .S files Ideally, the functionality should be consistent between them, but they tend to diverge. Commit 8651ec01daed ("module: add support for symbol namespaces.") did not support the namespace for *.S files. This commit shifts the essential implementation part to C, which supports EXPORT_SYMBOL_NS() for *.S files. <asm/export.h> and <asm-generic/export.h> will remain as a wrapper of <linux/export.h> for a while. They will be removed after #include <asm/export.h> directives are all replaced with #include <linux/export.h>. [3] Implement CONFIG_TRIM_UNUSED_KSYMS in one-pass algorithm (by a later commit) When CONFIG_TRIM_UNUSED_KSYMS is enabled, Kbuild recursively traverses the directory tree to determine which EXPORT_SYMBOL to trim. If an EXPORT_SYMBOL turns out to be unused by anyone, Kbuild begins the second traverse, where some source files are recompiled with their EXPORT_SYMBOL() tuned into a no-op. We can do this better now; modpost can selectively emit KSYMTAB entries that are really used by modules. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2023-05-28Revert "kheaders: substituting --sort in archive creation"Masahiro Yamada1-6/+3
This reverts commit 700dea5a0bea9f64eba89fae7cb2540326fdfdc1. The reason for that commit was --sort=ORDER introduced in tar 1.28 (2014). More than 3 years have passed since then. Requiring GNU tar 1.28 should be fine now because we require GCC 5.1 (2015). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2023-05-14Merge tag 'locking_urgent_for_v6.4_rc2' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Make sure __down_read_common() is always inlined so that the callers' names land in traceevents output and thus the blocked function can be identified * tag 'locking_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Add __always_inline annotation to __down_read_common() and inlined callers
2023-05-14Merge tag 'perf_urgent_for_v6.4_rc2' of ↵Linus Torvalds1-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure the PEBS buffer is flushed before reprogramming the hardware so that the correct record sizes are used - Update the sample size for AMD BRS events - Fix a confusion with using the same on-stack struct with different events in the event processing path * tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG perf/x86: Fix missing sample size update on AMD BRS perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event()
2023-05-14Merge tag 'sched_urgent_for_v6.4_rc2' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: - Fix a couple of kernel-doc warnings * tag 'sched_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: fix cid_lock kernel-doc warnings
2023-05-08tick/broadcast: Make broadcast device replacement work correctlyThomas Gleixner1-32/+88
When a tick broadcast clockevent device is initialized for one shot mode then tick_broadcast_setup_oneshot() OR's the periodic broadcast mode cpumask into the oneshot broadcast cpumask. This is required when switching from periodic broadcast mode to oneshot broadcast mode to ensure that CPUs which are waiting for periodic broadcast are woken up on the next tick. But it is subtly broken, when an active broadcast device is replaced and the system is already in oneshot (NOHZ/HIGHRES) mode. Victor observed this and debugged the issue. Then the OR of the periodic broadcast CPU mask is wrong as the periodic cpumask bits are sticky after tick_broadcast_enable() set it for a CPU unless explicitly cleared via tick_broadcast_disable(). That means that this sets all other CPUs which have tick broadcasting enabled at that point unconditionally in the oneshot broadcast mask. If the affected CPUs were already idle and had their bits set in the oneshot broadcast mask then this does no harm. But for non idle CPUs which were not set this corrupts their state. On their next invocation of tick_broadcast_enable() they observe the bit set, which indicates that the broadcast for the CPU is already set up. As a consequence they fail to update the broadcast event even if their earliest expiring timer is before the actually programmed broadcast event. If the programmed broadcast event is far in the future, then this can cause stalls or trigger the hung task detector. Avoid this by telling tick_broadcast_setup_oneshot() explicitly whether this is the initial switch over from periodic to oneshot broadcast which must take the periodic broadcast mask into account. In the case of initialization of a replacement device this prevents that the broadcast oneshot mask is modified. There is a second problem with broadcast device replacement in this function. The broadcast device is only armed when the previous state of the device was periodic. That is correct for the switch from periodic broadcast mode to oneshot broadcast mode as the underlying broadcast device could operate in oneshot state already due to lack of periodic state in hardware. In that case it is already armed to expire at the next tick. For the replacement case this is wrong as the device is in shutdown state. That means that any already pending broadcast event will not be armed. This went unnoticed because any CPU which goes idle will observe that the broadcast device has an expiry time of KTIME_MAX and therefore any CPUs next timer event will be earlier and cause a reprogramming of the broadcast device. But that does not guarantee that the events of the CPUs which were already in idle are delivered on time. Fix this by arming the newly installed device for an immediate event which will reevaluate the per CPU expiry times and reprogram the broadcast device accordingly. This is simpler than caching the last expiry time in yet another place or saving it before the device exchange and handing it down to the setup function. Replacement of broadcast devices is not a frequent operation and usually happens once somewhere late in the boot process. Fixes: 9c336c9935cf ("tick/broadcast: Allow late registered device to enter oneshot mode") Reported-by: Victor Hassan <victor@allwinnertech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/87pm7d2z1i.ffs@tglx
2023-05-08sched: fix cid_lock kernel-doc warningsRandy Dunlap1-2/+2
Fix kernel-doc warnings for cid_lock and use_cid_lock. These comments are not in kernel-doc format. kernel/sched/core.c:11496: warning: Cannot understand * @cid_lock: Guarantee forward-progress of cid allocation. on line 11496 - I thought it was a doc line kernel/sched/core.c:11505: warning: Cannot understand * @use_cid_lock: Select cid allocation behavior: lock-free vs spinlock. on line 11505 - I thought it was a doc line Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230428031111.322-1-rdunlap@infradead.org
2023-05-08perf/core: Fix perf_sample_data not properly initialized for different ↵Yang Jihong1-1/+13
swevents in perf_tp_event() data->sample_flags may be modified in perf_prepare_sample(), in perf_tp_event(), different swevents use the same on-stack perf_sample_data, the previous swevent may change sample_flags in perf_prepare_sample(), as a result, some members of perf_sample_data are not correctly initialized when next swevent_event preparing sample (for example data->id, the value varies according to swevent). A simple scenario triggers this problem is as follows: # perf record -e sched:sched_switch --switch-output-event sched:sched_switch -a sleep 1 [ perf record: dump data: Woken up 0 times ] [ perf record: Dump perf.data.2023041209014396 ] [ perf record: dump data: Woken up 0 times ] [ perf record: Dump perf.data.2023041209014662 ] [ perf record: dump data: Woken up 0 times ] [ perf record: Dump perf.data.2023041209014910 ] [ perf record: Woken up 0 times to write data ] [ perf record: Dump perf.data.2023041209015164 ] [ perf record: Captured and wrote 0.069 MB perf.data.<timestamp> ] # ls -l total 860 -rw------- 1 root root 95694 Apr 12 09:01 perf.data.2023041209014396 -rw------- 1 root root 606430 Apr 12 09:01 perf.data.2023041209014662 -rw------- 1 root root 82246 Apr 12 09:01 perf.data.2023041209014910 -rw------- 1 root root 82342 Apr 12 09:01 perf.data.2023041209015164 # perf script -i perf.data.2023041209014396 0x11d58 [0x80]: failed to process type: 9 [Bad address] Solution: Re-initialize perf_sample_data after each event is processed. Note that data->raw->frag.data may be accessed in perf_tp_event_match(). Therefore, need to init sample_data and then go through swevent hlist to prevent reference of NULL pointer, reported by [1]. After fix: # perf record -e sched:sched_switch --switch-output-event sched:sched_switch -a sleep 1 [ perf record: dump data: Woken up 0 times ] [ perf record: Dump perf.data.2023041209442259 ] [ perf record: dump data: Woken up 0 times ] [ perf record: Dump perf.data.2023041209442514 ] [ perf record: dump data: Woken up 0 times ] [ perf record: Dump perf.data.2023041209442760 ] [ perf record: Woken up 0 times to write data ] [ perf record: Dump perf.data.2023041209443003 ] [ perf record: Captured and wrote 0.069 MB perf.data.<timestamp> ] # ls -l total 864 -rw------- 1 root root 100166 Apr 12 09:44 perf.data.2023041209442259 -rw------- 1 root root 606438 Apr 12 09:44 perf.data.2023041209442514 -rw------- 1 root root 82246 Apr 12 09:44 perf.data.2023041209442760 -rw------- 1 root root 82342 Apr 12 09:44 perf.data.2023041209443003 # perf script -i perf.data.2023041209442259 | head -n 5 perf 232 [000] 66.846217: sched:sched_switch: prev_comm=perf prev_pid=232 prev_prio=120 prev_state=D ==> next_comm=perf next_pid=234 next_prio=120 perf 234 [000] 66.846449: sched:sched_switch: prev_comm=perf prev_pid=234 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=232 next_prio=120 perf 232 [000] 66.846546: sched:sched_switch: prev_comm=perf prev_pid=232 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=234 next_prio=120 perf 234 [000] 66.846606: sched:sched_switch: prev_comm=perf prev_pid=234 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=232 next_prio=120 perf 232 [000] 66.846646: sched:sched_switch: prev_comm=perf prev_pid=232 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=234 next_prio=120 [1] Link: https://lore.kernel.org/oe-lkp/202304250929.efef2caa-yujie.liu@intel.com Fixes: bb447c27a467 ("perf/core: Set data->sample_flags in perf_prepare_sample()") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230425103217.130600-1-yangjihong1@huawei.com
2023-05-08locking/rwsem: Add __always_inline annotation to __down_read_common() and ↵John Stultz1-4/+4
inlined callers Apparently despite it being marked inline, the compiler may not inline __down_read_common() which makes it difficult to identify the cause of lock contention, as the blocked function in traceevents will always be listed as __down_read_common(). So this patch adds __always_inline annotation to the common function (as well as the inlined helper callers) to force it to be inlined so the blocking function will be listed (via Wchan) in traceevents. Fixes: c995e638ccbb ("locking/rwsem: Fold __down_{read,write}*()") Reported-by: Tim Murray <timmurray@google.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230503023351.2832796-1-jstultz@google.com
2023-05-05Merge tag 'trace-v6.4-rc1' of ↵Linus Torvalds2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing updates from Steven Rostedt: - Make buffer_percent read/write. The buffer_percent file is how users can state how long to block on the tracing buffer depending on how much is in the buffer. When it hits the "buffer_percent" it will wake the task waiting on the buffer. For some reason it was set to read-only. This was not noticed because testing was done as root without SELinux, but with SELinux it will prevent even root to write to it without having CAP_DAC_OVERRIDE. - The "touched_functions" was added this merge window, but one of the reasons for adding it was not implemented. That was to show what functions were not only touched, but had either a direct trampoline attached to it, or a kprobe or live kernel patching that can "hijack" the function to run a different function. The point is to know if there's functions in the kernel that may not be behaving as the kernel code shows. This can be used for debugging. TODO: Add this information to kernel oops too. * tag 'trace-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached tracing: Fix permissions for the buffer_percent file
2023-05-05Merge tag 'locking-core-2023-05-05' of ↵Linus Torvalds1-9/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation - Misc cleanups/fixes * tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: Correct (cmp)xchg() instrumentation locking/x86: Define arch_try_cmpxchg_local() locking/arch: Wire up local_try_cmpxchg() locking/generic: Wire up local{,64}_try_cmpxchg() locking/atomic: Add generic try_cmpxchg{,64}_local() support locking/rwbase: Mitigate indefinite writer starvation locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attachedSteven Rostedt (Google)1-3/+9
If a function had ever had IPMODIFY or DIRECT attached to it, where this is how live kernel patching and BPF overrides work, mark them and display an "M" in the enabled_functions and touched_functions files. This can be used for debugging. If a function had been modified and later there's a bug in the code related to that function, this can be used to know if the cause is possibly from a live kernel patch or a BPF program that changed the behavior of the code. Also update the documentation on the enabled_functions and touched_functions output, as it was missing direct callers and CALL_OPS. And include this new modify attribute. Link: https://lore.kernel.org/linux-trace-kernel/20230502213233.004e3ae4@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-04Merge tag 'mm-hotfixes-stable-2023-05-03-16-27' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hitfixes from Andrew Morton: "Five hotfixes. Three are cc:stable, two for this -rc cycle" * tag 'mm-hotfixes-stable-2023-05-03-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: change per-VMA lock statistics to be disabled by default MAINTAINERS: update Michal Simek's email mm/mempolicy: correctly update prev when policy is equal on mbind relayfs: fix out-of-bounds access in relay_file_read kasan: hw_tags: avoid invalid virt_to_page()
2023-05-04Merge tag 'mm-stable-2023-05-03-16-22' of ↵Linus Torvalds1-9/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - Some DAMON cleanups from Kefeng Wang - Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE ioctl's behavior more similar to KSM's behavior. [ Andrew called these "final", but I suspect we'll have a series fixing up the fact that the last commit in the dmapools series in the previous pull seems to have unintentionally just reverted all the other commits in the same series.. - Linus ] * tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: hwpoison: coredump: support recovery from dump_user_range() mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() mm/ksm: move disabling KSM from s390/gmap code to KSM code selftests/ksm: ksm_functional_tests: add prctl unmerge test mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0 mm/damon/paddr: fix missing folio_sz update in damon_pa_young() mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate() mm/damon/paddr: minor refactor of damon_pa_pageout()
2023-05-03Merge tag 'modules-6.4-rc1-v2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules fix from Luis Chamberlain: "One fix by Arnd far for modules which came in after the first pull request. The issue was found as part of some late compile tests with 0-day. I take it 0-day does some secondary late builds with after some initial ones" * tag 'modules-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: include internal.h in module/dups.c
2023-05-03Merge tag 'sysctl-6.4-rc1-v2' of ↵Linus Torvalds2-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull more sysctl updates from Luis Chamberlain: "As mentioned on my first pull request for sysctl-next, for v6.4-rc1 we're very close to being able to deprecating register_sysctl_paths(). I was going to assess the situation after the first week of the merge window. That time is now and things are looking good. We only have one which had already an ACK for so I'm picking this up here now and the last patch is the one that uses an axe. I have boot tested the last patch and 0-day build completed successfully" * tag 'sysctl-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: remove register_sysctl_paths() kernel: pid_namespace: simplify sysctls with register_sysctl()
2023-05-03Merge tag 'pm-6.4-rc1-2' of ↵Linus Torvalds3-5/+19
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These fix a hibernation test mode regression and clean up the intel_idle driver. Specifics: - Make test_resume work again after the changes that made hibernation open the snapshot device in exclusive mode (Chen Yu) - Clean up code in several places in intel_idle (Artem Bityutskiy)" * tag 'pm-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: mark few variables as __read_mostly intel_idle: do not sprinkle module parameter definitions around intel_idle: fix confusing message intel_idle: improve C-state flags handling robustness intel_idle: further intel_idle_init_cstates_icpu() cleanup intel_idle: clean up intel_idle_init_cstates_icpu() intel_idle: use pr_info() instead of printk() PM: hibernate: Do not get block device exclusively in test_resume mode PM: hibernate: Turn snapshot_test into global variable
2023-05-03tracing: Fix permissions for the buffer_percent fileOndrej Mosnacek1-1/+1
This file defines both read and write operations, yet it is being created as read-only. This means that it can't be written to without the CAP_DAC_OVERRIDE capability. Fix the permissions to allow root to write to it without the need to override DAC perms. Link: https://lore.kernel.org/linux-trace-kernel/20230503140114.3280002-1-omosnace@redhat.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 03329f993978 ("tracing: Add tracefs file buffer_percentage") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-02module: include internal.h in module/dups.cArnd Bergmann1-0/+2
Two newly introduced functions are declared in a header that is not included before the definition, causing a warning with sparse or 'make W=1': kernel/module/dups.c:118:6: error: no previous prototype for 'kmod_dup_request_exists_wait' [-Werror=missing-prototypes] 118 | bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/module/dups.c:220:6: error: no previous prototype for 'kmod_dup_request_announce' [-Werror=missing-prototypes] 220 | void kmod_dup_request_announce(char *module_name, int ret) | ^~~~~~~~~~~~~~~~~~~~~~~~~ Add an explicit include to ensure the prototypes match. Fixes: 8660484ed1cf ("module: add debugging auto-load duplicate module support") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304141440.DYO4NAzp-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-02kernel: pid_namespace: simplify sysctls with register_sysctl()Luis Chamberlain2-4/+2
register_sysctl_paths() is only required if your child (directories) have entries and pid_namespace does not. So use register_sysctl_init() instead where we don't care about the return value and use register_sysctl() where we do. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jeff Xu <jeffxu@google.com> Link: https://lore.kernel.org/r/20230302202826.776286-9-mcgrof@kernel.org
2023-05-02relayfs: fix out-of-bounds access in relay_file_readZhang Zhengming1-1/+2
There is a crash in relay_file_read, as the var from point to the end of last subbuf. The oops looks something like: pc : __arch_copy_to_user+0x180/0x310 lr : relay_file_read+0x20c/0x2c8 Call trace: __arch_copy_to_user+0x180/0x310 full_proxy_read+0x68/0x98 vfs_read+0xb0/0x1d0 ksys_read+0x6c/0xf0 __arm64_sys_read+0x20/0x28 el0_svc_common.constprop.3+0x84/0x108 do_el0_svc+0x74/0x90 el0_svc+0x1c/0x28 el0_sync_handler+0x88/0xb0 el0_sync+0x148/0x180 We get the condition by analyzing the vmcore: 1). The last produced byte and last consumed byte both at the end of the last subbuf 2). A softirq calls function(e.g __blk_add_trace) to write relay buffer occurs when an program is calling relay_file_read_avail(). relay_file_read relay_file_read_avail relay_file_read_consume(buf, 0, 0); //interrupted by softirq who will write subbuf .... return 1; //read_start point to the end of the last subbuf read_start = relay_file_read_start_pos //avail is equal to subsize avail = relay_file_read_subbuf_avail //from points to an invalid memory address from = buf->start + read_start //system is crashed copy_to_user(buffer, from, avail) Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com Fixes: 8d62fdebdaf9 ("relay file read: start-pos fix") Signed-off-by: Zhang Zhengming <zhang.zhengming@h3c.com> Reviewed-by: Zhao Lei <zhao_lei1@hoperun.com> Reviewed-by: Zhou Kete <zhou.kete@h3c.com> Reviewed-by: Pengcheng Yang <yangpc@wangsu.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0David Hildenbrand1-9/+3
Patch series "mm/ksm: improve PR_SET_MEMORY_MERGE=0 handling and cleanup disabling KSM", v2. (1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE does, (2) add a selftest for it and (3) factor out disabling of KSM from s390/gmap code. This patch (of 3): Let's unmerge any KSM pages when setting PR_SET_MEMORY_MERGE=0, and clear the VM_MERGEABLE flag from all VMAs -- just like KSM would. Of course, only do that if we previously set PR_SET_MEMORY_MERGE=1. Link: https://lkml.kernel.org/r/20230422205420.30372-1-david@redhat.com Link: https://lkml.kernel.org/r/20230422205420.30372-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Stefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-30Merge tag 'iommu-updates-v6.4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Convert to platform remove callback returning void - Extend changing default domain to normal group - Intel VT-d updates: - Remove VT-d virtual command interface and IOASID - Allow the VT-d driver to support non-PRI IOPF - Remove PASID supervisor request support - Various small and misc cleanups - ARM SMMU updates: - Device-tree binding updates: * Allow Qualcomm GPU SMMUs to accept relevant clock properties * Document Qualcomm 8550 SoC as implementing an MMU-500 * Favour new "qcom,smmu-500" binding for Adreno SMMUs - Fix S2CR quirk detection on non-architectural Qualcomm SMMU implementations - Acknowledge SMMUv3 PRI queue overflow when consuming events - Document (in a comment) why ATS is disabled for bypass streams - AMD IOMMU updates: - 5-level page-table support - NUMA awareness for memory allocations - Unisoc driver: Support for reattaching an existing domain - Rockchip driver: Add missing set_platform_dma_ops callback - Mediatek driver: Adjust the dma-ranges - Various other small fixes and cleanups * tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits) iommu: Remove iommu_group_get_by_id() iommu: Make iommu_release_device() static iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope() iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn) iommu/vt-d: Remove BUG_ON in map/unmap() iommu/vt-d: Remove BUG_ON when domain->pgd is NULL iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation iommu/vt-d: Remove BUG_ON on checking valid pfn range iommu/vt-d: Make size of operands same in bitwise operations iommu/vt-d: Remove PASID supervisor request support iommu/vt-d: Use non-privileged mode for all PASIDs iommu/vt-d: Remove extern from function prototypes iommu/vt-d: Do not use GFP_ATOMIC when not needed iommu/vt-d: Remove unnecessary checks in iopf disabling path iommu/vt-d: Move PRI handling to IOPF feature path iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path iommu/vt-d: Move iopf code from SVA to IOPF enabling path iommu/vt-d: Allow SVA with device-specific IOPF dmaengine: idxd: Add enable/disable device IOPF feature arm64: dts: mt8186: Add dma-ranges for the parent "soc" node ...
2023-04-30Merge tag 's390-6.4-1' of ↵Linus Torvalds1-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for stackleak feature. Also allow specifying architecture-specific stackleak poison function to enable faster implementation. On s390, the mvc-based implementation helps decrease typical overhead from a factor of 3 to just 25% - Convert all assembler files to use SYM* style macros, deprecating the ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS - Improve KASLR to also randomize module and special amode31 code base load addresses - Rework decompressor memory tracking to support memory holes and improve error handling - Add support for protected virtualization AP binding - Add support for set_direct_map() calls - Implement set_memory_rox() and noexec module_alloc() - Remove obsolete overriding of mem*() functions for KASAN - Rework kexec/kdump to avoid using nodat_stack to call purgatory - Convert the rest of the s390 code to use flexible-array member instead of a zero-length array - Clean up uaccess inline asm - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable DEBUG_FORCE_FUNCTION_ALIGN_64B - Resolve last_break in userspace fault reports - Simplify one-level sysctl registration - Clean up branch prediction handling - Rework CPU counter facility to retrieve available counter sets just once - Other various small fixes and improvements all over the code * tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits) s390/stackleak: provide fast __stackleak_poison() implementation stackleak: allow to specify arch specific stackleak poison function s390: select ARCH_USE_SYM_ANNOTATIONS s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc() s390: wire up memfd_secret system call s390/mm: enable ARCH_HAS_SET_DIRECT_MAP s390/mm: use BIT macro to generate SET_MEMORY bit masks s390/relocate_kernel: adjust indentation s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc. s390/entry: use SYM* macros instead of ENTRY(), etc. s390/purgatory: use SYM* macros instead of ENTRY(), etc. s390/kprobes: use SYM* macros instead of ENTRY(), etc. s390/reipl: use SYM* macros instead of ENTRY(), etc. s390/head64: use SYM* macros instead of ENTRY(), etc. s390/earlypgm: use SYM* macros instead of ENTRY(), etc. s390/mcount: use SYM* macros instead of ENTRY(), etc. s390/crc32le: use SYM* macros instead of ENTRY(), etc. s390/crc32be: use SYM* macros instead of ENTRY(), etc. s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc. s390/amode31: use SYM* macros instead of ENTRY(), etc. ...
2023-04-29Merge tag 'dma-mapping-6.4-2023-04-28' of ↵Linus Torvalds5-81/+175
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - fix a PageHighMem check in dma-coherent initialization (Doug Berger) - clean up the coherency defaul initialiation (Jiaxun Yang) - add cacheline to user/kernel dma-debug space dump messages (Desnes Nunes, Geert Uytterhoeve) - swiotlb statistics improvements (Michael Kelley) - misc cleanups (Petr Tesarik) * tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: Omit total_used and used_hiwater if !CONFIG_DEBUG_FS swiotlb: track and report io_tlb_used high water marks in debugfs swiotlb: fix debugfs reporting of reserved memory pools swiotlb: relocate PageHighMem test away from rmem_swiotlb_setup of: address: always use dma_default_coherent for default coherency dma-mapping: provide CONFIG_ARCH_DMA_DEFAULT_COHERENT dma-mapping: provide a fallback dma_default_coherent dma-debug: Use %pa to format phys_addr_t dma-debug: add cacheline to user/kernel space dump messages dma-debug: small dma_debug_entry's comment and variable name updates dma-direct: cleanup parameters to dma_direct_optimal_gfp_mask
2023-04-29Merge tag 'timers-core-2023-04-28' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more timer updates from Thomas Gleixner: "Timekeeping and clocksource/event driver updates the second batch: - A trivial documentation fix in the timekeeping core - A really boring set of small fixes, enhancements and cleanups in the drivers code. No new clocksource/clockevent drivers for a change" * tag 'timers-core-2023-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix references to nonexistent ktime_get_fast_ns() dt-bindings: timer: rockchip: Add rk3588 compatible dt-bindings: timer: rockchip: Drop superfluous rk3288 compatible clocksource/drivers/ti: Use of_property_read_bool() for boolean properties clocksource/drivers/timer-ti-dm: Fix finding alwon timer clocksource/drivers/davinci: Fix memory leak in davinci_timer_register when init fails clocksource/drivers/stm32-lp: Drop of_match_ptr for ID table clocksource/drivers/timer-ti-dm: Convert to platform remove callback returning void clocksource/drivers/timer-tegra186: Convert to platform remove callback returning void clocksource/drivers/timer-ti-dm: Improve error message in .remove clocksource/drivers/timer-stm32-lp: Mark driver as non-removable clocksource/drivers/sh_mtu2: Mark driver as non-removable clocksource/drivers/timer-ti-dm: Use of_address_to_resource() clocksource/drivers/timer-imx-gpt: Remove non-DT function clocksource/drivers/timer-mediatek: Split out CPUXGPT timers clocksource/drivers/exynos_mct: Explicitly return 0 for shared timer
2023-04-29Merge tag 'cgroup-for-6.4' of ↵Linus Torvalds6-72/+80
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cpuset changes including the fix for an incorrect interaction with CPU hotplug and an optimization - Other doc and cosmetic changes * tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: docs: cgroup-v1/cpusets: update libcgroup project link cgroup/cpuset: Minor updates to test_cpuset_prs.sh cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset cpuset: Clean up cpuset_node_allowed cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
2023-04-29Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2-14/+122
Pull workqueue updates from Tejun Heo: "Mostly changes from Petr to improve warning and error reporting. Workqueue now reports more of the relevant failures with better context which should help debugging" * tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Introduce show_freezable_workqueues workqueue: Print backtraces from CPUs with hung CPU bound workqueues workqueue: Warn when a rescuer could not be created workqueue: Interrupted create_worker() is not a repeated event workqueue: Warn when a new worker could not be created workqueue: Fix hung time report of worker pools workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu() MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry
2023-04-29locking/rwbase: Mitigate indefinite writer starvationSebastian Andrzej Siewior1-9/+0
On PREEMPT_RT, rw_semaphore and rwlock_t locks are unfair to writers. Readers can indefinitely acquire the lock unless the writer fully acquired the lock, which might never happen if there is always a reader in the critical section owning the lock. Mel Gorman reported that since LTP-20220121 the dio_truncate test case went from having 1 reader to having 16 readers and that number of readers is sufficient to prevent the down_write ever succeeding while readers exist. Eventually the test is killed after 30 minutes as a failure. Mel proposed a timeout to limit how long a writer can be blocked until the reader is forced into the slowpath. Thomas argued that there is no added value by providing this timeout. From a PREEMPT_RT point of view, there are no critical rw_semaphore or rwlock_t locks left where the reader must be preferred. Mitigate indefinite writer starvation by forcing the READER into the slowpath once the WRITER attempts to acquire the lock. Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/877cwbq4cq.ffs@tglx Link: https://lore.kernel.org/r/20230321161140.HMcQEhHb@linutronix.de Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-28Merge tag 'trace-tools-v6.4' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tools updates from Steven Rostedt: - Add auto-analysis only option to rtla/timerlat Add an --aa-only option to the tooling to perform only the auto analysis and not to parse and format the data. - Other minor fixes and clean ups * tag 'trace-tools-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/timerlat: Fix "Previous IRQ" auto analysis' line rtla/timerlat: Add auto-analysis only option rv: Remove redundant assignment to variable retval rv: Fix addition on an uninitialized variable 'run' rtla: Add .gitignore file
2023-04-28Merge tag 'trace-v6.4' of ↵Linus Torvalds12-260/+1172
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - User events are finally ready! After lots of collaboration between various parties, we finally locked down on a stable interface for user events that can also work with user space only tracing. This is implemented by telling the kernel (or user space library, but that part is user space only and not part of this patch set), where the variable is that the application uses to know if something is listening to the trace. There's also an interface to tell the kernel about these events, which will show up in the /sys/kernel/tracing/events/user_events/ directory, where it can be enabled. When it's enabled, the kernel will update the variable, to tell the application to start writing to the kernel. See https://lwn.net/Articles/927595/ - Cleaned up the direct trampolines code to simplify arm64 addition of direct trampolines. Direct trampolines use the ftrace interface but instead of jumping to the ftrace trampoline, applications (mostly BPF) can register their own trampoline for performance reasons. - Some updates to the fprobe infrastructure. fprobes are more efficient than kprobes, as it does not need to save all the registers that kprobes on ftrace do. More work needs to be done before the fprobes will be exposed as dynamic events. - More updates to references to the obsolete path of /sys/kernel/debug/tracing for the new /sys/kernel/tracing path. - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line by line instead of all at once. There are users in production kernels that have a large data dump that originally used printk() directly, but the data dump was larger than what printk() allowed as a single print. Using seq_buf() to do the printing fixes that. - Add /sys/kernel/tracing/touched_functions that shows all functions that was every traced by ftrace or a direct trampoline. This is used for debugging issues where a traced function could have caused a crash by a bpf program or live patching. - Add a "fields" option that is similar to "raw" but outputs the fields of the events. It's easier to read by humans. - Some minor fixes and clean ups. * tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits) ring-buffer: Sync IRQ works before buffer destruction tracing: Add missing spaces in trace_print_hex_seq() ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus recordmcount: Fix memory leaks in the uwrite function tracing/user_events: Limit max fault-in attempts tracing/user_events: Prevent same address and bit per process tracing/user_events: Ensure bit is cleared on unregister tracing/user_events: Ensure write index cannot be negative seq_buf: Add seq_buf_do_printk() helper tracing: Fix print_fields() for __dyn_loc/__rel_loc tracing/user_events: Set event filter_type from type ring-buffer: Clearly check null ptr returned by rb_set_head_page() tracing: Unbreak user events tracing/user_events: Use print_format_fields() for trace output tracing/user_events: Align structs with tabs for readability tracing/user_events: Limit global user_event count tracing/user_events: Charge event allocs to cgroups tracing/user_events: Update documentation for ABI tracing/user_events: Use write ABI in example tracing/user_events: Add ABI self-test ...
2023-04-28Merge tag 'smp-core-2023-04-27' of ↵Linus Torvalds4-244/+105
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28Merge tag 'sched-core-2023-04-27' of ↵Linus Torvalds12-316/+1266
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Allow unprivileged PSI poll()ing - Fix performance regression introduced by mm_cid - Improve livepatch stalls by adding livepatch task switching to cond_resched(). This resolves livepatching busy-loop stalls with certain CPU-bound kthreads - Improve sched_move_task() performance on autogroup configs - On core-scheduling CPUs, avoid selecting throttled tasks to run - Misc cleanups, fixes and improvements * tag 'sched-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Fix local_clock() before sched_clock_init() sched/rt: Fix bad task migration for rt tasks sched: Fix performance regression introduced by mm_cid sched/core: Make sched_dynamic_mutex static sched/psi: Allow unprivileged polling of N*2s period sched/psi: Extract update_triggers side effect sched/psi: Rename existing poll members in preparation sched/psi: Rearrange polling code in preparation sched/fair: Fix inaccurate tally of ttwu_move_affine vhost: Fix livepatch timeouts in vhost_worker() livepatch,sched: Add livepatch task switching to cond_resched() livepatch: Skip task_call_func() for current task livepatch: Convert stack entries array to percpu sched: Interleave cfs bandwidth timers for improved single thread performance at low utilization sched/core: Reduce cost of sched_move_task when config autogroup sched/core: Avoid selecting the task that is throttled to run when core-sched enable sched/topology: Make sched_energy_mutex,update static
2023-04-28Merge tag 'perf-core-2023-04-27' of ↵Linus Torvalds1-2/