diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-08-02 19:12:45 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-08-02 19:12:45 -0700 |
| commit | 7d9d077c783e33995c80d8b28fea1a98161934f4 (patch) | |
| tree | 5404e84d237b764a07aaf299d175c1dcfbad9966 | |
| parent | c2a24a7a036b3bd3a2e6c66730dfc777cae6540a (diff) | |
| parent | 34bc7b454dc31f75a0be7ee8ab378135523d7c51 (diff) | |
| download | linux-7d9d077c783e33995c80d8b28fea1a98161934f4.tar.gz linux-7d9d077c783e33995c80d8b28fea1a98161934f4.tar.bz2 linux-7d9d077c783e33995c80d8b28fea1a98161934f4.zip | |
Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates
- Miscellaneous fixes
- Callback-offload updates, perhaps most notably a new
RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
offloaded at boot time, regardless of kernel boot parameters.
This is useful to battery-powered systems such as ChromeOS and
Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
parameter prevents offloaded callbacks from interfering with
real-time workloads and with energy-efficiency mechanisms
- Polled grace-period updates, perhaps most notably making these APIs
account for both normal and expedited grace periods
- Tasks RCU updates, perhaps most notably reducing the CPU overhead of
RCU tasks trace grace periods by more than a factor of two on a
system with 15,000 tasks.
The reduction is expected to increase with the number of tasks, so it
seems reasonable to hypothesize that a system with 150,000 tasks
might see a 20-fold reduction in CPU overhead
- Torture-test updates
- Updates that merge RCU's dyntick-idle tracking into context tracking,
thus reducing the overhead of transitioning to kernel mode from
either idle or nohz_full userspace execution for kernels that track
context independently of RCU.
This is expected to be helpful primarily for kernels built with
CONFIG_NO_HZ_FULL=y
* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
rcu: Diagnose extended sync_rcu_do_polled_gp() loops
rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
rcutorture: Test polled expedited grace-period primitives
rcu: Add polled expedited grace-period primitives
rcutorture: Verify that polled GP API sees synchronous grace periods
rcu: Make Tiny RCU grace periods visible to polled APIs
rcu: Make polled grace-period API account for expedited grace periods
rcu: Switch polled grace-period APIs to ->gp_seq_polled
rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
rcu/nocb: Add option to opt rcuo kthreads out of RT priority
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
rcu/nocb: Add an option to offload all CPUs on boot
rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
rcu/nocb: Add/del rdp to iterate from rcuog itself
rcu/tree: Add comment to describe GP-done condition in fqs loop
rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
rcu/kvfree: Remove useless monitor_todo flag
rcu: Cleanup RCU urgency state for offline CPU
...
76 files changed, 2124 insertions, 1279 deletions
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst index 04ed8bf27a0e..a0f8164c8513 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.rst +++ b/Documentation/RCU/Design/Requirements/Requirements.rst @@ -1844,10 +1844,10 @@ that meets this requirement. Furthermore, NMI handlers can be interrupted by what appear to RCU to be normal interrupts. One way that this can happen is for code that -directly invokes rcu_irq_enter() and rcu_irq_exit() to be called +directly invokes ct_irq_enter() and ct_irq_exit() to be called from an NMI handler. This astonishing fact of life prompted the current -code structure, which has rcu_irq_enter() invoking -rcu_nmi_enter() and rcu_irq_exit() invoking rcu_nmi_exit(). +code structure, which has ct_irq_enter() invoking +ct_nmi_enter() and ct_irq_exit() invoking ct_nmi_exit(). And yes, I also learned of this requirement the hard way. Loadable Modules @@ -2195,7 +2195,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be: sections, and RCU believes this CPU to be idle, no problem. This sort of thing is used by some architectures for light-weight exception handlers, which can then avoid the overhead of - rcu_irq_enter() and rcu_irq_exit() at exception entry and + ct_irq_enter() and ct_irq_exit() at exception entry and exit, respectively. Some go further and avoid the entireties of irq_enter() and irq_exit(). Just make very sure you are running some of your tests with @@ -2226,7 +2226,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be: +-----------------------------------------------------------------------+ | **Answer**: | +-----------------------------------------------------------------------+ -| One approach is to do ``rcu_irq_exit();rcu_irq_enter();`` every so | +| One approach is to do ``ct_irq_exit();ct_irq_enter();`` every so | | often. But given that long-running interrupt handlers can cause other | | problems, not least for response time, shouldn't you work to keep | | your interrupt handler's runtime within reasonable bounds? | diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst index 794837eb519b..e38c587067fc 100644 --- a/Documentation/RCU/stallwarn.rst +++ b/Documentation/RCU/stallwarn.rst @@ -97,12 +97,12 @@ warnings: which will include additional debugging information. - A low-level kernel issue that either fails to invoke one of the - variants of rcu_user_enter(), rcu_user_exit(), rcu_idle_enter(), - rcu_idle_exit(), rcu_irq_enter(), or rcu_irq_exit() on the one + variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(), + ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one hand, or that invokes one of them too many times on the other. Historically, the most frequent issue has been an omission of either irq_enter() or irq_exit(), which in turn invoke - rcu_irq_enter() or rcu_irq_exit(), respectively. Building your + ct_irq_enter() or ct_irq_exit(), respectively. Building your kernel with CONFIG_RCU_EQS_DEBUG=y can help track down these types of issues, which sometimes arise in architecture-specific code. diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 79e724175a49..df9a5b60ee46 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3667,6 +3667,9 @@ just as if they had also been called out in the rcu_nocbs= boot parameter. + Note that this argument takes precedence over + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option. + noiotrap [SH] Disables trapped I/O port accesses. noirqdebug [X86-32] Disables the code which attempts to detect and @@ -4560,6 +4563,9 @@ no-callback mode from boot but the mode may be toggled at runtime via cpusets. + Note that this argument takes precedence over + the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option. + rcu_nocb_poll [KNL] Rather than requiring that offloaded CPUs (specified by rcu_nocbs= above) explicitly @@ -4669,6 +4675,34 @@ When RCU_NOCB_CPU is set, also adjust the priority of NOCB callback kthreads. + rcutree.rcu_divisor= [KNL] + Set the shift-right count to use to compute + the callback-invocation batch limit bl from + the number of callbacks queued on this CPU. + The result will be bounded below by the value of + the rcutree.blimit kernel parameter. Every bl + callbacks, the softirq handler will exit in + order to allow the CPU to do other work. + + Please note that this callback-invocation batch + limit applies only to non-offloaded callback + invocation. Offloaded callbacks are instead + invoked in the context of an rcuoc kthread, which + scheduler will preempt as it does any other task. + + rcutree.nocb_nobypass_lim_per_jiffy= [KNL] + On callback-offloaded (rcu_nocbs) CPUs, + RCU reduces the lock contention that would + otherwise be caused by callback floods through + use of the ->nocb_bypass list. However, in the + common non-flooded case, RCU queues directly to + the main ->cblist in order to avoid the extra + overhead of the ->nocb_bypass list and its lock. + But if there are too many callbacks queued during + a single jiffy, RCU pre-queues the callbacks into + the ->nocb_bypass queue. The definition of "too + many" is supplied by this kernel boot parameter. + rcutree.rcu_nocb_gp_stride= [KNL] Set the number of NOCB callback kthreads in each group, which defaults to the square root diff --git a/Documentation/features/time/context-tracking/arch-support.txt b/Documentation/features/time/context-tracking/arch-support.txt index c9e0a16290e6..e59071a49090 100644 --- a/Documentation/features/time/context-tracking/arch-support.txt +++ b/Documentation/features/time/context-tracking/arch-support.txt @@ -1,7 +1,7 @@ # -# Feature name: context-tracking -# Kconfig: HAVE_CONTEXT_TRACKING -# description: arch supports context tracking for NO_HZ_FULL +# Feature name: user-context-tracking +# Kconfig: HAVE_CONTEXT_TRACKING_USER +# description: arch supports user context tracking for NO_HZ_FULL # ----------------------- | arch |status| diff --git a/MAINTAINERS b/MAINTAINERS index 7aa3658dc007..dfe5052aac91 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5165,6 +5165,7 @@ F: include/linux/console* CONTEXT TRACKING M: Frederic Weisbecker <frederic@kernel.org> +M: "Paul E. McKenney" <paulmck@kernel.org> S: Maintained F: kernel/context_tracking.c F: include/linux/context_tracking* diff --git a/arch/Kconfig b/arch/Kconfig index 5ea3e3838c21..761d169ba7fd 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -784,7 +784,7 @@ config HAVE_ARCH_WITHIN_STACK_FRAMES and similar) by implementing an inline arch_within_stack_frames(), which is used by CONFIG_HARDENED_USERCOPY. -config HAVE_CONTEXT_TRACKING +config HAVE_CONTEXT_TRACKING_USER bool help Provide kernel/user boundaries probes necessary for subsystems @@ -792,10 +792,10 @@ config HAVE_CONTEXT_TRACKING Syscalls need to be wrapped inside user_exit()-user_enter(), either optimized behind static key or through the slow path using TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs are already - protected inside rcu_irq_enter/rcu_irq_exit() but preemption or signal + protected inside ct_irq_enter/ct_irq_exit() but preemption or signal handling on irq exit still need to be protected. -config HAVE_CONTEXT_TRACKING_OFFSTACK +config HAVE_CONTEXT_TRACKING_USER_OFFSTACK bool help Architecture neither relies on exception_enter()/exception_exit() @@ -807,7 +807,7 @@ config HAVE_CONTEXT_TRACKING_OFFSTACK - Critical entry code isn't preemptible (or better yet: not interruptible). - - No use of RCU read side critical sections, unless rcu_nmi_enter() + - No use of RCU read side critical sections, unless ct_nmi_enter() got called. - No use of instrumentation, unless instrumentation_begin() got called. diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 405596676473..4294c0123857 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -84,7 +84,7 @@ config ARM select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE select HAVE_ARM_SMCCC if CPU_V7 select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32 - select HAVE_CONTEXT_TRACKING + select HAVE_CONTEXT_TRACKING_USER select HAVE_C_RECORDMCOUNT select HAVE_BUILDTIME_MCOUNT_SORT select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index 6a447ac67d80..405a607b754f 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -28,7 +28,7 @@ #include "entry-header.S" saved_psr .req r8 -#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING) +#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING_USER) saved_pc .req r9 #define TRACE(x...) x #else @@ -38,7 +38,7 @@ saved_pc .req lr .section .entry.text,"ax",%progbits .align 5 -#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING) || \ +#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING_USER) || \ IS_ENABLED(CONFIG_DEBUG_RSEQ)) /* * This is the fast syscall return path. We do as little as possible here, diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index 5865621bf691..99411fa91350 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -366,25 +366,25 @@ ALT_UP_B(.L1_\@) * between user and kernel mode. */ .macro ct_user_exit, save = 1 -#ifdef CONFIG_CONTEXT_TRACKING +#ifdef CONFIG_CONTEXT_TRACKING_USER .if \save stmdb sp!, {r0-r3, ip, lr} - bl context_tracking_user_exit + bl user_exit_callable ldmia sp!, {r0-r3, ip, lr} .else - bl context_tracking_user_exit + bl user_exit_callable .endif #endif .endm .macro ct_user_enter, save = 1 -#ifdef CONFIG_CONTEXT_TRACKING +#ifdef CONFIG_CONTEXT_TRACKING_USER .if \save stmdb sp!, {r0-r3, ip, lr} - bl context_tracking_user_enter + bl user_enter_callable ldmia sp!, {r0-r3, ip, lr} .else - bl context_tracking_user_enter + bl user_enter_callable .endif #endif .endm diff --git a/arch/arm/mach-imx/cpuidle-imx6q.c b/arch/arm/mach-imx/cpuidle-imx6q.c index 094337dc1bc7..d086cbae09c3 100644 --- a/arch/arm/mach-imx/cpuidle-imx6q.c +++ b/arch/arm/mach-imx/cpuidle-imx6q.c @@ -3,6 +3,7 @@ * Copyright (C) 2012 Freescale Semiconductor, Inc. */ +#include <linux/context_tracking.h> #include <linux/cpuidle.h> #include <linux/module.h> #include <asm/cpuidle.h> @@ -24,9 +25,9 @@ static int imx6q_enter_wait(struct cpuidle_device *dev, imx6_set_lpm(WAIT_UNCLOCKED); raw_spin_unlock(&cpuidle_lock); - rcu_idle_enter(); + ct_idle_enter(); cpu_do_idle(); - rcu_idle_exit(); + ct_idle_exit(); raw_spin_lock(&cpuidle_lock); if (num_idle_cpus-- == num_online_cpus()) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 95839c8d7229..e05fc9743767 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -176,7 +176,7 @@ config ARM64 select HAVE_C_RECORDMCOUNT select HAVE_CMPXCHG_DOUBLE select HAVE_CMPXCHG_LOCAL - select HAVE_CONTEXT_TRACKING + select HAVE_CONTEXT_TRACKING_USER select HAVE_DEBUG_KMEMLEAK select HAVE_DMA_CONTIGUOUS select HAVE_DYNAMIC_FTRACE diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c index 56cefd33eb8e..c75ca36b4a49 100644 --- a/arch/arm64/kernel/entry-common.c +++ b/arch/arm64/kernel/entry-common.c @@ -41,7 +41,7 @@ static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs) if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) { lockdep_hardirqs_off(CALLER_ADDR0); - rcu_irq_enter(); + ct_irq_enter(); trace_hardirqs_off_finish(); regs->exit_rcu = true; @@ -76,7 +76,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs) if (regs->exit_rcu) { trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(); - rcu_irq_exit(); + ct_irq_exit(); lockdep_hardirqs_on(CALLER_ADDR0); return; } @@ -84,7 +84,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs) trace_hardirqs_on(); } else { if (regs->exit_rcu) - rcu_irq_exit(); + ct_irq_exit(); } } @@ -161,7 +161,7 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs) __nmi_enter(); lockdep_hardirqs_off(CALLER_ADDR0); lockdep_hardirq_enter(); - rcu_nmi_enter(); + ct_nmi_enter(); trace_hardirqs_off_finish(); ftrace_nmi_enter(); @@ -182,7 +182,7 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs) lockdep_hardirqs_on_prepare(); } - rcu_nmi_exit(); + ct_nmi_exit(); lockdep_hardirq_exit(); if (restore) lockdep_hardirqs_on(CALLER_ADDR0); @@ -199,7 +199,7 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs) regs->lockdep_hardirqs = lockdep_hardirqs_enabled(); lockdep_hardirqs_off(CALLER_ADDR0); - rcu_nmi_enter(); + ct_nmi_enter(); trace_hardirqs_off_finish(); } @@ -218,7 +218,7 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs) lockdep_hardirqs_on_prepare(); } - rcu_nmi_exit(); + ct_nmi_exit(); if (restore) lockdep_hardirqs_on(CALLER_ADDR0); } diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig index 21d72b078eef..f55ba1745f7b 100644 --- a/arch/csky/Kconfig +++ b/arch/csky/Kconfig @@ -42,7 +42,7 @@ config CSKY select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_SECCOMP_FILTER - select HAVE_CONTEXT_TRACKING + select HAVE_CONTEXT_TRACKING_USER select HAVE_VIRT_CPU_ACCOUNTING_GEN select HAVE |
