<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/watchdog.c, branch v6.6.132</title>
<subtitle>Clone of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git</subtitle>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/'/>
<entry>
<title>watchdog: fix watchdog may detect false positive of softlockup</title>
<updated>2025-06-27T10:08:49+00:00</updated>
<author>
<name>Luo Gengkun</name>
<email>luogengkun@huaweicloud.com</email>
</author>
<published>2025-04-21T03:50:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=442e80dcf6fe32f482bc890a0852f5192abf67b2'/>
<id>442e80dcf6fe32f482bc890a0852f5192abf67b2</id>
<content type='text'>
commit 7123dbbef88cfd9f09e8a7899b0911834600cfa3 upstream.

When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer.  If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used.  The problem can be described as follow:

 # We asuume previous watchdog_thresh is 60, so the watchdog timer is
 # coming every 24s.
echo 10 &gt; /proc/sys/kernel/watchdog_thresh (User space)
|
+------&gt;+ update watchdog_thresh (We are in kernel now)
	|
	|	  # using old interval and new `watchdog_thresh`
	+------&gt;+ watchdog hrtimer (irq context: detect softlockup)
		|
		|
	+-------+
	|
	|
	+ softlockup_stop_all

To fix this problem, introduce a shadow variable for `watchdog_thresh`.
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.

The following testcase may help to understand this problem.

---------------------------------------------
echo RT_RUNTIME_SHARE &gt; /sys/kernel/debug/sched/features
echo -1 &gt; /proc/sys/kernel/sched_rt_runtime_us
echo 0 &gt; /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 &gt; /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &amp;
echo 10 &gt; /proc/sys/kernel/watchdog_thresh &amp;
---------------------------------------------

The test case above first removes the throttling restrictions for
real-time tasks.  It then sets watchdog_thresh to 60 and executes a
real-time task ,a simple while(1) loop, on cpu3.  Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler.  This eventually
triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating
with inconsistent variable - using both the old interval and the updated
watchdog_thresh simultaneously.

[nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case]
  Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com
Signed-off-by: Luo Gengkun &lt;luogengkun@huaweicloud.com&gt;
Signed-off-by: Nysal Jan K.A. &lt;nysal@linux.ibm.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: "Nysal Jan K.A." &lt;nysal@linux.ibm.com&gt;
Cc: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7123dbbef88cfd9f09e8a7899b0911834600cfa3 upstream.

When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer.  If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used.  The problem can be described as follow:

 # We asuume previous watchdog_thresh is 60, so the watchdog timer is
 # coming every 24s.
echo 10 &gt; /proc/sys/kernel/watchdog_thresh (User space)
|
+------&gt;+ update watchdog_thresh (We are in kernel now)
	|
	|	  # using old interval and new `watchdog_thresh`
	+------&gt;+ watchdog hrtimer (irq context: detect softlockup)
		|
		|
	+-------+
	|
	|
	+ softlockup_stop_all

To fix this problem, introduce a shadow variable for `watchdog_thresh`.
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.

The following testcase may help to understand this problem.

---------------------------------------------
echo RT_RUNTIME_SHARE &gt; /sys/kernel/debug/sched/features
echo -1 &gt; /proc/sys/kernel/sched_rt_runtime_us
echo 0 &gt; /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 &gt; /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &amp;
echo 10 &gt; /proc/sys/kernel/watchdog_thresh &amp;
---------------------------------------------

The test case above first removes the throttling restrictions for
real-time tasks.  It then sets watchdog_thresh to 60 and executes a
real-time task ,a simple while(1) loop, on cpu3.  Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler.  This eventually
triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating
with inconsistent variable - using both the old interval and the updated
watchdog_thresh simultaneously.

[nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case]
  Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com
Signed-off-by: Luo Gengkun &lt;luogengkun@huaweicloud.com&gt;
Signed-off-by: Nysal Jan K.A. &lt;nysal@linux.ibm.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: "Nysal Jan K.A." &lt;nysal@linux.ibm.com&gt;
Cc: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog: move softlockup_panic back to early_param</title>
<updated>2023-11-28T17:19:57+00:00</updated>
<author>
<name>Krister Johansen</name>
<email>kjlx@templeofstupid.com</email>
</author>
<published>2023-10-27T21:46:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=dbfbac0f94a113611a8b2016eb126ceeabaa60c5'/>
<id>dbfbac0f94a113611a8b2016eb126ceeabaa60c5</id>
<content type='text'>
commit 8b793bcda61f6c3ed4f5b2ded7530ef6749580cb upstream.

Setting softlockup_panic from do_sysctl_args() causes it to take effect
later in boot.  The lockup detector is enabled before SMP is brought
online, but do_sysctl_args runs afterwards.  If a user wants to set
softlockup_panic on boot and have it trigger should a softlockup occur
during onlining of the non-boot processors, they could do this prior to
commit f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot
parameters to sysctl aliases").  However, after this commit the value
of softlockup_panic is set too late to be of help for this type of
problem.  Restore the prior behavior.

Signed-off-by: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Cc: stable@vger.kernel.org
Fixes: f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases")
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8b793bcda61f6c3ed4f5b2ded7530ef6749580cb upstream.

Setting softlockup_panic from do_sysctl_args() causes it to take effect
later in boot.  The lockup detector is enabled before SMP is brought
online, but do_sysctl_args runs afterwards.  If a user wants to set
softlockup_panic on boot and have it trigger should a softlockup occur
during onlining of the non-boot processors, they could do this prior to
commit f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot
parameters to sysctl aliases").  However, after this commit the value
of softlockup_panic is set too late to be of help for this type of
problem.  Restore the prior behavior.

Signed-off-by: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Cc: stable@vger.kernel.org
Fixes: f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases")
Signed-off-by: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()</title>
<updated>2023-08-18T17:19:00+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2023-08-04T14:00:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=1f38c86bb29f4548b8df01b47a313518e6ed2dfe'/>
<id>1f38c86bb29f4548b8df01b47a313518e6ed2dfe</id>
<content type='text'>
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started storing a `struct cpumask` on the
stack in watchdog_hardlockup_check().  On systems with CONFIG_NR_CPUS set
to 8192 this takes up 1K on the stack.  That triggers warnings with
`CONFIG_FRAME_WARN` set to 1024.

We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to
use a CPU mask at all.

Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid
Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Lecopzer Chen &lt;lecopzer.chen@mediatek.com&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started storing a `struct cpumask` on the
stack in watchdog_hardlockup_check().  On systems with CONFIG_NR_CPUS set
to 8192 this takes up 1K on the stack.  That triggers warnings with
`CONFIG_FRAME_WARN` set to 1024.

We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to
use a CPU mask at all.

Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid
Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Lecopzer Chen &lt;lecopzer.chen@mediatek.com&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nmi_backtrace: allow excluding an arbitrary CPU</title>
<updated>2023-08-18T17:19:00+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2023-08-04T14:00:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=8d539b84f1e3478436f978ceaf55a0b6cab497b5'/>
<id>8d539b84f1e3478436f978ceaf55a0b6cab497b5</id>
<content type='text'>
The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU.  This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.

Let's extend the API to take a CPU ID to exclude instead of just a
boolean.  This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.

Arguably, this new API also encourages safer behavior.  Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.

[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Lecopzer Chen &lt;lecopzer.chen@mediatek.com&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU.  This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.

Let's extend the API to take a CPU ID to exclude instead of just a
boolean.  This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.

Arguably, this new API also encourages safer behavior.  Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.

[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Lecopzer Chen &lt;lecopzer.chen@mediatek.com&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Pingfan Liu &lt;kernelfans@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64</title>
<updated>2023-06-19T23:25:29+00:00</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-06-16T15:06:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=47f4cb433923a08d81f1e5c065cb680215109db9'/>
<id>47f4cb433923a08d81f1e5c065cb680215109db9</id>
<content type='text'>
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.

Before, it is far from obvious that the SPARC64 variant is actually used:

$&gt; make ARCH=sparc64 defconfig
$&gt; grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y

After, it is more clear:

$&gt; make ARCH=sparc64 defconfig
$&gt; grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y

Link: https://lkml.kernel.org/r/20230616150618.6073-6-pmladek@suse.com
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.

Before, it is far from obvious that the SPARC64 variant is actually used:

$&gt; make ARCH=sparc64 defconfig
$&gt; grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y

After, it is more clear:

$&gt; make ARCH=sparc64 defconfig
$&gt; grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y

Link: https://lkml.kernel.org/r/20230616150618.6073-6-pmladek@suse.com
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific</title>
<updated>2023-06-19T23:25:29+00:00</updated>
<author>
<name>Petr Mladek</name>
<email>pmladek@suse.com</email>
</author>
<published>2023-06-16T15:06:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=a5fcc2367e223c45c78a882438c2b8e13fe0f580'/>
<id>a5fcc2367e223c45c78a882438c2b8e13fe0f580</id>
<content type='text'>
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.

CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.

The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.

CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.

The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.

And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.

There are only few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:

  + implements arch_touch_nmi_watchdog() which is called from the generic
    touch_nmi_watchdog()

  + implements watchdog_hardlockup_enable()/disable() to support
    /proc/sys/kernel/nmi_watchdog

  + is always preferred over other generic watchdogs, see
    CONFIG_HARDLOCKUP_DETECTOR

  + includes asm/nmi.h into linux/nmi.h because some sparc-specific
    functions are needed in sparc-specific code which includes
    only linux/nmi.h.

The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.

HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:

  + implemented arch_touch_nmi_watchdog()
  + included asm/nmi.h into linux/nmi.h
  + defined the default value for /proc/sys/kernel/nmi_watchdog

But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before the generic perf detector
was newer supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.

As a result, there are few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:

  ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) &amp;&amp; !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH

The problem is that the very special sparc implementation is defined as:

  HAVE_NMI_WATCHDOG &amp;&amp; !HAVE_HARDLOCKUP_DETECTOR_ARCH

Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading understanding the history.

Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.

Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is not longer enabled when HAVE_NMI_WATCHDOG is set.

Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.

CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.

The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.

CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.

The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.

And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.

There are only few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:

  + implements arch_touch_nmi_watchdog() which is called from the generic
    touch_nmi_watchdog()

  + implements watchdog_hardlockup_enable()/disable() to support
    /proc/sys/kernel/nmi_watchdog

  + is always preferred over other generic watchdogs, see
    CONFIG_HARDLOCKUP_DETECTOR

  + includes asm/nmi.h into linux/nmi.h because some sparc-specific
    functions are needed in sparc-specific code which includes
    only linux/nmi.h.

The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.

HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:

  + implemented arch_touch_nmi_watchdog()
  + included asm/nmi.h into linux/nmi.h
  + defined the default value for /proc/sys/kernel/nmi_watchdog

But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before the generic perf detector
was newer supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.

As a result, there are few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:

  ((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) &amp;&amp; !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH

The problem is that the very special sparc implementation is defined as:

  HAVE_NMI_WATCHDOG &amp;&amp; !HAVE_HARDLOCKUP_DETECTOR_ARCH

Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading understanding the history.

Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.

Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is not longer enabled when HAVE_NMI_WATCHDOG is set.

Link: https://lkml.kernel.org/r/20230616150618.6073-5-pmladek@suse.com
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup: move SMP barriers from common code to buddy code</title>
<updated>2023-06-19T23:25:28+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2023-05-27T01:41:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=28168eca3297d68faa8a9433ec93cb6acf06d2f4'/>
<id>28168eca3297d68faa8a9433ec93cb6acf06d2f4</id>
<content type='text'>
It's been suggested that since the SMP barriers are only potentially
useful for the buddy hardlockup detector, not the perf hardlockup
detector, that the barriers belong in the buddy code.  Let's move them and
add clearer comments about why they're needed.

Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's been suggested that since the SMP barriers are only potentially
useful for the buddy hardlockup detector, not the perf hardlockup
detector, that the barriers belong in the buddy code.  Let's move them and
add clearer comments about why they're needed.

Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called</title>
<updated>2023-06-19T23:25:27+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2023-05-27T01:41:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=d3b62ace0f097f1d863fb6c41df3c61503e4ec9e'/>
<id>d3b62ace0f097f1d863fb6c41df3c61503e4ec9e</id>
<content type='text'>
In the patch ("watchdog/hardlockup: detect hard lockups using secondary
(buddy) CPUs"), we added a call from the common watchdog.c file into the
buddy.  That call could be done more cleanly.  Specifically:

1. If we move the call into watchdog_hardlockup_kick() then it keeps
   watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
   count. In the patch ("watchdog/hardlockup: add a "cpu" param to
   watchdog_hardlockup_check()") the count was changed to "atomic_t"
   which is backed by an int, so we should match types.

Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the patch ("watchdog/hardlockup: detect hard lockups using secondary
(buddy) CPUs"), we added a call from the common watchdog.c file into the
buddy.  That call could be done more cleanly.  Specifically:

1. If we move the call into watchdog_hardlockup_kick() then it keeps
   watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
   count. In the patch ("watchdog/hardlockup: add a "cpu" param to
   watchdog_hardlockup_check()") the count was changed to "atomic_t"
   which is backed by an int, so we should match types.

Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()</title>
<updated>2023-06-19T23:25:27+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2023-05-27T01:41:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=7a71d8e650b06833095e7a0d4206585e8585c00f'/>
<id>7a71d8e650b06833095e7a0d4206585e8585c00f</id>
<content type='text'>
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started using a cpumask to keep track of
which CPUs to backtrace.  When setting up this cpumask, it's better to use
cpumask_copy() than to just copy the structure directly.  Fix this.

Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started using a cpumask to keep track of
which CPUs to backtrace.  When setting up this cpumask, it's better to use
cpumask_copy() than to just copy the structure directly.  Fix this.

Link: https://lkml.kernel.org/r/20230526184139.4.Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()</title>
<updated>2023-06-19T23:25:27+00:00</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2023-05-27T01:41:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.exis.tech/linux.git/commit/?id=2711e4adef4fac2eeaee66e3c22a2f75ee86e7b3'/>
<id>2711e4adef4fac2eeaee66e3c22a2f75ee86e7b3</id>
<content type='text'>
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr(). 
Using this_cpu_ptr() works fine.

Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr(). 
Using this_cpu_ptr() works fine.

Link: https://lkml.kernel.org/r/20230526184139.3.I660e103077dcc23bb29aaf2be09cb234e0495b2d@changeid
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Suggested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
